Frequently Asked Questions About site Search

Note:- Some of the questions were asked by users of site Search versions 1.00/1.01/1.02. However, these questions are still relevant to version 2.00.
Note:- This FAQ assumes you have read the "readme.html" file.


1. I don't have cgi-lib.pl. Will I be able to run site Search?

No. site Search requires cgi-lib.pl to parse the contents of the form input and cannot do so without it. cgi-lib.pl is free software and can be downloaded from the cgi-lib.pl home page. If you are a perl hacker, you may be wondering why the contents of the form cannot be parsed using the %ENV variable. For an authoritative answer to this part of the question, refer to the latest version of the perlfaq: perlfaq(9) - Networking (Revision 1.13, dated 1997/03/17), question 9.

2. Why is the output not sorted on the basis of frequency of hits?

To be frank, I think sorting documents by frequency of hits is ineffective. How can it be assumed that the document containing the largest number of search terms is the most relevant to the user? If documents are to be sorted, it should be done by some kind of natural language system. If you are a C programmer, you will be aware of the slow bubble sort and the really efficient quicksort, either of which could have been employed here. But since I do not consider this an effective method, I chose to output the line containing the search term rather than sort. In fact, sorting the documents by modification date would be more effective, as the user would at least be getting the most current information.
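The modification-date ordering mentioned above could be sketched as follows. This is only an illustration, not code from site Search: the function name sort_by_mtime is made up, and it assumes the list of matching files has already been collected.

```perl
use strict;
use warnings;

# Return the given file paths ordered newest first.
# Paths that cannot be stat'ed (e.g. missing files) are dropped.
sub sort_by_mtime {
    my @files = @_;
    my %mtime;
    for my $file (@files) {
        my @st = stat($file);
        $mtime{$file} = $st[9] if @st;    # element 9 of stat() is the modification time
    }
    # Newest first; ties are broken by name so the order is stable.
    return sort { $mtime{$b} <=> $mtime{$a} || $a cmp $b }
           grep { exists $mtime{$_} } @files;
}

# Usage: print the files named on the command line, newest first.
print "$_\n" for sort_by_mtime(@ARGV);
```

Perl's built-in sort is used here, so no hand-written bubble sort or quicksort is needed.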

3. How do I create the description file used by the "$desc1" variable?

To create a description file, run the "mkdesc.pl" program. First, however, the options in "site_Search.conf" have to be configured for your site. The program creates the description file "site_Search.desc", in which each line is of the form,

Full Path to the File[sep]

Here [sep] is short for "separator": it is the character that separates the file name from whatever description you want to assign to the file. So please do not delete this character.

After you create the "site_Search.desc" file, edit it and enter whatever description you choose for each file. Remember to enter the description after the [sep] character and make sure the entire description for each file stays on one line. For example,

Full Path to the File[sep]whatever description you want.

4. How do I include a link in the description file?

After you create a description file, just type in the link after the [sep] character as you normally would in an HTML document. For example,

Full Path to the File[sep]<A HREF="http://mysite.com/">something</A>
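To make the one-line format from questions 3 and 4 concrete, here is a minimal sketch of how such a line could be split back into a path and a description. It is not taken from site Search: parse_desc_line is a hypothetical helper, the path "/home/web/index.html" is invented, and the separator is passed in as a parameter because the actual character written by "mkdesc.pl" is not stated here.

```perl
use strict;
use warnings;

# Split one description-file line into (path, description).
# $sep is whatever separator the description file uses; it is passed in, not guessed.
sub parse_desc_line {
    my ($line, $sep) = @_;
    chomp $line;
    # A limit of 2 keeps any later separator characters inside the description.
    my ($path, $desc) = split /\Q$sep\E/, $line, 2;
    $path = '' unless defined $path;
    $desc = '' unless defined $desc;    # the file has no description yet
    return ($path, $desc);
}

# Usage, with the literal text "[sep]" standing in for the real separator:
my ($file, $text) = parse_desc_line(
    qq{/home/web/index.html[sep]<A HREF="http://mysite.com/">something</A>\n}, '[sep]');
print "$file => $text\n";
```

Because the line is split only at the first separator, a description containing a link comes through intact, which is why the whole description has to stay on one line.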

5. How can I run site Search faster?

If you are not satisfied with the speed of site Search, one way to make it faster is to rewrite the code yourself :-) . The author has made every effort to write code that is as efficient as possible. However, he is not a perl guru and would gladly accept any suggestions to make site Search faster. Also, if you do not want the number of hits or the line output displayed in the results, disable the corresponding variables, i.e. "$output_hits" and "$output_line"; this makes site Search a bit faster, as it has less work to do. Also refer to question 14 on future improvements.

6. How can I exclude certain directories/files from my search path?

The variables to consider here are,
1. $base_path,
2. @directories_to_avoid,
3. @files_to_avoid,
4. @files_to_include,
5. @filetypes.

That is a lot of variables, but they make site Search extremely customizable. $base_path is the base directory, i.e. the directory where all the ".html" files reside. If you want any directories within "$base_path" avoided, assign them to the variable @directories_to_avoid. For example,

@directories_to_avoid = ("$base_path/some_dir");

Note:- The "$scratch" directory is a good candidate for the above variable.

If there are any particular files to be avoided, assign them to the variable @files_to_avoid. For example,

@files_to_avoid = ("$base_path/somefile.html","$base_path/some_dir/somefile.html");

The fourth variable is not quite what its name suggests. If you want to explicitly include any file, assign it to this variable. For example,

@files_to_include = ("$base_path/somefile.txt");

This variable bypasses the values in the next variable, "@filetypes". Assign the types of files you want searched to "@filetypes", for example,

@filetypes = (".html",".htm",".shtml");

Now suppose you don't want all ".txt" files searched but only "somefile.txt"; then you can explicitly include it in the @files_to_include variable.
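One way to picture how these five variables interact is the sketch below. It is an assumption about the logic, not the actual site Search code: should_search is a made-up name, the values are invented, and the order of precedence (avoided directories and files win over explicitly included files) is a guess.

```perl
use strict;
use warnings;

# Hypothetical values, for illustration only.
my $base_path            = "/home/web";
my @directories_to_avoid = ("$base_path/scratch");
my @files_to_avoid       = ("$base_path/private.html");
my @files_to_include     = ("$base_path/somefile.txt");
my @filetypes            = (".html", ".htm", ".shtml");

# Decide whether one file would be searched under the assumed rules.
sub should_search {
    my ($file) = @_;
    # Anything under an avoided directory is skipped.
    for my $dir (@directories_to_avoid) {
        return 0 if index($file, "$dir/") == 0;
    }
    # Explicitly avoided files are skipped.
    return 0 if grep { $_ eq $file } @files_to_avoid;
    # Explicitly included files bypass the @filetypes check.
    return 1 if grep { $_ eq $file } @files_to_include;
    # Otherwise the file name must end in one of the listed extensions.
    for my $ext (@filetypes) {
        return 1 if $file =~ /\Q$ext\E$/;
    }
    return 0;
}

print "$_: ", (should_search($_) ? "searched" : "skipped"), "\n"
    for ("$base_path/index.html", "$base_path/somefile.txt", "$base_path/other.txt");
```

With these values "somefile.txt" is searched even though ".txt" is not in @filetypes, while "other.txt" is skipped.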

7. What is the use of recording site Search usage?

If you enable the variable "$record_usage", site Search usage is recorded in the file "site_Search.usage". The uses of recording the usage are as follows,

  • The web-administrator is made aware of the information end-users are specifically seeking from his/her web site.
  • Based on this information, the web-administrator can restructure his/her web site to ensure users can quickly access the information they want.
  • Based on the frequency of site Search usage, the web-administrator learns how efficient the web site design is at providing users with information.

8. How do I modify the search output to suit the needs of my web site?

I recommend this only if you have some knowledge of perl. The modules that produce the search output are "Display_Results1" and "Display_Results2". Try hacking those modules.

9. What is the function/meaning of the variables $smart, $output_line, $output_hits and $scratch?

Firstly, the only reason I am providing the "$output_hits" variable, which enables/disables the output of the number of occurrences of each search term, is that many users suggested it. If enabled, a separate module is called each time a file is searched to count the occurrences of each search term.
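Counting the occurrences of each search term could look roughly like this. It is a sketch under assumptions rather than the module site Search actually calls: count_hits and its $case_sensitive flag are invented names, and terms are matched as plain substrings.

```perl
use strict;
use warnings;

# Count how often each search term occurs in $text.
# $case_sensitive is a hypothetical flag standing in for the form's "Case" choice.
sub count_hits {
    my ($text, $case_sensitive, @terms) = @_;
    my %hits;
    for my $term (@terms) {
        my $re = $case_sensitive ? qr/\Q$term\E/ : qr/\Q$term\E/i;
        # Assigning the list of matches to an empty list yields their count.
        $hits{$term} = () = $text =~ /$re/g;
    }
    return %hits;
}

my %hits = count_hits("Perl is fun. perl is fast.", 0, "perl", "java");
print "$_: $hits{$_}\n" for sort keys %hits;
```

Every term requires another pass over the text, which is the extra work that is saved when the hit count is switched off.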

The "$output_line" variable, when enabled, outputs the first line that contains any one of the search terms. It also displays the line before and the line after. This gives the user a sense of what the file contains, if he/she hasn't already got enough information from the name of the file, the title and the description.
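The "first matching line plus its neighbours" behaviour can be sketched as below. This is an illustration only: first_hit_with_context is a made-up name, the file is assumed to be already split into lines, and matching is assumed to be case-insensitive.

```perl
use strict;
use warnings;

# Return the line before, the first matching line, and the line after,
# or an empty list when no line contains any of the terms.
sub first_hit_with_context {
    my ($lines, @terms) = @_;            # $lines is a reference to an array of lines
    return unless @terms;
    my $pattern = join '|', map { quotemeta($_) } @terms;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ /$pattern/i;
        my $first = $i > 0        ? $i - 1 : $i;    # no line before the first line
        my $last  = $i < $#$lines ? $i + 1 : $i;    # no line after the last line
        return @{$lines}[$first .. $last];
    }
    return;
}

my @page = ("Welcome\n", "Download the perl script here\n", "Contact us\n");
print first_hit_with_context(\@page, "perl", "cgi");
```

The loop stops at the first hit, so the rest of the file is never examined.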

The "$smart" variable, if turned on, splits the file into separate lines in a slightly more efficient manner than if turned off. But I find it really does not make much of a difference. If you want the details, send me mail and I will return a detailed explanation. The file is split into lines so that those lines can be used in the search output when the "$output_line" variable is turned on.

The "$scratch" variable specifies the directory used by site Search to store temporary files. If you enable the "$multi_display" variable, users are able to view the output "n" hits per page, and site Search needs a directory to store the temporary files for the remaining pages. For instance, if 40 hits are found and the user is viewing 10 hits per page, site Search outputs the first 10 hits to the browser and stores the next 30 in 3 files in the temporary directory (3 x 10). This leads to temporary files accumulating in the "$scratch" directory. You can clean them out manually or let site Search do it for you. To make site Search clean the directory, enable the "$clean_scratch" variable and give a valid duration in the "$empty_scratch" variable. This duration must be a time in days; for example, 0.25 signifies 1/4th of a day. 0.5 is a good value. Remember not to give too low a value, since you want the user to actually access the file before it is deleted :-)

Note:- site Search cleans the "$scratch" directory every time it is run, provided the "$clean_scratch" variable is enabled. That means each run cleans up the output from previous runs. Also make sure that site Search has permission to clean the directory: (rwxrwxr-x) if the user running the HTTP server and the owner of the scratch directory are the same.
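Age-based cleaning of a scratch directory, with the age given in fractional days, might be sketched like this. It is not the site Search routine: clean_scratch_dir is a hypothetical name, and it assumes a file's age is judged by its modification time.

```perl
use strict;
use warnings;

# Delete plain files in $dir last modified more than $max_age_days ago.
# Returns the number of files removed.
sub clean_scratch_dir {
    my ($dir, $max_age_days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $removed = 0;
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;                     # skips ".", ".." and subdirectories
        # -M gives the age in days (fractions allowed), measured from script start.
        next unless (-M $path) > $max_age_days;
        $removed += unlink $path;
    }
    closedir $dh;
    return $removed;
}

# Usage, assuming the three configuration variables are already set:
#   clean_scratch_dir($scratch, $empty_scratch) if $clean_scratch;
```

With a value of 0.5, a temporary page written less than twelve hours ago survives the next run, so a user who is still paging through results can reach it.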

10. Can I use my own form instead of the form supplied along with site Search?

Yes, but remember to include the following HTML tags, and *REMEMBER* to keep the name/value pairs the same or site Search will not work. The name/value pairs are:-

<SELECT Name="Case">
<OPTION Value="Insensitive">
<OPTION Value="Sensitive">
</SELECT>
<SELECT Name="Construct">
<OPTION Value="As a phrase">
<OPTION Value="Any search term">
<OPTION Value="All search terms">
</SELECT>
<SELECT Name="Hits">
<OPTION Value="5">
<OPTION Value="10">
<OPTION Value="15">
<OPTION Value="25">
<OPTION Value="ALL">
</SELECT>
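The reason the values must stay the same is that the script compares them as exact strings. The sketch below shows how the "Case" and "Construct" values could drive the matching; it is an assumption about their meaning, not the site Search code, and text_matches is a made-up name.

```perl
use strict;
use warnings;

# Test $text against $query according to the "Construct" and "Case" form values.
sub text_matches {
    my ($text, $query, $construct, $case) = @_;
    return 0 unless $query =~ /\S/;                 # nothing to search for
    # A phrase is matched as one unit; otherwise the query is split on whitespace.
    my @terms = $construct eq 'As a phrase' ? ($query) : split(' ', $query);
    my $found = 0;
    for my $term (@terms) {
        my $re = $case eq 'Sensitive' ? qr/\Q$term\E/ : qr/\Q$term\E/i;
        $found++ if $text =~ $re;
    }
    return $found == @terms ? 1 : 0 if $construct eq 'All search terms';
    return $found > 0 ? 1 : 0;                      # phrase or "Any search term"
}

my $text = "Site Search is a Perl script";
print text_matches($text, "perl script", "As a phrase", "Insensitive") ? "hit\n" : "miss\n";
```

A form that sent, say, "Phrase" instead of "As a phrase" would silently fall through to a different branch in code of this kind.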

11. Can I use my own form in the search output?

Yes, if you mean the form output along with the successful hits. Set the value of the variable "$output_form" to "1" and the value of the variable "$form_to_use" to the complete path name of the HTML file containing the form. Remember that this HTML file should not contain things like a starting HTML tag, an ending HTML tag etc., as the form is inserted within an HTML file produced by site Search. But *REMEMBER* that you won't be able to make use of site Search to do the search. If you want to do both of the above, the only option is to edit the script and modify the module that does the output. The module is "output_form".

12. How can I decide what number(s) of hits can be chosen by the end user?

Simple, just change the values of Hits in the form to whatever your choices are. For example,

<SELECT Name="Hits">
<OPTION Value="1">
<OPTION Value="2">
<OPTION Value="3">
<OPTION Value="4">
<OPTION Value="5">
<OPTION Value="ALL">
</SELECT>
etc ...

13. Are there any known limitations or bugs?

This is a beta version and there may be some bugs. As for limitations, the one I am aware of is the inability of the script to remove HTML tags from HTML files if a tag continues over *separate* lines. I am working on this. One way is to use the HTML::Parse module, but I have noticed it considerably slows down the script's execution. If anyone has any suggestions, I would appreciate it if you mailed them to me.
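One possible approach to the multi-line tag limitation, offered only as a sketch and not as a fix that is part of site Search, is to strip tags from the whole file held in one string before it is split into lines. strip_tags is a made-up name, and the regular expressions are deliberately naive: a ">" inside a quoted attribute value, for instance, would still confuse them.

```perl
use strict;
use warnings;

# Remove HTML tags from a whole document held in one string.
# To get a file into one string, read it with the input separator undefined (local $/;).
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;    # comments first, since they may contain ">"
    $html =~ s/<[^>]*>//g;        # [^>] also matches newlines, so split tags are caught
    return $html;
}

my $page = "<A\n   HREF=\"http://mysite.com/\">something</A> else\n";
print strip_tags($page);
```

Because the character class [^>] matches a newline like any other character, a tag that starts on one line and ends on the next is removed in a single substitution, without loading a parser module.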

14. Will there be any future improvements on site Search?

Yes. If you have any suggestions, please mail them to me at krishnan@bayou.uh.edu. For my part, I am trying to optimize the code for speed, find some way to sort the output other than by number of occurrences, include a module to let the end user know of the search progress instead of letting him/her stare at the screen (this may require something other than perl, maybe Java :-)), etc.


site Search version 2.00b Copyright 1997 Krishnan Jayakrishnan