![]() |
Choose a document collection by clicking the button "Choose" and selecting a directory. Soekia parses all files in that directory whose names end in ".htm" or ".html".
If you want to search all subdirectories for HTML files, please select the option "include subdirectories".
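The selection rule above can be sketched in a few lines of Python. This is only an illustration, not Soekia's actual code: the function name `find_html_files` and the parameter `include_subdirectories` are made up for this example, and the exact, case-sensitive match on the file ending is an assumption.

```python
from pathlib import Path

# Endings named in this help text; matching them case-sensitively is an assumption.
HTML_SUFFIXES = {".htm", ".html"}


def find_html_files(root, include_subdirectories=False):
    """Return the HTML files found under root, sorted by path.

    Hypothetical helper: it mirrors the "include subdirectories" option
    by switching between a flat and a recursive directory listing.
    """
    root = Path(root)
    candidates = root.rglob("*") if include_subdirectories else root.iterdir()
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix in HTML_SUFFIXES
    )
```

Without the option only the files directly inside the chosen directory are returned; with it, files in nested directories are included as well.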
The buttons:
![]() |
Opens this help window. |
![]() |
Displays information about Soekia. |
![]() |
The index is stored in a directory. You have to specify a directory on your hard disk. Click the button "Choose" to open a file chooser dialog. Soekia creates a subdirectory called "soekia-index" unless the specified directory has this name.
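The naming rule for the index directory can be written down as a small sketch. The function `resolve_index_dir` is hypothetical and only restates the rule described above; it does not create anything on disk.

```python
from pathlib import Path

# Directory name taken from this help text.
INDEX_DIR_NAME = "soekia-index"


def resolve_index_dir(chosen):
    """Return the directory that would hold the index (hypothetical helper).

    If the chosen directory is already called "soekia-index" it is used
    as is; otherwise a subdirectory with that name is used.
    """
    chosen = Path(chosen)
    if chosen.name == INDEX_DIR_NAME:
        return chosen
    return chosen / INDEX_DIR_NAME
```

For example, choosing `/data/docs` would lead to `/data/docs/soekia-index`, while choosing `/data/soekia-index` would be used unchanged.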
![]() |
Language | You can choose between English and German. The selected language determines which stemming algorithm and which predefined stop word list are used. |
Stop words | Stop words are very frequent words that are excluded from the index. There is a predefined list of the 50 most frequent words of the selected language. Alternatively, you can specify your own list. |
Stemming | For the English language the famous Porter stemming algorithm is used. For German we developed a simple algorithm that cuts off the most frequent German endings as well as the prefixes ge-, ver- and un-. |
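To make the effect of stop words and stemming more concrete, here is a minimal sketch of a German affix stripper in Python. It is not Soekia's algorithm: only the prefixes ge-, ver- and un- come from this help text, while the ending list `ENDINGS`, the tiny `STOP_WORDS` sample, the minimum stem length `MIN_STEM` and both function names are assumptions chosen for illustration.

```python
import re

# Tiny sample only; Soekia's predefined list has 50 words and is not shown here.
STOP_WORDS = {"der", "die", "das", "und", "in"}
# Assumed endings, longest first; the real list is not given in this help text.
ENDINGS = ("ungen", "ung", "en", "er", "es", "e", "n")
# Prefixes named in this help text.
PREFIXES = ("ge", "ver", "un")
# Assumed lower bound so that short words are not cut down to nothing.
MIN_STEM = 3


def stem_german(word):
    """Strip at most one prefix and one ending from a lowercased word."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            word = word[:-len(ending)]
            break
    return word


def index_terms(text):
    """Split text into words, drop stop words, and stem what remains."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [stem_german(t) for t in tokens if t not in STOP_WORDS]
```

With these assumed lists, `stem_german("verkaufen")` yields `kauf`, and `index_terms("Die Häuser und das Haus")` yields `['häus', 'haus']`. A crude stripper like this does not map related word forms to the same stem in every case.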
![]() |
Clicking the button "Create index" builds the index for the specified document collection using the selected index parameters. If an index already exists, it is overwritten. If index creation takes a long time, a dialog with a progress bar and a cancel button appears. If you cancel the creation, the index is corrupt and has to be rebuilt before it can be used.
The button "Show index" opens a browser window displaying the index in a table. You can open several windows to compare different indices. Sometimes the program runs out of memory and cannot display the index.
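As a rough picture of what such an index table might contain, the following sketch builds a small inverted index and prints it row by row. That Soekia's index has this shape is an assumption; this help text does not describe the internal format. The names `build_index` and `format_index` are hypothetical, and stop words and stemming are left out to keep the example short.

```python
from collections import Counter, defaultdict


def build_index(documents):
    """Map each term to a Counter of {document name: number of occurrences}."""
    index = defaultdict(Counter)
    for name, text in documents.items():
        for term in text.lower().split():
            index[term][name] += 1
    return index


def format_index(index):
    """Render the index as one 'term | documents' row per term."""
    rows = []
    for term in sorted(index):
        postings = ", ".join(
            f"{doc} ({count})" for doc, count in sorted(index[term].items())
        )
        rows.append(f"{term} | {postings}")
    return "\n".join(rows)


if __name__ == "__main__":
    docs = {"a.html": "apple pie apple", "b.html": "pie"}
    print(format_index(build_index(docs)))
```

The table holds one row per distinct term across all documents, so it grows with the vocabulary of the collection.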
![]() |
To start a query, type some search terms into the text field and click "Search". The order of the search terms has no effect on the result.
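One simple way to see why the order of the search terms does not matter is to treat a query as a set of terms. The sketch below is an illustration only: `normalize_query` is a hypothetical name, and splitting on whitespace and lowercasing are assumptions, not documented Soekia behaviour.

```python
def normalize_query(query):
    """Turn a query string into an order-independent set of terms."""
    return frozenset(query.lower().split())


if __name__ == "__main__":
    # Both spellings describe the same query.
    print(normalize_query("search engine") == normalize_query("engine search"))
```

Because a set has no order, two queries containing the same terms compare as equal regardless of how they were typed.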
The ranking principles determine how the result is sorted. Soekia provides the following ranking principles: