roeicohen10/SearchEngine2020
This is the README for the program. During the course "Information Retrieval" we were assigned to build a search engine for a selected corpus of more than 490k documents. The engine parses the documents and retrieves the most relevant ones for a query entered by the user. It ranks documents with the BM25 ranking algorithm and can also use a Word2Vec semantic model to retrieve documents whose meaning is similar to that of the query.

Instructions for using the program:

Run - pressing this button starts building the inverted index from the corpus.
Reset - pressing this button deletes everything the program saved in main memory and on the hard disk.
Stem - checking this box makes the program use the Porter Stemmer algorithm.
Corpus Path - in this text box, enter the path to your corpus folder. You can also choose the folder by pressing the Browse button next to the text box.
Posting Path - in this text box, enter the path to your Posting folder. You can also choose the folder by pressing the Browse button next to the text box.
Show Dictionary - pressing this button displays the dictionary. To show the dictionary you must first Run the program, or Load the dictionary from file after having Run the program once.
Load Dictionary - pressing this button loads the dictionary. To load the dictionary you must first Run the program once.
Run Query - pressing this button runs the search process on the query entered in the text box next to the button.
Browse and run queries file - pressing this button opens a File Chooser where you select a text file that contains queries in a specific format, then runs the search process on it.
Semantic - checking this box applies the semantic model to the query search.
TREC_EVAL results to - in this text box, enter the path to the directory where the results file will be saved, in a format that the trec_eval program can read.
Insert Query - in this text box, enter a single query you would like to run.

Workflow for indexing the corpus:

1. Launch the application start file called "InvertedIndex".
2. Choose the path to the corpus directory.
3. Choose the path to the Posting directory.
4. Check or uncheck the Stem check box.
5. Click the Run button.
6. Click Show Dictionary to display the dictionary the program created.

Workflow for running one query:

1. Enter a query in the "Insert Query" text box.
2. Press Run Query.
3. The retrieved relevant documents appear in the list view in the program.
4. *Optional* To see whether a document has entities, press the document name. (At most 5 entities are shown per document.)

Workflow for running a queries file:

1. Press the "Browse and run queries file" button.
2. Choose a query file and make sure it is written in the right format.
3. Wait for the system to retrieve documents.
4. The retrieved relevant documents appear in the list view in the program.
5. *Optional* To see whether a document has entities, press the document name. (At most 5 entities are shown per document.)
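
To illustrate the ranking step, here is a minimal Python sketch of BM25 scoring over a toy inverted index, followed by a helper that formats results as trec_eval run lines. This is not the program's own code. The function names (`build_index`, `bm25_rank`, `to_trec_lines`), the whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, and the run tag are all assumptions chosen for illustration. The engine's actual parsing, stemming, and parameters may differ.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}.

    Tokenization is a plain lowercase whitespace split (an assumption;
    the real parser is far richer).
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_rank(query, index, doc_len, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the corpus
        df = len(postings)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5))
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def to_trec_lines(query_id, ranked, run_tag="demo"):
    """Format results as trec_eval run lines:
    query_id Q0 doc_id rank score run_tag
    """
    return [
        f"{query_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]
```

A call such as `bm25_rank("search documents", *build_index(docs))` returns only the documents that contain at least one query term, and each line produced by `to_trec_lines` has the six whitespace-separated columns that trec_eval expects in a results file.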
About
Search Engine that retrieves the most relevant documents for user queries.