This repository contains the source code for the project developed as part of the Multimedia Information Retrieval and Computer Vision course, within the Master's Degree in Artificial Intelligence and Data Engineering at the University of Pisa for the academic year 2023/2024.
The project implements an information retrieval system that processes, indexes, and queries large collections of textual data. It combines an indexing system, query processing algorithms, and optimization techniques aimed at high performance and scalability.
- Implements the Single-Pass In-Memory Indexing (SPIMI) algorithm for efficient indexing.
- Supports various types of queries, including conjunctive and disjunctive queries.
- Employs optimization techniques such as LFU Caching and Skipping Blocks.
- Evaluates performance using standard TREC metrics.
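The difference between conjunctive and disjunctive queries can be sketched with a toy in-memory inverted index. This is an illustration only, not the project's code: the class `ToyIndex` and its methods are hypothetical names, tokenization is a naive split on non-alphanumeric characters, and documents are assumed to be added in increasing docID order so that each posting list stays sorted. It leaves out SPIMI's block flushing and merging, as well as caching and skipping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index; hypothetical names, not the project's classes. */
final class ToyIndex {
    // term -> sorted list of docIDs (assumes documents arrive in increasing docID order)
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Splits the text on non-alphanumerics and records docId once per distinct term. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Skip the append when the term repeats within the same document.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) list.add(docId);
        }
    }

    /** Conjunctive (AND) query: docIDs that contain every term. */
    public List<Integer> conjunctive(String... terms) {
        List<Integer> result = null;
        for (String term : terms) {
            List<Integer> list = postings.getOrDefault(term, Collections.<Integer>emptyList());
            if (result == null) {
                result = new ArrayList<>(list);
            } else {
                result = intersect(result, list);
            }
        }
        return result == null ? new ArrayList<Integer>() : result;
    }

    /** Disjunctive (OR) query: docIDs that contain at least one term. */
    public List<Integer> disjunctive(String... terms) {
        TreeSet<Integer> union = new TreeSet<>();
        for (String term : terms) {
            union.addAll(postings.getOrDefault(term, Collections.<Integer>emptyList()));
        }
        return new ArrayList<>(union);
    }

    /** Two-pointer intersection of two sorted docID lists. */
    private static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) i++;
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "information retrieval systems");
        index.add(2, "computer vision systems");
        index.add(3, "information retrieval and computer vision");
        System.out.println(index.conjunctive("information", "vision")); // prints [3]
        System.out.println(index.disjunctive("information", "vision")); // prints [1, 2, 3]
    }
}
```

The conjunctive query returns only document 3, the one document containing both terms, while the disjunctive query returns every document containing either term.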
These instructions will help you set up and run the project on your local machine for development and testing purposes.
Ensure you have the following software installed:
- Java JDK 8 or higher
- Apache Maven
To set up the project, follow these steps:
- Clone the repository:
git clone https://github.com/BaffoBello14/SearchEngine
- Create a folder named "Collection" and place the "collection.tar.gz" file in it.
- Build the project with Maven:
mvn clean install
- Run the application:
java -jar target/SearchEngine-1.0-SNAPSHOT.jar

Note: The first time you run the application, it prompts you to create the index. Follow the on-screen instructions. To change the type of indexing, delete the "IndexData" folder.
Run the automated tests for this system using: mvn test
- Place the "msmarco-test2019-queries.tsv.gz" file in the "Collection" folder.
- Execute:
java -cp target/SearchEngine-1.0-SNAPSHOT.jar it.unipi.MIRCV.PerformanceEvaluation.PerformanceEvaluationOfQueries
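This README does not say which TREC metrics the evaluation reports. Purely as an illustration of what such metrics measure, the sketch below computes precision at k and average precision for a single query. The class `RankMetrics`, its method names, and the docIDs are hypothetical and are not taken from the project.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative ranking metrics; hypothetical names, not the project's evaluation code. */
final class RankMetrics {

    /** Precision@k: fraction of the top-k ranked docIDs that are relevant. */
    public static double precisionAtK(List<Integer> ranked, Set<Integer> relevant, int k) {
        if (k <= 0) return 0.0;
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
    }

    /** Average precision: precision at each relevant hit, summed and divided by the number of relevant docs. */
    public static double averagePrecision(List<Integer> ranked, Set<Integer> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        List<Integer> ranked = Arrays.asList(10, 20, 30, 40);            // system ranking, best first
        Set<Integer> relevant = new HashSet<>(Arrays.asList(10, 30, 99)); // judged relevant docIDs
        System.out.println(precisionAtK(ranked, relevant, 2));  // prints 0.5
        System.out.println(averagePrecision(ranked, relevant));
    }
}
```

Averaging `averagePrecision` over all test queries gives mean average precision (MAP), one of the measures commonly reported in TREC-style evaluations.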
Giulio Bello, Federico Frati, Chang Liu