Skip to content

A demo application to explore the power of the Apache Lucene search library

Notifications You must be signed in to change notification settings

Dentuca/wikicene

Repository files navigation

Wikicene

Wikicene is a demo project I created while investigating the Apache Lucene search library. It is a web application that provides the ability to search through a dump of Wikipedia articles. Its goal is to showcase some of Lucene's features (e.g. different analyzers and query types) in an attractive form.

wikicene frontend

Run it

Before starting the main Wikicene application, you need to generate a dump of Wikipedia articles. To do that, simply run the python script:

$ python3 wikirandom.py

This will begin downloading a random Wikipedia article from this endpoint each couple of seconds and putting it into a random-articles.dump text file (this is the file Wikicene will read on start-up). You can stop the script with CTRL+C. The download cadence can be increased, but we want to avoid spamming such a generous endpoint.

After you have your desired amount of articles in the dump (Wikicene can handle tens of thousands easily), you can start Wikicene with:

$ ./gradlew clean build run

Note that the first time you run it, Wikicene will index all the articles in the dump several times, once per supported analyzer. A normal application would only use one analyzer when indexing, but this is a demo designed to showcase the search behavior with different analyzers. For around 25k articles, expect a first-run time of about 1 or 2 minutes before you can interact with the application, as well as about 0.5 GB of disk usage.

Then you can interact with Wikicene in one of two ways. First, you can make requests via a REST-like API by accessing the endpoint:

localhost:8080/?term=planet

This endpoint will return a JSON response with the results.

However, you'll likely want to interact with Wikicene using its humble frontend, available at:

localhost:8080/frontend/

Software/frameworks used

If you believe I lack any mention or attribution, feel free to contact me.

About

A demo application to explore the power of the Apache Lucene search library

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published