diff --git a/README.md b/README.md
index b1a0946..e6cc74c 100644
--- a/README.md
+++ b/README.md
@@ -105,12 +105,12 @@ Input data format
The input file should list all completions in
*lexicographical* order.
-For example, see the the file `test_data/trec05_efficiency_queries/trec05_efficiency_queries.completions`.
+For example, see the file `test_data/trec_05_efficiency_queries/trec_05_efficiency_queries.completions`.
The first column represents the
ID of the completion; the other columns contain the
tokens separated by white spaces.
-(The IDs for the file `trec05_efficiency_queries.completions` are
+(The IDs for the file `trec_05_efficiency_queries.completions` are
fake, i.e., they do not take into account any
particular assignment.)
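
The input format described above can be illustrated with a minimal Python sketch. It is not taken from the repository's scripts: the helper names `parse_completion_line` and `is_lexicographically_sorted` are hypothetical, the ID is kept as an opaque string, and the ordering check assumes that completions are compared token by token.

```python
def parse_completion_line(line):
    """Split one line into (id, tokens). The ID is kept as a string."""
    fields = line.split()  # columns are separated by white spaces
    if len(fields) < 2:
        raise ValueError(f"expected an ID and at least one token: {line!r}")
    return fields[0], fields[1:]


def is_lexicographically_sorted(lines):
    """Return True if the completions (ignoring their IDs) are in sorted order."""
    completions = [parse_completion_line(line)[1] for line in lines]
    # Assumption: compare the token lists pairwise, token by token.
    return all(a <= b for a, b in zip(completions, completions[1:]))
```

Such a check could be run on a file before indexing to catch an unsorted input early.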
@@ -119,49 +119,49 @@ preparing the datasets for indexing:
1. The command
- $ extract_dict.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ extract_dict.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions
extracts the dictionary
from a file listing all completions in textual form.
2. The command
- $ python map_dataset.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ python map_dataset.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions
maps strings to integer ids.
3. The command
- $ python build_stats.py trec05_efficiency_queries/trec05_efficiency_queries.completions.mapped
+ $ python build_stats.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions.mapped
calculates the dataset statistics.
4. The command
- $ python build_inverted_and_forward.py trec05_efficiency_queries/trec05_efficiency_queries.completions
+ $ python build_inverted_and_forward.py trec_05_efficiency_queries/trec_05_efficiency_queries.completions
builds the inverted and forward files.
If you run the scripts in the reported order, you will get:
-- `trec05_efficiency_queries.completions.dict`: lists all the distinct
+- `trec_05_efficiency_queries.completions.dict`: lists all the distinct
tokens in the completions sorted in lexicographical
order.
-- `trec05_efficiency_queries.completions.mapped`: lists all completions
+- `trec_05_efficiency_queries.completions.mapped`: lists all completions
whose tokens have been mapped to integer ids
as assigned by a lexicographically-sorted
string dictionary (that should be built from the
-tokens listed in `trec05_efficiency_queries.completions.dict`).
+tokens listed in `trec_05_efficiency_queries.completions.dict`).
Each completion terminates with the id `0`.
-- `trec05_efficiency_queries.completions.mapped.stats` contains some
+- `trec_05_efficiency_queries.completions.mapped.stats` contains some
statistics about the datasets, needed to build
the data structures more efficiently.
- `trec05_efficiency_queries.completions.inverted` is the inverted file.
-- `trec05_efficiency_queries.completions.forward` is the forward file. Note that each list is *not* sorted, thus the lists are the same as the ones contained in `trec05_efficiency_queries.completions.mapped` but sorted in docID order.
+- `trec_05_efficiency_queries.completions.forward` is the forward file. Note that the tokens within each list are *not* sorted: the lists are the same as the ones contained in `trec_05_efficiency_queries.completions.mapped`, but listed in docID order.
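
To make the relation between the `.dict` and `.mapped` files more concrete, here is a small Python sketch of the idea. It is only an illustration and not the code of `extract_dict.py` or `map_dataset.py`. The function names are hypothetical. Numbering the ids from 1 is an assumption, made because the id `0` is reserved as the terminator.

```python
def build_dictionary(completions):
    """Assign an integer id to every distinct token, in lexicographical order."""
    distinct_tokens = sorted({token for tokens in completions for token in tokens})
    # Assumption: ids start at 1, since 0 marks the end of a completion.
    return {token: i for i, token in enumerate(distinct_tokens, start=1)}


def map_completion(tokens, dictionary):
    """Replace each token with its id and append the terminator id 0."""
    return [dictionary[token] for token in tokens] + [0]


completions = [["new", "jersey"], ["new", "york"]]
dictionary = build_dictionary(completions)
mapped = [map_completion(tokens, dictionary) for tokens in completions]
```

Because the ids follow the lexicographical order of the tokens, comparing two mapped completions id by id gives the same order as comparing the original token sequences.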
Benchmarks
----------
@@ -174,4 +174,4 @@ Live demo
----------
Start the web server with the program `./web_server ` and access the demo at
-`localhost:`.
\ No newline at end of file
+`localhost:`.