13 changes: 8 additions & 5 deletions docs/src/guide/inverting.md
@@ -6,12 +6,15 @@ existence of a forward index in the path `path/to/forward/cw09b`:

     $ mkdir -p path/to/inverted
     $ ./invert -i path/to/forward/cw09b \
-        -o path/to/inverted/cw09b \
-        --term-count `wc -w < path/to/forward/cw09b.terms`
+        -o path/to/inverted/cw09b

-Note that the script requires as parameter the number of terms to be
-indexed, which is obtained by embedding the
-`wc -w < path/to/forward/cw09b.terms` instruction.
+Inverting an index requires knowing the number of terms in
+the lexicon ahead of time. In the above example, the `invert` command
+assumes that a `cw09b.termlex` file exists from the output of
+`parse_collection`, which is used to look up the term count.
+
+Note that the number of terms can be provided using `--term-count` in
+case the lexicon is not available or is on a different path.

 ## Inverted index format

8 changes: 7 additions & 1 deletion tools/app.cpp
@@ -171,7 +171,13 @@ auto Threads::print_args(std::ostream& os) const -> std::ostream& {
 Invert::Invert(CLI::App* app) {
     app->add_option("-i,--input", m_input_basename, "Forward index basename")->required();
     app->add_option("-o,--output", m_output_basename, "Output inverted index basename")->required();
-    app->add_option("--term-count", m_term_count, "Number of distinct terms in the forward index");
+    app->add_option(
+        "--term-count",
+        m_term_count,
+        "Number of distinct terms in the forward index.\n"
+        "When omitted, the term count from the lexicon\n"
+        "file `{input}.termlex` is used."
+    );
 }

 auto Invert::input_basename() const -> std::string {
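The new help text describes a fallback: an explicit `--term-count` wins, and otherwise the count comes from `{input}.termlex`. The diff shown here does not include the code that performs that lookup, so the following is only a minimal sketch of the rule under stated assumptions. `resolve_term_count` and the `read_lexicon_size` callback are hypothetical names introduced for illustration; they are not taken from the PR, and the lexicon reader is deliberately left abstract because the `.termlex` format is not described here.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical helper (not part of the PR): decides which term count to use.
// `read_lexicon_size` is an assumed stand-in for whatever code opens a
// lexicon file and reports how many terms it contains.
inline auto resolve_term_count(
    std::optional<std::uint32_t> explicit_count,
    std::string const& input_basename,
    std::function<std::uint32_t(std::string const&)> const& read_lexicon_size)
    -> std::uint32_t
{
    // A value passed via --term-count takes precedence.
    if (explicit_count.has_value()) {
        return *explicit_count;
    }
    // Otherwise fall back to the lexicon stored next to the forward index.
    return read_lexicon_size(input_basename + ".termlex");
}
```

Passing the reader as a callback keeps the sketch independent of the lexicon's on-disk format; with `std::nullopt` as the first argument, the helper asks the reader for `path/to/forward/cw09b.termlex` when the basename is `path/to/forward/cw09b`.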