Handle analyzers/CIFF in pisactl #618

@elshize

Our experimental pisactl script defines analyzers at indexing time and stores them in metadata.

With CIFF, the collection is already parsed, so no analyzer runs at indexing time. We still need to define one somehow, but currently pisactl assumes a default analyzer.

Proposal

Setting "the analyzer" for CIFF is confusing: one might think we're setting the analyzer for parsing, but no parsing is being done. To avoid that, I think we should split the analyzer into an index-time analyzer and a query-time analyzer.

When indexing from a raw source, we let the user define both. By default, the query analyzer is the same as the index analyzer. We also allow defining neither, in which case both fall back to the default one. This makes sense from a correctness point of view because the same analyzer is then used at indexing and query time.

When indexing from CIFF, we force the user to define the (default) query analyzer. We have no way of knowing what it should be, so we cannot silently provide a default one.

Finally, at query time we also allow overriding the analyzer from the command line.
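The resolution rules proposed above could be sketched roughly as follows. This is only an illustration, not pisactl code: the function names, the `"raw"`/`"ciff"` source labels, the `DEFAULT_ANALYZER` value, and the idea of representing an analyzer as a plain string are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical placeholder; the real default analyzer is whatever pisactl defines.
DEFAULT_ANALYZER = "default"


def resolve_index_time(
    source: str,
    index_analyzer: Optional[str] = None,
    query_analyzer: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (index_analyzer, query_analyzer) as decided at indexing time."""
    if source == "ciff":
        # CIFF is already parsed: there is no index-time analyzer, and the
        # query analyzer cannot be guessed, so it must be given explicitly.
        if query_analyzer is None:
            raise ValueError("a query analyzer is required when indexing from CIFF")
        return None, query_analyzer
    # Raw source: fall back to the default, and let the query analyzer
    # follow the index analyzer unless it is set separately.
    index_analyzer = index_analyzer or DEFAULT_ANALYZER
    return index_analyzer, query_analyzer or index_analyzer


def resolve_query_time(stored: str, override: Optional[str] = None) -> str:
    """At query time, a command-line override wins over the stored analyzer."""
    return override or stored
```

With these rules, `resolve_index_time("raw")` yields the default analyzer for both roles, while `resolve_index_time("ciff")` without a query analyzer fails loudly instead of silently picking one.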


Because indexes do not support updates, metadata can still have only one (query) analyzer. It's the CLI options that should cleanly differentiate between the two.

We can consider storing the indexing analyzer as a record of how the collection was indexed in the first place. But that would have to be optional because CIFF won't have it.
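As a rough sketch of what such metadata might look like, assuming a JSON-like layout: the class name `IndexMetadata`, the field names, and the string representation of analyzers are hypothetical and do not reflect pisactl's actual metadata format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class IndexMetadata:
    # The single analyzer the index is queried with (always present).
    query_analyzer: str
    # Informational only; absent for CIFF, where nothing was parsed.
    index_analyzer: Optional[str] = None

    def to_json(self) -> str:
        # Leave out unset optional fields rather than writing null.
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "IndexMetadata":
        data = json.loads(text)
        return cls(
            query_analyzer=data["query_analyzer"],
            index_analyzer=data.get("index_analyzer"),
        )
```

A CIFF-built index would then simply omit `index_analyzer`, and readers of the metadata would treat a missing value as "unknown" rather than as the default.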
