Description
Our experimental pisactl script defines analyzers at indexing time and stores them in metadata.
With CIFF, the collection is already parsed, so no analyzer is applied at indexing time. We still need to define one somehow, but currently a default analyzer is assumed.
Proposal
To avoid confusion when setting the analyzer for CIFF (one might think we're setting the analyzer for parsing, but no parsing is being done), I think we should split the analyzer into an index-time analyzer and a query-time analyzer.
When indexing from a raw source, we give the user the ability to define both. By default, the query analyzer will be the same as the index analyzer. We also allow defining neither, in which case both fall back to the default one. This makes sense from the correctness point of view because the same analyzer will be used at indexing and query times.
When indexing from CIFF, we force the user to define the (default) query analyzer. We have no way of knowing what it should be, so we cannot silently provide a default one.
Finally, at query time we also allow overriding the analyzer from the command line.
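The resolution rules above could be sketched roughly as follows. This is only an illustration, not existing pisactl code: the function names, the `"raw"`/`"ciff"` source labels, and the `"default"` analyzer name are all made up for the example, and rejecting an index analyzer for CIFF is an assumption rather than something decided here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder for whatever the built-in default analyzer is called.
DEFAULT_ANALYZER = "default"


@dataclass(frozen=True)
class ResolvedAnalyzers:
    index: Optional[str]  # None for CIFF: no parsing happens at indexing time
    query: str            # the (default) query analyzer stored in metadata


def resolve_at_index_time(
    source: str,
    index_analyzer: Optional[str] = None,
    query_analyzer: Optional[str] = None,
) -> ResolvedAnalyzers:
    """Apply the proposed defaults when building an index."""
    if source == "ciff":
        # Assumption: an index analyzer is meaningless for CIFF, so reject it.
        if index_analyzer is not None:
            raise ValueError("CIFF input is already parsed; no index analyzer applies")
        # We cannot guess how the CIFF collection was analyzed.
        if query_analyzer is None:
            raise ValueError("a query analyzer must be given when indexing from CIFF")
        return ResolvedAnalyzers(index=None, query=query_analyzer)
    if source != "raw":
        raise ValueError(f"unknown source kind: {source!r}")
    # Raw source: fall back to the default, and reuse the index analyzer
    # for queries unless the user explicitly asks for a different one.
    index = index_analyzer if index_analyzer is not None else DEFAULT_ANALYZER
    query = query_analyzer if query_analyzer is not None else index
    return ResolvedAnalyzers(index=index, query=query)


def resolve_at_query_time(stored: str, cli_override: Optional[str] = None) -> str:
    """A command-line override wins over the analyzer stored in metadata."""
    return cli_override if cli_override is not None else stored
```

For instance, `resolve_at_index_time("raw", index_analyzer="english")` would yield `"english"` for both analyzers, while `resolve_at_index_time("ciff")` would fail until a query analyzer is supplied.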
Because indexes do not support updates, metadata can still have only one (query) analyzer. It's the CLI options that should cleanly differentiate between the two.
We can consider storing the indexing analyzer as a record of how the collection was indexed in the first place. But that would have to be optional because CIFF-based indexes won't have it.
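As a rough sketch of what such metadata might look like, assuming a JSON-like layout with hypothetical field names `query_analyzer` and `index_analyzer` (the actual pisactl metadata format and keys may well differ):

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AnalyzerMetadata:
    # Hypothetical field names, chosen for illustration only.
    query_analyzer: str                   # always present
    index_analyzer: Optional[str] = None  # informational; absent for CIFF

    def to_dict(self) -> Dict[str, Any]:
        data: Dict[str, Any] = {"query_analyzer": self.query_analyzer}
        # Only record the index analyzer when it is actually known.
        if self.index_analyzer is not None:
            data["index_analyzer"] = self.index_analyzer
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AnalyzerMetadata":
        return cls(
            query_analyzer=data["query_analyzer"],
            index_analyzer=data.get("index_analyzer"),
        )


# An index built from CIFF carries only the query analyzer.
ciff_meta = AnalyzerMetadata(query_analyzer="english")
serialized = json.dumps(ciff_meta.to_dict())
restored = AnalyzerMetadata.from_dict(json.loads(serialized))
```

Keeping `index_analyzer` purely informational means query processing only ever reads `query_analyzer`, which matches the idea that the metadata has a single effective analyzer.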