-
-
Notifications
You must be signed in to change notification settings - Fork 72
Scorer option for compute_intersection
#620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
compute_intersectioncompute_intersection
tools/compute_intersection.cpp
Outdated
| auto filtered_queries = ranges::views::filter(queries, [&](auto&& query) { | ||
| auto size = query.terms().size(); | ||
| return size < min_query_len || size > max_query_len; | ||
| return size > min_query_len || size < max_query_len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be min_query_len <= size && size <= max_query_len? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks that is a more readable way to put it and that it should include the boundary min/max. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that plus your initial fix still used &&. Just confirming: this function should return true if size is in [min_query_len, max_query_len] range (inclusive), right? I'm double checking because it's weird it was the opposite before, seems like a very big oversight, so I'm second guessing myself on this... But yeah, it should be right now I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was odd that it was the opposite before. And I second guessed myself a couple of times, but then verified by way of that the change in this PR was the only way to get the intersections printed out when running the the tool. We don't have tests in place for this tool similar to test/cli/*, but maybe we (or I) should have some there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, this wouldn't be implemented directly in the tool source file but rather in the library and then unit tested. Perhaps subtype arg::Query and have it deal with filtering query? Not sure to be honest. I also wouldn't say it's a hard requirement.
If you have a spare moment to write CLI tests, it would be great. But I won't block this for that, it's good to have this fix in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. While working on this PR I did try to look at moving the
min/max query length filter into include/pisa/intersect.hpp. But I didn't
get to a good solution becuase of the way for_all_subsets works for the
--combinations flag and the --max-term-count filter.
To me a good solution would result in a reasonable design that works well with
for_all_subsets and for Intersection::compute, where either of those
intersection modes may have the min/max query length filter.
Then in tool/compute_intersection there would be no need to pass a
function argument print_intersection which does more than only printing
the intersection (it does the intersection as well as print to stdout).
Also, I did not consider other ways of solving this outside of
pisa/include/intersection.hpp. Thanks, this is a good suggestion and
Args:Query that sounds like a good place to start looking (because it is
more along the lines of a TermPolicy but at the query level).
This could make sense for generating sub-queries too as it is a query
filter in the opposite direction. But then again if it is not used outside
of compute_intersection then probably best to keep that local to the
intersection implementation as it may have additional complications for
example the query ID and so forth.
I would prefer to try and resolve the min/max query filter with unit
tests rather than the CLI tests. So I won't add the CLI tests for now.
|
Also, just to add some output for reference. This output is with the new min/max change (in this PR): |
|
Updated the PR and am looking for feedback on the new changes introduced. Added tests in Any suggestions on whether to continue along this path using Another option might be if they are useful is to rework them under the main |
elshize
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the app tests could be implemented as part of the regular test directory, but they need to be still separate to an extent from the rest. We probably don't want to link app to the lib tests and vice versa. It doesn't need to be done here though.
tools/app.hpp
Outdated
| private: | ||
| int m_min_length = 1; | ||
| int m_max_length = std::numeric_limits<int>::max(); | ||
| CLI::App* m_app = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather not set it to nullptr. The reason is that it may suggest that it may not have a set value, but it's not true because the only constructor sets it.
Are there any consequences of not having one variable set here? I don't think so, but I might be wrong. For consistency, we can also not set any of the values here and set the min/max in the constructor as well. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, sure can move these to the constructor.
The member initializer way is sort of a habbit as as code gets changed later on (for example C.48). But then again the usecase here is quite specific and arg doesn't change that often. And probably more importantly it moving to the constructor will make it consistent with existing code. (Perhaps I will also move query_filter_add_options to the constructor too; then there is no need for a member m_app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong preference as to where to initialize, I just prefer not to initialize anything with a value if it's always overridden in the constructor, because I would want to avoid a situation where we accidentally remove the initialization from the constructor and we end up with explicit nullptr, and static analysis won't detect an issue there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, thanks for the context on asan; will update the patch
Ok yes, thanks for the tips on how it should work if it ends up in The reason I was asking the above is because currently there is no CI/Build I will try leaving the app tests where they are for now and maybe it is |
|
This sounds good. I'm fine moving it to test directory as well. Whatever is easier here, and we can address it in a later PR. We don't have to fix everything here. |
This moves the query length filter out of the main `compute_intersection` binary for automated testing. - Enable `tools/tests` in cmake build - Fix old tests in `tools/tests/test_app.cpp` - Add `args::QueryFilter` tests for `compute_intersection` The `tools/tests` are not yet configured for CI builds as they require both options `PISA_BUILD_TOOLS=ON` and `PISA_ENABLE_TESTING=ON`. Although, they can be run locally. This will be resolved in a later PR (see pisa-engine#620). See pisa-engine#392
Thanks, have left the tests under |
|
Sorry for the late reply, I've been traveling. Just double-checking: this is ready to merge, correct? Or did you plan any more changes? |
|
Yes, this should be good to merge now. No further changes for this one. Thanks! |
Add CLI option to customize scorer params for BM25 and beyond.
Fixes #392
Changes proposed in this pull request:
--scoreroption to tools/compute_intersection.cpp