NDCG=0.0 instead of nan for IDCG=0.0 by lironT74 · Pull Request #25 · joaopalotti/trectools

lironT74 · 2021-05-18T09:53:57Z

Hi,
I believe this update is necessary, since if IDCG is 0 then NDCG should be 0, and not nan.

guidozuc · 2021-05-18T10:56:46Z

Thanks @lironT74. This is quite a particular case.
Mathematically, nDCG is the DCG value divided by the IDCG value. If IDCG=0, the division DCG/0 is undefined, hence the nan.
In practice, IDCG=0 means there are no relevant documents for that query in the qrels. Such a query should be excluded from the dataset or the evaluation: whatever you do on that query is impossible to evaluate, since you have no relevant documents for it.
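To make the two options in this thread concrete, here is a minimal sketch of the computation, not the trectools implementation. The function names `dcg` and `ndcg` and the parameter `zero_idcg_value` are hypothetical, and the log2 rank discount is an assumption about the DCG formulation.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance labels.

    Assumes the common gain / log2(rank + 1) discount, with ranks starting at 1.
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, judged_gains, zero_idcg_value=float("nan")):
    """nDCG = DCG / IDCG, with an explicit choice for the IDCG == 0 case.

    ranked_gains: relevance labels of the retrieved documents, in rank order.
    judged_gains: relevance labels of all judged documents for the query.
    zero_idcg_value: hypothetical knob; nan keeps the result "undefined",
        0.0 mirrors the behaviour this PR proposes.
    """
    idcg = dcg(sorted(judged_gains, reverse=True))
    if idcg == 0:
        # No relevant documents in the qrels: the ratio is undefined.
        return zero_idcg_value
    return dcg(ranked_gains) / idcg


# A query with no relevant documents in the qrels.
undefined_score = ndcg([0, 0], [0, 0])
zero_score = ndcg([0, 0], [0, 0], zero_idcg_value=0.0)

# A query whose only relevant document is ranked first.
perfect_score = ndcg([1, 0], [1, 0])
```

With the default, the first query yields nan; passing `zero_idcg_value=0.0` yields 0.0 instead, which is the behaviour requested in this PR.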

ishnid · 2021-05-18T13:22:40Z

I can see both sides :-)

I would suggest that the project should aim to follow whatever behaviour the official trec_eval has in situations like these.

lironT74 · 2021-05-23T09:10:40Z

@guidozuc I agree, but as @ishnid mentioned, trec_eval outputs 0.0 for those cases.

joaopalotti

Hi all,

Thanks for your comments, @lironT74, @ishnid and @guidozuc.
This issue is closely related to #24. Based on the input we got there, we recently modified some of the metrics to behave more like trec_eval in corner cases such as the one you mentioned.

Unfortunately, your PR does not fix the problem as a whole. For example, get_ndcg with per_query=True returns a dataframe with all known queries (even the ones that are not in the run), and calling returned_df.mean() on it gives a value different from the one returned by get_ndcg with per_query=False. I have taken this into account in my latest attempt to fix this issue (please see this commit here).
But while it fixes the problem for NDCG, we still need to propagate this fix to other metrics.
Would you be up for this challenge, @lironT74?
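The mismatch between the per-query mean and the aggregate value can be illustrated with plain pandas. This is only a sketch with made-up scores: the frame below stands in for a per-query result and is not produced by trectools, and the column name `NDCG` and the query ids are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical per-query scores; q3 has no relevant documents, so it is nan.
per_query = pd.DataFrame(
    {"NDCG": [1.0, 0.5, np.nan]},
    index=pd.Index(["q1", "q2", "q3"], name="query"),
)

# Series.mean() skips nan by default, so q3 is silently dropped:
# (1.0 + 0.5) / 2
mean_skipping_nan = per_query["NDCG"].mean()

# Filling nan with 0.0 first makes q3 count as a zero score:
# (1.0 + 0.5 + 0.0) / 3
mean_with_zeros = per_query["NDCG"].fillna(0.0).mean()

print(mean_skipping_nan, mean_with_zeros)
```

The two averages differ (0.75 versus 0.5 here), which is why the per-query output needs the same fillna treatment as the aggregate value for the two code paths to agree.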

trectools/trec_eval.py

lironT74 · 2021-05-24T07:41:07Z

@joaopalotti I see. So the fix is to change the other metrics that might suffer from this issue in (almost exactly) the same way you did for NDCG, plus the per-query fillna fix, correct?

I can work on it in my spare time, but I am not sure which additional metrics need to be changed.

joaopalotti · 2021-08-03T18:04:06Z

Thanks @lironT74, your suggestion was taken and included in the last commit!

NDCG=0.0 instead of nan for IDCG=0.0

6a0a73d

joaopalotti requested changes May 23, 2021


now fillna is "inplace" for ndcg_per_query

b6ddb82

joaopalotti closed this Aug 3, 2021

TSoli mentioned this pull request Dec 11, 2024

Fixed nan returned for ndcg, rprec if no relevant docs retrieved #51

Open

lironT74 wants to merge 2 commits into joaopalotti:master from lironT74:master