Skip to content

Conversation

@tianchen92
Copy link
Contributor

Related to ARROW-1184.
The Dictionary.equals method does not return True when the dictionaries are equal. This is because equals is not implemented for FieldVector and so that comparison defaults to comparing the two objects only and not the vector data.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! Can you add a test with some encoded strings also?

@tianchen92
Copy link
Contributor Author

Thanks for doing this! Can you add a test with some encoded strings also?

Thanks, Encoded strings equals are not supported yet and I remarked it as TODO and will do it after #4792 merged, that PR introduces a ByteArrayBuffer for byte[] compare.

@tianchen92
Copy link
Contributor Author

@BryanCutler Thanks a lot for your feedback, I updated this PR, would you mind taking a look again? thanks!

@tianchen92
Copy link
Contributor Author

@emkornfield would you mind take a look at this one? thanks :)

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tianchen92 thanks for doing this. I don't want to open a can of worms, but I think to do this right we need to address equals for vectors first and reconcile with what is already in vector.util.Validator. There should also be some unit tests for these across vector types.

If you want to fix up Dictionary.equals to call vector.util.Validator for now as a bandaid, I guess that will be ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly implemented in vector.util.Validator, which is used in Integration tests. It should probably be refactored, but it is ok to just call the functions for now, e.g. compareFieldVectors.

@tianchen92
Copy link
Contributor Author

@BryanCutler Thanks a lot for your comments.
Agreed that need implement equals for vector which I am doing in another one #4846.
This one as a bandaid, I refactor Validator logic (make it return boolean value instead of throwing Exception) and call it in Dictionary#equals. And since #4792 has been merged, make equals support for binary type and also add string and binary UT.
Would you mind taking a look again if we could get this merged? thanks a lot!

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in general, but I think you should also be doing the same thing for DictionaryEncoding equals. Also, the integration test failure looks relevant, can you check what went wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change? I don't the the exception message should be changing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, Validator could throw Exception with different message, since we make it return boolean value, log the explicit message instead of throwing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to compare encodings with the Validator also, e.g. a function Validator.compareDictionaryEncodings. Isn't Object.equals incorrect here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Object.equals works fine with DictionaryEncoding since it implemented hashCode & equals already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it further, DictionaryEncoder implements equals so I think you should call that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean DictionaryEncoding? I think Object.equals will call DictionaryEncoding#equals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, you are correct

@tianchen92
Copy link
Contributor Author

It looks good in general, but I think you should also be doing the same thing for DictionaryEncoding equals. Also, the integration test failure looks relevant, can you check what went wrong?

Thanks a lot for your comments, the relevant test failure is fixed now.

@tianchen92 tianchen92 closed this Jul 17, 2019
@tianchen92 tianchen92 reopened this Jul 17, 2019
Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tianchen92 I didn't realize you would need to change the exceptions to logs to use the Validator. I don't think we should make that drastic of a change here because it is pretty important to integration tests. I think you should revert the changes to Validator/Integration and we have to options to work around:

  1. As a temporary fix, just call Validator.compareFieldVectors and handle the exception in Dictionary.equals

  2. Move the logic in Validator.compareFieldVectors to FieldVector.equals and call that from Validator and Dictionary.equals

You could also do (1) now and (2) as a followup. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it further, DictionaryEncoder implements equals so I think you should call that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty important to be able to see the different error messages when an integration test fails and I'm not sure only logging them is good enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep original error message, it might not proper to refactor Validator to make it has return type and use it in Dictionary, just copy it logic to Dictionary as previous commit does? Or any good suggestions? Thanks!

@tianchen92
Copy link
Contributor Author

@tianchen92 I didn't realize you would need to change the exceptions to logs to use the Validator. I don't think we should make that drastic of a change here because it is pretty important to integration tests. I think you should revert the changes to Validator/Integration and we have to options to work around:

  1. As a temporary fix, just call Validator.compareFieldVectors and handle the exception in Dictionary.equals
  2. Move the logic in Validator.compareFieldVectors to FieldVector.equals and call that from Validator and Dictionary.equals

You could also do (1) now and (2) as a followup. What do you think?

Sure, sounds reasonable, thanks!

@tianchen92
Copy link
Contributor Author

@BryanCutler Thanks for your effort, PR updated.

@codecov-io
Copy link

codecov-io commented Jul 19, 2019

Codecov Report

Merging #4843 into master will increase coverage by 2.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4843      +/-   ##
==========================================
+ Coverage   87.44%   89.58%   +2.13%     
==========================================
  Files         995      661     -334     
  Lines      140479    96636   -43843     
  Branches     1418        0    -1418     
==========================================
- Hits       122841    86570   -36271     
+ Misses      17276    10066    -7210     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/arrow/json/converter.cc 90.53% <0%> (-1.78%) ⬇️
cpp/src/arrow/json/chunked-builder.cc 79.91% <0%> (-1.68%) ⬇️
r/src/recordbatch.cpp
r/R/Table.R
js/src/util/fn.ts
go/arrow/array/bufferbuilder.go
r/src/symbols.cpp
rust/datafusion/src/execution/projection.rs
rust/datafusion/src/execution/filter.rs
rust/arrow/src/csv/writer.rs
... and 327 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1559f2...83242dc. Read the comment docs.

1 similar comment
@codecov-io
Copy link

codecov-io commented Jul 19, 2019

Codecov Report

Merging #4843 into master will increase coverage by 2.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4843      +/-   ##
==========================================
+ Coverage   87.44%   89.58%   +2.13%     
==========================================
  Files         995      661     -334     
  Lines      140479    96636   -43843     
  Branches     1418        0    -1418     
==========================================
- Hits       122841    86570   -36271     
+ Misses      17276    10066    -7210     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/arrow/json/converter.cc 90.53% <0%> (-1.78%) ⬇️
cpp/src/arrow/json/chunked-builder.cc 79.91% <0%> (-1.68%) ⬇️
r/src/recordbatch.cpp
r/R/Table.R
js/src/util/fn.ts
go/arrow/array/bufferbuilder.go
r/src/symbols.cpp
rust/datafusion/src/execution/projection.rs
rust/datafusion/src/execution/filter.rs
rust/arrow/src/csv/writer.rs
... and 327 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1559f2...83242dc. Read the comment docs.

@BryanCutler
Copy link
Member

@tianchen92 looks like there was a problem with your merge, can you rebase with master?

@tianchen92
Copy link
Contributor Author

@tianchen92 looks like there was a problem with your merge, can you rebase with master?

Sure, rebased.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but there is just one change I'm not sure of and seem to be a test failure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this?

Copy link
Contributor Author

@tianchen92 tianchen92 Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field implemented equals method but it also compare names, that is to say, two same type vector with different name will not equal here even they have same elements, so compare with getClass is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to compare Field names for the integration tests. Ideally, we should be able to compare just the data in ValueVectors and the entire FieldVector also. For the dictionary vector, I don't think the field name is important, but shouldn't hurt to include the name check as well. Were you seeing a test failure if you revert this?

Copy link
Contributor Author

@tianchen92 tianchen92 Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like https://github.com/apache/arrow/pull/4843/files#diff-945722295a5f95e681897e48eedfcb08R459 vectors with different names ("v1" "v2") will not equals.
If you think we shouldn't change behavior of Validator, maybe we could refactor this PR and do not use Validator logic after #4933 merged, that PR introduces equals API in ValueVector. What do you think and could you also take a look? thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not change the behavior of Validator here, can you revert this and we can make it less restrictive later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO note here to clean this up once we can compare vectors without using the Validator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BryanCutler
Copy link
Member

CI tests all passed, merged to master. Thanks @tianchen92 !

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Related to [ARROW-1184](https://issues.apache.org/jira/browse/ARROW-1184).
The Dictionary.equals method does not return True when the dictionaries are equal. This is because equals is not implemented for FieldVector and so that comparison defaults to comparing the two objects only and not the vector data.

Closes apache#4843 from tianchen92/ARROW-1184 and squashes the following commits:

3511857 <tianchen> revert
527a9d7 <tianchen> add TODO
0c69588 <tianchen> fix equals
1f41366 <tianchen> fix build
87b7f66 <tianchen> fix
3fe9389 <tianchen> use Validator logic in Dictionary#equals and add UT
be74915 <tianchen> move UT
314d697 <tianchen> ARROW-1184:  Dictionary.equals is not working correctly

Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants