-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-7386: [C#] Array offset does not work properly #6029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @@ -0,0 +1,107 @@ | |||
| using Apache.Arrow.Tests.Fixtures; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a license header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment!
I added the license header.
| { | ||
| var offsets = ValueOffsets; | ||
| var offset = Offset + index; | ||
| var offset = index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this variable and use index directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified.
| //Check all offset and length | ||
| for (var offset = 0; offset < totalLength; offset++) | ||
| { | ||
| for(var length = 1; length + offset <= totalLength; length++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a space after for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I formatted all the changed files.
Add a license header Format codes
| public class ArrowArrayTests | ||
| { | ||
| [Fact] | ||
| public void SliceArray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for testing Slice is that Slice changes Offset internally.
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@eerhardt Do you want to review this too?
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| public int GetValueLength(int index) | ||
| { | ||
| var offsets = ValueOffsets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
If we remove this, do we call public ReadOnlySpan<int> ValueOffsets twice?
eerhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good. Just a few minor things to fix up.
| var offset = Offset + index; | ||
|
|
||
| return offsets[offset + 1] - offsets[offset]; | ||
| return ValueOffsets[index + 1] - ValueOffsets[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the index for out of range?
Also - no need to call the property twice. Can cache the span in a local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comments.
- I added a boundary check to this method and similar methods.
- I modified to use local variable.
| Assert.True( | ||
| baseArray.ValueOffsetsBuffer.Span.CastTo<int>().Slice(slicedArray.Offset, slicedArray.Length + 1) | ||
| .SequenceEqual(slicedArray.ValueOffsets)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate the Values as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added Values check and GetValue check
Fix GetDate Add boundary check to GetValue and so on Improve tests
|
Looking back at the changes, I noticed Array.IsValid and DateArray.GetDate have the same issue and fixed them. |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| public int GetValueOffset(int index) | ||
| { | ||
| if (index < 0 || index >= Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the value offset at index == Length is valid, right? That's how you can tell how long the last element is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the value offset at index == Length is valid, right?
I think that this is invalid. Because it returns the offset of nonexistent value.
That's how you can tell how long the last element is.
I think that users should use GetValueLength() instead of computing value length by GetValueOffset() values. If advanced users want to compute by themselves, using ValueOffsets directly may be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look at this method, the less value I see in it. Maybe we should just remove the method all together and people can index into ValueOffsets directly. That’s all this method does.
It used to do the “Offset + index” calculation, but that is no longer necessary since the ValueOffsets does that for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with removing this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with removing but I'm a little concerned about the impact on current users.
If we don't need to consider it, I'll remove this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about marking this method as "deprecated" in the next release, then remove this method in the future release?
(I'm not sure how to do this in C#. Documentation and warning message?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In .NET there is an [Obsolete] attribute you can apply to a method. Callers will receive a build time warning that the method is obsolete. We can add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey. I'll add an [Obsolete] attribute.
|
|
||
| return DateTimeOffset.FromUnixTimeMilliseconds(value * MillisecondsPerDay); | ||
| var value = GetValue(index); | ||
| return value.HasValue ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it wouldn't be simpler to just write:
if (!value.HasValue)
{
return value; // or `return default;`
}
return DateTimeOffset.FromUnixTimeMilliseconds(value.Value * MillisecondsPerDay);That way a reader doesn't have to reason about null as DateTimeOffset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I fixed it.
Refactor GetDate
|
I have a concern that But we can work on this as a follow-up task if it's needed. I'll merge this for now. |
"Array.Values" always starts from first index of "ValueBuffer".
It should start from "Offset".