-
Notifications
You must be signed in to change notification settings - Fork 89
perf(arrow): Reduce the amount of allocated objects #645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf(arrow): Reduce the amount of allocated objects #645
Conversation
| sc := &Schema{ | ||
| fields: make([]Field, 0, len(fields)), | ||
| fields: fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only problem I see with this is that currently we explicitly do the copy to ensure that if the caller maintains a reference to the slice they can't modify the fields after calling this.
By just putting the slice in there, then technically a caller could modify the slice afterwards and possibly screw up the schema.
Is copying the fields really causing that much memory allocations and leakage?
zeroshade
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, just a couple questions / nit picks
| if metadata != nil { | ||
| sc.meta = metadata.clone() | ||
| sc.meta = *metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, since the members of the Metadata object are slices (pointer types) just copying the metadata struct like this means that the caller could potentially modify the metadata after constructing the schema. This is less concerning than possibly modifying the fields slice so I'm more inclined to not see this as an issue here.
| func (sc *Schema) Fields() []Field { | ||
| fields := make([]Field, len(sc.fields)) | ||
| copy(fields, sc.fields) | ||
| return fields | ||
| } | ||
|
|
||
| // Fields returns the fields of the schema. | ||
| // The result is not a clone of the schema's fields and therefore should not be modified by the caller. | ||
| func (sc *Schema) Fields() []Field { return sc.fields } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A while back we explicitly converted this into being a clone operation.
Instead of changing this, can we introduce a function that returns iter.Seq[Field]? that way we can avoid the slice copy while still preventing a user from being able to modify the slice.
| } else { | ||
| return nil, false | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } else { | |
| return nil, false | |
| } | |
| } | |
| return nil, false | |
| } |
Rationale for this change
apache/arrow-gois used in the new query engine in Grafana Loki and we are actively working on improving its performance. Here are few low hanging fruits that reduce allocations by many gigabytes when used on the hot path.What changes are included in this PR?
This PR reduces amount of allocated objects.
arrow/datatype_binary.go: straightforward, thoseLayoutobjects are allocated on each call currently.arrow/schema.go:Schema.Fields()does not return a clone anymore andNewSchemaWithEndian()does not clone inputs anymore. All call sites are func-signature compatible, but should not modify those args/results anymore. In our opinion this is a reasonable trade-off for higher performance. I am happy to discuss other ways to implement that.Are these changes tested?
apache/arrow-gounit test suite.grafana/lokitest suite for the query engine.Are there any user-facing changes?
Yes. Explained above.