Skip to content

Conversation

@spiridonov
Copy link

Rationale for this change

apache/arrow-go is used in the new query engine in Grafana Loki and we are actively working on improving its performance. Here are few low hanging fruits that reduce allocations by many gigabytes when used on the hot path.

What changes are included in this PR?

This PR reduces amount of allocated objects.

  • arrow/datatype_binary.go: straightforward, those Layout objects are allocated on each call currently.
  • arrow/schema.go: Schema.Fields() does not return a clone anymore and NewSchemaWithEndian() does not clone inputs anymore. All call sites are func-signature compatible, but should not modify those args/results anymore. In our opinion this is a reasonable trade-off for higher performance. I am happy to discuss other ways to implement that.

Are these changes tested?

  • Tested with the current apache/arrow-go unit test suite.
  • Tested with grafana/loki test suite for the query engine.

Are there any user-facing changes?

Yes. Explained above.

@spiridonov spiridonov requested a review from zeroshade as a code owner January 20, 2026 21:14
Comment on lines 172 to +173
sc := &Schema{
fields: make([]Field, 0, len(fields)),
fields: fields,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem I see with this is that currently we explicitly do the copy to ensure that if the caller maintains a reference to the slice they can't modify the fields after calling this.

By just putting the slice in there, then technically a caller could modify the slice afterwards and possibly screw up the schema.

Is copying the fields really causing that much memory allocations and leakage?

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, just a couple questions / nit picks

Comment on lines 177 to +178
if metadata != nil {
sc.meta = metadata.clone()
sc.meta = *metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, since the members of the Metadata object are slices (pointer types) just copying the metadata struct like this means that the caller could potentially modify the metadata after constructing the schema. This is less concerning than possibly modifying the fields slice so I'm more inclined to not see this as an issue here.

Comment on lines -197 to +199
func (sc *Schema) Fields() []Field {
fields := make([]Field, len(sc.fields))
copy(fields, sc.fields)
return fields
}

// Fields returns the fields of the schema.
// The result is not a clone of the schema's fields and therefore should not be modified by the caller.
func (sc *Schema) Fields() []Field { return sc.fields }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A while back we explicitly converted this into being a clone operation.

Instead of changing this, can we introduce a function that returns iter.Seq[Field]? that way we can avoid the slice copy while still preventing a user from being able to modify the slice.

Comment on lines +216 to 218
} else {
return nil, false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {
return nil, false
}
}
return nil, false
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants