Skip to content

Conversation

@jhrotko
Copy link
Contributor

@jhrotko jhrotko commented Jan 7, 2026

What's Changed

The current UUID vector implementation creates new buffer slices when reading values through holders, which has several drawbacks:

  • Memory overhead: Each slice creates a new ArrowBuf object
  • Performance impact: Buffer slicing is slower than direct buffer indexing
  • Inconsistency: Other fixed-width types (like Decimal) use buffer indexing with a start offset field

Proposed Changes

  1. Add start field to UUID holders to track buffer offsets:
    • UuidHolder: Add public int start = 0;
    • NullableUuidHolder: Add public int start = 0;
  2. Update UuidVector to use buffer indexing
  3. Update readers and writers

Related Work

Closes #948

@github-actions

This comment has been minimized.

@jhrotko jhrotko marked this pull request as ready for review January 7, 2026 18:08
@jbonofre
Copy link
Member

jbonofre commented Jan 7, 2026

Should we use a start offset ? Why not having diifferent vectors ? Just wondering ...

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that a holder can inadvertently extend the lifetime of a vector's backing buffer, right? (I suppose slicing would as well.) If we care about efficiency maybe storing two longs is better?

@jbonofre
Copy link
Member

jbonofre commented Jan 8, 2026

@lidavidm yes, that's my point as well. I wonder if we should not "square" the vector (with two "edges").

@gszadovszky
Copy link
Contributor

The current approach is just following what we have for variable length binaries (e.g. VarChar). I'm not against using the two-longs approach, but whatever stands against this one, stands for the var-len holders as well.
Also, DecimalHolder is very similar to this one, while the two-longs approach would work in that case as well.

@jhrotko
Copy link
Contributor Author

jhrotko commented Jan 8, 2026

Thanks for the feedback. After looking at this more carefully, let me be clear of my current understanding of these two options:

The main concern is that when a holder stores an ArrowBuf reference, it keeps the entire buffer alive even after the vector is closed. This is especially problematic if you're saving holders for later use, you end up keeping large buffers in memory when you only need a few values.

Did you mean something like this?
take the example:

// Scenario: Process UUIDs from a large batch, keep some for later
public class UuidProcessor {
    private List<UuidHolder> importantUuids = new ArrayList<>();
    
    public void processBatch(UuidVector vector) {
        // Vector has 100,000 UUIDs = 1.6 MB of data
        UuidHolder holder = new UuidHolder();
        
        for (int i = 0; i < vector.getValueCount(); i++) {
            vector.get(i, holder);
            
            // Check if this UUID is "important"
            if (isImportant(holder)) {
                
                UuidHolder saved = new UuidHolder();
                saved.buffer = holder.buffer;  // ← References entire 1.6 MB buffer!
                saved.start = holder.start;
                saved.isSet = holder.isSet;
                importantUuids.add(saved);
            }
        }
        
        // Done processing, close the vector
        vector.close();
        
        // Vector is closed, but buffer is not freed
        // Even if we only saved 10 UUIDs (160 bytes), we're keeping 1.6 MB alive
    }
}
sequenceDiagram
    participant App as Application
    participant Vector as UuidVector
    participant Buffer as ArrowBuf (1000 UUIDs)
    participant Holder as UuidHolder
    participant Allocator as BufferAllocator
    
    Note over App,Allocator: Step 1: Create vector with 1000 UUIDs
    App->>Vector: allocateNew(1000)
    Vector->>Allocator: allocate 16,000 bytes
    Allocator->>Buffer: Create buffer (refCount=1)
    Buffer-->>Vector: buffer reference
    
    Note over App,Allocator: Step 2: Read ONE UUID into holder
    App->>Vector: get(0, holder)
    Vector->>Holder: holder.buffer = vector.buffer
    Vector->>Holder: holder.start = 0
    Note over Buffer: refCount=2 (vector + holder)
    
    Note over App,Allocator: Step 3: Close vector (done with it)
    App->>Vector: close()
    Vector->>Buffer: release() - refCount=2→1
    Note over Buffer: ❌ Buffer NOT freed!<br/>Holder still references it
    
    Note over App,Allocator: Step 4: Application keeps holder
    Note over Holder: Holder only needs 16 bytes<br/>but keeps 16,000 bytes alive!
    Note over Buffer: ❌ 15,984 bytes wasted!
    
    Note over App,Allocator: Step 5: Eventually holder goes out of scope
    App->>Holder: (garbage collected)
    Holder->>Buffer: release() - refCount=1→0
    Buffer->>Allocator: free memory
    Note over Buffer: ✅ Finally freed
Loading

I initially modeled UUID after VarChar and Decimal, but if this is the lifetime risk for VarcharHolder we accept this risk since copying can be very expensive while for a UuidHolder copying is trivial, so there's no reason to take on the buffer lifetime risk.

@lidavidm
Copy link
Member

lidavidm commented Jan 8, 2026

I don't think Varchar is an appropriate comparison here since it is variable-length. UUID is a fixed length (known at compile-time, unlike fixed-length binary) so it can be simplified. I assume both memory-wise and performance-wise, storing two longs is preferable to slicing/copying/storing a reference + length, and less prone to issues with forgetting to retain/release the buffer.

UuidHolder uuidHolder = new UuidHolder();
uuidReader.read(uuidHolder);
UUID actualUuid = UuidUtility.uuidFromArrowBuf(uuidHolder.buffer, 0);
UUID actualUuid = new UUID(uuidHolder.mostSigBits, uuidHolder.leastSigBits);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worthwhile to just have a UUID getUUID() method on the holder?

} else {
return getBufferSlicePostNullCheck(index);
holder.isSet = 1;
final ByteBuffer bb = ByteBuffer.wrap(getUnderlyingVector().getObject(index));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to directly use ArrowBuf#getLong? (Or even MemoryUtil#getLong + a manual bounds check to avoid two bounds checks, though I'd expect the JIT would optimize that out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look again?

@jbonofre jbonofre added the enhancement PRs that add or improve features. label Jan 9, 2026
@github-actions github-actions bot added this to the 19.0.0 milestone Jan 9, 2026
@jhrotko
Copy link
Contributor Author

jhrotko commented Jan 9, 2026

Let me do a quick benchmark just to compare both versions

@jhrotko
Copy link
Contributor Author

jhrotko commented Jan 9, 2026

The results of the benchmarking comparing the two versions (v1) buffer indexing from e35ed7c and (v2) two MSB and LSB longs UUID holder implementations across 5 scales (1k, 10k, 100k, 1M, 10M elements) for 7 different operations.

There were no major differences with exception of the test getWithUuidHolder where there was a 65% slow down with v2. I can provide full data for all methods if requested.

Scale v1 (µs/op) v2 (µs/op) Diff (µs) % Change v1/v2 ratio
1k 1.000 1.666 +0.666 +66.6% 1.67x slower
10k 9.954 16.462 +6.508 +65.4% 1.65x slower
100k 102.048 162.946 +60.898 +59.7% 1.60x slower
1M 1043.961 1669.155 +625.194 +59.9% 1.60x slower
10M 10121.154 16718.845 +6597.691 +65.2% 1.65x slower

The getWithUuidHolder regression is due to different implementation approaches:

V1 (ArrowBuf reference):

holder.buffer = getDataBuffer();      // Copy pointer (8 bytes)
holder.start = getStartOffset(index); // Copy offset (4 bytes)
// Total: 12 bytes copied, NO data reading (deferred work)

V2 (MSB/LSB longs):

holder.mostSigBits = Long.reverseBytes(dataBuffer.getLong(start));      // Read + reverse
holder.leastSigBits = Long.reverseBytes(dataBuffer.getLong(start + 8)); // Read + reverse
// Total: 16 bytes READ + 2 byte reversals (immediate work)

The old version defers work (just stores references), while the new version does work upfront (reads and byte-reverses data -> O(N)).

My point of view

Given the 65% performance regression and the importance of maintaining Arrow's zero-copy design pattern, I would prefer the buffer reference approach v1. This aligns with how other holder types handle data larger than 8 bytes (Decimal, VarChar, and VarBinary) all use buffer references to enable zero-copy access. At 16 bytes, UUID fits this same category and should follow the established pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change enhancement PRs that add or improve features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve memory management in UUID vector

4 participants