[Proposal] for removing Integer.MAX column size limit. #3513

@akashdw

Description

Today the column size for a given segment is limited to ~2GB because of the ByteBuffer Integer.MAX_VALUE limitation. At Flurry we started hitting this limit for complex columns like Theta Sketches.
Proposal:
To work around this limit, we propose splitting the column file into multiple files whenever it exceeds the maximum size. We will store metadata for the chunks while writing the columns, and use this metadata to reconstruct the column while reading.

Keeping in mind backward compatibility, here are the changes:

  1. A version 3 of GenericIndexed will be introduced. This version stores the following new fields:
  • Number of file splits
  • Number of rows per split, stored as a power of 2 (all splits, except the “last” one, will have the same number of rows).

Columns that fit within the current 2GB limit will continue to use version 1.
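Storing the rows-per-split as a power of 2 makes locating a row a cheap bit operation rather than a division. A minimal sketch of the lookup (class and field names here are illustrative, not the actual GenericIndexed v3 layout):

```java
// Illustrative sketch: mapping a logical row index to (split, offset)
// when the rows-per-split count is stored as a power of 2.
// These names are hypothetical, not the real GenericIndexed v3 fields.
class SplitIndex {
    final int logRowsPerSplit; // rows per split = 1 << logRowsPerSplit

    SplitIndex(int logRowsPerSplit) {
        this.logRowsPerSplit = logRowsPerSplit;
    }

    // Which split file holds this row: a shift instead of a divide.
    int splitFor(int row) {
        return row >>> logRowsPerSplit;
    }

    // Position of the row inside its split: a mask instead of a modulo.
    int offsetInSplit(int row) {
        return row & ((1 << logRowsPerSplit) - 1);
    }

    public static void main(String[] args) {
        SplitIndex idx = new SplitIndex(20); // 1,048,576 rows per split
        System.out.println(idx.splitFor(3_000_000));      // prints 2
        System.out.println(idx.offsetInSplit(3_000_000)); // prints 902848
    }
}
```

Only the last split may hold fewer rows, which is why a single power-of-2 count suffices for the metadata.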

  2. In order to do this, the serialization logic needs to have access to the FileSmoosher while serializing, so we propose that GenericIndexedWriter.writeToChannel goes

FROM
public void writeToChannel(WritableByteChannel channel)
TO
public void writeToChannel(WritableByteChannel channel, FileSmoosher smoosher)

Passing a null for the FileSmoosher is considered valid and will result in version 3 not being an option.
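The null-smoosher contract could look roughly like the following sketch (VersionChooser and its method are hypothetical names for illustration, not Druid code):

```java
// Hypothetical sketch of the version-selection rule described above:
// a null FileSmoosher is valid, but it rules out the multi-file format,
// so an oversized column without a smoosher should fail fast.
class VersionChooser {
    static final long MAX_SINGLE_FILE = Integer.MAX_VALUE; // ~2GB ByteBuffer limit

    static int chooseVersion(long serializedSize, boolean haveSmoosher) {
        if (serializedSize <= MAX_SINGLE_FILE) {
            return 1; // fits in one file: keep the backward-compatible format
        }
        if (!haveSmoosher) {
            throw new IllegalStateException(
                "column exceeds 2GB but no FileSmoosher was supplied");
        }
        return 3; // multi-file format with split metadata
    }
}
```

Keeping version 1 as the default whenever the column fits preserves compatibility for the common case.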

One of the places that calls GenericIndexedWriter.writeToChannel is GenericColumnSerializer.writeToChannel. This method will require the same adjustment to its interface. So,

FROM
public void GenericColumnSerializer.writeToChannel(WritableByteChannel channel)
TO
public void GenericColumnSerializer.writeToChannel(WritableByteChannel channel, FileSmoosher smoosher)

Presently, complex columns are serialized out using ComplexColumnSerializer, which is generic over all complex columns and requires that serialization be done via the ObjectStrategy. This has two implications:

  • It makes it hard to support both versions at the same time
  • It makes it impossible for a ComplexColumnSerde to control how it is serialized, sometimes resulting in sub-optimal disk layouts.

Therefore, we propose adding a new method to ComplexMetricSerde:

public GenericColumnSerializer getSerializer(IOPeon peon, String column) {
  return new ComplexColumnSerializer(peon, column, this);
}

This means that in order to go beyond the 2GB limit, new ComplexMetricSerde implementations will need to provide their own implementation of this method. We will add a new class, ComplexColumnSerializerV2, to aid implementations that want to use the new functionality. It will be usable through an incantation like

@Override
public GenericColumnSerializer getSerializer(IOPeon peon, String metric)
{
  return ComplexColumnSerializerV2.createWithColumnSize(peon, metric, this.getObjectStrategy(), Integer.MAX_VALUE);
}
  3. Given how the code is currently written, the above changes will result in a FileSmoosher object being passed from IndexMerger to GenericIndexedWriter while it (the FileSmoosher) already has an OutputStream open.

The writeToChannel implementation is expected to create more files using the FileSmoosher and likely close them before the already opened OutputStream has been closed. This means that FileSmoosher will also need to be updated to handle this sort of usage.

We propose to support this by having the FileSmoosher detect when it already has an OutputStream open and redirect newly opened OutputStreams to new files on the file system. When any of the open OutputStream objects is closed, it will also check whether any of the other files have been closed in the meantime. Any OutputStreams that have been closed will be copied onto the main smoosh file and the extra underlying files cleaned up. This introduces an extra copy for these files, but given that the strategy will only be used when absolutely necessary, it shouldn't result in any noticeable performance degradation during indexing.
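A simplified, in-memory sketch of that redirect-and-copy-on-close behaviour (MiniSmoosher is purely illustrative; the real FileSmoosher would spill to temporary files on disk rather than byte arrays):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the proposed FileSmoosher behaviour: while one
// stream is open, newly opened streams are redirected to side buffers,
// which are copied into the main output once they have been closed.
class MiniSmoosher {
    private final ByteArrayOutputStream main = new ByteArrayOutputStream();
    private final Deque<ByteArrayOutputStream> pendingCopies = new ArrayDeque<>();
    private boolean mainStreamOpen = false;

    OutputStream openStream() {
        if (!mainStreamOpen) {
            mainStreamOpen = true;
            return new OutputStream() {
                @Override public void write(int b) { main.write(b); }
                @Override public void close() {
                    mainStreamOpen = false;
                    drainPending(); // copy side buffers closed in the meantime
                }
            };
        }
        // Another stream is already open: redirect to a side buffer
        // (the real implementation would open a new file here).
        final ByteArrayOutputStream side = new ByteArrayOutputStream();
        return new OutputStream() {
            @Override public void write(int b) { side.write(b); }
            @Override public void close() {
                pendingCopies.add(side);
                if (!mainStreamOpen) drainPending();
            }
        };
    }

    private void drainPending() {
        ByteArrayOutputStream side;
        while ((side = pendingCopies.poll()) != null) {
            try {
                side.writeTo(main); // the "extra copy" mentioned above
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    byte[] contents() { return main.toByteArray(); }

    public static void main(String[] args) throws IOException {
        MiniSmoosher s = new MiniSmoosher();
        OutputStream outer = s.openStream();
        outer.write('A');
        OutputStream nested = s.openStream(); // redirected to a side buffer
        nested.write('B');
        nested.close();
        outer.write('C');
        outer.close();
        System.out.println(new String(s.contents())); // prints ACB
    }
}
```

The nested stream's bytes land after the outer stream's, which matches the proposal: side files are appended to the main smoosh file only once the already-open stream has finished.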
