Today the column size for a given segment is limited to ~2GB because of the ByteBuffer Integer.MAX_VALUE limitation. At Flurry we started hitting this limit for complex columns like Theta sketches.
Proposal:
To work around this limit, we would like to propose splitting a column file into multiple column files when it exceeds the max limit. We will store metadata for the chunks while writing the column, and while reading we will use this metadata to reconstruct the column.
Keeping in mind backward compatibility, here are the changes:
- Version 3 of GenericIndexed will be introduced. This version stores the following new fields:
  - Number of file splits
  - Number of rows per split, stored as a power of 2 (all splits, except the “last” one, will contain the same number of rows). The power-of-2 constraint means a reader can locate any row with a shift and a mask, as sketched below.
  Columns that fit within the current 2GB limit will continue to use version 1.
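To make that concrete, here is a minimal, self-contained sketch (hypothetical names, not the final on-disk format or API) of how a version 3 reader could map a global row number to a split file and an offset within it:

public class SplitLookupSketch
{
  public static void main(String[] args)
  {
    final int logRowsPerSplit = 20;   // v3 header field: 2^20 rows in every split but the last
    final int rowNum = 5_000_000;     // global row index within the column

    // Because rows-per-split is a power of 2, locating a row is a shift and a mask.
    final int splitNum = rowNum >>> logRowsPerSplit;              // which split file
    final int rowInSplit = rowNum & ((1 << logRowsPerSplit) - 1); // offset inside that split

    System.out.println("row " + rowNum + " -> split " + splitNum + ", row " + rowInSplit);
  }
}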
- In order to do this, the serialization logic needs access to the FileSmoosher while serializing, so we propose that GenericIndexedWriter.writeToChannel goes
FROM
public void writeToChannel(WritableByteChannel channel)
TO
public void writeToChannel(WritableByteChannel channel, FileSmoosher smoosher)
Passing null for the FileSmoosher is considered valid and simply means that version 3 will not be an option, as sketched below.
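For illustration, a rough sketch of how the dispatch could look (the Smoosher interface and the write helpers here are stand-ins, not Druid's actual API):

import java.io.IOException;
import java.nio.channels.WritableByteChannel;

abstract class GenericIndexedWriterSketch
{
  interface Smoosher {}

  abstract long serializedSize();
  abstract void writeVersionOne(WritableByteChannel channel) throws IOException;
  abstract void writeVersionThree(WritableByteChannel channel, Smoosher smoosher) throws IOException;

  public void writeToChannel(WritableByteChannel channel, Smoosher smoosher) throws IOException
  {
    // A null smoosher is valid: it simply rules out the multi-file version 3 layout.
    if (smoosher == null || serializedSize() <= Integer.MAX_VALUE) {
      writeVersionOne(channel);             // fits in one file: keep the version 1 format
    } else {
      writeVersionThree(channel, smoosher); // header to 'channel', split files via the smoosher
    }
  }
}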
One of the places that calls GenericIndexedWriter.writeToChannel is GenericColumnSerializer.writeToChannel, which will require the same adjustment to its interface. So,
FROM
public void GenericColumnSerializer.writeToChannel(WritableByteChannel channel)
TO
public void GenericColumnSerializer.writeToChannel(WritableByteChannel channel, FileSmoosher smoosher)
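The change is mostly mechanical plumbing; e.g., a serializer implementation would simply thread the smoosher through to its underlying writer (a hypothetical sketch, with the field name writer being illustrative):

@Override
public void writeToChannel(WritableByteChannel channel, FileSmoosher smoosher) throws IOException
{
  // Forward the smoosher so the underlying GenericIndexedWriter can split if needed.
  writer.writeToChannel(channel, smoosher);
}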
Presently, complex columns are serialized using ComplexColumnSerializer, which is generic over all complex columns and requires that serialization be done via the ObjectStrategy. This has two implications:
- It makes it hard to support both versions at the same time
- It makes it impossible for a ComplexColumnSerde to control how it is serialized, sometimes resulting in sub-optimal disk layouts.
Therefore, we propose adding a new method to ComplexMetricSerde:
public GenericColumnSerializer getSerializer(IOPeon peon, String column)
{
  return new ComplexColumnSerializer(peon, column, this);
}
This means that, in order to go beyond the 2GB limit, ComplexMetricSerde implementations will need to provide their own implementation of this method. We will add a new class, ComplexColumnSerializerV2, to aid implementations that want to use the new functionality. It will be usable through an incantation like
@Override
public GenericColumnSerializer getSerializer(IOPeon peon, String metric)
{
  return ComplexColumnSerializerV2.createWithColumnSize(peon, metric, this.getObjectStrategy(), Integer.MAX_VALUE);
}
- Given how the code is currently written, the above changes will result in a FileSmoosher object being passed from IndexMerger to GenericIndexedWriter while it (the FileSmoosher) already has an OutputStream open.
The writeToChannel implementation is expected to create more files using the FileSmoosher, and will likely close them before the already-opened OutputStream has been closed. This means that FileSmoosher will also need to be updated to handle this sort of usage.
We propose to support this by having the FileSmoosher detect when it already has an OutputStream open and redirect newly opened OutputStreams to new files on the file system. When any of the open OutputStream objects are closed, they will also check whether any of the other files have been closed in the meantime. Any OutputStreams that have been closed will be copied onto the main smoosh file, and the extra underlying file will be cleaned up. This has the downside of introducing an extra copy of these files, but given that the strategy will only be used when absolutely necessary, it shouldn't result in any noticeable performance degradation during indexing. A toy model of this behavior is sketched below.
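The following self-contained toy model (not Druid's actual FileSmoosher; all names here are illustrative) shows the redirect-and-merge idea under those assumptions:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class SmoosherModel
{
  private final OutputStream mainOut;                      // the already-open smoosh output
  private final List<File> pendingSideFiles = new ArrayList<>();

  public SmoosherModel(File mainFile) throws IOException
  {
    this.mainOut = new FileOutputStream(mainFile, true);
  }

  // While mainOut is open, additional writers are redirected to side files.
  public OutputStream openSideWriter(String name) throws IOException
  {
    final File side = File.createTempFile(name, ".side");
    return new FileOutputStream(side)
    {
      @Override
      public void close() throws IOException
      {
        super.close();
        pendingSideFiles.add(side);  // mark this side file as finished
        mergeClosedSideFiles();      // fold any finished side files back in
      }
    };
  }

  // Copy each finished side file onto the main smoosh file, then delete it.
  // This is the extra copy the proposal accepts as a trade-off.
  private void mergeClosedSideFiles() throws IOException
  {
    for (File side : pendingSideFiles) {
      Files.copy(side.toPath(), mainOut);
      side.delete();
    }
    pendingSideFiles.clear();
  }
}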