Flink: write the CDC records into apache iceberg tables #1663
Changes from all commits
@@ -0,0 +1,48 @@ (new file: ContentFileWriter.java)

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg;

import java.io.Closeable;
import java.util.Iterator;

public interface ContentFileWriter<T, R> extends Closeable {

  void write(R record);

  default void writeAll(Iterator<R> values) {
    while (values.hasNext()) {
      write(values.next());
    }
  }

  default void writeAll(Iterable<R> values) {
    writeAll(values.iterator());
  }

  /**
   * Returns the length of this file.
   */
  long length();

  /**
   * Return a {@link ContentFile} which is either a {@link DeleteFile} or a {@link DataFile}.
   */
  T toContentFile();
}
@@ -0,0 +1,43 @@ (new file: ContentFileWriterFactory.java)

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg;

import org.apache.iceberg.encryption.EncryptedOutputFile;

/**
 * Factory to create a new {@link ContentFileWriter} to write INSERT or DELETE records.
 *
 * @param <T> content file type, either {@link DataFile} or {@link DeleteFile}.
 * @param <R> data type of the rows to write.
 */
public interface ContentFileWriterFactory<T, R> {

  /**
   * Creates a new {@link ContentFileWriter}.
   *
   * @param partitionKey a partition key indicating which partition the rows will be written to; null if the
   *                     table is unpartitioned.
   * @param outputFile an OutputFile used to create an output stream.
   * @param fileFormat the file format.
   * @return a newly created {@link ContentFileWriter}
   */
  ContentFileWriter<T, R> createWriter(PartitionKey partitionKey, EncryptedOutputFile outputFile,
                                       FileFormat fileFormat);
}
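To make the factory contract concrete, here is a rough sketch (not part of this PR) of how a data-file implementation of ContentFileWriterFactory might look. It assumes an existing FileAppenderFactory<T> is available to create the format-specific appender; the DataFileWriterFactory class name and its fields are hypothetical.

package org.apache.iceberg;

import org.apache.iceberg.encryption.EncryptedOutputFile;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.io.FileAppenderFactory;

// Hypothetical factory that wires a format-specific appender into the DataFileWriter from this PR.
public class DataFileWriterFactory<T> implements ContentFileWriterFactory<DataFile, T> {

  private final PartitionSpec spec;
  private final FileAppenderFactory<T> appenderFactory;  // assumed dependency, e.g. an engine-specific factory

  public DataFileWriterFactory(PartitionSpec spec, FileAppenderFactory<T> appenderFactory) {
    this.spec = spec;
    this.appenderFactory = appenderFactory;
  }

  @Override
  public ContentFileWriter<DataFile, T> createWriter(PartitionKey partitionKey,
                                                     EncryptedOutputFile outputFile,
                                                     FileFormat fileFormat) {
    // Create the format-specific appender, then wrap it with the file-level metadata
    // needed to build a DataFile when the writer is closed.
    FileAppender<T> appender = appenderFactory.newAppender(outputFile.encryptingOutputFile(), fileFormat);
    return new DataFileWriter<>(appender, fileFormat,
        outputFile.encryptingOutputFile().location(), partitionKey, spec, outputFile.keyMetadata());
  }
}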
@@ -0,0 +1,79 @@ (new file: DataFileWriter.java)

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg;

import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.iceberg.encryption.EncryptionKeyMetadata;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

public class DataFileWriter<T> implements ContentFileWriter<DataFile, T> {

Inline review on this class:
Contributor: Much of this duplicates …
Member (author): I don't think so, because …

  private final FileAppender<T> appender;
  private final FileFormat format;
  private final String location;
  private final PartitionKey partitionKey;
  private final PartitionSpec spec;
  private final ByteBuffer keyMetadata;
  private DataFile dataFile = null;

  public DataFileWriter(FileAppender<T> appender, FileFormat format,
                        String location, PartitionKey partitionKey, PartitionSpec spec,
                        EncryptionKeyMetadata keyMetadata) {
    this.appender = appender;
    this.format = format;
    this.location = location;
    this.partitionKey = partitionKey; // set null if unpartitioned.
    this.spec = spec;
    this.keyMetadata = keyMetadata != null ? keyMetadata.buffer() : null;
  }

  @Override
  public void write(T row) {
    appender.add(row);
  }

  @Override
  public long length() {
    return appender.length();
  }

  @Override
  public DataFile toContentFile() {
    Preconditions.checkState(dataFile != null, "Cannot create data file from unclosed writer");
    return dataFile;
  }

  @Override
  public void close() throws IOException {
    if (dataFile == null) {
      appender.close();
      this.dataFile = DataFiles.builder(spec)
          .withEncryptionKeyMetadata(keyMetadata)
          .withFormat(format)
          .withPath(location)
          .withFileSizeInBytes(appender.length())
          .withPartition(partitionKey) // set null if unpartitioned
          .withMetrics(appender.metrics())
          .withSplitOffsets(appender.splitOffsets())
          .build();
    }
  }
}
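A small usage sketch (illustrative only, not part of the PR) of the lifecycle this class implies: rows are written through the wrapped appender, the DataFile metadata is built when the writer is closed, and toContentFile() may only be called afterwards. The helper method, its arguments, and the hard-coded Parquet format are hypothetical; the appender is assumed to be created elsewhere, e.g. by an appender factory.

import java.io.IOException;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFileWriter;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.io.FileAppender;

public class DataFileWriterExample {
  static <T> DataFile writeDataFile(FileAppender<T> appender, String location,
                                    PartitionSpec spec, Iterable<T> rows) throws IOException {
    // Example values: Parquet format, no partition key (unpartitioned), no encryption key metadata.
    DataFileWriter<T> writer =
        new DataFileWriter<>(appender, FileFormat.PARQUET, location, null, spec, null);
    try {
      writer.writeAll(rows);
    } finally {
      // Closing the writer closes the appender and builds the DataFile from its metrics.
      writer.close();
    }
    return writer.toContentFile();
  }
}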
@@ -0,0 +1,53 @@ (new file: DeletesUtil.java)

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg.deletes;

import org.apache.iceberg.MetadataColumns;
import org.apache.iceberg.Schema;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.types.Types;

public class DeletesUtil {

  private DeletesUtil() {
  }

  public static Schema pathPosSchema() {
    return new Schema(
        MetadataColumns.DELETE_FILE_PATH,
        MetadataColumns.DELETE_FILE_POS);
  }

  public static Schema pathPosSchema(Schema rowSchema) {
    Preconditions.checkNotNull(rowSchema, "Row schema should not be null when constructing the pos-delete schema.");

    // the appender uses the row schema wrapped with position fields
    return new Schema(
        MetadataColumns.DELETE_FILE_PATH,
        MetadataColumns.DELETE_FILE_POS,
        Types.NestedField.required(
            MetadataColumns.DELETE_FILE_ROW_FIELD_ID, "row", rowSchema.asStruct(),
            MetadataColumns.DELETE_FILE_ROW_DOC));
  }

  public static Schema posDeleteSchema(Schema rowSchema) {
    return rowSchema == null ? pathPosSchema() : pathPosSchema(rowSchema);
  }
}
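For illustration (not part of the PR), here is how these helpers would be used for a simple two-column row schema; the field IDs and column names are arbitrary example values.

import org.apache.iceberg.Schema;
import org.apache.iceberg.deletes.DeletesUtil;
import org.apache.iceberg.types.Types;

public class PosDeleteSchemaExample {
  public static void main(String[] args) {
    Schema rowSchema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    // file_path, pos, row<id, data>: includes the deleted row when a row schema is given
    Schema withRow = DeletesUtil.posDeleteSchema(rowSchema);

    // file_path, pos only: when no row schema is supplied
    Schema pathPosOnly = DeletesUtil.posDeleteSchema(null);

    System.out.println(withRow);
    System.out.println(pathPosOnly);
  }
}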
Reviewer: I'm a little hesitant to add this. I think it is needed so that the same RollingContentFileWriter can be used for delete files and data files, but this introduces a lot of changes and new interfaces just to share about 20 lines of code. I'm not sure that it is worth the extra complexity, compared to having one for data files and one for delete files.
Author: In my thinking, the whole write workflow should look like the following:

For each executor/task in the compute engine, there is a TaskWriter that writes generic records. If it uses the fanout policy to write records, it will have multiple DeltaWriters, each writing records for a single partition; if it uses the grouped policy in Spark, we might have just one DeltaWriter per TaskWriter. The DeltaWriter can accept INSERT, EQ-DELETE, and POS-DELETE records, and for each kind of record there is a RollingFileWriter that rolls its file appender over to a newly opened one once its size reaches the threshold.

Every RollingFileWriter should have the same rolling logic, so in theory it is good to define an abstract ContentFileWriter so that we don't have to define three kinds of RollingFileWriter (see the sketch below).

Another way is to define a BaseRollingFileWriter and put the common logic there, and have the DeltaWriter use the BaseRollingFileWriter. When constructing the DeltaWriter, we would need to pass the subclasses PosDeleteRollingFileWriter, EqDeleteRollingFileWriter, and DataRollingFileWriter to it.
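As a rough sketch of the rolling logic described above (hypothetical names, not code from this PR): a generic rolling writer can delegate to one ContentFileWriter per open file, check length() after each write, and roll to a new file once the target size is reached. The Supplier<EncryptedOutputFile> is an assumed dependency that hands out new output files.

package org.apache.iceberg;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import org.apache.iceberg.encryption.EncryptedOutputFile;

class RollingContentFileWriter<T, R> {
  private final ContentFileWriterFactory<T, R> writerFactory;
  private final Supplier<EncryptedOutputFile> fileSupplier;  // assumed: hands out new output files
  private final PartitionKey partitionKey;
  private final FileFormat format;
  private final long targetFileSizeInBytes;
  private final List<T> completedFiles = new ArrayList<>();

  private ContentFileWriter<T, R> currentWriter = null;

  RollingContentFileWriter(ContentFileWriterFactory<T, R> writerFactory,
                           Supplier<EncryptedOutputFile> fileSupplier,
                           PartitionKey partitionKey, FileFormat format, long targetFileSizeInBytes) {
    this.writerFactory = writerFactory;
    this.fileSupplier = fileSupplier;
    this.partitionKey = partitionKey;
    this.format = format;
    this.targetFileSizeInBytes = targetFileSizeInBytes;
  }

  void write(R record) throws IOException {
    if (currentWriter == null) {
      currentWriter = writerFactory.createWriter(partitionKey, fileSupplier.get(), format);
    }

    currentWriter.write(record);

    // Roll to a new file once the current one reaches the target size.
    if (currentWriter.length() >= targetFileSizeInBytes) {
      closeCurrent();
    }
  }

  // Closes any open file and returns the completed content files (DataFile or DeleteFile).
  List<T> complete() throws IOException {
    closeCurrent();
    return completedFiles;
  }

  private void closeCurrent() throws IOException {
    if (currentWriter != null) {
      currentWriter.close();
      completedFiles.add(currentWriter.toContentFile());
      currentWriter = null;
    }
  }
}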
Author: In that approach, we would need to create the PosDeleteRollingFileWriter, EqDeleteRollingFileWriter, and DataRollingFileWriter separately for each engine, for example FlinkPosDeleteRollingWriter and SparkPosDeleteRollingWriter, because different engines need to construct different FileAppenders to convert their specific data types into the unified parquet/orc/avro files. I don't think that would be less complexity than the current solution.
Reviewer: I don't think we would require a different class for each engine. The file writers are currently created using an AppenderFactory, and we could continue using that.

Also, we would not need three rolling writers that are nearly identical. We would only need two, because the position delete writer will be substantially different due to its sort order requirement.

Because deletes may come in any order relative to inserts and we need to write out a sorted delete file, we will need to buffer the deletes in memory (see the sketch below). That's not as expensive as it seems at first, because the file location will be reused (multiple deletes in the same data file), so the main cost is the number of positions that get deleted, which is the number of rows written and deleted in the same checkpoint per partition. The position delete writer and the logic to roll over to a new file are probably not going to be shared. I don't think I would even build a rolling position delete writer unless we see real cases where it is needed.

That leaves just the equality delete writer and the data file writer. I think it would be cleaner to just have two rolling file writers because the rolling logic is so small. I went ahead and started a PR to show what it would look like: #1802
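To make the buffering idea concrete, here is a sketch (hypothetical, from neither this PR nor #1802) of accumulating position deletes per data file and flushing them in (file_path, pos) order so that the resulting position-delete file is sorted.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PositionDeleteBuffer {
  // data file path -> positions deleted in that file; TreeMap keeps paths in sorted order
  private final Map<String, List<Long>> deletesByPath = new TreeMap<>();

  void delete(String path, long position) {
    deletesByPath.computeIfAbsent(path, ignored -> new ArrayList<>()).add(position);
  }

  // Emits (path, pos) pairs in sorted order; the consumer would be the position-delete
  // writer that appends them to the delete file.
  void flush(PositionConsumer consumer) {
    for (Map.Entry<String, List<Long>> entry : deletesByPath.entrySet()) {
      List<Long> positions = entry.getValue();
      positions.sort(Comparator.naturalOrder());
      for (long pos : positions) {
        consumer.accept(entry.getKey(), pos);
      }
    }
    deletesByPath.clear();
  }

  interface PositionConsumer {
    void accept(String path, long pos);
  }
}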
Author: I read the PR; here are my thoughts:

First of all, I like the idea of using a single FileAppenderFactory to customize different writers for different computing engines; it's unified and graceful, and developers would find it easy to understand and customize.

Second, I agree that it's necessary to consider the sort order for the position delete writer. We have to sort the pairs in memory (similar to flushing the sorted memstore to HFiles in HBase); once the memory size reaches the threshold, we flush it to a pos-delete file, so the rolling policy is decided by the memory size rather than the current file size. It sounds reasonable to make it a separate pos-writer.

Third, I'd like to finish the whole CDC write path work (PoC) based on #1802 to see whether there are other issues.

Thanks.