Flink: add monitor metrics for Flink sink #5410

stevenzwu · 2022-08-01T17:01:44Z

No description provided.

rdblue · 2022-08-07T23:09:51Z

flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/CommitSummary.java

+  private final AtomicLong deleteFilesRecordCount = new AtomicLong();
+  private final AtomicLong deleteFilesByteCount = new AtomicLong();
+
+  CommitSummary(NavigableMap<Long, WriteResult> pendingResults) {


Shouldn't all of these metrics come from the commit path? I think that these would be produced by core and then passed to Flink by adding a way to set the metrics context.

@nastra, what do you think?

nastra · 2022-08-08

I think it makes a lot of sense to track those metrics in core itself and then have a particular metrics context to customize which metrics framework is used underneath.

stevenzwu · 2022-08-08, 2022-08-11

The problem with the Listeners API is that it is a global static with no unregister API. That makes it difficult to associate a listener at table scope, which is what we want here: IcebergFilesCommitter in Flink should register a listener for the table that it writes to.

It would be convenient if PendingUpdate#commit returned a CommitResult (instead of void). What do you think of adding a new UpdateResult PendingUpdate#apply() method that returns the action result? @rdblue @nastra

BTW, this class also consolidates the file count computations from various places in the code path. FlinkSink already has the information here, so it doesn't have to rely on a callback from the core module. Maybe we can share/reuse the CommitSummary class with the core module?
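To make the consolidation point concrete, here is a minimal sketch of the kind of single-pass aggregation a class like CommitSummary could perform over the pending results. It is not the PR's code: FileStats and WriteResultSketch are hypothetical stand-ins for Iceberg's file and WriteResult types, and every counter other than the two delete-file fields visible in the diff is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a data or delete file: only the two stats used below. */
final class FileStats {
  final long recordCount;
  final long sizeInBytes;

  FileStats(long recordCount, long sizeInBytes) {
    this.recordCount = recordCount;
    this.sizeInBytes = sizeInBytes;
  }
}

/** Hypothetical stand-in for the write result of one checkpoint. */
final class WriteResultSketch {
  final List<FileStats> dataFiles;
  final List<FileStats> deleteFiles;

  WriteResultSketch(List<FileStats> dataFiles, List<FileStats> deleteFiles) {
    this.dataFiles = dataFiles;
    this.deleteFiles = deleteFiles;
  }
}

/** Sums file, record, and byte counts over all pending results in one pass. */
final class CommitSummarySketch {
  // AtomicLong keeps the fields final while the lambda updates them;
  // no thread safety is implied by this sketch.
  private final AtomicLong dataFilesCount = new AtomicLong();
  private final AtomicLong dataFilesRecordCount = new AtomicLong();
  private final AtomicLong dataFilesByteCount = new AtomicLong();
  private final AtomicLong deleteFilesCount = new AtomicLong();
  private final AtomicLong deleteFilesRecordCount = new AtomicLong();
  private final AtomicLong deleteFilesByteCount = new AtomicLong();

  CommitSummarySketch(NavigableMap<Long, WriteResultSketch> pendingResults) {
    pendingResults.values().forEach(result -> {
      dataFilesCount.addAndGet(result.dataFiles.size());
      for (FileStats file : result.dataFiles) {
        dataFilesRecordCount.addAndGet(file.recordCount);
        dataFilesByteCount.addAndGet(file.sizeInBytes);
      }
      deleteFilesCount.addAndGet(result.deleteFiles.size());
      for (FileStats file : result.deleteFiles) {
        deleteFilesRecordCount.addAndGet(file.recordCount);
        deleteFilesByteCount.addAndGet(file.sizeInBytes);
      }
    });
  }

  long dataFilesCount() { return dataFilesCount.get(); }
  long dataFilesRecordCount() { return dataFilesRecordCount.get(); }
  long dataFilesByteCount() { return dataFilesByteCount.get(); }
  long deleteFilesCount() { return deleteFilesCount.get(); }
  long deleteFilesRecordCount() { return deleteFilesRecordCount.get(); }
  long deleteFilesByteCount() { return deleteFilesByteCount.get(); }

  public static void main(String[] args) {
    // Keys play the role of checkpoint ids.
    NavigableMap<Long, WriteResultSketch> pending = new TreeMap<>();
    pending.put(1L, new WriteResultSketch(
        Arrays.asList(new FileStats(10, 100), new FileStats(5, 50)),
        Arrays.asList(new FileStats(2, 20))));
    pending.put(2L, new WriteResultSketch(
        Arrays.asList(new FileStats(7, 70)),
        Collections.<FileStats>emptyList()));

    CommitSummarySketch summary = new CommitSummarySketch(pending);
    System.out.println("data files: " + summary.dataFilesCount()
        + ", delete files: " + summary.deleteFilesCount());
  }
}
```

Because the summary is computed from values the sink already holds, the committer could report these numbers right after a commit without waiting for a callback from core, which is the trade-off under discussion here.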

mas-chen · 2022-08-16T01:25:31Z

flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriterMetrics.java

+  private final Histogram deleteFilesSizeHistogram;
+
+  IcebergStreamWriterMetrics(MetricGroup metrics, String fullTableName) {
+    MetricGroup writerMetrics = metrics.addGroup("IcebergStreamWriter", fullTableName);


The group is equivalent to a tag for a tag-based reporter in Flink (e.g. Prometheus). Would something like table be a clearer name for users who want to aggregate metrics by the full table name?

In addition to that suggestion, you can still use IcebergStreamWriter as a group without a value to identify that the metric belongs to IcebergStreamWriter.

stevenzwu · 2022-08-16

Agree. Thanks for your suggestions!
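The naming suggestion can be illustrated with a toy model. ToyMetricGroup below is a hypothetical class, not Flink's MetricGroup; it only mimics the two behaviours relevant here, on the assumption that Flink's addGroup(name) adds a plain scope component while addGroup(key, value) adds both components and also exposes the pair as a variable that a tag-based reporter can export. The metric name exampleCounter and the table name db.tbl are made up, and the real Flink variable keys are formatted differently from the plain keys used here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a hierarchical metric group; NOT Flink's MetricGroup. */
final class ToyMetricGroup {
  private final String scope;                  // dot-joined group names
  private final Map<String, String> variables; // key-value pairs usable as tags

  private ToyMetricGroup(String scope, Map<String, String> variables) {
    this.scope = scope;
    this.variables = variables;
  }

  static ToyMetricGroup root() {
    return new ToyMetricGroup("", new LinkedHashMap<String, String>());
  }

  /** Adds a plain group: extends the scope, adds no variable. */
  ToyMetricGroup addGroup(String name) {
    return new ToyMetricGroup(join(scope, name), new LinkedHashMap<>(variables));
  }

  /** Adds a key-value group pair: extends the scope and records key=value. */
  ToyMetricGroup addGroup(String key, String value) {
    Map<String, String> vars = new LinkedHashMap<>(variables);
    vars.put(key, value);
    return new ToyMetricGroup(join(join(scope, key), value), vars);
  }

  String metricIdentifier(String metricName) {
    return join(scope, metricName);
  }

  Map<String, String> variables() {
    return Collections.unmodifiableMap(variables);
  }

  private static String join(String prefix, String name) {
    return prefix.isEmpty() ? name : prefix + "." + name;
  }

  public static void main(String[] args) {
    // Original shape: the tag key is the operator name, which is awkward to query.
    ToyMetricGroup before = root().addGroup("IcebergStreamWriter", "db.tbl");
    // Suggested shape: the operator name is a plain group and the tag key is "table".
    ToyMetricGroup after = root().addGroup("IcebergStreamWriter").addGroup("table", "db.tbl");

    System.out.println(before.variables());
    System.out.println(after.variables());
    System.out.println(after.metricIdentifier("exampleCounter"));
  }
}
```

In the suggested shape a dashboard can aggregate on a tag named table, while IcebergStreamWriter still appears in the metric identifier.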

mas-chen · 2022-08-16T01:28:20Z

Does such a change require documentation in the Iceberg project? Otherwise it would be very difficult for users to discover these additional metrics.

mas-chen · 2022-08-16T01:52:33Z

flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergStreamWriter.java

+    WriteResult result = writer.complete();
+    writerMetrics.updateFlushResult(result);
    output.collect(new StreamRecord<>(result));
+    writerMetrics.flushDuration(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNano));


Maybe it is clearer, and avoids an unnecessary conversion, to use System.currentTimeMillis() and do the calculation in millis, since there is no extra precision from doing it in nanoseconds? Unless you want the extra precision, for which the return type long wouldn't suffice. Ditto for the similar instances of this calculation.

stevenzwu · 2022-08-16

This is intentional (though not for the sake of precision).

nanoTime is a monotonic clock, so it is the best fit for measuring durations. currentTimeMillis can jump forward or backward when the system clock is adjusted (e.g. by NTP).

https://stackoverflow.com/questions/2978598/will-system-currenttimemillis-always-return-a-value-previous-calls
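A minimal, standalone sketch of the pattern described above: take both readings from System.nanoTime() and convert only the difference to milliseconds. FlushTimerSketch and timeMillis are illustrative names, not part of the PR.

```java
import java.util.concurrent.TimeUnit;

final class FlushTimerSketch {
  /**
   * Runs the action and returns the elapsed time in milliseconds.
   * Both readings come from System.nanoTime(), so a wall-clock adjustment
   * during the action does not distort the measured duration.
   */
  static long timeMillis(Runnable action) {
    long startNano = System.nanoTime();
    action.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNano);
  }

  public static void main(String[] args) {
    long elapsedMs = timeMillis(() -> {
      try {
        Thread.sleep(20); // stands in for the flush work
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println("flush took about " + elapsedMs + " ms");
  }
}
```

Only the difference between two nanoTime readings is meaningful; the absolute value has no relation to wall-clock time, which is why it is converted after subtraction rather than before.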

mas-chen · 2022-08-16

Thanks for the explanation!

stevenzwu · 2022-08-16T04:12:02Z

> Does such change require documentation in the iceberg project? Otherwise it would be very difficult for the user to discover these additional metrics

Doc updates are typically done in separate PRs. Yes, we can add the metrics to the Iceberg docs.

Basically, IcebergSourceReader will become part of the metric name and the table key-value pair will become a tag. Mason Chen suggested this in the comment below so that it is more consistent with other Flink connector metrics. apache#5410 (comment)

rdblue · 2022-08-19T16:32:29Z

Thanks, @stevenzwu!

github-actions bot added build flink labels Aug 1, 2022

stevenzwu force-pushed the flink-sink-metrics branch from ac24945 to e35087f Compare August 1, 2022 17:09

stevenzwu closed this Aug 1, 2022

stevenzwu reopened this Aug 1, 2022

stevenzwu closed this Aug 1, 2022

stevenzwu reopened this Aug 1, 2022

rdblue reviewed Aug 7, 2022

Flink: add monitor metrics for Flink sink

075af33

stevenzwu force-pushed the flink-sink-metrics branch from 0e3cb16 to 075af33 Compare August 11, 2022 20:21

mas-chen reviewed Aug 16, 2022

update metric group name as Mason suggested

740eacc

stevenzwu mentioned this pull request Aug 17, 2022

Flink: fix the bug where metrics are registered in split reader. Also updated reader metric group to be more consistent with Flink metrics style. #5554

Merged

rdblue approved these changes Aug 19, 2022

rdblue merged commit 24a057f into apache:master Aug 19, 2022

nastra mentioned this pull request Oct 18, 2022

Flink: Fix NoClassDefFound with Flink runtime jar / Add integration test #6001

Merged
