Skip to content

Conversation

@stevenzwu
Copy link
Contributor

No description provided.

private final AtomicLong deleteFilesRecordCount = new AtomicLong();
private final AtomicLong deleteFilesByteCount = new AtomicLong();

CommitSummary(NavigableMap<Long, WriteResult> pendingResults) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't all of these metrics come from the commit path? I think that these would be produced by core and then passed to Flink by adding a way to set the metrics context.

@nastra, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes a lot of sense to track those metrics in core itself and then have a particular metrics context in order to customize what type of metrics framework to use underneath.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with the Listeners API is that it is a global static. There is no unregister API. it is difficult to associate listener at table scope, as we want to have IcebergFilesCommitter in Flink to register listener for the table that it writes to.

It would be convenient if PendingUpdate#commit return a CommitResult (instead of void). What do you think of adding a new UpdateResult PendingUpdate#apply() method that return the action result? @rdblue @nastra

Copy link
Contributor Author

@stevenzwu stevenzwu Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this class also consolidate the computations of the file count in various places in the code path. FlinkSink already have the information here. It doesn't have to rely on the callback from core module. Maybe we can share/reuse the CommitSummary class with the core module?

private final Histogram deleteFilesSizeHistogram;

IcebergStreamWriterMetrics(MetricGroup metrics, String fullTableName) {
MetricGroup writerMetrics = metrics.addGroup("IcebergStreamWriter", fullTableName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The group is equivalent to a tag for a tag based reporter in Flink (e.g. Prometheus). Would something like table be clearer name for users who want to aggregate metrics by the full table name?

You can still use IcebergStreamWriter as a group without a value as a means to identify that metric belongs to IcebergStreamWriter in addition to the suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. Thanks for your suggestions!

@mas-chen
Copy link
Contributor

Does such change require documentation in the iceberg project? Otherwise it would be very difficult for the user to discover these additional metrics

WriteResult result = writer.complete();
writerMetrics.updateFlushResult(result);
output.collect(new StreamRecord<>(result));
writerMetrics.flushDuration(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNano));
Copy link
Contributor

@mas-chen mas-chen Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is clearer and better to avoid unnecessary conversion by using System.currentTimeMillis() and millis for the calculations since there is no extra precision from doing it in nanoseconds? Unless you want the extra precision, for which the return type long wouldn't suffice. Ditto on the similar instances to this calculation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional (not for the reason of precision though).

nanoTime is monotonically increasing number. it is best fit for measuring duration. currentTimeMillis can jump forward or backward in the case of clock adjustments (e.g. NTP).

https://stackoverflow.com/questions/2978598/will-system-currenttimemillis-always-return-a-value-previous-calls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

@stevenzwu
Copy link
Contributor Author

Does such change require documentation in the iceberg project? Otherwise it would be very difficult for the user to discover these additional metrics

Doc update is typically done by separate PRs. Yes, we can add the metrics to the Iceberg docs.

stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Aug 17, 2022
Basically, IcebergSourceReader will become part of metric name and table k-v pair will become a tag. This is suggested by Mason Chen in the comment below so that it is more consistent as other Flink connector metrics.

apache#5410 (comment)
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Aug 18, 2022
Basically, IcebergSourceReader will become part of metric name and table k-v pair will become a tag. This is suggested by Mason Chen in the comment below so that it is more consistent as other Flink connector metrics.

apache#5410 (comment)
@rdblue rdblue merged commit 24a057f into apache:master Aug 19, 2022
@rdblue
Copy link
Contributor

rdblue commented Aug 19, 2022

Thanks, @stevenzwu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants