[Bug] compact_database StreamingCompactorSource Busy(max) 100% #5463

@skdfeitian

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

1.0

Compute Engine

Flink 1.17.1

Minimal reproduce step

public class CompactDatabaseTestSourceBusy extends CompactActionITCaseBase {
    public static void main(String[] args) throws Exception {
        CompactDatabaseAction action =
                createAction(
                        CompactDatabaseAction.class,
                        "compact_database",
                        "--warehouse",
                        "hdfs:///user/paimon/warehouse_dw",
                        "--mode",
                        "combined",
                        "--including_databases",
                        "dap_dev_test",
                        "--table_conf",
                        "snapshot.num-retained.min=3",
                        "--table_conf",
                        "snapshot.time-retained=5m",
                        "--table_conf",
                        "full-compaction.delta-commits=5",
                        "--table_conf",
                        CoreOptions.CONTINUOUS_DISCOVERY_INTERVAL.key() + "=120s");

        Configuration conf = new Configuration();
        conf.setString(RestOptions.BIND_PORT, "8081-8089");
        conf.setBoolean("rest.flamegraph.enabled", true);
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
        env.enableCheckpointing(30000);
        action.withStreamExecutionEnvironment(env).build();
        env.executeAsync();
    }
}

Image

What doesn't meet your expectations?

My question is why the following two operators consistently show "Busy(max): 100%":
Source: Combine-MultiBucketTables--StreamingCompactorSource
Source: Combined-UnawareBucketTables-StreamingCompactorSource

The warehouse contains only 8 databases, and the dap_dev_test database contains only 2 tables, so compacting them shouldn't consume this much CPU. Moreover, from reading the source code, I confirmed that the Thread.sleep(monitorInterval) call in the following two methods works as expected:

org.apache.paimon.flink.source.operator.CombinedAwareStreamingSource.Reader#pollNext

org.apache.paimon.flink.source.operator.CombinedUnawareStreamingSource.Reader#pollNext

This indicates that the program enters these methods and sleeps for 120 seconds as intended. Therefore, it is unclear what is keeping the two operators in the "Busy(max): 100%" state.
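
One possible explanation, stated here as an assumption rather than something confirmed in this issue: as far as I understand, Flink derives a task's busy time as the part of each second that was not reported as idle or back-pressured. If the Thread.sleep inside pollNext runs on the task thread and is never reported as idle time, the whole second would be counted as busy even though almost no CPU is used. A minimal sketch of that arithmetic, with hypothetical names and no Flink dependency:

```java
public class BusyTimeSketch {

    /**
     * Hypothetical helper: busy time is whatever remains of a 1000 ms window
     * after subtracting the time reported as idle or back-pressured.
     * This mirrors my understanding of the metric, not Flink's actual code.
     */
    public static double busyMsPerSecond(double idleMs, double backPressuredMs) {
        double notBusy = Math.min(idleMs + backPressuredMs, 1000.0);
        return 1000.0 - notBusy;
    }

    public static void main(String[] args) {
        // Reader sleeps inside pollNext: nothing is reported as idle,
        // so the full second is attributed to "busy".
        System.out.println(busyMsPerSecond(0.0, 0.0)); // prints 1000.0

        // Reader waits in a way that is reported as idle for 990 ms:
        // only the remaining 10 ms count as busy.
        System.out.println(busyMsPerSecond(990.0, 0.0)); // prints 10.0
    }
}
```

If this assumption holds, "Busy(max): 100%" on these two sources would reflect a blocked task thread rather than real CPU load. The flame graph enabled in the reproducer should show whether the thread is mostly parked in Thread.sleep.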

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
