
Conversation

@wecharyu (Contributor)

Change Logs

  1. Add a clearJobStatus API in HoodieEngineContext.
  2. Add a call to clearJobStatus after each setJobStatus.
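The pattern this PR introduces can be sketched with a minimal stand-in. EngineContextSketch below is a hypothetical class, not Hudi's real HoodieEngineContext; it only shows the intent: setJobStatus records a job group/description, and clearJobStatus resets it once the job finishes so later jobs do not inherit a stale description.

```java
// Hypothetical minimal sketch of the set/clear job-status pattern.
public class EngineContextSketch {

  private String activeJobStatus; // null when no status is set

  public void setJobStatus(String activityName, String description) {
    activeJobStatus = activityName + ": " + description;
  }

  public void clearJobStatus() {
    activeJobStatus = null;
  }

  public String getActiveJobStatus() {
    return activeJobStatus;
  }

  public static void main(String[] args) {
    EngineContextSketch ctx = new EngineContextSketch();
    ctx.setJobStatus("UpsertExecutor", "Tagging: my_table");
    // ... an eager job runs here and picks up the status above ...
    ctx.clearJobStatus(); // the next job starts with a clean status
    System.out.println(ctx.getActiveJobStatus()); // prints "null"
  }
}
```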

Impact

Fixes incorrect job groups and descriptions.

Risk level (write none, low, medium or high below)

None.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();

List<Pair<String, HoodieBaseFile>> baseFilesForAllPartitions = HoodieIndexUtils.getLatestBaseFilesForAllPartitions(partitions, context, hoodieTable);
context.clearJobStatus();
Contributor

This shouldn't be added. Key range loading has not finished here.

Contributor Author

The code after this point will not create a new job, so it seems OK to clear the job status here. WDYT?

}

// Now delete partially written files
context.setJobStatus(this.getClass().getSimpleName(), "Delete all partially written files: " + config.getTableName());
Contributor

Why is this being deleted?

Contributor Author

This status will be overwritten in HoodieTable#deleteInvalidFilesByPartitions, so just delete it.

// perform index loop up to get existing location of records
context.setJobStatus(this.getClass().getSimpleName(), "Tagging: " + table.getConfig().getTableName());
taggedRecords = tag(dedupedRecords, context, table);
context.clearJobStatus();
Contributor

If lazy execution happens afterwards, the job status may not be properly populated. Have you verified all the places where this could happen?

Contributor Author

Let me check all the lazy-execution call sites. For this one, the "Tagging xxx" status will also apply to deduplicateRecords, but clearing here will not affect other jobs, so we retain this line.
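The lazy-execution hazard being discussed can be illustrated with a plain Supplier standing in for a lazily evaluated Spark transformation. LazyStatusDemo and its names are hypothetical, not Hudi code; the point is that if clearJobStatus runs before the lazy work is actually triggered, the work executes with no status attached.

```java
import java.util.function.Supplier;

// Hypothetical illustration: a Supplier stands in for a lazy transformation.
public class LazyStatusDemo {

  private String status; // null when cleared

  void setJobStatus(String s) { status = s; }
  void clearJobStatus() { status = null; }

  String runScenario() {
    setJobStatus("Tagging: my_table");
    // Building the pipeline is lazy: nothing executes yet, and the status is
    // only read when the supplier is finally evaluated.
    Supplier<String> lazyJob = () -> "ran under status: " + status;
    clearJobStatus(); // cleared before the "job" actually runs
    return lazyJob.get(); // the job sees a null status
  }

  public static void main(String[] args) {
    System.out.println(new LazyStatusDemo().runScenario());
  }
}
```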

partitionFsPair.getRight().getLeft(), keyGenerator));
partitionFsPair.getRight().getLeft(), keyGenerator));
} finally {
context.clearJobStatus();
Contributor

This method composes a DAG and is triggered by lazy execution.

Contributor Author

Will remove the clearJobStatus calls on lazy-execution paths in CommitActionExecutor, because the job status is cleared in the finally block anyway:

} finally {
  // close the write client in all cases
  val asyncCompactionEnabled = isAsyncCompactionEnabled(writeClient, tableConfig, parameters, jsc.hadoopConfiguration())
  val asyncClusteringEnabled = isAsyncClusteringEnabled(writeClient, parameters)
  if (!asyncCompactionEnabled && !asyncClusteringEnabled) {
    log.info("Closing write client")
    writeClient.close()

}
return recordsAndPendingClusteringFileGroups.getLeft();
} finally {
context.clearJobStatus();
Contributor

Could you check here for lazy execution too?

Contributor Author

ditto

};
});
} finally {
engineContext.clearJobStatus();
Contributor

This may be subject to lazy execution.

Contributor Author

Remove this clearJobStatus.

});
});
} finally {
engineContext.clearJobStatus();
Contributor

Check this one too for lazy execution.

Contributor Author

ditto

new HoodieJsonPayload(genericRecord.toString()));
});
} finally {
context.clearJobStatus();
Contributor

This one too.

Contributor Author

ditto

new HoodieJsonPayload(genericRecord.toString()));
});
} finally {
context.clearJobStatus();
Contributor

This one too.

Contributor Author

ditto

executorOutputFs.getConf());
}, parallelism);
} finally {
context.clearJobStatus();
Contributor

This should be at the end of the method, correct, since context.foreach also triggers Spark stages?

Contributor Author

Addressed.
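The resolution discussed above can be sketched as follows. EagerClearDemo and its foreach helper are hypothetical stand-ins for context.foreach, which triggers Spark stages immediately: because execution is eager, it is safe to clear the status in a finally block once the work completes.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of clearing status after an eager foreach completes.
public class EagerClearDemo {

  private String status;

  void setJobStatus(String s) { status = s; }
  void clearJobStatus() { status = null; }

  // Stand-in for an eager engine-context foreach: items are processed now.
  <T> void foreach(List<T> items, Consumer<T> fn) { items.forEach(fn); }

  String runScenario() {
    StringBuilder seen = new StringBuilder();
    setJobStatus("Delete files");
    try {
      // Each element is processed immediately, under the status set above.
      foreach(List.of("f1", "f2"), f -> seen.append(f).append('@').append(status).append(';'));
    } finally {
      clearJobStatus(); // safe: the eager work above has already finished
    }
    return seen.toString();
  }

  public static void main(String[] args) {
    System.out.println(new EagerClearDemo().runScenario());
  }
}
```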

return Pair.of(schemaProvider, Pair.of(checkpointStr, records));
return Pair.of(schemaProvider, Pair.of(checkpointStr, records));
} finally {
hoodieSparkContext.clearJobStatus();
Contributor Author

This one also seems to be lazy execution.

@hudi-bot (Collaborator)

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan added the priority:critical (Production degraded; pipelines stalled) and release-0.14.1 labels on Nov 15, 2023
@nsivabalan (Contributor)

Hey @yihua and @wecharyu: can we target this for 0.14.1? Is this patch landable in a week's time?


Labels

priority:critical (Production degraded; pipelines stalled), release-0.14.1, size:XL (PR with lines of changes > 1000)

Projects

Status: 🆕 New


4 participants