

@moiseenkov

Refactored DataprocCreateBatchOperator:

  • significantly refactored the execute() method to reduce its accumulated complexity and code duplication.
  • made the batch_id parameter optional, since the API supports omitting it
  • made the region parameter required because (1) the API requires it, and (2) it was already required de facto, since the operator used to raise an exception manually: raise AirflowException("Region should be set here")
  • added the specific error message to the operator logs (in both deferrable=True and deferrable=False modes), so that users can debug their batch jobs directly from the operator logs.
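A minimal, hypothetical sketch of the parameter contract and error logging described above, assuming a simplified dict-shaped batch result. `create_batch_sketch`, `submit`, `fake_submit`, and `BatchFailedError` are illustrative names, not the provider's actual code, and deferrable mode is not modeled.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("create_batch_sketch")


class BatchFailedError(Exception):
    """Hypothetical stand-in for the exception the real operator raises."""


def create_batch_sketch(
    *,
    region: str,  # required: no default value, mirroring the new signature
    batch: dict,
    submit: Callable[[str, dict, Optional[str]], dict],  # hypothetical API client call
    batch_id: Optional[str] = None,  # optional: the service may generate an ID
) -> dict:
    """Submit a batch and surface the service's error message in the task log."""
    result = submit(region, batch, batch_id)
    if result["state"] == "FAILED":
        message = result.get("state_message", "<no message>")
        # Log the specific error so users can debug from the task log alone.
        logger.error("Batch job %s failed with error: %s", result["name"], message)
        raise BatchFailedError(f"Batch job {result['name']} failed: {message}")
    return result


def fake_submit(region: str, batch: dict, batch_id: Optional[str]) -> dict:
    # Hypothetical fake: pretend the service generates an ID when none is given.
    name = f"projects/demo/locations/{region}/batches/{batch_id or 'generated-id'}"
    return {"name": name, "state": "SUCCEEDED"}


# batch_id omitted; region must always be passed explicitly.
result = create_batch_sketch(region="us-central1", batch={}, submit=fake_submit)
```

Making region a keyword argument without a default turns a missing region into an immediate TypeError at construction time, rather than a manual runtime check.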

Also slightly refactored the Dataproc system tests:

  • reduced parallelism
  • added retries to the cluster creation tasks in the hope of suppressing the error
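In Airflow, task-level retries are normally configured through the BaseOperator `retries` and `retry_delay` arguments. The plain-Python sketch below only illustrates the same idea for a flaky cluster-creation call; `call_with_retries` and `flaky_create_cluster` are hypothetical names and do not reflect the system tests' actual code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], retries: int = 3, delay_seconds: float = 0.0) -> T:
    """Call fn, retrying up to `retries` extra times if it raises."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


attempts = {"n": 0}


def flaky_create_cluster() -> str:
    # Hypothetical stand-in: fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient cluster-creation error")
    return "cluster-ready"


status = call_with_retries(flaky_create_cluster, retries=3)
```

As with Airflow's `retries`, the count here means extra attempts, so `retries=3` allows up to four calls in total.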

This PR also rolls back the pre-commit hook changes made for the Dataflow system tests; those changes are no longer needed.

@moiseenkov moiseenkov requested review from ashb and potiuk as code owners August 16, 2024 11:10
@moiseenkov moiseenkov force-pushed the dataproc/create_batch_operator/fix_logging branch 3 times, most recently from 8897353 to b336671 August 16, 2024 13:05
@moiseenkov moiseenkov force-pushed the dataproc/create_batch_operator/fix_logging branch from b336671 to 7092dcb August 16, 2024 13:06