Dynamo to S3 Sample DAG and Docs #21920
Conversation
@eladkal - Another short one in the DAG/Docs improvement project if you get time, since you reviewed the previous ones.
Force-pushed from 26e5bbf to 0ac66c6
Failing image build; rebased.
backup_db = DynamoDBToS3Operator(
    task_id='backup_db',
    dynamodb_table_name=TABLE_NAME,
    s3_bucket_name=BUCKET_NAME,
    # Max output file size in bytes. If the Table is too large, multiple files will be created.
    file_size=1000,
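For context, a minimal sketch of how this excerpted task might sit inside a complete sample DAG. The environment-variable names, DAG id, schedule, and start date below are illustrative assumptions, not taken from the PR; only the operator parameters shown in the excerpt are confirmed.

from datetime import datetime
from os import getenv

from airflow import DAG
from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

# Placeholder names; real values would come from the user's environment.
TABLE_NAME = getenv("DYNAMODB_TABLE_NAME", "example_table")
BUCKET_NAME = getenv("S3_BUCKET_NAME", "example-bucket")

with DAG(
    dag_id="example_dynamodb_to_s3",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
    tags=["example"],
) as dag:
    backup_db = DynamoDBToS3Operator(
        task_id="backup_db",
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Max output file size in bytes. If the table is too large,
        # multiple files will be created.
        file_size=1000,
    )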
Can you please extend the example/docs to also include the segment explanation?
airflow/airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py
Lines 65 to 87 in 602abe8
To parallelize the replication, users can create multiple tasks of DynamoDBToS3Operator.
For instance to replicate with parallelism of 2, create two tasks like:

.. code-block:: python

    op1 = DynamoDBToS3Operator(
        task_id="replicator-1",
        dynamodb_table_name="hello",
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 0,
        },
        ...,
    )
    op2 = DynamoDBToS3Operator(
        task_id="replicator-2",
        dynamodb_table_name="hello",
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 1,
        },
        ...,
(and remove the segment example from the comments)
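One possible shape for the extended example the review asks for, continuing the sketch above inside the same with DAG(...) block (same imports and placeholder TABLE_NAME / BUCKET_NAME assumed): two DynamoDBToS3Operator tasks that scan disjoint segments of the same table in parallel. The task ids are placeholders, not names from the PR.

    backup_db_segment_1 = DynamoDBToS3Operator(
        task_id="backup_db_segment_1",
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Scan only segment 0 of 2, so this task copies roughly half the table.
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 0,
        },
    )

    backup_db_segment_2 = DynamoDBToS3Operator(
        task_id="backup_db_segment_2",
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Scan only segment 1 of 2; together the two tasks cover the whole table.
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 1,
        },
    )

Because each task scans a different Segment, Airflow can run them concurrently, and their combined output replicates the full table.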
These changes should be in b0278a1
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Adds a sample DAG and docs for the existing Dynamo to S3 Transfer operator.
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.