@ferruzzi ferruzzi commented Mar 1, 2022

Adds a sample DAG and docs for the existing Dynamo to S3 Transfer operator.



@ferruzzi ferruzzi requested a review from mik-laj as a code owner March 1, 2022 22:16
@ferruzzi ferruzzi closed this Mar 3, 2022
@ferruzzi ferruzzi reopened this Mar 3, 2022

ferruzzi commented Mar 3, 2022

@eladkal - Another short one in the DAG/Docs improvement project if you get time, since you reviewed the previous ones.

@ferruzzi ferruzzi force-pushed the ferruzzi/docs-update/dynamo-to-s3 branch from 26e5bbf to 0ac66c6 Compare March 3, 2022 18:54

ferruzzi commented Mar 3, 2022

Failing image build; rebased

Comment on lines +36 to +41
    backup_db = DynamoDBToS3Operator(
        task_id='backup_db',
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Max output file size in bytes. If the Table is too large, multiple files will be created.
        file_size=1000,

Can you please extend the example/docs to also include the segment explanation?

To parallelize the replication, users can create multiple tasks of DynamoDBToS3Operator.
For instance, to replicate with parallelism of 2, create two tasks like:

.. code-block:: python

    op1 = DynamoDBToS3Operator(
        task_id="replicator-1",
        dynamodb_table_name="hello",
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 0,
        },
        ...,
    )

    op2 = DynamoDBToS3Operator(
        task_id="replicator-2",
        dynamodb_table_name="hello",
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 1,
        },
        ...,
    )

(and remove the segment example from the comments)
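
For higher parallelism, the per-task scan arguments could be generated rather than written out by hand. The following is only a rough sketch and not part of this PR: `build_segment_scan_kwargs` is a hypothetical helper, and the only things it relies on are the `TotalSegments` and `Segment` scan parameters shown above, with segments numbered from zero.

```python
from typing import Any, Dict, List


def build_segment_scan_kwargs(total_segments: int) -> List[Dict[str, Any]]:
    """Return one scan-kwargs dict per segment of a parallel scan.

    Hypothetical helper for illustration; segment numbers are zero-based.
    """
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {"TotalSegments": total_segments, "Segment": segment}
        for segment in range(total_segments)
    ]


scan_kwargs_per_task = build_segment_scan_kwargs(2)
for index, scan_kwargs in enumerate(scan_kwargs_per_task, start=1):
    # In a DAG, each dict would be passed as dynamodb_scan_kwargs to its own
    # DynamoDBToS3Operator task, for example one with task_id=f"replicator-{index}".
    print(f"replicator-{index}: {scan_kwargs}")
```

Each generated dict matches one of the hand-written `dynamodb_scan_kwargs` values in the suggestion above, so every segment is covered once.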


These changes should be in b0278a1

@github-actions github-actions bot added the "okay to merge" label Mar 8, 2022

github-actions bot commented Mar 8, 2022

The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


Labels

area:providers, kind:documentation, okay to merge, provider:amazon
