Add incremental export and cross account export functionality in DynamoDBToS3Operator
#41304
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
vincbeck left a comment:
Very nice work!
Static checks are failing. Could you please run them?
I have run them and updated the import, so the static checks should pass. However, I noticed that the documentation build and spell checks failed in the GitHub Actions run for commit 0a195aa, but they are not related to my changes. Should I try to fix those now, or open a separate pull request for them?
I think it has been fixed by #41449. Could you please update your fork and rebase this branch? That should solve the issue.
Looks like it is still failing. 😞
I resolved the doc build issue. Now there is one failing unit test (Kubernetes provider); it is definitely not related to this PR.
But I am not sure what is going on either...
Yeah, it is strange. I will also try to look at it tomorrow when I have time.
It has been fixed in #41500, please rebase your PR :)
All checks have passed! 🎉
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
ferruzzi left a comment:
This change has caused the system test to start failing. I have found at least three issues; I have left fixes for two in the comments, and I'm not sure about the third yet. @Ghoul-SSZ, can you have a look and try to make the fixes?
```python
export_time,
backup_db_to_point_in_time,
backup_db_to_point_in_time_full_export,
```
Insert `latest_export_time` between these two lines:

```python
export_time,
latest_export_time,
backup_db_to_point_in_time_full_export,
```

Without it listed in the `chain()`, it fires off at the very beginning and causes the test to fail because `env_id` isn't populated yet.
fixed in #41517
```python
self.point_in_time_export = point_in_time_export
self.export_time = export_time
self.export_format = export_format
self.export_table_to_point_in_time_kwargs = export_table_to_point_in_time_kwargs
```
This needs to be

```python
self.export_table_to_point_in_time_kwargs = export_table_to_point_in_time_kwargs or {}
```

otherwise, if `None` is passed in, Line 199 fails because `None` can't be unpacked.
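The `or {}` fallback matters because the kwargs are later `**`-unpacked into the boto3 call. A minimal sketch of the pattern (the function names here are illustrative, not from the PR):

```python
def normalize_export_kwargs(export_table_to_point_in_time_kwargs=None):
    # Fall back to an empty dict so the result can always be **-unpacked,
    # even when the caller omitted the argument or passed None explicitly.
    return export_table_to_point_in_time_kwargs or {}


def fake_export_call(**kwargs):
    # Stand-in for the real boto3 export call; just echoes what it received.
    return kwargs
```

With the fallback, `fake_export_call(**normalize_export_kwargs(None))` succeeds and returns `{}`; without it, `**None` raises a `TypeError`.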
fixed in #41517
```python
"ExportFromTime": export_time,
"ExportToTime": latest_export_time,
```
This is an issue I have run out of time on today. With the other two fixes applied locally, the test keeps failing, and I've tried a number of solutions. Here's what I've found so far:

- error: "from" time can't be equal to "to" time
  - tried: set "from" to export_time - 1 minute
- error: "from" can't be less than the table creation time
  - tried: reverted the above and set "to" to latest_export_time + 1 minute
- error: difference between "from" time and "to" time is less than 15 minutes
  - tried: reverted the above and set "to" to latest_export_time + 16 minutes
- error: "to" time cannot be greater than the current time
  - tried: reverted the above and set "to" to utcnow()
  - problem: this gets set at parse time, so "to" is already in the past by execution time
We will either need to figure out a way to make this happy or set that task to not run using an unreachable trigger_rule or a branch operator to skip it.
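The constraints behind those error messages can be collected into a small pre-flight check. This is a sketch inferred from the errors observed above, not an official AWS validation routine; the function name and message strings are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Minimum incremental-export window, per the DynamoDB error message above.
MIN_EXPORT_WINDOW = timedelta(minutes=15)


def check_incremental_export_window(export_from, export_to, table_created_at, now=None):
    """Return a list of constraint violations for an incremental export window."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if export_from == export_to:
        problems.append('"from" time equals "to" time')
    if export_from < table_created_at:
        problems.append('"from" time is before table creation time')
    if export_to - export_from < MIN_EXPORT_WINDOW:
        problems.append('window is shorter than 15 minutes')
    if export_to > now:
        problems.append('"to" time is in the future')
    return problems
```

The last two constraints are why a freshly created test table cannot satisfy the window at parse time: any valid window must span at least 15 minutes after table creation yet still end in the past.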
The error "Difference between 'from' time and 'to' time is less than 15 minutes" is the scariest one. It means we would need to wait 15 minutes before we could create the incremental export. In that case we don't want to run it as part of the system test, so we either delete it from the system test or, somehow, skip it.
I agree. I just put up #41546, which skips that task. That way we can still have the snippet in the docs.
…amoDBToS3Operator` (apache#41304) --------- Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Problem
The DynamoDBToS3Operator currently supports exporting a table either via "scan" or via "full exports" using boto3's `export_table_to_point_in_time`. When using the `export_table_to_point_in_time` method, you can also request an "Incremental Export" instead of a "Full Export" if you only want the delta of data changes within a specified period. However, this functionality is currently not supported in the DynamoDBToS3Operator.

For the `scan` method, you can specify two AWS connections to facilitate cross-account transfers. But for `export_table_to_point_in_time`, we would need an additional argument (`s3_bucket_owner`) to allow cross-account export.

Proposed Changes
- Add `export_type`, `incremental_export_from_time`, `incremental_export_to_time`, and `incremental_export_view_type` arguments to allow Incremental Export.
- Add an `s3_bucket_owner` argument to allow cross-account export.

closes: #40737
related: #40737
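To make the new surface concrete, here is a sketch of the keyword arguments one might pass to `DynamoDBToS3Operator` with these changes. The table and bucket names, times, account ID, and the `NEW_AND_OLD_IMAGES` view type are illustrative assumptions, not values from the PR:

```python
from datetime import datetime, timezone

# Hypothetical argument set for an incremental, cross-account export.
# Parameter names follow the PR description; all values are examples only.
incremental_export_args = {
    "task_id": "incremental_export_to_s3",
    "dynamodb_table_name": "orders",
    "s3_bucket_name": "analytics-exports",
    "export_type": "INCREMENTAL_EXPORT",
    "incremental_export_from_time": datetime(2024, 8, 1, 0, 0, tzinfo=timezone.utc),
    "incremental_export_to_time": datetime(2024, 8, 1, 1, 0, tzinfo=timezone.utc),
    "incremental_export_view_type": "NEW_AND_OLD_IMAGES",
    "s3_bucket_owner": "123456789012",  # account ID that owns the destination bucket
}
```

In a DAG these would be passed as `DynamoDBToS3Operator(**incremental_export_args)`; note that the from/to window must respect the 15-minute minimum discussed in the review comments above.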