-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Streaming option in SFTPToGCSOperator #48107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming option in SFTPToGCSOperator #48107
Conversation
56d3656 to
752dc79
Compare
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py
Outdated
Show resolved
Hide resolved
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py
Outdated
Show resolved
Hide resolved
providers/google/tests/unit/google/cloud/transfers/test_sftp_to_gcs.py
Outdated
Show resolved
Hide resolved
821b63c to
c8a756d
Compare
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py
Outdated
Show resolved
Hide resolved
c8a756d to
747c23f
Compare
|
Please check latest verison and run CI/CD if acceptable |
|
Nice. Let's see if the tests pass. |
|
Things to fix |
4cbf627 to
ea7f8bb
Compare
|
I struggle a bit with understanding pipeline results. I found 3 problems:
Could you help me to understand what else I'm supposed to fix? |
|
You have duplicated test - see the output of static tests. I think all else is coming from that |
ea7f8bb to
3d35ad4
Compare
|
I'm really sorry, but I don't understand the error from documentation build. |
ashb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These suggestions might fix the docs build
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py
Outdated
Show resolved
Hide resolved
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py
Outdated
Show resolved
Hide resolved
|
@ashb unfortunately ci/CD still failing |
Those are the errors. Matching those to source lines is tricky sadly, but using the command @potiuk gave you in Slack should help iterate on fixing it. This example might help :param opt: Table creation option. Take the following values:
* ``'exact'`` - create table from exact csv file.
* ``'foo'`` - create FooTable from foo8 file.
* ``'bar'`` - create BarTable from bar8 file.
* ``'baz'`` - create or update partial table from columns.
* ``'qaz'`` - add bar values to FooTable
:type opt: ``str`` |
molcay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kkulczak,
Thank you for the contribution 👍🏼
I put some nit-picking comments to follow the convention of the Google provider, please have a look and let me know :)
providers/google/src/airflow/providers/google/cloud/transfers/sftp_to_gcs.py
Show resolved
Hide resolved
Yeah. With #48223 I am aiming to simplify it and make iteration on docs way faster (there will be followup to #48223 with more detailed instructions and guidelines. And doc building should work consistently with localvenv, breeze command and codespaces then. |
504b3b7 to
a5387a9
Compare
|
@potiuk Ready to be merged from my point of view |
SFTPToGCSOperator always copy files into Airflow worker disk and later uploads them into GCS.
This change enables to use streaming from SFTP directly to GCS, reducing need for disk space to store large files.
As Google Cloud Consulting member I would need this feature for one of my customer projects.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.