@kkulczak
Contributor

@kkulczak kkulczak commented Mar 23, 2025

SFTPToGCSOperator always copies files to the Airflow worker's disk and then uploads them to GCS.
This change makes it possible to stream files from SFTP directly to GCS, reducing the disk space needed to handle large files.

As a Google Cloud Consulting member, I need this feature for one of my customer projects.
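
To illustrate the idea, here is a minimal sketch of chunked stream-to-stream copying. It is not the code from this PR: `stream_copy` is a hypothetical helper, and the `io.BytesIO` objects only stand in for an SFTP file handle and a GCS upload stream. The point is that only one chunk is held in memory at a time and nothing is staged on local disk.

```python
import io


def stream_copy(source, destination, chunk_size=1024 * 1024):
    """Copy a readable binary stream into a writable one, one chunk at a time.

    Only ``chunk_size`` bytes are held in memory at once, so nothing is
    staged on local disk. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Stand-ins for the remote handles (assumption: real code would use an
# SFTP file object and a GCS blob writer instead of BytesIO).
sftp_file = io.BytesIO(b"example payload " * 1000)
gcs_writer = io.BytesIO()
copied = stream_copy(sftp_file, gcs_writer, chunk_size=4096)
```

With the disk-based approach, the worker needs free space at least equal to the file size. With a loop like this, memory use is bounded by `chunk_size` regardless of file size.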


@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Mar 23, 2025
@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch 3 times, most recently from 56d3656 to 752dc79 Compare March 23, 2025 06:24
@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch 2 times, most recently from 821b63c to c8a756d Compare March 27, 2025 15:12
@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch from c8a756d to 747c23f Compare March 28, 2025 11:32
@kkulczak
Contributor Author

Please check the latest version and run CI/CD if acceptable.

@potiuk
Member

potiuk commented Mar 31, 2025

Nice. Let's see if the tests pass.

@potiuk
Member

potiuk commented Mar 31, 2025

Things to fix

@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch 2 times, most recently from 4cbf627 to ea7f8bb Compare March 31, 2025 08:50
@kkulczak
Contributor Author

I'm struggling a bit to understand the pipeline results.

I found 3 problems:

  • trailing-space (pre-commit): fixed in the latest push.
  • docs-spelling: I don't see an error message in the pipeline logs that indicates what the problem actually is. When I tried to run those scripts in a GitHub Codespace, I encountered import errors.
  • Unit tests: I don't see any errors related to providers-google. I see problems with dependency installation, but I did not make changes to those.

Could you help me to understand what else I'm supposed to fix?

@potiuk
Member

potiuk commented Mar 31, 2025

You have a duplicated test - see the output of the static tests. I think everything else comes from that.

@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch from ea7f8bb to 3d35ad4 Compare March 31, 2025 11:48
@kkulczak
Contributor Author

I'm really sorry, but I don't understand the error from the documentation build,
especially given that I did not modify it.

Member

@ashb ashb left a comment

These suggestions might fix the docs build

@kkulczak
Contributor Author

kkulczak commented Apr 1, 2025

@ashb unfortunately CI/CD is still failing

@ashb
Member

ashb commented Apr 1, 2025


============================== apache-airflow-providers-google ==============================
------------------------------ Error   1 --------------------
 ERROR: Unexpected indentation.

File path: apache-airflow-providers-google/_api/airflow/providers/google/cloud/transfers/sftp_to_gcs/index.rst (76)
------------------------------ Error   2 --------------------
 WARNING: Block quote ends without a blank line; unexpected unindent.

File path: apache-airflow-providers-google/_api/airflow/providers/google/cloud/transfers/sftp_to_gcs/index.rst (77)

Those are the errors. Matching those to source lines is tricky sadly, but using the command @potiuk gave you in Slack should help iterate on fixing it.

This example might help https://github.com/sphinx-doc/sphinx/issues/2768#issuecomment-233096479 (code to avoid needlessly linking issues)

   :param opt: Table creation option. Take the following values:

      * ``'exact'`` - create table from exact csv file.
      * ``'foo'`` - create FooTable from foo8 file.
      * ``'bar'`` - create BarTable from bar8 file.
      * ``'baz'`` - create or update partial table from columns.
      * ``'qaz'`` - add bar values to FooTable

   :type opt: ``str``
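
The same layout carries over to a Python docstring. The sketch below uses a hypothetical function and parameter (`transfer`, `mode`), not the operator's real signature. The assumption is that the two Sphinx messages above come from a nested list that is not separated by blank lines from the surrounding `:param:` text, so the example puts a blank line before and after the bullet list and indents the bullets consistently.

```python
import inspect


def transfer(mode: str = "disk") -> str:
    """
    Hypothetical transfer function used only to illustrate docstring layout.

    :param mode: How the file is moved. Takes one of the following values:

        * ``'disk'`` - download to local disk first, then upload.
        * ``'stream'`` - stream directly without touching local disk.

    :return: The mode that was used.
    """
    return mode


# Normalize indentation the way documentation tools typically do before
# parsing, so the blank lines around the list are easy to inspect.
doc_lines = inspect.cleandoc(transfer.__doc__).splitlines()
```

Printing `doc_lines` makes it easy to check that the bullet list is surrounded by empty lines before running the full docs build.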

Contributor

@molcay molcay left a comment

Hi @kkulczak,
Thank you for the contribution 👍🏼

I left some nit-picking comments so the change follows the conventions of the Google provider. Please have a look and let me know :)

@potiuk
Member

potiuk commented Apr 1, 2025

> Those are the errors. Matching those to source lines is tricky sadly, but using the command @potiuk gave you in Slack should help iterate on fixing it.

Yeah. With #48223 I am aiming to simplify it and make iteration on docs way faster (there will be a follow-up to #48223 with more detailed instructions and guidelines). Doc building should then work consistently with a local venv, the breeze command, and Codespaces.

@kkulczak kkulczak force-pushed the streaming-option-in-gcs-to-gcs-operator branch from 504b3b7 to a5387a9 Compare April 1, 2025 16:45
@kkulczak
Contributor Author

kkulczak commented Apr 1, 2025

@potiuk Ready to be merged from my point of view

@potiuk potiuk merged commit b7603b5 into apache:main Apr 1, 2025
61 checks passed
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
diogotrodrigues pushed a commit to diogotrodrigues/airflow that referenced this pull request Apr 6, 2025
simonprydden pushed a commit to simonprydden/airflow that referenced this pull request Apr 8, 2025