Closed
Labels
area:providers, good first issue, kind:bug (This is clearly a bug), kind:feature (Feature Requests), provider:amazon (AWS/Amazon - related issues)
Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"
apache-airflow-providers-slack==8.1.0
apache-airflow-providers-amazon==8.7.1
apache-airflow-providers-jdbc==4.0.2
apache-airflow-providers-datadog==3.3.2
tableauserverclient==0.25
apache-airflow-providers-mysql==5.3.1
apache-airflow-providers-neo4j==3.3.3
aiobotocore==2.6.0
Apache Airflow version
2.7.2
Operating System
MacOS 14.2.1
Deployment
Amazon (AWS) MWAA
Deployment details
No response
What happened
When using S3FileTransformOperator with an S3 Select expression (select_expression), it can only read and write CSV files.
(Not sure whether this is a bug or a feature request; please move if needed.)
What you think should happen instead
The boto3 client accepts more options, such as GZIP and BZIP2 compression and additional formats such as Parquet and JSON, so the operator should also accept the following parameters (they already exist in the S3 hook's select_key method):
input_serialization
output_serialization
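For illustration, here is a minimal sketch of the kind of values these two parameters would carry. The dictionary shapes follow the InputSerialization/OutputSerialization structures of boto3's select_object_content; the helper name guess_input_serialization and its suffix-based rules are hypothetical and not part of the provider, and the JSON "LINES" type is an assumption about the file layout.

```python
from __future__ import annotations


def guess_input_serialization(key: str) -> dict:
    """Hypothetical helper: choose an S3 Select InputSerialization from a key suffix."""
    lowered = key.lower()
    if lowered.endswith(".parquet"):
        # Parquet uses columnar compression internally, so no CompressionType is set.
        return {"Parquet": {}}

    compression = "NONE"
    if lowered.endswith(".gz"):
        compression, lowered = "GZIP", lowered[: -len(".gz")]
    elif lowered.endswith(".bz2"):
        compression, lowered = "BZIP2", lowered[: -len(".bz2")]

    if lowered.endswith(".json"):
        # Assumption: one JSON object per line.
        return {"JSON": {"Type": "LINES"}, "CompressionType": compression}
    # Fall back to CSV, which is what the hook already uses by default.
    return {"CSV": {}, "CompressionType": compression}


# The output side is a separate dictionary, e.g. plain CSV:
CSV_OUTPUT = {"CSV": {}}
```

With the parameters exposed on the operator, a dictionary like {"Parquet": {}} could be passed as input_serialization for the failing example in this report.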
How to reproduce
This is not working:
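from airflow.providers.amazon.aws.operators.s3 import S3FileTransformOperator

# Parquet source: fails, because the select is run with CSV input serialization
transform_parquet = S3FileTransformOperator(
    task_id='transform_parquet',
    source_s3_key='s3://<bucket>/<prefix>/file.snappy.parquet',
    dest_s3_key='s3://<bucket>/<prefix>/file.csv',
    select_expression="SELECT * FROM s3object s LIMIT 5",
    replace=True,
)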
This is working:
transform_csv = S3FileTransformOperator(
    task_id='transform_csv',
    source_s3_key='s3://<bucket>/<prefix>/file.csv',
    dest_s3_key='s3://<bucket>/<other_prefix>/file.csv',
    select_expression="SELECT * FROM s3object s LIMIT 5",
    replace=True,
)
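As a possible interim workaround (a sketch only, not the operator's implementation), the hook can be called directly from a Python task. The function name select_and_copy is hypothetical; it assumes the hook argument behaves like S3Hook, whose select_key method accepts input_serialization and output_serialization and whose load_string method writes a string to a key.

```python
def select_and_copy(
    hook,
    source_key,
    dest_key,
    expression,
    input_serialization,
    output_serialization=None,
    replace=True,
):
    """Run S3 Select on source_key and write the result to dest_key.

    `hook` is expected to expose select_key() and load_string() like
    airflow.providers.amazon.aws.hooks.s3.S3Hook (assumption of this sketch).
    """
    content = hook.select_key(
        key=source_key,
        expression=expression,
        input_serialization=input_serialization,
        # Default to CSV output, matching the hook's own default.
        output_serialization=output_serialization or {"CSV": {}},
    )
    hook.load_string(string_data=content, key=dest_key, replace=replace)
    # Return the number of characters written, handy for logging.
    return len(content)


# Inside a task, this might be called roughly as:
#   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
#   select_and_copy(
#       S3Hook(),
#       "s3://<bucket>/<prefix>/file.snappy.parquet",
#       "s3://<bucket>/<prefix>/file.csv",
#       "SELECT * FROM s3object s LIMIT 5",
#       input_serialization={"Parquet": {}},
#   )
```

Passing the hook in as an argument keeps the function easy to exercise with a stand-in object, without AWS credentials.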
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct