Conversation

@utkarsharma2
Contributor

@utkarsharma2 utkarsharma2 commented May 11, 2023

The system test for DynamoDBToS3Operator was missing s3_key_prefix, and the export_time date format was not correct.

@utkarsharma2 utkarsharma2 changed the title Add missing s3_key_prefix Add missing DynamoDBToS3Operator system test May 11, 2023
```diff
 file_size=1000,
 s3_bucket_name=bucket_name,
-export_time=datetime(year=2023, month=4, day=10),
+export_time=datetime.now(),
```
Member


Is this change significant?

Contributor Author


Yes, because in system tests we just want to make sure the DynamoDBToS3Operator can successfully call export_table_to_point_in_time with the minimum time window, so the exported data stays small. But I'm open to any suggestions.

Contributor


As long as we don't get weird API behaviour with a relatively instantaneous export_time, this seems fine to me. I'm assuming you've tested this?

Contributor Author


Yes, I have tested it locally, and I have been developing using datetime.now() only; no issues so far.

```diff
 s3_bucket_name=bucket_name,
-export_time=datetime(year=2023, month=4, day=10),
+export_time=datetime.now(),
+s3_key_prefix=f"{S3_KEY_PREFIX}-3-",
```
Contributor


Why are we hard-coding the -3- part?

Contributor Author

@utkarsharma2 utkarsharma2 May 11, 2023


No reason, I was just following other examples in the file.

I think it's good to have it since it isolates this test case from the other test cases in the file. Otherwise, there can be side effects if two test cases use the same prefix. WDYT?
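A hypothetical sketch of that isolation argument (the base prefix value and the numbered-suffix scheme are assumptions for illustration; the test file uses suffixes like `-3-`):

```python
# Hypothetical sketch: each test case appends a distinct numbered
# suffix to a shared base prefix, so their S3 exports never land
# under the same key space. S3_KEY_PREFIX value is assumed here.
S3_KEY_PREFIX = "dynamodb-to-s3"

# One unique prefix per test case, mirroring the `-3-` style above.
prefixes = [f"{S3_KEY_PREFIX}-{i}-" for i in range(1, 4)]

# No two test cases share a prefix, so listing objects under one
# prefix can never pick up another test case's exported files.
assert len(set(prefixes)) == len(prefixes)
print(prefixes[-1])  # dynamodb-to-s3-3-
```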

Contributor


I can get behind that reasoning. I didn't see the other two in the GitHub "changes" diff, so I didn't catch the pattern.

@o-nikolas
Contributor

I re-triggered the tests since the wait for CI images timed out.

Contributor

@ferruzzi ferruzzi left a comment


Third time's the charm? :P

@o-nikolas o-nikolas merged commit 37d8d62 into apache:main May 12, 2023
@ferruzzi
Contributor

Negative.



```
ERROR    airflow.executors.debug_executor.DebugExecutor:debug_executor.py:95 Failed to execute task: An error occurred (InvalidExportTimeException) when calling the ExportTableToPointInTime operation: Export Time is invalid.
Traceback (most recent call last):
  File "/opt/airflow/airflow/executors/debug_executor.py", line 89, in _run_task
    ti.run(job_id=ti.job_id, **params)
  File "/opt/airflow/airflow/utils/session.py", line 76, in wrapper
    return func(*args, session=session, **kwargs)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1798, in run
    mark_success=mark_success, test_mode=test_mode, job_id=job_id, pool=pool, session=session
  File "/opt/airflow/airflow/utils/session.py", line 73, in wrapper
    return func(*args, **kwargs)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1485, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1636, in _execute_task_with_callbacks
    result = self._execute_task(context, task_orig)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1704, in _execute_task
    result = execute_callable(context=context)
  File "/opt/airflow/airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py", line 142, in execute
    self._export_table_to_point_in_time()
  File "/opt/airflow/airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py", line 158, in _export_table_to_point_in_time
    ExportFormat=self.export_format,
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidExportTimeException: An error occurred (InvalidExportTimeException) when calling the ExportTableToPointInTime operation: Export Time is invalid
```

@utkarsharma2
Contributor Author

utkarsharma2 commented May 13, 2023

@ferruzzi I'm not able to reproduce this locally; the code below works for me. I'm using Breeze to run my test, which uses the Docker image ghcr.io/apache/airflow/main/ci/python3.7:latest.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

with DAG(
    dag_id='example_export_dynamodb',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    tags=['example'],
    catchup=False,
) as dag:
    dynamodb_to_s3_operator = DynamoDBToS3Operator(
        task_id="dynamodb_to_s3",
        dynamodb_table_name="test",
        s3_bucket_name="tmp9",
        file_size=4000,
        export_time=datetime.now(),
        s3_key_prefix="test"
    )
```

Screenshot 2023-05-13 at 8 46 53 PM

Can you please share the details of the environment in which you are running the tests? I'll try to reproduce this locally.

@vincbeck
Contributor

vincbeck commented May 15, 2023

Our environment uses Breeze to run system tests. We build the latest image with `breeze ci-image build` and then run the test with `breeze testing tests <filename> --system amazon`.

But given the exception, I don't think it is a question of Breeze; the DynamoDB service is complaining that export_time is invalid. My guess is that datetime.now() returns a date in the future in some timezones and a date in the present/past in others. I might recommend using datetime.utcnow() instead, with some buffer (going back one week, maybe?).
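A minimal sketch of that suggestion, assuming a timezone-aware UTC timestamp with an arbitrary buffer (the helper name and the 5-minute buffer are illustrative, not from the PR):

```python
from datetime import datetime, timedelta, timezone

def safe_export_time(buffer: timedelta = timedelta(minutes=5)) -> datetime:
    """Return a UTC timestamp guaranteed to be in the past.

    A timezone-aware "now" in UTC avoids the local-clock ambiguity of
    a naive datetime.now(); subtracting a buffer keeps the value safely
    behind any clock skew between the client and the DynamoDB service.
    """
    return datetime.now(timezone.utc) - buffer

export_time = safe_export_time()
assert export_time.tzinfo is not None              # timezone aware
assert export_time < datetime.now(timezone.utc)    # never in the future
```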

@potiuk
Member

potiuk commented May 15, 2023

> Our environment uses Breeze to run system tests. We build the latest image with `breeze ci-image build` and then run the test with `breeze testing tests <filename> --system amazon`.
>
> But given the exception, I don't think it is a question of Breeze; the DynamoDB service is complaining that export_time is invalid. My guess is that datetime.now() returns a date in the future in some timezones and a date in the present/past in others. I might recommend using datetime.utcnow() instead, with some buffer (going back one week, maybe?).

Oh yeah. I've seen it happen in the past.

@utkarsharma2 utkarsharma2 changed the title Add missing DynamoDBToS3Operator system test Add missing fields in DynamoDBToS3Operator system test May 16, 2023
@utkarsharma2
Contributor Author

utkarsharma2 commented May 16, 2023

@vincbeck I think datetime.now() returns a date that is not timezone aware, so for me in India (IST) it always gives a time ahead of UTC. If the export API assumes UTC, it should have failed for me every time (which it didn't), right? Or am I missing something?
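That naive-vs-aware behaviour can be checked with a quick stdlib snippet (the IST offset is hard-coded here for illustration):

```python
from datetime import datetime, timedelta, timezone

# datetime.now() returns a naive datetime (no tzinfo) in local time.
assert datetime.now().tzinfo is None

# If a server interprets that naive value as UTC, a clock running in
# IST (UTC+5:30) appears roughly 5.5 hours in the future:
ist = timezone(timedelta(hours=5, minutes=30))
local_ist = datetime.now(ist).replace(tzinfo=None)   # naive IST wall clock
utc_now = datetime.now(timezone.utc).replace(tzinfo=None)
skew_hours = (local_ist - utc_now).total_seconds() / 3600
print(round(skew_hours, 1))  # ~5.5 for an IST clock
```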

@utkarsharma2
Contributor Author

@vincbeck @potiuk Added a PR: #31307. Please review.
