Skip to content

Conversation

@josh-fell
Copy link
Contributor

@josh-fell josh-fell commented Jan 20, 2022

This PR adds a new provider to interface with dbt Cloud via the dbt Cloud API. The provider includes:

  • DbtCloudHook which implements an abstraction for almost all of the available endpoints in the dbt Cloud API and inherits from the existing HttpHook.
  • Two operators, DbtCloudRunJobOperator and DbtCloudGetJobRunArtifactOperator, which triggers a dbt Cloud job and downloads a run artifact, respectively.
  • A sensor, DbtCloudJobRunSensor, to poll status of a specific dbt Cloud job run.
  • An operator link to allow users to navigate directly to the triggered dbt Cloud job run for monitoring.
  • A test_connection() method in the DbtCloudHook for users to test connections prior to executing DAGs.

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@mik-laj
Copy link
Member

mik-laj commented Jan 22, 2022

@nathaniel-may @emmyoop @leahwicz Can you look at it?

@josh-fell josh-fell force-pushed the dbt-cloud-provider branch 2 times, most recently from 3b82ff8 to e00536e Compare January 31, 2022 17:23
@sungchun12
Copy link

@nathaniel-may @emmyoop @leahwicz Can you look at it?

@mik-laj These people are dbt-core contributors and do not actively work on the dbt Cloud API.

What did you want someone from dbt Labs to look into specifically for this pull request?

@mik-laj
Copy link
Member

mik-laj commented Feb 2, 2022

@sungchun12 DBT is used quite often with Apache Airflow, so I believe it is worth asking for reviews to give you an opportunity to share your thoughts on this contribution. The earlier the changes are introduced, the lower the cost of their implementation.

We also sometimes do not know all API features as they may not be widely promoted, but from your perspective they are important. For example, all requests to Google API contain client info, which allows them to track API usage by a specific solution

@property
def client_info(self) -> ClientInfo:
"""
Return client information used to generate a user-agent for API calls.
It allows for better errors tracking.
This object is only used by the google-cloud-* libraries that are built specifically for
the Google Cloud. It is not supported by The Google APIs Python Client that use Discovery
based APIs.
"""
client_info = ClientInfo(client_library_version='airflow_v' + version.version)
return client_info

Snowflake provider provides have a similar feature:
"application": os.environ.get("AIRFLOW_SNOWFLAKE_PARTNER", "AIRFLOW"),

On the other hand, it can also be a signal for you that there is a new feature in the third project and you can promote it, e.g. by updating the documentation https://docs.getdbt.com/docs/running-a-dbt-project/running-dbt-in-production#using-airflow

Does it make sense to you?

@sungchun12
Copy link

sungchun12 commented Feb 3, 2022

@mik-laj thanks a bunch for the follow up. I'm in conversations internally with the dbt Labs engineers to specify a User-Agent in the headers of the dbt Cloud API requests for tracking.

I'll submit a pull request for the dbt Labs docs after this is merged!

@josh-fell I'll follow up with you personally on the above.

@potiuk
Copy link
Member

potiuk commented Feb 6, 2022

Would you please rebase @josh-fell ?

@josh-fell
Copy link
Contributor Author

... specify a User-Agent in the headers of the dbt Cloud API requests for tracking.

@sungchun12 @mik-laj This is implemented now if you'd like to take a look. I also refactored the operator link slightly to not create ad hoc TaskInstances to align with #21285.

Copy link

@sungchun12 sungchun12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the change Josh. I verified it's tracking on the dbt Cloud side in our logs!

@josh-fell
Copy link
Contributor Author

@mik-laj Are there any other nuances/features I should try to incorporate in the provider? The last suggestion was a great one.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a great DBT provider start. Just in time for Feb release.

@potiuk potiuk merged commit e782b37 into apache:main Feb 27, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Feb 27, 2022
@sungchun12
Copy link

@josh-fell amazing work, thanks for driving this to the finish line!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants