Add operator to invoke Azure-Synapse pipeline #35091
ecfcd8d to
928fdef
Compare
CONTRIBUTING.rst
Outdated
It seems you are using Windows. As far as I can see, LF line endings (the common POSIX line-ending character) have been changed to CRLF (Windows).
You need to configure your git client: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings?platform=windows
In addition, you might need to adjust your editor settings. Airflow uses an .editorconfig configuration, so if your editor supports it (e.g. PyCharm supports it out of the box), you can set it up by following the instructions: https://editorconfig.org/
You might also need to run Static Checks against your changes; see the instructions.
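To check whether a file was affected before committing, a small script along these lines may help. This is only a minimal sketch and not part of Airflow's tooling; the function names `count_line_endings` and `to_lf` are made up for illustration.

```python
from __future__ import annotations


def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF and bare LF line endings in raw file content."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get bare LFs only.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def to_lf(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to POSIX (LF)."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Illustrative content mixing both styles.
    sample = b"first line\r\nsecond line\r\nthird line\n"
    print(count_line_endings(sample))
    print(count_line_endings(to_lf(sample)))
```

Reading the file in binary mode (`open(path, "rb")`) matters here, because text mode may translate line endings before the script sees them.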
3243f2b to
d9b1d96
Compare
84fe0b8 to
d50e553
Compare
tests/providers/microsoft/azure/hooks/test_azure_synapse_pipeline.py
Outdated
tests/providers/microsoft/azure/operators/test_azure_synapse.py
Outdated
10a03a1 to
30f6a3d
Compare
What is the motivation for a new file?
Can't the hook be in the existing synapse.py?
My motivation is to keep the code modular and easy to maintain. The synapse.py file deals with SparkClient and its functions, whereas the synapse_pipeline.py file deals with ArtifactsClient and its functions. So the two classes have different dependencies.
I had 2 options:
- Include every function of AzureSynapsePipelineHook in the AzureSynapseHook class. But that would add lots of conditional statements, make dependency management harder, and require separate naming conventions for functions with the same functionality but different parameters and return values, like the get_conn() function, leading to confusion when the code is extended.
- Make separate classes for each type of service, where each class has its own set of dependencies and responsibilities. This approach also provides a consistent naming convention for functions within each class, improving code readability.
I went with the 2nd approach. That's why I made a new file. Also, I think making a new hook class with separate dependencies makes the code easier to extend.
Please let me know your opinion on this, since you have a diverse background in development.
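The second option can be sketched roughly as follows. This is an illustrative outline only, not the code from this PR: `FakeSparkClient`, `FakeArtifactsClient`, `SparkHookSketch`, and `PipelineHookSketch` are hypothetical stand-ins so the example runs without the Azure SDK.

```python
from __future__ import annotations


class FakeSparkClient:
    """Hypothetical stand-in for the Spark client dependency."""


class FakeArtifactsClient:
    """Hypothetical stand-in for the artifacts (pipeline) client dependency."""

    def run_pipeline(self, name: str) -> str:
        return f"run-of-{name}"


class SparkHookSketch:
    """Owns only the Spark client; get_conn() has one return type."""

    def __init__(self) -> None:
        self._conn: FakeSparkClient | None = None

    def get_conn(self) -> FakeSparkClient:
        # Create the client lazily and reuse it afterwards.
        if self._conn is None:
            self._conn = FakeSparkClient()
        return self._conn


class PipelineHookSketch:
    """Owns only the artifacts client; same method name, no conditionals."""

    def __init__(self) -> None:
        self._conn: FakeArtifactsClient | None = None

    def get_conn(self) -> FakeArtifactsClient:
        if self._conn is None:
            self._conn = FakeArtifactsClient()
        return self._conn

    def run_pipeline(self, name: str) -> str:
        return self.get_conn().run_pipeline(name)


if __name__ == "__main__":
    print(PipelineHookSketch().run_pipeline("demo"))
```

Each class keeps a single `get_conn()` with one return type, so no branching on the service type is needed. The separation here is per class, so it does not by itself require one file per class.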
In Airflow we use a mixed governance model for providers, which means that the company behind a provider shares responsibility for it. Sadly, in the case of Microsoft, the Azure team is not involved (unlike Google and AWS), so we have no way of knowing Azure's standpoint on this subject.
In that case my point of view is to "fall back" to the common practice of how we do things in other providers. You can see, for example, the effort we made in Amazon/AWS #20139 to organize the files, where the motivation is ease of finding things.
So in the case in front of us, and given the information you shared, I still think we should use synapse.py.
I welcome the Azure team to be more involved in the project should they decide to.
I went through #20139 and understood the general protocols followed in Airflow. So I'll be moving the AzureSynapsePipelineHook class into synapse.py. I'll also discuss internally whether the Azure team can get more involved in the Airflow project.
0df0a8c to
a9493f4
Compare
10a03a1 to
c644d8d
Compare
* Add a hook to interact with Azure Synapse Analytics
* Add an operator to trigger Synapse pipeline from DAG and operator link
* Add unit tests for operator and hook
* Update provider.yaml to support new operator, operator link and hook
* Update provider_dependencies to install azure-synapse-artifacts
* Add Mock Synapse Workspace URL
* Set Default wait_for_termination to False
* Fix all imports for the class
* Remove the file from provider.yaml
* Delete the synapse_pipeline.py file
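For readers unfamiliar with the `wait_for_termination` flag mentioned in the list above, the following sketch shows the general trigger-then-optionally-poll pattern such a flag usually controls. It is an assumption-laden illustration, not the operator's actual code: `FakePipelineHook`, `trigger_pipeline`, the status strings, and the `check_interval` and `timeout` parameters are all hypothetical.

```python
from __future__ import annotations

import time

# Assumed terminal status names, for illustration only.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


class FakePipelineHook:
    """Hypothetical hook: reports 'InProgress' a few times, then 'Succeeded'."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done

    def run_pipeline(self, pipeline_name: str) -> str:
        return f"{pipeline_name}-run-1"

    def get_pipeline_run_status(self, run_id: str) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "InProgress"
        return "Succeeded"


def trigger_pipeline(
    hook: FakePipelineHook,
    pipeline_name: str,
    wait_for_termination: bool = False,
    check_interval: float = 0.0,
    timeout: float = 5.0,
) -> tuple[str, str | None]:
    """Start a run; optionally poll until it reaches a terminal state."""
    run_id = hook.run_pipeline(pipeline_name)
    if not wait_for_termination:
        # Fire and forget: return immediately with no final status.
        return run_id, None
    deadline = time.monotonic() + timeout
    while True:
        status = hook.get_pipeline_run_status(run_id)
        if status in TERMINAL_STATES:
            return run_id, status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} did not finish within {timeout}s")
        time.sleep(check_interval)


if __name__ == "__main__":
    print(trigger_pipeline(FakePipelineHook(), "demo"))
    print(trigger_pipeline(FakePipelineHook(), "demo", wait_for_termination=True))
```

With the flag off, the caller gets the run identifier back right away and the task does not block on the pipeline; with it on, the task occupies a worker until the run finishes or the timeout expires.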
a599393 to
24d3626
Compare
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

This PR adds a custom operator, AzureSynapsePipelineOperator, to the Microsoft provider. This operator simplifies the execution of Azure Synapse pipelines, allowing Airflow users to trigger Azure Synapse pipelines directly from their DAGs.
Usage example: