Microsoft Power BI operator to refresh the dataset #40356
Hi @dabla,
Hello @ambika-garg, I've reviewed your PR and added comments on the code that could be improved. I didn't see many differences from the previous PR though, except that the code has moved to the Microsoft Azure provider.
Thanks a lot, @dabla, for reviewing the PR. I was struggling with how to leverage KiotaAdapterHook in my operator, and your comments have given me some insights. I will make the suggested changes. Thanks again!
424423c to 31681f8
Hey @dabla, I added the trigger class so that the operator works in deferrable mode, while extending the async calls to the MS Graph operator. The operator now supports both async and sync modes. Can you please review it?
961d02d to c6d17e4
bfe29ed to fd6b1c2
Add Power BI integration to the provider.yaml
* Extend PowerBIHook call to msgraph operator * Add the trigger class to enable deffering * Enable cache token
… into one hook, also take into account proxies. This is how I would do it, it isn't finished of course but that should put you in right direction. As there is a lot of polling involved, I would just like the MSGraphOperator, make it a pure async operator but that's my opinion.
fd6b1c2 to 483bbf6
…ary logging statements (don't just log info statements to log them, those can have performance/cost implications)
Hello @ambika-garg, I've made a small change and committed it into your PR. I also commented on the PowerBI, with one last change I would like to see to make it better. The code is starting to look good! I also saw you did something similar for Fabric, which we would also be interested in using. Once this PR is done we could create a new one and merge that one into the Azure provider package as well. Of course this will all have to be discussed with the Airflow maintainers; it would be best to start a discussion for this on the dev list.
Hi @davidblain-infrabel,
Hey @dabla, I am finally able to pass all the CI checks. Please review this PR once more; that would help us move forward with merging.
@dabla @ambika-garg -> any other remaining points? It looks good to me but maybe I missed something.
Hello @ambika-garg, I see that @potiuk approved the PR, so it's now a matter of time to have a good build and I suppose they will merge it.
Also looks good to me. @ambika-garg did a great job; I went over all the files yesterday and it looks good to me now. Depending on the order in which we merge this PR or mine with the api version, it might require a small modification, but I'm aware of it and will apply it.
* Add Power BI operator that refreshes the powerbi dataset
  Add Power BI integration to the provider.yaml
* Extend Power BI Operator to support async mode
* Extend PowerBIHook call to msgraph operator
* Add the trigger class to enable deffering
* Enable cache token
* refactor: Refactored PowerBIHook based on the KiotaRequestAdapterHook into one hook, also take into account proxies. This is how I would do it, it isn't finished of course but that should put you in right direction. As there is a lot of polling involved, I would just like the MSGraphOperator, make it a pure async operator but that's my opinion.
* Refactor: To support operator's async behavior
* Add unit tests for the power bi trigger and refactor the code
* unit tests for powerbi operator
* refactor: Did some small changes to PowerBIOperator, removed unnecessary logging statements (don't just log info statements to log them, those can have performance/cost implications)
* Fixed the unit test
* Added more tests for full code coverage
* Added system test for operator
* Fix system test
* Refactor: To use more of defferable mechanism, shifted all the async code in trigger
* Fix unit tests and remove unnecessary parameters
* refactor: Initialize hosts within constructor to make sure it's initialized correctly and immutable
* fix: Changed the 'powerbi_conn_id' parameter to 'conn_id' for the dataset refresh example in PowerBI
* Remove redundant system test for powerbi dataset refresh operator and rename the existing test more meaningfully
* remove extra comments
* Fix msgraph hook tests
* Fix powerbi trigger tests
* Refactor to pass the provider[microsoft.azure] tests
* refactor: Removed commented out (dead) code
* Refactor: Remove unused parameters and dead code
---------
Co-authored-by: David Blain <david.blain@infrabel.be>
Co-authored-by: David Blain <info@dabla.be>

Custom Operator to trigger the Power BI Dataset refresh.
Operators
PowerBIDatasetRefreshOperator
The operator triggers the Power BI dataset refresh and pushes the refresh details to XCom. It accepts the following parameters:
* dataset_id: The dataset Id.
* group_id: The workspace Id.
* wait_for_termination: (Default value: True) Wait until the pre-existing or currently triggered refresh completes before exiting.
* force_refresh: When enabled, it will force a refresh of the dataset again after a pre-existing ongoing refresh request is terminated.
* timeout: Time in seconds to wait for a dataset to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
* check_interval: Number of seconds to wait before rechecking the refresh status.
Hooks
PowerBI Hook
A hook to interact with Power BI.
* powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.
Custom Connection form
Connection type: Power BI
You need to store the following credentials:
* client_id: The Client ID of your service principal.
* client_secret: The Client Secret of your service principal.
* tenant_id: The Tenant Id of your service principal.
Features
XCom Integration: The Power BI Dataset refresh operator enriches the XCom with essential fields for downstream tasks:
* powerbi_dataset_refresh_id: Request Id of the Dataset Refresh.
* powerbi_dataset_refresh_status: Refresh Status.
  * In Progress: Refresh state is unknown or a refresh is in progress.
  * Completed: Refresh successfully completed.
  * Failed: Refresh failed (details in powerbi_dataset_refresh_error).
  * Disabled: Refresh is disabled by a selective refresh.
* powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress)
* powerbi_dataset_refresh_error: Failure error code in JSON format (None if no error)
External Monitoring link: The operator conveniently provides a redirect link to the Power BI UI for monitoring refreshes.
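The waiting behaviour described for timeout and check_interval, together with the XCom fields listed above, could be sketched roughly as follows. This is not the operator's actual code: wait_for_refresh, to_xcom_fields, the get_details callable and the keys of the details dict (request_id, status, end_time, error) are hypothetical names chosen for illustration, and the sketch uses only the standard library instead of the Airflow or Power BI APIs.

```python
import time
from typing import Callable, Dict, Optional

# Statuses from the description; "In Progress" is the only non-terminal one.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}

Details = Dict[str, Optional[str]]


def to_xcom_fields(details: Details) -> Details:
    """Map a (hypothetical) refresh-details dict onto the XCom keys listed above."""
    return {
        "powerbi_dataset_refresh_id": details.get("request_id"),
        "powerbi_dataset_refresh_status": details.get("status"),
        "powerbi_dataset_refresh_end_time": details.get("end_time"),
        "powerbi_dataset_refresh_error": details.get("error"),
    }


def wait_for_refresh(
    get_details: Callable[[], Details],
    timeout: float = 60.0,
    check_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Details:
    """Poll get_details() until a terminal status is seen or timeout seconds pass."""
    start = time.monotonic()
    while True:
        details = get_details()
        if details.get("status") in TERMINAL_STATUSES:
            return to_xcom_fields(details)
        if time.monotonic() - start >= timeout:
            raise TimeoutError(
                f"Refresh still {details.get('status')!r} after {timeout} seconds"
            )
        # Wait before rechecking, as check_interval describes.
        sleep(check_interval)


if __name__ == "__main__":
    # Fake status source standing in for the Power BI API (assumption).
    responses = iter(
        [
            {"request_id": "abc", "status": "In Progress"},
            {"request_id": "abc", "status": "Completed", "end_time": "2024-01-01T00:00:00Z"},
        ]
    )
    fields = wait_for_refresh(lambda: next(responses), timeout=10, check_interval=0)
    print(fields)
```

Passing the status source in as a callable keeps the loop independent of how the refresh details are fetched, so the same sketch applies whether the polling runs in a worker or in a trigger.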
Sample DAG to use the plugin.
Check out the sample DAG code below: