

@ambika-garg (Contributor) commented Jun 20, 2024

Custom operator that triggers a Power BI dataset refresh.

Operators

PowerBIDatasetRefreshOperator

The operator triggers a Power BI dataset refresh and pushes the refresh details to XCom. It accepts the following parameters:

  • dataset_id: The dataset ID.
  • group_id: The workspace ID.
  • wait_for_termination: (Default: True) Wait until the pre-existing or newly triggered refresh completes before exiting.
  • force_refresh: When enabled, force a new refresh of the dataset after any pre-existing, ongoing refresh request has terminated.
  • timeout: Time in seconds to wait for the dataset refresh to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
  • check_interval: Number of seconds to wait before rechecking the refresh status.
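When wait_for_termination is True, timeout and check_interval together describe a simple polling loop. The following is a minimal synchronous sketch of that behavior, assuming a hypothetical get_status callable that returns the status strings listed under Features. It is not the operator's actual implementation (the commits in this PR move the waiting into an async trigger); sleep and clock are injected only so the loop can be tested without real waiting.

```python
import time
from typing import Callable

# Terminal statuses as listed under "Features"; "In Progress" keeps the loop polling.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


def wait_for_refresh(
    get_status: Callable[[], str],  # hypothetical: returns the current refresh status
    timeout: float,
    check_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() every check_interval seconds until the refresh reaches
    a terminal status; raise TimeoutError once timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"refresh still {status!r} after {timeout} seconds")
        sleep(check_interval)
```

With wait_for_termination set to False, an operator following this pattern would simply skip the loop and return right after triggering the refresh.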

Hooks

PowerBI Hook

A hook to interact with Power BI.

  • powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.
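Under the hood, a hook like this talks to the Power BI REST API. As a rough illustration, assuming the public "Refresh Dataset In Group" (POST) and "Get Refresh Execution Details In Group" (GET) endpoints, these are the URLs involved for a given group_id (workspace) and dataset_id. The helper names are hypothetical; per this PR's commits, the real hook is built on KiotaRequestAdapterHook rather than hand-built URLs.

```python
from urllib.parse import quote

POWERBI_API_BASE = "https://api.powerbi.com/v1.0/myorg"


def refreshes_url(group_id: str, dataset_id: str) -> str:
    """Refreshes collection of a dataset: POST here to trigger a refresh."""
    return (
        f"{POWERBI_API_BASE}/groups/{quote(group_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/refreshes"
    )


def refresh_details_url(group_id: str, dataset_id: str, refresh_id: str) -> str:
    """GET here to read the status of one refresh (the powerbi_dataset_refresh_id)."""
    return f"{refreshes_url(group_id, dataset_id)}/{quote(refresh_id, safe='')}"
```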

Custom Connection form

Connection type: Power BI

You need to store the following credentials:

  • client_id: The Client ID of your service principal.
  • client_secret: The Client Secret of your service principal.
  • tenant_id: The Tenant ID of your service principal.
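These three fields are what a service principal needs for the OAuth 2.0 client credentials grant against Microsoft Entra ID, requesting the Power BI API scope. The sketch below only builds the token request (it sends nothing) to show how the fields map onto it; build_token_request is a hypothetical helper and not necessarily how the hook obtains its token. In practice a library such as azure-identity's ClientSecretCredential performs this exchange for you.

```python
from typing import Tuple
from urllib.parse import urlencode

# ".default" requests the application permissions granted to the service principal.
POWERBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Tuple[str, str]:
    """Return (url, form-encoded body) for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWERBI_SCOPE,
    })
    return url, body
```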

Features

  • XCom integration: The Power BI dataset refresh operator pushes the following fields to XCom for use by downstream tasks:

  1. powerbi_dataset_refresh_id: Request ID of the dataset refresh.
  2. powerbi_dataset_refresh_status: Refresh status, one of:
    • In Progress: Refresh state is unknown or a refresh is in progress.
    • Completed: Refresh successfully completed.
    • Failed: Refresh failed (details in powerbi_dataset_refresh_error).
    • Disabled: Refresh is disabled by a selective refresh.
  3. powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress).
  4. powerbi_dataset_refresh_error: Failure error code in JSON format (None if there is no error).
  • External monitoring link: The operator provides a link that redirects to the Power BI UI for monitoring refreshes.
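Downstream tasks can read these fields with ti.xcom_pull(task_ids=..., key=...). A minimal sketch, assuming the key names listed above and the task_id from the sample DAG; check_refresh_outcome is a hypothetical helper intended as a PythonOperator python_callable, which in Airflow 2 receives ti from the task context because it appears in the function signature. Treating only "Failed" as an error is also an assumption.

```python
# task_id of the refresh task in the sample DAG
REFRESH_TASK_ID = "refresh_in_given_workspace"


def check_refresh_outcome(ti) -> str:
    """Fail the task if the refresh failed; otherwise return a short summary."""
    status = ti.xcom_pull(task_ids=REFRESH_TASK_ID, key="powerbi_dataset_refresh_status")
    if status == "Failed":
        error = ti.xcom_pull(task_ids=REFRESH_TASK_ID, key="powerbi_dataset_refresh_error")
        raise RuntimeError(f"Power BI refresh failed: {error}")
    refresh_id = ti.xcom_pull(task_ids=REFRESH_TASK_ID, key="powerbi_dataset_refresh_id")
    end_time = ti.xcom_pull(task_ids=REFRESH_TASK_ID, key="powerbi_dataset_refresh_end_time")
    # end_time may still be None while the refresh is "In Progress".
    return f"Refresh {refresh_id} is {status} (end time: {end_time})"
```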

Sample DAG using the operator

Check out the sample DAG code below:

from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.operators.powerbi import PowerBIDatasetRefreshOperator

with DAG(
    dag_id="refresh_dataset_powerbi",
    schedule=None,  # run only when triggered manually
    start_date=datetime(2023, 8, 7),
    catchup=False,
    max_active_tasks=20,  # replaces the deprecated `concurrency` argument
    tags=["powerbi", "dataset", "refresh"],
) as dag:
    # Trigger the refresh and return immediately without waiting for it to finish.
    refresh_in_given_workspace = PowerBIDatasetRefreshOperator(
        task_id="refresh_in_given_workspace",
        dataset_id="<dataset_id>",  # placeholder: your dataset ID
        group_id="<workspace_id>",  # placeholder: your workspace ID
        force_refresh=False,
        wait_for_termination=False,
    )


@ambika-garg (Contributor Author) commented Jun 21, 2024

Hi @dabla,
I'm not sure how to incorporate the msgraph operator to extend this operator to work in deferrable mode. I would be really grateful if you could review my PR and guide me through it.

@ambika-garg changed the title from "Add Power BI operator that refreshes the powerbi dataset" to "Microsoft Power BI operator to refresh the dataset" on Jun 22, 2024
@dabla (Contributor) commented Jun 24, 2024


Hello @ambika-garg, I've reviewed your PR and left comments on the code where it could be improved. I didn't see many differences from the previous PR, though, except that the code has moved to the Microsoft Azure provider.

@ambika-garg (Contributor Author)

Thanks a lot, @dabla, for reviewing the PR. I was struggling with how to leverage KiotaAdapterHook in my operator, and your comments have given me some insights. I will make the suggested changes. Thanks again!

@ambika-garg (Contributor Author)

Hey @dabla, I added a trigger class so the operator can work in deferrable mode, reusing the async calls from the MS Graph operator. The operator now supports both async (deferrable) and sync mode. Can you please review it?
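For context on deferrable mode: in Airflow, an operator defers by calling self.defer(trigger=..., method_name=...), and the trigger's async run() method yields TriggerEvent objects that resume the operator. The sketch below mimics the kind of polling such a trigger might do, using plain asyncio and no Airflow imports. get_status is a hypothetical coroutine returning the refresh status strings from the description, and the event payload keys are illustrative, not those of the actual Power BI trigger.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict

TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


async def poll_refresh_events(
    get_status: Callable[[], Awaitable[str]],  # hypothetical async status lookup
    timeout: float,
    check_interval: float,
) -> AsyncIterator[Dict[str, str]]:
    """Trigger-style poller: awaits get_status() until a terminal status or the
    timeout, then yields exactly one event describing the outcome."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status()
        if status in TERMINAL_STATUSES:
            # Assumption: only "Completed" counts as success.
            outcome = "success" if status == "Completed" else "error"
            yield {"status": outcome, "refresh_status": status}
            return
        if loop.time() >= deadline:
            yield {"status": "error", "message": f"timed out after {timeout} seconds"}
            return
        # asyncio.sleep hands control back to the event loop, so other
        # coroutines (other triggers) can run while this one waits.
        await asyncio.sleep(check_interval)
```

Unlike a synchronous wait, nothing here blocks a worker slot while the refresh runs, which is the main reason to move the polling into a trigger.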

@ambika-garg force-pushed the powerbi_operator branch 2 times, most recently from 961d02d to c6d17e4, on July 12, 2024 16:26
ambika-garg and others added 6 commits July 15, 2024 22:20
Add Power BI integration to the provider.yaml
* Extend PowerBIHook call to msgraph operator
* Add the trigger class to enable deffering
* Enable cache token
… into one hook, also take into account proxies. This is how I would do it, it isn't finished of course but that should put you in right direction. As there is a lot of polling involved, I would just like the MSGraphOperator, make it a pure async operator but that's my opinion.
…ary logging statements (don't just log info statements to log them, those can have performance/cost implications)
@dabla (Contributor) commented Jul 18, 2024


Hello @ambika-garg, I've made a small change and committed it to your PR. I also left a comment on the PowerBI code about one last change that would make it better. The code is starting to look good! I also saw you did something similar for Fabric, which we would also be interested in using. Once this PR is done, we could create a new one and merge that into the Azure provider package as well. Of course, this will all have to be discussed with the Airflow maintainers; it would be best to start a discussion on the dev list.

@ambika-garg (Contributor Author)

Hi @davidblain-infrabel,
I noticed that two of the test cases for the MSGraph hook were failing in CI, so I fixed them. You can refer to the screenshot for details.

[screenshot]

@ambika-garg (Contributor Author)

Hey @dabla, all CI checks are finally passing. Please review this PR once more so we can move forward with merging.

@potiuk (Member) commented Aug 14, 2024

@dabla @ambika-garg -> any other remaining points? It looks good to me, but maybe I missed something.

@ambika-garg (Contributor Author) commented Aug 14, 2024

Hey @potiuk & @dabla,
I've reviewed, tested and updated the code to the best of my ability. It looks good to me now, and I believe there are no remaining issues.

@dabla (Contributor) commented Aug 14, 2024


Hello @ambika-garg, I see that @potiuk approved the PR, so now it's just a matter of getting a green build, and then I suppose they will merge it.

@dabla (Contributor) commented Aug 14, 2024


Looks good to me too. @ambika-garg did a great job; I went over all the files yesterday and they look good now. Depending on the order in which this PR and mine (with the API version) are merged, a small modification might be required, but I'm aware of it and will apply it.

@potiuk merged commit 0139083 into apache:main on Aug 14, 2024
@ambika-garg deleted the powerbi_operator branch on August 15, 2024 06:23
Artuz37 pushed a commit to Artuz37/airflow that referenced this pull request Aug 19, 2024
* Add Power BI operator that refreshes the powerbi dataset

Add Power BI integration to the provider.yaml

* Extend Power BI Operator to support async mode

* Extend PowerBIHook call to msgraph operator
* Add the trigger class to enable deffering
* Enable cache token

* refactor: Refactored PowerBIHook based on the KiotaRequestAdapterHook into one hook, also take into account proxies.  This is how I would do it, it isn't finished of course but that should put you in right direction. As there is a lot of polling involved, I would just like the MSGraphOperator, make it a pure async operator but that's my opinion.

* Refactor: To support operator's async behavior

* Add unit tests for the power bi trigger and refactor the code

* unit tests for powerbi operator

* refactor: Did some small changes to PowerBIOperator, removed unnecessary logging statements (don't just log info statements to log them, those can have performance/cost implications)

* Fixed the unit test

* Added more tests for full code coverage

* Added system test for operator

* Fix system test

* Refactor: To use more of defferable mechanism, shifted all the async code in trigger

* Fix unit tests and remove unnecessary parameters

* refactor: Initialize hosts within constructor to make sure it's initialized correctly and immutable

* fix: Changed the 'powerbi_conn_id' parameter to 'conn_id' for the dataset refresh example in PowerBI

* Remove redundant system test for powerbi dataset refresh operator and rename the existing test more meaningfully

* remove extra comments

* Fix msgraph hook tests

* Fix powerbi trigger tests

* Refactor to pass the provider[microsoft.azure] tests

* refactor: Removed commented out (dead) code

* Refactor: Remove unused parameters and dead code

---------

Co-authored-by: David Blain <david.blain@infrabel.be>
Co-authored-by: David Blain <info@dabla.be>
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Aug 20, 2024