
Conversation

@harishkrao (Contributor)

Databricks Delta table sensor.

@harishkrao (Contributor, Author)

@alexott @josh-fell @Taragolis @eladkal @o-nikolas as I mentioned in #28950, I have opened a new PR to split up the changes. This PR covers the Databricks SQL Sensor only.
It would be great if you could review the changes. Thank you!

Contributor:

What are these tags for?

Contributor (Author):

They were not needed, removed them.

Contributor:

Can you add a doc file (which would use this tag)?

Contributor (Author):

Added doc_md to the DAG object in the example.

Contributor:

Hey @harishkrao,

Sorry, I should have been more explicit! Those tags are meant to be used in doc readmes for each operator. You can see an example here: https://github.com/apache/airflow/blob/main/docs/apache-airflow-providers-amazon/operators/sqs.rst#sensors

There already exists a doc readme for the Databricks SQL operator (link); you can just update that doc to include a section for the sensor.
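As context for the tag discussion above, here is a minimal sketch (an assumption about intent, not Airflow's actual implementation) of what the `[START ...]`/`[END ...]` doc tags are for: the docs pull just the named region of an example DAG into an `.rst` page, so an unreferenced tag serves no purpose.

```python
# Sketch of the doc-tag convention: a named region of an example DAG is
# delimited by "# [START tag]" / "# [END tag]" comment markers, and the
# docs render only the lines between them.

def extract_tagged_snippet(source: str, tag: str) -> str:
    """Return the lines between the [START tag] and [END tag] markers."""
    snippet, inside = [], False
    for line in source.splitlines():
        if f"[START {tag}]" in line:
            inside = True
        elif f"[END {tag}]" in line:
            break
        elif inside:
            snippet.append(line)
    return "\n".join(snippet)

# Hypothetical example DAG source using the tag from this PR.
example_dag_source = '''\
# [START howto_sensor_databricks_sql]
sensor = DatabricksSqlSensor(task_id="sql_sensor", sql="select 1")
# [END howto_sensor_databricks_sql]
'''

print(extract_tagged_snippet(example_dag_source, "howto_sensor_databricks_sql"))
# -> sensor = DatabricksSqlSensor(task_id="sql_sensor", sql="select 1")
```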

Contributor (Author):

Thank you for the examples, let me add the section.

Contributor (Author):

@o-nikolas Added the section, please let me know if it looks good. Also, it would be great if you could review once again. I've pushed all the changes from the feedback.

Contributor:

Looking good! The only thing missing is that you have doc tags for setting up the connection (howto_sensor_databricks_connection_setup) but you don't reference them from the doc. Can you either reference them in the doc you added, or remove the tags from this example DAG?

Contributor (Author):

Thanks for catching that, I will add it to the doc.

@harishkrao (Contributor, Author)

@o-nikolas pushed changes reflecting PR comments, can you please review once more?

@harishkrao force-pushed the db-sql-sensor branch 4 times, most recently from 13119b0 to 53ba9f8 on March 25, 2023 05:36
@alexott (Contributor) left a comment:

looks good to me

Contributor:

I'm still not sure that we need to use fetch_all_handler for the sensor.

Comment on lines 68 to 80
Contributor:

Suggested change:

    (sql_sensor)
    from tests.system.utils.watcher import watcher
    # This test needs watcher in order to properly mark success/failure
    # when "tearDown" task with trigger rule is part of the DAG
    list(dag.tasks) >> watcher()

The (sql_sensor) expression doesn't look necessary. Also, you don't need the watcher() since this DAG only has one task and there are no tasks which are considered "teardown" tasks.
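For readers unfamiliar with the pattern under discussion, here is a conceptual sketch (an assumption about the intent, not Airflow's actual code) of why system tests wire `list(dag.tasks) >> watcher()`: the watcher runs downstream of every task and fails the run if any upstream task failed, so a teardown task that succeeds last cannot mask an earlier failure.

```python
# Toy model of the watcher: the overall run succeeds only if every
# upstream task succeeded, regardless of what ran last.

def watcher(upstream_results):
    """Return True only if all upstream tasks succeeded."""
    return all(upstream_results.values())

# The teardown succeeded even though the main task failed; the watcher
# still reports the run as failed.
results = {"sql_sensor": False, "teardown": True}
print(watcher(results))  # False
```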

Contributor (Author):

Agreed, will remove it.

Contributor:

A fair callout, Josh! Though I'd personally suggest against that optimization. I think it's helpful for all system tests to follow the same pattern so that they have a familiar look and feel. It also makes it easier for this test to be expanded in the future without someone having to dig up the rest of the pattern if you've removed it (if they even know to do so!).

Contributor:

Completely agree consistency is ideal, but not all system tests need, or even have a valid use case for, a "teardown" task; it just adds a redundant task to the system test. Maybe it's the comment about the "need" for the watcher that makes this more confusing for me.

Not a hill to die on of course. I'll let @harishkrao and @alexott run with the suggestion if they choose. All good with me.

Contributor:

I suppose a compromise could be to remove the comment. But I also agree that it's certainly no hill to die on! It's not the end of the world either way, happy to leave it up to Harish :)

Contributor (Author):

I agree with both points: consistency, and that this example doesn't strictly need the watcher.
I have pushed a change that keeps the watcher as-is for consistency, and, as you pointed out @josh-fell, the comment above it was confusing (it implied the watcher is mandatory), so I have reworded it.
The same applies to (sql_sensor): I have noted that it is not needed here but is present as an extensible pattern.

Contributor:

WDYT about adding a test for the interaction between http_path and sql_endpoint_name based on the sensor docstring:

:param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
            **If not specified, it should be either specified in the Databricks connection's
            extra parameters, or ``sql_endpoint_name`` must be specified.**
:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. **If not specified, ``http_path``
            must be provided as described above**, defaults to None
...
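The interaction described in that docstring can be sketched as follows. This is a hedged illustration of the documented contract, not the provider's actual resolution code; the function name and the lookup placeholder are hypothetical.

```python
# Sketch of the parameter interaction: http_path may be given directly,
# derived from sql_endpoint_name, or taken from the connection's extra
# parameters; if none of these is available, the configuration is invalid.

def resolve_http_path(http_path=None, sql_endpoint_name=None, conn_extra=None):
    conn_extra = conn_extra or {}
    if http_path:
        return http_path
    if sql_endpoint_name:
        # Real code would look the warehouse up via the Databricks API;
        # here we only mark that the lookup would happen.
        return f"<resolved from endpoint {sql_endpoint_name!r}>"
    if conn_extra.get("http_path"):
        return conn_extra["http_path"]
    raise ValueError(
        "Either http_path (directly or via the connection's extra "
        "parameters) or sql_endpoint_name must be specified"
    )

print(resolve_http_path(http_path="/sql/1.0/warehouses/abc"))
# -> /sql/1.0/warehouses/abc
```

A unit test for the sensor would exercise exactly these branches, including the error case where neither parameter is supplied.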

Contributor (Author):

Sure, let me add it. Thank you.

Contributor (Author):

@josh-fell I have addressed other items. I am working on adding unit tests for the above use case. I will push the change soon.

@harishkrao force-pushed the db-sql-sensor branch 2 times, most recently from 9b06a17 to c225a9a on March 28, 2023 16:11
:param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
If not specified, it should be either specified in the Databricks connection's
extra parameters, or ``sql_endpoint_name`` must be specified.
:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not specified, ``http_path``

Reviewer:

Nit, but I assume sql_endpoint_name will be the more common and intuitive setting; does it make sense to define it above http_path?

Reviewer:

This reads wrong because SQL statements aren't really run on Delta tables ...? Rephrase as: "Sensor that runs a SQL query on Databricks"

Reviewer:

  • SQL Warehouse, not endpoint.
  • All-Purpose cluster, not cluster.

Reviewer:

Please check the sql_endpoint_name parameter ... could this be better named sql_warehouse_name?

Reviewer:

What happens when you use a SQL Warehouse and specify Spark session params?

Reviewer:

I would spell out Databricks Runtime

Reviewer:

Warehouse

Reviewer:

Maybe explicitly make this PERSONAL_ACCESS_TOKEN?

Reviewer:

Warehouse


6 participants