Databricks SQL Sensor #30204
Conversation
@alexott @josh-fell @Taragolis @eladkal @o-nikolas as I mentioned in #28950, I have opened a new PR to divide the changes. The current PR is for the Databricks SQL Sensor only.
What are these tags for?
They were not needed, removed them.
Can you add a doc file (which would use this tag)?
Added doc md to the DAG object in the example.
Hey @harishkrao,
Sorry, I should have been more explicit! Those tags are meant to be used in doc readmes for each operator. You can see an example here: https://github.com/apache/airflow/blob/main/docs/apache-airflow-providers-amazon/operators/sqs.rst#sensors
There already exists a doc readme for the Databricks SQL operator (link); you can just update that doc to include a section for the Sensor.
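For anyone following along, the tag pattern pairs `[START ...]`/`[END ...]` comments in the example DAG with an `exampleinclude` directive in the provider docs. A sketch of the RST side, with hypothetical file path, heading, and tag name modeled on the Amazon provider example linked above:

```rst
DatabricksSqlSensor
===================

Use the sensor to run a SQL query on Databricks and wait for a result.

.. exampleinclude:: /../../tests/system/providers/databricks/example_databricks_sensors.py
    :language: python
    :dedent: 4
    :start-after: [START howto_sensor_databricks_sql]
    :end-before: [END howto_sensor_databricks_sql]
```

The actual path and tag name should match whatever the example DAG in this PR uses.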
Thank you for the examples, let me add the section.
@o-nikolas I added the section; please let me know if it looks good. Also, it would be great if you could review once again. I've pushed all the changes from the feedback.
Looking good! The only thing missing is that you have doc tags for setting up the connection (howto_sensor_databricks_connection_setup), but you don't reference them from the doc. Can you either reference them in the doc you added, or remove the tags from this example DAG?
Thanks for catching that, I will add it to the doc.
Force-pushed from 8b08ff2 to f8867e3
@o-nikolas pushed changes reflecting PR comments; can you please review once more?
Force-pushed from 13119b0 to 53ba9f8
alexott left a comment
looks good to me
I'm still not sure that we need to use fetch_all_handler for the sensor.
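For context on that question: fetch_all_handler materializes every row the query returns, while a sensor's poke typically only needs to know whether any row came back. A minimal, self-contained sketch of the trade-off (the handler and cursor names here are simplified stand-ins, not the provider's actual implementation):

```python
# Simplified stand-ins for the row handlers in the common SQL provider;
# the real ones live in airflow.providers.common.sql.hooks.sql.
def fetch_all_handler(cursor):
    # Materializes the entire result set in memory.
    return cursor.fetchall()

def fetch_one_handler(cursor):
    # Pulls a single row, which is enough for an existence check.
    return cursor.fetchone()

class FakeCursor:
    """Minimal DB-API-like cursor over an in-memory row list."""
    def __init__(self, rows):
        self._rows = list(rows)
    def fetchall(self):
        return self._rows
    def fetchone(self):
        return self._rows[0] if self._rows else None

rows = [(1, "a"), (2, "b")]
# Both answer "did the query return data?", but the second fetches one row.
exists_all = bool(fetch_all_handler(FakeCursor(rows)))
exists_one = fetch_one_handler(FakeCursor(rows)) is not None
```

Whether the sensor should switch handlers depends on how its poke logic uses the rows; this only illustrates the difference being discussed.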
```python
(sql_sensor)

from tests.system.utils.watcher import watcher

# This test needs watcher in order to properly mark success/failure
# when "tearDown" task with trigger rule is part of the DAG
list(dag.tasks) >> watcher()
```
The (sql_sensor) expression doesn't look necessary. Also, you don't need the watcher() since this DAG only has one task and there are no tasks which are considered "teardown" tasks.
Agreed, will remove it.
A fair callout Josh! Though, I'd personally suggest against that optimization. I think it's helpful for all system tests to follow the same pattern so that they have a familiar look and feel. It also makes it easier for this test to be expanded in the future without someone having to go dig up what the rest of the pattern is if you've removed it (if they even know to do so!).
Completely agree that consistency is ideal, but not all system tests would need, or even have a valid use case for, a "teardown" task; it just adds a redundant task to the system test. Maybe it's the comment about the "need" for the watcher that makes this more confusing for me.
Not a hill to die on of course. I'll let @harishkrao and @alexott run with the suggestion if they choose. All good with me.
I suppose a compromise could be to remove the comment. But I also agree that it's certainly no hill to die on! It's not the end of the world either way, happy to leave it up to Harish :)
I agree with both points -- consistency and the point about not having the watcher for this example since it doesn't need it.
I have pushed a change to keep the watcher as-is for consistency. As you pointed out @josh-fell, the comment above it was confusing (it implied the watcher is mandatory), so I have re-worded it.
The same applies to (sql_sensor): I have added a note that it is not needed, but is present here as an extensible pattern.
WDYT about adding a test for the interaction between http_path and sql_endpoint_name based on the sensor docstring:
:param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
    **If not specified, it should be either specified in the Databricks connection's
    extra parameters, or ``sql_endpoint_name`` must be specified.**
:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. **If not specified, ``http_path``
    must be provided as described above**, defaults to None
...
Sure, let me add it. Thank you.
@josh-fell I have addressed other items. I am working on adding unit tests for the above use case. I will push the change soon.
Force-pushed from 9b06a17 to c225a9a
Co-authored-by: Niko Oliveira <onikolas@amazon.com>
Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com>
Force-pushed from 185bcd1 to f590087
```python
:param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
    If not specified, it should be either specified in the Databricks connection's
    extra parameters, or ``sql_endpoint_name`` must be specified.
:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not specified, ``http_path``
```
Nit, but I assume sql_endpoint_name is going to be the more common and intuitive setting; does it make sense to define it above http_path?
This reads wrong because SQL statements aren't really run on Delta tables ...? Rephrase as: "Sensor that runs a SQL query on Databricks"
- SQL Warehouse, not endpoint.
- All-Purpose cluster, not cluster.
Please check the sql_endpoint_name parameter ... could this be better named sql_warehouse_name?
What happens when you use a SQL Warehouse and specify Spark session params?
I would spell out Databricks Runtime
Warehouse
Maybe explicitly make this PERSONAL_ACCESS_TOKEN?
Warehouse
Databricks Delta table sensor.