Add various Vertica connection parameters #32089
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
(force-pushed 005338d to 3ecde69)
Can we fetch the extra dict by calling conn.extra_dejson once rather than calling it repeatedly? It makes the code look very wordy.
Yes, I based my code on what was done for the MySQL hook; I didn't realize that extra_dejson was a property that loads the JSON every time you call it.
It's not that big of a performance concern, since it happens only while adding the connection (a one-time process). But the code looks very wordy.
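To illustrate the reviewer's suggestion, here is a minimal sketch (using a hypothetical stand-in for `airflow.models.Connection`, not the real hook code) of reading `extra_dejson` once and reusing the parsed dict:

```python
import json


class FakeConnection:
    """Hypothetical stand-in for airflow.models.Connection (illustration only)."""

    def __init__(self, extra):
        self._extra = extra

    @property
    def extra_dejson(self):
        # Like the real property, this parses the JSON on every access,
        # which is why the caller should fetch it once and reuse the result.
        return json.loads(self._extra)


conn = FakeConnection('{"connection_load_balance": true, "ssl": true}')
extra = conn.extra_dejson  # parse once, reuse below
conn_config = {}
for option in ("connection_load_balance", "ssl", "kerberos_host_name"):
    if option in extra:
        conn_config[option] = extra[option]
```

Looping over the option names, as done later in this PR, also keeps the hook short instead of one `if` per parameter.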
Is False the right default value for log_level? From what I can see here, it seems to be logging.WARNING.
The same question for log_path and many others...
The `.get(<option_name>, False)` is not used to declare a default value, but to test for the presence of the option key in the extra dictionary.
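A small sketch of that pattern, and of one caveat discussed later in this thread: with `.get(key, False)` an explicit `False` value is indistinguishable from an absent key, which is why a membership test is the safer presence check (the variable names here are illustrative, not the hook's actual code):

```python
extra = {"request_complex_types": False}
conn_config = {}

# Presence test via .get(key, False): an explicit False value looks the
# same as a missing key, so the option would silently be skipped.
if extra.get("request_complex_types", False):
    conn_config["request_complex_types"] = extra["request_complex_types"]
# conn_config is still empty here -- the client would fall back to
# its own default (True), and the user's False never reaches it.

# Membership test: the option is forwarded whenever the key is present,
# including when its value is falsy.
if "request_complex_types" in extra:
    conn_config["request_complex_types"] = extra["request_complex_types"]
```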
My bad, seems like I presumed something I shouldn't have. Looks good here!
(force-pushed 72b5449 to 59f2b24)
I tried to simplify the code. I added better handling for the log_level parameter (adding the possibility to pass the log level as "warning" and not just 30), and a correction for request_complex_types, which is true by default (it was impossible to pass false with the previous version).
(force-pushed c58f82c to 06a13b2)
Any reason for going with camel case instead of snake case? It should be the latter.
My bad, the habit of coding in JavaScript/C#. I've fixed it.
We need unit tests (to avoid regressions) and documentation (so that users have a chance to discover those features). It makes very little sense to implement such a feature without exposing it to the users.
What is the purpose of this test? Can you please add a docstring here? It will be helpful for everyone.
I've added a docstring. This test is for parameters that can be passed either as a string or as an int/bool; the first test checks them as int/bool, the second as string. Since I don't really know how the mocking works, I was not sure I could combine both tests in one function. Maybe it's possible to chain both cases in one test like this:
```python
std_options = ["session_label", "kerberos_host_name", "kerberos_service_name", "unicode_error", "workload", "ssl"]
for so in std_options:
    extra_dict.update({so: so})
bck_server_node = ["1.2.3.4", "4.3.2.1"]
conn_timeout = 30
log_lvl = 40
extra_dict.update({"backup_server_node": bck_server_node})
extra_dict.update({"connection_timeout": conn_timeout})
extra_dict.update({"log_level": log_lvl})
self.db_hook.get_conn()
assert mock_connect.call_count == 1
args, kwargs = mock_connect.call_args
assert args == ()
for bo in bool_options:
    assert kwargs[bo] is True
assert kwargs["request_complex_types"] is False
for so in std_options:
    assert kwargs[so] == so
assert bck_server_node[0] in kwargs["backup_server_node"]
assert bck_server_node[1] in kwargs["backup_server_node"]
assert kwargs["connection_timeout"] == conn_timeout
assert kwargs["log_level"] == log_lvl
assert kwargs["log_path"] is None

for bo in bool_options:
    extra_dict.update({bo: "True"})
extra_dict.update({"request_complex_types": "False"})
extra_dict.update({"log_level": "Error"})
self.db_hook.get_conn()
assert mock_connect.call_count == 1  # 2 ??
args, kwargs = mock_connect.call_args
assert args == ()
for bo in bool_options:
    assert kwargs[bo] is True
assert kwargs["request_complex_types"] is False
assert kwargs["log_level"] == logging.ERROR
assert kwargs["log_path"] is None
```
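Regarding the `# 2 ??` uncertainty in the snippet above: `unittest.mock` call counts accumulate across calls on the same mock object, so after a second `get_conn()` the count would indeed be 2 unless the mock is reset in between. A minimal standalone sketch:

```python
from unittest import mock

connect = mock.MagicMock()
connect()
assert connect.call_count == 1

connect()  # a second call on the same mock...
assert connect.call_count == 2  # ...accumulates to 2

connect.reset_mock()  # start a fresh scenario within the same test
assert connect.call_count == 0
```

So chaining both cases in one test function works if you either assert `call_count == 2` after the second call, or call `reset_mock()` between the two scenarios.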
(force-pushed d4b3733 to 362a70c)
Needs conflict resolving.
(force-pushed 48dfef9 to 332f2b6)
Conflict should be resolved.
Let's see.
(force-pushed 9faa8ee to 9e59bdb)
OK, I should have fixed the static checks and the problem with the test now. May I ask why those checks don't run/return errors when I push a commit? (I might have fixed these issues faster if they didn't disappear every time I synced my branch.)
It's not a question for us - rather about how GitHub works. You can still see them in past runs - you can find your actions and their results in the "Actions" tab, and you can filter by your branch name, for example. Also, I think you can see them under "previous attempts" when you go to the details of a new run.
Also, a bit of a difficulty for you personally is that the policy for all ASF projects is that workflows of "new contributors" (people who have never committed to the project) have to be manually approved by committers. This is to prevent various kinds of abuse - for example, the CI resources that GitHub donates to the ASF for free being abused by bad actors who create fake new accounts and open multiple pull requests with code that mines cryptocurrency. This has already been a pattern actively exploited by bad actors, so it is a real scenario: https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/ But you are completely covered. You can run whatever the CI is doing with Breeze: https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#running-static-code-checks-via-breeze shows how you can reproduce all the checks locally.
All the other actions you see in CI can also generally be reproduced locally with Breeze - a one-stop CLI that allows you to run all kinds of tests and steps locally and fix them before pushing. For example, you can see that the docs build fails for you - but when you look at the command that was run, you can run the same Breeze command and generally reproduce it locally.
There are docs for Breeze in https://github.com/apache/airflow/blob/main/BREEZE.rst explaining the installation steps and all commands, with help output and screenshots; it also supports auto-completion. We have rather comprehensive examples of how to run the test suite locally (we invested quite a lot of effort so that contributors can easily reproduce all CI jobs locally). For example, running all kinds of unit/integration/system tests is described in https://github.com/apache/airflow/blob/main/TESTING.rst All those docs are linked from CONTRIBUTING.rst, which has step-by-step instructions and helpful guidelines on various parts of the development environment (including docs on how to set up tests in the IDE of your choice). I hope that will be helpful. I'd usually recommend running those checks locally if we know they are failing, and only pushing the next attempt after you have fixed them locally - that's what I usually do.
BTW, for static checks, what we recommend is to install pre-commit - this is what the output of the failed static checks recommends. Then the static checks will run automatically when you commit or amend a change (optimized to cover only your changes). All those tools are there specifically so that you can reproduce what CI does locally and fix things before you push (and, in your case, before waiting for someone to approve your workflow runs).
Does it answer the concerns you have @darkag? :D
OK, it's clearer for me now. Thanks a lot for your response (and patience).
No worries, I understand the frustration - you want to add what seems like a few lines of code, yet you need to put in a lot of effort to bring them up to "community standards", and you need to learn a lot to see how you can meet those standards easily. The Airflow dev environment is complex, to say the least, but it's not complicated IMHO. It takes time to see all the moving pieces and understand which ones you can use, and to find out that it is meticulously documented with lots of examples, and that all the steps of CI/dev can easily be reproduced locally. So especially when you implement a change where many of the parts fail, you just need to take a bit more time to fix them all. It is perfectly fine to need a bit of guidance sometimes if you are used to just submitting the code and merging it. We have many more steps that need to succeed for a change to be mergeable - and once you set it all up (pre-commit, Breeze, etc.) it's super easy, but the initial steps are not always obvious. The reason it is this way: we have more than 2,500 contributors - most of them do not know all the expectations we have, and we have a limited number of people who review the code - so our CI is the "first" reviewer, and the complex set of tests and checks takes care of telling contributors what needs to be fixed, instead of people having to do that. The result is that every single contributor and committer can rely on those things (formatting, correctness of typing, static checks, documentation building without warnings, etc.), and by adding a change you have a very small chance of breaking the work of the other 2,499 contributors (so to speak).
OK, this time it should be good. What bothered me the most was not having to match the standards, but the fact that it took me some time to understand that the tests that failed weren't the ones that run automatically just after my commit (which made me believe that the errors were solved just by updating the branch).
Add binding from extra to vertica connection parameters
Simplification of code, better handling of log level parameter, correction for "request_complex_types" which is true by default (it would have been impossible to pass false with the previous version)
Fix useless spaces, pass options only if present in extra, loop over options to reduce code size
(force-pushed 512cc37 to d588b94)
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
🎉 🎉 🎉 🎉 🎉
Gives the possibility to pass Vertica connection parameters using extra. In particular, it makes it possible to enable load balancing (Vertica requires that load balancing is enabled on both the client and the server side).
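As a hedged illustration (the host names and values below are invented, and the key names follow the vertica-python connection options discussed in this PR), the "extra" field of such an Airflow connection could hold JSON like this, which the hook then forwards to `vertica_python.connect`:

```python
import json

# Hypothetical extras enabling client-side load balancing; the Vertica
# server must also have load balancing enabled for it to take effect.
extra = {
    "connection_load_balance": True,
    "backup_server_node": ["vertica-2.example.com", "vertica-3.example.com"],
    "connection_timeout": 30,
    "log_level": "WARNING",
}
extra_json = json.dumps(extra)
# e.g. stored in the connection's Extra field in the Airflow UI,
# or passed on the CLI via: airflow connections add ... --conn-extra '<extra_json>'
```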