Add various Vertica connection parameters #32089

darkag · 2023-06-23T09:18:50Z

Give the possibility to pass vertica connection parameters using extra. Especially it gives the possibility to enable load balancing (Vertica requires that load balancing is enabled client and server side)

boring-cyborg · 2023-06-23T09:18:54Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Adaverse · 2023-06-23T09:54:27Z

airflow/providers/vertica/hooks/vertica.py

Can we fetch extra dict by calling conn.extra_dejson once rather than calling it repeatedly? It makes it look very wordy

Yes, I based my code on what it was done for the mysql hook, I didn't realize that extra_dejson was property that load json every time you call it.

It's not that big of a performance concern since it would happen only while adding the connection (which is one time process). But the code looks very wordy.

Adaverse · 2023-06-23T10:01:38Z

airflow/providers/vertica/hooks/vertica.py

is False the right default value for log_level. From what I can see from here, it seems to be logging.WARNING.

The same question for log_path and many others...

the .get(<option_name>, False) is not used for declare a default value but for test the presence of the option key in extra dictionnary.

mb seems like i presumed something i shouldn't. Looks good here!

darkag · 2023-06-23T19:28:32Z

I tried to simplify code. I added a better handling for log_level parameter (add possibility to pass log level as "warning" and not just 30). Correction for request_complex_types which is true by default (impossible to pass false with the previous version).

Adaverse · 2023-06-24T07:03:28Z

airflow/providers/vertica/hooks/vertica.py

Any reason for going for a camel case instead of snake? It should be latter.

my bad, the habit of coding in javascript/c#. I've fixed it

potiuk · 2023-06-25T07:25:12Z

We need unit tests (to avoid regressions) and documentation (so that users have a chance to discover those features). It makes very little sense to implement such feature without exposing the new features to the users.

airflow/providers/vertica/hooks/vertica.py

phanikumv · 2023-06-26T14:27:04Z

tests/providers/vertica/hooks/test_vertica.py

what is the purpose of this test? Can you please add docstring here. It will be helpful for everyone

phanikumv · 2023-06-26T14:27:15Z

tests/providers/vertica/hooks/test_vertica.py

what is the purpose of this test? Can you please add docstring here. It will be helpful for everyone

I've added a docstring, this test is for parameters that can be passed as string or as int/bool, the first test test them as int/bool. The second as string. Since I don't really know how the mocking works I was not sure that I can make both test in one function and maybe it's possible to chain both case in on test like this :

std_options = ["session_label", "kerberos_host_name", "kerberos_service_name", "unicode_error", "workload", "ssl"] for so in std_options: extra_dict.update({ so: so}) bck_server_node = ["1.2.3.4", "4.3.2.1"] conn_timeout = 30 log_lvl = 40 extra_dict.update({"backup_server_node":bck_server_node}) extra_dict.update({"connection_timeout":conn_timeout}) extra_dict.update({"log_level":log_lvl}) self.db_hook.get_conn() assert mock_connect.call_count == 1 args, kwargs = mock_connect.call_args assert args == () for bo in bool_options: assert kwargs[bo] == True assert kwargs["request_complex_types"] == False for so in std_options: assert kwargs[so] == so assert bck_server_node[0] in kwargs["backup_server_node"] assert bck_server_node[1] in kwargs["backup_server_node"] assert kwargs["connection_timeout"] == conn_timeout assert kwargs["log_level"] == log_lvl assert kwargs["log_path"] is None for bo in bool_options: extra_dict.update({ bo: "True"}) extra_dict.update({ "request_complex_types": "False"}) extra_dict.update({"log_level":"Error"}) self.db_hook.get_conn() assert mock_connect.call_count == 1 # 2 ?? args, kwargs = mock_connect.call_args assert args == () for bo in bool_options: assert kwargs[bo] == True assert kwargs["request_complex_types"] == False assert kwargs["log_level"] == logging.ERROR assert kwargs["log_path"] is None

potiuk · 2023-06-27T20:57:04Z

Needs conflict resolving.

darkag · 2023-06-28T17:17:56Z

Conflct should be resolved

potiuk · 2023-06-28T18:06:43Z

Let's see.

darkag · 2023-06-28T20:46:01Z

ok, I should have fix static checks and problem with test. May I ask why those checks doesn't run/return error when I push a commit ? (I may have fixed this issues faster if it didn't disappear every time I synced my branch)

potiuk · 2023-06-28T21:08:48Z

may have fixed this issues faster if it didn't disappear every time I synced my branch

It's not a question to us - rather to how Github works. You can still see them in past runs - you can find your actions and results of it in "actions" tab - you can filter by your branch name for example. Also I think you can see them in "previous attempts when you go to details of new run:

Also the bit of difficulty for you personally is that the policy for all ASF project is that workflow of "new contributors" (so people who never committed to the project) have to be manually approved by committers -> this is to prevent various kind of abuse (for example using the resources that the ASF has donated for free by GitHub abused by bad actors who create fake new accounts and try to mine bitcoin by opening multiple pull requests with code that mines cryptocurrency. This has already been a pattern actively abused by bad actors so it is a real scenario of abuse that has been actively exploited: https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/

But you are completely covered. You can run whatever the CI is doing with Breeze. https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#running-static-code-checks-via-breeze -> shows the way how you can reproduce locally all tests breeze static-checks --all-files without having to wait for CI to be approved and run. This is the reason we have breeze that provides commmon, completely reproducible environment that you can run locally to reproduce whatever CI errors you see.

potiuk · 2023-06-28T21:18:42Z

All the other actions you see in CI can also be generally reproduced locally with Breeze - this is a one-stop CLI that allows you to run all kinds of test and steps locally and fix them before the push.

For example you can see that docs building fails for you - but when you look at the command that is run, you can run the same breeze command and generally you should be able to reproduce it locally.

There are docs for Breeze in https://github.com/apache/airflow/blob/main/BREEZE.rst - explaining installation steps and all commands with help and screenshots. Also it supports auto-completion and --help in case you are in doubt what parameters you can use.

Also we have a rather comprehensive example on how to run the suite of tests locally (we invested quite a lot of effort so that the contributors can easily locally reproduce all CI jobs. For example runnig all kind of unit/integration/system tests is described in https://github.com/apache/airflow/blob/main/TESTING.rst

All those docs are nicely linked from "CONTRIBUTING.rst" which has step-by-step instructions and helpful guidelines on various parts of the development environment (including docs on how to setup tests in the IDEs of your choice).

I hope that will be helpful.

I'd usually recommend to run those checks locally if we know they are failing, and only pushing next attempt after you fixed them locally - that's what I usually do.

potiuk · 2023-06-28T21:23:01Z

BTW. for static checks - what we recommend is to install pre-commit. This is what the output of failed static checks recommends - then static checks will be run automatically when you commit or amend a change (and optimized only to your changes). Also breeze static-checks command is useful (especially with --last-commit or --only-my-changes switches which run the static checks for last commit or all commits that you branched from main of respectively.

All those tools are there specifically to be able to reproduce what CI does, locally so that you can fix them locally before you push (and in your case wait for someone to approve your changes).

potiuk · 2023-06-28T21:23:44Z

Does it answer the concerns you have @darkag ? :D

darkag · 2023-06-29T05:06:08Z

Ok it's more clear for me. Really thanks for your response (and patience).

potiuk · 2023-06-29T07:31:42Z

Ok it's more clear for me. Really thanks for your response (and patience).

No worries. I understand frustration - you want to add what seems a few lines of code yet you need to put a lot of effort to meke them at "community standards' and you need to learn a lot to see how you can implement the standards easily.

Airflow dev env is, complex, to say the least, but it's not complicated IMHO. It takes time to see all the moving pieces and understand which one you can use, and find out that it is metciculously documented with lots of examples and all the steps of CI/dev can be easily reproduced locally. So especially when you implement a change that involves many of the parts failing you just need to take a bit more time to fix them all. So this is perfectly fine to need a bit of guidance sometimes if you are used to just submitting the code and merging it. We have many more steps that need to succeed in order to change being mergable - and once you set it up (pre-commit, breeze etc.) it's super easy, but the initial steps are not always obvious.

The reason why it is so - we have > 2500 - most of them do not know all the expectations we have, and we have limited number of those who review the code - so our CI is the "first" reviewer and the complex set of tests and checks takes care of telling the contributors what needs to be fixed - instead of people having to do that. The result of it that every single contributor and commiter can rely on those things (formatting, correctness of typing, static checks, documentation building without warning etc. etc. and you have very small chance by adding a change to break work of other 2499 contributors (so to speak).

darkag · 2023-06-29T21:10:39Z

ok this time it should be good. What bothered me the most is not to have to match standard but the fact that it took me some time to understand that the tests that failed weren't the ones that are runned automatically just after my commit (which make me believe that errors were solved by just updating branch).

Add binding from extra to vertica connection parameters

Simplification of code, better handling of log level parameter, correction for "request_complex_types" which is true by default (it would have been impossible to pass false with the previous version)

Fix useless spaces, pass options only if present in extra, loop over options to reduce code size

boring-cyborg · 2023-06-30T07:55:26Z

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

potiuk · 2023-06-30T07:55:32Z

🎉 🎉 🎉 🎉 🎉

boring-cyborg bot added the area:providers label Jun 23, 2023

darkag force-pushed the parameters_for_vertica branch from 005338d to 3ecde69 Compare June 23, 2023 09:42

Adaverse suggested changes Jun 23, 2023

View reviewed changes

Adaverse reviewed Jun 23, 2023

View reviewed changes

darkag force-pushed the parameters_for_vertica branch from 72b5449 to 59f2b24 Compare June 23, 2023 19:23

darkag force-pushed the parameters_for_vertica branch 2 times, most recently from c58f82c to 06a13b2 Compare June 24, 2023 06:41

Adaverse reviewed Jun 24, 2023

View reviewed changes

potiuk reviewed Jun 25, 2023

View reviewed changes

airflow/providers/vertica/hooks/vertica.py Outdated Show resolved Hide resolved

phanikumv reviewed Jun 26, 2023

View reviewed changes

darkag force-pushed the parameters_for_vertica branch from d4b3733 to 362a70c Compare June 26, 2023 17:17

eladkal changed the title ~~Parameters for vertica~~ Add various Vertica connection parameters Jun 27, 2023

darkag force-pushed the parameters_for_vertica branch 2 times, most recently from 48dfef9 to 332f2b6 Compare June 28, 2023 16:45

darkag force-pushed the parameters_for_vertica branch 3 times, most recently from 9faa8ee to 9e59bdb Compare June 28, 2023 20:29

darkag and others added 12 commits June 30, 2023 04:55

first try parameters

d96a2fd

Add binding to all options

f5cd910

Add binding from extra to vertica connection parameters

Update vertica.py

60da8c3

Simplification of code, better handling of log level parameter, correction for "request_complex_types" which is true by default (it would have been impossible to pass false with the previous version)

Fix to pass test

478b4ef

Fix useless spaces, pass options only if present in extra, loop over options to reduce code size

fix naming convention

688adcc

add tests and doc

2e8a108

Remove log_path from Extra

da1a22b

remove log_path from example

78731fa

add docstring for test_get_conn_extra_parameters_cast

2b756d8

docstring

ebd4acd

Fix error in tests and typo for static checks

594f5ff

multiples fixes (doc, test, parsing)

d588b94

darkag force-pushed the parameters_for_vertica branch from 512cc37 to d588b94 Compare June 30, 2023 02:55

potiuk approved these changes Jun 30, 2023

View reviewed changes

potiuk merged commit 64a3787 into apache:main Jun 30, 2023

darkag deleted the parameters_for_vertica branch June 30, 2023 13:01

eladkal mentioned this pull request Jul 6, 2023

Status of testing Providers that were prepared on July 06, 2023 #32389

Closed

69 tasks

Add various Vertica connection parameters #32089

Add various Vertica connection parameters #32089

Uh oh!

Conversation

darkag commented Jun 23, 2023

Uh oh!

boring-cyborg bot commented Jun 23, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Adaverse Jun 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

darkag commented Jun 23, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk commented Jun 25, 2023

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk commented Jun 27, 2023

Uh oh!

darkag commented Jun 28, 2023

Uh oh!

potiuk commented Jun 28, 2023

Uh oh!

darkag commented Jun 28, 2023

Uh oh!

potiuk commented Jun 28, 2023

Uh oh!

potiuk commented Jun 28, 2023

Uh oh!

potiuk commented Jun 28, 2023

Uh oh!

potiuk commented Jun 28, 2023

Uh oh!

darkag commented Jun 29, 2023

Uh oh!

potiuk commented Jun 29, 2023

Uh oh!

darkag commented Jun 29, 2023

Uh oh!

boring-cyborg bot commented Jun 30, 2023

Uh oh!

potiuk commented Jun 30, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Adaverse Jun 23, 2023 •

edited

Loading