Skip to content

Conversation

@darkag
Copy link
Contributor

@darkag darkag commented Jun 23, 2023

Give the possibility to pass vertica connection parameters using extra. Especially it gives the possibility to enable load balancing (Vertica requires that load balancing is enabled client and server side)

@boring-cyborg
Copy link

boring-cyborg bot commented Jun 23, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@darkag darkag force-pushed the parameters_for_vertica branch from 005338d to 3ecde69 Compare June 23, 2023 09:42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fetch extra dict by calling conn.extra_dejson once rather than calling it repeatedly? It makes it look very wordy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I based my code on what it was done for the mysql hook, I didn't realize that extra_dejson was property that load json every time you call it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that big of a performance concern since it would happen only while adding the connection (which is one time process). But the code looks very wordy.

Copy link
Contributor

@Adaverse Adaverse Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is False the right default value for log_level. From what I can see from here, it seems to be logging.WARNING.

The same question for log_path and many others...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the .get(<option_name>, False) is not used for declare a default value but for test the presence of the option key in extra dictionnary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mb seems like i presumed something i shouldn't. Looks good here!

@darkag darkag force-pushed the parameters_for_vertica branch from 72b5449 to 59f2b24 Compare June 23, 2023 19:23
@darkag
Copy link
Contributor Author

darkag commented Jun 23, 2023

I tried to simplify code. I added a better handling for log_level parameter (add possibility to pass log level as "warning" and not just 30). Correction for request_complex_types which is true by default (impossible to pass false with the previous version).

@darkag darkag force-pushed the parameters_for_vertica branch 2 times, most recently from c58f82c to 06a13b2 Compare June 24, 2023 06:41
Comment on lines 49 to 65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for going for a camel case instead of snake? It should be latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, the habit of coding in javascript/c#. I've fixed it

@potiuk
Copy link
Member

potiuk commented Jun 25, 2023

We need unit tests (to avoid regressions) and documentation (so that users have a chance to discover those features). It makes very little sense to implement such feature without exposing the new features to the users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this test? Can you please add docstring here. It will be helpful for everyone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this test? Can you please add docstring here. It will be helpful for everyone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a docstring, this test is for parameters that can be passed as string or as int/bool, the first test test them as int/bool. The second as string. Since I don't really know how the mocking works I was not sure that I can make both test in one function and maybe it's possible to chain both case in on test like this :

   std_options = ["session_label", "kerberos_host_name", "kerberos_service_name", "unicode_error", "workload", "ssl"]
    for so in std_options:
        extra_dict.update({ so: so})
    bck_server_node = ["1.2.3.4", "4.3.2.1"]
    conn_timeout = 30
    log_lvl = 40
    extra_dict.update({"backup_server_node":bck_server_node})
    extra_dict.update({"connection_timeout":conn_timeout})
    extra_dict.update({"log_level":log_lvl})

    self.db_hook.get_conn()
    assert mock_connect.call_count == 1
    args, kwargs = mock_connect.call_args
    assert args == ()
    for bo in bool_options:
        assert kwargs[bo] == True
    assert kwargs["request_complex_types"] == False
    for so in std_options:
        assert kwargs[so] == so
    assert bck_server_node[0] in kwargs["backup_server_node"]
    assert bck_server_node[1] in kwargs["backup_server_node"]
    assert kwargs["connection_timeout"] == conn_timeout
    assert kwargs["log_level"] == log_lvl
    assert kwargs["log_path"] is None

    for bo in bool_options:
        extra_dict.update({ bo: "True"})
    extra_dict.update({ "request_complex_types": "False"})
    extra_dict.update({"log_level":"Error"})
    self.db_hook.get_conn()
    assert mock_connect.call_count == 1 # 2 ??
    args, kwargs = mock_connect.call_args
    assert args == ()
    for bo in bool_options:
        assert kwargs[bo] == True
    assert kwargs["request_complex_types"] == False
    assert kwargs["log_level"] == logging.ERROR
    assert kwargs["log_path"] is None

@darkag darkag force-pushed the parameters_for_vertica branch from d4b3733 to 362a70c Compare June 26, 2023 17:17
@eladkal eladkal changed the title Parameters for vertica Add various Vertica connection parameters Jun 27, 2023
@potiuk
Copy link
Member

potiuk commented Jun 27, 2023

Needs conflict resolving.

@darkag darkag force-pushed the parameters_for_vertica branch 2 times, most recently from 48dfef9 to 332f2b6 Compare June 28, 2023 16:45
@darkag
Copy link
Contributor Author

darkag commented Jun 28, 2023

Conflct should be resolved

@potiuk
Copy link
Member

potiuk commented Jun 28, 2023

Let's see.

@darkag darkag force-pushed the parameters_for_vertica branch 3 times, most recently from 9faa8ee to 9e59bdb Compare June 28, 2023 20:29
@darkag
Copy link
Contributor Author

darkag commented Jun 28, 2023

ok, I should have fix static checks and problem with test. May I ask why those checks doesn't run/return error when I push a commit ? (I may have fixed this issues faster if it didn't disappear every time I synced my branch)

@potiuk
Copy link
Member

potiuk commented Jun 28, 2023

may have fixed this issues faster if it didn't disappear every time I synced my branch

It's not a question to us - rather to how Github works. You can still see them in past runs - you can find your actions and results of it in "actions" tab - you can filter by your branch name for example. Also I think you can see them in "previous attempts when you go to details of new run:

Screenshot 2023-06-28 at 22 59 54

Also the bit of difficulty for you personally is that the policy for all ASF project is that workflow of "new contributors" (so people who never committed to the project) have to be manually approved by committers -> this is to prevent various kind of abuse (for example using the resources that the ASF has donated for free by GitHub abused by bad actors who create fake new accounts and try to mine bitcoin by opening multiple pull requests with code that mines cryptocurrency. This has already been a pattern actively abused by bad actors so it is a real scenario of abuse that has been actively exploited: https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/

But you are completely covered. You can run whatever the CI is doing with Breeze. https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#running-static-code-checks-via-breeze -> shows the way how you can reproduce locally all tests breeze static-checks --all-files without having to wait for CI to be approved and run. This is the reason we have breeze that provides commmon, completely reproducible environment that you can run locally to reproduce whatever CI errors you see.

@potiuk
Copy link
Member

potiuk commented Jun 28, 2023

All the other actions you see in CI can also be generally reproduced locally with Breeze - this is a one-stop CLI that allows you to run all kinds of test and steps locally and fix them before the push.

For example you can see that docs building fails for you - but when you look at the command that is run, you can run the same breeze command and generally you should be able to reproduce it locally.

Screenshot 2023-06-28 at 23 11 21

There are docs for Breeze in https://github.com/apache/airflow/blob/main/BREEZE.rst - explaining installation steps and all commands with help and screenshots. Also it supports auto-completion and --help in case you are in doubt what parameters you can use.

Also we have a rather comprehensive example on how to run the suite of tests locally (we invested quite a lot of effort so that the contributors can easily locally reproduce all CI jobs. For example runnig all kind of unit/integration/system tests is described in https://github.com/apache/airflow/blob/main/TESTING.rst

All those docs are nicely linked from "CONTRIBUTING.rst" which has step-by-step instructions and helpful guidelines on various parts of the development environment (including docs on how to setup tests in the IDEs of your choice).

I hope that will be helpful.

I'd usually recommend to run those checks locally if we know they are failing, and only pushing next attempt after you fixed them locally - that's what I usually do.

@potiuk
Copy link
Member

potiuk commented Jun 28, 2023

BTW. for static checks - what we recommend is to install pre-commit. This is what the output of failed static checks recommends - then static checks will be run automatically when you commit or amend a change (and optimized only to your changes). Also breeze static-checks command is useful (especially with --last-commit or --only-my-changes switches which run the static checks for last commit or all commits that you branched from main of respectively.

All those tools are there specifically to be able to reproduce what CI does, locally so that you can fix them locally before you push (and in your case wait for someone to approve your changes).

@potiuk
Copy link
Member

potiuk commented Jun 28, 2023

Does it answer the concerns you have @darkag ? :D

@darkag
Copy link
Contributor Author

darkag commented Jun 29, 2023

Ok it's more clear for me. Really thanks for your response (and patience).

@potiuk
Copy link
Member

potiuk commented Jun 29, 2023

Ok it's more clear for me. Really thanks for your response (and patience).

No worries. I understand frustration - you want to add what seems a few lines of code yet you need to put a lot of effort to meke them at "community standards' and you need to learn a lot to see how you can implement the standards easily.

Airflow dev env is, complex, to say the least, but it's not complicated IMHO. It takes time to see all the moving pieces and understand which one you can use, and find out that it is metciculously documented with lots of examples and all the steps of CI/dev can be easily reproduced locally. So especially when you implement a change that involves many of the parts failing you just need to take a bit more time to fix them all. So this is perfectly fine to need a bit of guidance sometimes if you are used to just submitting the code and merging it. We have many more steps that need to succeed in order to change being mergable - and once you set it up (pre-commit, breeze etc.) it's super easy, but the initial steps are not always obvious.

The reason why it is so - we have > 2500 - most of them do not know all the expectations we have, and we have limited number of those who review the code - so our CI is the "first" reviewer and the complex set of tests and checks takes care of telling the contributors what needs to be fixed - instead of people having to do that. The result of it that every single contributor and commiter can rely on those things (formatting, correctness of typing, static checks, documentation building without warning etc. etc. and you have very small chance by adding a change to break work of other 2499 contributors (so to speak).

@darkag
Copy link
Contributor Author

darkag commented Jun 29, 2023

ok this time it should be good. What bothered me the most is not to have to match standard but the fact that it took me some time to understand that the tests that failed weren't the ones that are runned automatically just after my commit (which make me believe that errors were solved by just updating branch).

darkag and others added 12 commits June 30, 2023 04:55
Add binding from extra to vertica connection parameters
Simplification of code, better handling of log level parameter, correction for "request_complex_types" which is true by default (it would have been impossible to pass false with the previous version)
Fix useless spaces, pass options only if present in extra, loop over options to reduce code size
@darkag darkag force-pushed the parameters_for_vertica branch from 512cc37 to d588b94 Compare June 30, 2023 02:55
@potiuk potiuk merged commit 64a3787 into apache:main Jun 30, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 30, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@potiuk
Copy link
Member

potiuk commented Jun 30, 2023

🎉 🎉 🎉 🎉 🎉

@darkag darkag deleted the parameters_for_vertica branch June 30, 2023 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants