Skip to content

Conversation

@Mikhail-M
Copy link
Contributor

As written in section https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#database-requirements, MariaDB higher than 10.6 should work well in HA mode, but has not been tested, so I have tested it and it doesn't work because

  1. MariaDB 10.6 doesn't support update OF, but support NOWAIT and SKIP LOCKED well, so in current version of AIRFLOW if condition is wrong -- fixed in this PR
  2. SQLAlchemy==1.3.24 knew nothing about MariaDB 10.6 and contains this fragment of code https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_24/lib/sqlalchemy/dialects/mysql/base.py#L1609 as a result, we get query without SKIP LOCKED. It works correct in 1.4.0 and higher but we can't update this dependency because of Latest SQLAlchemy (1.4) Incompatible with latest sqlalchemy_utils #14811 (webserver crash). In our infrastructure I tested latest version of SQLAlchemy and Scheduler work well, but webserver not.

As far as we merged thin PR and update SQLAlchemy Airflow will support HA Schedulers with MariaDB 10.6+


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg
Copy link

boring-cyborg bot commented Sep 24, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@Mikhail-M Mikhail-M changed the title Support for mariadb >= 10.6 Support for MariaDB >= 10.6 Sep 24, 2021
@potiuk
Copy link
Member

potiuk commented Sep 24, 2021

I think if we REALLY want to support mariadb, we should simply add it to the matrix of tests - similarly as we did with MsSQL. It will be far simpler than MsSQL because MariaDB is almost 100% compatible with MySQL. Would you maybe @Mikhail-M like to add fulll support for mariadb (and possibly solve any problems when they arise ? I am happy to guide you with that.

@Mikhail-M
Copy link
Contributor Author

@potiuk Yes, I am ready.
I REALLY want to support MariaDB, because we use it currently. If you provide to me some information for next step it will be beautiful.

@potiuk
Copy link
Member

potiuk commented Sep 24, 2021

I think if we REALLY want to support mariadb, we should simply add it to the matrix of tests - similarly as we did with MsSQL. It will be far simpler than MsSQL because MariaDB is almost 100% compatible with MySQL. Would you maybe @Mikhail-M like to add fulll support for mariadb (and possibly solve any problems when they arise ? I am happy to guide you with that.

Cool/. You can take a look as the inspiration:

It was then followed up by a few more as we realized that we need more:

If you foilow what has been added there, we should be good :)

@potiuk
Copy link
Member

potiuk commented Sep 24, 2021

Most of the changes in the biggest - first change were to fix various MSSQL incompatibilities, so you will likely not need those.

@Mikhail-M
Copy link
Contributor Author

Mikhail-M commented Sep 24, 2021

Thank you!
I think it should be better to create another PR for production-level MariaDB support. What should we do with current one?

@potiuk
Copy link
Member

potiuk commented Sep 24, 2021

I think it would be great to get it all in one go - one problem I have with this PR is that it has no tests, so we do not really know if it works (and any future changes might still break it) - adding CI support will not only test it, but prevent any regressions in the future.

@potiuk potiuk linked an issue Sep 25, 2021 that may be closed by this pull request
Comment on lines +184 to +186
mariadb_10_6 = (
isinstance(dialect, MySQLDialect) and dialect._is_mariadb and dialect.server_version_info >= (10, 6)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm not sure depending on _is_mariadb is a good idea.

Copy link
Contributor Author

@Mikhail-M Mikhail-M Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it looks like there is no better way.

In our version of sqlalchemy it realized this way -- https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_3_24/lib/sqlalchemy/dialects/mysql/base.py#L2686, so we can inline it, but this is most likely worse

In the modern version of sqlalchemy we can replace it with dialect.is_mariadb (and it is backward compatible)
https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/base.py#L2756

Do you have any suggestions?

@potiuk
Copy link
Member

potiuk commented Oct 5, 2021

Hey @Mikhail-M -> are you still working on this ?

@Mikhail-M
Copy link
Contributor Author

Hey @Mikhail-M -> are you still working on this ?

Yes, I am.
Sorry, it was a busy days. But today I had a chance to do something.

@Mikhail-M Mikhail-M requested a review from uranusjr October 29, 2021 20:43
@uranusjr
Copy link
Member

You're using INSTALL_MARIA_CLIENT in some places and INSTALL_MARIADB_CLIENT some else and the linter is catching it.

@Mikhail-M
Copy link
Contributor Author

Mikhail-M commented Nov 5, 2021

I disagree completely with "no test" as "no effort".
We made sure even if the experience was sub par for Scheduler HA, to atleast make sure it works.

In my experience Scheduler HA with MariaDB backend doesn't work now(deadlocks), so I decided to prepare this patch (The details in the PR description).
Now we are using fork with this patch and it looks like it works fine (Airflow 2.1+, 2.2 + MariaDB Galera Cluster have been tested).

But overal, it works well, because, as @potiuk mentioned, MariaDB is almost 100% compatible with MySQL

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

For me it's a bit of "burying head in sand" (not sure of the idiomatic English).

If we think MySQL is bad for us - let's take a "curagous' step and drop it (knowing the consequences).

Otherwise MariaDB is just a flavour of MySQL that people use and and by not testing it we wil have more issues to handle on a daily basis:

  • We do not need to add mariadb clients to image (mysql libs works for it 100%)
  • People are already using MariaDB seeing "mysql" is supported

This is very different than MsSQL or other DBs.

@Mikhail-M
Copy link
Contributor Author

  • We do not need to add mariadb clients to image (mysql libs works for it 100%)

Agree, it is true

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

Otherwise MariaDB is just a flavour of MySQL that people use and and by not testing it we wil have more issues to handle on a daily basis:

I would rather argue that if MariaDB is like everyone knows a flavor of MySQL there is no need of adding more tests.

If we think MySQL is bad for us - let's take a "curagous' step and drop it (knowing the consequences).

I am all for it but since we had "officially" supported it we can't hang the current users to dry. This is why "officially" supporting means a lot and I am opposed to saying we officially support MariaDB.

People are already using MariaDB seeing "mysql" is supported

However similar, they are not the same and this is why both of them exists in the truth though.

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

To be very clear I am ok to have the "code" fix that you have in a separate PR but I am not OK with adding MariaDB to the CI and saying we officially support it

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

To be very clear I am ok to have the "code" fix that you have in a separate PR but I am not OK with adding MariaDB to the CI and saying we officially support it

We can also add MariaDB to the test suite but say that we don't "officially" support it yet. similarly as wtih all other changes I am hesitant to approve a change that we have no tests that test it. That sounds very inconsistent.

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

We can also add MariaDB to the test suite but say that we don't "officially" support it yet. similarly as wtih all other changes I am hesitant to approve a change that we have no tests that test it. That sounds very inconsistent.

Approve a change with no tests? Since we support Postgres, MySQL and MSSQL and the tests are passing with them, what more do you need? I don't see it is as inconsistent, we have doing that for ages. That is the main difference on what is officially supported vs not.

That's what we did it MSSQL way before it was even experimental. Some code that keeps the existing functionality but fixes it for the other is not inconsistent for sure

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

You might have missed my reply in #18506 (comment)

"We can also add MariaDB to the test suite but say that we don't "officially" support it yet" is inconsistent -- If we have it in CI we support it or plan to support it. "best effort" is as someone encounters and fixes it they will or we do same as what we did with HA.

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

"We can also add MariaDB to the test suite but say that we don't "officially" support it yet" is inconsistent -- If we have it in CI we support it or plan to support it. "best effort" is as someone encounters and fixes it they will or we do same as what we did with HA.

Just to reverse the logic - do you think we also support "Oracle" now as "best effort"?

I think it's exactly the same situation as MariaDB - it is supported by SqlAlchemy and we have no tests for it.

Or should we say we don't support at all "Oracle" and "MariaDB". I really wonder what's the difference now :) ?

I am not really trying to be mean, just want to try to understand what "best effort" means really..

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

It's really about the "statement" and how we explain to our users when they ask. I have no good answer to that now.

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

And let me be very clear on that as well - I am also OK on merging that PR without the tests. But I oppose on calling it "best effort". I stand by my statement that not running tests is "we do not suppot" MariaDb. and whenever someeone reports issue with MariaDB, our answer should be (as today ) - we do not support it, change to MySQL. Postgres or MsSQL. . The answer 'we do best effort' to have Maria DB is simply wrong statement that bears no meaning whatsoever. From our perspectie either we are OK with our users to run MariaDB or not.

IMHO We should bluntly say "MariaDB does not work" if we have no tests. Because we really do not know. We never tested it. We do not even know which versions are supported to which extent and which configurations are supported for encoding and the like. There is no "maybe".

This is all I am here about our statement. The "best effort" is just wrong statement IMHO.

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

Just to reverse the logic - do you think we also support "Oracle" now as "best effort"?

Any DB that is not in our CI is not "officially supported". MariaDB or MSSQL were solved by at least some of the community members. We did it when doing Scheduler HA, Bas did it for MSSQL even before we had experimental support.

The main point there is "that we do not provide guarantee that it will 100% work"

I think it's exactly the same situation as MariaDB - it is supported by SqlAlchemy and we have no tests for it.

There is no way we can support all the DBs supported by SqlAlchemy. SqlAlchemy is for unifying DB API's but as you can see it still gets difficult for them.

Or should we say we don't support at all "Oracle" and "MariaDB". I really wonder what's the difference now :) ?

Yup we don't officially support -- " We provide no guarantee that it will work" . We don't really need to say what DBs we don't support :) We just need to say what DBs we officially support.

I am not really trying to be mean, just want to try to understand what "best effort" means really..

The only thing I wanted to convey was that we shouldn't officially support yet another DB. "best effort" for me is literally we will accept fixes if someone creates PRs as long as it doesn't break existing support for our officially supported DBs.

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

This is all I am here about our statement. The "best effort" is just wrong statement IMHO.

Check my very first comment:
#18506 (comment) -- I am against officially supporting it :)

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

This is all I am here about our statement. The "best effort" is just wrong statement IMHO.

Check my very first comment: #18506 (comment) -- I am against officially supporting it :)

That's cooll. But let's not call it "best effort" - lets call it "no support".

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

Yup happy with it

@potiuk
Copy link
Member

potiuk commented Nov 5, 2021

Me too. My opinion is that trying to "soften" the message is much worse that being very straightforward. We should be more curagous sometimes to say "We do not do that" rahter thatn "Well, maybe that we wll do but not entirely". For me - merging that change and removing the tests but bluntly saying "we do not support MariaDB" is FAR better than saying "we do best effort to support MariaDB". We don't. It does not work. some people did some changes to make it work, Maybe it does for some people , but we do not - as a community - support it at all.

Just it - transparent, open communication and properly setting the expectations of people who are receiving the mesage. By saying 'best effort" we go into the "blurry" zone - where our expectation is that Maria DB does not work, but the user expectation who happens to know Maria DB is that it works. This is a wrong place to be in where expectations are so different on both sides.

@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

Yup and in my opinion, we stick with what we know and are 100% sure even in terms of performance and edge cases. In trying to increase the number of supported DBs we increase maintenance and provide a "sub-par" experience for our users.

A wise man once said "Do less but do them well" or something similar to the following image :)

image

@potiuk
Copy link
Member

potiuk commented Nov 6, 2021

I am fine with that :) - even more. I will write my thoughts on devlist about mysql :)

@mik-laj
Copy link
Member

mik-laj commented Nov 6, 2021

@angelorobo
Copy link

Is the MariaDB support decision finalized yet?

@potiuk
Copy link
Member

potiuk commented Nov 22, 2021

Is the MariaDB support decision finalized yet?

I don't think it is - I think if you want to get support, the best way to help with the decision is to state your position as someone who wants it in the discussion in the devlist and argue for it. I think I was the only one (from those speaking so far) who thinks that we should add tests and full support.

For me the current status quo is "we do not support MariaDB". Full stop. Even though we know there are many of our users who use MariaDB, they do it fully on their own - and I am going to mostly close/convert into discussion any issues where MariaDB is mentioned and it will look like "maybe a a database issue" (this is very much consequences of the status quo). At least that's my understanding of it (though I think it woudl be better for our users to support it).

And just to elaborate a little why I take this 'stance':

I believe eiether we say "we support MariaDB" - and add tests, or we say "we do not support it" when we have no tests.

I think there is no middle-ground and "best-effort" when we (as community) do "no effort" to make sure the support is there, and especially no effort to make sure it is not broken in the future (this is what tests are really about).

We are adding new features and there was at least one case where lack of certain DB feature caused change of direction of the implementation (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-42%3A+Dynamic+Task+Mapping) - big part of the implementation was changed because MySQL does not support big indexes. And without tests we have no guarantee that we will not hit similar limitations of MariaDB in specific versions, especialy that MariaDB versioning is very different from MySQL and some features are appearing with various versions and different cadence. So currently we are not even able to say which version of MySQl our users could use.

So for me at least status quo is "we have no tests" which is equivalent to "we do not support it". I was not able to gain enough support for "supporting" MariaDB, so without more support and discussion on the devlist, the status quo is "we do not support MariaDB".

But I think if you could chime in from the user perspective and say how much you would like - and why - have support for MariaDB on the devlist, might encourage others to speak and possibly support it or at least revive the discussion. Maybe if we know how many of our users are actually using MariaDB could help (but I think for that we would need to wait for our annual survey - I am definitely planning to add such question for "MySQL" users) - the case would be stronger, but I have not enough data to back my "feeling" that we have a lot of users using MariaDB - possibly even more than MySQL users.

@potiuk
Copy link
Member

potiuk commented Nov 22, 2021

The link to discussion is here:
Here is the thread link: https://lists.apache.org/thread/dp78j0ssyhx62008lbtblrc856nbmlfb

@github-actions
Copy link

github-actions bot commented Jan 7, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 7, 2022
@github-actions github-actions bot closed this Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tests suite for MariaDB 10.6+ and fix incompatibilities

7 participants