Add a migration script tooling support to migrate off mssql #35861
Conversation
I am not going to block it in this form - but - as explained in a number of messages - I have a huge concern about making it part of official Airflow tooling. The problem is that if any issues are found that need to be fixed, we will have to RELEASE fixes to it. Users are not able to modify and manipulate the export file itself, so any problems with the migration will come to us as issues to be solved, and the users' expectation will be that we release a fix. Currently we do not have a mechanism to do so. We release stuff in providers, Airflow, and the Helm chart, and we have a formal process for doing so. Once we make this part of any of these, users will expect us to release a .1, .2 version or whatever in case there are issues to be fixed.

There are many problems if we try to support this as an "official tool" released by the Airflow PMC (and having it in our repo will make people expect that). For example, what happens when a new version of the sqlite library gets released that we will have to support? By the time we release Airflow 2.8 or 2.9, our DAG objects will be different from those in 2.7.3. So what should we do if we make it part of Airflow? There will be implicit expectations that this tool should evolve together with Airflow. So eventually we will have to add --version-from and --version-to and all the rest, because users will be confused: Is this script working with the current version of Airflow? Is this script only supposed to be used on 2.7.3? What if I take the version of that script released in 2.9 and run it on my "2.7.3" database - should I expect it to work? Should I get my target Airflow DB migrated to the latest version before using the sqlite dump to import the data? What happens if someone is on Airflow 2.5.3 with MSSQL - can they migrate without upgrading to 2.7.3? Will it still work? Etc., etc.

And most importantly - should we support issues and questions from users when, for whatever reason, the sqlite export stops working (for example because they use sqlite 9.13 (imaginary), released 2 years from now)? Should we release new versions of that script when sqlite 9.13 gets released? I'd love to avoid all those questions and issues, to be honest. That's why I am hesitant to take it in this form and as part of the "airflow" repository.

PROPOSAL: I am quite OK with having this as an MSSQL migration tool for 2.7.3 -> Postgres/SQLite 2.7.3 ONLY. Nothing else. No generic migration tool. We should limit it to specific versions of sqlite, Airflow, and whatever external dependencies are needed; ideally we should have a requirements.txt with fixed versions of everything needed to run it. AND we should have it outside of the airflow repo. It's very easy (I can do it in minutes) to create a separate repository. I think then I would even be OK with using a sqlite DB as the intermediary form, rather than a text file.
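To make the "fixed versions of everything" idea concrete, a fully pinned requirements file for such a one-off tool might look like the sketch below. The package names and version numbers are placeholders for illustration - the tool's actual dependency list would be decided in the new repository:

```
# requirements.txt for the one-off migration tool (illustrative pins only)
apache-airflow==2.7.3   # the ONLY Airflow version the tool targets
SQLAlchemy==1.4.50      # DB access layer, pinned to a known-good release
pymssql==2.2.8          # MSSQL driver (hypothetical choice of driver)
```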
I wonder what others think about it - especially those who were in Airflow when we migrated to Airflow 2 from 1.10.15. And just to add a bit more context: we did release https://pypi.org/project/apache-airflow-upgrade-check/ when we started the Airflow 1.10 -> Airflow 2 migration. And that was not even a migration tool; it was a checker that verified whether you were ready to migrate. My concerns are heavily based on the experience we had then. The more we limit and constrain the users, and the less of a "black" or even "gray" box our migration tool is, the better for any kind of issues. If we RELEASE something that looks like a ready-to-use CLI tool, users will simply expect it to work as a generic tool. The more we make it a "do-it-yourself" solution, the more likely it is they will look inside and fix it, knowing that this is what they are supposed to do.
Hi @potiuk - yes, maybe this is a misunderstanding. I did NOT want to make this script a released piece. If we see we need it in other cases we could leverage it, but for now it is intended solely to support the MSSQL migration. If you make another repo, I am fine using this PR just for review purposes and dropping the file in another location (or doing the review on the other side).
And... Fantastic that you are doing it! Thanks for that!
I see no problem with that. |
I think we never said we will maintain and release the scripts. We discussed that we should tell the users how to do it, but not that it will become part of standard Airflow tooling and that we will keep it maintained.

Let me explain why I think it should be in a separate repo. I personally think we should never have code in our main airflow repo whose dependencies we are not planning to update regularly. This opens us up, for example, to CVE/security vulnerabilities. There are already CVEs/vulnerabilities reported to us about code that merely sits somewhere in our repo as an "example". This basically means that we are expected to update to newer versions of dependencies in the future, because some 3rd-party dependency might have a security issue. For example, if we add a dependency for this script, we will be expected to keep it patched. And at the very least we will have to respond to those reports and explain; most likely, if more people complain, we will have to do something about it - publish VEX information stating "this is only a tool that we are not really maintaining with Airflow any more".

Having a separate repo with a very clear README explaining its purpose, and stating that this is only an example/one-time-conversion tool that users are going to use and modify on their own as needed, makes it much less likely that such reports will be raised - especially if we just write a blog post about it in the Airflow publication on Medium and don't link to it from our documentation. And I think it is quite fair from our side as well: we will provide a way for our users to migrate, but we won't commit to maintaining it in the future. Which we can also say clearly - the longer you delay, the more "future" problems YOU will be dealing with; it's your choice, dear user. I think it's an assertive and fair way of communicating with our MSSQL users.

The more explicitly we do it now, and the more "separate" it is from the "airflow" code, the easier (and fairer) it will be to tell the users who come to us 2 years from now saying the migration script does not work that they are on their own, because they did not do what we advised. They had a chance to do it when our eyes and hands were on it, but 2 years later our eyes and hands will be looking elsewhere. This is something we have a bit of a problem with now - we have users who come to us 3 years after 1.10 reached end-of-life. They see our docs (if they look at them at all), they see the released "migration check", and if it does not work (for example because a setuptools release broke it) they come to us to "fix" it. We refuse, of course, but I would like this migration effort to make it very, very clear from day one that it is quite a bit their problem if they delay the migration. I do not want to give users the impression that we are going to support them 3 years from now when they come and say "this script does not work any more".
+1 to keeping this separate from the Airflow repo, considering the added maintenance time it would cost and, of course, the CVE issues.
Migrated script to https://github.com/apache/airflow-mssql-migration |
Add a migration tool in order to deprecate MSSQL support in Airflow 2.8.0
I placed this in the `/scripts/tools` folder for now, as it is not something targeted (yet) to be supported in main Airflow core. Nevertheless, given its nature, it could be made into an `airflow db ...` command. I am open to where we store it (for the moment).
How to test?
1. Copy `migrate_script.py` into a place where the Airflow Python env is available.
2. Run `python migrate_script.py --extract` and see that a `migration.db` SQLite file is built in your current folder.
3. Copy `migration.db` and `migrate_script.py` to a test environment with Airflow 2.7.3, stop the Airflow processes, and execute `python migrate_script.py --restore`. Before hitting enter, ensure no valuable data is contained, as it will wipe the DB content before copying!

I tested with:
There might be other data "in the wild", and there will probably be "glitches" as more situations are encountered. Feedback welcome! (A rough sketch of the extract/restore flow follows below.)
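For reviewers who want the general shape of the flow without opening the script: below is a minimal, hypothetical sketch of the `--extract`/`--restore` pattern described above, using SQLAlchemy reflection. This is NOT the actual `migrate_script.py` - table ordering, type mapping, and error handling in the real script are necessarily more involved:

```python
# Hypothetical sketch of the --extract/--restore flow (NOT the real
# migrate_script.py). Assumes SQLAlchemy 1.4+ and a configured Airflow.
import argparse

from sqlalchemy import MetaData, create_engine

SQLITE_URL = "sqlite:///migration.db"  # the intermediary dump file


def copy_tables(source_url: str, target_url: str, wipe: bool = False) -> None:
    """Reflect every table from the source DB and bulk-copy its rows."""
    src = create_engine(source_url)
    dst = create_engine(target_url)
    metadata = MetaData()
    metadata.reflect(bind=src)      # discover the schema on the source
    metadata.create_all(bind=dst)   # create missing tables on the target

    with src.connect() as read_conn, dst.begin() as write_conn:
        if wipe:
            # Mirror the "wipes the DB content before copy" warning above:
            # clear target tables first, children before parents (FKs).
            for table in reversed(metadata.sorted_tables):
                write_conn.execute(table.delete())
        for table in metadata.sorted_tables:  # parents first for inserts
            rows = [dict(row._mapping) for row in read_conn.execute(table.select())]
            if rows:
                write_conn.execute(table.insert(), rows)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="One-off DB migration helper (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--extract", action="store_true", help="dump Airflow DB -> migration.db")
    group.add_argument("--restore", action="store_true", help="load migration.db -> Airflow DB")
    args = parser.parse_args()

    # Reuse the connection string of the local Airflow installation.
    from airflow.configuration import conf

    airflow_url = conf.get("database", "sql_alchemy_conn")
    if args.extract:
        copy_tables(airflow_url, SQLITE_URL)
    else:
        copy_tables(SQLITE_URL, airflow_url, wipe=True)
```

In practice the real script also has to handle dialect-specific column types and identity/sequence values when moving between MSSQL, SQLite, and Postgres - exactly the kind of per-version detail that motivates pinning the tool to Airflow 2.7.3 only, as argued in the discussion above.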