-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Automatically save and allow restore of recent DAG run configs #27805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
|
If you don't mind, you can add before and after pics |
|
Note that when a recent configuration is selected, it is not automatically formatted as CodeMirror no longer supports this functionality. Another library can be introduced to remedy this (something like js-beautify) but seems unnecessary. I don't have any before images, but the only difference is that the "Recent Configurations" input is not there. Please see these after images: |
|
What happens if the previous config is big? I wonder if we should simply show a datetime or sorts for each saved entry instead. Also do we need a way for the user to clear saved configs? |
|
Is it relevent to see 'previous config' of all other dag ? Should this be on a per dag basis ? |
|
Also wondering if the localstorage is the correct place for this. Storing all configs of each manual run for each dag could be pretty heavy. (localstorage is limited to 5M). Also loading all of this on the page could be quite long (storage is synchronous if i am not mistaken so reading all of that from it could be expensive on a UI point of view) |
@pierrejeambrun Do you have an alternative suggestion of storing recent configs asides from localstorage? We could also do session storage? |
|
@aaronabraham311, one other obvious way would be to persist them in the db, and expose them through the rest api. (In this case this is templated via FAB so providing them to the template is enough I believe) This requires more effort to implement, and I am not sure all db support json type and if this would be more suitable. This is just an idea, wondering what others think. |
|
I disagree with the db/API endpoint idea. I think it's too much overhead for a minor feature that seems intended to save some time for DAGs with small configs that the user didn't expect to run multiple times. I feel like the ability to copy/paste configurations from here also overlaps this feature. My suggestion is that we can either save recent configs in session storage by DAG, or in local storage but limited to a certain number of total recent configs across all DAGs. Additionally, Aaron and I are on a bit of a short timeline and likely won't have time to implement a backend-supported version of this feature, but if that's the direction we really want to go, we can get started on it and see where it goes. |
|
Now that you mention it, DagRun conf are already stored in the database and available for each run. Isn't it easier to just retrieve them and provide them to the I feel like there is no need to persist this also on the client side. |
|
@pierrejeambrun Oh that's great! Is there any example on how to access the DagRun configs from the db? Or is there a code snippet that we can use as an example? |
|
@aaronabraham311 There are a lot of examples querying resource from the db in the |
I've implemented this approach and limited the number of requested recent DAG runs to 10... Because there may be duplicate recent configs this means we won't necessarily get 10 different recent configs. Take a look! |
|
Hello @jh242, Thank you for the update :) Here you are loading the grid_data info (which has a lot of extra stuff on task instances etc.), just to retrieve the 'configs'. As you said you could end up with less than 10 confs (because of duplicate), or even with no conf at all if the last 10 run were scheduled. Here we statically render a template, there is usually no need to go through an API call. What you could do is, inside the view that renders the You can check the Maybe I missed something, but this should make the code even cleaner and simple :) |
|
Thanks @pierrejeambrun , that's an excellent point. My latest commit implements your suggestion. |
pierrejeambrun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, tested locally and seems great.
I made a few suggestions. Can you also rebase your branch please ? :)
@bbovenzi do you mind providing some feedback on this one, as you already handled confs in the grid_data view. :)
airflow/www/static/js/trigger.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid manually replacing the quotes, you can json.dumps(conf). See what is done in the grid_data endpoint, and my comments bellow.
airflow/www/views.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think empty objects are relevent here. (also update the is not None syntax)
| DagRun.conf is not None, | |
| DagRun.conf.isnot(None), | |
| DagRun.conf != {} |
airflow/www/views.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conf are not always json compatible. I believe they can be plain str. check def encode_dag_run func. We may want to also handle this case. Maybe we can extract a reusable function that will handle the conf the same way that is done in encode_dag_run. (json.dumps if possible, otherwise str, and also returns conf_is_json, then we skip the conf that are not json in the response)
|
Hey @pierrejeambrun , thanks so much for your guidance so far. I've implemented your suggestions, including the reusable function for checking if dagRun configs are JSON or not. I'm still a little bit confused on serializing the response from my query as I haven't worked with Flask before and I wasn't sure how to test a string dagRun config, so that may need more attention. |
|
Hi @pierrejeambrun @bbovenzi! We have addressed your requested changes. Please let us know if we can begin merging this! We would like to merge this in by end of this week if possible as we won't be able to push this through afterwards. |
pierrejeambrun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me. I would like to get a second pair of eyes on this one. A few nits on my side and I should be ready :)
airflow/www/static/js/trigger.js
Outdated
| .setSize(null, height); | ||
|
|
||
| function setRecentConfig(e) { | ||
| document.querySelector('.CodeMirror').CodeMirror.setValue(e.target.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should try to format the config JSON so it doesn't appear as a single line
function setRecentConfig(e) {
let { value } = e.target;
try {
const json = JSON.parse(value);
value = JSON.stringify(json, null, 2);
} catch (err) {
console.err(err);
}
document.querySelector('.CodeMirror').CodeMirror.setValue(value);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey yeah! I wanted to do this as well, but code formatting is a function that CodeMirror no longer supports. The other alternative is adding another JS library just to do this formatting, but I don't think we need this overhead (let me know if we already have one that can do it!).
The other alternative is writing a bespoke function to do JSON formatting but I didn't think it was worth the trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function above should do it. We just need that try/catch block to capture thrown errors when it is not valid json.
| ) | ||
| .order_by(DagRun.execution_date.desc()) | ||
| .limit(5) | ||
| .all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to the first 5 unique configs?
If we can now select old configs, a lot of the options would just be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, yep! Somehow that slipped my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit of an SQLalchemy noob--I thought that dropping in a .distinct after (or before) the .order_by clause would produce what we needed, but it doesn't seem to be working and I'm not sure why. Could I get a bit of guidance on this @bbovenzi ? Thanks in advance!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think .distinct(DagRun.conf) works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried this and it did not. Removing the .order_by made it work though. Works now!
|
@jh242 Can you also fix the failing test. The extracted function (There is a difference if the dagrun config is an empty object) https://github.com/apache/airflow/actions/runs/3577867703/jobs/6017872438 |
|
@pierrejeambrun @bbovenzi My latest commit should address all of the above issues! Thanks so much again guys. Would really appreciate getting this merged before Friday if possible--we're contributing as part of a university course and it would be great to have something to show for our efforts :D EDIT: Hey, so looks like DISTINCT ON isn't even supported on any DB other than MySQL, so I'm going to try to reformulate the query using group_by |
bbovenzi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Thanks for being patient with all the requested changes.
|
updated our branch to be up-to-date with airflow. Thanks again @bbovenzi @pierrejeambrun :D |
pierrejeambrun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ! Thank you for this addition 🎉
|
Awesome work, congrats on your first merged pull request! |


closes: #27714
Note: we found that the restored recent config isn't automatically formatted because CodeMirror doesn't support that anymore. We could add js-beautify but we were unsure if we should add this library for this singular fix. It would also only work for ES10. If we want more backwards compatability, let us know and we can re-write.
cc: @jh242