Skip to content

Conversation

@aaronabraham311
Copy link
Contributor

closes: #27714

Note: we found that the restored recent config isn't automatically formatted because CodeMirror doesn't support that anymore. We could add js-beautify but we were unsure if we should add this library for this singular fix. It would also only work for ES10. If we want more backwards compatability, let us know and we can re-write.

cc: @jh242

@boring-cyborg boring-cyborg bot added area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues labels Nov 20, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 20, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@ephraimbuddy
Copy link
Contributor

If you don't mind, you can add before and after pics

@jh242
Copy link
Contributor

jh242 commented Nov 21, 2022

Note that when a recent configuration is selected, it is not automatically formatted as CodeMirror no longer supports this functionality. Another library can be introduced to remedy this (something like js-beautify) but seems unnecessary.

I don't have any before images, but the only difference is that the "Recent Configurations" input is not there.

Please see these after images:

Screenshot 2022-11-21 at 11 09 55 AM

Screenshot 2022-11-21 at 11 10 14 AM

@uranusjr
Copy link
Member

What happens if the previous config is big? I wonder if we should simply show a datetime or sorts for each saved entry instead.

Also do we need a way for the user to clear saved configs?

@pierrejeambrun
Copy link
Member

Is it relevent to see 'previous config' of all other dag ? Should this be on a per dag basis ?

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 22, 2022

Also wondering if the localstorage is the correct place for this. Storing all configs of each manual run for each dag could be pretty heavy. (localstorage is limited to 5M). Also loading all of this on the page could be quite long (storage is synchronous if i am not mistaken so reading all of that from it could be expensive on a UI point of view)

@aaronabraham311
Copy link
Contributor Author

aaronabraham311 commented Nov 22, 2022

Also wondering if the localstorage is the correct place for this. Storing all configs of each manual run for each dag could be pretty heavy. (localstorage is limited to 5M). Also loading all of this on the page could be quite long (storage is synchronous if i am not mistaken so ready all of that from it could be expensive on a UI point of view)

@pierrejeambrun Do you have an alternative suggestion of storing recent configs asides from localstorage? We could also do session storage?

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 23, 2022

@aaronabraham311, one other obvious way would be to persist them in the db, and expose them through the rest api. (In this case this is templated via FAB so providing them to the template is enough I believe)

This requires more effort to implement, and I am not sure all db support json type and if this would be more suitable. This is just an idea, wondering what others think.

@jh242
Copy link
Contributor

jh242 commented Nov 24, 2022

I disagree with the db/API endpoint idea. I think it's too much overhead for a minor feature that seems intended to save some time for DAGs with small configs that the user didn't expect to run multiple times. I feel like the ability to copy/paste configurations from here also overlaps this feature. My suggestion is that we can either save recent configs in session storage by DAG, or in local storage but limited to a certain number of total recent configs across all DAGs.

Additionally, Aaron and I are on a bit of a short timeline and likely won't have time to implement a backend-supported version of this feature, but if that's the direction we really want to go, we can get started on it and see where it goes.

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 24, 2022

Now that you mention it, DagRun conf are already stored in the database and available for each run. Isn't it easier to just retrieve them and provide them to the trigger.html template to populate the drop down directly ? We also have a lot of control of what confs should be retrieved (for instance the 5 most recent conf for a specific dag).

I feel like there is no need to persist this also on the client side.

@aaronabraham311
Copy link
Contributor Author

@pierrejeambrun Oh that's great! Is there any example on how to access the DagRun configs from the db? Or is there a code snippet that we can use as an example?

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 24, 2022

@aaronabraham311 There are a lot of examples querying resource from the db in the views.py file. In this case DagRun should have what you need. In the example you mentioned above (json config displayed in the dagrun details), its coming from the grid_data view of the same file :)

@jh242
Copy link
Contributor

jh242 commented Nov 26, 2022

@aaronabraham311 There are a lot of examples querying resource from the db in the views.py file. In this case DagRun should have what you need. In the example you mentioned above (json config displayed in the dagrun details), its coming from the grid_data view of the same file :)

I've implemented this approach and limited the number of requested recent DAG runs to 10... Because there may be duplicate recent configs this means we won't necessarily get 10 different recent configs. Take a look!

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 27, 2022

Hello @jh242,

Thank you for the update :)

Here you are loading the grid_data info (which has a lot of extra stuff on task instances etc.), just to retrieve the 'configs'. As you said you could end up with less than 10 confs (because of duplicate), or even with no conf at all if the last 10 run were scheduled.

Here we statically render a template, there is usually no need to go through an API call. What you could do is, inside the view that renders the trigger.html, make your custom request to the db to retrieve the conf you like. (Filter on confs that are not of type json, deduplicate them with a distrinc(), fetch only the last runs that were 'manually triggered', with a conf not None) etc. When you have your configurations, you can pass this as an extra variable to the trigger.html template. Then update the template so it can use this variable to render extra options in the drop down.

You can check the def index (/home) view and check how we directly pass variable to the template for rendering. (dags, tags etc).

Maybe I missed something, but this should make the code even cleaner and simple :)

@jh242
Copy link
Contributor

jh242 commented Nov 27, 2022

Thanks @pierrejeambrun , that's an excellent point. My latest commit implements your suggestion.

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, tested locally and seems great.

I made a few suggestions. Can you also rebase your branch please ? :)

@bbovenzi do you mind providing some feedback on this one, as you already handled confs in the grid_data view. :)

Copy link
Member

@pierrejeambrun pierrejeambrun Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid manually replacing the quotes, you can json.dumps(conf). See what is done in the grid_data endpoint, and my comments bellow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think empty objects are relevent here. (also update the is not None syntax)

Suggested change
DagRun.conf is not None,
DagRun.conf.isnot(None),
DagRun.conf != {}

Copy link
Member

@pierrejeambrun pierrejeambrun Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conf are not always json compatible. I believe they can be plain str. check def encode_dag_run func. We may want to also handle this case. Maybe we can extract a reusable function that will handle the conf the same way that is done in encode_dag_run. (json.dumps if possible, otherwise str, and also returns conf_is_json, then we skip the conf that are not json in the response)

@jh242
Copy link
Contributor

jh242 commented Nov 28, 2022

Hey @pierrejeambrun , thanks so much for your guidance so far. I've implemented your suggestions, including the reusable function for checking if dagRun configs are JSON or not. I'm still a little bit confused on serializing the response from my query as I haven't worked with Flask before and I wasn't sure how to test a string dagRun config, so that may need more attention.

@aaronabraham311 aaronabraham311 removed the request for review from ryanahamilton November 29, 2022 18:29
@aaronabraham311
Copy link
Contributor Author

Hi @pierrejeambrun @bbovenzi! We have addressed your requested changes. Please let us know if we can begin merging this! We would like to merge this in by end of this week if possible as we won't be able to push this through afterwards.

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me. I would like to get a second pair of eyes on this one. A few nits on my side and I should be ready :)

.setSize(null, height);

function setRecentConfig(e) {
document.querySelector('.CodeMirror').CodeMirror.setValue(e.target.value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should try to format the config JSON so it doesn't appear as a single line

function setRecentConfig(e) {
  let { value } = e.target;
  try {
    const json = JSON.parse(value);
    value = JSON.stringify(json, null, 2);
  } catch (err) {
    console.err(err);
  }
  document.querySelector('.CodeMirror').CodeMirror.setValue(value);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey yeah! I wanted to do this as well, but code formatting is a function that CodeMirror no longer supports. The other alternative is adding another JS library just to do this formatting, but I don't think we need this overhead (let me know if we already have one that can do it!).

The other alternative is writing a bespoke function to do JSON formatting but I didn't think it was worth the trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function above should do it. We just need that try/catch block to capture thrown errors when it is not valid json.

)
.order_by(DagRun.execution_date.desc())
.limit(5)
.all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this to the first 5 unique configs?
If we can now select old configs, a lot of the options would just be the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yep! Somehow that slipped my mind.

Copy link
Contributor

@jh242 jh242 Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit of an SQLalchemy noob--I thought that dropping in a .distinct after (or before) the .order_by clause would produce what we needed, but it doesn't seem to be working and I'm not sure why. Could I get a bit of guidance on this @bbovenzi ? Thanks in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .distinct(DagRun.conf) works?

Copy link
Contributor

@jh242 jh242 Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this and it did not. Removing the .order_by made it work though. Works now!

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 30, 2022

@jh242 Can you also fix the failing test. The extracted function get_dag_run_conf is not exactly doing what was done before directly in the encode_dag_run. I recommend keeping the same structure.

(There is a difference if the dagrun config is an empty object)

https://github.com/apache/airflow/actions/runs/3577867703/jobs/6017872438

@jh242
Copy link
Contributor

jh242 commented Nov 30, 2022

@pierrejeambrun @bbovenzi My latest commit should address all of the above issues! Thanks so much again guys. Would really appreciate getting this merged before Friday if possible--we're contributing as part of a university course and it would be great to have something to show for our efforts :D

EDIT: Hey, so looks like DISTINCT ON isn't even supported on any DB other than MySQL, so I'm going to try to reformulate the query using group_by

Copy link
Contributor

@bbovenzi bbovenzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Thanks for being patient with all the requested changes.

@jh242
Copy link
Contributor

jh242 commented Dec 1, 2022

updated our branch to be up-to-date with airflow. Thanks again @bbovenzi @pierrejeambrun :D

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ! Thank you for this addition 🎉

@bbovenzi bbovenzi merged commit e2455d8 into apache:main Dec 1, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Dec 1, 2022

Awesome work, congrats on your first merged pull request!

@ephraimbuddy ephraimbuddy added this to the Airflow 2.6.0 milestone Jan 9, 2023
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues type:new-feature Changelog: New Features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-use recent DagRun JSON-configurations

6 participants