Skip to content

try and dissuade people from specifying data_input in runcard#1009

Merged
scarrazza merged 5 commits into
masterfrom
doc-datafromfit
Dec 2, 2020
Merged

try and dissuade people from specifying data_input in runcard#1009
scarrazza merged 5 commits into
masterfrom
doc-datafromfit

Conversation

@wilsonmr
Copy link
Copy Markdown
Contributor

closes #1008

Unless somebody wants to make a matched_datasets_from_fits which honestly wouldn't be that difficult then I think the best we can do is tell people not to do this.

I fixed some errors with the runcards above as well

@wilsonmr wilsonmr requested review from RosalynLP and voisey November 18, 2020 13:27
@RosalynLP
Copy link
Copy Markdown
Contributor

@wilsonmr Can I just check I understand what is going on here:
The problem is that in group_dataset_inputs_by_metadata we create data_input, grouped by the metadata. But if the user specifies data_input in the runcard this overrides the grouping and means that every "group" is instead the whole of the data?

@wilsonmr
Copy link
Copy Markdown
Contributor Author

That's exactly it.

Normally you can't specify production rules as runcard keys (it raises a config error) but you can if you use the from_ mechanism

@RosalynLP
Copy link
Copy Markdown
Contributor

OK I understand, thank you - the bit I am not so sure about is the matched_datasets_from_dataspecs bit... this usesdata_inputwhich should already be grouped by metadata right...or is the issue where you have one fit which is from an old specification and one which is from a new one?

@wilsonmr
Copy link
Copy Markdown
Contributor Author

specifically the runcard data_input is "prioritised" over the data_input specified by each grouping so although the grouping function does the right thing (returns a list of dictionaries with data_input being a list of dataset_input s for the datasets in that group), when subsequent function take the data_input they take the one from the runcard (with all the data) not the one for that specific group as was intended.

You don't get this problem if you do experiments: {from_: fit} or dataset_inputs: {from_: fit} either

With matched_datasets_from_dataspecs the data isn't already grouped. It loops through dataspecs and for each spec it tries to produce the data_input which usually would require one of either experiments or dataset_inputs. In fact:

dataspecs:
 - speclabel: spec 1
   fit: <new fit>
   dataset_inputs: {from_: fit}
 - speclabel: spec 2
   fit: <old fit>
   experiments: {from_: fit}

works fine. However if we don't know what the fit will have (which is the case for now in compfits) then trying to get dataset_inputs or experiments from the fit will fail depending on whether it is an old or new fit respectively because that key simply doesn't exist in the fit runcard. As a work around in compfits I used data_input: {from_: fitinputcontext} which will work with either because it produces the data_input taking into account backwards compat however now whenever we use the namespace of either dataspecs to do a grouping it will run into issues for reason above. This isn't an issue for compfits because we only use dataspecs for looping over all datasets however if we had e.g dataspecs_chi2_table then it would be a problem.

In general somebody using validphys is rarely writing a runcard for which they don't know whether the fit is new or old, so they really should just not abuse this trick. I will try and make it clearer that there is no issue with matched_datasets_from_dataspecs the reason I mention it is I'm worried people will copy and paste from the compare fits template and end up using that where they shouldn't - which reminds me I also was going to put a comment in the runcard explaining not to do that either.

…ue of taking data input from fitinputcontext
@wilsonmr
Copy link
Copy Markdown
Contributor Author

I tried to make the docs a bit more clear what the issue is and I added warnings to runcards which correctly use the production rule (because subsequent actions don't use any grouping) is that better?

@RosalynLP
Copy link
Copy Markdown
Contributor

I tried to make the docs a bit more clear what the issue is and I added warnings to runcards which correctly use the production rule (because subsequent actions don't use any grouping) is that better?

I think this is better and the addition of the warnings are very good. I still had to read it through a few times to get my head around it so I'll have a think about whether there is something I could add to make it clearer still.

@wilsonmr
Copy link
Copy Markdown
Contributor Author

thanks! I'm also sure I made some grammatical errors.

The issue itself is kind of complicated and the TL;DR is don't put data_input: {from_: fitinputcontext} in the runcard unless you really have to and are sure it doesn't break something

@RosalynLP
Copy link
Copy Markdown
Contributor

I think part of the problem is that we want that point to come across really clear but there is a lot of technical detail in there which is bogging things down and obscuring it. Maybe we can just have something which states that TL;DR boldly and clearly, and then we have another paragraph bracketed off which is "technical explanation for why"?

@RosalynLP
Copy link
Copy Markdown
Contributor

Because for most people we just want them to be aware of what the outcome will be if they do put that, and it won't matter to them exactly why.

@wilsonmr
Copy link
Copy Markdown
Contributor Author

Yeah that's a solid point! Will take a look now

@wilsonmr
Copy link
Copy Markdown
Contributor Author

ready for review!

Copy link
Copy Markdown
Contributor

@RosalynLP RosalynLP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, I think it reads clearer now :)

@RosalynLP
Copy link
Copy Markdown
Contributor

Just to mention I restarted the mac build

Copy link
Copy Markdown
Contributor

@voisey voisey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few very minor suggestions on the text. Otherwise happy to merge

Comment thread doc/sphinx/source/vp/dataspecification.rst Outdated
Comment thread doc/sphinx/source/vp/dataspecification.rst Outdated
Comment thread doc/sphinx/source/vp/dataspecification.rst Outdated
Co-authored-by: Cameron Voisey <32741139+voisey@users.noreply.github.com>
@scarrazza scarrazza merged commit 8dc93f6 into master Dec 2, 2020
@scarrazza scarrazza deleted the doc-datafromfit branch December 2, 2020 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Using data_input: {from_: fitinputcontext} should be used with great care if at all

4 participants