Skip to content

Feedback from Nav on the Model Inputs section #62

@amyheather

Description

@amyheather

Input modelling

  • 1. Guess that the figure below is associated with the sentence “.. randomly sampling values from a probability distribution”. If so, the figure could probably me moved one line up. If however, you would like to associate the figure with the next sentence (selection of appropriate distributions), then perhaps we have different shapes?
Image
  • 2. Please include link to file NHS_synthetic.csv. From Amy: https://github.com/pythonhealthdatascience/des_rap_book/tree/main/data. I added the file to my github project.

  • 3. Distfit does not appear to be a standard python library (Python 3.13.5/Windows).

  • In Setup > dependency management you have already shown how a package can be added to the environment. I followed these steps. There is an opportunity here for the readers to test what they have learnt. Thus, this could be set as a task. --> Have add a prompt to users to install, mentioning to them that they will already have if they do the task on the environments page

  • P.S: Ignore comment 18 – I think the instructions are right as there may be several environments and so the des-example environment needs to be activated first and then prune command used.

  • 4. data = pd.read_csv("../../data/NHS_synthetic.csv") followed by data.head() will also output 5 X 6 columns. Thus, perhaps a line can be added to explain the purpose of dtype. I think use of str becomes clear in the next bit of code.

Image
  • 5. A sentence on Timedelta() could be added, specially in reference to the change of date.
Image
  • 6. I guess IAT = zero for patients arriving at the same time (note the count is 15K). I think it was mentioned it is synthetic arrival data; is it based on real-world data (nurse simuation)? Have changed label to say 0-1 (histogram bin, but had labelled with only lower end of the bin).
Image
  • 7. Section 3.2.1: “Unsurprisingly, the best fit for both is the exponential distribution (lowest test statistic)”. However, note the following sentence presented earlier in relation to KS Test. “Higher values indicate a better fit.” --> Oops! My mistake - should have said lower. Thank you for spotting!

  • 8. Section 3.2.2: “We can view a summary table from distfit”. Do we include print(dfit_iat.summary) and print(dfit_service.summary) to the code snippet?

Image
  • 9. Could the P-value be displayed as a normal number (between 0 and 1) rather than scientific notation? For example, 3.08e-264 could be presented as 0.0 [00s of zeros] 308.
    • Out of curiosity, I got the output below with .300f (p={ks_result.pvalue:.300f}) 😊 Kolmogorov-Smirnov statistic for expon: 0.0480 (p=3.08e-264) (p=0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003084810939513930484000969797305253965)
    • 0.00e+00 is just zero. This may provide the opportunity for readers to compare the output of test statistic and P-value. From the results, it appears that test static is lower (0.0480) for a higer P-value (3.08e-264) and for P-value = zero (0.00e+00), the test statistic is higher (0.1226).
Image
  • 10. Just a thought: I know we are using the nurse simulation. But just wondering, for the book, perhaps we can generate values that would allow the readers to relate to our explanation of KS included earlier? But it may not be necessary, as you already mentioned that small p-values are common in large datasets (your one has over 125,000 records) and, indeed, small and high are relative terms. --> Have add an explanation to make it clear that when p are small, as will often be the case, just to focus on test statistic, and would suggest leaving as is might be simplest, as aligns with provided example, and relatively simple to assume if p are there as well, you consider them both (but both will be pretty much telling the same story anyway).
Image
  • 11. Section 3.3: “Using the comprehensive approach, there were a few distributions that were all very low scores (pareto, expon, genextreme).”
    • Refer to point 7 above (earlier on in the document, in relation to the KS Test, it is mentioned that “Higher values indicate a better fit.”).

Input data management

  • 12. Input Modelling Code: The link (line 1) takes us to “input modelling” section (the previous section). Is it possible to hyperlink this to the section sub-section related to “input modelling” (e.g., where you have code on distribution fit). Basically the previous section is very long, so it might be better to have more targeted hyperlink (e.g., input modelling steps” (actually, even this sub-section is very long, not sure if subsections can be introduced on the right navigation bar?). --> Linked to section. In book, H2 headers are in the sidebar. These cannot change with toggling between python and R versions, so I keep H2 headers the same for both, but can have unique H3-5 headers for Python/R content (though unfortunately means those can't also be sidebar). To help with this page though, I have changed the headers from H3 to H2 (as they were consistent between Python and R anyway!)
Image Image
  • 13. The links below (further information) are great. Just wondering if some of the links could be “additionally” introduced earlier or link to take the reader to the further information? E.g., in the “checklist:managing your raw data” box, you have follow the FAIR principles. The link for this is at the end: https://open-science-training-handbook.github.io/Open-Science-Training-Handbook_EN/02OpenScienceBasics/02OpenResearchDataAndMaterials.html (which is great, but could it have been earlier as well)?

  • 14. Expected: some instructions on how to generate and provide a DOI. It would have been good for readers to work through an example. I think both Zenodo and GitHub has to be used in tandem for this (I have not done this).

  • 15. I really benefited from your tutorial on Github and have been routinely using it, also with my students! But in relation to “maintaining private and public version” – I found it difficult to understand how to actually get this to work, sharing subsets of files between public and private? (but this may outside the scope of this book). --> Redid to try and make clearer


Parameters from script

  • 16. It would be good to have a line on suppression of Pylint warning. --> NA: hid pylint, and removed some code
Image Image
  • 17. I found following code snippet a bit difficult to follow (were the highlighted brackets needed?):
Image

Is the code above doing he same as below: --> Yes - agreed that is clearer - changed as suggested

if arrivals is None:
    arrivals = create_arrivals()
if consultations is None:
    consultations = create_consultations()
if transfers is None:
    transfers = create_transfers()

Items 18 to 24

Items 18 to 24 are broadly not addressed - see comment below.

View items.
  • 18. Nurse visit simulation: It would be good to add a paragraph as to what the code below is doing in relation to the nurse simulation.

  • 19. Nurse visit simulation: The reader could be instructed to download simlogger.py from the repo and add to the project, and then execute the code.

  • 20. Nurse visit simulation: Perhaps we can slightly modify the code so there is some feedback? This will enable the readers to try out the code as they go along and make changes and learn also. For example, we can add a small code snippet as the one below.

Image
  • 21. Stroke pathway simulation: Similar to point 18 above, I think it would be good to describe what this part of the code is doing; the code is commented – excellent – and there is also the overall description (incl. flowchart) of both the models (in a different section), but I think we need something in between the really hight level (model description) and low-level (commenting of the code), for readers to engage. This could be just an introductory paragraph before the code block!

  • 22. Stroke pathway simulation: Instead of “simulation.logging import SimLogger”, using from simlogger import SimLogger, could have meant that those working through your book (and who might have added simlogger.py to their project – see comment 19), this code would work the first time. However, using simulation.logging is also an opportunity for learning.

    • Thus, the book can instruct the reader to first create a directory called “simulation” under their project.
    • Next, import (or copy and paste) the logging.py file from your repo to the directory “simulation”: https://github.com/pythonhealthdatascience/pydesrap_stroke/tree/main/simulation
    • Finally, execute the code.
    • P.S: Since I was already working under the package “simulation”, I created a subdirectory “simulationStroke” and then created logging.py under the new subdirectory. I think like Java, subdirectory = subpackage. Thus, I used the following import statement and it worked fine:
Image
  • 23. Stroke pathway simulation: Similar to point 20 above, to make the code provide some feedback, I added inbuilt string representation of object ( repl ) to the classes ASUArrivals, etc. (as Param is creating those classes only adding params.dict returns the address of the classes, rather than the values.

For example:

Image

Then:

Image

This outputs the value in the terminal:

Image
  • 24. Stroke pathway simulation: Several functions in the Params class are not used (check screenshot below). Perhaps the code above (in Main()) can be slightly extended to show it use? This way, the readers will have a better understanding of the stroke model.
Image

Parameters from file

  • 25. Incomplete sentence? “Though storing parameters within a script is commonplace - particularly for simpler models - due to its simplicity”.

  • 26. Create data dictionary: The link to input data management page (very long section), which explains data dictionary, could be precisely targeted, e.g., section hyperlinks. Any recommended format for data dictionary (considering archiving and version control)? I guess any word processor will do.

  • 27. example_parameters.csv is used later on in the code too!

Image

Link to des_rap_book/pages/inputs/parameters_file_resources/example_parameters.csv?

Item 28

Item 28 is not addressed - see comment below.

View item
  • 28. Stroke model example: I used the code for both jupyter and in script. Worked fine. Just one comment: Since the initialisation of the Classes contain the parameter values from “parameters from script example”, and since these classes are then being used in this block of code (“from simulation.parameters import”) AND the parameter values are exactly the same in parameters.csv, I changed the values and renamed the .CSV for the “parameters from file” examples, just to see the parameters from file are the once being used (rather than the initialisation values); It worked fine. I guess this can be used an example of parameter validation, when models have both initial values (from scripts) + code to read params from files.

Parameter validation

  • 29. Minor issue: For code blocks comparing code side-by-side, can the font for the code be adjusted for smaller screens? Else, we will have a scrollbar. --> Don't think this is possible, so have instead removed the side-by-side, and reduce some line lengths, to avoid scrollbars
Image
  • 30. Accidental creation of new parameters: This topic is obviously very important for any code written in a dynamically typed languages (not enforcing strict declaration of variables), and the depth to which it is written is excellent! However, could similar functionality be achieved using slots? We can then divide the section into basic section (slots) and an advanced section (as it is written now)?
Image

It is possible I may have caught the wrong end of the stick😊 So please check if slot can actually be used as example?

  • 31. Accidental creation of new parameters: I don’t think inheritance has been discussed earlier in the book together with the code (it has just been mentioned). The example is important, but only concern is its introduction at this point may disturb the flow. Inheritance (with code) could possible be included earlier with a different example. Again, this can be kept in the advance section. --> Add link to earlier page that mentions it, have add videos to that page to help make more beginner friendly, and then here, I have linked back to that page, as well as adding a video on metaclasses

  • 32. Validating parameter values within functions: I think it would be good to show that the variable value is being changed if the validation passes, else not. Otherwise the code is only generating a prompt. I have used a global variable below; there may be neater solutions, --> Made clearer that its an error in my description, and add print message for showing successful model run.

  • 33. Validating parameter values within classes: Minor issue: Same comment as 29. Can the font for the code block be reduced or some other way of getting rid of the scroll bars.

Items 34 to 38

Items 34 to 38 are not addressed - see comment below.

View items Image
  • 36. Example: Stroke example: Receiving an error in relation to the new module I added (logging). It appears there is a standard python library with the same name, so probably not able to find the class it wants as it first finds logging.py?
Image

Anyway, I solved the problem by renaming logging.py to loggingTest.py and changing the import statement to:

Image
  • 37. Example: Stroke example: In relation to comment 24 (“several functions in the Params class are not used”) now make sense. However, like I mentioned in comment 24, it would be good to have some simple code included, that show the application of these functions.

  • 38. Example: Stroke example: One final comment is perhaps we consider adding a small matrix (at the start of example section) which shows which parts of the tutorial are implemented in the code, for each of the examples.

Example code Initialisation - from script Initialisation - from file Validation - prevent accidental creation Validation - ranges
Nurse Yes No Yes No
Stroke Yes No No Yes

Metadata

Metadata

Assignees

Labels

peer reviewFeedback from peer review of the repo

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions