- based heavily on rads::suppress
- main improvement is that it suppresses on denominators (not just numerators)
- also add caution when numerator == 0
- more validation
- more customizable arguments, but defaults are for CHI output
- many more tests
- passes all tests without errors or warnings in devtools::check()
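As a rough illustration of suppressing on either the numerator or the denominator, here is a minimal sketch. It is not the actual chi_suppress_results() code: the function name `suppress_sketch`, the 1 to 9 suppression range, the column names, and the "^" and "!" flags are all assumptions made for the example.

```r
# Hypothetical sketch; NOT the real chi_suppress_results() implementation.
# Assumed rule: suppress a row when the numerator OR the denominator
# falls inside [lower, upper]. Range, flags, and column names are assumptions.
suppress_sketch <- function(df, lower = 1, upper = 9) {
  in_range <- function(x) !is.na(x) & x >= lower & x <= upper
  flag <- in_range(df$numerator) | in_range(df$denominator)

  # Assumed suppression flag "^" for rows that must be hidden
  df$suppression <- ifelse(flag, "^", NA_character_)

  # Assumed caution flag "!" when the numerator is exactly zero
  df$caution <- ifelse(!is.na(df$numerator) & df$numerator == 0,
                       "!", NA_character_)

  # Compute the estimate first, then blank out the suppressed counts
  df$result <- ifelse(flag, NA_real_, df$numerator / df$denominator)
  df$numerator[flag] <- NA
  df$denominator[flag] <- NA
  df
}

# Toy data: row 2 has a small numerator, row 3 a small denominator,
# row 4 a zero numerator
example <- data.frame(
  numerator   = c(50, 5, 20, 0),
  denominator = c(1000, 1000, 8, 500)
)
suppressed <- suppress_sketch(example)
```

In this sketch a numerator-only rule would have let row 3 through, even though its denominator is small enough to be identifying.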
Getting an error when processing death data. Please see if you can reproduce it when working through this:
Trying a few tests but not conclusive yet. This doesn't happen if paring down the instruction set to just the last one called, so I'm guessing some variable is growing when it shouldn't be, but haven't figured it out yet. Given the size, this seems like a possible race condition where some query is being repeated too much (indefinitely, until it errors?).
The error:
- gets latest version from main on GitHub
- get local version
- compares and gives notice if you're behind
- passes all tests
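The compare step could look something like the sketch below. It is only an illustration: `is_behind` is a hypothetical helper, and the remote DESCRIPTION text is passed in as lines rather than downloaded, so no GitHub URL is assumed. In practice the lines would come from the DESCRIPTION file on main and the local version from utils::packageVersion().

```r
# Hypothetical helper: TRUE when the local version is older than the version
# declared in the remote DESCRIPTION text (supplied as character lines).
is_behind <- function(local_version, remote_description_lines) {
  con <- textConnection(remote_description_lines)
  on.exit(close(con))
  # read.dcf() parses the "Field: value" format used by DESCRIPTION files
  remote_version <- read.dcf(con, fields = "Version")[1, "Version"]
  # compareVersion() returns -1 when the first version is the older one
  utils::compareVersion(as.character(local_version), remote_version) < 0
}

# Toy DESCRIPTION content (package name and versions are invented)
remote_desc <- c("Package: examplepkg", "Version: 0.2.1")
if (is_behind("0.2.0", remote_desc)) {
  message("A newer version is available on GitHub.")
}
```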
- Implements request batching to minimize get_population database calls by consolidating similar queries; this can save substantial amounts of time
- Adds `is_chars` parameter to support CHARS data processing, where King County is defined by ZIP codes beginning with 980 or 981
- Refactors code into modular helper functions for maintenance and troubleshooting
- Fixes issue in process_template_row() function parameter reference
- Fixes the assign_geographic_crosswalks() helper function, which did not have catvarname passed as a parameter and was looking for it in the wrong scope. This fix allows for the output of pov200grp (which is defined by ZIP or block)
- Fixes error for 'race' where it looked for 'Race/ethnicity' for the cat#_group but should have looked for 'Race/Ethnicity' (capitalization matters!)
- Output structure is identical to the previous version of the function
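To make the batching idea concrete, here is a minimal sketch under stated assumptions. The template columns, the `batch_key` name, and `fake_query` (a stand-in for an expensive get_population call) are all hypothetical; the real function's batching key is not shown in this PR.

```r
# Hypothetical template: several rows need the same population query
template <- data.frame(
  row_id   = 1:5,
  year     = c(2020, 2020, 2021, 2021, 2020),
  geo_type = c("zip", "zip", "zip", "blk", "zip"),
  stringsAsFactors = FALSE
)

# Stand-in for an expensive database call; counts how often it runs
n_calls <- 0
fake_query <- function(year, geo_type) {
  n_calls <<- n_calls + 1
  data.frame(year = year, geo_type = geo_type, pop = 1000 * n_calls)
}

# Build one key per distinct query and run each distinct query only once
template$batch_key <- paste(template$year, template$geo_type, sep = "|")
batches <- unique(template[, c("batch_key", "year", "geo_type")])
results <- lapply(seq_len(nrow(batches)), function(i) {
  fake_query(batches$year[i], batches$geo_type[i])
})
names(results) <- batches$batch_key

# Each template row looks up its already-fetched result by key
pop_for_row <- results[template$batch_key]
```

Here five template rows trigger only three calls, because rows 1, 2, and 5 share a key.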
77f163a to d701f39
- separate helper functions into a distinct file
- pass only necessary subset of population data to process_template_row() to avoid memory problems
- allow for sequential processing if future plan not set
- improved helpfile with example of how to set up futures
- tested successfully with Ron's death ETL
Troubleshooting the memory issue in chi_get_proper_pop was not fun. It was humbling to realize how little I knew and know about futures. The gist is that futures are greedy and I kept passing way too much environmental information to each future.
It now works on the death code that you shared.
This is how I addressed the memory issue:
* I took all the helper functions out of the main function and saved them in a distinct file called chi_get_proper_pop_helpers.R. Apparently this helps with memory management because each function should only carry the arguments that we give it, rather than other environmental information. Even if this didn't help, I think it makes it a little bit easier to manage and troubleshoot the function.
* I now pre-filter the giant population data (all_population_data) that we got from multiple SQL pulls. I split it so that I have one dataset per row of pop.template. I then save these as discrete tables to a temp directory so that future_lapply will load and use only the data that it needs for that iteration, which keeps the amount of memory needed for each future relatively small.
* I also now set up STEP 3 (getting the population for each batched query from SQL) and STEP 4 (processing each row of pop.template) so that they run sequentially if `inherits(future::plan(), "sequential") == TRUE`, which is the default unless someone specifically sets the plan to multisession or multicore.
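The pattern in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in chi_get_proper_pop_helpers.R: the toy data, the `template_row` column, and `process_chunk` are hypothetical, and the future packages are used only if they are installed and a non-sequential plan is set.

```r
# Toy stand-ins for all_population_data and pop.template (values invented)
all_population_data <- data.frame(
  template_row = rep(1:3, each = 4),
  pop          = c(10, 20, 30, 40, 1, 2, 3, 4, 100, 200, 300, 400)
)
pop_template <- data.frame(template_row = 1:3)

# Pre-filter: write one small file per template row to a temp directory
tmp_dir <- file.path(tempdir(), "pop_chunks")
dir.create(tmp_dir, showWarnings = FALSE)
chunk_files <- vapply(pop_template$template_row, function(i) {
  path <- file.path(tmp_dir, paste0("chunk_", i, ".rds"))
  saveRDS(all_population_data[all_population_data$template_row == i, ], path)
  path
}, character(1))

# The worker receives only a file path, never the full table, so each
# future carries very little data with it
process_chunk <- function(path) {
  sum(readRDS(path)$pop)
}

# Use futures only when available and when a non-sequential plan is set
use_futures <- requireNamespace("future", quietly = TRUE) &&
  requireNamespace("future.apply", quietly = TRUE) &&
  !inherits(future::plan(), "sequential")

totals <- if (use_futures) {
  future.apply::future_lapply(chunk_files, process_chunk)
} else {
  lapply(chunk_files, process_chunk)
}

# Clean up the temporary chunk files
unlink(tmp_dir, recursive = TRUE)
```

To run the parallel branch, a plan such as `future::plan(future::multisession)` would need to be set before calling the code; otherwise the sketch falls back to plain lapply().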
As expected, it runs way faster on our laptops than on a VM because the laptops have more cores.
- updated from rads::suppress to apde.chi.tools::chi_suppress_results
- specify definitions of values when the numerator and/or denominator is NA or zero