optimize chi_get_proper_pop function #27

Merged
dcolombara merged 5 commits into dev from get_proper_pop_redux on May 15, 2025

@dcolombara
Contributor

  • Implements request batching to minimize get_population database calls by consolidating similar queries, which can save a substantial amount of time
  • Adds is_chars parameter to support CHARS data processing, where King County is defined by ZIP codes beginning with 980 or 981
  • Refactors code into modular helper functions for easier maintenance and troubleshooting
  • Fixes a parameter reference issue in the process_template_row() function
  • Fixes the assign_geographic_crosswalks() helper function, which did not have catvarname passed as a parameter and was looking for it in the wrong scope. This fix allows for the output of pov200grp (which is defined by ZIP or block)
  • Fixes an error for 'race' where it looked for 'Race/ethnicity' for the cat#_group but should have looked for 'Race/Ethnicity' (capitalization matters!)
  • Output structure is identical to the previous version of the function
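
As a rough illustration of the request batching described in the first bullet, the sketch below collapses a template to its unique query-parameter combinations, runs one query per combination, and joins the results back to every row. It is not the package's implementation: `template`, `fake_query()`, and the column names are hypothetical, and `fake_query()` merely stands in for an expensive database call.

```r
# Hypothetical template: each row is one requested population estimate.
template <- data.frame(
  indicator = c("a", "b", "c", "d"),
  year      = c(2020, 2020, 2021, 2020),
  geo_type  = c("kc", "kc", "kc", "zip"),
  stringsAsFactors = FALSE
)

# Stand-in for an expensive database call (an assumption for illustration,
# not the real get_population function). It counts how often it is called.
n_calls <- 0
fake_query <- function(year, geo_type) {
  n_calls <<- n_calls + 1
  data.frame(year = year, geo_type = geo_type,
             pop = 1000 * nchar(geo_type) + year,
             stringsAsFactors = FALSE)
}

# Batch: keep one row per unique combination of query parameters.
batches <- unique(template[, c("year", "geo_type")])

# Run one query per batch instead of one per template row.
results <- do.call(rbind, lapply(seq_len(nrow(batches)), function(i) {
  fake_query(batches$year[i], batches$geo_type[i])
}))

# Join the batched results back onto every template row.
out <- merge(template, results, by = c("year", "geo_type"))
```

Here four template rows share three distinct parameter combinations, so the stand-in query runs three times rather than four; the saving grows with the number of rows that share parameters.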

@dcolombara dcolombara requested a review from rwbuie May 9, 2025 22:59
 - based heavily on rads::suppress
 - main improvement is that it suppresses on denominators (not just numerators)
 - also add caution when numerator == 0
 - more validation
 - more customizable arguments, but defaults are for CHI output
 - many more tests
 - passes all tests without errors or warnings in devtools::check()
@dcolombara dcolombara added the enhancement New feature or request label May 9, 2025
@rwbuie
Collaborator

rwbuie commented May 13, 2025

Getting an error when processing death data. Please see if you can reproduce it when working through this:
https://github.com/PHSKC-APDE/chi/blob/51d588b8ad80bbe05f87e62ee8fa0c74353868e5/death/2023%20prototyping%20apde.chi.tools/01_death_rate.qmd#L60-L344

Trying a few tests but nothing conclusive yet. This doesn't happen when paring down the instruction set to just the last one called, so I'm guessing some variable is growing when it shouldn't be, but I haven't figured it out yet. Given the size, this seems like a possible race condition where some query is being repeated too much (indefinitely until error?).

The error:
Progress interrupted by simpleError condition: The total size of the 35 globals exported for future expression (‘FUN()’) is 101.91 GiB. This exceeds the maximum allowed size 2.93 GiB per by R option "future.globals.maxSize". This limit is set to protect against transfering too large objects to parallel workers by mistake, which may not be intended and could be costly. See help("future.globals.maxSize", package = "future") for further explainations and how to adjust or remove this threshold The three largest globals are ‘process_template_row’ (12.74 GiB of class ‘function’), ‘assign_geographic_crosswalks’ (12.74 GiB of class ‘function’) and ‘create_demographic_shell’ (12.74 GiB of class ‘function’)
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) :
The total size of the 35 globals exported for future expression (‘FUN()’) is 101.91 GiB. This exceeds the maximum allowed size 2.93 GiB per by R option "future.globals.maxSize". This limit is set to protect against transfering too large objects to parallel workers by mistake, which may not be intended and could be costly. See help("future.globals.maxSize", package = "future") for further explainations and how to adjust or remove this threshold The three largest globals are ‘process_template_row’ (12.74 GiB of class ‘function’), ‘assign_geographic_crosswalks’ (12.74 GiB of class ‘function’) and ‘create_demographic_shell’ (12.74 GiB of class ‘function’)
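
One possible reading of the sizes reported above, offered as an assumption rather than a diagnosis: a function can appear very large when its enclosing environment holds a large object, because that environment travels with the function when it is exported. The base R sketch below shows the general effect using `serialize()`; `make_worker()`, `f_heavy`, and `f_light` are hypothetical names and are not part of the package.

```r
# A closure created inside a function keeps that function's environment,
# including any large objects that happened to live there.
make_worker <- function() {
  big <- numeric(1e6)   # roughly 8 MB of data the worker never uses
  function(x) x + 1
}
f_heavy <- make_worker()

# The same logic, attached to an environment that carries no extra data.
f_light <- function(x) x + 1
environment(f_light) <- baseenv()

# Serialized size approximates what would be shipped to another process.
size_heavy <- length(serialize(f_heavy, NULL))
size_light <- length(serialize(f_light, NULL))
```

Both functions return the same values, but `size_heavy` includes the captured `big` vector while `size_light` does not.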

 - gets latest version from main on GitHub
 - get local version
 - compares and gives notice if you're behind
 - passes all tests
@dcolombara dcolombara force-pushed the get_proper_pop_redux branch from 77f163a to d701f39 Compare May 13, 2025 23:47
 - separate helper functions into a distinct file
 - pass only the necessary subset of population data to
   process_template_row() to avoid memory problems
 - allow for sequential processing if a future plan is not set
 - improved help file with an example of how to set up futures
 - tested successfully with Ron's death ETL
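
The idea of passing only the necessary subset of population data to each row's worker might look roughly like the sketch below. It is a minimal base R illustration under assumed names: `pop`, `requests`, and `process_row()` are hypothetical and do not reflect the actual signature of process_template_row().

```r
# Hypothetical full population table.
pop <- data.frame(
  year     = rep(2020:2022, each = 2),
  geo_type = rep(c("kc", "zip"), times = 3),
  pop      = c(100, 10, 110, 11, 120, 12),
  stringsAsFactors = FALSE
)

# Hypothetical requests, one per template row.
requests <- data.frame(
  year     = c(2020, 2022),
  geo_type = c("kc", "zip"),
  stringsAsFactors = FALSE
)

# Hypothetical worker: it only ever sees the rows handed to it.
process_row <- function(request, pop_subset) {
  data.frame(request, pop = sum(pop_subset$pop))
}

# Subset first, then call the worker with just the matching rows.
results <- lapply(seq_len(nrow(requests)), function(i) {
  req <- requests[i, , drop = FALSE]
  needed <- pop[pop$year == req$year & pop$geo_type == req$geo_type, ]
  process_row(req, needed)
})
out <- do.call(rbind, results)
```

This version runs sequentially with `lapply()`; the point is only that the worker's input is cut down to the rows it needs before the call is made.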
@dcolombara
Contributor Author

dcolombara commented May 15, 2025 via email

- updated from rads::suppress to apde.chi.tools::chi_suppress_results
- specify definitions of values when the numerator and/or denominator is NA or zero
@dcolombara dcolombara merged commit e15f220 into dev May 15, 2025
@dcolombara dcolombara deleted the get_proper_pop_redux branch May 15, 2025 23:18