Skip to content

Version 0.0.2#13

Merged
dcolombara merged 51 commits intomainfrom
dev
Mar 7, 2025
Merged

Version 0.0.2#13
dcolombara merged 51 commits intomainfrom
dev

Conversation

@dcolombara
Copy link
Copy Markdown
Contributor

apde.chi.tools v0.0.2: Major Enhancements and Bug Fixes

Overview

This PR implements significant improvements to the CHI package, including new functions, bug fixes, performance enhancements, and improved documentation. The changes focus on stabilizing core calculations, enhancing metadata generation, and improving QA processes.

Major Changes

New Functions

  • Created chi_generate_analysis_set for standardized analysis dataset preparation
  • Created chi_compare_estimates for result validation
  • Added option to generate trend templates for specific indicators
  • Implemented Wilson Score for confidence intervals with level reporting

Refactoring and Improvements

  • Renamed proto_chi_calc to chi_calc and significantly enhanced functionality
  • Renamed chi_sql_update to chi_update_sql for clarity
  • Added non_chi_byvars parameter to chi_calc for flexible aggregation
  • Updated QA functions for ACS data and RSE issues

Bug Fixes

  • Fixed several issues in chi_calc
  • Fixed contradictory warnings/errors with race3 categorization
  • Fixed issues with global variable declarations

Documentation

  • Added initial vignette for Prevalence/Proportion calculations
  • Improved function documentation with roxygen2 headers
  • Enhanced README

Maintenance

  • Added basic testthat tests
  • Updated NAMESPACE with proper imports
  • Properly declared global variables in a single file / location
  • Improved metadata generation process
  • Fixed function namespace qualification issues

Version

Bumped to version 0.0.2

 - drop VignetteBuilder: knitr b/c no vignettes to build
 - added future and future.apply to imports
 - replace remote install of rads.data with rads b/c rads is needed too
   and auto installs rads data
 - removed self reference to github::PHSKC-APDE/apde.chi.tools
 - changed role = 'auth' to role = 'aut'
 - tweaked the Description paragraph, which must be in complete sentences
 - specify namespace for substrRight, calc,chi_cols,
   compare_estimate, get_population,
 - added @importFrom for crossing & substrRight
 - replace .() with list()
 - added trend.span parameter that was missing
 - name space updated by devtools::document() due to importFrom
 - to avoid warnings about missing visible bindings for global variables
- chi_generate_trend_years does not exist for testing
 - added year.span as explicit argument to chi_generate_tro_shell()  and test
 - replaced `span` (undefined var) with `trend.span) in chi_process_trends
- Replaced `assign()` calls with local variable assignments within `chi_calc`
  function to improve encapsulation and avoid unintended
  side effects in the global environment
- Simplified code for better readability and maintainability
- Removed unnecessary cleanup of global variables (`rm()`)
- Eliminating globals within package functions is important
  for ensuring predictability, avoiding conflicts,
  and maintaining clean, self-contained functions
 - validate_yaml_data >> tsql_validate_field_types
 - updated filepath to CHI standards
 - update roxygen2 header
 - added chi_get_yaml & chi_get_cols to streamline code
 - updated references to chi_qa >> chi_qa_tro
 sql_clean deprecated in rads
 - replace validate_yaml_data with tsql_validate_field_types
 - replace import and refernce to imported YAML with
   get_chi_yaml and get_chi_cols
 - updated / improved header
 - based on email from Joie on 2/18/2025
- BUG FIX: arguments for where statement in future_lapply
  rads::calc were out of scope. Fix was to subset ph.data
  directly using data.table/dtsurvey syntax
- added progress bar for future_lapply
- simplified syntax within future_lapply
- eliminated creation of columns Joie no longer wants:
  comparison_with_kc, significance, time_trends
- input validation
- switch to using na.omit rather than setdiff(X, NA)
- better roxygen2 header
 - improved headers
 - added section compare cat1*/cat2* with rads.data::misc_chi_byvars
 - chi_est >> NOW CHIestimates
 - chi_meta >> NOW CHImetadata
 - removed declaration of globals (now in separate file)
 - dropped all ref to `comparison_with_kc`
 -
 - as always, so devtools::check() plays nicely with data.table syntax
 - BUG FIX in error check for server argument values
 - BUG FIX in error check for CHIestimates and CHImetadata:
   both checked if exists, but always existed bc NULL.
   Now checks if NULL
 - Eliminated asking if want to create a table that doesn't exist.
   Replaced with a Warning because the need
   for a reply broke the esting and would also
   break if used in a long data processing piepline
 - Improved headers
 - improved header
 - added input validation
 - Limit the available years in metadata to 10
   years, since trends are only for 10 years.
 - Added link to reference docs
 - Made bullet point list
 - no functional change
- New code to create the analysis_set if it it does not exist
- Output is used by chi_generate_tro_shell
 - used to identify differences between two
   datasets (typically the current year's and
   last year's), in order to identify notable
   changes
 - most new ones use contents of testthat/helper.R
 - improved tests for chi_generate_tro_shell
 - created a dummy test for chi_process_trends
   > this can be updated in the future, but I
   wanted to pass all tests.
 - updated for new functions and function revisions
 - will make separate vignetted for rates
 - put in /quarto_docs rather than /vignettes
   bc vignettes expects files that can be
   rendered during package builds. Currently,
   RMD but not QMD can be used that way.
dcolombara and others added 21 commits February 21, 2025 11:44
 - previously said it woudl not load the table
   if it did not previously exist. Now it
   states that a new table will be made
 - improved details in header
 - removed ignore_trends argument because Joie
   officially asked that we no longer have the
   time_trends column
 - added tweaks to deal with the non-standard
   nature of birth data race/ethicity
 - changed name of code for chi_calc and
   chi_compare_estimeates & which resulted
   in inconsequential changes to their helpfiles
 - specifying a minimum version
   caused devtools::check() to fail
 - Joie changed default to 90% CI, and good
   to have option to change on demand from 90%
Necessary changes to pass devtools::check()
 - before created race3_hispanic when needed
   but stopped if there was race3_hispanic.
   It was non-sensical. Now gives informative
   messages
 - chi_calc checks to make sure that all CHI
   cat1_varname and cat2_varname are coded
   as would be expected based upon rads.data::
   misc_chi_byvars. However, there are times
   we have non-CHI variables that are created
   alongside CHI (for example for BSK, COO, etc.).
   This allows the function to maintain strict
   standards for CHI variables but with
   flexibility if we are producing non-CHI
   estimates
 - previously gave empty rows for trends when there were no real trends to calculate
 - previously if there were any indicators that had trends,
   then rows for trends would be created for all variables.
    Now it will only create trend rows for specific
    variables that should have trends calculated
 - Wilson score gives better CI when 0% or 100%
 - Needed to add level because when using factor
   variables in calc, will get estimates for each
   level and need a way to filter out undesired
   levels from CHI output
 - ACS does not need to have cat1_varname, cat2_varname, numerator, and denominator
 - RSE is legitimately NA (actually undefined)
   when numerator is zero and therefore the
   result would be zero, which puts zero in
   the denominator for calculating RSE
 - properly deal with times when numerator == 0
 - Wilson socre no longer depends on se == 0, but only that result is 0 or 1
  - caution flag now noted when numerator == 0 (in addition to RSE > 30)
 - now follows standard naming convention in this pacackage:
  chi _ verb _ noun
 - now allows a specific indicator to be missing a
   specific bit of information. For example,
   in ACS
   medinc variable does not have latest_year_count
 - does not allow race4 to be 'Ethnicity' ...
   another of many tweaks
   to deal with insanity of race3/race4
 - previously only could read relevant tables
   from prod server
 - Now can use a data.frame/data.table in memory to make
   an anlysis set
 - allow for proper rounding based on whether
   integer, proportion, or rate
 - ensures there are no gaps in single years for trends
  - warns if there appears to be more than one distinct multi year 'year' value
 - streamlined messaging and warnings depdenent upon verbose argument
@dcolombara dcolombara self-assigned this Mar 7, 2025
@dcolombara dcolombara merged commit c55a5e2 into main Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants