Pipeline: auto-documentation improvements#312

Merged
taimontgomery merged 18 commits into master from issue-311
Jun 1, 2023

AlexTate (Member) commented May 31, 2023

Auto-documentation for runs is now more complete and consistent. The config files used for repeating analyses are now stored separately from those intended as documentation, so users no longer risk losing auto-documentation info if they don't make a copy before preparing a resume run.

Config Files for Repeat Analyses

All four primary configuration files are copied to the root of the Run Directory, where they can be freely edited for resume runs without sacrificing auto-documentation. Previously, this was only done for the processed Run Config and Samples Sheet, while the Features Sheet and Paths File remained in place, where they could still be modified between runs. Paths are automatically adjusted so that these files represent a cohesive working configuration: references from one config file to another are converted to relative paths that point to the adjacent copies in the target Run Directory, and all other paths are converted from relative to absolute.
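The path adjustment described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `make_portable` and the `CONFIG_KEYS` set are hypothetical, and only the `samples_csv` and `features_csv` key names come from the commit notes.

```python
import os

# Keys whose values point at other config files. The two names below appear in
# this PR's commit notes; treating them as a fixed set is an assumption.
CONFIG_KEYS = {"samples_csv", "features_csv"}


def make_portable(paths, source_dir):
    """Return a copy of `paths` suitable for saving next to the other configs.

    Config-to-config references are reduced to their basename, since the copies
    sit side by side in the Run Directory. Every other relative path is resolved
    against `source_dir`, the directory of the original file.
    """
    portable = {}
    for key, value in paths.items():
        if not value:
            # Leave unset entries untouched
            portable[key] = value
        elif key in CONFIG_KEYS:
            portable[key] = os.path.basename(value)
        elif os.path.isabs(value):
            portable[key] = value
        else:
            portable[key] = os.path.normpath(os.path.join(source_dir, value))
    return portable
```

Resolving against the original file's directory rather than the current working directory is a design choice in this sketch; it keeps the result independent of where the pipeline was launched.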

Config Files for Auto-documentation

A new subdirectory, config, has been added to Run Directory outputs. It holds a copy of the four primary config files for auto-documentation only. During each resume run, a new timestamped config directory is created to hold copies of the config files that were used. If repeated analyses are performed, the outputs of the most recent analysis are now used; that is, if a replot run follows a recount run, only the most recent recount outputs are used for producing graphs.
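As a rough sketch of the timestamped snapshot idea, the following copies a set of config files into a fresh directory named after the time of the run. The function name `snapshot_configs`, the `config_` prefix, and the timestamp format are all assumptions made for illustration; the PR does not state the actual naming scheme.

```python
import os
import shutil
from datetime import datetime


def snapshot_configs(run_dir, config_files, when=None):
    """Copy `config_files` into a new timestamped directory under `run_dir`."""
    when = when or datetime.now()
    # Hypothetical naming scheme: config_<date>_<time>
    snapshot_dir = os.path.join(run_dir, "config_" + when.strftime("%Y-%m-%d_%H-%M-%S"))
    # Raises FileExistsError rather than overwriting an earlier snapshot
    os.makedirs(snapshot_dir)
    for path in config_files:
        # copy2 also preserves file metadata such as modification time
        shutil.copy2(path, snapshot_dir)
    return snapshot_dir
```

Because each snapshot is a copy, later edits to the files in the root of the Run Directory do not affect what was recorded for earlier runs.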

Backward Compatibility

Performing a resume run in an old-style Run Directory automatically converts it to the new layout. After conversion, the directory behaves just like a new-style Run Directory during subsequent resume runs. The conversion does the following:

  1. The four configuration files with cohesive paths are placed in the root of the Run Directory
  2. The existing processed Run Config is placed in a config directory. The other files are not included because they're likely to have been edited for other runs (due to the behavior described above). This is only a minor loss, since prior tiny-count output directories contain a copy of the employed Features Sheet, and both the Paths File and Samples Sheet had been absorbed into the processed Run Config.
  3. A timestamped config directory is also created to hold the resume run's four config files.

Closes #311

AlexTate added 18 commits May 26, 2023 12:17
…g` key in the config object before writing (but only if it is a run config based instance)
…during resume runs when run_native=True. Previously this only happened when run_native=False.

Also cleaning up the code in resume()
…he organize_config step and config_out_dir output are preserved.
…te the updated Run Config back to the same location in the root directory of the Run Directory. This allows configurations to carry over from prior resume runs. Prior versions are already preserved in the timestamped config directories.
… function saves a copy of the file in the root of the Run Directory with amended paths. The idea is that these files should represent a working, portable configuration. Paths in these files are modified to achieve this goal:

- SamplesSheet: input file paths are converted to absolute paths for portability.
- PathsFile: samples_csv and features_csv are converted to their basename since they will be adjacent to the Run Config. All other paths are made absolute for portability.
- FeaturesSheet: no modifications, simply copies itself.

Features Sheet parsing has been moved into the FeaturesSheet class. In addition to its original functionality, read_csv() now omits duplicate rules that differ only in hierarchy value.
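One way to omit rules that differ only in hierarchy value is sketched below. It is not the `read_csv()` implementation from this PR: the function name, the `Hierarchy` column name, and the choice to keep the first occurrence are all assumptions.

```python
def drop_hierarchy_duplicates(rows, hierarchy_col="Hierarchy"):
    """Keep the first of any rows that are identical apart from `hierarchy_col`.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    """
    seen = set()
    unique = []
    for row in rows:
        # Compare rows on every column except the hierarchy value
        key = tuple(sorted((k, v) for k, v in row.items() if k != hierarchy_col))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Keeping the first occurrence preserves the order in which rules were written; an implementation could equally keep the rule with the lowest hierarchy value.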
…t. The function setup_file_groups() has been removed because it has been unnecessary since the introduction of the SampleSheet class. The function absorbe_paths_file() has been absorbed into load_paths_config()

Additionally, load_paths_config() doesn't change the paths_* keys anymore. This is done in Configuration.save_run_profile() for consistency
…with older Run Directories. These directories are upgraded to follow the new auto-documentation approach so that multiple resume runs can be performed on them.
… Sheet in START_HERE had sentence case in the Replicate Number column
…s need to be reset each time a resume run is performed. Also refactoring for cleaner code that's more consistent with the newer ResumePlotterConfig.
…forbidden to change from run to run. This allows users to change normalization settings if they wish. Nothing stops the user from changing the Input Files column, but unless the changes preserve the file's basename, this will result in a general error (file not found).
AlexTate requested a review from taimontgomery May 31, 2023 00:42

taimontgomery (Collaborator) commented:

Tested with ram1 dataset.

taimontgomery merged commit 47ac57c into master Jun 1, 2023