Pipeline: auto-documentation improvements #312
Merged
taimontgomery merged 18 commits into master on Jun 1, 2023
…g` key in the config object before writing (but only if it is a run config based instance)
…g directory on resume runs
…during resume runs when run_native=True. Previously this only happened when run_native=False. Also cleaning up the code in resume()
…he organize_config step and config_out_dir output are preserved.
…te the updated Run Config back to the same location in the root directory of the Run Directory. This allows configurations to carry over from prior resume runs. Prior versions are already preserved in the timestamped config directories.
… function saves a copy of the file in the root of the Run Directory with amended paths. The idea is that these files should represent a working, portable configuration. Paths in these files are modified to achieve this goal:
- SamplesSheet: input file paths are converted to absolute paths for portability.
- PathsFile: samples_csv and features_csv are converted to their basename since they will be adjacent to the Run Config. All other paths are made absolute for portability.
- FeaturesSheet: no modifications, simply copies itself.
Features Sheet parsing has been moved into the FeaturesSheet class. In addition to its original functionality, read_csv() now omits duplicate rules that differ only in hierarchy value.
…t. The function setup_file_groups() has been removed because it has been unnecessary since the introduction of the SampleSheet class. The function absorbe_paths_file() has been absorbed into load_paths_config(). Additionally, load_paths_config() no longer changes the paths_* keys. This is now done in Configuration.save_run_profile() for consistency.
…with older Run Directories. These directories are upgraded to follow the new auto-documentation approach so that multiple resume runs can be performed on them.
… Sheet in START_HERE had sentence case in the Replicate Number column
… under some circumstances.
…s need to be reset each time a resume run is performed. Also refactoring for cleaner code that's more consistent with the newer ResumePlotterConfig.
…ng in file objects.
…forbidden to change from run to run. This allows users to change normalization settings if they wish. Nothing stops the user from changing the Input Files column, but unless the changes preserve the file's basename, this will result in a general error (file not found).
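The commit notes above mention that read_csv() now omits duplicate rules that differ only in hierarchy value. The following is a minimal sketch of that idea, not the project's actual implementation. The function name drop_hierarchy_duplicates, the column names in the sample sheet, and the choice to keep the first occurrence of each duplicate are all assumptions made for illustration.

```python
import csv
import io


def drop_hierarchy_duplicates(rows, hierarchy_key="Hierarchy"):
    """Return rows with duplicates removed, ignoring the hierarchy column.

    Hypothetical helper: two rules count as duplicates when every column
    except `hierarchy_key` matches. The first occurrence is kept (an
    assumption; the source does not say which copy survives).
    """
    seen = set()
    unique = []
    for row in rows:
        # A rule's identity is every column except the hierarchy value
        identity = tuple(sorted((k, v) for k, v in row.items() if k != hierarchy_key))
        if identity in seen:
            continue
        seen.add(identity)
        unique.append(row)
    return unique


# Illustrative sheet with made-up column names
sheet = io.StringIO(
    "Key,Value,Hierarchy,Strand\n"
    "Class,miRNA,1,sense\n"
    "Class,miRNA,2,sense\n"
    "Class,siRNA,2,both\n"
)
rules = drop_hierarchy_duplicates(list(csv.DictReader(sheet)))
```

Here the second miRNA rule is dropped because it differs from the first only in its hierarchy value, while the siRNA rule is kept.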
Tested with ram1 dataset.
Auto-documentation for runs is now more complete and consistent. The config files used for repeating analyses are now stored separately from those intended as documentation, so users no longer risk losing auto-documentation info if they don't make a copy before preparing a resume run.
Config Files for Repeat Analyses
All four primary configuration files are copied to the root of the Run Directory where they can be freely edited for resume runs without sacrificing auto-documentation. Previously, this was only done for the processed Run Config and Samples Sheet, while the Features Sheet and Paths Sheet remained in-place but modifiable between runs. Paths are automatically adjusted to ensure that these files represent a cohesive working configuration; config-config references are converted to relative paths that reference the adjacent copies in the target Run Directory, and all other paths are converted from relative to absolute.
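The path adjustment described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the pipeline's own code: the function make_portable, the origin_dir argument, and the reference_dir key are hypothetical. Only samples_csv and features_csv are named in the commit notes as config-config references.

```python
import os

# Keys that point at other config files; these are reduced to basenames
# because the copies sit next to each other in the Run Directory.
CONFIG_REFERENCE_KEYS = {"samples_csv", "features_csv"}


def make_portable(paths, origin_dir):
    """Return a copy of `paths` suitable for saving in the Run Directory root.

    Hypothetical helper: config-config references become relative
    (basename only); every other non-empty path is resolved against
    `origin_dir`, the directory of the original file, and made absolute.
    """
    portable = {}
    for key, value in paths.items():
        if not value:
            portable[key] = value  # leave unset entries untouched
        elif key in CONFIG_REFERENCE_KEYS:
            portable[key] = os.path.basename(value)
        else:
            portable[key] = os.path.abspath(os.path.join(origin_dir, value))
    return portable


original = {
    "samples_csv": "./samples.csv",
    "features_csv": "../shared/features.csv",
    "reference_dir": "annotations",  # hypothetical non-config path
}
result = make_portable(original, origin_dir=os.path.join(os.sep, "project", "START_HERE"))
```

Resolving relative paths against the original file's directory, rather than the current working directory, keeps the saved copy valid regardless of where a later resume run is launched from.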
Config Files for Auto-documentation
A new subdirectory, config, has been added to Run Directory outputs. It holds a copy of the four primary config files for auto-documentation only. During each resume run, a new timestamped config directory is created to hold copies of the config files that were used. If repeated analyses are performed, the outputs of the most recent analysis are now used; that is, if a replot run follows a recount run, only the most recent recount outputs are used for producing graphs.
Backward Compatibility
Performing a resume run in an old Run Directory will automatically convert it. After this conversion, the old-style Run Directory will behave just like a new one during subsequent resume runs.
… config directory. The other files are not included because they're likely to have been edited for other runs (due to the behavior described above). This is only a minor loss since prior tiny-count output directories contain a copy of the employed Features Sheet, and both the Paths File and Sample Sheet had been absorbed into the processed Run Config.
… config directory is also created to hold the resume run's four config files.
Closes #311