Feature/msk ucsf ingestion#9
Open
KociOrges wants to merge 23 commits into
…r-upload-areas-morphic-redesign-util Feature/submission envelop UUID for upload areas morphic redesign util
…list md5 sums computation while listing files
…ocessing for UCSF datasets
… on the context argument
…type, --derived-from
Description:
This pull request improves our dataset ingestion workflow by refactoring the linking logic for different dataset contexts (MSK, UCSF, and legacy JAX). The key changes include:
MSK-Specific Enhancements
Supports many-to-one relationships between differentiated cell lines and a shared library preparation.
New Context Support:
A new `--context` command-line argument is introduced (e.g., `--context unperturbed_multiple`) so that the ingestion process can use a different linking strategy for UCSF datasets while preserving legacy behaviour for MSK and JAX when the context is not specified.
Refactored Linking:
The previously monolithic `establish_links` function is split into smaller helper methods (`_link_cell_lines_to_children`, `_process_library_preparations`, and `_link_sequencing_files`) to improve maintainability.
Enhanced Robustness:
A new helper, `request_with_retries`, has been added to implement exponential backoff for HTTP requests, improving resilience against transient connection errors.
Related tickets #126, #61
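
For reviewers unfamiliar with the pattern, here is a minimal sketch of what a retry helper with exponential backoff could look like. Only the name `request_with_retries` comes from this PR; the signature, the parameters (`send`, `max_attempts`, `base_delay`, `retry_on`, `sleep`), and the defaults are assumptions made for illustration and may not match the actual implementation.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def request_with_retries(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, doubling the wait after each failure.

    Hypothetical signature: `send` is any zero-argument callable that
    performs the HTTP request and returns its result.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except retry_on:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

If the code uses the `requests` library, the call site might look like `request_with_retries(lambda: requests.get(url, timeout=30), retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`, since those exceptions are not subclasses of the built-in `ConnectionError`. Injecting `sleep` keeps the helper testable without real waiting.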