From 4acdd5211c4284469c92654cd9985543cd9ad4f5 Mon Sep 17 00:00:00 2001 From: ivanmilevtues Date: Tue, 17 Jun 2025 01:27:29 +0200 Subject: [PATCH] Added high-level diagrams --- .codeboarding/BeadArrayUtility.md | 223 +++++++++++++++++ .codeboarding/BeadPoolManifest Parser.md | 127 ++++++++++ .codeboarding/ClusterFile Parser.md | 131 ++++++++++ .codeboarding/GenotypeCalls Processor.md | 265 +++++++++++++++++++++ .codeboarding/LocusAggregate Manager.md | 97 ++++++++ .codeboarding/on_boarding.md | 289 +++++++++++++++++++++++ 6 files changed, 1132 insertions(+) create mode 100644 .codeboarding/BeadArrayUtility.md create mode 100644 .codeboarding/BeadPoolManifest Parser.md create mode 100644 .codeboarding/ClusterFile Parser.md create mode 100644 .codeboarding/GenotypeCalls Processor.md create mode 100644 .codeboarding/LocusAggregate Manager.md create mode 100644 .codeboarding/on_boarding.md diff --git a/.codeboarding/BeadArrayUtility.md b/.codeboarding/BeadArrayUtility.md new file mode 100644 index 0000000..3921d80 --- /dev/null +++ b/.codeboarding/BeadArrayUtility.md @@ -0,0 +1,223 @@ +```mermaid + +graph LR + + BeadArrayUtility["BeadArrayUtility"] + + BeadPoolManifestParser["BeadPoolManifestParser"] + + ClusterFileParser["ClusterFileParser"] + + GenotypeCallsManager["GenotypeCallsManager"] + + ScannerDataReader["ScannerDataReader"] + + BeadPoolManifestParser -- "uses" --> BeadArrayUtility + + ClusterFileParser -- "uses" --> BeadArrayUtility + + GenotypeCallsManager -- "uses" --> BeadArrayUtility + + GenotypeCallsManager -- "uses" --> BeadPoolManifestParser + + ScannerDataReader -- "uses" --> BeadArrayUtility + + GenotypeCallsManager -- "integrates" --> ScannerDataReader + +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +This component overview describes the core components within the `BeadArrayFiles` subsystem, focusing on their responsibilities in parsing various binary file formats related to bead array data. The main flow involves different parsers utilizing a low-level utility component for basic data reading, with the `GenotypeCallsManager` acting as a central orchestrator integrating data from other parsers. + + + +### BeadArrayUtility + +This component provides fundamental utility functions for reading various primitive data types (like bytes, integers, strings, and floats) from file handles and performing basic operations like DNA complement. It acts as a low-level data parsing layer, supporting other components in the BeadArrayFiles subsystem by abstracting the complexities of binary file reading. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadArrayUtility.read_string` (82:109) + +- `BeadArrayFiles.module.BeadArrayUtility.read_byte` (70:80) + +- `BeadArrayFiles.module.BeadArrayUtility.read_int` (46:56) + +- `BeadArrayFiles.module.BeadArrayUtility.read_float` (58:68) + +- `BeadArrayFiles.module.BeadArrayUtility.complement` (5:20) + +- `BeadArrayFiles.module.BeadArrayUtility.read_ushort` (34:44) + + + + + +### BeadPoolManifestParser + +This component is responsible for parsing Bead Pool Manifest (BPM) files, which contain crucial information about genetic loci, assay types, and normalization parameters. 
It orchestrates the reading of the manifest header and individual locus entries, ensuring data integrity and version compatibility. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest.__parse_file` (46:129) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry` (236:368) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_file` (276:295) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_6` (297:342) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_7` (344:355) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_8` (357:368) + +- `BeadArrayFiles.module.BeadPoolManifest.SourceStrand.from_string` (161:182) + +- `BeadArrayFiles.module.BeadPoolManifest.RefStrand.from_string` (214:234) + + + + + +### ClusterFileParser + +This component handles the parsing of EGT cluster files, which store clustering information for genotype calls. It reads various sections of the cluster file, including version information, manifest details, and arrays of cluster records and scores, ensuring proper reconstruction of the cluster data. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_cluster_file` (82:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_array` (64:79) + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord.read_record` (184:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterScore.read_record` (270:284) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile` (2:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord` (148:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterStats` (287:317) + +- `BeadArrayFiles.module.ClusterFile.ClusterScore` (240:284) + + + + + +### GenotypeCallsManager + +This component is responsible for reading and managing genotype call data from GTC files. 
It provides a comprehensive interface for accessing various data points such as genotypes, intensities (raw and normalized), scores, and control data, abstracting the underlying file structure. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__init__` (103:136) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array` (155:182) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array_numpy` (184:207) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotypes` (299:304) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_generic` (306:346) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_plus_strand` (348:361) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_forward_strand` (363:375) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls` (377:408) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotype_scores` (410:415) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_x_intensities` (424:429) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_y_intensities` (431:436) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_x_intensities` (438:443) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_y_intensities` (445:450) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_calls` (488:495) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_no_calls` (497:504) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_intensity_only` (506:513) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ballele_freqs` (515:523) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_logr_ratios` (525:533) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_x` (535:550) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_y` (552:567) + +- 
`BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalized_intensities` (569:581) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalization_transforms` (583:588) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.is_write_complete` (590:621) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ploidy_type` (223:228) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_slide_identifier` (237:242) + + + + + +### ScannerDataReader + +This component is specifically designed to parse and encapsulate scanner-related metadata from file handles. It extracts details such as scanner name, PMT settings, and imaging user, which are typically embedded within genotype call files. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData.read_scanner_data` (733:748) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData` (700:748) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/BeadPoolManifest Parser.md b/.codeboarding/BeadPoolManifest Parser.md new file mode 100644 index 0000000..8a95241 --- /dev/null +++ b/.codeboarding/BeadPoolManifest Parser.md @@ -0,0 +1,127 @@ +```mermaid + +graph LR + + File_Reading_Utilities["File Reading Utilities"] + + Bead_Pool_Manifest_Parser["Bead Pool Manifest Parser"] + + Locus_Entry_Parser["Locus Entry Parser"] + + Strand_Type_Converters["Strand Type Converters"] + + Bead_Pool_Manifest_Parser -- "uses" --> File_Reading_Utilities + + Bead_Pool_Manifest_Parser -- "creates and uses" --> Locus_Entry_Parser + + Locus_Entry_Parser -- "uses" --> File_Reading_Utilities + + Locus_Entry_Parser -- "uses" --> Strand_Type_Converters + +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +This subsystem is responsible for parsing Bead Pool Manifest (BPM) files, which contain genetic array data. It comprises several components: `File Reading Utilities` for low-level binary file operations, `Bead Pool Manifest Parser` for orchestrating the overall file parsing, `Locus Entry Parser` for handling individual genetic locus entries, and `Strand Type Converters` for standardizing genetic strand information. The main flow involves the `Bead Pool Manifest Parser` reading the file, utilizing `File Reading Utilities` for basic data extraction, and delegating the parsing of individual locus entries to the `Locus Entry Parser`. The `Locus Entry Parser` further relies on `File Reading Utilities` and `Strand Type Converters` to process detailed genetic information. + + + +### File Reading Utilities + +This component provides fundamental utility functions for reading various data types (bytes, integers, strings) from file handles. It serves as a low-level interface for binary file parsing within the BeadArrayFiles subsystem. + + + + + +**Related Classes/Methods**: + + + +- `module.BeadArrayUtility.read_string` (82:109) + +- `module.BeadArrayUtility.read_byte` (70:80) + +- `module.BeadArrayUtility.read_int` (46:56) + + + + + +### Bead Pool Manifest Parser + +This component is responsible for parsing the overall structure of Bead Pool Manifest (BPM) files. It initializes the manifest object and orchestrates the reading of various sections, including manifest metadata and individual locus entries. 
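Here is a minimal Python sketch of the header validation step that runs before locus entries are read. The layout is a simplifying assumption for illustration, not the exact BPM specification: a 3-byte `b"BPM"` identifier, a one-byte format version, then a little-endian int32 locus count. `read_manifest_header` and `SUPPORTED_VERSIONS` are hypothetical names.

```python
import io
import struct

# Assumed, simplified header layout (illustrative only, not the full BPM spec):
#   3 bytes  identifier b"BPM"
#   1 byte   format version (unsigned)
#   4 bytes  number of loci (little-endian int32)
EXPECTED_MAGIC = b"BPM"
SUPPORTED_VERSIONS = {1}  # hypothetical set of accepted format versions


def read_manifest_header(handle):
    """Validate the identifier and version, then return (version, num_loci)."""
    magic = handle.read(3)
    if magic != EXPECTED_MAGIC:
        raise ValueError("not a BPM file: bad identifier %r" % (magic,))
    (version,) = struct.unpack("<B", handle.read(1))
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported manifest version %d" % version)
    (num_loci,) = struct.unpack("<i", handle.read(4))
    return version, num_loci


# An in-memory buffer stands in for a file opened in "rb" mode
buf = io.BytesIO(b"BPM" + struct.pack("<B", 1) + struct.pack("<i", 3))
print(read_manifest_header(buf))  # (1, 3)
```

Rejecting a bad identifier or version before any loci are read means a wrong file type fails early with a clear error. Without that check, the parser would produce garbage locus entries.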
+ + + + + +**Related Classes/Methods**: + + + +- `module.BeadPoolManifest.BeadPoolManifest.__init__` (21:44) + +- `module.BeadPoolManifest.BeadPoolManifest.__parse_file` (46:129) + + + + + +### Locus Entry Parser + +This component handles the parsing of individual locus entries within a Bead Pool Manifest file. It supports different versions of locus entry formats and extracts specific genetic information such as SNPs, chromosomes, and mapping details. + + + + + +**Related Classes/Methods**: + + + +- `module.BeadPoolManifest.LocusEntry.__init__` (254:274) + +- `module.BeadPoolManifest.LocusEntry.__parse_file` (276:295) + +- `module.BeadPoolManifest.LocusEntry.__parse_locus_version_6` (297:342) + +- `module.BeadPoolManifest.LocusEntry.__parse_locus_version_7` (344:355) + +- `module.BeadPoolManifest.LocusEntry.__parse_locus_version_8` (357:368) + + + + + +### Strand Type Converters + +This component provides utility functions for converting string representations of genetic strand types (SourceStrand and RefStrand) into their corresponding internal enum-like representations, ensuring consistent data handling. 
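The conversion can be sketched as a lookup from short codes to integer constants. Both the codes and the integer values are assumptions for illustration, not confirmed from `SourceStrand.from_string` or `RefStrand.from_string`. The sketch uses `"U"`, `"F"` and `"R"` for the source strand, and `"U"`, `"+"` and `"-"` for the reference strand.

```python
class SourceStrand:
    # Assumed integer codes, for illustration only
    Unknown = 0
    Forward = 1
    Reverse = 2

    @staticmethod
    def from_string(source_strand):
        """Convert an assumed one-letter code ("U", "F", "R") to an integer constant."""
        lookup = {"U": SourceStrand.Unknown, "F": SourceStrand.Forward, "R": SourceStrand.Reverse}
        if source_strand not in lookup:
            raise ValueError("unexpected source strand: %r" % (source_strand,))
        return lookup[source_strand]


class RefStrand:
    # Assumed integer codes, for illustration only
    Unknown = 0
    Plus = 1
    Minus = 2

    @staticmethod
    def from_string(ref_strand):
        """Convert an assumed code ("U", "+", "-") to an integer constant."""
        lookup = {"U": RefStrand.Unknown, "+": RefStrand.Plus, "-": RefStrand.Minus}
        if ref_strand not in lookup:
            raise ValueError("unexpected reference strand: %r" % (ref_strand,))
        return lookup[ref_strand]


print(SourceStrand.from_string("F"), RefStrand.from_string("-"))  # 1 2
```

Raising on unrecognized input, rather than silently mapping it to `Unknown`, keeps malformed manifest strings visible.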
+ + + + + +**Related Classes/Methods**: + + + +- `module.BeadPoolManifest.SourceStrand.from_string` (161:182) + +- `module.BeadPoolManifest.RefStrand.from_string` (214:234) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/ClusterFile Parser.md b/.codeboarding/ClusterFile Parser.md new file mode 100644 index 0000000..165a23f --- /dev/null +++ b/.codeboarding/ClusterFile Parser.md @@ -0,0 +1,131 @@ +```mermaid + +graph LR + + ClusterFileReader["ClusterFileReader"] + + ClusterRecordHandler["ClusterRecordHandler"] + + ClusterScoreHandler["ClusterScoreHandler"] + + BeadArrayUtility["BeadArrayUtility"] + + ClusterFileReader -- "reads data using" --> BeadArrayUtility + + ClusterFileReader -- "orchestrates record parsing by" --> ClusterRecordHandler + + ClusterFileReader -- "orchestrates score parsing by" --> ClusterScoreHandler + + ClusterRecordHandler -- "reads data using" --> BeadArrayUtility + + ClusterScoreHandler -- "reads data using" --> BeadArrayUtility + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The ClusterFile Parser subsystem is responsible for reading and interpreting EGT cluster files, which contain genotype clustering information. It orchestrates the parsing of file metadata, individual cluster records, and cluster scores, leveraging a set of utility functions for low-level data extraction.
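The sketch below shows one way the parser can hand fixed-layout records to a per-record reader, using a run of score records as the example. The source only states that `ClusterScore` records are built from float and byte values. The following are all hypothetical:

- the 13-byte layout: three little-endian float32 values followed by one unsigned byte
- the field names
- the `read_records` helper

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class ClusterScore:
    # Hypothetical field names; the source only says the record holds float and byte values
    cluster_score: float
    total_score: float
    original_score: float
    edited: int

    @staticmethod
    def read_record(handle):
        # Assumed layout: three little-endian float32 values, then one unsigned byte (13 bytes)
        cluster, total, original, edited = struct.unpack("<fffB", handle.read(13))
        return ClusterScore(cluster, total, original, edited)


def read_records(handle, count, read_record):
    """Read `count` consecutive records using the supplied per-record reader."""
    return [read_record(handle) for _ in range(count)]


# Two serialized records in an in-memory buffer
payload = struct.pack("<fffB", 0.5, 0.75, 0.5, 0) + struct.pack("<fffB", 0.25, 0.75, 0.5, 1)
scores = read_records(io.BytesIO(payload), 2, ClusterScore.read_record)
print([s.edited for s in scores])  # [0, 1]
```

Passing the per-record reader as an argument lets one loop serve both cluster records and cluster scores. This mirrors the delegation shown in the diagram above.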
+ + + +### ClusterFileReader + +This component is responsible for reading and parsing the entire cluster file, including file version, metadata, and orchestrating the reading of individual cluster records and scores. It uses utility functions for basic data type reading and delegates to other components for record and score parsing. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_cluster_file` (82:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile` (2:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_array` (64:79) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.add_record` (38:46) + + + + + +### ClusterRecordHandler + +This component focuses on reading and interpreting individual cluster records within the cluster file. It extracts various statistical data related to clusters and constructs ClusterRecord and ClusterStats objects. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord.read_record` (184:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord` (148:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterStats` (287:317) + + + + + +### ClusterScoreHandler + +This component is dedicated to reading and processing cluster score data from the file. It extracts float and byte values to form ClusterScore objects. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile.ClusterScore.read_record` (270:284) + +- `BeadArrayFiles.module.ClusterFile.ClusterScore` (240:284) + + + + + +### BeadArrayUtility + +This component provides fundamental utility functions for reading primitive data types (integers, floats, strings, bytes) from a file handle. These functions are low-level building blocks used by other components for parsing file contents. 
+ + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadArrayUtility.read_int` (46:56) + +- `BeadArrayFiles.module.BeadArrayUtility.read_string` (82:109) + +- `BeadArrayFiles.module.BeadArrayUtility.read_byte` (70:80) + +- `BeadArrayFiles.module.BeadArrayUtility.read_float` (58:68) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/GenotypeCalls Processor.md b/.codeboarding/GenotypeCalls Processor.md new file mode 100644 index 0000000..aa487e7 --- /dev/null +++ b/.codeboarding/GenotypeCalls Processor.md @@ -0,0 +1,265 @@ +```mermaid + +graph LR + + GenotypeCalls_Processor["GenotypeCalls Processor"] + + BeadArray_Utility_Functions["BeadArray Utility Functions"] + + BeadPoolManifest_Parser["BeadPoolManifest Parser"] + + ClusterFile_Parser["ClusterFile Parser"] + + Normalization_Transform["Normalization Transform"] + + Scanner_Data["Scanner Data"] + + GenotypeCalls_Processor -- "utilizes" --> BeadArray_Utility_Functions + + GenotypeCalls_Processor -- "utilizes" --> BeadPoolManifest_Parser + + GenotypeCalls_Processor -- "utilizes" --> ClusterFile_Parser + + GenotypeCalls_Processor -- "applies" --> Normalization_Transform + + GenotypeCalls_Processor -- "retrieves" --> Scanner_Data + + BeadPoolManifest_Parser -- "utilizes" --> BeadArray_Utility_Functions + + ClusterFile_Parser -- "utilizes" --> BeadArray_Utility_Functions + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +This architecture describes the components involved in processing 
and retrieving genotype call data from GTC files. The central `GenotypeCalls Processor` component orchestrates the data extraction, relying on `BeadArray Utility Functions` for low-level file operations. It integrates with `BeadPoolManifest Parser` and `ClusterFile Parser` to obtain manifest and clustering information. Additionally, it utilizes `Normalization Transform` for intensity data adjustments and `Scanner Data` to capture scanner-specific details, providing a comprehensive view of genotype call data. + + + +### GenotypeCalls Processor + +This is a central component for processing and retrieving genotype call data from GTC files. It provides methods to access various attributes like sample names, intensities, genotypes, and quality scores. It also includes helper classes for normalization transforms and scanner data. It heavily relies on BeadArrayUtility for low-level data reading and interacts with BeadPoolManifest Parser and ClusterFile Parser to get related file information. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls` (62:621) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__init__` (103:136) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.is_write_complete` (590:621) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array` (155:182) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array_numpy` (184:207) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_name` (230:235) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic` (138:153) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_slide_identifier` (237:242) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_plate` (244:249) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_well` (251:256) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_cluster_file` (258:263) + +- 
`BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_snp_manifest` (265:270) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_imaging_date` (272:279) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_autocall_date` (281:288) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_autocall_version` (290:297) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotypes` (299:304) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_generic` (306:346) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_plus_strand` (348:361) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_forward_strand` (363:375) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls` (377:408) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ploidy_type` (223:228) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotype_scores` (410:415) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_scanner_data` (417:422) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_x_intensities` (424:429) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_y_intensities` (431:436) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_x_intensities` (438:443) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_y_intensities` (445:450) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_call_rate` (452:457) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gender` (459:465) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_logr_dev` (467:472) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gc10` (474:479) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gc50` (481:486) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_calls` (488:495) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_no_calls` (497:504) + +- 
`BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_intensity_only` (506:513) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ballele_freqs` (515:523) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_logr_ratios` (525:533) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_x` (535:550) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_y` (552:567) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalized_intensities` (569:581) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalization_transforms` (583:588) + +- `BeadArrayFiles.module.GenotypeCalls.NormalizationTransform.read_normalization_transform` (640:650) + +- `BeadArrayFiles.module.GenotypeCalls.NormalizationTransform` (624:697) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData.read_scanner_data` (733:748) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData` (700:748) + + + + + +### BeadArray Utility Functions + +This component provides fundamental utility functions for reading various data types (e.g., byte, int, string, float, char, ushort) from binary files. It also includes a complement function for nucleotide base manipulation. These functions are crucial for low-level data extraction from GTC files. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadArrayUtility.read_byte` (70:80) + +- `BeadArrayFiles.module.BeadArrayUtility.read_int` (46:56) + +- `BeadArrayFiles.module.BeadArrayUtility.read_string` (82:109) + +- `BeadArrayFiles.module.BeadArrayUtility.read_float` (58:68) + +- `BeadArrayFiles.module.BeadArrayUtility.read_char` (22:32) + +- `BeadArrayFiles.module.BeadArrayUtility.read_ushort` (34:44) + +- `BeadArrayFiles.module.BeadArrayUtility.complement` (5:20) + + + + + +### BeadPoolManifest Parser + +This component is responsible for parsing binary Bead Pool Manifest (BPM) files. 
It extracts information about loci, including names, SNPs, chromosomes, map information, addresses, and normalization lookups. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest` (2:129) + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest.__init__` (21:44) + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest.__parse_file` (46:129) + + + + + +### ClusterFile Parser + +This component is responsible for parsing EGT cluster files, which contain clustering information for genotyping. It exposes the cluster records and cluster scores read from those files. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile` (2:146) + + + + + +### Normalization Transform + +This component is responsible for handling normalization transformations applied to intensity data. It provides methods to read and apply these transformations to raw intensity values, enabling the calculation of normalized intensities. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.NormalizationTransform` (624:697) + +- `BeadArrayFiles.module.GenotypeCalls.NormalizationTransform.read_normalization_transform` (640:650) + + + + + +### Scanner Data + +This component encapsulates information related to the scanner used for genotyping. It provides methods to read and store details such as the scanner name, software version, and other scanner-specific data from the GTC file.
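The Normalization Transform component can be illustrated with an affine normalization sketch. It proceeds in five steps:

1. Subtract per-channel offsets.
2. Rotate by `-theta`.
3. Remove shear along x.
4. Scale each channel.
5. Clamp negative results to zero.

The parameter names, the order of operations and the clamping step are all assumptions. This is not a transcription of `NormalizationTransform`, and `AffineNormalization` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class AffineNormalization:
    # Hypothetical parameter names for an offset / rotation / shear / scale transform
    offset_x: float
    offset_y: float
    theta: float
    shear: float
    scale_x: float
    scale_y: float

    def normalize(self, raw_x, raw_y):
        """Map one raw (x, y) intensity pair into normalized space."""
        x = raw_x - self.offset_x
        y = raw_y - self.offset_y
        c, s = math.cos(self.theta), math.sin(self.theta)
        # Rotate the offset-corrected point by -theta
        xr = c * x + s * y
        yr = -s * x + c * y
        # Remove shear along x, then scale each channel independently
        xn = (xr - self.shear * yr) / self.scale_x
        yn = yr / self.scale_y
        # Negative normalized intensities are clamped to zero (assumed behaviour)
        return max(xn, 0.0), max(yn, 0.0)


print(AffineNormalization(100.0, 50.0, 0.0, 0.0, 2.0, 4.0).normalize(300.0, 250.0))  # (100.0, 50.0)
```

With `theta` and `shear` set to zero, the transform reduces to an offset followed by per-channel scaling, which makes the arithmetic easy to check by hand.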
+ + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData` (700:748) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData.read_scanner_data` (733:748) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/LocusAggregate Manager.md b/.codeboarding/LocusAggregate Manager.md new file mode 100644 index 0000000..4b5f851 --- /dev/null +++ b/.codeboarding/LocusAggregate Manager.md @@ -0,0 +1,97 @@ +```mermaid + +graph LR + + LocusAggregateLoader["LocusAggregateLoader"] + + LocusAggregateGenerator["LocusAggregateGenerator"] + + LocusAggregateCore["LocusAggregateCore"] + + LocusAggregateCore -- "uses" --> LocusAggregateLoader + + LocusAggregateCore -- "uses" --> LocusAggregateGenerator + + LocusAggregateLoader -- "creates" --> LocusAggregateCore + + LocusAggregateGenerator -- "creates" --> LocusAggregateCore + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The LocusAggregate Manager subsystem is responsible for efficiently handling and processing genetic locus data, particularly for aggregation across multiple samples. It orchestrates the loading of individual sample locus data, the grouping of loci into manageable batches, and the final aggregation of these loci across all samples. The core purpose is to provide a structured way to access and manipulate aggregated genetic information, enabling downstream analysis. 
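The batching and cross-sample aggregation steps can be sketched in a single process. `LocusAggregateSketch`, `group_loci`, `aggregate_samples` and the per-sample dictionary layout are hypothetical stand-ins. The real `LocusAggregate.group_loci` and `aggregate_samples` may take different arguments and read their data from files rather than from in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class LocusAggregateSketch:
    # Hypothetical container: one entry per sample for a single locus
    genotypes: list = field(default_factory=list)
    scores: list = field(default_factory=list)


def group_loci(loci, batch_size):
    """Split a list of locus indices into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [loci[i:i + batch_size] for i in range(0, len(loci), batch_size)]


def aggregate_samples(sample_buffers, loci):
    """Transpose per-sample data into one aggregate per locus.

    sample_buffers: list of dicts, one per sample, each with 'genotypes' and 'scores'
    lists indexed by locus (an assumed in-memory layout).
    """
    aggregates = []
    for locus in loci:
        agg = LocusAggregateSketch()
        for buf in sample_buffers:
            agg.genotypes.append(buf["genotypes"][locus])
            agg.scores.append(buf["scores"][locus])
        aggregates.append(agg)
    return aggregates


samples = [
    {"genotypes": [1, 2, 3], "scores": [0.9, 0.8, 0.7]},
    {"genotypes": [2, 2, 0], "scores": [0.95, 0.85, 0.1]},
]
for batch in group_loci(list(range(3)), 2):
    for agg in aggregate_samples(samples, batch):
        print(agg.genotypes)  # [1, 2] then [2, 2] then [3, 0]
```

Processing loci in batches bounds how much per-locus data is held in memory at once. Each aggregate still collects one value from every sample.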
+ + + +### LocusAggregateLoader + +Responsible for loading a slice of loci data for a single sample, populating a LocusAggregate object with genotype, score, B-allele frequency, log R ratio, and intensity data, and handling different sample versions. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.LocusAggregate.Loader.__call__` (23:53) + + + + + +### LocusAggregateGenerator + +Creates a LocusAggregate object representing data for a single locus aggregated across all samples by iterating through sample buffers and appending relevant data. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.LocusAggregate.GenerateLocusAggregate.__call__` (76:103) + + + + + +### LocusAggregateCore + +Represents the core data structure for aggregated locus information and provides static methods for managing and orchestrating the loading, grouping, and aggregation of loci data across multiple samples. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate` (105:204) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.load_buffer` (135:148) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.group_loci` (151:173) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.aggregate_samples` (176:204) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md new file mode 100644 index 0000000..8e0de3a --- /dev/null +++ b/.codeboarding/on_boarding.md @@ -0,0 +1,289 @@ +```mermaid + +graph LR + + BeadArrayUtility["BeadArrayUtility"] + + BeadPoolManifest_Parser["BeadPoolManifest Parser"] + + ClusterFile_Parser["ClusterFile Parser"] + + GenotypeCalls_Processor["GenotypeCalls Processor"] + + LocusAggregate_Manager["LocusAggregate Manager"] + + BeadArrayUtility -- "provides data reading utilities for" --> BeadPoolManifest_Parser + + 
BeadArrayUtility -- "provides data reading utilities for" --> ClusterFile_Parser + + BeadArrayUtility -- "provides data reading utilities for" --> GenotypeCalls_Processor + + BeadPoolManifest_Parser -- "parses data for" --> GenotypeCalls_Processor + + ClusterFile_Parser -- "parses data for" --> GenotypeCalls_Processor + + GenotypeCalls_Processor -- "processes data from" --> LocusAggregate_Manager + + LocusAggregate_Manager -- "aggregates data using" --> BeadPoolManifest_Parser + + click BeadArrayUtility href "https://github.com/Illumina/BeadArrayFiles/blob/main/.codeboarding//BeadArrayUtility.md" "Details" + + click BeadPoolManifest_Parser href "https://github.com/Illumina/BeadArrayFiles/blob/main/.codeboarding//BeadPoolManifest Parser.md" "Details" + + click ClusterFile_Parser href "https://github.com/Illumina/BeadArrayFiles/blob/main/.codeboarding//ClusterFile Parser.md" "Details" + + click GenotypeCalls_Processor href "https://github.com/Illumina/BeadArrayFiles/blob/main/.codeboarding//GenotypeCalls Processor.md" "Details" + + click LocusAggregate_Manager href "https://github.com/Illumina/BeadArrayFiles/blob/main/.codeboarding//LocusAggregate Manager.md" "Details" + +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Component Details + + + +The `BeadArrayFiles` project is designed to parse and process various binary file formats related to bead array data, such as Bead Pool Manifest (BPM) files, Cluster files, and Genotype Call (GTC) files. 
It provides low-level utilities for reading binary data, specialized parsers for different file types, and a central component for managing and retrieving genotype call information. Additionally, it includes a component for aggregating locus data across multiple samples. + + + +### BeadArrayUtility + +This component provides fundamental utility functions for reading various data types (byte, int, string, float, ushort) from binary files and performing basic operations like DNA complement. It acts as a low-level data access layer for other components that parse specific file formats. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadArrayUtility.read_string` (82:109) + +- `BeadArrayFiles.module.BeadArrayUtility.read_byte` (70:80) + +- `BeadArrayFiles.module.BeadArrayUtility.read_int` (46:56) + +- `BeadArrayFiles.module.BeadArrayUtility.read_float` (58:68) + +- `BeadArrayFiles.module.BeadArrayUtility.complement` (5:20) + +- `BeadArrayFiles.module.BeadArrayUtility.read_ushort` (34:44) + + + + + +### BeadPoolManifest Parser + +This component is responsible for parsing Bead Pool Manifest (BPM) files. It includes classes for the manifest itself and individual locus entries, handling different versions of the locus entry format. It heavily relies on BeadArrayUtility for reading data. 
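Version handling can be sketched as a dispatch table keyed on the entry's version number. In this sketch each newer parser reuses the previous layout and then reads its additional fields. The following are all illustrative assumptions, not the actual `LocusEntry` format:

- the int32 version prefix
- the field names
- the incremental layout

```python
import io
import struct


def _read_int(handle):
    # Little-endian signed 32-bit integer
    return struct.unpack("<i", handle.read(4))[0]


def parse_locus_v6(handle, entry):
    # Hypothetical v6 payload: a single int32 address
    entry["address"] = _read_int(handle)


def parse_locus_v7(handle, entry):
    # Hypothetical: v7 = v6 layout followed by one extra int32 field
    parse_locus_v6(handle, entry)
    entry["extra_v7"] = _read_int(handle)


def parse_locus_v8(handle, entry):
    # Hypothetical: v8 = v7 layout followed by a one-byte strand code
    parse_locus_v7(handle, entry)
    entry["strand_code"] = struct.unpack("<B", handle.read(1))[0]


PARSERS = {6: parse_locus_v6, 7: parse_locus_v7, 8: parse_locus_v8}


def parse_locus_entry(handle):
    """Read the entry's version field and dispatch to the matching parser."""
    version = _read_int(handle)
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError("unsupported locus entry version %d" % version)
    entry = {"version": version}
    parser(handle, entry)
    return entry


data = struct.pack("<iiiB", 8, 1234, 5, 1)
print(parse_locus_entry(io.BytesIO(data)))
# {'version': 8, 'address': 1234, 'extra_v7': 5, 'strand_code': 1}
```

A dispatch table keeps each format version's parsing isolated. It also turns an unknown version into an explicit error instead of a misread.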
+ + + + + **Related Classes/Methods**: + + + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest.__init__` (21:44) + +- `BeadArrayFiles.module.BeadPoolManifest.BeadPoolManifest.__parse_file` (46:129) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry` (236:368) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__init__` (254:274) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_file` (276:295) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_6` (297:342) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_7` (344:355) + +- `BeadArrayFiles.module.BeadPoolManifest.LocusEntry.__parse_locus_version_8` (357:368) + +- `BeadArrayFiles.module.BeadPoolManifest.SourceStrand.from_string` (161:182) + +- `BeadArrayFiles.module.BeadPoolManifest.RefStrand.from_string` (214:234) + + + + + +### ClusterFile Parser + +This component parses Cluster files, which contain information about genotype clusters. It includes classes for the cluster file itself, individual cluster records, cluster statistics, and cluster scores, and it uses BeadArrayUtility for data extraction. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_cluster_file` (82:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile` (2:146) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.read_array` (64:79) + +- `BeadArrayFiles.module.ClusterFile.ClusterFile.add_record` (38:46) + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord.read_record` (184:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterRecord` (148:237) + +- `BeadArrayFiles.module.ClusterFile.ClusterStats` (287:317) + +- `BeadArrayFiles.module.ClusterFile.ClusterScore.read_record` (270:284) + +- `BeadArrayFiles.module.ClusterFile.ClusterScore` (240:284) + + + + + +### GenotypeCalls Processor + +This is a central component for processing and retrieving genotype call data from GTC files.
It provides methods to access attributes such as sample names, intensities, genotypes, and genotype quality scores, and it includes helper classes for normalization transforms and scanner data. It relies heavily on BeadArrayUtility for low-level data reading and works alongside the BeadPoolManifest Parser and ClusterFile Parser; for example, `get_snp_manifest` and `get_cluster_file` report the manifest and cluster file associated with a GTC file. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls` (62:621) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__init__` (103:136) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.is_write_complete` (590:621) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array` (155:182) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic_array_numpy` (184:207) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_name` (230:235) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.__get_generic` (138:153) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_slide_identifier` (237:242) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_plate` (244:249) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_sample_well` (251:256) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_cluster_file` (258:263) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_snp_manifest` (265:270) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_imaging_date` (272:279) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_autocall_date` (281:288) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_autocall_version` (290:297) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotypes` (299:304) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_generic` (306:346) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_plus_strand` (348:361) + +- 
`BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls_forward_strand` (363:375) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_base_calls` (377:408) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ploidy_type` (223:228) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_genotype_scores` (410:415) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_scanner_data` (417:422) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_x_intensities` (424:429) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_control_y_intensities` (431:436) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_x_intensities` (438:443) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_raw_y_intensities` (445:450) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_call_rate` (452:457) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gender` (459:465) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_logr_dev` (467:472) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gc10` (474:479) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_gc50` (481:486) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_calls` (488:495) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_no_calls` (497:504) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_num_intensity_only` (506:513) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_ballele_freqs` (515:523) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_logr_ratios` (525:533) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_x` (535:550) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_percentiles_y` (552:567) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalized_intensities` (569:581) + +- `BeadArrayFiles.module.GenotypeCalls.GenotypeCalls.get_normalization_transforms` (583:588) + +- 
`BeadArrayFiles.module.GenotypeCalls.NormalizationTransform.read_normalization_transform` (640:650) + +- `BeadArrayFiles.module.GenotypeCalls.NormalizationTransform` (624:697) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData.read_scanner_data` (733:748) + +- `BeadArrayFiles.module.GenotypeCalls.ScannerData` (700:748) + + + + + +### LocusAggregate Manager + +This component aggregates locus data across multiple samples. It includes a loader and a generator for locus aggregates, which group and process genetic loci. + + + + + +**Related Classes/Methods**: + + + +- `BeadArrayFiles.module.LocusAggregate.Loader.__call__` (full file reference) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate` (full file reference) + +- `BeadArrayFiles.module.LocusAggregate.GenerateLocusAggregate.__call__` (full file reference) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.load_buffer` (full file reference) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.aggregate_samples` (full file reference) + +- `BeadArrayFiles.module.LocusAggregate.LocusAggregate.group_loci` (full file reference) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file