Add REGENIE phenotype and covariate sidecars for plink_simulated #1919

Merged: lyh970817 merged 1 commit into nf-core:modules from lyh970817:regenie-popgen-fixtures on Mar 13, 2026
Conversation

@lyh970817 lyh970817 commented Mar 13, 2026

Summary

This PR adds three tiny plink_simulated_* sidecar files to data/genomics/homo_sapiens/popgen/ so REGENIE module tests can cover:

  • quantitative phenotype input
  • binary phenotype input in REGENIE-compatible 0/1 form
  • explicit covariate input

The new files extend the existing plink_simulated cohort rather than introducing a second GWAS fixture set.

Files Added

  • data/genomics/homo_sapiens/popgen/plink_simulated_quantitative_phenoname.phe
  • data/genomics/homo_sapiens/popgen/plink_simulated_binary_phenoname.phe
  • data/genomics/homo_sapiens/popgen/plink_simulated_covariates.txt

Rationale

nf-core/modules already reuses plink_simulated.{bed,bim,fam,pgen,pvar,psam} for GWAS/popgen tests, but the existing phenotype sidecars are not sufficient to exercise REGENIE's intended test matrix cleanly:

  • the existing plink_simulated_phenoname.phe is a binary 1/2 phenotype file, not a continuous quantitative trait
  • that existing file was only usable for the quantitative path via a --force-qt workaround
  • REGENIE binary mode expects 0/1/NA, so a dedicated 0/1 binary sidecar is also needed
  • there was no shared covariate sidecar for the same cohort
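The coding difference between the two binary conventions can be sketched as below. This is a minimal illustration, not the script used to produce the new sidecar: it assumes a whitespace-delimited file with a header row and `FID IID <phenotype>` columns, and the function names `recode_binary` and `recode_phenotype_lines` are hypothetical.

```python
def recode_binary(value: str) -> str:
    """Map a PLINK-style case/control code to the 0/1/NA convention."""
    # PLINK: 1 = control, 2 = case. REGENIE binary mode: 0 = control, 1 = case.
    mapping = {"1": "0", "2": "1"}
    # Anything else (for example 0 or -9, PLINK's missing codes) becomes NA.
    return mapping.get(value.strip(), "NA")


def recode_phenotype_lines(lines):
    """Recode the third column of FID IID PHENO rows.

    Assumption: the first non-empty line is a header and is passed through.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    out = [" ".join(rows[0])]  # header row, unchanged
    for fid, iid, pheno, *_ in rows[1:]:
        out.append(f"{fid} {iid} {recode_binary(pheno)}")
    return out


if __name__ == "__main__":
    example = ["FID IID pheno", "per0 per0 2", "per1 per1 1", "per2 per2 -9"]
    print("\n".join(recode_phenotype_lines(example)))
```

The sample IDs in the example are placeholders, not rows from the actual fixture.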

Adding small deterministic sidecars on top of the existing cohort keeps the dataset reusable and avoids duplicating genotype bundles.

Data Provenance

These files are deterministic sidecars derived from the sample identifiers in the existing plink_simulated dataset.

They are synthetic tabular fixtures only:

  • no sensitive or controlled-access data
  • no external downloads
  • no new biological source material
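The PR does not state how the values were derived from the sample identifiers. One way to get reproducible synthetic values from an ID is to hash it, as in the sketch below. Everything here is an assumption for illustration: the salts, the scaling, and the names `unit_value`, `quantitative_trait`, and `binary_trait` are hypothetical and are not taken from the PR.

```python
import hashlib


def unit_value(sample_id: str, salt: str) -> float:
    """Return a reproducible value in [0, 1] derived from a sample ID."""
    digest = hashlib.sha256(f"{salt}:{sample_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and scale it down.
    return int.from_bytes(digest[:8], "big") / 2**64


def quantitative_trait(sample_id: str) -> float:
    """Hypothetical continuous trait: a hashed value scaled to roughly 0..10."""
    return round(unit_value(sample_id, "qt") * 10, 4)


def binary_trait(sample_id: str) -> int:
    """Hypothetical 0/1 trait: threshold a separately salted hashed value."""
    return 1 if unit_value(sample_id, "bt") < 0.5 else 0


if __name__ == "__main__":
    for sid in ["per0", "per1", "per2"]:  # placeholder sample IDs
        print(sid, quantitative_trait(sid), binary_trait(sid))
```

Because the output depends only on the ID and the salt, regenerating the files yields identical content, which is what keeps snapshot-based tests stable.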

Size

The added files are small:

  • plink_simulated_quantitative_phenoname.phe: 4.5 KB
  • plink_simulated_binary_phenoname.phe: 3.0 KB
  • plink_simulated_covariates.txt: 7.4 KB

Total added footprint: about 15 KB.

Validation

These files were validated locally by pointing the companion nf-core/modules REGENIE tests at this branch and running:

NF_MODULES_TESTDATA_BASE_PATH='file:///.../test-datasets-work/data/' nf-test test modules/nf-core/regenie/step1/tests/main.nf.test --profile singularity --updateSnapshot --verbose --stopOnFirstFailure

NF_MODULES_TESTDATA_BASE_PATH='file:///.../test-datasets-work/data/' nf-test test modules/nf-core/regenie/step2/tests/main.nf.test --profile singularity --updateSnapshot --verbose --stopOnFirstFailure
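Independently of the nf-test runs, a lightweight check that a sidecar covers the same samples as the cohort could look like the sketch below. It is not part of this PR. It assumes the sidecar has a header row with FID and IID as its first two columns, and relies on the `.fam` format having no header with FID and IID in columns one and two. The helper names `read_ids` and `check_sidecar` are hypothetical.

```python
def read_ids(lines, has_header):
    """Collect (FID, IID) pairs from whitespace-delimited rows."""
    rows = [line.split() for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]  # drop the header row
    return [(row[0], row[1]) for row in rows]


def check_sidecar(fam_lines, sidecar_lines):
    """Return (missing, unknown) sample IDs for a sidecar versus a .fam file.

    missing: samples in the .fam that the sidecar does not list
    unknown: samples in the sidecar that the .fam does not contain
    """
    fam_ids = set(read_ids(fam_lines, has_header=False))
    side_ids = set(read_ids(sidecar_lines, has_header=True))
    return sorted(fam_ids - side_ids), sorted(side_ids - fam_ids)


if __name__ == "__main__":
    # Placeholder rows, not taken from the real fixture files.
    fam = ["f1 i1 0 0 1 -9", "f2 i2 0 0 2 -9"]
    sidecar = ["FID IID Y", "f1 i1 0.5", "f2 i2 1.2"]
    missing, unknown = check_sidecar(fam, sidecar)
    print("missing:", missing, "unknown:", unknown)
```

Two empty lists mean the sidecar and the genotype bundle describe the same set of samples.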

Companion PR

This dataset PR is required by the companion nf-core/modules PR: nf-core/modules#10800

Member

@dialvarezs dialvarezs left a comment

LGTM!

@lyh970817 lyh970817 merged commit 7a669c0 into nf-core:modules Mar 13, 2026
1 check passed
@lyh970817 lyh970817 deleted the regenie-popgen-fixtures branch March 13, 2026 17:53