Issue with join function #95

@blawney

I'm seeing puzzling behavior when using join to filter out common variants. I wonder whether I'm missing an environment variable or some other setting that would explain and correct it.

As a minimal reproducible example, my input file looks like this:

$ cat input.gor 
CHROM	POS	ID	REF	ALT	QUAL	FILTER	CALLER	pn	de_identified_subject	GT	AD	AF	DP	GQ	Allele
chr1	22375	chr1:22375	T	C	10.61	PASS	SNP	SUBJECT_XYZ_G38	SUBJECT_XYZ	0/1	4,4	0.500	8	9	C
chr8	73281635	chr8:73281635	C	G	50.0	PASS	SNP	SUBJECT_XYZ_G38	SUBJECT_XYZ	1|0	14,14	0.500	28	47	G
chrY	12666523	chrY:12666523	CT	C	62.03	PASS	SNP	SUBJECT_XYZ_G38	SUBJECT_XYZ	1	0,4	1.000	4	62	C

The aim is to remove the chr8:73281635 C>G variant, which is relatively common (AF of approximately 0.5 in gnomAD). I have a file of common variants that looks like this:

$ head -5 common_variants.gor
CHROM	POS	STOP	REF	ALT	_1	_2	AF
chr1	10067	10067	T	TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC	2	119758	1.67003e-05
chr1	10108	10108	C	CAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT	2	10090	0.000198216
chr1	10111	10111	C	A	1	44330	2.25581e-05
chr1	10114	10115	TA	T	4	22240	0.000179856
...

(The entire file is ~15G)

Indeed, the variant above is in that file:

$ grep -P "chr8\t73281635\t73281635\tC\tG" common_variants.gor 
chr8	73281635	73281635	C	G	59729	119550	0.499615

If I run gorpipe with the following command against the full common_variants.gor, the chr8 variant remains in the output:

<path to software>/gor-5.7.0/gor/gorscripts/build/install/gorscripts/bin/gorpipe 'gor input.gor | sort genome | join -n -snpsnp common_variants.gor -rprefix common -xl REF,Allele -xr REF,ALT' > with_full_variants.gor

However, if I first filter the large file of common variants down to the header plus the chr8 rows:

awk 'NR==1 ||  /^chr8/' common_variants.gor > chr8.gor

Then run the same command with this chr8.gor:

<path to software>/gor-5.7.0/gor/gorscripts/build/install/gorscripts/bin/gorpipe 'gor input.gor | sort genome | join -n -snpsnp chr8.gor -rprefix common -xl REF,Allele -xr REF,ALT' > with_chr8_variants.gor

then it works as expected, and the chr8 variant is removed.
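
For reference, the expected result of the join can be written down independently of gorpipe. The sketch below is a plain-Python approximation. It assumes that `-n` keeps only left rows with no match, and that `-snpsnp` with `-xl REF,Allele -xr REF,ALT` matches on chromosome, position, REF, and the left Allele against the right ALT. It uses only rows quoted in this issue, with a subset of their columns, so it says nothing about variants elsewhere in the full file. `anti_join` is a hypothetical helper name, not part of GOR.

```python
# Left rows: the columns the join uses, copied from input.gor above.
left_rows = [
    {"CHROM": "chr1", "POS": 22375, "REF": "T", "Allele": "C"},
    {"CHROM": "chr8", "POS": 73281635, "REF": "C", "Allele": "G"},
    {"CHROM": "chrY", "POS": 12666523, "REF": "CT", "Allele": "C"},
]

# Right rows: a few of the common_variants.gor lines quoted above (not the full file).
right_rows = [
    {"CHROM": "chr1", "POS": 10111, "REF": "C", "ALT": "A"},
    {"CHROM": "chr1", "POS": 10114, "REF": "TA", "ALT": "T"},
    {"CHROM": "chr8", "POS": 73281635, "REF": "C", "ALT": "G"},
]


def anti_join(left, right):
    """Keep left rows whose (CHROM, POS, REF, Allele) has no (CHROM, POS, REF, ALT) match."""
    right_keys = {(r["CHROM"], r["POS"], r["REF"], r["ALT"]) for r in right}
    return [
        row
        for row in left
        if (row["CHROM"], row["POS"], row["REF"], row["Allele"]) not in right_keys
    ]


kept = anti_join(left_rows, right_rows)
for row in kept:
    print(row["CHROM"], row["POS"], row["REF"], row["Allele"])
```

With these sample rows, the chr8 variant is the only one dropped, which matches what the chr8.gor run produces.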

In case the size of common_variants.gor was the problem, I tried pointing gorpipe at a dedicated tmp directory with export GOR_GORPIPE_OPTS="-Djava.io.tmpdir=<path to tmp>/gor_tmp", but that did not change the result, and I did not receive any warnings. The join logic clearly works with the single-chromosome file, so I assume the failure is a secondary issue related to the size of the full file.
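
Besides size, one property that differs between chr8.gor and the full file is that the full file spans many chromosomes, so its row order may matter. Whether gorpipe depends on a particular chromosome order in the right-hand file is an assumption here, not something established in this issue. The sketch below only reports what order a file is in. `chrom_order_report` is a hypothetical helper, and the sample lines are made up for illustration rather than taken from common_variants.gor.

```python
def chrom_order_report(lines):
    """Return (chromosomes in order of first appearance, list of ordering problems).

    Expects tab-separated lines with a header row, CHROM in column 1, POS in column 2.
    """
    order = []
    seen = set()
    problems = []
    prev_chrom, prev_pos = None, None
    for n, line in enumerate(lines, start=1):
        if n == 1:  # skip the header row
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                problems.append(f"line {n}: {chrom} reappears after other chromosomes")
            else:
                seen.add(chrom)
                order.append(chrom)
        elif pos < prev_pos:
            problems.append(f"line {n}: position {pos} < previous {prev_pos} on {chrom}")
        prev_chrom, prev_pos = chrom, pos
    return order, problems


# Hypothetical sample in "natural" chromosome order (chr1, chr2, chr10).
sample = [
    "CHROM\tPOS\tSTOP\tREF\tALT\n",
    "chr1\t100\t100\tA\tG\n",
    "chr1\t250\t250\tC\tT\n",
    "chr2\t50\t50\tG\tA\n",
    "chr10\t75\t75\tT\tC\n",
]

order, problems = chrom_order_report(sample)
is_lexicographic = order == sorted(order)
print("chromosome order:", order)
print("lexicographic order:", is_lexicographic)
print("problems:", problems)
```

To apply it to the real file, pass an open file handle instead of `sample`, for example `with open("common_variants.gor") as fh: order, problems = chrom_order_report(fh)`. The function streams line by line, so the file is never loaded into memory in full.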
