Description
Hi, I think there's a float precision error in heterozygosity_expected caused by the comparison statement af_sum < 1 here.
Some values that are effectively 1 will return True as a result they get filled e.g.
0.99999999....
Example
import numpy as np
import allel
g = allel.GenotypeArray([
[[0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 1], [1, 1]],
[[0, 1], [2, 3], [4, 5]]
])
af = g.count_alleles().to_frequencies()
allel.heterozygosity_expected(af, ploidy=2)
returns
The expected result is
array([0. , 0.5 , 0.83333333])
The third value is nan-filled because
af_sum = np.sum(af, axis=1)
print(af_sum[0], af_sum[1], af_sum[2])
1.0 1.0 0.9999999999999999
This actually occurs in the polyploid test case but the same check is used in the reference implementation in the test suit.
Fix
I've got a PR ready to fix it by rounding to a suitable precision based on the dtype.
precision = np.finfo(af_sum.dtype).precision
af_sum = np.round(np.sum(af, axis=1), decimals=precision)
print(af_sum[0], af_sum[1], af_sum[2])
1.0 1.0 1.0
Though I'm not sure what case af_sum < 1 is actually checking for?
Description
Hi, I think there's a float precision error in
heterozygosity_expectedcaused by the comparison statementaf_sum < 1here.Some values that are effectively 1 will return
Trueas a result they get filled e.g.0.99999999....Example
returns
The expected result is
The third value is nan-filled because
1.0 1.0 0.9999999999999999This actually occurs in the polyploid test case but the same check is used in the reference implementation in the test suit.
Fix
I've got a PR ready to fix it by rounding to a suitable precision based on the dtype.
1.0 1.0 1.0Though I'm not sure what case
af_sum < 1is actually checking for?