Skip to content

Floating point bug in heterozygosity_expected resulting in nans #274

@timothymillar

Description

@timothymillar

Description

Hi, I think there's a float precision error in heterozygosity_expected caused by the comparison statement af_sum < 1 here.
Some values that are effectively 1 will return True as a result they get filled e.g.
0.99999999....

Example

import numpy as np
import allel

g = allel.GenotypeArray([
    [[0, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [2, 3], [4, 5]]
])

af = g.count_alleles().to_frequencies()

allel.heterozygosity_expected(af, ploidy=2)

returns

array([0. , 0.5, nan])

The expected result is

array([0.        , 0.5       , 0.83333333])

The third value is nan-filled because

af_sum = np.sum(af, axis=1)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 0.9999999999999999

This actually occurs in the polyploid test case but the same check is used in the reference implementation in the test suit.

Fix

I've got a PR ready to fix it by rounding to a suitable precision based on the dtype.

precision = np.finfo(af_sum.dtype).precision
af_sum = np.round(np.sum(af, axis=1), decimals=precision)
print(af_sum[0], af_sum[1], af_sum[2])

1.0 1.0 1.0

Though I'm not sure what case af_sum < 1 is actually checking for?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions