Use float64 in Jenks natural breaks internals (#1100)#1101

Merged
brendancol merged 3 commits into master from issue-1100 on Mar 31, 2026

@brendancol
Contributor

Summary

Fixes #1100. The Jenks natural breaks algorithm used float32 for its internal matrices and bin edge array. The naive variance formula sum_squares - (sum * sum) / w loses all significant digits when data has a large offset relative to its spread (elevations around 1000m, projected coordinates in the millions, etc.).
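The cancellation described above can be reproduced outside the library. The following is a minimal sketch, not the project's code: `naive_ssd` is a hypothetical helper that applies the same `sum_squares - (sum * sum) / w` formula while keeping every intermediate in a chosen dtype, and the five sample values are made up for illustration.

```python
import numpy as np

def naive_ssd(values, dtype):
    """Sum of squared deviations via sum_squares - (sum * sum) / w.

    Hypothetical helper: every intermediate is kept in `dtype` to mimic
    working arrays of that precision.
    """
    w = dtype(0.0)
    total = dtype(0.0)
    sum_squares = dtype(0.0)
    for v in values:
        val = dtype(v)
        w = w + dtype(1.0)
        total = total + val
        sum_squares = sum_squares + val * val
    return sum_squares - (total * total) / w

# Assumed sample: offset of 100,000 with a spread of 8.
values = [100000.0, 100002.0, 100004.0, 100006.0, 100008.0]

# Two-pass reference (subtract the mean first), which avoids the cancellation.
exact = float(np.sum((np.array(values) - np.mean(values)) ** 2))

ssd64 = float(naive_ssd(values, np.float64))
ssd32 = float(naive_ssd(values, np.float32))

print("reference:", exact)
print("float64  :", ssd64)
print("float32  :", ssd32)
```

For this input the float64 result matches the reference value of 40. In float32, both terms of the subtraction are near 5e10, where adjacent representable values are 4096 apart, so the difference can only be a multiple of 4096 and the true value cannot survive.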

Changed four float32 sites to float64:

  • lower_class_limits matrix dtype
  • var_combinations matrix dtype
  • val = np.float32(data[i4]) cast removed
  • kclass bin edge array dtype
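The summary does not say why the `val` cast is listed alongside the array dtypes. One plausible reading is that squaring a value after a float32 cast already rounds the product, so widening the matrices alone would not recover the lost digits. The sketch below illustrates that effect with a made-up value; it is not taken from the library's source.

```python
import numpy as np

x = 100002.0  # hypothetical data value with a large offset

# Squaring after a float32 cast rounds the product to float32 precision,
# even if the result is later stored in a float64 array.
sq_cast = float(np.float32(x) * np.float32(x))

# Squaring the float64 value keeps the product exact for this input.
sq_plain = float(np.float64(x) * np.float64(x))

print(sq_plain)            # 10000400004.0
print(sq_cast - sq_plain)  # 380.0
```

An error of 380 in a single squared term is far larger than the squared deviations within a tight cluster, which is consistent with the merged-cluster behavior the PR describes.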

Test plan

  • test_natural_breaks_large_offset_1100: five tight clusters at offset 100,000 with spread of 10 -- all 5 classes must be separated cleanly
  • Full test_classify.py suite: 85 passed, no regressions

The Jenks matrices and bin edge array used float32, causing the naive
variance formula (sum_squares - sum*sum/w) to lose all significant
digits when data had a large offset relative to its spread. Changed
lower_class_limits, var_combinations, val cast, and kclass to float64.

test_natural_breaks_large_offset_1100: five tight clusters offset by
100,000 must be separated into 5 distinct classes. With float32
internals, the variance calculation lost all signal and merged clusters.
@github-actions bot added the performance label (PR touches performance-sensitive code) on Mar 30, 2026
@brendancol merged commit 629d533 into master on Mar 31, 2026
11 checks passed
@brendancol deleted the issue-1100 branch on May 4, 2026 13:07

Labels

performance (PR touches performance-sensitive code)

Development

Successfully merging this pull request may close these issues.

Jenks natural breaks uses float32 internally, wrong bin edges for offset data