Skip to content

[Looky-7769] Fix offset optimization data loss bug (60% records lost)#4

Open
halconel wants to merge 6 commits intoLooky-7769/offsetsfrom
Looky-7769/offset-bug-fix
Open

[Looky-7769] Fix offset optimization data loss bug (60% records lost)#4
halconel wants to merge 6 commits intoLooky-7769/offsetsfrom
Looky-7769/offset-bug-fix

Conversation

@halconel
Copy link
Copy Markdown
Owner

Summary

Fixes critical bugs in offset optimization that caused 60% data loss in production:

  • Lost: 48,915 out of 82,000 records
  • Date: 2025-12-08
  • Status: ✅ All data recovered, bugs fixed

Root Causes

Bug 1: Strict Inequality (> instead of >=)

Impact: Records with update_ts == offset were skipped
Fix: Changed to >= + added process_ts check to prevent infinite loops
File: datapipe/meta/sql_meta.py:946, 972, 994, 1018, 1048

Bug 2: Wrong ORDER BY

Impact: Records sorted by transform_keys instead of update_ts, causing earlier timestamps to be skipped
Fix: ORDER BY update_ts, transform_keys (chronological first)
File: datapipe/meta/sql_meta.py:1183

Bug 3: Delayed Records (Known Limitation)

Impact: Manually inserted records with update_ts < offset will be lost
Mitigation: Added warning when store_chunk(now=past_time) is called
File: datapipe/meta/sql_meta.py:259-265

Changes Summary

Code Changes

  • datapipe/meta/sql_meta.py - Fixed offset optimization logic
  • Warning added for edge case with past timestamps

Tests Added

  • tests/test_offset_hypotheses.py - Tests for both hypotheses
  • tests/test_offset_production_bug_main.py - Reproduces production bug
  • tests/test_offset_hypothesis_3_multi_step.py - Multi-step pipeline tests
  • tests/offset_edge_cases/ - Edge case invariant tests

Documentation

  • docs/source/offset-optimization.md - Complete guide to offset optimization
  • Temporary investigation docs removed (preserved in git history)

Test Results ✅

Test Category Result Note
Hypothesis 1 (strict inequality) ✅ PASSED Bug fixed
Hypothesis 2 (ORDER BY) ✅ PASSED Bug fixed
Production bug reproduction ✅ PASSED 0% data loss (was 60%)
Anti-regression (no infinite loop) ✅ PASSED No reprocessing
Hypothesis 3 (multi-step) ✅ PASSED Disproven
All offset optimization tests 15/15 PASSED Including error retry

Commits (6)

  1. d816a4d - Add comprehensive test suite and documentation
  2. 17899b7 - Fix ORDER BY update_ts (Hypothesis 2)
  3. 317a170 - Fix strict inequality >= and process_ts filtering (Hypothesis 1)
  4. c336261 - Add warning for past timestamps (Hypothesis 4)
  5. 44d301b - Add offset optimization documentation, remove temp docs
  6. 92e98a0 - Fix test expectations in edge cases

Files Changed

Core Logic:

  • datapipe/meta/sql_meta.py (+182/-35 lines)

Tests:

  • tests/test_offset_hypotheses.py (new)
  • tests/test_offset_production_bug_main.py (new)
  • tests/test_offset_hypothesis_3_multi_step.py (new)
  • tests/offset_edge_cases/ (new directory)
  • tests/test_build_changed_idx_sql_v2.py (updated expectations)

Documentation:

  • docs/source/offset-optimization.md (new, 168 lines)
  • docs/source/SUMMARY.md (updated)

Review Focus

  1. datapipe/meta/sql_meta.py:946-1177 - Core offset optimization logic

    • Lines 946, 972, 994, 1018, 1048: >>=
    • Lines 1150-1177: New WHERE clause with process_ts check
    • Line 1183: ORDER BY update_ts, transform_keys
  2. Tests - Comprehensive coverage of production bug and edge cases

  3. Documentation - Clear explanation with SQL examples

Ready for Production ✅

All tests pass, production bug fully resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant