Skip to content

fix: fix boolean inline constant decoding#5862

Merged
Xuanwo merged 3 commits intolance-format:mainfrom
Xuanwo:fix/boolean-inline-constant-decode
Jan 30, 2026
Merged

fix: fix boolean inline constant decoding#5862
Xuanwo merged 3 commits intolance-format:mainfrom
Xuanwo:fix/boolean-inline-constant-decode

Conversation

@Xuanwo
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo commented Jan 30, 2026

This fixes a regression where reading a dataset could fail with:

Inline constant is not supported for non-fixed-stride data type Boolean


Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

Support decoding DataType::Boolean inline scalar values and add regression tests.
@github-actions
Copy link
Copy Markdown
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions
Copy link
Copy Markdown
Contributor

Code Review

P0 Issue: Silent data corruption risk with debug_assert

Location: rust/lance-arrow/src/scalar.rs:143-168

The PR removes runtime validation and replaces it with debug_assert checks. This is dangerous because:

  1. debug_assert is compiled out in release builds - In release mode, if the inline value length doesn't match expectations (e.g., due to data corruption, format changes, or bugs in upstream code), the code will silently proceed with incorrect data instead of failing fast.

  2. The original error handling served as a safety net - The original code returned proper errors when invariants were violated, which would surface bugs early and prevent silent data corruption.

Recommendation: Keep the original runtime validation. While the PR description says this is "per maintainer preference", the trade-off here favors safety. At minimum, the length check should remain as a runtime error:

// Keep runtime validation for boolean
if matches\!(data_type, DataType::Boolean) {
    if inline_value.len() \!= 1 {
        return Err(ArrowError::InvalidArgumentError(format\!(
            "Invalid boolean inline scalar length (expected 1 byte, got {})",
            inline_value.len()
        )));
    }
} else {
    let byte_width = data_type.byte_width_opt().ok_or_else(|| {
        ArrowError::InvalidArgumentError(format\!(
            "Inline constant is not supported for non-fixed-stride data type {:?}",
            data_type
        ))
    })?;
    
    if inline_value.len() \!= byte_width {
        return Err(ArrowError::InvalidArgumentError(format\!(
            "Inline constant length mismatch for {:?}: expected {} bytes but got {}",
            data_type, byte_width, inline_value.len()
        )));
    }
}

P1 Issue: Missing test for false boolean value

The unit test only covers true. Add a test for false to ensure the bit-packing is handled correctly for both values:

#[test]
fn test_inline_value_boolean_false_round_trip() {
    let scalar: ArrayRef = Arc::new(BooleanArray::from_iter([Some(false)]));
    let inline = try_inline_value(&scalar).unwrap();
    let decoded = decode_scalar_from_inline_value(&DataType::Boolean, &inline).unwrap();
    assert_eq\!(decoded.len(), 1);
    assert\!(\!decoded.as_boolean().value(0));
}

The fix for the boolean inline constant decoding regression is correct in principle. The tests are well-structured. The main concern is the removal of runtime validation.

@Xuanwo Xuanwo changed the title Fix boolean inline constant decoding fix: fix boolean inline constant decoding Jan 30, 2026
@github-actions github-actions Bot added the bug Something isn't working label Jan 30, 2026
@Xuanwo
Copy link
Copy Markdown
Collaborator Author

Xuanwo commented Jan 30, 2026

Test failed for:

dataset::scanner::test::test_ann_prefilter::data_storage_version_2_LanceFileVersion__Stable::stable_row_ids_1_false::index_params_2_VectorIndexParams__with_ivf_hnsw_sq_params_MetricType__L2_IvfBui stdout ----

thread 'dataset::scanner::test::test_ann_prefilter::data_storage_version_2_LanceFileVersion__Stable::stable_row_ids_1_false::index_params_2_VectorIndexParams__with_ivf_hnsw_sq_params_MetricType__L2_IvfBui' (18589) panicked at rust/lance/src/dataset/scanner.rs:5839:9:
assertion `left == right` failed
  left: 6
 right: 11

It's another flaky test and not related to my changes,

@Xuanwo Xuanwo merged commit eef9554 into lance-format:main Jan 30, 2026
27 of 28 checks passed
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
This fixes a regression where reading a dataset could fail with:

`Inline constant is not supported for non-fixed-stride data type
Boolean`

---

**Parts of this PR were drafted with assistance from Codex (with
`gpt-5.2`) and fully reviewed and edited by me. I take full
responsibility for all changes.**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants