feat: add Error::External variant for preserving user errors #5606

Merged
wjones127 merged 3 commits into lance-format:main from wjones127:feat/external-error-variant
Jan 13, 2026

Conversation

@wjones127
Contributor

Summary

  • Add Error::External variant with #[snafu(transparent)] to preserve user errors passed through streams
  • Update From<ArrowError> to extract inner error from ExternalError variant
  • Update From<DataFusionError> to handle External and nested ArrowError::ExternalError
  • Add helper methods: external(), external_source(), into_external()
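
A minimal, std-only sketch of the idea in the list above. `Error` here is a reduced stand-in for `lance_core::Error`, `QuotaExceeded` and `recover_limit` are hypothetical, and the signatures of `external()`, `external_source()`, and `into_external()` are assumptions, since the summary only names them. The hand-written `Display` and `source` impls approximate what `#[snafu(transparent)]` would generate.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Boxed error type used for the preserved user error (assumed bounds).
type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

/// Hypothetical user error raised inside a stream handed to Lance.
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded {
    pub limit: u64,
}

impl fmt::Display for QuotaExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "quota of {} rows exceeded", self.limit)
    }
}

impl StdError for QuotaExceeded {}

/// Stand-in for `lance_core::Error`, reduced to two variants.
#[derive(Debug)]
pub enum Error {
    /// String-only variant: the original error type is gone.
    Arrow { message: String },
    /// Keeps the boxed user error so callers can downcast it later.
    External { source: BoxedError },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Arrow { message } => write!(f, "Arrow error: {}", message),
            // "Transparent": show the inner error's message unchanged.
            Error::External { source } => write!(f, "{}", source),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::Arrow { .. } => None,
            // "Transparent": delegate to the inner error's own source.
            Error::External { source } => source.source(),
        }
    }
}

impl Error {
    /// Wrap a user error (assumed signature).
    pub fn external(source: BoxedError) -> Self {
        Error::External { source }
    }

    /// Borrow the wrapped user error, if any (assumed signature).
    pub fn external_source(&self) -> Option<&BoxedError> {
        match self {
            Error::External { source } => Some(source),
            _ => None,
        }
    }

    /// Take the wrapped user error back out, or return `self` unchanged (assumed signature).
    pub fn into_external(self) -> Result<BoxedError, Self> {
        match self {
            Error::External { source } => Ok(source),
            other => Err(other),
        }
    }
}

/// Hypothetical caller-side recovery of the original error type.
pub fn recover_limit(err: Error) -> Option<u64> {
    let boxed = err.into_external().ok()?;
    boxed.downcast::<QuotaExceeded>().ok().map(|e| e.limit)
}
```

With this shape, a caller that receives the error from a write call can get its own `QuotaExceeded` value back instead of parsing a message string.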

Test plan

  • Unit tests for error conversions in lance-core
  • Integration tests for InsertBuilder::execute_stream and MergeInsertJob::execute_reader

🤖 Generated with Claude Code

@github-actions github-actions Bot added the enhancement New feature or request label Dec 31, 2025
@github-actions
Contributor

PR Review: feat: add Error::External variant for preserving user errors

This PR adds functionality to preserve user-defined errors passed through Lance APIs. The implementation is clean and the test coverage is comprehensive.

No blocking issues found

The code quality is good, error handling is consistent with existing patterns, and tests cover the key scenarios.

Minor observations (non-blocking)

  1. From<Error> for DataFusionError - The existing conversion at lines 354-357 converts all Lance errors to DataFusionError::Execution(e.to_string()), which loses the External variant's recoverability on the return path. This is likely acceptable since users typically extract errors from Lance, not the reverse, but worth noting for completeness.

  2. Documentation - The doc comments on the new methods are clear and helpful. The examples in the tests effectively demonstrate the intended usage pattern.
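
To make observation 1 concrete, here is a small std-only sketch. `LanceError` and `DfError` are stand-ins for `lance_core::Error` and `DataFusionError`, `UserError` is hypothetical, and `to_df_preserving` is only an illustrative alternative, not something this PR is known to implement.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

/// Hypothetical user error.
#[derive(Debug)]
pub struct UserError(pub &'static str);

impl fmt::Display for UserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl StdError for UserError {}

/// Stand-in for `lance_core::Error` with only the `External` variant.
#[derive(Debug)]
pub enum LanceError {
    External { source: BoxedError },
}

impl fmt::Display for LanceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LanceError::External { source } => write!(f, "{}", source),
        }
    }
}

impl StdError for LanceError {}

/// Stand-in for `DataFusionError` with two of its variants.
#[derive(Debug)]
pub enum DfError {
    Execution(String),
    External(BoxedError),
}

/// The conversion the review describes: only the message survives.
pub fn to_df_lossy(e: LanceError) -> DfError {
    DfError::Execution(e.to_string())
}

/// Illustrative alternative (an assumption): box the error instead of stringifying it.
pub fn to_df_preserving(e: LanceError) -> DfError {
    DfError::External(Box::new(e))
}

/// Try to get the user error back out of the DataFusion-side error.
pub fn recover_user_error(e: DfError) -> Option<UserError> {
    match e {
        // A string carries no type information, so nothing can be recovered.
        DfError::Execution(_) => None,
        DfError::External(boxed) => {
            let lance = boxed.downcast::<LanceError>().ok()?;
            let LanceError::External { source } = *lance;
            source.downcast::<UserError>().ok().map(|b| *b)
        }
    }
}
```

The lossy path still reports the right message; what is lost is the ability to downcast on the other side.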

Overall this is a well-designed addition that addresses a real user need for error recovery in streaming APIs.

@codecov

codecov Bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 25 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-core/src/error.rs | 91.72% | 12 Missing ⚠️ |
| rust/lance/src/dataset/write/insert.rs | 87.87% | 4 Missing and 4 partials ⚠️ |
| rust/lance/src/dataset/write/merge_insert.rs | 87.17% | 3 Missing and 2 partials ⚠️ |

@wjones127 wjones127 marked this pull request as ready for review December 31, 2025 18:32
@wjones127 wjones127 force-pushed the feat/external-error-variant branch 2 times, most recently from fe93b6a to c18f61c Compare January 12, 2026 19:23
Member

@westonpace westonpace left a comment

Nice improvement.

Comment thread rust/lance-core/src/error.rs Outdated
Comment on lines +413 to +426
// Check if the ArrowError wraps an external error and extract it
match *arrow_err {
    ArrowError::ExternalError(source) => {
        // Try to downcast to lance_core::Error first
        match source.downcast::<Self>() {
            Ok(lance_err) => *lance_err,
            Err(source) => Self::External { source },
        }
    }
    other => Self::Arrow {
        message: other.to_string(),
        location,
    },
}
Member

Would we get this automatically if we returned LanceError::from(arrow_err)?

Contributor Author

I think so. That's a good idea for simplification.
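
A sketch of the simplification discussed in this thread, assuming the commented lines sit in the `DataFusionError` conversion where the Arrow error arrives boxed. All three enums are reduced stand-ins for `ArrowError`, `DataFusionError`, and `lance_core::Error`; `UserError` is hypothetical and the `location` field is omitted. The point is that the DataFusion arm can call `Self::from(*arrow_err)` rather than repeat the downcast logic.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

/// Hypothetical user error.
#[derive(Debug)]
pub struct UserError(pub u32);

/// Stand-in for `ArrowError`.
#[derive(Debug)]
pub enum ArrowError {
    ExternalError(BoxedError),
    ComputeError(String),
}

/// Stand-in for `DataFusionError`.
#[derive(Debug)]
pub enum DfError {
    ArrowError(Box<ArrowError>),
    External(BoxedError),
    Execution(String),
}

/// Stand-in for `lance_core::Error`.
#[derive(Debug)]
pub enum Error {
    Arrow { message: String },
    Execution { message: String },
    External { source: BoxedError },
}

// Keep the sketch short: reuse Debug output for Display on every stand-in type.
macro_rules! impl_error {
    ($($t:ty),*) => {$(
        impl fmt::Display for $t {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                write!(f, "{:?}", self)
            }
        }
        impl StdError for $t {}
    )*};
}
impl_error!(UserError, ArrowError, DfError, Error);

impl From<ArrowError> for Error {
    fn from(err: ArrowError) -> Self {
        match err {
            // Prefer recovering a Lance error, else keep the user error boxed.
            ArrowError::ExternalError(source) => match source.downcast::<Self>() {
                Ok(lance_err) => *lance_err,
                Err(source) => Self::External { source },
            },
            other => Self::Arrow { message: other.to_string() },
        }
    }
}

impl From<DfError> for Error {
    fn from(err: DfError) -> Self {
        match err {
            // Delegate to the Arrow conversion instead of duplicating its match.
            DfError::ArrowError(arrow_err) => Self::from(*arrow_err),
            DfError::External(source) => match source.downcast::<Self>() {
                Ok(lance_err) => *lance_err,
                Err(source) => match source.downcast::<ArrowError>() {
                    // Nested case: an Arrow error boxed inside External.
                    Ok(arrow_err) => Self::from(*arrow_err),
                    Err(source) => Self::External { source },
                },
            },
            DfError::Execution(message) => Self::Execution { message },
        }
    }
}
```

Delegating keeps the two conversions from drifting apart if the Arrow handling changes later.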

wjones127 and others added 3 commits January 12, 2026 14:54
When users pass streams with custom error types (e.g., wrapped in
ArrowError::ExternalError), the original error was being converted to
a string and lost. This adds an Error::External variant that preserves
the boxed error, allowing users to downcast and recover their original
error type.

Changes:
- Add #[snafu(transparent)] External variant to Error enum
- Update From<ArrowError> to extract inner error from ExternalError
- Update From<DataFusionError> to handle External and nested cases
- Add helper methods: external(), external_source(), into_external()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a lance_core::Error is converted to ArrowError or DataFusionError
and then back, the original error type is now recovered via downcasting
instead of being wrapped in External or converted to strings.

This enables workflows where iterators/streams use lance_core::Error
internally but go through Arrow/DataFusion error types at API boundaries.
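
A std-only sketch of that workflow, under stated assumptions: `Error` and `ArrowError` are reduced stand-ins for the real types, the outbound `From<Error> for ArrowError` is assumed to box the error, and `batches` and `total_rows` are hypothetical helpers.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

/// Stand-in for `lance_core::Error`.
#[derive(Debug)]
pub enum Error {
    InvalidInput { message: String },
    Arrow { message: String },
    External { source: BoxedError },
}

/// Stand-in for `ArrowError`.
#[derive(Debug)]
pub enum ArrowError {
    ExternalError(BoxedError),
    ComputeError(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl StdError for Error {}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl StdError for ArrowError {}

// Outbound (assumed): box the Lance error rather than stringify it.
impl From<Error> for ArrowError {
    fn from(err: Error) -> Self {
        ArrowError::ExternalError(Box::new(err))
    }
}

// Inbound: recover the Lance error by downcasting, as the commit describes.
impl From<ArrowError> for Error {
    fn from(err: ArrowError) -> Self {
        match err {
            ArrowError::ExternalError(source) => match source.downcast::<Self>() {
                Ok(lance_err) => *lance_err,
                Err(source) => Self::External { source },
            },
            other => Self::Arrow { message: other.to_string() },
        }
    }
}

/// Hypothetical source: validates with `Error` internally, exposes `ArrowError`.
pub fn batches(sizes: Vec<i64>) -> impl Iterator<Item = Result<i64, ArrowError>> {
    sizes.into_iter().map(|n| {
        if n < 0 {
            let err = Error::InvalidInput {
                message: format!("negative batch size {}", n),
            };
            Err(ArrowError::from(err))
        } else {
            Ok(n)
        }
    })
}

/// Hypothetical consumer: `?` converts each `ArrowError` back into `Error`.
pub fn total_rows(sizes: Vec<i64>) -> Result<i64, Error> {
    let mut total = 0;
    for item in batches(sizes) {
        total += item?;
    }
    Ok(total)
}
```

The consumer sees `Error::InvalidInput` again after the round trip, not `External` or a stringified `Arrow` error.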

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wjones127 wjones127 force-pushed the feat/external-error-variant branch from 92bfa6d to be8e177 Compare January 12, 2026 22:55
@wjones127 wjones127 merged commit f01d5d8 into lance-format:main Jan 13, 2026
28 checks passed
@wjones127 wjones127 deleted the feat/external-error-variant branch January 13, 2026 17:30
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…ormat#5606)

## Summary

- Add `Error::External` variant with `#[snafu(transparent)]` to preserve
user errors passed through streams
- Update `From<ArrowError>` to extract inner error from `ExternalError`
variant
- Update `From<DataFusionError>` to handle `External` and nested
`ArrowError::ExternalError`
- Add helper methods: `external()`, `external_source()`,
`into_external()`

## Test plan

- [x] Unit tests for error conversions in `lance-core`
- [x] Integration tests for `InsertBuilder::execute_stream` and
`MergeInsertJob::execute_reader`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
wjones127 added a commit to lancedb/lancedb that referenced this pull request Feb 13, 2026
…e()` and `Table.add()` (#2948)

BREAKING CHANGE: Arbitrary `impl RecordBatchReader` is no longer
accepted; it must be made into `Box<dyn RecordBatchReader>`.

This PR replaces `IntoArrow` with a new trait `Scannable` to define
input row data. This provides the following advantages:

1. **We can implement `Scannable` for more types than `IntoArrow`, such
as `RecordBatch` and `Vec<RecordBatch>`.** The `IntoArrow` trait was
implemented for arbitrary `T: RecordBatchReader`, and the Rust compiler
would prevent us from implementing it for foreign types like
`RecordBatch` because (theoretically) those types might implement
`RecordBatchReader` in the future. That's why we implement `Scannable`
for `Box<dyn RecordBatchReader>` instead; since it's a concrete type it
doesn't block implementing for other foreign types.
2. **We can potentially replay `Scannable` values**. Previously, we had
to choose between buffering all data in memory and supporting retries of
writes. But because `Scannable` things can optionally support
re-scanning, we now have a way of supporting retries while also
streaming.
3. **`Scannable` can provide hints like `num_rows`, which can be used to
schedule parallel writers.** Without knowing the total number of rows,
it's difficult to know whether it's worth writing multiple files in
parallel.
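
A simplified, std-only sketch of the three points above. Everything here is hypothetical: `Batch` stands in for `RecordBatch`, a boxed iterator stands in for `Box<dyn RecordBatchReader>`, and the method names `scan_batches`, `num_rows`, and `rescannable` are illustrative rather than the actual `Scannable` API.

```rust
/// Stand-in for an Arrow `RecordBatch`: just a vector of row values.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch(pub Vec<i32>);

/// Stand-in for `Box<dyn RecordBatchReader>`: a one-shot stream of batches.
pub type BoxedReader = Box<dyn Iterator<Item = Batch> + Send>;

/// Hypothetical, reduced version of the `Scannable` idea.
pub trait Scannable {
    /// Produce a stream over the data.
    fn scan_batches(&mut self) -> BoxedReader;
    /// Optional size hint, usable for planning parallel writers.
    fn num_rows(&self) -> Option<usize> {
        None
    }
    /// Whether `scan_batches` can be called again with the same result.
    fn rescannable(&self) -> bool {
        false
    }
}

// Implemented for concrete types only. With a blanket
// `impl<T: Iterator<Item = Batch>> Scannable for T`, the `Vec<Batch>` impl
// below would be rejected as potentially overlapping.
impl Scannable for Batch {
    fn scan_batches(&mut self) -> BoxedReader {
        Box::new(std::iter::once(self.clone()))
    }
    fn num_rows(&self) -> Option<usize> {
        Some(self.0.len())
    }
    fn rescannable(&self) -> bool {
        true
    }
}

impl Scannable for Vec<Batch> {
    fn scan_batches(&mut self) -> BoxedReader {
        Box::new(self.clone().into_iter())
    }
    fn num_rows(&self) -> Option<usize> {
        Some(self.iter().map(|b| b.0.len()).sum())
    }
    fn rescannable(&self) -> bool {
        true
    }
}

impl Scannable for BoxedReader {
    fn scan_batches(&mut self) -> BoxedReader {
        // One-shot: hand out the stream and leave an empty one behind.
        std::mem::replace(self, Box::new(std::iter::empty::<Batch>()))
    }
}

/// Retry a write only when the input can be scanned again.
pub fn write_with_retry<F>(
    data: &mut dyn Scannable,
    max_attempts: usize,
    mut write: F,
) -> Result<usize, String>
where
    F: FnMut(BoxedReader) -> Result<usize, String>,
{
    let attempts = if data.rescannable() { max_attempts.max(1) } else { 1 };
    let mut last_err = String::from("no attempts made");
    for _ in 0..attempts {
        match write(data.scan_batches()) {
            Ok(rows) => return Ok(rows),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

In-memory inputs get retries and a row count, while a boxed one-shot reader still streams but is only attempted once.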

We don't fully take advantage of (2) and (3) yet, but will in future
PRs. For (2), in order to be ready to leverage this, we need to hook the
`Scannable` implementation up to Python and NodeJS bindings. Right now
they always pass down a stream, but we want to make sure they support
retries when possible. And for (3), this will need to be hooked up to
#2939 and to a pipeline for running pre-processing steps (like embedding
generation).

## Other changes

* Moved `create_table` and `add_data` into their own modules. I've
created a follow up issue to split up `table.rs` further, as it's by far
the largest file: #2949
* Eliminated the `HAS_DATA` generic for `CreateTableBuilder`. I didn't
see any public-facing places where we differentiated methods, which is
why I felt this simplification was okay.
* Added an `Error::External` variant and integrated some conversions to
allow certain errors to pass through transparently. This will fully work
once we upgrade Lance and get to take advantage of changes in
lance-format/lance#5606
* Added LZ4 compression support for write requests to remote endpoints.
I checked, and the server has supported this for more than a year.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>