feat: add Error::External variant for preserving user errors #5606
wjones127 merged 3 commits into lance-format:main
PR Review: feat: add Error::External variant for preserving user errors

This PR adds functionality to preserve user-defined errors passed through Lance APIs. The implementation is clean and the test coverage is comprehensive.

No blocking issues found. The code quality is good, error handling is consistent with existing patterns, and tests cover the key scenarios.

Minor observations (non-blocking)

Overall this is a well-designed addition that addresses a real user need for error recovery in streaming APIs.
fe93b6a to c18f61c
```rust
// Check if the ArrowError wraps an external error and extract it
match *arrow_err {
    ArrowError::ExternalError(source) => {
        // Try to downcast to lance_core::Error first
        match source.downcast::<Self>() {
            Ok(lance_err) => *lance_err,
            Err(source) => Self::External { source },
        }
    }
    other => Self::Arrow {
        message: other.to_string(),
        location,
    },
}
```
Would we get this automatically if we returned `LanceError::from(arrow_err)`?
I think so. That's a good idea for simplification.
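The reviewer's simplification can be sketched with reduced stand-in types. Everything here is an assumption for illustration: `ArrowError`, `DataFusionError`, and `Error` are std-only mocks of the real `arrow`, `datafusion`, and `lance_core` types (only a few variants, no `location` field), and `from_boxed` is a hypothetical helper. The idea shown is that the `DataFusionError::ArrowError` arm delegates to `From<ArrowError>` instead of repeating its match.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

// Hypothetical stand-in for arrow's ArrowError, reduced to two variants.
#[derive(Debug)]
pub enum ArrowError {
    ExternalError(BoxedError),
    ComputeError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ExternalError(e) => write!(f, "External error: {e}"),
            ArrowError::ComputeError(m) => write!(f, "Compute error: {m}"),
        }
    }
}
impl StdError for ArrowError {}

// Hypothetical stand-in for datafusion's DataFusionError, reduced to three variants.
#[derive(Debug)]
pub enum DataFusionError {
    ArrowError(Box<ArrowError>, Option<String>),
    External(BoxedError),
    Plan(String),
}

// Simplified stand-in for lance_core::Error (no location field).
#[derive(Debug)]
pub enum Error {
    Arrow { message: String },
    External { source: BoxedError },
    Other { message: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Arrow { message } | Error::Other { message } => f.write_str(message),
            Error::External { source } => fmt::Display::fmt(source, f),
        }
    }
}
impl StdError for Error {}

// Hypothetical helper: recover a lance Error if that is what was boxed, else keep the box.
fn from_boxed(source: BoxedError) -> Error {
    match source.downcast::<Error>() {
        Ok(lance_err) => *lance_err,
        Err(source) => Error::External { source },
    }
}

impl From<ArrowError> for Error {
    fn from(err: ArrowError) -> Self {
        match err {
            ArrowError::ExternalError(source) => from_boxed(source),
            other => Error::Arrow { message: other.to_string() },
        }
    }
}

impl From<DataFusionError> for Error {
    fn from(err: DataFusionError) -> Self {
        match err {
            // Delegate to From<ArrowError> rather than re-matching ArrowError here.
            DataFusionError::ArrowError(arrow_err, _) => Error::from(*arrow_err),
            DataFusionError::External(source) => from_boxed(source),
            other => Error::Other { message: format!("{other:?}") },
        }
    }
}
```

Delegating keeps the downcast logic in one place, so the nested `DataFusionError::ArrowError(ArrowError::ExternalError(..))` case and the plain `ArrowError::ExternalError` case cannot drift apart.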
When users pass streams with custom error types (e.g., wrapped in `ArrowError::ExternalError`), the original error was being converted to a string and lost. This adds an `Error::External` variant that preserves the boxed error, allowing users to downcast and recover their original error type.

Changes:
- Add `#[snafu(transparent)]` `External` variant to the `Error` enum
- Update `From<ArrowError>` to extract the inner error from `ExternalError`
- Update `From<DataFusionError>` to handle `External` and nested cases
- Add helper methods: `external()`, `external_source()`, `into_external()`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
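A minimal std-only sketch of what the `External` variant and its helpers might look like. The variant shape follows the commit message, but the helper signatures are guesses, not the actual `lance_core` API, and the hand-written `Display`/`source()` impls only mimic the pass-through behavior that `#[snafu(transparent)]` is meant to provide.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

// Simplified stand-in for lance_core::Error; only one ordinary variant is kept.
#[derive(Debug)]
pub enum Error {
    InvalidInput { message: String },
    // Holds the caller's error untouched so it can be downcast later.
    External { source: BoxedError },
}

// Manual impls mimicking a transparent variant: Display and source() pass through.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidInput { message } => write!(f, "Invalid input: {message}"),
            Error::External { source } => fmt::Display::fmt(source, f),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::External { source } => source.source(),
            _ => None,
        }
    }
}

impl Error {
    // Hypothetical signature: wrap any error (or message) as External.
    pub fn external(source: impl Into<BoxedError>) -> Self {
        Self::External { source: source.into() }
    }

    // Hypothetical signature: borrow the wrapped error, if this is External.
    pub fn external_source(&self) -> Option<&(dyn StdError + Send + Sync + 'static)> {
        match self {
            Self::External { source } => Some(source.as_ref()),
            _ => None,
        }
    }

    // Hypothetical signature: take ownership of the wrapped error, or hand self back.
    pub fn into_external(self) -> Result<BoxedError, Self> {
        match self {
            Self::External { source } => Ok(source),
            other => Err(other),
        }
    }
}
```

A caller that put its own error type into a stream can then use `downcast_ref` on the borrowed source, or `downcast` on the owned box, to get the concrete type back.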
When a `lance_core::Error` is converted to `ArrowError` or `DataFusionError` and then back, the original error type is now recovered via downcasting instead of being wrapped in `External` or converted to strings. This enables workflows where iterators/streams use `lance_core::Error` internally but go through Arrow/DataFusion error types at API boundaries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
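The round trip described in this commit can be sketched with mock types. `ArrowError` and `Error` are hypothetical std-only stand-ins for the real Arrow and `lance_core` types, and `to_arrow_boundary` is an invented name for an API boundary such as a `RecordBatchReader` adapter. Outbound, the lance error is boxed into `ArrowError::ExternalError` rather than stringified; inbound, a successful downcast restores the original variant instead of producing `External`.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

// Hypothetical stand-in for arrow's ArrowError (one variant is enough here).
#[derive(Debug)]
pub enum ArrowError {
    ExternalError(BoxedError),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ExternalError(e) => write!(f, "External error: {e}"),
        }
    }
}
impl StdError for ArrowError {}

// Simplified stand-in for lance_core::Error.
#[derive(Debug)]
pub enum Error {
    NotFound { uri: String },
    External { source: BoxedError },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound { uri } => write!(f, "Not found: {uri}"),
            Error::External { source } => fmt::Display::fmt(source, f),
        }
    }
}
impl StdError for Error {}

// Outbound: box the lance error so its variant and fields survive.
impl From<Error> for ArrowError {
    fn from(err: Error) -> Self {
        ArrowError::ExternalError(Box::new(err))
    }
}

// Inbound: if the boxed error is a lance Error, unwrap it to its original variant.
impl From<ArrowError> for Error {
    fn from(err: ArrowError) -> Self {
        match err {
            ArrowError::ExternalError(source) => match source.downcast::<Error>() {
                Ok(lance_err) => *lance_err,
                Err(source) => Error::External { source },
            },
        }
    }
}

// Hypothetical boundary: an internal iterator of lance results exposed as Arrow results.
pub fn to_arrow_boundary<T>(
    inner: impl Iterator<Item = Result<T, Error>>,
) -> impl Iterator<Item = Result<T, ArrowError>> {
    inner.map(|item| item.map_err(ArrowError::from))
}
```

Only errors that are not a `lance_core::Error` underneath end up in `External`; everything else comes back exactly as it was produced.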
92bfa6d to be8e177
…ormat#5606)

## Summary
- Add `Error::External` variant with `#[snafu(transparent)]` to preserve user errors passed through streams
- Update `From<ArrowError>` to extract inner error from `ExternalError` variant
- Update `From<DataFusionError>` to handle `External` and nested `ArrowError::ExternalError`
- Add helper methods: `external()`, `external_source()`, `into_external()`

## Test plan
- [x] Unit tests for error conversions in `lance-core`
- [x] Integration tests for `InsertBuilder::execute_stream` and `MergeInsertJob::execute_reader`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
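The integration tests in the test plan cover user errors flowing through `InsertBuilder::execute_stream` and `MergeInsertJob::execute_reader`. The sketch below imitates that path with std-only mocks; `execute_stream` here is a hypothetical function that only sums row counts, not Lance's API, and `?` performs the `From<ArrowError>` conversion on each item.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedError = Box<dyn StdError + Send + Sync + 'static>;

// Hypothetical stand-in for arrow's ArrowError.
#[derive(Debug)]
pub enum ArrowError {
    ExternalError(BoxedError),
    ComputeError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ExternalError(e) => write!(f, "External error: {e}"),
            ArrowError::ComputeError(m) => write!(f, "Compute error: {m}"),
        }
    }
}
impl StdError for ArrowError {}

// Simplified stand-in for lance_core::Error.
#[derive(Debug)]
pub enum Error {
    Arrow { message: String },
    External { source: BoxedError },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Arrow { message } => f.write_str(message),
            Error::External { source } => fmt::Display::fmt(source, f),
        }
    }
}
impl StdError for Error {}

impl From<ArrowError> for Error {
    fn from(err: ArrowError) -> Self {
        match err {
            // Keep the caller's error boxed (or unwrap a lance Error) instead of stringifying it.
            ArrowError::ExternalError(source) => match source.downcast::<Error>() {
                Ok(lance_err) => *lance_err,
                Err(source) => Error::External { source },
            },
            other => Error::Arrow { message: other.to_string() },
        }
    }
}

// Hypothetical consumer in the spirit of `InsertBuilder::execute_stream`:
// it totals row counts, and `?` converts each ArrowError via `From`.
pub fn execute_stream(
    stream: impl IntoIterator<Item = Result<u64, ArrowError>>,
) -> Result<u64, Error> {
    let mut rows = 0;
    for batch in stream {
        rows += batch?;
    }
    Ok(rows)
}
```

A user error injected mid-stream surfaces as `Error::External`, and the caller can downcast it back to the type it originally produced; ordinary Arrow errors still become string messages.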
…e()` and `Table.add()` (#2948)

BREAKING CHANGE: Arbitrary `impl RecordBatchReader` is no longer accepted; it must be made into `Box<dyn RecordBatchReader>`.

This PR replaces `IntoArrow` with a new trait `Scannable` to define input row data. This provides the following advantages:

1. **We can implement `Scannable` for more types than `IntoArrow`, such as `RecordBatch` and `Vec<RecordBatch>`.** The `IntoArrow` trait was implemented for arbitrary `T: RecordBatchReader`, and the Rust compiler would prevent us from implementing it for foreign types like `RecordBatch` because (theoretically) those types might implement `RecordBatchReader` in the future. That's why we implement `Scannable` for `Box<dyn RecordBatchReader>` instead; since it's a concrete type, it doesn't block implementing for other foreign types.
2. **We can potentially replay `Scannable` values.** Previously, we had to choose between buffering all data in memory and supporting retries of writes. But because `Scannable` things can optionally support re-scanning, we now have a way of supporting retries while also streaming.
3. **`Scannable` can provide hints like `num_rows`, which can be used to schedule parallel writers.** Without knowing the total number of rows, it's difficult to know whether it's worth writing multiple files in parallel.

We don't fully take advantage of (2) and (3) yet, but will in future PRs. For (2), in order to be ready to leverage this, we need to hook the `Scannable` implementation up to the Python and NodeJS bindings. Right now they always pass down a stream, but we want to make sure they support retries when possible. And for (3), this will need to be hooked up to #2939 and to a pipeline for running pre-processing steps (like embedding generation).

## Other changes

* Moved `create_table` and `add_data` into their own modules. I've created a follow-up issue to split up `table.rs` further, as it's by far the largest file: #2949
* Eliminated the `HAS_DATA` generic for `CreateTableBuilder`. I didn't see any public-facing places where we differentiated methods, which is why I felt this simplification was okay.
* Added an `Error::External` variant and integrated some conversions to allow certain errors to pass through transparently. This will fully work once we upgrade Lance and get to take advantage of changes in lance-format/lance#5606
* Added LZ4 compression support for write requests to remote endpoints. I checked and this has been supported on the server for > 1 year.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
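The three advantages listed in that commit can be illustrated with a small `Scannable`-style trait. This is a sketch under stated assumptions: `RecordBatch` and `RecordBatchReader` are mocks (the real Arrow reader yields `Result<RecordBatch, ArrowError>` and exposes a schema), and the method names `scan_batches`, `can_rescan`, and `num_rows` are illustrative, not lancedb's actual trait definition.

```rust
// Hypothetical stand-in for arrow's RecordBatch: only tracks a row count.
#[derive(Clone, Debug, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

// Hypothetical stand-in for arrow's RecordBatchReader.
pub trait RecordBatchReader: Iterator<Item = RecordBatch> {}
impl<I: Iterator<Item = RecordBatch>> RecordBatchReader for I {}

// Sketch of a Scannable-style input trait; method names are assumptions.
pub trait Scannable {
    /// Stream the input's batches.
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = RecordBatch> + '_>;
    /// True if `scan_batches` can be called again and will replay the same data.
    fn can_rescan(&self) -> bool;
    /// Optional total-row hint, e.g. for deciding how many files to write in parallel.
    fn num_rows(&self) -> Option<usize> {
        None
    }
}

// In-memory batches: replayable, and the row count is known up front.
impl Scannable for Vec<RecordBatch> {
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = RecordBatch> + '_> {
        Box::new(self.iter().cloned())
    }
    fn can_rescan(&self) -> bool {
        true
    }
    fn num_rows(&self) -> Option<usize> {
        Some(self.iter().map(|b| b.num_rows).sum())
    }
}

// A boxed reader is a concrete type, so this impl does not overlap with the Vec impl.
// It is one-shot: once drained, a second scan yields nothing.
impl Scannable for Box<dyn RecordBatchReader> {
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = RecordBatch> + '_> {
        Box::new(self.by_ref())
    }
    fn can_rescan(&self) -> bool {
        false
    }
}
```

Because the reader impl targets the concrete type `Box<dyn RecordBatchReader>` rather than a blanket `impl<T: RecordBatchReader>`, it sits alongside the `Vec<RecordBatch>` impl without a coherence conflict. The method is deliberately not named `scan`: on the boxed reader, method resolution would pick `Iterator::scan` first.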