Avoid panic on Date32 overflow by cht42 · Pull Request #9144 · apache/arrow-rs

cht42 · 2026-01-12T09:28:41Z

Which issue does this PR close?

This PR does the same thing as #7737: it handles overflow errors gracefully instead of panicking.

Part of Temporal Arithmetic Can Panic #4456.
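To make the "return an error instead of panicking" idea concrete, here is a minimal sketch using only the standard library. The `Date32` alias and the `add_days_opt` function are hypothetical stand-ins invented for this example; they are not the arrow-rs API, and the real range check in arrow-rs depends on chrono's `NaiveDate` limits rather than on `i32` bounds.

```rust
/// Hypothetical stand-in for a Date32 value: days since 1970-01-01.
type Date32 = i32;

/// Adds `days` to `date`, returning `None` on overflow instead of panicking.
/// Illustrative only; this is not the arrow-rs implementation.
fn add_days_opt(date: Date32, days: i64) -> Option<Date32> {
    // Reject day counts that do not fit in i32 rather than truncating with `as`.
    let days = i32::try_from(days).ok()?;
    // `checked_add` yields `None` where plain `+` would panic in debug builds.
    date.checked_add(days)
}
```

The caller decides what an out-of-range result means (an error, a null, or an explicit `unwrap`), which is the behavior change this PR is after.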

Rationale for this change

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

cht42 · 2026-01-12T09:29:41Z

 }
-
-#[cfg(test)]
-mod tests {


those tests were added when there was a custom function used for shift_months (https://github.com/apache/arrow-rs/pull/2031/changes#diff-39c31d552939b8478fb676b059d5920a9c71b6659600348681c9ed7c03ea8485R52). Now that we use chrono functions, those tests are not necessary anymore

However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?

That would make it easier to ensure we are not changing behavior (esp. with respect to overflow handling / truncation)

sounds good, added them back. Removed the datetime ones as we don't need them and added a couple of tests for the integer overflow handling
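For readers unfamiliar with the truncation and overflow cases these tests guard, the following is a rough standalone sketch of a month shift on a `(year, month, day)` triple. The names `days_in_month` and `add_months_opt` are made up for illustration, the function assumes `month` is in `1..=12`, and it is not the `add_months_date` code from this PR (which relies on chrono).

```rust
/// Number of days in `month` (1-12) of `year` in the proleptic Gregorian calendar.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
    }
}

/// Hypothetical month shift. Returns `None` if the resulting year
/// does not fit in `i32`, rather than panicking or wrapping.
fn add_months_opt(year: i32, month: u32, day: u32, months: i32) -> Option<(i32, u32, u32)> {
    // Use a zero-based month index in i64 so the sum itself cannot overflow.
    let total = i64::from(year) * 12 + i64::from(month) - 1 + i64::from(months);
    let new_year = i32::try_from(total.div_euclid(12)).ok()?;
    let new_month = total.rem_euclid(12) as u32 + 1;
    // Truncate the day so that e.g. Jan 31 + 1 month lands on the last day of February.
    let new_day = day.min(days_in_month(new_year, new_month));
    Some((new_year, new_month, new_day))
}
```

The two behaviors worth pinning down in tests are the day truncation at month end and the `None` result when the year goes out of range.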

scovich

Approach generally looks good, tho it's a lot harder to review a PR that mixes refactoring with bug fixes.

NOTE: Github isn't handling the diff very gracefully. I'm not sure why -- git diff -b handles it just fine?? Discussion here 🍿

scovich · 2026-01-13T17:21:57Z

-        let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();
        let ms_per_day = 24 * 60 * 60 * 1000i64;

-        // Define the boundary dates using NaiveDate::from_ymd_opt
-        let max_valid_date = NaiveDate::from_ymd_opt(262142, 12, 31).unwrap();
-        let min_valid_date = NaiveDate::from_ymd_opt(-262143, 1, 1).unwrap();
-
        // Calculate their millisecond values from epoch
-        let max_valid_millis = (max_valid_date - epoch).num_milliseconds();
-        let min_valid_millis = (min_valid_date - epoch).num_milliseconds();
+        let max_valid_millis = (MAX_VALID_DATE - EPOCH).num_milliseconds();
+        let min_valid_millis = (MIN_VALID_DATE - EPOCH).num_milliseconds();


Changes like this seem unrelated, better to split out as a "prefactor" PR that can merge quickly? Then this PR would be easier to review by not mixing refactor with bug fixes and API changes.

scovich · 2026-01-13T17:23:34Z

    /// # Arguments
    ///
    /// * `i` - The Date32Type to convert
+    #[deprecated(since = "58.0.0", note = "Use to_naive_date_opt instead.")]


I wonder if we should split out the #[deprecated] annotations to a separate PR so this one can merge immediately while the other waits for next major release?

(FWIW the next release will be major so it is less of a concern)
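As a small illustration of the deprecation pattern under discussion, the sketch below keeps a panicking function alive while steering callers to a fallible `_opt` replacement. The function names are hypothetical and chosen only for this example; the `since` value mirrors the one in the diff above.

```rust
/// Fallible replacement (hypothetical): `None` when `days` is negative.
pub fn to_unsigned_days_opt(days: i32) -> Option<u32> {
    u32::try_from(days).ok()
}

/// Old panicking entry point, kept for compatibility but flagged by the compiler.
#[deprecated(since = "58.0.0", note = "Use to_unsigned_days_opt instead.")]
pub fn to_unsigned_days(days: i32) -> u32 {
    to_unsigned_days_opt(days).expect("negative day count")
}

/// Existing callers can silence the warning locally while they migrate.
#[allow(deprecated)]
pub fn legacy_caller(days: i32) -> u32 {
    to_unsigned_days(days)
}
```

Because `#[deprecated]` only produces a warning, adding it does not break downstream builds unless they deny warnings, which is part of why its timing relative to a major release is a judgment call.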

alamb · 2026-01-13T17:36:29Z

-        let res = res.add(Duration::try_milliseconds(ms as i64).unwrap());
-        Date32Type::from_naive_date(res)
+        let res = Date32Type::to_naive_date_opt(date)?;
+        let res = res.checked_add_signed(Duration::try_days(days as i64)?)?;


I checked that add just calls checked_add_signed and panics if it returns None, so this is equivalent

https://docs.rs/chrono/latest/src/chrono/naive/date/mod.rs.html#1952

    impl Add<TimeDelta> for NaiveDate {
        type Output = NaiveDate;

        #[inline]
        fn add(self, rhs: TimeDelta) -> NaiveDate {
            self.checked_add_signed(rhs).expect("NaiveDate + TimeDelta overflowed")
        }
    }
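The relationship between the panicking operator and the checked method can be sketched with a toy type. `ToyDate`, `checked_add_days`, and `add_twice` are hypothetical names used only here; the sketch mirrors the shape of the chrono snippet above but does not use chrono.

```rust
use std::ops::Add;

/// Toy date type (days since an arbitrary epoch); not chrono's NaiveDate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyDate(i32);

impl ToyDate {
    /// Fallible addition: `None` when the result is out of range.
    fn checked_add_days(self, days: i32) -> Option<ToyDate> {
        self.0.checked_add(days).map(ToyDate)
    }
}

impl Add<i32> for ToyDate {
    type Output = ToyDate;

    /// Operator form: same arithmetic, but overflow becomes a panic.
    fn add(self, days: i32) -> ToyDate {
        self.checked_add_days(days).expect("ToyDate + days overflowed")
    }
}

/// Caller-side choice: propagate the failure with `?` instead of panicking.
fn add_twice(date: ToyDate, days: i32) -> Option<ToyDate> {
    date.checked_add_days(days)?.checked_add_days(days)
}
```

Switching from the operator to the checked method plus `?` keeps the arithmetic identical and only changes how failure is reported.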

alamb · 2026-01-13T17:49:31Z

                Date32Type,
                as_primitive,
-                Date32Type::to_naive_date,
+                |v| Date32Type::to_naive_date_opt(v).unwrap(),


This used to panic implicitly and now the panic'ing is explicit. I think that is an improvement

alamb · 2026-01-13T17:52:12Z

 }
-
-#[cfg(test)]
-mod tests {


However, this PR also adds add_months_date that is quite similar to shift_months -- do you think it is valuable to port the test to use add_months_date instead?

That would make it easier to ensure we are not changing behavior (esp. with respect to overflow handling / truncation)

alamb · 2026-01-13T17:54:45Z

        // Date64Type::to_naive_date_opt has boundaries determined by NaiveDate's supported range.
        // The valid date range is from January 1, -262143 to December 31, 262142 (Gregorian calendar).

-        let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();


thank you for cleaning up this repetition

alamb · 2026-01-13T17:55:07Z

Thank you for your contribution @cht42 🙏

cht42 · 2026-01-14T04:52:57Z

I followed @scovich's recommendation and spun up #9165 to refactor the current tests. With that refactor, it should be easier to integrate similar tests with Date32Type, so it'd be nice to get that merged first

# Which issue does this PR close?

- Closes #N/A (internal refactoring - no issue)

# Rationale for this change

Noticed while writing tests in #9144 that the current tests could be rewritten to be easier to read and reuse.

⚠️ FYI, I used claude to refactor those tests, I read the changes and we are keeping the same test cases.

The Date64 boundary tests in `arrow-arith/src/numeric.rs` had significant code duplication. Each test function for Date64 operations (`to_naive_date_opt`, `add_year_months_opt`, `subtract_year_months_opt`, `add_day_time_opt`, `subtract_day_time_opt`, `add_month_day_nano_opt`, `subtract_month_day_nano_opt`) repeated similar setup code and boundary checks, making the test suite harder to maintain and extend.

# What changes are included in this PR?

This PR refactors the Date64 boundary tests by:

1. **Introducing shared constants** for commonly used values:
   - `MAX_VALID_DATE`, `MIN_VALID_DATE`, `EPOCH` - NaiveDate constants for chrono's valid date range
2. **Adding utility functions** to reduce repetition:
   - `date_to_millis(year, month, day)` - converts a date to milliseconds from epoch
   - `max_valid_millis()`, `min_valid_millis()`, `year_2000_millis()` - common millisecond values
3. **Consolidating similar test patterns** into parameterized helper functions:
   - `test_year_month_op()` - tests `add_year_months_opt` and `subtract_year_months_opt`
   - `test_day_time_op()` - tests `add_day_time_opt` and `subtract_day_time_opt`
   - `test_month_day_nano_op()` - tests `add_month_day_nano_opt` and `subtract_month_day_nano_opt`
4. **Reducing 8 separate test functions to 4** while maintaining the same test coverage

Net result: **-297 lines** (163 added, 460 removed) with equivalent functionality.

# Are these changes tested?

Yes - this is a refactoring of existing tests. The same boundary conditions and edge cases are still tested, just organized more efficiently. Running `cargo test` confirms all tests pass.

# Are there any user-facing changes?

No. This is an internal test refactoring with no changes to public APIs or functionality.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
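The "shared constants plus parameterized helper" idea from the description above can be sketched roughly as follows. Everything here is an assumption made for illustration: the operations, the constant values, and the helper `check_boundaries` are invented and do not correspond to the helpers in #9165.

```rust
/// Hypothetical operations under test: `None` signals overflow.
fn plus_days_opt(date: i32, days: i32) -> Option<i32> {
    date.checked_add(days)
}

fn minus_days_opt(date: i32, days: i32) -> Option<i32> {
    date.checked_sub(days)
}

/// Shared boundary constants (assumed values for this example only).
const MAX_VALID: i32 = i32::MAX;
const MIN_VALID: i32 = i32::MIN;

/// Parameterized helper: runs the same three boundary checks for any operation.
/// `edge` is the value at which moving one more step must fail.
fn check_boundaries(op: fn(i32, i32) -> Option<i32>, edge: i32) -> bool {
    let in_range_ok = op(0, 1).is_some();
    let past_edge_fails = op(edge, 1).is_none();
    let at_edge_ok = op(edge, 0) == Some(edge);
    in_range_ok && past_edge_fails && at_edge_ok
}
```

Each add/subtract pair then needs only one call per direction, which is the kind of consolidation the description refers to.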

alamb · 2026-01-14T22:27:49Z

I followed @scovich recommendation and spun up #9165 to refactor the current tests. With that refactor, it should be easier to integrate similar tests with Date32Type, so it'd be nice to get that merged first

Merged! Now there are some conflicts with this PR to address however

cht42 · 2026-01-15T04:56:08Z

updated with the new tests, had to change them a bit again to account for the types difference


cht42 · 2026-01-17T09:26:21Z

@alamb / @scovich mind taking another look ?

cht42 · 2026-01-17T09:27:36Z

-    }
-
-    #[test]
-    fn test_shift_months_datetime() {


we don't need those datetime tests anymore, add_months_date only accepts NaiveDate

alamb

Thanks again @cht42

## Which issue does this PR close?

- From this discussion: apache#19711 (comment)

## Rationale for this change

In above PR we didn't enable a test because upstream arrow-rs had a panic bug. This is now fixed:

- apache/arrow-rs#9144

So now we can enable this test again to assert expected error.

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

Refactor date manipulation functions for overflow handling

31bb8a6

github-actions Bot added the arrow (Changes to the arrow crate) and parquet-variant (parquet-variant* crates) labels Jan 12, 2026

cht42 commented Jan 12, 2026

cht42 mentioned this pull request Jan 12, 2026

feat(spark): implement add_months function apache/datafusion#19711

Merged

fix

ead6ffb

alamb changed the title ~~Refactor Date32 manipulation functions for overflow handling~~ Avoid panic on Date32 overflow Jan 13, 2026

scovich reviewed Jan 13, 2026

alamb mentioned this pull request Jan 13, 2026

Andrew Lamb Weekly-ish Open Source plan - 2026-01-05 apache/datafusion#19652

Closed

44 tasks

alamb approved these changes Jan 13, 2026

cht42 mentioned this pull request Jan 14, 2026

refactor: streamline date64 tests #9165

Merged

cht42 added 3 commits January 15, 2026 08:17

Merge branch 'main' into chuet/date-panic

194d28f

update tests

5ed9716

add back test for add_months

d7f2801

cht42 commented Jan 17, 2026

alamb approved these changes Jan 17, 2026

alamb merged commit ac640da into apache:main Jan 17, 2026
30 checks passed

alamb mentioned this pull request Feb 23, 2026

Upgrade DataFusion to arrow-rs/parquet 58.0.0 / object_store 0.13.0 apache/datafusion#19728

Merged

Jefffrey mentioned this pull request Apr 22, 2026

chore: re-enable add_months overflow test apache/datafusion#21774

Merged
