Update arrow and support extract on durations and intervals #1
Closed
nrc wants to merge 74 commits into pydantic:logfire
* Implement native support StringView for overlay Signed-off-by: Chojan Shang <psiace@apache.org> * Re-write impl of overlay Signed-off-by: Chojan Shang <psiace@apache.org> * Minor update Signed-off-by: Chojan Shang <psiace@apache.org> * Add more tests Signed-off-by: Chojan Shang <psiace@apache.org> --------- Signed-off-by: Chojan Shang <psiace@apache.org>
* feat(11523): set the default memory pool to the tracked-consumer pool * test(11523): update tests for the OOM message including the top consumers * chore(11523): remove duplicate wording from OOM messages
* partial aggr for bool_*() * Use null filter
* feat/11953: Support StringView for TRANSLATE() fn Signed-off-by: Devan <devandbenz@gmail.com> * formatting Signed-off-by: Devan <devandbenz@gmail.com> * fixes internal error for GenericByteArray cast Signed-off-by: Devan <devandbenz@gmail.com> * adds additional TRANSLATE test Signed-off-by: Devan <devandbenz@gmail.com> * adds additional TRANSLATE test Signed-off-by: Devan <devandbenz@gmail.com> * rm unnecessary generic Signed-off-by: Devan <devandbenz@gmail.com> * cleanup + fix typo Signed-off-by: Devan <devandbenz@gmail.com> * cleanup + fix typo Signed-off-by: Devan <devandbenz@gmail.com> * adds some additional testing to sqllogictests for TRANSLATE string_view Signed-off-by: Devan <devandbenz@gmail.com> --------- Signed-off-by: Devan <devandbenz@gmail.com>
…pache#12016) * Handle arguments checking of min/max function to avoid crashes Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com> * Fix code format error --------- Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
…#12009) * Remove order_by on aggregate window functions since that operation is handled by the window function * Add unit test for window functions using udaf with ordering * Resolve clippy warning
…t usage (apache#11999) * fix error Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * use exec err Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
* Improve performance of REPEAT functions Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com> * Improve performance of REPEAT functions Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com> * Fix cargo fmt Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com> --------- Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
…or::state` (apache#12001) * Remove wrong comment * Remove wrong comment on Accumulator::state * Not call twice comment * Adjust comment order
apache#11994) * Minor: improve ParquetExec docs * typo * clippy * fix whitespace so rustdoc does not treat as tests * Apply suggestions from code review Co-authored-by: Oleks V <comphead@users.noreply.github.com> * expound upon column rewriting in the context of schema evolution --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>
* feat: Add map_extract module and function * chore: Fix fmt * chore: Add tests * chore: Simplify * chore: Simplify * chore: Fix clippy * doc: Add user doc * feat: use Signature::user_defined * chore: Update tests * chore: Fix fmt * chore: Fix clippy * chore * chore: typo * chore: Check args len in return_type * doc: Update doc * chore: Simplify logic * chore: check args earlier * feat: Support UTF8VIEW * chore: Update doc * chore: Fix clippy * refactor: Use MutableArrayData * chore * refactor: Avoid type conversion * chore: Fix clippy * chore: Follow DuckDB * Update datafusion/functions-nested/src/map_extract.rs Co-authored-by: Jay Zhan <jayzhan211@gmail.com> * chore: Fix fmt --------- Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
…rate (apache#12036) * refactor: Move LimitedDistinctAggregation to physical-optimizer crate * chore: Update cargo.lock * chore: Fix clippy * Update datafusion/physical-optimizer/src/limited_distinct_aggregation.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: Clean import --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…#12030) * Adds new crate for window functions * Moves `row_number` to window functions crate * Fixes build errors * Regenerates protobuf * Makes `row_number` no-op temporarily * Minor: fixes formatting * Implements `WindowUDF` for `row_number` * Minor: fixes formatting * Adds singleton instance of UDWF: `row_number` * Adds partition evaluator * Registers default window functions * Implements `evaluate_all` * Fixes: allow non-uppercase globals * Minor: prefix underscore for unused variable * Minor: fixes formatting * Uses `row_number_udwf` * Fixes: unparser test for `row_number` * Uses row number to represent functional dependency * Minor: fixes formatting * Removes `row_number` from case-insensitive name test * Deletes wrapper for `row_number` window expression * Fixes: lowercase name in error statement * Fixes: `row_number` fields are not nullable * Fixes: lowercase name in explain output * Updates Cargo.lock * Fixes: lowercase name in explain output * Adds support for result ordering * Minor: add newline between methods * Fixes: re-export crate name in doc comments * Adds doc comment for `WindowUDFImpl::nullable` * Minor: renames variable * Minor: update doc comments * Deletes code * Minor: update doc comments * Minor: adds period * Adds doc comment for `row_number` window UDF * Adds fluent API for creating `row_number` expression * Minor: removes unnecessary path prefix * Adds roundtrip logical plan test case * Updates unit tests for `row_number` * Deletes code * Minor: copy edit doc comments * Minor: deletes comment * Minor: copy edits udwf doc comments
…pache#12000) * fix/11982: resolves projection issue found in with_column window fn usage Signed-off-by: Devan <devandbenz@gmail.com> * fix/11982: resolves projection issue found in with_column window fn usage Signed-off-by: Devan <devandbenz@gmail.com> * fmt Signed-off-by: Devan <devandbenz@gmail.com> * fmt Signed-off-by: Devan <devandbenz@gmail.com> * refactor to get tests working Signed-off-by: Devan <devandbenz@gmail.com> * change test to use test harness Signed-off-by: Devan <devandbenz@gmail.com> * use row_number method and add comment about test Signed-off-by: Devan <devandbenz@gmail.com> * add back import Signed-off-by: Devan <devandbenz@gmail.com> --------- Signed-off-by: Devan <devandbenz@gmail.com>
* Support HEAD of sqlparser main * special case ID as a non-keyword when unparsing * fix EXTRACT expressions * TODO REVERT: comment out failing test Making this commit just to let tests progress. * use sqlparser-rs v0.50.0
* Minor: make some physical-plan properties public * add Default for GroupOrderingFull * make groups and null_expr private again * remove pub label
…pache#12018) Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
* Add Utf8View support to STRPOS function * fix type inconsistency * fix type inconsistency * refactor tests
* Update itertools requirement from 0.12 to 0.13 Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](rust-itertools/itertools@v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Update Cargo.lock * Avoid deprecated API * nested-functions: workspace version of itertools --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Eduard Karacharov <eduard.karacharov@gmail.com>
* fix the wildcard expand for filter plan * expand the wildcard for the error message * add the tests * fix recompute_schema * fix clippy * cargo fmt * change the check for having clause * rename the function and moving the tests * fix check * expand the schema for aggregate plan * reduce the time to expand wildcard * clean the testing table after tested * fmt and address review * stop expand wildcard and add more check for group-by and selects * simplify the having check
* Convert LogicalPlanBuilder to use Arc<LogicalPlan>

  Summary

  Update struct to use Arc. Verified test passes. Used Arc::clone as much as I can, and unwrap_arc when an owned LogicalPlan is required. Keep pub fn input unchanged as LogicalPlan to limit change scope. If we change the pub fn, we can also get rid of this pattern:

  ```
  unnest(unwrap_arc(self.plan), ...).map(Self::from)
  ```

* Use self.plan directly without Arc::clone
* Implement From to allow into syntax
* Improve documentation
* Consume self in to_recursive_query

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
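The `Arc::clone` / `unwrap_arc` pattern this commit message describes can be sketched with the standard library alone. This is a minimal illustration, not the DataFusion code: `Plan` and `Builder` are hypothetical stand-ins for `LogicalPlan` and `LogicalPlanBuilder`, and the body of `unwrap_arc` is an assumption about how such a helper is typically written (take the value if the `Arc` is unique, otherwise clone it).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node; not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    name: String,
}

/// Returns the value inside `arc`. If this is the only reference the value
/// is moved out without copying; otherwise the shared value is cloned.
/// (Assumed implementation, for illustration only.)
fn unwrap_arc<T: Clone>(arc: Arc<T>) -> T {
    Arc::try_unwrap(arc).unwrap_or_else(|still_shared| (*still_shared).clone())
}

/// Hypothetical builder that keeps its plan behind an `Arc`.
struct Builder {
    plan: Arc<Plan>,
}

impl Builder {
    fn new(plan: Plan) -> Self {
        Self { plan: Arc::new(plan) }
    }

    /// Hands out a cheap shared handle: only the reference count changes.
    fn plan(&self) -> Arc<Plan> {
        Arc::clone(&self.plan)
    }

    /// Consumes the builder and returns an owned plan.
    fn build(self) -> Plan {
        unwrap_arc(self.plan)
    }
}

/// Allows `arc_plan.into()` where a `Builder` is expected.
impl From<Arc<Plan>> for Builder {
    fn from(plan: Arc<Plan>) -> Self {
        Self { plan }
    }
}
```

With this shape, sharing a plan between builders costs a reference-count increment, and a deep clone happens only when an owned plan is requested while other handles are still alive.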
…n`, add comments (apache#12102)
* Improve documentation on `StringArrayType` trait * tweaks * Update datafusion/functions/src/string/common.rs Co-authored-by: Oleks V <comphead@users.noreply.github.com> --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>
…tead of deprecated Timestamp (apache#11597) * bump substrait-rs * consume and produce precisiontimestamps * bump substrait to latest * clippy * deprecate in 42, since we're already on 41
* optimize code * optimize code
…lly (apache#12098) * fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully * revert back to having Arc<Self> * add notes about adding stuff into Aliased impls * fix clippy --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* minor: SortExec measure elapsed_compute time when sorting

  Whilst investigating query execution performance I noticed that some SortExec nodes were reporting suspiciously short elapsed_compute times. It appears that the SortExec node wasn't running the elapsed_compute timer while it was doing the actual sorting operation.

* fix: apply review suggestions
* naive impl * calc capacity * cleanup * Update test * simplify coercion logic * write some more tests * Update tests * Improve implementation and do the right thing for null * add ticket reference --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#12111) * add a bench file substr.rs * taplo format --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Author
The official Arrow and ecosystem update is at apache#12032, and it is quite a big one this time around.
Signed-off-by: Nick Cameron <nrc@ncameron.org>
0276b57 to 6140c5f
Will pull this commit onto a new branch 👍
Dandandan pushed a commit that referenced this pull request Apr 21, 2026
…messages (apache#20387)

## Which issue does this PR close?

- Closes apache#20386.

## Rationale for this change

The `memory_limit` (`RuntimeEnvBuilder::new().with_memory_limit()`) configuration uses the `greedy` memory pool as the `default`. However, if `memory_pool` (`RuntimeEnvBuilder::new().with_memory_pool()`) is set, it is overridden by the configured `memory_pool`, such as `fair`. If neither `memory_limit` nor `memory_pool` is set, the `unbounded` memory pool is used. It is therefore useful to expose the `ultimately used/selected pool` in the `ResourcesExhausted` error message, so the end user is aware of it and can switch the memory pool (`greedy`, `fair`, `unbounded`) if needed.

- Also, [this comparison table](lance-format/lance#3601 (comment)) is an example use case covering the runtime behavior of both the `greedy` and `fair` memory pools. Exposing the memory pool in use as part of the native logs helps with this kind of comparison.

The following example use cases are from `datafusion-cli`:

**Case1**: datafusion-cli result when `memory-limit` and `top-memory-consumers > 0` are set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 3
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as: ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB, DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B, ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```

**Case2**: datafusion-cli result when `memory-limit` and `top-memory-consumers = 0` (disabling top memory consumers logging) are set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 0
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```

**Case3**: datafusion-cli result when only `memory-limit`, `memory-pool` and `top-memory-consumers > 0` are set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --mem-pool-type fair --top-memory-consumers 3 --command 'select * from generate_series(1,500000) as t1(v1) order by v1;'
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as: ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB, ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B, DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)
```

## What changes are included in this PR?

- Add a name property to MemoryPool instances.
- Expose the MemoryPool in use in Resources Exhausted error messages.

## Are these changes tested?

Yes, by updating existing test cases.

## Are there any user-facing changes?

Yes, the Resources Exhausted error messages are updated.
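The idea in the referenced commit, giving each pool a name and appending it to the exhaustion error, can be sketched as follows. This is a simplified illustration under stated assumptions: `NamedPool`, `GreedyPool`, `describe` and `try_grow` are hypothetical names and do not mirror DataFusion's actual `MemoryPool` trait or its error type, and sizes are printed as raw bytes rather than in the human-readable units shown in the CLI output above.

```rust
/// Hypothetical, simplified pool interface (not DataFusion's `MemoryPool`).
trait NamedPool {
    /// Short identifier that ends up in error messages, e.g. "greedy".
    fn name(&self) -> &str;
    /// Bytes currently reserved.
    fn used(&self) -> usize;
    /// Total capacity in bytes, or `None` for an unbounded pool.
    fn pool_size(&self) -> Option<usize>;
}

/// Hypothetical first-come, first-served pool with a fixed capacity.
struct GreedyPool {
    used: usize,
    pool_size: usize,
}

impl NamedPool for GreedyPool {
    fn name(&self) -> &str {
        "greedy"
    }
    fn used(&self) -> usize {
        self.used
    }
    fn pool_size(&self) -> Option<usize> {
        Some(self.pool_size)
    }
}

/// Renders the pool as `name(used: .., pool_size: ..)` for inclusion in errors.
fn describe(pool: &dyn NamedPool) -> String {
    match pool.pool_size() {
        Some(size) => format!(
            "{}(used: {} B, pool_size: {} B)",
            pool.name(),
            pool.used(),
            size
        ),
        None => format!("{}(used: {} B)", pool.name(), pool.used()),
    }
}

/// Tries to reserve `additional` bytes for `consumer`; on failure the error
/// text names the pool that rejected the request.
fn try_grow(pool: &mut GreedyPool, consumer: &str, additional: usize) -> Result<(), String> {
    let available = pool.pool_size.saturating_sub(pool.used);
    if additional > available {
        return Err(format!(
            "Failed to allocate additional {} B for {} - {} B remain available for the total memory pool: {}",
            additional,
            consumer,
            available,
            describe(&*pool)
        ));
    }
    pool.used += additional;
    Ok(())
}
```

Keeping the pool description in one helper means every error site reports the pool the same way, which is what makes side-by-side comparisons of `greedy` and `fair` runs readable.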
No description provided.