Update arrow and support extract on durations and intervals #1

Closed
nrc wants to merge 74 commits into pydantic:logfire from nrc:logfire-interval-extract

nrc commented Aug 22, 2024

No description provided.

PsiACE and others added 30 commits August 15, 2024 07:23
* Implement native support StringView for overlay

Signed-off-by: Chojan Shang <psiace@apache.org>

* Re-write impl of overlay

Signed-off-by: Chojan Shang <psiace@apache.org>

* Minor update

Signed-off-by: Chojan Shang <psiace@apache.org>

* Add more tests

Signed-off-by: Chojan Shang <psiace@apache.org>

---------

Signed-off-by: Chojan Shang <psiace@apache.org>
* feat(11523): set the default memory pool to the tracked-consumer pool

* test(11523): update tests for the OOM message including the top consumers

* chore(11523): remove duplicate wording from OOM messages
* partial aggr for bool_*()

* Use null filter
* feat/11953: Support StringView for TRANSLATE() fn

Signed-off-by: Devan <devandbenz@gmail.com>

* formatting

Signed-off-by: Devan <devandbenz@gmail.com>

* fixes internal error for GenericByteArray cast

Signed-off-by: Devan <devandbenz@gmail.com>

* adds additional TRANSLATE test

Signed-off-by: Devan <devandbenz@gmail.com>

* adds additional TRANSLATE test

Signed-off-by: Devan <devandbenz@gmail.com>

* rm unnecessary generic

Signed-off-by: Devan <devandbenz@gmail.com>

* cleanup + fix typo

Signed-off-by: Devan <devandbenz@gmail.com>

* cleanup + fix typo

Signed-off-by: Devan <devandbenz@gmail.com>

* adds some additional testing to sqllogictests for TRANSLATE string_view

Signed-off-by: Devan <devandbenz@gmail.com>

---------

Signed-off-by: Devan <devandbenz@gmail.com>
…pache#12016)

* Handle arguments checking of min/max function to avoid crashes

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* Fix code format error

---------

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
…#12009)

* Remove order_by on aggregate window functions since that operation is handled by the window function

* Add unit test for window functions using udaf with ordering

* Resolve clippy warning
…t usage (apache#11999)

* fix error

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* use exec err

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fmt

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
* Improve performance of REPEAT functions

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* Improve performance of REPEAT functions

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* Fix cargo fmt

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

---------

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
…or::state` (apache#12001)

* Remove wrong comment

* Remove wrong comment on Accumulator::state

* Not call twice comment

* Adjust comment order
apache#11994)

* Minor: improve ParquetExec docs

* typo

* clippy

* fix whitespace so rustdoc does not treat as tests

* Apply suggestions from code review

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* expound upon column rewriting in the context of schema evolution

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
* feat: Add map_extract module and function

* chore: Fix fmt

* chore: Add tests

* chore: Simplify

* chore: Simplify

* chore: Fix clippy

* doc: Add user doc

* feat: use Signature::user_defined

* chore: Update tests

* chore: Fix fmt

* chore: Fix clippy

* chore

* chore: typo

* chore: Check args len in return_type

* doc: Update doc

* chore: Simplify logic

* chore: check args earlier

* feat: Support UTF8VIEW

* chore: Update doc

* chore: Fix clippy

* refactor: Use MutableArrayData

* chore

* refactor: Avoid type conversion

* chore: Fix clippy

* chore: Follow DuckDB

* Update datafusion/functions-nested/src/map_extract.rs

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>

* chore: Fix fmt

---------

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
…rate (apache#12036)

* refactor:  Move LimitedDistinctAggregation to physical-optimizer crate

* chore: Update cargo.lock

* chore: Fix clippy

* Update datafusion/physical-optimizer/src/limited_distinct_aggregation.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: Clean import

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…#12030)

* Adds new crate for window functions

* Moves `row_number` to window functions crate

* Fixes build errors

* Regenerates protobuf

* Makes `row_number` no-op temporarily

* Minor: fixes formatting

* Implements `WindowUDF` for `row_number`

* Minor: fixes formatting

* Adds singleton instance of UDWF: `row_number`

* Adds partition evaluator

* Registers default window functions

* Implements `evaluate_all`

* Fixes: allow non-uppercase globals

* Minor: prefix underscore for unused variable

* Minor: fixes formatting

* Uses `row_number_udwf`

* Fixes: unparser test for `row_number`

* Uses row number to represent functional dependency

* Minor: fixes formatting

* Removes `row_number` from case-insensitive name test

* Deletes wrapper for `row_number` window expression

* Fixes: lowercase name in error statement

* Fixes: `row_number` fields are not nullable

* Fixes: lowercase name in explain output

* Updates Cargo.lock

* Fixes: lowercase name in explain output

* Adds support for result ordering

* Minor: add newline between methods

* Fixes: re-export crate name in doc comments

* Adds doc comment for `WindowUDFImpl::nullable`

* Minor: renames variable

* Minor: update doc comments

* Deletes code

* Minor: update doc comments

* Minor: adds period

* Adds doc comment for `row_number` window UDF

* Adds fluent API for creating `row_number` expression

* Minor: removes unnecessary path prefix

* Adds roundtrip logical plan test case

* Updates unit tests for `row_number`

* Deletes code

* Minor: copy edit doc comments

* Minor: deletes comment

* Minor: copy edits udwf doc comments
…pache#12000)

* fix/11982: resolves projection issue found in with_column window fn usage

Signed-off-by: Devan <devandbenz@gmail.com>

* fix/11982: resolves projection issue found in with_column window fn usage

Signed-off-by: Devan <devandbenz@gmail.com>

* fmt

Signed-off-by: Devan <devandbenz@gmail.com>

* fmt

Signed-off-by: Devan <devandbenz@gmail.com>

* refactor to get tests working

Signed-off-by: Devan <devandbenz@gmail.com>

* change test to use test harness

Signed-off-by: Devan <devandbenz@gmail.com>

* use row_number method and add comment about test

Signed-off-by: Devan <devandbenz@gmail.com>

* add back import

Signed-off-by: Devan <devandbenz@gmail.com>

---------

Signed-off-by: Devan <devandbenz@gmail.com>
* Support HEAD of sqlparser main

* special case ID as a non-keyword when unparsing

* fix EXTRACT expressions

* TODO REVERT: comment out failing test

Making this commit just to let tests progress.

* use sqlparser-rs v0.50.0
* Minor: make some physical-plan properties public

* add Default for GroupOrderingFull

* make groups and null_expr private again

* remove pub label
HuSen8891 and others added 18 commits August 21, 2024 13:03
* Add Utf8View support to STRPOS function

* fix type inconsistency

* fix type inconsistency

* refactor tests
* Update itertools requirement from 0.12 to 0.13

Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](rust-itertools/itertools@v0.12.0...v0.13.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update Cargo.lock

* Avoid deprecated API

* nested-functions: workspace version of itertools

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Eduard Karacharov <eduard.karacharov@gmail.com>
* fix the wildcard expand for filter plan

* expand the wildcard for the error message

* add the tests

* fix recompute_schema

* fix clippy

* cargo fmt

* change the check for having clause

* rename the function and moving the tests

* fix check

* expand the schema for aggregate plan

* reduce the time to expand wildcard

* clean the testing table after tested

* fmt and address review

* stop expand wildcard and add more check for group-by and selects

* simplify the having check
* Convert LogicalPlanBuilder to use Arc<LogicalPlan>

Summary
Update the struct to use Arc. Verified that the tests pass.

Used Arc::clone wherever possible, and unwrap_arc where an owned
LogicalPlan is required.

Keep the pub fn input unchanged as LogicalPlan to limit the scope of the
change. If we change the pub fn, we can also get rid of this pattern:

```
unnest(unwrap_arc(self.plan), ...).map(Self::from)
```

* Use self.plan directly without Arc::clone

* Implement From to allow into syntax

* Improve documentation

* Consume self in to_recursive_query

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
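
The `Arc::clone` / `unwrap_arc` pattern mentioned in the commit above can be illustrated with a minimal, standard-library-only sketch. The `Plan` and `Builder` types are hypothetical stand-ins (not DataFusion's `LogicalPlan` or `LogicalPlanBuilder`), and the body of `unwrap_arc` is an assumption about how such a helper might behave: take ownership when the `Arc` is unique, and clone only when it is still shared.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a logical plan node; not DataFusion's `LogicalPlan`.
#[derive(Clone, Debug, PartialEq)]
struct Plan {
    name: String,
}

/// Returns the owned value, cloning only if the `Arc` is still shared.
/// (Assumed behavior; the name comes from the commit message.)
fn unwrap_arc(plan: Arc<Plan>) -> Plan {
    Arc::try_unwrap(plan).unwrap_or_else(|shared| (*shared).clone())
}

// Hypothetical builder that stores its plan behind an `Arc`.
struct Builder {
    plan: Arc<Plan>,
}

impl Builder {
    /// Cheap: bumps the reference count instead of copying the plan.
    fn share(&self) -> Arc<Plan> {
        Arc::clone(&self.plan)
    }

    /// Consumes the builder and hands back an owned plan.
    fn build(self) -> Plan {
        unwrap_arc(self.plan)
    }
}
```

With this shape, sharing a plan never copies it, and `build` pays for a clone only when another handle to the same plan is still alive.
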
* Improve documentation on `StringArrayType` trait

* tweaks

* Update datafusion/functions/src/string/common.rs

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
…tead of deprecated Timestamp (apache#11597)

* bump substrait-rs

* consume and produce precisiontimestamps

* bump substrait to latest

* clippy

* deprecate in 42, since we're already on 41
…lly (apache#12098)

* fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully

* revert back to having Arc<Self>

* add notes about adding stuff into Aliased impls

* fix clippy

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* minor: SortExec measure elapsed_compute time when sorting

Whilst investigating query execution performance I noticed that
some SortExec nodes were reporting suspiciously short elapsed_compute
times. It appears that the SortExec node wasn't running the
elapsed_compute timer while it was doing the actual sorting operation.

* fix: apply review suggestions
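
The SortExec timing problem described above comes down to whether a timer is alive while the sort runs. The following is a rough sketch of that idea using only the standard library; `ElapsedCompute`, `TimerGuard`, and `sort_batch` are hypothetical names for illustration and are not DataFusion's metrics API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Hypothetical metric: total nanoseconds spent computing.
#[derive(Default)]
struct ElapsedCompute {
    nanos: AtomicU64,
}

impl ElapsedCompute {
    /// Starts a timer that adds its elapsed time to this metric when dropped.
    fn timer(&self) -> TimerGuard<'_> {
        TimerGuard { metric: self, start: Instant::now() }
    }

    fn value_nanos(&self) -> u64 {
        self.nanos.load(Ordering::Relaxed)
    }
}

struct TimerGuard<'a> {
    metric: &'a ElapsedCompute,
    start: Instant,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        // Record at least 1 ns so a timed section is never reported as zero.
        let elapsed = self.start.elapsed().as_nanos().max(1) as u64;
        self.metric.nanos.fetch_add(elapsed, Ordering::Relaxed);
    }
}

/// Sorts `batch` while the timer is running, so the sort itself is counted.
fn sort_batch(mut batch: Vec<i64>, metric: &ElapsedCompute) -> Vec<i64> {
    let _timer = metric.timer(); // must stay alive across the sort
    batch.sort_unstable();
    batch
}
```

If the guard were created after `sort_unstable` returned, or dropped before it was called, the metric would stay suspiciously small, which matches the symptom in the commit message.
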
* naive impl

* calc capacity

* cleanup

* Update test

* simplify coercion logic

* write some more tests

* Update tests

* Improve implementation and do the right thing for null

* add ticket reference

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…pache#12111)

* add a bench file substr.rs

* taplo format

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
nrc commented Aug 23, 2024

The official Arrow and ecosystem update is at apache#12032 and it is quite a big one this time around

adriangb and others added 2 commits August 23, 2024 13:59
Signed-off-by: Nick Cameron <nrc@ncameron.org>
@nrc nrc force-pushed the logfire-interval-extract branch from 0276b57 to 6140c5f Compare August 23, 2024 02:24
@davidhewitt davidhewitt deleted the branch pydantic:logfire August 29, 2024 08:01
@davidhewitt

Will pull this commit onto a new branch 👍

@davidhewitt davidhewitt reopened this Aug 29, 2024
adriangb pushed a commit that referenced this pull request Jan 31, 2025
Dandandan pushed a commit that referenced this pull request Apr 21, 2026
…messages (apache#20387)

## Which issue does this PR close?
- Closes apache#20386.

## Rationale for this change
The `memory_limit` configuration
(`RuntimeEnvBuilder::new().with_memory_limit()`) uses the `greedy` memory
pool by default. If `memory_pool`
(`RuntimeEnvBuilder::new().with_memory_pool()`) is set, it overrides that
default with the configured pool, such as `fair`. If neither
`memory_limit` nor `memory_pool` is set, the `unbounded` memory pool is
used. It is therefore useful to expose the pool that was ultimately
selected as part of the `ResourcesExhausted` error message, so that end
users know which pool is in use and can switch to another one (`greedy`,
`fair`, `unbounded`) if needed.
- [This comparison
table](lance-format/lance#3601 (comment))
is an example use case that compares the runtime behavior of the `greedy`
and `fair` memory pools. Exposing the memory pool in use as part of the
native logs helps with this kind of comparison.
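
The selection rules in the rationale above can be summarized in a small sketch. `PoolKind` and `select_pool` are hypothetical names used only to model the described behavior; they are not part of DataFusion, and treating an explicitly configured pool as always winning is a simplifying assumption.

```rust
// Hypothetical model of the three pool kinds named in the rationale.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PoolKind {
    Greedy,
    Fair,
    Unbounded,
}

impl PoolKind {
    /// Name as it would appear in an error message.
    fn name(self) -> &'static str {
        match self {
            PoolKind::Greedy => "greedy",
            PoolKind::Fair => "fair",
            PoolKind::Unbounded => "unbounded",
        }
    }
}

/// Picks the pool following the rules described in the rationale.
fn select_pool(memory_limit: Option<usize>, memory_pool: Option<PoolKind>) -> PoolKind {
    match (memory_pool, memory_limit) {
        // An explicitly configured pool overrides the default.
        (Some(kind), _) => kind,
        // Only a limit is set: fall back to the greedy pool.
        (None, Some(_)) => PoolKind::Greedy,
        // Nothing is set: memory use is unbounded.
        (None, None) => PoolKind::Unbounded,
    }
}
```

Because the outcome depends on two independent settings, a user reading an out-of-memory error cannot easily tell which branch was taken unless the message names the pool.
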

The following examples use `datafusion-cli`:
**Case1**: datafusion-cli result when `memory-limit` and
`top-memory-consumers > 0` are set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 3

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```
**Case2**: datafusion-cli result when `memory-limit` and
`top-memory-consumers = 0` (disabling top memory consumers logging) are
set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 0

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```
**Case3**: datafusion-cli result when `memory-limit`, `memory-pool`
and `top-memory-consumers > 0` are set:
```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --mem-pool-type fair --top-memory-consumers 3 --command 'select * from generate_series(1,500000) as t1(v1) order by v1;'

DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)
```

## What changes are included in this PR?
- Add a name property to MemoryPool instances.
- Expose the MemoryPool in use in Resources Exhausted error messages.
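
As a rough illustration of the two changes listed above, the sketch below gives a pool a name and includes it in the allocation error, producing a suffix shaped like the `greedy(used: ..., pool_size: ...)` text in the CLI output. `NamedPool`, `GreedyPool`, and `human` are hypothetical and heavily simplified; DataFusion's real `MemoryPool` trait and its error wording differ.

```rust
use std::fmt;

// Hypothetical, simplified pool trait; not DataFusion's `MemoryPool`.
trait NamedPool: fmt::Display {
    fn name(&self) -> &str;
    fn try_grow(&mut self, bytes: usize) -> Result<(), String>;
}

struct GreedyPool {
    used: usize,
    pool_size: usize,
}

/// Formats a byte count with one decimal place (B, KB, or MB).
fn human(bytes: usize) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = 1024.0 * 1024.0;
    let b = bytes as f64;
    if b >= MB {
        format!("{:.1} MB", b / MB)
    } else if b >= KB {
        format!("{:.1} KB", b / KB)
    } else {
        format!("{:.1} B", b)
    }
}

impl fmt::Display for GreedyPool {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}(used: {}, pool_size: {})",
            self.name(),
            human(self.used),
            human(self.pool_size)
        )
    }
}

impl NamedPool for GreedyPool {
    fn name(&self) -> &str {
        "greedy"
    }

    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.pool_size {
            // The pool description is appended so the user can see which pool failed.
            return Err(format!(
                "Failed to allocate additional {} - {} remain available for the total memory pool: {}",
                human(bytes),
                human(self.pool_size - self.used),
                self
            ));
        }
        self.used += bytes;
        Ok(())
    }
}
```

Putting the description behind `Display` keeps the error-formatting code independent of the pool type, so a `fair` pool could print a different set of fields without changing the caller.
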

## Are these changes tested?
Yes, by updating existing test cases.

## Are there any user-facing changes?
Yes, the Resources Exhausted error messages are updated.