
@jonathanc-n (Contributor)

Which issue does this PR close?

Closes #13279.

What changes are included in this PR?

Added Timestamp, Binary, and Float types to the fuzz tests.

@github-actions bot added the core label (Core DataFusion crate) Nov 6, 2024
),
ColumnDescr::new("binary", DataType::Binary),
ColumnDescr::new("large_binary", DataType::LargeBinary),
ColumnDescr::new("binaryview", DataType::BinaryView),
@LeslieKid (Contributor) commented Nov 7, 2024:

I think we can put the binary types next to the string types instead of placing them in the middle of the fixed-size primitive types.
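A minimal sketch of the grouping suggested here, under stated assumptions: the `ColumnDescr` struct and `DataType` enum below are hypothetical, dependency-free stand-ins (the real test uses arrow's `DataType`, whose variant names they mirror), and the string column names `utf8`, `large_utf8` and `utf8_view` are illustrative. The binary and low-cardinality entries reuse the names shown in the diff.

```rust
// Hypothetical stand-ins: the real fuzz test uses arrow's `DataType`; this
// local enum only mirrors a few of its variant names so the sketch compiles
// without dependencies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    UInt8,
    Int64,
    Float32,
    Float64,
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    LargeBinary,
    BinaryView,
}

#[derive(Debug)]
struct ColumnDescr {
    name: &'static str,
    data_type: DataType,
    max_num_distinct: Option<usize>,
}

impl ColumnDescr {
    fn new(name: &'static str, data_type: DataType) -> Self {
        Self { name, data_type, max_num_distinct: None }
    }

    /// Builder-style setter marking a low-cardinality column.
    fn with_max_num_distinct(mut self, max_num_distinct: usize) -> Self {
        self.max_num_distinct = Some(max_num_distinct);
        self
    }
}

fn columns() -> Vec<ColumnDescr> {
    vec![
        // fixed-size primitive types stay together
        ColumnDescr::new("i64", DataType::Int64),
        ColumnDescr::new("f32", DataType::Float32),
        ColumnDescr::new("f64", DataType::Float64),
        // variable-length types: string types, then binary types right after
        ColumnDescr::new("utf8", DataType::Utf8),
        ColumnDescr::new("large_utf8", DataType::LargeUtf8),
        ColumnDescr::new("utf8_view", DataType::Utf8View),
        ColumnDescr::new("binary", DataType::Binary),
        ColumnDescr::new("large_binary", DataType::LargeBinary),
        ColumnDescr::new("binaryview", DataType::BinaryView),
        // low cardinality columns
        ColumnDescr::new("u8_low", DataType::UInt8).with_max_num_distinct(10),
        ColumnDescr::new("utf8_low", DataType::Utf8).with_max_num_distinct(10),
    ]
}
```

Keeping all variable-length types contiguous makes it easier to see at a glance which string and binary variants are covered.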

use rand::Rng;

/// Randomly generate binary arrays
pub struct BinaryArrayGenerator {
Review comment (Contributor):

👍

ColumnDescr::new("time32_ms", DataType::Time32(TimeUnit::Millisecond)),
ColumnDescr::new("time64_us", DataType::Time64(TimeUnit::Microsecond)),
ColumnDescr::new("time64_ns", DataType::Time64(TimeUnit::Nanosecond)),
// TODO: randomize timezones for timestamp types
Review comment (Contributor):

Let's create a ticket instead of a TODO.
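For whoever picks up that ticket, one way the timezone TODO might be approached, sketched under assumptions: the `TIMEZONES` candidate list and `random_timezone` helper are hypothetical, and std's randomly keyed hasher stands in for `rand` so the block has no dependencies. Arrow timestamp types accept either no timezone, an IANA timezone name, or a fixed offset such as "+05:00".

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical candidate timezones for timestamp columns.
/// `None` means a timestamp without a timezone.
const TIMEZONES: [Option<&str>; 4] = [
    None,
    Some("UTC"),
    Some("+05:00"),
    Some("America/New_York"),
];

/// Pick one candidate, using std's randomly keyed hasher as a cheap
/// random source (a stand-in for `rand` in this sketch).
fn random_timezone() -> Option<&'static str> {
    let r = RandomState::new().build_hasher().finish();
    TIMEZONES[(r % TIMEZONES.len() as u64) as usize]
}
```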

Vec::new()
} else {
let len = rng.gen_range(1..=max_len);
(0..len).map(|_| rng.gen()).collect()
Review comment (Contributor):

Wondering how len differs from max_len?

Review comment (Contributor):

It seems that len is the actual length of each value, drawn from 1..=max_len (max_len inclusive), so max_len is only an upper bound.
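To illustrate the len/max_len distinction, here is a dependency-free sketch. The xorshift generator stands in for rand's `StdRng`, and `random_binary_value` is a hypothetical helper; returning an empty value when `max_len == 0` is an assumption, since the branch condition in the diff is not shown.

```rust
/// Minimal xorshift64 PRNG, a stand-in for `rand::rngs::StdRng`
/// so this sketch has no external dependencies.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must never hold an all-zero state
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in the inclusive range `lo..=hi` (slight modulo bias is fine here).
    fn gen_range_inclusive(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % (hi - lo + 1) as u64) as usize
    }
}

/// Hypothetical helper: one random binary value whose length is drawn
/// from `1..=max_len`. The `max_len == 0` branch is an assumed condition.
fn random_binary_value(rng: &mut XorShift64, max_len: usize) -> Vec<u8> {
    if max_len == 0 {
        Vec::new()
    } else {
        // `len` is this value's actual length; `max_len` is only the upper bound
        let len = rng.gen_range_inclusive(1, max_len);
        (0..len).map(|_| rng.next_u64() as u8).collect()
    }
}
```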

pub rng: StdRng,
}

impl BinaryArrayGenerator {
Review comment (Contributor):

Love it. Should we add tests for this generator?

Review comment (Contributor):

The generator itself is part of a test 🤔 What would we test? Maybe that the distinct values are as specified?
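A rough sketch of the property check suggested here, assuming a generator that draws a pool of at most `max_num_distinct` values and then samples rows from that pool. The `Lcg` type and `generate_low_cardinality` function are hypothetical and not taken from the PR; the check itself just counts distinct values with a `HashSet`.

```rust
use std::collections::HashSet;

/// Tiny 64-bit LCG (Knuth's MMIX constants), a dependency-free stand-in for `rand`.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // the high bits of an LCG are better distributed than the low bits
        self.0 >> 33
    }
}

/// Hypothetical generator: builds a pool of at most `max_num_distinct`
/// random binary values, then fills `num_rows` rows by sampling the pool.
fn generate_low_cardinality(
    rng: &mut Lcg,
    num_rows: usize,
    max_num_distinct: usize,
    max_len: usize,
) -> Vec<Vec<u8>> {
    assert!(max_num_distinct > 0 && max_len > 0, "limits must be positive");
    let pool: Vec<Vec<u8>> = (0..max_num_distinct)
        .map(|_| {
            let len = 1 + rng.next_u64() as usize % max_len;
            (0..len).map(|_| rng.next_u64() as u8).collect()
        })
        .collect();
    (0..num_rows)
        .map(|_| pool[rng.next_u64() as usize % pool.len()].clone())
        .collect()
}

/// The property to check: the number of distinct values in the output.
fn distinct_count(values: &[Vec<u8>]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}
```

The assertion has to be `<=` rather than `==`, because random pool entries may collide and because a small `num_rows` may not hit every pool entry.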

@alamb (Contributor) left a comment:

Thanks @jonathanc-n -- I think this looks great

// low cardinality columns
ColumnDescr::new("u8_low", DataType::UInt8).with_max_num_distinct(10),
ColumnDescr::new("utf8_low", DataType::Utf8).with_max_num_distinct(10),
ColumnDescr::new("binary", DataType::Binary),
Review comment (Contributor):

We could now remove the binary TODO a few lines above.


@alamb merged commit 31d27c2 into apache:main Nov 10, 2024
@alamb (Contributor) commented Nov 10, 2024

Thanks again @jonathanc-n -- and thanks to @comphead @LeslieKid for the reviews

jayzhan211 pushed a commit to jayzhan211/datafusion that referenced this pull request Nov 12, 2024
* Added Timestamp/Binary/Float to fuzz

* clippy fix

* small fix

* remove todo

* remove todo
@jonathanc-n deleted the add-timestamp/binary/float-to-fuzz branch November 27, 2024 22:12

Labels

core (Core DataFusion crate)


Development

Successfully merging this pull request may close these issues.

Add fuzz support for Timestamp, Binary and Float

4 participants