-
Notifications
You must be signed in to change notification settings - Fork 1.9k
test: support run filter_pushdown on windows machine #13610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
findepi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for working on this!
datafusion/core/tests/parquet/mod.rs
Outdated
| mod external_access_plan; | ||
| mod file_statistics; | ||
| #[cfg(not(target_family = "windows"))] | ||
| // #[cfg(not(target_family = "windows"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's OK to just remove the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I remove it later
| #[tokio::test] | ||
| async fn single_file_small_data_pages() { | ||
| let tempdir = TempDir::new().unwrap(); | ||
| let tempdir = TempDir::new_in(Path::new(".")).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this end up litterring in the local git checkout when running the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you perhaps try to undo this change? (same above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this end up litterring in the local git checkout when running the tests?
Emm, path which hold by tempdir will be delete after test case compeleted, Because struct TempDir implements Drop trait.
impl Drop for TempDir {
fn drop(&mut self) {
if !self.keep {
let _ = remove_dir_all(self.path());
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you perhaps try to undo this change? (same above)
Hi @findepi. It's a bummer, but we still have to make this change. Because, I found out from the previous PR #13531 that the temp directory (i.g. "C:\Users\RUNNER~1\AppData\Local\Temp") that contains the special character ~ in CI, which will be replace to "%7E" when loading file, so that the file cannot be find. I notice that some parquet cases are working fine on windows machine (i.g. parquet::file_statistics::check_stats_precision_with_filter_pushdown). This cases read parquet files in "../../parquet-testing/data", which implies that the current folder definitely ok and does not have special characters (i.g. "~", "\").
datafusion/datafusion/common/src/test_util.rs
Lines 224 to 229 in fdb221f
| pub fn parquet_test_data() -> String { | |
| match get_data_dir("PARQUET_TEST_DATA", "../../parquet-testing/data") { | |
| Ok(pb) => pb.display().to_string(), | |
| Err(err) => panic!("failed to get parquet data dir: {err}"), | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuliquan -- using new_in makes sense for me and I agree that the temporary file will be deleted when the TempFile goes out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuliquan -- using
new_inmakes sense for me and I agree that the temporary file will be deleted when theTempFilegoes out of scope
Thanks @alamb reviews. Emm,usingnew_inis aim to make temp parquet file in current directory instead of system temp directory. It can avoid some special char (i.g ~). The feature which deleting file without scope is owe toDroptrait which is implemented byTempDir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the temporary file will be deleted when the TempFile goes out of scope
will this work too if a test panics?
tests creating files -- or sometimes creating files -- in the git repo is something we shoulda void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the temporary file will be deleted when the TempFile goes out of scope
will this work too if a test panics? tests creating files -- or sometimes creating files -- in the git repo is something we shoulda void
Actually, drop will be invoked when test panic. If test happend panic, Rust will begin unwinding the stack, which means it will try to drop all variables that are still in scope (and therefore their drop methods will be called). I've done experiments to confirm this conclusion (Wherever I add panic! to the test code, the temporary directory seems to be removed). Even in extreme cases (i.g. panic happened in drop function), there is an ignore file to exclude the use of parquet files for this test.
Line 68 in 2ac8af8
| datafusion/core/*.parquet |
| let mut canonical_path = path.canonicalize()?; | ||
|
|
||
| if cfg!(target_os = "windows") { | ||
| canonical_path = canonical_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this test exercised on the CI?
datafusion/.github/workflows/rust.yml
Lines 326 to 339 in ddee471
| windows: | |
| name: cargo test (win64) | |
| runs-on: windows-latest | |
| steps: | |
| - uses: actions/checkout@v4 | |
| with: | |
| submodules: true | |
| - name: Setup Rust toolchain | |
| uses: ./.github/actions/setup-windows-builder | |
| - name: Run tests (excluding doctests) | |
| shell: bash | |
| run: | | |
| export PATH=$PATH:$HOME/d/protoc/bin | |
| cargo test --lib --tests --bins --features avro,json,backtrace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it, I also suspect it in PR #11575. However, this case run ok on my windows machine, and when I remove the code referenced above, this case can't be ok (got below error).
thread 'parquet::filter_pushdown::single_file' panicked at datafusion\core\tests\parquet\filter_pushdown.rs:516:66:
called `Result::unwrap()` on an `Err` value: ParquetError(External(Generic { store: "LocalFileSystem", source: InvalidUrl { url: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/%5C%5C%3F%5CD:%5Cworkspace%5Crust%5Cgithub.com%5Czhuliquan%5Cdatafusion%5Cdatafusion%5Ccore%5C.tmpoyfZYi%5Cdata.parquet", query: None, fragment: None } } }))
stack backtrace:
0: rust_begin_unwind
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/std\src/panicking.rs:662:5
1: core::panicking::panic_fmt
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src/panicking.rs:74:14
2: core::result::unwrap_failed
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src/result.rs:1677:5
3: core::result::Result<T,E>::unwrap
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src/result.rs:1102:23
4: parquet_exec::parquet::filter_pushdown::TestCase::read_with_options::{{closure}}
at .\tests\parquet\filter_pushdown.rs:516:22
5: parquet_exec::parquet::filter_pushdown::TestCase::run_with_filter::{{closure}}
at .\tests\parquet\filter_pushdown.rs:448:14
6: parquet_exec::parquet::filter_pushdown::TestCase::run::{{closure}}
at .\tests\parquet\filter_pushdown.rs:433:43
7: parquet_exec::parquet::filter_pushdown::single_file::{{closure}}
at .\tests\parquet\filter_pushdown.rs:85:16
8: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\future/future.rs:123:9
9: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\future/future.rs:123:9
10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:696:57
11: tokio::runtime::coop::with_budget
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\coop.rs:107:5
12: tokio::runtime::coop::budget
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\coop.rs:73:5
13: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:696:25
14: tokio::runtime::scheduler::current_thread::Context::enter
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:423:19
15: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:695:36
16: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:774:68
17: tokio::runtime::context::scoped::Scoped<T>::set
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context\scoped.rs:40:9
18: tokio::runtime::context::set_scheduler::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context.rs:180:26
19: std::thread::local::LocalKey<T>::try_with
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\std\src\thread/local.rs:283:12
20: std::thread::local::LocalKey<T>::with
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\std\src\thread/local.rs:260:9
21: tokio::runtime::context::set_scheduler
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context.rs:180:9
22: tokio::runtime::scheduler::current_thread::CoreGuard::enter
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:774:27
23: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:683:19
24: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:191:28
25: tokio::runtime::context::runtime::enter_runtime
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context\runtime.rs:65:16
26: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:179:9
27: tokio::runtime::runtime::Runtime::block_on_inner
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\runtime.rs:361:47
28: tokio::runtime::runtime::Runtime::block_on
at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\runtime.rs:335:13
29: parquet_exec::parquet::filter_pushdown::single_file
at .\tests\parquet\filter_pushdown.rs:223:5
30: parquet_exec::parquet::filter_pushdown::single_file::{{closure}}
at .\tests\parquet\filter_pushdown.rs:69:23
31: core::ops::function::FnOnce::call_once
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\ops/function.rs:250:5
32: core::ops::function::FnOnce::call_once
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src\ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `--test parquet_exec`
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuliquan and @findepi -- this seems like an improvement to me.
| #[tokio::test] | ||
| async fn single_file_small_data_pages() { | ||
| let tempdir = TempDir::new().unwrap(); | ||
| let tempdir = TempDir::new_in(Path::new(".")).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuliquan -- using new_in makes sense for me and I agree that the temporary file will be deleted when the TempFile goes out of scope
* test: support run filter_pushdown on windows * minor: remove useless comment for filter_pushdown
Which issue does this PR close?
Closes #.
Rationale for this change
support run filter_pushdown on windows machine
What changes are included in this PR?
1、avoid escape char (i.g. '\') in path.
Are these changes tested?
filter_pushdown
Are there any user-facing changes?
No