Skip to content

Conversation

@zhuliquan
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

support run filter_pushdown on windows machine

What changes are included in this PR?

1、avoid escape char (i.g. '\') in path.

Are these changes tested?

filter_pushdown

Are there any user-facing changes?

No

@github-actions github-actions bot added the core Core DataFusion crate label Nov 30, 2024
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this!

mod external_access_plan;
mod file_statistics;
#[cfg(not(target_family = "windows"))]
// #[cfg(not(target_family = "windows"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's OK to just remove the line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I remove it later

#[tokio::test]
async fn single_file_small_data_pages() {
let tempdir = TempDir::new().unwrap();
let tempdir = TempDir::new_in(Path::new(".")).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this end up litterring in the local git checkout when running the tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you perhaps try to undo this change? (same above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this end up litterring in the local git checkout when running the tests?

Emm, path which hold by tempdir will be delete after test case compeleted, Because struct TempDir implements Drop trait.

impl Drop for TempDir {
    fn drop(&mut self) {
        if !self.keep {
            let _ = remove_dir_all(self.path());
        }
    }
}

Copy link
Contributor Author

@zhuliquan zhuliquan Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you perhaps try to undo this change? (same above)

Hi @findepi. It's a bummer, but we still have to make this change. Because, I found out from the previous PR #13531 that the temp directory (i.g. "C:\Users\RUNNER~1\AppData\Local\Temp") that contains the special character ~ in CI, which will be replace to "%7E" when loading file, so that the file cannot be find. I notice that some parquet cases are working fine on windows machine (i.g. parquet::file_statistics::check_stats_precision_with_filter_pushdown). This cases read parquet files in "../../parquet-testing/data", which implies that the current folder definitely ok and does not have special characters (i.g. "~", "\").

pub fn parquet_test_data() -> String {
match get_data_dir("PARQUET_TEST_DATA", "../../parquet-testing/data") {
Ok(pb) => pb.display().to_string(),
Err(err) => panic!("failed to get parquet data dir: {err}"),
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuliquan -- using new_in makes sense for me and I agree that the temporary file will be deleted when the TempFile goes out of scope

Copy link
Contributor Author

@zhuliquan zhuliquan Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuliquan -- using new_in makes sense for me and I agree that the temporary file will be deleted when the TempFile goes out of scope
Thanks @alamb reviews. Emm,using new_in is aim to make temp parquet file in current directory instead of system temp directory. It can avoid some special char (i.g ~). The feature which deleting file without scope is owe to Drop trait which is implemented by TempDir

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the temporary file will be deleted when the TempFile goes out of scope

will this work too if a test panics?
tests creating files -- or sometimes creating files -- in the git repo is something we shoulda void

Copy link
Contributor Author

@zhuliquan zhuliquan Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the temporary file will be deleted when the TempFile goes out of scope

will this work too if a test panics? tests creating files -- or sometimes creating files -- in the git repo is something we shoulda void

Actually, drop will be invoked when test panic. If test happend panic, Rust will begin unwinding the stack, which means it will try to drop all variables that are still in scope (and therefore their drop methods will be called). I've done experiments to confirm this conclusion (Wherever I add panic! to the test code, the temporary directory seems to be removed). Even in extreme cases (i.g. panic happened in drop function), there is an ignore file to exclude the use of parquet files for this test.

datafusion/core/*.parquet

let mut canonical_path = path.canonicalize()?;

if cfg!(target_os = "windows") {
canonical_path = canonical_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this test exercised on the CI?

windows:
name: cargo test (win64)
runs-on: windows-latest
steps:
- uses: actions/checkout@v4
with:
submodules: true
- name: Setup Rust toolchain
uses: ./.github/actions/setup-windows-builder
- name: Run tests (excluding doctests)
shell: bash
run: |
export PATH=$PATH:$HOME/d/protoc/bin
cargo test --lib --tests --bins --features avro,json,backtrace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it, I also suspect it in PR #11575. However, this case run ok on my windows machine, and when I remove the code referenced above, this case can't be ok (got below error).

thread 'parquet::filter_pushdown::single_file' panicked at datafusion\core\tests\parquet\filter_pushdown.rs:516:66:
called `Result::unwrap()` on an `Err` value: ParquetError(External(Generic { store: "LocalFileSystem", source: InvalidUrl { url: Url { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/%5C%5C%3F%5CD:%5Cworkspace%5Crust%5Cgithub.com%5Czhuliquan%5Cdatafusion%5Cdatafusion%5Ccore%5C.tmpoyfZYi%5Cdata.parquet", query: None, fragment: None } } }))
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/std\src/panicking.rs:662:5
   1: core::panicking::panic_fmt
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src/panicking.rs:74:14
   2: core::result::unwrap_failed
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src/result.rs:1677:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src/result.rs:1102:23
   4: parquet_exec::parquet::filter_pushdown::TestCase::read_with_options::{{closure}}
             at .\tests\parquet\filter_pushdown.rs:516:22
   5: parquet_exec::parquet::filter_pushdown::TestCase::run_with_filter::{{closure}}
             at .\tests\parquet\filter_pushdown.rs:448:14
   6: parquet_exec::parquet::filter_pushdown::TestCase::run::{{closure}}
             at .\tests\parquet\filter_pushdown.rs:433:43
   7: parquet_exec::parquet::filter_pushdown::single_file::{{closure}}
             at .\tests\parquet\filter_pushdown.rs:85:16
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\future/future.rs:123:9
   9: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\future/future.rs:123:9
  10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:696:57
  11: tokio::runtime::coop::with_budget
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\coop.rs:107:5
  12: tokio::runtime::coop::budget
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\coop.rs:73:5
  13: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:696:25
  14: tokio::runtime::scheduler::current_thread::Context::enter
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:423:19
  15: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:695:36
  16: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:774:68
  17: tokio::runtime::context::scoped::Scoped<T>::set
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context\scoped.rs:40:9
  18: tokio::runtime::context::set_scheduler::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context.rs:180:26
  19: std::thread::local::LocalKey<T>::try_with
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\std\src\thread/local.rs:283:12
  20: std::thread::local::LocalKey<T>::with
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\std\src\thread/local.rs:260:9
  21: tokio::runtime::context::set_scheduler
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context.rs:180:9
  22: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:774:27
  23: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:683:19
  24: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:191:28
  25: tokio::runtime::context::runtime::enter_runtime
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\context\runtime.rs:65:16
  26: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\scheduler\current_thread\mod.rs:179:9
  27: tokio::runtime::runtime::Runtime::block_on_inner
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\runtime.rs:361:47
  28: tokio::runtime::runtime::Runtime::block_on
             at C:\Users\zlq16\.cargo\registry\src\rsproxy.cn-0dccff568467c15b\tokio-1.40.0\src\runtime\runtime.rs:335:13
  29: parquet_exec::parquet::filter_pushdown::single_file
             at .\tests\parquet\filter_pushdown.rs:223:5
  30: parquet_exec::parquet::filter_pushdown::single_file::{{closure}}
             at .\tests\parquet\filter_pushdown.rs:69:23
  31: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library\core\src\ops/function.rs:250:5
  32: core::ops::function::FnOnce::call_once
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14\library/core\src\ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `--test parquet_exec`

@zhuliquan zhuliquan requested a review from findepi December 2, 2024 14:49
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuliquan and @findepi -- this seems like an improvement to me.

#[tokio::test]
async fn single_file_small_data_pages() {
let tempdir = TempDir::new().unwrap();
let tempdir = TempDir::new_in(Path::new(".")).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuliquan -- using new_in makes sense for me and I agree that the temporary file will be deleted when the TempFile goes out of scope

@alamb alamb merged commit 948949e into apache:main Dec 5, 2024
25 checks passed
@zhuliquan zhuliquan deleted the test-filter_pushdown branch December 5, 2024 15:14
zhuliquan added a commit to zhuliquan/datafusion that referenced this pull request Dec 6, 2024
* test: support run filter_pushdown on windows

* minor: remove useless comment for filter_pushdown
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants