Skip to content

Fix CI check when version changes -- remove checked in file that is created by doc example#12034

Merged
alamb merged 4 commits intoapache:mainfrom
alamb:alamb/fix_exmple
Aug 19, 2024
Merged

Fix CI check when version changes -- remove checked in file that is created by doc example#12034
alamb merged 4 commits intoapache:mainfrom
alamb:alamb/fix_exmple

Conversation

@alamb
Copy link
Copy Markdown
Contributor

@alamb alamb commented Aug 16, 2024

Which issue does this PR close?

Closes #11892

Rationale for this change

See #11892 -- basically if versions change the CI fails

The problem is that the output parquet file that is written as part of the doc example is checked in, and thus if the actual bytes written by the example change (e.g. when the version number changes) the local file is different too

What changes are included in this PR?

  1. Delete the example.parquet file
  2. Add it to .gitignore so it isn't added again
  3. Update example to avoid recreating it

Are these changes tested?

Yes by CI

Are there any user-facing changes?

No, this is a test fix only

@korowa
Copy link
Copy Markdown
Contributor

korowa commented Aug 18, 2024

Since this file is produced by https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/using-the-dataframe-api.md#write-dataframe-to-files, which is runned by this doctest as

cargo test --doc 'library_user_guide_dataframe_api'

won't it be better to replace it with tempdir rather than adding this file to gitignore? E.g.

use datafusion::prelude::*;
use datafusion::error::Result;
use datafusion::dataframe::DataFrameWriteOptions;

#[tokio::main]
async fn main() -> Result<()> {
    // Replace with actual target path
    let target_path = tempfile::tempdir()?.path().join("example.parquet");

    let ctx = SessionContext::new();
    // read example.csv file into a DataFrame
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // stream the contents of the DataFrame to the target file
    df.write_parquet(
        target_path.to_string_lossy().as_ref(),
        DataFrameWriteOptions::new(),
        None, // writer_options
    ).await;
    Ok(())
}

or to delete the file in the end of the example, like it's done in other doctests, but using tempdir has zero (or close to it) risk that this file will be committed.

@alamb alamb changed the title Remove checked in file that is created by doc example Fix CI check when version changes -- remove checked in file that is created by doc example Aug 19, 2024
@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented Aug 19, 2024

won't it be better to replace it with tempdir rather than adding this file to gitignore? E.g.

Good idea -- thanks @korowa. Done in 14caf64

@alamb alamb added the documentation Improvements or additions to documentation label Aug 19, 2024
Copy link
Copy Markdown
Contributor

@korowa korowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Thank you @alamb

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented Aug 19, 2024

Thank you for the review @korowa

@alamb alamb merged commit c2cbba2 into apache:main Aug 19, 2024
@alamb alamb deleted the alamb/fix_exmple branch August 19, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI fails with Binary files a/datafusion/core/example.parquet and b/datafusion/core/example.parquet differ after changing the version

2 participants