ARROW-11969: [Rust][DataFusion] Improve Examples in documentation #9710
Closed
@@ -58,6 +58,69 @@ Here are some of the projects known to use DataFusion:

(if you know of another project, please submit a PR to add a link!)

## Example Usage

Run a SQL query against data stored in a CSV:
```rust
use datafusion::prelude::*;
use arrow::util::pretty::print_batches;
use arrow::record_batch::RecordBatch;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // register the CSV file as a table named "example"
    let mut ctx = ExecutionContext::new();
    ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")?;

    // execute and print results
    let results: Vec<RecordBatch> = df.collect().await?;
    print_batches(&results)?;
    Ok(())
}
```
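To make the query semantics concrete, here is a minimal std-only Rust sketch (no DataFusion or Arrow) of what `SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100` computes. The `group_min` helper and the `(a, b)` input rows are hypothetical illustrations, not DataFusion code and not the contents of `tests/example.csv`. A `BTreeMap` is used only so the output order is deterministic; SQL itself promises no row order (and so no particular choice of groups under `LIMIT`) without `ORDER BY`.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of `SELECT a, MIN(b) ... GROUP BY a LIMIT limit`
/// over in-memory `(a, b)` rows standing in for the CSV columns.
fn group_min(rows: &[(i64, i64)], limit: usize) -> Vec<(i64, i64)> {
    // One entry per distinct `a`, holding the smallest `b` seen so far.
    let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
    for &(a, b) in rows {
        groups
            .entry(a)
            .and_modify(|m| *m = (*m).min(b))
            .or_insert(b);
    }
    // LIMIT keeps at most `limit` result rows (here: the smallest keys).
    groups.into_iter().take(limit).collect()
}

fn main() {
    // Hypothetical input rows.
    let rows = [(1, 2), (1, 5), (2, 7), (2, 3)];
    for (a, min_b) in group_min(&rows, 100) {
        println!("a = {}, MIN(b) = {}", a, min_b);
    }
}
```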
Use the DataFrame API to process data stored in a CSV:

```rust
use datafusion::prelude::*;
use arrow::util::pretty::print_batches;
use arrow::record_batch::RecordBatch;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // create the dataframe
    let mut ctx = ExecutionContext::new();
    let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;

    // keep rows where a <= b, group by a computing MIN(b), keep at most 100 rows
    let df = df.filter(col("a").lt_eq(col("b")))?
        .aggregate(&[col("a")], &[min(col("b"))])?
        .limit(100)?;

    // execute and print results
    let results: Vec<RecordBatch> = df.collect().await?;
    print_batches(&results)?;
    Ok(())
}
```
Both of these examples will produce:

```text
+---+--------+
| a | MIN(b) |
+---+--------+
| 1 | 2      |
+---+--------+
```
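The DataFrame API example above can likewise be modeled in plain Rust. This is a toy sketch under loud assumptions: `Frame`, `aggregate_min`, and `run` are hypothetical names, the frame holds hypothetical `(a, b)` integer rows, and every step returns `Result` only to mimic the `?`-chaining of the DataFusion example. Unlike this toy, which evaluates each step eagerly, DataFusion's `DataFrame` builds a logical plan and runs nothing until `collect()`.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy frame over `(a, b)` rows; NOT DataFusion's DataFrame.
#[derive(Debug)]
struct Frame {
    rows: Vec<(i64, i64)>,
}

impl Frame {
    /// Keep only rows where the predicate holds, like `filter(col("a").lt_eq(col("b")))`.
    fn filter<F: Fn(&(i64, i64)) -> bool>(self, pred: F) -> Result<Frame, String> {
        Ok(Frame { rows: self.rows.into_iter().filter(|r| pred(r)).collect() })
    }

    /// Group by `a`, keeping the minimum `b` per group (BTreeMap gives a stable order).
    fn aggregate_min(self) -> Result<Frame, String> {
        let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
        for (a, b) in self.rows {
            let m = groups.entry(a).or_insert(b);
            *m = (*m).min(b);
        }
        Ok(Frame { rows: groups.into_iter().collect() })
    }

    /// Keep at most `n` rows.
    fn limit(self, n: usize) -> Result<Frame, String> {
        Ok(Frame { rows: self.rows.into_iter().take(n).collect() })
    }
}

/// Mirrors the chain: filter -> aggregate -> limit.
fn run(rows: Vec<(i64, i64)>) -> Result<Frame, String> {
    let df = Frame { rows }
        .filter(|&(a, b)| a <= b)?
        .aggregate_min()?
        .limit(100)?;
    Ok(df)
}

fn main() -> Result<(), String> {
    // Hypothetical rows; (3, 1) is dropped by the filter because 3 > 1.
    let out = run(vec![(1, 2), (1, 5), (3, 1), (2, 3)])?;
    println!("{:?}", out.rows);
    Ok(())
}
```

Because the filter runs before grouping, rows that fail `a <= b` never reach the aggregation; on other data this is where the two README queries could differ.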

## Using DataFusion as a library
Nitpicking: this might be a little bit fun with API churn. For example, I believe the input expr ownership work you've recently opened would change these from slices to vecs, and we don't have a way to catch that automatically like we do for the in-crate docs (am I right in thinking that `cargo test` runs all doctests?).

Edit: to be clear, I don't think it's a reason not to do it, just curious if anyone has ideas for how to prevent doc drift :)
This is a good point @returnString.

The way I justified the danger of drift to myself was this: the main use case of this documentation (the overview) is to help readers answer the question "should I even bother trying to use this crate?" Once they decide to actually use the crate, they will look at the real docs on docs.rs (from which they can copy/paste).

For the purpose of an example of "what does this library do", I felt even a slightly out-of-date example might be valuable.

Or maybe I am just trying to pad my GitHub stats ;) But in all seriousness, I am not committed to this PR. If it isn't a good idea, I can just close it.
That makes sense to me; agreed that (personally, at least) I'll give less consideration to projects without simple README examples.

It balloons the scope of this PR quite a lot, so I'm not saying this is a good idea, but I just did a bit of digging and it looks like people have gone through this particular problem before: https://blog.guillaume-gomez.fr/articles/2019-04-13+Keeping+Rust+projects%27+README.md+code+examples+up-to-date

The end result of that is https://crates.io/crates/doc-comment, which looks like it'll wire up any `rust`-tagged code blocks in external files as doctests, optionally only for `#[cfg(test)]`. If it's useful, I could log a follow-up task to integrate that and take a look at it myself?
@returnString https://crates.io/crates/doc-comment looks super awesome. I think that would be most helpful.
Logged as https://issues.apache.org/jira/browse/ARROW-12015 :)