Skip to content

Conversation

@Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Feb 14, 2021

To reduce the amount of dependencies in Arrow, this speeds up compile times, by removing the criterion dependencies.

On my machine time to build the crate with cargo test:

Old New
Nr dependencies 151 110
execution time 38.92 28.24

@github-actions
Copy link

@codecov-io
Copy link

codecov-io commented Feb 14, 2021

Codecov Report

Merging #9493 (466f4b0) into master (8547c61) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9493   +/-   ##
=======================================
  Coverage   82.12%   82.12%           
=======================================
  Files         235      235           
  Lines       54729    54729           
=======================================
  Hits        44944    44944           
  Misses       9785     9785           
Impacted Files Coverage Δ
rust/parquet/src/encodings/encoding.rs 94.86% <0.00%> (-0.20%) ⬇️
rust/arrow/src/array/transform/fixed_binary.rs 84.21% <0.00%> (+5.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8547c61...466f4b0. Read the comment docs.

@jorgecarleitao
Copy link
Member

Doesn't this make it more difficult to develop? I usually need to run a benchmark multiple times during dev of e.g. a kernel, and now I need to have two vs code opened for this.

I do not get the benefits either: doesn't the times that you refer to only relevant in the dev and in the first build? Running cargo test will not re-build criterion and others, right? The same applies to the number of dependencies.

I think that the concern in the mailing list is not about dev dependencies but about non-dev dependencies.

@alamb
Copy link
Contributor

alamb commented Feb 15, 2021

I think that the concern in the mailing list is not about dev dependencies but about non-dev dependencies.

It is true that I was originally thinking about non-dev dependencies, though speeding up dev cycles itself is also a worthy goal.

memory-check = []

[dev-dependencies]
criterion = "0.3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was imagining something more like adding a

[features]
benches = ["criterion"]

So then if one wanted to have the benches built they could run a command like cargo bench --features benches or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be interesting to try out something like that out. It will keep the dev flow more similar to what we have now while reducing the compile times when you don't want to run any benchmarks.

@jorgecarleitao would something like this work for you?

@jorgecarleitao
Copy link
Member

jorgecarleitao commented Feb 15, 2021

I am just trying to understand how this reduces the dev cycle: maybe I am not using the correct flow.

My flow atm on the arrow crate is (on a vs terminal opened on rust/arrow)

git checkout master
git checkout -b feature/A
# > remove all workspaces from `../Cargo.toml`  since rust-analyzer builds all and I do not care about them.

# > change something
cargo fmt
cargo clippy
cargo test --lib
cargo bench --bench X
# repeat step until happy

# > commit

git checkout master
cargo bench --bench X
git checkout feature/A
cargo bench --bench X
# > copy bench results

Open PR with bench results and pray that other dependencies pass. If not, iterate on them.

I am curious as to how others have been working here.

@alamb
Copy link
Contributor

alamb commented Mar 3, 2021

@Dandandan what do you think we should do with this PR? Is it something we should work on getting in? Or should we park the discussion for now?

I doubt this is going to improve any compile times when benchmarks are involved, but it will improve times for those PRs which don't need to rerun benchmarks

@Dandandan
Copy link
Contributor Author

@alamb let's park the discussion for now and see if there are other areas of improvement. I think it still could be a good idea to decrease compile / dev times, but the benchmarks are also quite central to the arrow crate. Maybe it makes more sense to do that it the DataFusion crate? Let's see

@alamb
Copy link
Contributor

alamb commented Mar 3, 2021

Closing this PR for now

@alamb alamb closed this Mar 3, 2021
alamb pushed a commit that referenced this pull request Mar 28, 2021
…ject

Same idea as #9493 but for examples in DataFusion

FYI @alamb

Clean + building with `cargo test`.

Moving the micro benchmarks out of the crate is also another possibility.

|         | Old           | New  |
| ------------- |:-------------:| -----:|
| Nr dependencies      | 309 | 249 |
| build time(s)      | 77    |68 |

Closes #9494 from Dandandan/move_datafusion_examples

Lead-authored-by: Heres, Daniel <danielheres@gmail.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb pushed a commit to apache/arrow-rs that referenced this pull request Apr 20, 2021
…ject

Same idea as apache/arrow#9493 but for examples in DataFusion

FYI @alamb

Clean + building with `cargo test`.

Moving the micro benchmarks out of the crate is also another possibility.

|         | Old           | New  |
| ------------- |:-------------:| -----:|
| Nr dependencies      | 309 | 249 |
| build time(s)      | 77    |68 |

Closes #9494 from Dandandan/move_datafusion_examples

Lead-authored-by: Heres, Daniel <danielheres@gmail.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants