
benchmark for dvc status with different numbers of ignore rules #30

Merged: pared merged 9 commits into treeverse:master from karajan1001:fix_18 on Jun 8, 2020

Conversation

@karajan1001 (Contributor) commented May 31, 2020

20 ignore rules with 10k files tracked
Screenshot from 2020-05-31 10-16-55

3 ignore rules with 10k files tracked

Screenshot from 2020-05-31 10-17-02

Empty ignore rules file with 10k files tracked

Screenshot from 2020-05-31 10-17-04

1. Reduce files in data to prevent `dvc add` from causing a timeout failure.
2. Quiet mode
@karajan1001 (Contributor, Author)

dvc add 10K files takes quite a long time. If I use 40k files, the test would fail because of the asv timeout.

@skshetry (Collaborator)

dvc add 10K files takes quite a long time. If I use 40k files, the test would fail because of the asv timeout.

@karajan1001 would it be better to have `add` benchmarks with dvcignore first, rather than `status`?

@skshetry skshetry requested review from efiop and pared May 31, 2020 02:42
@skshetry skshetry linked an issue May 31, 2020 that may be closed by this pull request
@karajan1001 (Contributor, Author) commented May 31, 2020

dvc add 10K files takes quite a long time. If I use 40k files, the test would fail because of the asv timeout.

@karajan1001 would it be better to have `add` benchmarks with dvcignore first, rather than `status`?
@skshetry Thank you.

From my understanding, .dvcignore influences both `dvc add` and `dvc status`, but `dvc add` takes much longer and .dvcignore is not its bottleneck. So it is better to test .dvcignore with `dvc status` rather than `dvc add`.

@karajan1001 (Contributor, Author)

dvc add 10K files takes quite a long time. If I use 40k files, the test would fail because of the asv timeout.

This is because the test runs several times, and each time, calculating md5s and adding files to the cache took several seconds. Now I run `dvc add` before copying files to data; this change prevents adding those files to the cache.

@karajan1001 (Contributor, Author)

dvc add 10K files takes quite a long time. If I use 40k files, the test would fail because of the asv timeout.

This is because the test runs several times, and each time, calculating md5s and adding files to the cache took several seconds. Now I run `dvc add` before copying files to data; this change prevents adding those files to the cache.

After these changes, here is the result.

20 ignore rules with 10k files tracked

Screenshot from 2020-05-31 14-19-15
3 ignore rules with 10k files tracked

Screenshot from 2020-05-31 14-19-18

Empty ignore rules file with 10k files tracked

Screenshot from 2020-05-31 14-19-19

These results match
Screenshot from 2020-05-31 14-27-27

A new test with 3 ignore files of 10 rules each, compared to one file with 30 rules
@karajan1001 (Contributor, Author)

Add one more test: 1 ignore file with 30 rules, compared to 3 ignore files with 10 rules each.

1 file 30 rules
Screenshot from 2020-05-31 15-54-41

3 files, 10 rules each
Screenshot from 2020-05-31 15-54-43

@pared (Contributor) left a comment

One question

Comment thread benchmarks/status.py Outdated
@karajan1001 (Contributor, Author)

Please wait for a moment.
This benchmark runs too long and itself needs optimization.

  1. Use a module-level setup and teardown.
  2. Set repeat to 1 and increase number (one setup, then run the test multiple times). ASV

@karajan1001 (Contributor, Author) commented Jun 1, 2020

Please wait for a moment.
This benchmark runs too long and itself needs optimization.

  1. Use a module-level setup and teardown.
  2. Set repeat to 1 and increase number (one setup, then run the test multiple times). ASV

  1. The module-level setup and teardown run for every benchmark; they are combined with the class-level ones.
  2. I set repeat to 1 and number to 10, and the total running time was cut in half on my computer. But it seems there is still something wrong with it.

setup() runs twice, and time_status() runs ten times in the first run but only once in the second. I'm asking ASV for advice.
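The knobs being discussed can be sketched as asv class attributes (the class name and method bodies are illustrative, not the repo's actual benchmark):

```python
# Hypothetical sketch of the asv timing knobs discussed above. With
# repeat=1 and number=10, asv performs one (setup, measure, teardown)
# cycle and calls the timed method 10 times inside that measurement.
class DVCIgnoreBenchmark:
    repeat = 1    # how many (setup, measure) cycles asv performs
    number = 10   # how many calls to the timed method per measurement

    def setup(self):
        pass  # create the repo, files, and .dvcignore here

    def time_status(self):
        pass  # run `dvc status` here
```

This matches the observed behavior: one setup() per measurement cycle, with the timed method repeated `number` times inside it.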

@pared (Contributor) commented Jun 1, 2020

@karajan1001
asv has something called setup_cache; some context here.

The problem is that this method writes the pickled setup result to disk and loads it upon the next execution of the benchmark. In the case of DVC, that does not make sense, as we are operating on a directory/repository whose state changes during benchmark execution.

Some more context
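The mechanism being described can be sketched as follows (a minimal illustration per the asv docs; the class and its contents are invented for the example):

```python
# Sketch of asv's setup_cache mechanism: the value returned by
# setup_cache is pickled to disk once and unpickled again for every
# later benchmark process.
import pickle


class SuiteWithCache:
    def setup_cache(self):
        # Runs once per environment; must return something picklable.
        return {"num_files": 10_000}

    def track_num_files(self, cache):
        # Each benchmark receives the unpickled cache as its first arg.
        return cache["num_files"]


# Simulate asv's round trip. A DVC repo directory whose state mutates
# between runs cannot be snapshotted this way, which is why setup_cache
# does not fit this benchmark.
suite = SuiteWithCache()
cache = pickle.loads(pickle.dumps(suite.setup_cache()))
```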

@karajan1001 (Contributor, Author) commented Jun 2, 2020

@karajan1001
Asv has something called setup_cache, some context here.

The problem is that this method writes the pickled setup result to disk and loads it upon the next execution of the benchmark. In the case of DVC, that does not make sense, as we are operating on a directory/repository whose state changes during benchmark execution.

Some more context

Yes, our setup consists of paths and files, not Python in-memory objects.

Currently, the whole bench takes 10 minutes on my computer, only a quarter of the time of the first edition.

There is only one problem remaining: each setup() runs twice, and in the second pass the bench itself runs only once.

@pared (Contributor) commented Jun 2, 2020

@karajan1001
It's most likely due to the fact that we run not only the benchmarks. If you inspect run_benchmarks.dvc, you will note that we also profile the tests; that is the reason for the additional run.

@karajan1001 (Contributor, Author) commented Jun 2, 2020

@karajan1001
It's most likely due to the fact that we run not only the benchmarks. If you inspect run_benchmarks.dvc, you will note that we also profile the tests; that is the reason for the additional run.

Thank you @pared, you gave me the right answer. According to ASV:

  --profile, -p         In addition to timing, run the benchmarks through the
                        `cProfile` profiler and store the results.

And I verified that after deleting --profile, the second run of setup() disappeared. I have nothing left to optimize now.
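In miniature, the extra run works something like this (`profile_once` is a hypothetical helper illustrating the behavior, not asv's actual internals): after the timed measurements, the benchmark is executed once more under cProfile, which is why setup() appears to run a second time.

```python
# Sketch of what asv's --profile flag adds: one extra profiled
# execution of the benchmark, preceded by its own setup() call.
import cProfile
import io
import pstats


def profile_once(benchmark, setup):
    setup()  # the "extra" setup() run observed above
    profiler = cProfile.Profile()
    profiler.runcall(benchmark)  # one profiled call of the benchmark
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()  # the stored profiling report
```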

@pared (Contributor) commented Jun 3, 2020

@karajan1001 also, I created #33. I'm not sure whether this profiling makes sense; maybe we should do it only when we notice a degradation in performance.

Comment thread benchmarks/status.py
)
assert main(["add", "data", "--quiet"]) == 0
shutil.copytree(dataset_path, "data/data")
# calculating md5
Contributor:

+1

@pared (Contributor) commented Jun 3, 2020

Now that I think about it, shouldn't we just have a single benchmark with multiple ignore rules?

I mean, having 0, 3, and 30 seems to me like creating tests to compare status performance between those 3 use cases.

The point of dvc-bench is to detect degradation in performance (and not to see how dvcignore influences the walk through a directory), and for that we need a benchmark checking status with some existing rules. And having 0 dvcignore rules looks to me like a plain status benchmark (which we don't currently have).

I would consider leaving the 30-rules case as is, removing the 3-rules case, and moving the 0-rules case to StatusBenchmark (not necessarily in this change).

What do you think @karajan1001?

@karajan1001 (Contributor, Author)

@karajan1001 also, I created #33. I'm not sure whether this profiling makes sense; maybe we should do it only when we notice a degradation in performance.

Sorry for the late reply; I agree with this.

Now that I think about it, shouldn't we just have a single benchmark with multiple ignore rules?

I mean, having 0, 3, and 30 seems to me like creating tests to compare status performance between those 3 use cases.

The point of dvc-bench is to detect degradation in performance (and not to see how dvcignore influences the walk through a directory), and for that we need a benchmark checking status with some existing rules. And having 0 dvcignore rules looks to me like a plain status benchmark (which we don't currently have).

I would consider leaving the 30-rules case as is, removing the 3-rules case, and moving the 0-rules case to StatusBenchmark (not necessarily in this change).

What do you think @karajan1001?

Yes, the zero-dvcignore case is more of a status benchmark. I changed the DVCIgnoreEmpty name several times and couldn't make a final decision.

Comparing 30 rules to 3 rules can tell us how the time cost grows with the number of dvcignore rules. I think it is OK if the time cost grows linearly with the number of rules in dvcignore; without this assumption, we could not separate the influence of preprocessing from that of regex matching. But maybe these detailed performance influences should only be checked when there is an obvious performance degradation, just like the --profile in the previous discussion. The benchmark gives the big picture; only when some degradation is detected should we go into the small pieces.

@pared (Contributor) commented Jun 5, 2020

@karajan1001

But maybe these detailed performance influences should only be checked when there is an obvious performance degradation, just like the --profile in the previous discussion. The benchmark gives the big picture; only when some degradation is detected should we go into the small pieces.

Please take note of @efiop's comment under the profile issue. We can keep the profiles in order to have analysis material if something goes south.

@karajan1001 (Contributor, Author)

But maybe these detailed performance influences should only be checked when there is an obvious performance degradation, just like the --profile in the previous discussion. The benchmark gives the big picture; only when some degradation is detected should we go into the small pieces.

@pared @efiop
And what about the 3-ignore-rules benchmark? In my opinion, with the output from --profile we already have enough information.

@pared (Contributor) commented Jun 5, 2020

@karajan1001

I agree; to see the detailed difference in the 3 vs. 30 case, we would need to compare the charts for both tests. And the main point of this repo is to fulfill #8, which would be hard to define programmatically when comparing 2 benchmarks.

So let's get rid of it for now?

@pared (Contributor) commented Jun 8, 2020

@karajan1001 great change! Thank you very much!

@pared pared merged commit 24cf942 into treeverse:master Jun 8, 2020
@karajan1001 karajan1001 deleted the fix_18 branch December 15, 2022 10:17


Development

Successfully merging this pull request may close these issues.

dvcignore/tree benchmark

3 participants