remote: gs/s3: remove batch_exists #2375
For the record: as we've discussed in PMs, while we are at it, let's try to get rid of batch_exists, since we now have connection pools and no longer need it.
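A minimal sketch of what replacing batch_exists with a connection pool amounts to: each existence check borrows an already-open connection from a shared pool instead of every batch opening its own. `FakeConnection`, `ConnectionPool` and `pooled_exists` are hypothetical stand-ins, not DVC's actual classes, and the remote is modeled as an in-memory set.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an SSH/S3/GS client connection."""

    opened = 0  # how many connections were ever created
    _lock = threading.Lock()

    def __init__(self, existing):
        with FakeConnection._lock:
            FakeConnection.opened += 1
        self.existing = existing  # set of paths present on the "remote"

    def exists(self, path):
        return path in self.existing


class ConnectionPool:
    """Hypothetical pool: reuses idle connections, opens new ones lazily,
    and never has more than max_size connections checked out at once."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._idle = queue.LifoQueue()
        self._slots = threading.BoundedSemaphore(max_size)

    @contextmanager
    def get(self):
        self._slots.acquire()
        try:
            try:
                conn = self._idle.get_nowait()  # reuse an open connection
            except queue.Empty:
                conn = self._factory()  # none idle: open a new one
            try:
                yield conn
            finally:
                self._idle.put(conn)  # hand it back before freeing the slot
        finally:
            self._slots.release()


def pooled_exists(pool, paths, jobs=4):
    """Check each path individually, reusing pooled connections (no batching)."""

    def exists(path):
        with pool.get() as conn:
            return conn.exists(path)

    with ThreadPoolExecutor(max_workers=jobs) as executor:
        flags = list(executor.map(exists, paths))
    return [path for path, ok in zip(paths, flags) if ok]
```

Because a connection is returned to the idle queue before its slot is released, the pool never creates more than `max_size` connections, however many paths are checked.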

@pared Could you please check how

Sure, I'll prepare some benchmarks.

@efiop Prepare repo script: timing script: Average execution time for 5 runs: EDIT: I'll run a more extensive test.

This should be much worse for SSH. You dropped using many SFTP sessions per connection, which was a significant optimization.

P.S. What was the point of batch_exists for gs/s3 in the first place? We could drop it only for those while still keeping batch_exists for SSH.

@pared did you set

Also looks like
Can't we use a pool there too, the same way we do for pull?
It was mainly because of batch_exists for SSH.
It is enabled by default.
Not sure what you mean: add a pool of SFTP connections in each SSH connection? That would require special handling anyway, like

@Suor Before the connection pool, we were limited to ~4 SSH connections, so we started using batch_exists, which multiplied those 4 by 8 SFTP connections. With the connection pool in place we reuse already opened connections, which is probably why the tests show a small performance degradation.
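The "4 multiplied by 8" layout can be modeled in a few lines: the checksums are split into one chunk per SSH connection, and each chunk is fanned out over several SFTP channels on that connection, so up to 4 × 8 = 32 checks can be in flight at once. This is a hypothetical model, not the removed DVC code: `batch_exists_model` and `chunked` are illustrative names, threads stand in for SFTP channels, and a plain set stands in for the files on the server.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_exists_model(paths, remote, ssh_conns=4, sftp_per_conn=8):
    """Hypothetical model of the old batch_exists layout.

    One chunk of paths per SSH connection; inside each chunk the checks are
    spread over sftp_per_conn channels, giving up to ssh_conns * sftp_per_conn
    concurrent checks (4 * 8 = 32 with the defaults).
    """

    def check_chunk(chunk):
        # Real code would open one SSH connection here and multiplex several
        # SFTP channels over it; a thread pool stands in for those channels.
        with ThreadPoolExecutor(max_workers=sftp_per_conn) as channels:
            return list(channels.map(lambda path: path in remote, chunk))

    chunks = chunked(paths, ssh_conns)
    found = []
    with ThreadPoolExecutor(max_workers=ssh_conns) as conns:
        for chunk, flags in zip(chunks, conns.map(check_chunk, chunks)):
            found.extend(path for path, ok in zip(chunk, flags) if ok)
    return found
```

With a pool of only 4 SSH connections and one check per connection at a time, the ceiling drops to 4 concurrent checks, which is the trade-off debated below.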

@efiop we are still limited to 4 SSH connections with or without the pool; if the SSH server has many CPUs, that should not be enough.

Average execution time for 50 repeats:

@Suor but because of the pool, workers can reuse already opened SSH connections instead of opening new ones for each batch and then multiplexing SFTP over them.

@Suor I've got 12. I'll limit them and retry the tests.

@pared then something looks wrong: the CPUs are not used properly by current master. Maybe it's I/O-bound for you.

@Suor maybe I should try a "real" case, like an SSH cache on a different physical machine?

@pared you can try; at the least it will add network lag, which might also make the number of threads more important.
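A back-of-the-envelope model of why latency makes the thread count matter: if every existence check costs roughly one network round trip and at most `concurrency` checks are in flight, wall time is about ceil(n / concurrency) × RTT. The function name and the numbers below are hypothetical illustrations, not measurements from this thread, and the model ignores bandwidth, server load and connection setup.

```python
def estimated_check_time(n_checks, rtt_s, concurrency):
    """Latency-bound estimate: each check costs one round trip (rtt_s seconds)
    and at most `concurrency` checks run at the same time."""
    rounds = -(-n_checks // concurrency)  # ceiling division
    return rounds * rtt_s


# Hypothetical numbers: 10,000 checks over a 250 ms round trip.
pool_only = estimated_check_time(10_000, 0.25, 4)      # 4 SSH connections
multiplexed = estimated_check_time(10_000, 0.25, 32)   # 4 connections x 8 SFTP channels
```

Under these assumptions the 4-way case takes about 625 s and the 32-way case about 78 s; on a local SSH server the round trip is tiny, so the difference mostly disappears.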

BTW, using
It is known, not using

Tried the same benchmark scenario with jobs=1 and jobs=2. There is almost no difference, at least against a local SSH server.

So the benches for me ran: Current - 50s,

OK, so I tested with high latency by checking the status from SF to India, and got 31m vs 1h+ (I couldn't wait longer, and the progress bar is broken on master). So it looks like we do need an SFTP pool too 🙁

I've also noticed that it spends around 10 minutes before even checking the remote, so something else might be broken. Need to investigate.

OK, guys, how about we redefine

@efiop I'll restore the previous version of cache_exists for SSH then.
dad1a4e to fadf47c
progress_callback = ProgressCallback(len(checksums))

def exists_with_progress(chunks):
    return self.batch_exists(chunks, callback=progress_callback)
We've lost the progress bar :)
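One way to keep a progress bar once batch_exists is gone is to tick a callback after every individual check instead of after every batch. A minimal sketch, assuming a hypothetical thread-safe `ProgressCounter` in place of DVC's `ProgressCallback` (whose real interface may differ) and an `exists` function that checks a single path:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProgressCounter:
    """Hypothetical thread-safe progress tracker standing in for a progress bar."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def __call__(self):
        with self._lock:
            self.done += 1
            # A real implementation would redraw the progress bar here.


def exists_with_progress(paths, exists, callback, jobs=4):
    """Check paths one by one in a thread pool, ticking the callback per path."""

    def check(path):
        try:
            return exists(path)
        finally:
            callback()  # tick even if the check raises, so the count stays accurate

    with ThreadPoolExecutor(max_workers=jobs) as executor:
        flags = list(executor.map(check, paths))
    return [path for path, ok in zip(paths, flags) if ok]
```

Usage: with `remote = {"x", "z"}` and `progress = ProgressCounter(3)`, calling `exists_with_progress(["x", "y", "z"], remote.__contains__, progress)` returns `["x", "z"]` and leaves `progress.done` at 3. The lock matters because several worker threads update the counter concurrently.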
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
Have you followed the guidelines in our
Contributing document?
Does your PR affect documented changes or does it add new functionality
that should be documented? If yes, have you created a PR for
dvc.org documenting it or at
least opened an issue for it? If so, please add a link to it.
Fix #2373