libstore: Do not mark connections as bad when RemoteStore::narFromPath is called as a coroutine #14998
b0069ce to 5f5bc27
Hm, I'm not sure this is sound? If the coroutine is abandoned before the entire NAR has been copied (e.g. because the consumer of the coroutine throws an exception half-way through), won't that leave the connection in an undefined state? We only want to ignore the unwind exception if it occurs after the entire NAR has been copied. In general, it's annoying that coroutines that produce NARs don't run to completion, since it causes a lot of unexpected issues (e.g. 7ba8443). I wonder if there is a way to fix that generically.
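The condition raised in this comment (ignore the unwind only once the whole NAR has been read) could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: `FakeForcedUnwind`, `FakeConnection`, `narFullyCopied`, and `useConnection` are hypothetical names, and `FakeForcedUnwind` merely stands in for the exception raised while a coroutine's stack is unwound.

```cpp
#include <functional>

// Hypothetical stand-in for the exception thrown while a coroutine is unwound.
struct FakeForcedUnwind {};

// Hypothetical connection handle; `bad` means "do not reuse this connection".
struct FakeConnection {
    bool bad = false;
    bool narFullyCopied = false; // set by the body once the whole NAR was read
};

// Runs `body` and decides whether the connection is still usable afterwards.
void useConnection(FakeConnection & conn,
                   const std::function<void(FakeConnection &)> & body)
{
    try {
        body(conn);
    } catch (const FakeForcedUnwind &) {
        // An unwind after the full NAR was read leaves the protocol in sync;
        // an unwind half-way through leaves unread bytes on the wire.
        if (!conn.narFullyCopied) conn.bad = true;
        throw; // the unwind exception has to keep propagating
    } catch (...) {
        conn.bad = true; // any other failure: assume the stream is out of sync
        throw;
    }
}
```

With this shape, a consumer that stops early still causes the connection to be discarded, while a coroutine destroyed after the last byte does not.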
5f5bc27 to 07ea89d
Hm, yeah I think I have a fix for that (look at 446e7f1). The reason is that they don't call

There's also the question of what to do when the coroutine has produced more data than what we expect. Currently we just discard anything extra. I wonder if there are any bugs lurking in there.
    if (hasCoro && *coro) {
        (*coro)();
    }
    if (*coro) {
        cur = coro->get();
    } else {
These duplicate ifs seem tricky, but the first if doesn't run on the first iteration because hasCoro is false. We don't want to call the coroutine if it completed on the last iteration, otherwise it would segfault.
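To make the two checks easier to follow, here is a self-contained sketch of the same loop shape. It does not use Boost: `FakePullCoro` is a hypothetical stand-in that only mimics the three operations used in the snippet (`operator bool`, `operator()`, `get()`), and `drain` and the way `hasCoro` is set are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pull-type coroutine that yields string chunks.
class FakePullCoro {
    std::vector<std::string> chunks;
    std::size_t pos = 0;
public:
    explicit FakePullCoro(std::vector<std::string> c) : chunks(std::move(c)) {}

    // True while a value is available, false once the coroutine has completed.
    explicit operator bool() const { return pos < chunks.size(); }

    // Resume: advance to the next value. Must not be called after completion.
    void operator()() { assert(pos < chunks.size()); ++pos; }

    // Current value. Must not be called after completion.
    const std::string & get() const { assert(pos < chunks.size()); return chunks[pos]; }
};

// Concatenates every chunk, using the same "resume, then re-check" pattern.
std::string drain(FakePullCoro & coro)
{
    std::string out;
    bool hasCoro = false; // false only on the first iteration
    while (true) {
        // First check: resume only on later iterations, and only if the
        // coroutine has not already completed.
        if (hasCoro && coro) {
            coro();
        }
        hasCoro = true;
        // Second check: resuming may have completed the coroutine.
        if (!coro) break;
        out += coro.get();
    }
    return out;
}
```

The first value is consumed without resuming, and each later iteration resumes before reading, so `get()` is never reached on a completed coroutine.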
If the coroutine runs to completion, then it won't get an unwind exception, right (since there's nothing to unwind)? So then

Hm this is quite confusing indeed. I'll double-check if just the fix to serialise.cc is enough.
…h is called as a coroutine

forced_unwind is thrown by Boost.Context when destroying the coroutine. This led to us resetting the remote connection for each narFromPath with the ssh-ng:// store, so copying was very slow.
Without this we can abort by throwing an exception in the destructor:

    [24/635/2958 copied (3.8/26.0 GiB)] copying path '/nix/store/ncd2iic2nwxwhqsf4gp9sdybkwnwz20b-ruby3.3-mini_portile2-2.8.9' from 'ssh-ng://localhost:22'
    Nix crashed. This is a bug. Please report this at https://github.com/NixOS/nix/issues with the following information included:
    Exception: nix::Interrupted: error: interrupted by the user
    Stack trace:
    0# 0x00000000004AFFE9 in result/bin/nix
    1# 0x00007F946290A1AA in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libstdc++.so.6
    2# __cxa_call_terminate in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libstdc++.so.6
    3# __gxx_personality_v0 in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libstdc++.so.6
    4# 0x00007F946283FA19 in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libgcc_s.so.1
    5# _Unwind_RaiseException in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libgcc_s.so.1
    6# __cxa_throw in /nix/store/cf1a53iqg6ncnygl698c4v0l8qam5a2q-gcc-14.3.0-lib/lib/libstdc++.so.6
    7# 0x00007F94635D82D0 in /nix/store/9wrnk0nizdwba4sy9lg3h0xd30pg1x5a-nix-util-2.34.0pre/lib/libnixutil.so.2.34.0
    8# nix::Pid::wait() in /nix/store/9wrnk0nizdwba4sy9lg3h0xd30pg1x5a-nix-util-2.34.0pre/lib/libnixutil.so.2.34.0
    9# nix::Pid::~Pid() in /nix/store/9wrnk0nizdwba4sy9lg3h0xd30pg1x5a-nix-util-2.34.0pre/lib/libnixutil.so.2.34.0
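The trace shows an exception leaving a destructor (frames 8 and 9) and ending in `__cxa_call_terminate`. C++ destructors are implicitly `noexcept`, so an exception escaping one terminates the process. One common way to avoid that is to catch inside the destructor, as in the sketch below. This is not necessarily what the commit does: `FakeInterrupted` and `FakeChild` are hypothetical types introduced only for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an "interrupted by the user" error.
struct FakeInterrupted : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical child-process handle whose wait() may be interrupted.
class FakeChild {
    bool interrupted;
    std::string & log;
public:
    FakeChild(bool interrupted, std::string & log)
        : interrupted(interrupted), log(log) {}

    int wait() {
        if (interrupted) throw FakeInterrupted("interrupted by the user");
        return 0;
    }

    // An exception escaping from here would call std::terminate, so the
    // error is caught and recorded instead of being propagated.
    ~FakeChild() {
        try {
            wait();
        } catch (const std::exception & e) {
            log += std::string("error (ignored): ") + e.what();
        }
    }
};
```

The trade-off is that the error is only logged, so callers that need to react to the interrupt have to call `wait()` explicitly before the object is destroyed.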
This avoids a wall of text like the one below, because ThreadPool doesn't print interrupts on shutdown.

    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
    error (ignored): opening a connection to remote store 'ssh-ng://127.0.0.1' previously failed
07ea89d to b40b786
Yup, dropped all those changes. Resuming the coroutine in
libstore: Do not mark connections as bad when RemoteStore::narFromPath is called as a coroutine
Motivation
forced_unwind is thrown by Boost.Context when destroying the coroutine. This led to us resetting the remote connection for each narFromPath with the ssh-ng:// store, so copying was very slow.
Also fixes some interrupt issues with ssh-ng store.
Context
Fixes #6950.