implement Nexus quiesce (sagas, db activity) for upgrade #8740
Conversation
    // `DataStoreConnection` is used by various packages that we'd like to not
    // depend on `nexus-db-queries` (in order to parallelize compilation).
    // However, we need to do some datastore-specific work around the lifecycle
    // of this object (i.e., when it gets instantiated and when it gets
    // dropped). To achieve this, the caller in `nexus-db-queries` provides a
    // `releaser` whose sole purpose is to be dropped when this object is
    // dropped, allowing it to do the needed cleanup there.
For context, this package was created in #7985 to better parallelize Nexus compilation. This consequence feels pretty gross but I'm not sure how better to deal with this.
    pub async fn claim(&self) -> Result<DataStoreConnection, Error> {
        let id = self.next_id.fetch_add(1, Ordering::SeqCst);
        let held_since = Utc::now();
        let debug = Backtrace::force_capture().to_string();
This may be controversial because collecting a backtrace is considered expensive, but it feels like in the context of anything having to do with the database, this is probably a small cost. The upside is that if quiesce gets stuck (or anything else, for that matter), we'll be able to look at stack traces where outstanding database claims are held.
Could we measure this? What's the cost of the backtrace relative to accessing the DB connection?
I think my opinion might change if this was like, < 10% vs 100%+ of the total time to get a claim by itself.
Couple other questions:
- Is this output meaningful for us, given that we're in an async calling context? (I assume "yes" because the task is running, just wanna make sure these don't all point to the same spot in the tokio runtime)
- What's the size of these backtraces? Presumably we're going to have an extra one for each live DB claim now. Like latency - maybe this is acceptable! just want to know what we're dealing with
Also, this is paranoia on my part, but:
Does this backtrace contain any argument info, or is it purely the callstack? I understand this is mostly for omdb-level debugging, but I don't want it to be a side-channel for exfiltrating user data.
(EDIT: seems like "no" on the arguments)
> Could we measure this? What's the cost of the backtrace relative to accessing the DB connection?
We could. To be precise, what I think matters most is how long the backtrace takes relative to the time that the connection is held, not how long it took to acquire the connection. So what I'd probably do is:
- define some USDT probes before and after creating the backtrace
- write a D script that traces these probes, plus the qorb claim acquired / released probes and breaks out the time spent in these two areas
- deploy somewhere (best would be a racklette) and trace it for, like, 10 minutes
I'm honestly not sure it's worth the effort. A cheaper thing I could do is measure how long it takes to do a few backtraces on my own dev system and make sure it's as quick as I'd hope. I'd expect this to be on the order of tens (maybe hundreds) of microseconds and I'd expect it to be dwarfed by even a basic database query round-trip.
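For a rough sense of that, here's the kind of quick standalone check I have in mind (an illustrative sketch, not code from this PR; the warm-up and iteration count are arbitrary):

```rust
use std::backtrace::Backtrace;
use std::time::Instant;

fn main() {
    // Warm up once; the first capture may pay one-time symbolization costs.
    let _ = Backtrace::force_capture().to_string();

    let iterations: u32 = 100;
    let mut total_bytes = 0usize;
    let start = Instant::now();
    for _ in 0..iterations {
        // to_string() is included because the claim stores the serialized form.
        total_bytes += Backtrace::force_capture().to_string().len();
    }
    let elapsed = start.elapsed();
    println!(
        "{iterations} captures in {elapsed:?} ({:?} each, ~{} bytes serialized on average)",
        elapsed / iterations,
        total_bytes / iterations as usize
    );
}
```

The numbers depend heavily on how much symbol information is available, so treat the result as a ballpark, not a benchmark.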
> Is this output meaningful for us, given that we're in an async calling context? (I assume "yes" because the task is running, just wanna make sure these don't all point to the same spot in the tokio runtime)
Yup. Here's an example:
claim 1248 held since 2025-08-01 21:17:03.232936025 UTC (2s 7ms ago)
acquired by:
0: {async_fn#0}
at ./nexus/db-queries/src/db/pool.rs:197:21
1: {async_fn#0}
at ./nexus/db-queries/src/db/datastore/mod.rs:379:27
2: {async_fn#0}
at ./nexus/db-queries/src/db/datastore/mod.rs:368:45
3: {async_fn#0}
at ./nexus/db-queries/src/db/datastore/deployment.rs:959:59
4: {async_fn#0}
at ./nexus/db-queries/src/db/datastore/deployment.rs:2294:70
5: {async_block#0}
at ./nexus/src/app/background/tasks/blueprint_load.rs:60:18
6: poll<alloc::boxed::Box<(dyn core::future::future::Future<Output=serde_json::value::Value> + core::marker::Send), alloc::alloc::Global>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/core/src/future/future.rs:124:9
7: {async_fn#0}
at ./nexus/src/app/background/driver.rs:339:54
8: {async_fn#0}
at ./nexus/src/app/background/driver.rs:292:62
9: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/core.rs:365:17
10: with_mut<tokio::runtime::task::core::Stage<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/loom/std/unsafe_cell.rs:16:9
11: poll<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/core.rs:354:13
12: tokio::runtime::task::harness::poll_future::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:535:19
13: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
14: do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:589:40
15: __rust_try
16: try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:552:19
17: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<omicron_nexus::app::background::driver::{impl#2}::run::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panic.rs:359:14
18: tokio::runtime::task::harness::poll_future
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:523:18
19: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:210:27
20: tokio::runtime::task::harness::Harness<T,S>::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:155:15
21: tokio::runtime::task::raw::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/raw.rs:325:5
22: tokio::runtime::task::raw::RawTask::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/raw.rs:255:18
23: tokio::runtime::task::LocalNotified<S>::run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/mod.rs:509:9
24: {closure#0}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:600:13
25: with_budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/task/coop/mod.rs:167:5
26: budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/task/coop/mod.rs:133:5
27: run_task
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:591:9
28: tokio::runtime::scheduler::multi_thread::worker::Context::run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:539:24
29: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:504:21
30: tokio::runtime::context::scoped::Scoped<T>::set
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/context/scoped.rs:40:9
31: tokio::runtime::context::set_scheduler::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/context.rs:176:26
32: try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}>, ()>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/thread/local.rs:315:12
33: std::thread::local::LocalKey<T>::with
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/thread/local.rs:279:15
34: tokio::runtime::context::set_scheduler
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/context.rs:176:9
35: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:499:9
36: tokio::runtime::context::runtime::enter_runtime
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/context/runtime.rs:65:16
37: tokio::runtime::scheduler::multi_thread::worker::run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:491:5
38: tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/scheduler/multi_thread/worker.rs:457:45
39: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/blocking/task.rs:42:21
40: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/core.rs:365:17
41: with_mut<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule>>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/loom/std/unsafe_cell.rs:16:9
42: poll<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule>
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/core.rs:354:13
43: tokio::runtime::task::harness::poll_future::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:535:19
44: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
45: do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<()>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:589:40
46: __rust_try
47: try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule>>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:552:19
48: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<()>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panic.rs:359:14
49: tokio::runtime::task::harness::poll_future
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:523:18
50: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:210:27
51: tokio::runtime::task::harness::Harness<T,S>::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/harness.rs:155:15
52: tokio::runtime::task::raw::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/raw.rs:325:5
53: tokio::runtime::task::raw::RawTask::poll
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/raw.rs:255:18
54: tokio::runtime::task::UnownedTask<S>::run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/task/mod.rs:546:9
55: tokio::runtime::blocking::pool::Task::run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/blocking/pool.rs:161:9
56: run
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/blocking/pool.rs:516:17
57: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
at /home/dap/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.47.0/src/runtime/blocking/pool.rs:474:13
58: __rust_begin_short_backtrace<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:152:18
59: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/thread/mod.rs:559:17
60: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
61: do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:589:40
62: __rust_try
63: try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panicking.rs:552:19
64: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/panic.rs:359:14
65: {closure#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/std/src/thread/mod.rs:557:30
66: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /home/dap/.rustup/toolchains/1.88.0-x86_64-unknown-illumos/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
67: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1966:9
68: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/alloc/src/boxed.rs:1966:9
69: std::sys::pal::unix::thread::Thread::new::thread_start
at /rustc/6b00bc3880198600130e1cf62b8f8a93494488cc/library/std/src/sys/pal/unix/thread.rs:97:17
70: _thrp_setup
71: <unknown>
> What's the size of these backtraces? Presumably we're going to have an extra one for each live DB claim now. Like latency - maybe this is acceptable! just want to know what we're dealing with
We're saving the serialized form, so to give an example, the above one is about 16 KiB. Let's call it 32 KiB to be conservative. Each connection can have at most one of these at a time. At 16 possible connections per backend, and 5 CockroachDB backends (not sure if I'm remembering those right), that's 2.5 MiB per Nexus process. Seems fine.
            false
        } else {
            q.claims_held
                .insert_unique(HeldDbClaimInfo { id, held_since, debug })
this is probably fine, but to say it explicitly - this is a capture of where the claim started, not where the claim may be "stuck in-usage", right?
Correct. omdb reports this as acquired by <stack trace> (i.e., "acquired by this code path").
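To make the bookkeeping concrete, here's a rough reconstruction of the pattern under discussion (my sketch with simplified, guessed types rather than the PR's actual code; it assumes the tokio and chrono crates, and returns an `Option` where the real code returns a `Result`): each claim records its id, acquisition time, and acquiring backtrace in a shared watch channel, and a small "releaser" removes the entry when the claim is dropped, per the `releaser` comment quoted at the top.

```rust
use chrono::{DateTime, Utc};
use std::backtrace::Backtrace;
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::watch;

struct HeldDbClaimInfo {
    id: u64,
    held_since: DateTime<Utc>,
    debug: String, // serialized backtrace of the acquiring code path
}

#[derive(Default)]
struct Quiesce {
    new_claims_allowed: bool, // the PR uses a ClaimsAllowed enum
    claims_held: Vec<HeldDbClaimInfo>, // the PR uses an id-keyed map (insert_unique)
}

/// Carried alongside the connection object; dropping it removes the
/// bookkeeping entry (the "releaser" trick from the quoted comment).
struct ClaimReleaser {
    id: u64,
    quiesce: Arc<watch::Sender<Quiesce>>,
}

impl Drop for ClaimReleaser {
    fn drop(&mut self) {
        let id = self.id;
        self.quiesce.send_modify(|q| q.claims_held.retain(|c| c.id != id));
    }
}

/// Record a new claim if claims are still allowed; the backtrace captured
/// here is "where the claim started", as discussed above.
fn record_claim(
    next_id: &AtomicU64,
    quiesce: &Arc<watch::Sender<Quiesce>>,
) -> Option<ClaimReleaser> {
    let id = next_id.fetch_add(1, Ordering::SeqCst);
    let held_since = Utc::now();
    let debug = Backtrace::force_capture().to_string();
    let mut allowed = false;
    quiesce.send_modify(|q| {
        if q.new_claims_allowed {
            q.claims_held.push(HeldDbClaimInfo { id, held_since, debug });
            allowed = true;
        }
    });
    allowed.then(|| ClaimReleaser { id, quiesce: Arc::clone(quiesce) })
}
```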
As a higher-level comment on the semantics we want:
* For DB connections, this PR 100% makes sense to me
* For sagas, it seems like we're saying "Quiesce is complete for this Nexus when no sagas are actively running"

For sagas, is that sufficient? E.g., if we have a saga that partially completed, but "this nexus isn't trying to run it" (or it was, but it crashed and rebooted), couldn't we have a system where all Nexuses identify "I'm not running sagas NOW", but there's still on-disk state identifying a partially executed saga?
    async fn quiesce_start(
        rqctx: RequestContext<Self::Context>,
    ) -> Result<HttpResponseUpdatedNoContent, HttpError> {
Should we be concerned about the case where:
- Nexus is quiescing (old Nexuses are running, as are new Nexuses)
- The old nexuses all quiesce, but one of them, for whatever reason, thinks that quiescing isn't done. Maybe it can't hear back from other old nexuses for some reason.
- The new nexuses boot, perform schema changes, resume their lives
- The old nexus retries: it reads internal DNS, sees records for all the new nexuses, and tells them to quiesce
- Now the system has an outage
Should we have an argument here, that's "only start quiescing if you are ________", where "_______" might be "running an old version", or something like that? Just something to make it conditional?
This API is really for demo, dev, and testing. My expectation is not that it will ever be invoked in a deployed system. This means the situation you're describing can't happen because the old Nexus would never tell the new Nexus instances to quiesce.
I expect it to work something like this:
- Nexus decides it's time to quiesce (probably in the Reconfigurator executor) and writes quiesce_started to the database.
- Nexus invokes Nexus::quiesce_start() (maybe from the same code path).
- Nexus polls the other Nexus instances to see when they've finished quiescing.
- When they have all finished, Nexus writes quiesce_done.
I'm not sure that's exactly right but my main point is that Nexus is deciding for itself when to quiesce itself, and that code will be calling this code. It won't be some external thing telling it to do this.
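A rough sketch of that driving loop, with placeholder names throughout (none of these functions are real omicron APIs, and the polling interval is arbitrary):

```rust
use std::time::Duration;

#[derive(Clone, Copy, PartialEq, Eq)]
enum PeerQuiesceStatus {
    StillQuiescing,
    FullyQuiesced,
}

async fn drive_quiesce_handoff() -> Result<(), String> {
    record_quiesce_started().await?; // write "quiesce_started" to the database
    start_local_quiesce(); // i.e., what Nexus::quiesce_start() kicks off locally
    loop {
        // Poll the other Nexus instances until every one reports it is done.
        let peers = peer_quiesce_statuses().await?;
        if peers.iter().all(|s| *s == PeerQuiesceStatus::FullyQuiesced) {
            break;
        }
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
    record_quiesce_done().await // write "quiesce_done" to the database
}

// Placeholder stubs so the sketch stands alone; the real work would go
// through the datastore and the Nexus-internal API.
async fn record_quiesce_started() -> Result<(), String> { Ok(()) }
fn start_local_quiesce() {}
async fn peer_quiesce_statuses() -> Result<Vec<PeerQuiesceStatus>, String> {
    Ok(vec![PeerQuiesceStatus::FullyQuiesced])
}
async fn record_quiesce_done() -> Result<(), String> { Ok(()) }
```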
> my main point is that Nexus is deciding for itself when to quiesce itself
ack, makes sense. This is the key piece - we should have enough context when observing our own "expected version" vs the db_metadata table to make this call in the immediate future.
    // appears before messages from code paths that saw this change.
    info!(&self.log, "starting db pool quiesce");
    self.quiesce.send_modify(|q| {
        q.new_claims_allowed = ClaimsAllowed::Disallowed;
Do we care if we were already in the Disallowed state?
I don't think it matters. With this sort of API you can either make it idempotent (like this: do nothing if it's already done) or have it fail and communicate this case and let callers implement the idempotence on top if they want. I don't see a reason to do the latter here. (In practice, there is only one caller, and it can only ever call it once in the lifetime of the process. If it did call it more than once, this would work fine.)
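For illustration, the idempotent shape is just "re-set the same state" under send_modify; a trimmed-down sketch (not the actual pool code):

```rust
use tokio::sync::watch;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ClaimsAllowed {
    Allowed,
    Disallowed,
}

fn start_quiesce(quiesce: &watch::Sender<ClaimsAllowed>) {
    quiesce.send_modify(|allowed| {
        // Re-setting the same value: calling this twice is harmless, so
        // callers don't need to track whether it already ran.
        *allowed = ClaimsAllowed::Disallowed;
    });
}
```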
davepacheco left a comment
tl;dr: I think there are several ways this work is incomplete, ranging from "important and solvable problems" to "not so important and hard to solve problems". I still think this PR is a step forward so I think we should proceed with landing it while also fixing and filing separate issues for the other problems.
Unrelated to the discussions here, I realized that I think this PR does not account for sagas that go through recovery (i.e., those that were not created during the lifetime of this process, but rather were resumed from the database). Those won't show up in sagas_running and so Nexus could think it's quiesced even though there are (recovered) sagas still running. It would even rip the database out from under them, preventing them from finishing. This definitely needs to be fixed. I'm hopeful it will be straightforward (see below) so I'm inclined to do that in a follow-on PR (shortly) because this one's already big, most of the contents here shouldn't change, and nothing depends on quiesce yet so there's no new risk introduced by landing this one. Let me know if you disagree!
My plan here:
- pass the saga watch::Sender<Quiesce> channel to the saga recovery background task
- have that background task keep sagas_running up to date for recovered sagas the same way we do it for newly-created sagas (i.e., insert them into sagas_running when we resume them, then remove them when they finish)
- add a boolean flag to that structure that reflects "saga recovery has completed at least once", then don't consider sagas quiesced until that's true and the existing conditions are true (sagas disallowed and no sagas running); a minimal sketch of that check follows below
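Here's that combined check sketched out (my illustration with assumed field names, and assuming the uuid crate for saga ids; this is not the follow-on PR's code):

```rust
use std::collections::BTreeSet;
use uuid::Uuid;

#[derive(Default)]
struct SagaQuiesceState {
    new_sagas_allowed: bool,
    // Running sagas, including ones inserted by the saga recovery task.
    sagas_running: BTreeSet<Uuid>,
    // Set once the first saga recovery pass has completed.
    first_recovery_pass_done: bool,
}

impl SagaQuiesceState {
    /// Sagas count as quiesced only when new sagas are disallowed, nothing is
    /// currently running, and recovery has had a chance to find old sagas.
    fn sagas_quiesced(&self) -> bool {
        !self.new_sagas_allowed
            && self.sagas_running.is_empty()
            && self.first_recovery_pass_done
    }
}
```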
That brings me to @smklein's question:
> As a higher-level comment on the semantics we want:
> * For DB connections, this PR 100% makes sense to me
> * For sagas, it seems like we're saying "Quiesce is complete for this Nexus when no sagas are actively running"
>
> For sagas, is that sufficient? E.g., if we have a saga that partially completed, but "this nexus isn't trying to run it" (or it was, but it crashed and rebooted) couldn't we have a system where all Nexuses identify "I'm not running sagas NOW", but there's still on-disk state identifying a partially executed saga?
Under normal operation, it shouldn't be possible for there to be unfinished sagas that a Nexus is supposed to be running that it doesn't know about. Aside from bugs, there are two cases where this could happen and they both should be transient. The first case is saga recovery after a Nexus restart. Immediately on startup, Nexus will think there are no sagas running, but that's because it hasn't finished recovering them yet. This is the case that I mentioned above, and I think we have a plausible plan for fixing it.
The second case is a little trickier: it's when Nexus re-assigns sagas to itself that were owned by another Nexus that has recently been expunged. Concretely: suppose we have a sled with Nexus N1 running a saga S1. The sled bursts into flames and is expunged. During the next blueprint execution, responding to the expungement, Nexus N2 assigns saga S1 to itself. As part of a concurrent upgrade, N2 decides to quiesce itself. It hasn't actually recovered S1 so it doesn't know about it yet. It may think it's quiesced, then try to recover it. That wouldn't be great! This specific case could probably be eliminated, either by having the quiesce state reflect that there may be unrecovered sagas as a result of this situation or (probably better) disallowing us from assigning ourselves sagas once we've started quiescing. I may try to incorporate that into the follow-on PR.
This case caused me to see a deeper problem with this mechanism. This is extremely unlikely, but suppose:
- we're doing an upgrade and decide to start quiescing
- there are sagas running in some Nexus N1, so it's waiting for those to finish
- there are no sagas running in any other Nexus so they've disabled their database access
- the sled hosting N1 fails and needs to be expunged
With my suggested fixes above, there'd be no way to expunge the busted sled because the only remaining Nexus instances have shut off their database access. The upgrade would be stuck, since quiescing cannot complete without affirmative confirmation from all in-service Nexus instances. In this situation, support would need to either pause the upgrade before the handoff, get Nexus back up, do the expungement, and then proceed again; or else force the handoff (e.g., writing quiesce_completed) and allow the saga to be abandoned. This sounds bad in that it seems hard to fix, but it's also extremely unlikely (both sled and disk failure seem quite rare in practice and this would be a pretty small window), and it is recoverable, so I'd be comfortable filing an "important non-blocker" issue on this.
This raises a related problem, which is that if one Nexus takes a lot longer to drain its sagas than the others, then the other ones are going to be pitching 503s for a long time. This suggests we might be better off having an extra step in the middle of quiesce which is "wait for other Nexus instances to report having drained their sagas before disabling database access". That way, all Nexus instances would keep doing most things (just not creating new sagas) until all sagas are done. Then they'd all shut off database access about the same time. This may well be worthwhile but I'd say that's well outside the scope of our initial upgrade MVP.
All of that said, my inclination would be:
- land this PR as-is
- do a follow-on PR (a blocker for actually using this work) that deals with recovered sagas and prevents sagas from being re-assigned while we're quiescing
- file issues for:
- dealing with needing to expunge a Nexus during quiesce
- having Nexus instances synchronize before moving through the quiesce stages (to minimize the window of downtime and the exposure to the above problem)
smklein left a comment
My main holdup with this PR was "dealing with sagas in the quiesce case" - if we're planning on punting that to later, fine with me, let's just make sure we get an issue filed before merging.
See below.
This is now up for review: #8794.
I had said I wanted to land this PR as-is, but I realized what I really meant was that I wanted to get it to the point where it was ready to land so that the follow-up could be reviewed independently of this one. Now that #8794 is up for review, my plan is to get that one ready to land (address any review feedback), then land both of these together. Although there shouldn't be much risk here, given the timing, I figure I'll wait for the R17 gates to open.
Fixes #8504.
This implementation adds:
- [...] quiesce process and an omdb command to call it
- [...] quiesce status and show what's keeping it from finishing (for debugging), plus an omdb command to call it

The implementation goes through a sequence of stages: [...]

Much of the diff here relates only to tests. If it's helpful, I could break this PR into smaller pieces, but I think that might actually be more confusing because it's harder to see how the pieces fit together.