Replace redis with file based mailbox storage #915
spacebear21 merged 5 commits into payjoin:master
Conversation
I spoke with Dan about this and whether we should remove redis entirely or keep it feature gated. We both lean towards removing it entirely. My reasoning is that the main benefit of an external database server is the ability to have multiple stateless directory instances. However, it doesn't really make sense to scale this horizontally, since the directory's (honest) request throughput is implicitly bounded in relation to bitcoin's throughput, on the order of tens of requests per second, so there is no scenario where even the most modest hardware can't handle this with one directory instance, or where the database wouldn't itself be the bottleneck. Anyway, I will be rebasing this PR on top of my changes to utilize the config crate in the directory, consistent with the way payjoin-cli uses it, and implementing the plain files-on-disk approach to persistence since that seems to be the simplest. I will keep the feature gating commit in the history because it might be useful to revert the removal at a later point if we do find a good reason to support redis.
Pull Request Test Coverage Report for Build 17861723379

💛 - Coveralls
Force-pushed b3219d8 to 88d5b8b
I created 2 tests to cover the mutants for these. This function can be simplified with the helper function above.

rust-payjoin/payjoin/src/core/receive/v2/mod.rs, lines 1288 to 1316 in 87b0b69
Perfect timing, I was just about to do this myself. Good thing I checked my phone first... thanks so much!
Force-pushed 88d5b8b to e67acc0
Force-pushed 8f70824 to 9a2dbc4
Force-pushed 9a2dbc4 to b90dd72
spacebear21 left a comment
code ACK, thank you for putting so much thought into this, this PR was a pleasure to read.
I'm not sure what is a good way to test this. How would you suggest validating this change before we deploy it to the production server?
```rust
///
/// Defaults to around 2e6, for a generous upper bound rounded up from ~2
/// mailboxes/tx, ~4K txs/block, and ~144 blocks/24h.
const DEFAULT_CAPACITY: usize = 1 << (1 + 12 + 8);
```
It's unclear to me from the docstring what the special meaning of 1 + 12 + 8 is?

It's just a power of two that's a loose upper bound on the estimate in the comment: 2^1 mailboxes/tx, 2^12 txs/block, and 2^8 blocks/24h (each rounded up from the docstring's figures), giving 2^(1+12+8) = 2^21 ≈ 2e6.
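For concreteness, a quick check of that arithmetic (the constant is copied from the snippet above; the "estimate" is just the docstring's un-rounded figures):

```rust
fn main() {
    // 2^1 mailboxes/tx * 2^12 txs/block * 2^8 blocks/24h = 2^21
    const DEFAULT_CAPACITY: usize = 1 << (1 + 12 + 8);
    assert_eq!(DEFAULT_CAPACITY, 2_097_152); // "around 2e6"

    // the un-rounded estimate it loosely bounds:
    // ~2 mailboxes/tx, ~4K txs/block, ~144 blocks/24h
    let estimate = 2 * 4_000 * 144; // 1,152,000
    assert!(DEFAULT_CAPACITY >= estimate);
}
```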
```rust
// XOR data with a random pattern to obfuscate v1 requests
// and writing malicious data such as virus fingerprints
let xor: Vec<u8>;
let xor_file = dir.join("xor.dat");
if fs::try_exists(&xor_file).await? {
    xor = fs::read(xor_file).await?;
} else {
    xor = OsRng.next_u64().to_ne_bytes().to_vec();
    let mut file = fs::File::create_new(xor_file).await?;
    file.write_all(&xor).await?;
    file.sync_all().await?;
}
```
This seemingly contradicts the commit message, which states that "v1 requests and responses are not [saved to disk]". Is XORing still useful if we are only writing encrypted v2 requests to disk?
Good catch, the comment predates that change, from before I realized there's no point in storing the sender's original PSBT either: if the sender is gone there's no point in replying. I'll update the comment.

Honestly produced v2 requests are encrypted, but the directory has no way of verifying that, so a malicious client might get a directory to store something illegal on its drive, or even just something that triggers an antivirus program and causes disruption to the service. XOR obfuscation makes sense for v2 as well.
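To illustrate the idea (a minimal sketch, not the directory's actual code; the function name is made up): XORing against a repeating random pattern is an involution, so one operation both obfuscates on write and restores on read, and the bytes that hit the disk never literally match a hostile payload.

```rust
/// XOR `data` against a repeating pattern (standing in for the random
/// bytes persisted in xor.dat). Applying it twice restores the original.
fn xor_with_pattern(data: &[u8], pattern: &[u8]) -> Vec<u8> {
    data.iter().zip(pattern.iter().cycle()).map(|(b, p)| b ^ p).collect()
}

fn main() {
    // stand-in for the 8 random bytes in xor.dat
    let pattern = [0xA5u8, 0x3C, 0x7E, 0x91, 0x0F, 0x62, 0xD8, 0x4B];
    let payload = b"some opaque v2 ciphertext".to_vec();

    let obfuscated = xor_with_pattern(&payload, &pattern);
    assert_ne!(obfuscated, payload); // what's stored on disk differs
    assert_eq!(xor_with_pattern(&obfuscated, &pattern), payload); // round-trips
}
```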
```rust
// FIXME why does this fail?
// db.prune().await.expect("pruning should not fail");
// assert_eq!(db.mailboxes.lock().await.len(), 1);
// mailbox seems to get pruned prematurely
// likely cause is it's in both read and insert queue, and two pruning runs
// are needed to fully clear it?

// mark the mailbox as read
_ = db.wait_for_v2_payload(&id).await.expect("waiting for payload should succeed");

assert_eq!(db.mailboxes.lock().await.len(), 1);
// FIXME db.prune().await.expect("pruning should not fail");
```
I noticed a few FIXME comments left over here and in other places; do you plan on addressing these here or deferring them as follow-ups?
I'm undecided; there are some hassles with these tests because this uses SystemTime, not tokio::time::Instant (so mocking time is trickier), causing these commented-out assertions to be a bit flaky.

There's no guarantee that filesystem timestamps are monotonically increasing, so if the system clock is readjusted this can cause some issues as well, such as nominal creation times in the future, and time arithmetic can fail. I went back and forth between the two time representations and settled on the current implementation, but I don't like the fact that it can't be reliably tested, nor am I 100% convinced that the pruning behavior is entirely correct. The consequences are not very serious if there is a bug, so I'm happy handling this in a follow-up.
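A small std-only illustration of the failure mode described above: SystemTime arithmetic is fallible, so a nominal creation time in the future makes elapsed() return an error rather than a duration, and pruning code has to handle that case explicitly.

```rust
use std::time::{Duration, SystemTime};

fn main() {
    // Simulate a file timestamp written before the system clock was
    // adjusted backwards, i.e. a nominal creation time in the future.
    let created = SystemTime::now() + Duration::from_secs(3600);

    // elapsed() errs whenever `created` is later than the current time,
    // so age-based pruning can't just unwrap the duration.
    assert!(created.elapsed().is_err());
}
```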
One thing I considered but ultimately decided against, without much conviction, is defining a new Error type that's basically enum { Io(std::io::Error), Time(std::time::SystemTimeError) } to account for the dependence on the clock being correct, but this seems like it leaks an implementation detail, and there isn't much that could be done to handle that error.
This seems appropriate to follow up with in or after #1047?
No, that PR only affects the payjoin crate and this only concerns the directory. But also, the SystemTime dependence is inherent, since that's the timestamp type that files on disk have.
Thanks for the kind words! Makes me happy to read that.

The tests all pass with this (after #981; before that there were a few tests that depended on overwriting behavior). I also tested manually quite a bit. I would be interested to explore fuzzing options; that's something I really don't have any experience with but have some understanding of how it works, and it seems well suited for this. Another approach would be property testing.
I created a local branch that simply unlocks the integration tests on our FFI layer. It looks promising, as we no longer get any of the runtime errors! 🥳

Dart

Python

Note: there are warnings though in the macOS Python test
arminsabouri left a comment
utACK/codeACK
Core logic in files.rs checks out with me. I had a couple questions and nits.
Have you thought about deployment strategy? How do / should we migrate existing redis sessions?
```rust
db::Error::OverCapacity => todo!(), // TODO temporarily unavailable?
db::Error::V1SenderUnavailable => todo!(), // TODO gone?
```
These may get resolved in a future commit but wanted to call them out in case they don't.
Either InternalServerError or 503 could make sense here.
These need to be addressed in this PR, so on point.
Went with a 503 Service Unavailable error for OverCapacity, and in some discussion with yuval decided that 410 Gone was appropriate for V1SenderUnavailable, as the intention is to say that the sender is gone and not coming back, so the user should try again with a new request.

I'm not 100% sure 410 Gone is appropriate if the same resource would become valid in the near future. What we want is some 4xx code that indicates that the client should not retry that particular request, but GET again and then retry a different request with a new payjoin proposal.

410 seems appropriate: even though its initial intention was SEO, in our case saying that the sender is gone and not coming back, so you should try a different request, seems to be covered here.

Ultimately it hinges on whether we think it appropriate to expect the sender to come back in a v1 payjoin?
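As a sketch of the mapping that was settled on (the variant names mirror the snippet under review; bare numeric codes stand in for whichever HTTP library's status type is actually used):

```rust
// Hypothetical stand-in for the directory's db::Error variants.
enum DbError {
    OverCapacity,
    V1SenderUnavailable,
}

/// Map storage-layer errors to HTTP status codes.
fn status_code(err: &DbError) -> u16 {
    match err {
        // 503 Service Unavailable: capacity may free up, so retry later.
        DbError::OverCapacity => 503,
        // 410 Gone: the v1 sender is gone and not coming back;
        // the client should start over with a new request.
        DbError::V1SenderUnavailable => 410,
    }
}

fn main() {
    assert_eq!(status_code(&DbError::OverCapacity), 503);
    assert_eq!(status_code(&DbError::V1SenderUnavailable), 410);
}
```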
It's appropriate to expect that. The sender can just retry, and the receiver can as well (but should not just retry posting the same PUT request; it should first GET the new request by the sender to ensure it's not stale).

We can just specify (in BIP 77) that in these circumstances the directory should respond with 410, and that it's OK to retry in the specific manner (if it's a fresh request resulting from another GET). So essentially, choosing the most appropriate status code to convey the specific failure condition is an aesthetics/bikeshedding thing in practice (but helps convey the intent to implementors).

Alternative codes I've considered: 404, 424 (obscure); 502 or 504 could also be argued to make sense in this context.
> but should not just retry posting the same PUT request, but first GET the new request by the sender to ensure it's not stale

I think this is why 410 is appropriate, as this implies the request used to get to this point should be discarded and a new GET request is needed.
```rust
// Check that there isn't there's already a v1 waiter for this ID, can't
// accept write needs to be rejected.
if self.pending_v1.contains_key(id) {
    return Err(Error::OverCapacity);
```
Is this the correct error to return?
I think so, but the comment should explain why. Basically, it's correct that there's no capacity; there may be capacity in the future. This should only happen in case of a collision, in which case it makes sense to wait and try again later. If a v1 sender's message gets to the wrong receiver, it should be rejected because the payment address is unrecognized; however, for that to occur the collision would need to happen within a 30 second window or so. Similarly, if an attacker managed to guess the short ID of a v1 session, then it's better not to directly leak that it succeeded in doing that (an over capacity error is still a statistical leak if random v2 writes do work).
```rust
debug_assert!(self.unread_ttl_at_capacity < self.unread_ttl_below_capacity);
debug_assert!(self.pending_v1.iter().all(|(_, v)| !v.sender.is_closed()));

// Prune in flight requests, these can persist in the case of a
```
Nit: incomplete comment.
I just added.
"Prune in flight requests, these can persist in the case of an incomplete session"
Not sure if this actually is appropriate here..?
Once the last v2 getter for a mailbox times out and there are no more in-flight requests, the shared future is still in the table.
```rust
pending
    .sender
    .send(Arc::new(payload))
    .expect("sending on oneshot channel must succeed");
```
Just want to double check my understanding: if the receiver end times out, it will clean up the waitmap entry, correct?
Not on its own; that's what the .retain() line in prune does. After the last receiver times out, the shared future will have a refcount of 1, since its only copy is in the waitmap, and those entries will be cleared on the next pass.
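A minimal sketch of that retain-based pruning (names are hypothetical, and the real table holds shared futures rather than plain values): an entry whose Arc is only held by the map itself has a strong count of 1, so the next prune pass drops it.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Drop entries that no in-flight request references anymore:
/// a strong count of 1 means the map holds the only copy.
fn prune_abandoned<K: std::hash::Hash + Eq, V>(map: &mut HashMap<K, Arc<V>>) {
    map.retain(|_, shared| Arc::strong_count(shared) > 1);
}

fn main() {
    let mut map: HashMap<&str, Arc<String>> = HashMap::new();
    map.insert("live", Arc::new("payload".to_string()));
    map.insert("abandoned", Arc::new("payload".to_string()));

    // A waiting reader still holds a clone of the "live" entry.
    let _reader_handle = Arc::clone(map.get("live").unwrap());

    prune_abandoned(&mut map);
    assert!(map.contains_key("live")); // still referenced, kept
    assert!(!map.contains_key("abandoned")); // refcount was 1, pruned
}
```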
Force-pushed b90dd72 to 7e9b405
Force-pushed 55be8df to 033ffa3
Force-pushed 3a1c934 to 56d2d47
This is a refactoring of the existing redis DbPool, simplifying it a bit and renaming the various methods for clarity. The trait is async, with its futures also satisfying `Send` in order to allow its use in the `hyper::Service` trait. DbPool was renamed to db::redis::Db (reexported as RedisDB) in anticipation of additional impls of the trait. Co-authored-by: benalleng <benalleng@gmail.com>
v2 payloads are saved to disk. v1 requests and responses are not, because they only make sense to keep around so long as the sender's connection is still active, and because they are not encrypted. In-flight requests (long polls) are tracked in waitmaps for notifying waiting readers of new payloads or v1 responses.

Mailbox contents are retained for up to a week by default, or up to 24 hours at capacity. Read mailboxes have a grace period of 10 minutes before they are expired.

For simplicity no concurrent hashmap is used, and the mutex is held during writes to disk. This is because concurrency is implicitly bounded by bitcoin's transaction throughput limits; for the retention times described above, a very generous upper bound is 25 writes per second. Because only mutual exclusion for the entire structure is used, pruning is done on an as-needed basis by the locking thread and not in a background task.

Co-authored-by: benalleng <benalleng@gmail.com>
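A back-of-envelope check of that 25 writes/second figure, using the same estimates as the DEFAULT_CAPACITY docstring (~2 mailboxes/tx, ~4K txs/block, ~144 blocks per day):

```rust
fn main() {
    let writes_per_day = 2u64 * 4_000 * 144; // 1,152,000 mailbox writes per 24h
    let per_second = writes_per_day as f64 / 86_400.0;
    // roughly 13 honest writes/s, comfortably under the generous 25/s bound
    assert!(per_second < 25.0);
}
```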
Force-pushed 56d2d47 to 0b6d029
arminsabouri left a comment
re-utACK 0b6d029
Reviewed the pushes that occurred from my last ACK to now.
```diff
-    help = "The redis host to connect to"
+    long = "storage-dir",
+    env = "PJ_STORAGE_DIR",
+    help = "A directory for writing mailbox data."
```
Nit: this is the sqlite database, correct? If so, it's not only mailbox data.
No, just files on disk.
This PR first makes redis optional, and then removes it entirely, due to various issues with testcontainers and with the complexity of deploying a directory. If we ever want to restore redis support, the feature gating commits should make this a bit easier.

This means the directory can no longer be "horizontally scaled": it now has stateful in-memory structures for coordination that was previously done using redis's pubsub. This is not an issue because of the implicit bounds on throughput (no more than 25 writes/second even with very generous bounds).
Depends on #914
closes #419