Conversation

@conectado (Owner) commented Jul 9, 2025

refs #17

@conectado conectado force-pushed the concurrency-patterns-article branch from 669f5fa to 59c1785 on July 9, 2025 23:35
@cloudflare-workers-and-pages (bot) commented Jul 9, 2025

Deploying taping-memory with Cloudflare Pages

Latest commit: 7a436f7
Status: ✅  Deploy successful!
Preview URL: https://10698634.taping-memory.pages.dev
Branch Preview URL: https://concurrency-patterns-article.taping-memory.pages.dev


@conectado conectado mentioned this pull request Jul 9, 2025
@conectado conectado force-pushed the concurrency-patterns-article branch from e9ff83e to 66b2b67 on July 16, 2025 01:43
@thomaseizinger (Contributor) commented Jul 25, 2025

Relevant: firezone/firezone#10003

I think the learning here is that one needs to be mindful of which components are part of these loops.

@conectado (Owner, Author) commented

> Relevant: firezone/firezone#10003
>
> I think the learning here is that one needs to be mindful of which components are part of these loops.

@thomaseizinger Yeah! That's a great example, and it's also the way to do it: first implement it as a single task, then split it into multiple tasks when benchmarks suggest it's a good idea. I'll try to make it clearer in the conclusion that having multiple tasks can be very beneficial.

But in this particular case, there's something a bit weird, and I might be missing some context: most of the CPU time consumed by the phoenix channel is due to tracing-related stuff?

@thomaseizinger (Contributor) commented Jul 25, 2025

> But in this particular case, there's something a bit weird, and I might be missing some context: most of the CPU time consumed by the phoenix channel is due to tracing-related stuff?

Yeah. I am not sure if it is a bug in tracing right now (I've found this: tokio-rs/tracing#3345), but it seems that even just having trace! macros in the hot path can be quite expensive, even if they are not active.
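
To make the shape of the problem concrete, here is a minimal sketch (hypothetical `handle_packet`, not Firezone code) of what I mean by a trace! in the hot path:

```rust
// Hypothetical hot path: the event fires for every packet, so even when TRACE
// is disabled, the per-call "is this event enabled?" check still runs here.
use tracing::trace;

fn handle_packet(buf: &[u8]) {
    trace!(len = buf.len(), "handling packet");

    // ... decrypt / route the packet ...
}

fn main() {
    // No subscriber is installed, so the event never fires, but the
    // enabled-check above still executes a million times.
    for _ in 0..1_000_000 {
        handle_packet(&[0u8; 16]);
    }

    // Compiling TRACE out entirely (e.g. via the tracing crate's
    // `release_max_level_*` cargo features) removes the cost altogether.
}
```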

@thomaseizinger (Contributor) commented

Just gave this a read, very detailed explanation of the various trade-offs! I liked the section about error handling in particular.

I think sans-IO works especially well for completely UDP-based stuff because timeouts are your only way to react to any IO failures anyway.

Integrating that with TCP is a bit more difficult. For one, it is stream-based, so if performance matters a lot, you may not want to send around heap-allocated messages all the time but directly operate on borrowed data. In the UDP case, using buffer pools helps here, but for TCP that is more difficult due to the variation in message sizes.
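
Rough sketch of the buffer-pool point (a made-up `BufferPool`, not from any crate): with UDP every datagram fits in a fixed-size buffer, so buffers can be recycled without allocating; a TCP stream has no such fixed framing.

```rust
use std::collections::VecDeque;

const MAX_DATAGRAM: usize = 1500; // MTU-sized upper bound for a single datagram

struct BufferPool {
    free: VecDeque<Vec<u8>>,
}

impl BufferPool {
    fn new(n: usize) -> Self {
        Self {
            free: (0..n).map(|_| vec![0u8; MAX_DATAGRAM]).collect(),
        }
    }

    /// Hand out a recycled buffer, or allocate a fresh one if the pool is empty.
    fn acquire(&mut self) -> Vec<u8> {
        self.free
            .pop_front()
            .unwrap_or_else(|| vec![0u8; MAX_DATAGRAM])
    }

    /// Return a buffer once the datagram has been processed.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.push_back(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new(4);
    let buf = pool.acquire();
    // ... recv_from() into `buf`, hand &buf[..n] to the sans-IO state ...
    pool.release(buf);
}
```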

Two, the two-way communication in the case of errors is very annoying with channels, as you demonstrated very well.

One thing that I would recommend you mention is the complexity of task wake-ups. As elegant as they are in some ways, hand-rolled futures are very error-prone, and I would not recommend them unless all other options are actually not workable. In Firezone, all IO (apart from phoenix-channel) is now actually using async, but it is all in separate threads, using channels to interact with one big sans-IO state machine, and I am very happy with it. Task wake-up bugs are a very painful reality. (As an extension of that, so are state machine bugs in poll_timeout, actually.)
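
Very roughly, the shape looks like this (names made up, not the actual Firezone code):

```rust
// IO runs on its own threads and talks to one central sans-IO state machine
// over channels; the core itself never touches a socket.
use std::sync::mpsc;
use std::thread;

enum Event {
    PacketReceived(Vec<u8>),
    Shutdown,
}

enum Command {
    SendPacket(Vec<u8>),
}

/// The single-threaded, IO-free core: pure state transitions.
struct StateMachine;

impl StateMachine {
    fn handle(&mut self, event: Event) -> Option<Command> {
        match event {
            Event::PacketReceived(p) => Some(Command::SendPacket(p)), // echo, just for the sketch
            Event::Shutdown => None,
        }
    }
}

fn main() {
    let (event_tx, event_rx) = mpsc::channel::<Event>();
    let (cmd_tx, cmd_rx) = mpsc::channel::<Command>();

    // IO thread: in the real thing this would be an async runtime doing socket
    // reads/writes; here it just produces one event and drains commands.
    let io = thread::spawn(move || {
        event_tx.send(Event::PacketReceived(vec![1, 2, 3])).unwrap();
        event_tx.send(Event::Shutdown).unwrap();

        for cmd in cmd_rx {
            match cmd {
                Command::SendPacket(p) => println!("would write {} bytes to the socket", p.len()),
            }
        }
    });

    // Core thread (here: main): drives the sans-IO state machine.
    let mut sm = StateMachine;
    for event in event_rx {
        match sm.handle(event) {
            Some(cmd) => cmd_tx.send(cmd).unwrap(),
            None => break,
        }
    }
    drop(cmd_tx); // closes the command channel so the IO thread exits

    io.join().unwrap();
}
```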

As a bottom line, I think sticking to async where you can is good, followed by building sans-IO stuff. Also, spawning tasks is always to be done with caution, I think. For one, you often break back-pressure if you e.g. spawn a task for each incoming thing. Two, spawning tasks, similar to channels, breaks the stack trace and thus error reporting. So structured concurrency is better there.
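
A toy illustration of the back-pressure point (tokio and a made-up `handle` assumed): spawning a task per item accepts work without bound, while awaiting the work inline on a bounded channel makes producers slow down when the consumer can't keep up.

```rust
use tokio::sync::mpsc;

async fn handle(item: u32) {
    // ... real work would go here ...
    let _ = item;
}

#[allow(dead_code)]
async fn unbounded(mut rx: mpsc::Receiver<u32>) {
    while let Some(item) = rx.recv().await {
        // No back-pressure: every item becomes a detached task, and any error
        // inside it surfaces (if at all) far away from this loop.
        tokio::spawn(handle(item));
    }
}

async fn with_backpressure(mut rx: mpsc::Receiver<u32>) {
    while let Some(item) = rx.recv().await {
        // Back-pressure: the bounded channel feeding `rx` fills up while we are
        // busy, so senders naturally wait. Errors also stay on this call path
        // instead of vanishing into a spawned task.
        handle(item).await;
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(16); // bounded: senders wait when full
    tokio::spawn(async move {
        for i in 0..32 {
            let _ = tx.send(i).await;
        }
    });
    with_backpressure(rx).await;
}
```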

My 2c, great article on the whole! Happy to do some detailed editing on it if you'd like before you publish it :)

@conectado (Owner, Author) commented

> Just gave this a read, very detailed explanation of the various trade-offs! I liked the section about error handling in particular.

Thanks <3 and thanks for the feedback!

> I think sans-IO works especially well for completely UDP-based stuff because timeouts are your only way to react to any IO failures anyway.
>
> Integrating that with TCP is a bit more difficult. For one, it is stream-based, so if performance matters a lot, you may not want to send around heap-allocated messages all the time but directly operate on borrowed data. In the UDP case, using buffer pools helps here, but for TCP that is more difficult due to the variation in message sizes.

You could still have a "performant" sans-IO approach without heap-allocating messages if you always immediately react to messages instead of buffering them within your sans-IO state?

This isn't trivial, but what I mean is that this challenge would exist regardless of using sans-IO; TCP is quite complicated for performance-critical applications.
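
Something like this is what I have in mind (a made-up `Connection` type, just to show the shape): the sans-IO state borrows the incoming bytes, reacts immediately, and writes its reply into a caller-provided buffer, so nothing has to be heap-allocated or buffered inside the state itself.

```rust
struct Connection {
    packets_seen: u64,
}

impl Connection {
    /// Handle one incoming segment/datagram and produce at most one reply.
    /// Returns the number of bytes written into `out`.
    fn handle_input(&mut self, input: &[u8], out: &mut [u8]) -> usize {
        self.packets_seen += 1;

        // React to the borrowed bytes right away instead of storing them.
        let n = input.len().min(out.len());
        out[..n].copy_from_slice(&input[..n]); // echo, just for the sketch
        n
    }
}

fn main() {
    let mut conn = Connection { packets_seen: 0 };
    let mut out = [0u8; 1500];

    // The IO layer owns the buffers; the state machine only ever borrows them.
    let n = conn.handle_input(b"hello", &mut out);
    println!("would send {} bytes", n);
}
```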

> Two, the two-way communication in the case of errors is very annoying with channels, as you demonstrated very well.

This is a problem with sans-IO?

> One thing that I would recommend you mention is the complexity of task wake-ups. As elegant as they are in some ways, hand-rolled futures are very error-prone, and I would not recommend them unless all other options are actually not workable. In Firezone, all IO (apart from phoenix-channel) is now actually using async, but it is all in separate threads, using channels to interact with one big sans-IO state machine, and I am very happy with it. Task wake-up bugs are a very painful reality. (As an extension of that, so are state machine bugs in poll_timeout, actually.)

Yeah! I definitely need to mention this, and it's the biggest downside; I'll take a look at how you're doing it in Firezone now. I didn't think it was such a big problem that you'd rather use async. For me, the biggest benefit of hand-rolling futures is that you get the error at the call site, so there's no need to keep track of which IO object the error came from.
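
Roughly what I mean (a hand-wavy sketch against tokio's `UdpSocket`, not code from the article):

```rust
use std::io;
use std::task::{Context, Poll};

use tokio::net::UdpSocket;

struct Node {
    relay: UdpSocket,
    peer: UdpSocket,
}

impl Node {
    fn poll_recv(&mut self, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<io::Result<usize>> {
        let mut read_buf = tokio::io::ReadBuf::new(buf);

        // An error returned here can only be about `self.relay`...
        if let Poll::Ready(res) = self.relay.poll_recv(cx, &mut read_buf) {
            res?;
            return Poll::Ready(Ok(read_buf.filled().len()));
        }

        // ...and an error here can only be about `self.peer`. No extra
        // bookkeeping is needed to know which IO object failed.
        if let Poll::Ready(res) = self.peer.poll_recv(cx, &mut read_buf) {
            res?;
            return Poll::Ready(Ok(read_buf.filled().len()));
        }

        Poll::Pending
    }
}
```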

> As a bottom line, I think sticking to async where you can is good, followed by building sans-IO stuff. Also, spawning tasks is always to be done with caution, I think. For one, you often break back-pressure if you e.g. spawn a task for each incoming thing. Two, spawning tasks, similar to channels, breaks the stack trace and thus error reporting. So structured concurrency is better there.

> My 2c, great article on the whole! Happy to do some detailed editing on it if you'd like before you publish it :)

Thanks so much, I'll let you know when it's ready for a detailed review 😁

@thomaseizinger (Contributor) commented

> > Two, the two-way communication in the case of errors is very annoying with channels, as you demonstrated very well.
>
> This is a problem with sans-IO?

For high-performance sans-IO, yes, because at least the way I am doing it now is to have IO in separate threads so that IO can make progress while the state machine is working (i.e. decrypting / encrypting).

@thomaseizinger (Contributor) commented

> I didn't think it was such a big problem that you'd rather use async.

The problem is that it is so hard to get it right and the bugs only happen sporadically. Eliminating this class of bugs where possible instills a lot more confidence.

@conectado conectado marked this pull request as ready for review August 5, 2025 16:26
@thomaseizinger (Contributor) left a comment


Great stuff. I left a few notes!

@conectado conectado force-pushed the concurrency-patterns-article branch from 32e0e9b to 7a436f7 on August 13, 2025 16:07
@conectado conectado merged commit c72f5d8 into main Aug 13, 2025
2 checks passed
@conectado conectado deleted the concurrency-patterns-article branch August 13, 2025 16:51