Description
Problem
At inbrowser.dev (backed by rainbow from ipfs.io gateway, so a general problem in our infra), we see inconsistent page load times across regions, and sometimes across requests within the same region.
A user can get an instant response from one instance, and then on a subsequent page load or request get a stalled page load and a timeout, even though the data exists in the cache of one of the other rainbows in the global cluster. We also see inconsistency across subresources on a single page.
Scope
- Rainbow users running multiple instances should have a means of "logically merging their block caches"
- This should be an opt-in feature that requires manual configuration by the rainbow operator
- (Open question) Do we want to run a bitswap server in rainbow, or an HTTP client, to avoid "the unsustainable manual peering trap"?
- We don't want to invent any new protocols. Use HTTP stack if possible.
Solutions
A: Add HTTP Retrieval Client to Rainbow, leverage Cache-Control: only-if-cached
We know we need an HTTP retrieval client for Kubo to enable HTTP Gateway over Libp2p by default, and to make direct HTTP retrieval from service providers more feasible. We can't do that without a client and end-to-end tests. Prototyping one in Rainbow sounds like a good plan, improving multiple work streams at the same time.
The idea here is to introduce an HTTP client which runs in addition to, or in parallel with, bitswap retrieval.
Keep it simple, don't mix abstractions, do opportunistic block retrieval like bitswap, but over HTTP.
Using `application/vnd.ipld.raw` and the trustless gateway protocol is a good match here: it allows us to benefit from HTTP caching and middleware, making it more flexible than bitswap.
Rainbow could:
- Have a list of other rainbow instances in the form of URLs with trustless gateway endpoints
  - In the case of the ipfs.io gateway, we could produce a list with shuffled same-region instances first, and the rest of the instances after them.
- Make inexpensive block requests with `Cache-Control: only-if-cached`, going over the list in sequence.
  - This does not cost any expensive IO: if rainbow does not have the block locally, it will instantly respond with HTTP 412.
This way, once a block lands in any of our rainbow caches, we will discover it, and requests won't time out after 1m in unlucky scenarios.
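The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Rainbow code: the function names, the `(region, url)` peer tuples, the 2s timeout and the injectable `opener` are all assumptions made for the example, and a real client would also have to verify the returned bytes against the CID, which is omitted here.

```python
import random
import urllib.error
import urllib.request


def order_peers(peers, local_region, rng=random):
    """Shuffled same-region gateway URLs first, then the rest in given order.

    `peers` is a list of (region, base_url) tuples (hypothetical config shape).
    """
    same = [url for region, url in peers if region == local_region]
    rest = [url for region, url in peers if region != local_region]
    rng.shuffle(same)
    return same + rest


def fetch_cached_block(cid, peer_urls, timeout=2.0,
                       opener=urllib.request.urlopen):
    """Ask each peer gateway in sequence for a raw block, cache-only.

    Returns the block bytes from the first peer that has it cached,
    or None if no peer does. Hash verification is NOT done here.
    """
    for base in peer_urls:
        req = urllib.request.Request(
            f"{base.rstrip('/')}/ipfs/{cid}?format=raw",
            headers={
                "Accept": "application/vnd.ipld.raw",
                "Cache-Control": "only-if-cached",
            },
        )
        try:
            with opener(req, timeout=timeout) as resp:
                return resp.read()
        except OSError:
            # Covers HTTPError (e.g. 412: block not in that peer's cache),
            # connection errors and timeouts. Move on to the next peer.
            continue
    return None
```

Because the probe is sequential and stops at the first hit, a single incoming request produces at most one cheap cache-only request per configured peer.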
Open questions:
- Is sequential, inexpensive HTTP check enough to avoid amplification attacks?
- Is it OK to start at the same time as bitswap, or do we want to delay and act as a fallback when we are unable to find a block by regular means (for >10-30s)?
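For the second question, one way to keep both options open is a single delay setting: `0` starts the HTTP probe alongside bitswap, a larger value makes it a fallback. The sketch below is only an illustration of that idea using `asyncio`; `primary` and `fallback` stand in for bitswap retrieval and the HTTP probe, and all names are hypothetical.

```python
import asyncio


async def fetch_with_fallback(primary, fallback, fallback_delay):
    """Run primary() now and fallback() after fallback_delay seconds.

    Both are coroutine functions returning block bytes or None.
    The first non-None result wins; whatever is still running is cancelled.
    """
    async def delayed_fallback():
        await asyncio.sleep(fallback_delay)
        return await fallback()

    pending = {
        asyncio.create_task(primary()),
        asyncio.create_task(delayed_fallback()),
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # Ignore failed attempts and misses, keep waiting for the rest.
                if task.exception() is None and task.result() is not None:
                    return task.result()
        return None
    finally:
        for task in pending:
            task.cancel()
```

With `fallback_delay=0` both paths race from the start; with a value in the 10-30s range mentioned above, the HTTP probe only runs when regular retrieval is slow.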
B: Set up reverse proxy (nginx, lb) to try rainbows with Cache-Control: only-if-cached first
Writing this down just to have something other than (A). I don't personally believe (B) is feasible.
The idea here is to update the way our infrastructure proxies gateway requests to rainbow instances: first ask all upstream instances within the region for the resource with `Cache-Control: only-if-cached`, and if none of them has it, retry with a normal request that will trigger p2p retrieval.
The downside here is that this feels like an antipattern:
- Overrides any user-provided `Cache-Control`
- Creates cache hot spots: popular data is not distributed across rainbow instances, but always served by the specific instance which fetched it first.
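To make the two downsides concrete, here is a small model of the two-pass proxy logic. It is a sketch only, not nginx configuration: `send(upstream, path, headers)` is a hypothetical callable returning `(status, body)`, and header names are treated as case-sensitive dictionary keys for simplicity.

```python
def proxy_request(path, headers, upstreams, send):
    """Two-pass routing: cache-only probe of every upstream, then a normal request.

    Returns (upstream, status, body).
    """
    # Pass 1: probe each upstream cache-only. This replaces whatever
    # Cache-Control the user sent (the first downside).
    probe_headers = dict(headers)
    probe_headers["Cache-Control"] = "only-if-cached"
    for upstream in upstreams:
        status, body = send(upstream, path, probe_headers)
        if status == 200:
            # Whoever fetched the data first keeps serving it
            # (the second downside: a cache hot spot).
            return upstream, status, body

    # Pass 2: nobody has it cached, so send the original request to one
    # upstream, which may trigger p2p retrieval.
    upstream = upstreams[0]
    status, body = send(upstream, path, headers)
    return upstream, status, body
```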
C: Reuse Bitswap client and server we already have
Right now, Rainbow runs Bitswap in read-only mode. It always says it does not have data when asked over bitswap.
What we could do is a permissioned version of peering:
- libp2p preconnect to a safelisted set of peers and protect these peering connections from being closed
  - If Rainbow does not announce peer records to DHT, we should require full `/ip|dns*/.../p2p/peerid`, otherwise we
- (for now) allow serving data over bitswap to a safelisted set of `/p2p/` multiaddrs (quick and easy), leverage existing peering config / libraries where possible (Add peering support #35)
- (allows us to do more in the future) switch to HTTP retrieval (over libp2p or `/http`)
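The safelist check itself could look something like the sketch below. It is a language-neutral illustration in Python using plain string handling rather than a multiaddr library; the function names, the `require_full_addrs` flag (my reading of the "does not announce peer records to DHT" case) and the placeholder peer IDs in the usage example are all assumptions.

```python
# Multiaddr prefixes that carry a dialable address (assumed set for this sketch).
TRANSPORT_PREFIXES = ("ip4", "ip6", "dns", "dns4", "dns6", "dnsaddr")


def parse_peer(addr):
    """Return (peer_id, has_transport) for a multiaddr given as a string."""
    parts = addr.strip("/").split("/")
    # Take the last /p2p/ component, scanning from the end.
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "p2p":
            return parts[i + 1], parts[0] in TRANSPORT_PREFIXES
    raise ValueError(f"no /p2p/ component in {addr!r}")


def build_safelist(addrs, require_full_addrs):
    """Collect safelisted peer IDs, optionally insisting on full addresses."""
    safelist = set()
    for addr in addrs:
        peer_id, has_transport = parse_peer(addr)
        if require_full_addrs and not has_transport:
            raise ValueError(f"full /ip|dns*/.../p2p/ address required: {addr!r}")
        safelist.add(peer_id)
    return safelist


def may_serve_block(remote_peer_id, safelist):
    """Bitswap server gate: serve only to safelisted peers."""
    return remote_peer_id in safelist
```

For example, `build_safelist(["/dns4/rainbow-a.example/tcp/4001/p2p/PeerA"], True)` yields a set containing the placeholder ID `PeerA`, and `may_serve_block` then rejects requests from any other peer.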
D: ?
Ideas welcome.