Closed
Labels
- P2 (Medium: Good to have, but can wait until someone steps up)
- need/triage (Needs initial labeling and prioritization)
Description
Problem
It seems that we have hardcoded some settings related to delegated routing over HTTP:
- HTTP client pool details
- HTTP router timeout
Line 273 in 19723fe
Timeout: 15 * time.Second,
A 15s timeout on a cold cache might lead to an unintended denial of service if content is only announced to IPNI at cid.contact and either the client or the server is under load, so that receiving a response takes more than 15s.
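To illustrate the failure mode, here is a minimal sketch using only the Go standard library. It assumes the router ultimately relies on an `http.Client` with a fixed `Timeout`, as the snippet above suggests. The helper `lookupSucceeds`, the stand-in server, and the CID in the path are hypothetical, and the durations are scaled down to milliseconds so the example runs quickly.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// lookupSucceeds starts a stand-in routing server that answers after delayMs
// and queries it with an http.Client whose Timeout is timeoutMs.
// It reports whether the full response arrived before the timeout.
func lookupSucceeds(timeoutMs, delayMs int) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simulate a cold cache or a loaded server.
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		fmt.Fprint(w, `{"Providers":[]}`)
	}))
	defer srv.Close()

	// http.Client.Timeout covers the whole exchange, including reading the body.
	client := &http.Client{Timeout: time.Duration(timeoutMs) * time.Millisecond}
	resp, err := client.Get(srv.URL + "/routing/v1/providers/hypothetical-cid")
	if err != nil {
		return false // the timeout surfaces as an error, so no providers are found
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err == nil
}

func main() {
	fmt.Println("fast server, generous timeout:", lookupSucceeds(2000, 10))
	fmt.Println("slow server, short timeout:", lookupSucceeds(50, 300))
}
```

From the caller's point of view, the second case is indistinguishable from "no providers found", even though the server would have answered given more time.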
Solution
I think we should expose HTTP routing client metrics to see if and when things fail, make things configurable (at least the routing timeout), and use our infra to adjust the default based on real-world performance:
- Expose the timeout as a configuration setting, allowing us to fine-tune it on ipfs.io infra.
  - The config option for adjusting the timeout should follow whatever naming convention we end up with in feat!: independent dht and routing v1 flags #113.
  - ipfs.io gateway infra times out (HTTP 504) after ~1m, so I think it would not hurt if we wait for a routing response a bit longer than 15s.
- Have success/failure metrics for each defined /routing/v1 endpoint.
  - Needs analysis, but on the surface it looks like we never finished this? There are error-related metrics in boxo/routing/http/client, but we don't seem to expose routing_http_client_latency on http://127.0.0.1:8091/debug/metrics/prometheus.
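A rough sketch of what the two proposals could look like, using only the Go standard library. Everything here is an assumption for illustration: the `ROUTING_HTTP_TIMEOUT` variable name is made up (the real name should follow the convention from #113), and the in-memory counters stand in for whatever metrics boxo/routing/http/client actually registers.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
	"time"
)

// defaultRoutingTimeout mirrors the currently hardcoded value.
const defaultRoutingTimeout = 15 * time.Second

// errTimeout is a placeholder error used in the usage example below.
var errTimeout = errors.New("routing request timed out")

// routingTimeout parses a user-supplied duration such as "30s" and falls
// back to the default when the value is empty, invalid, or not positive.
func routingTimeout(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultRoutingTimeout
	}
	return d
}

// endpointStats keeps success/failure counts per /routing/v1 endpoint.
type endpointStats struct {
	mu      sync.Mutex
	success map[string]int
	failure map[string]int
}

func newEndpointStats() *endpointStats {
	return &endpointStats{success: map[string]int{}, failure: map[string]int{}}
}

// observe records the outcome of one request to the given endpoint.
func (s *endpointStats) observe(endpoint string, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err != nil {
		s.failure[endpoint]++
		return
	}
	s.success[endpoint]++
}

// counts returns the recorded successes and failures for an endpoint.
func (s *endpointStats) counts(endpoint string) (ok, failed int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.success[endpoint], s.failure[endpoint]
}

func main() {
	// ROUTING_HTTP_TIMEOUT is a hypothetical name, not an existing setting.
	timeout := routingTimeout(os.Getenv("ROUTING_HTTP_TIMEOUT"))
	fmt.Println("routing timeout:", timeout)

	stats := newEndpointStats()
	stats.observe("/routing/v1/providers", nil)
	stats.observe("/routing/v1/providers", errTimeout)
	ok, failed := stats.counts("/routing/v1/providers")
	fmt.Printf("/routing/v1/providers: %d ok, %d failed\n", ok, failed)
}
```

Falling back to the current 15s default on bad input keeps existing deployments unchanged, while letting ipfs.io infra try longer values and compare the per-endpoint failure counts.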