Return request ID in HTTP response headers #10854
and include this information in the HTTP response headers
| self._processing_finished_time = None | ||
| self._processing_finished_time: Optional[float] = None | ||
|
|
||
| # what time we finished sending the response to the client (or the connection | ||
| # dropped) | ||
| self.finish_time = None | ||
| self.finish_time: Optional[float] = None |
Not sure why mypy started being unhappy, but it was interpreting these as being of type NoneType throughout the program, causing problems later.
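For reference, a minimal sketch of the annotation pattern used in the diff above. The class name `TimedRequest` and the `mark_finished` helper are hypothetical stand-ins, not Synapse code; the point is only that declaring the attribute as `Optional[float]` at its first assignment tells the type checker that a float may be stored there later.

```python
import time
from typing import Optional


class TimedRequest:
    """Hypothetical stand-in for the request class touched in the diff."""

    def __init__(self) -> None:
        # Annotate at the first assignment so the declared type is
        # Optional[float] rather than whatever is inferred from `None` alone.
        self.finish_time: Optional[float] = None

    def mark_finished(self) -> None:
        # Assigning a float is consistent with the declared Optional[float].
        self.finish_time = time.time()
```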
"generic_" wasn't useful
I don't think this is true.
It is only used if opentracing is enabled (and I'm not 100% sure what is included as the header). |
reivilibre
left a comment
I don't know if Patrick's comment above invalidates this PR or not, but I took a look.
Not entirely sure about the potential for log pollution; compare POST-424242 vs worker-event_persister18/POST-424242, especially when the worker ID is redundant/implied in the log file itself (as far as I know?).
| @@ -0,0 +1 @@ | |||
| Include a request id in Synapse's HTTP responses to aid debugging. No newline at end of file | |||
| Include a request id in Synapse's HTTP responses to aid debugging. | |
| Include a request ID in Synapse's HTTP responses to aid debugging. |
| def get_request_id(self): | ||
| return "%s-%i" % (self.get_method(), self.request_seq) | ||
| def get_request_id(self) -> str: | ||
| return f"{self._instance_name}/{self.get_method()}-{self.request_seq}" |
Is this the same thing that winds up in the logs?
I can see this perhaps making things a bit noisier, but not sure.
yes, this is what goes in the logs. I'm not enthusiastic about adding this noise to them.
| # TODO can we avoid the cast? Maybe take finish_time as an explicit float param? | ||
| response_send_time = ( | ||
| cast(float, self.finish_time) - self._processing_finished_time |
I think this is slightly less evil.
| # TODO can we avoid the cast? Maybe take finish_time as an explicit float param? | |
| response_send_time = ( | |
| cast(float, self.finish_time) - self._processing_finished_time | |
| assert self.finish_time is not None | |
| response_send_time = ( | |
| self.finish_time - self._processing_finished_time |
+1. Or raise an explicit Exception.
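To make the two options concrete, here is a rough sketch of both ways of narrowing the `Optional[float]` fields without `cast`. `RequestTimings` and its method names are hypothetical; only `finish_time` and `_processing_finished_time` come from the diff.

```python
from typing import Optional


class RequestTimings:
    """Hypothetical holder for the two timestamps discussed in the review."""

    def __init__(self) -> None:
        self._processing_finished_time: Optional[float] = None
        self.finish_time: Optional[float] = None

    def response_send_time_assert(self) -> float:
        # Option 1: assert narrows Optional[float] to float for the type
        # checker. Note that asserts are skipped when Python runs with -O.
        assert self.finish_time is not None
        assert self._processing_finished_time is not None
        return self.finish_time - self._processing_finished_time

    def response_send_time_raise(self) -> float:
        # Option 2: an explicit exception narrows the types in the same way
        # and still fires when asserts are disabled.
        if self.finish_time is None or self._processing_finished_time is None:
            raise RuntimeError("timings read before the request finished")
        return self.finish_time - self._processing_finished_time
```

Either variant fails loudly if the timestamps are unset, whereas `cast` would let a `None` through to the subtraction.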
| self.assertCountEqual(log.keys(), expected_log_keys) | ||
| self.assertEqual(log["log"], "Hello there, wally!") | ||
| self.assertTrue(log["request"].startswith("POST-")) | ||
| self.assertIn("POST-", log["request"]) |
or vs:
| self.assertIn("POST-", log["request"]) | |
| self.assertTrue(log["request"].startswith("worker-test/POST-")) |
It sounds like it does.
What I want here is to be able to quickly backtrack from a client HTTP response to the corresponding server-side processing. I've found myself wanting this when investigating sytests. A sytest run makes thousands of requests and it's a pain to have to cross-reference timestamps---never mind working out which worker's log file contains the relevant logs. I want a unique string I can bang into I think this is useful to have, even in the absence of full opentracing telemetry. It might be that |
I think this is a fair use-case. I personally feel somewhat inclined to say that the request ID in the logs themselves shouldn't make mention of the worker ID; but the header should. If you make the header format I think thoughts from someone on the team who does a bit more of that might be useful. |
|
I'm afraid I'm a little unconvinced that we want to be adding this header as well as the existing In order for
That still leaves you with the problem of figuring out which worker handled a particular request. A few thoughts here:
|
| self.version_string, | ||
| max_request_body_size=max_request_body_size(self.config), | ||
| reactor=self.get_reactor(), | ||
| instance_name=f"worker-{site_tag}", |
I don't think site_tag is quite what we want, is it? typically it's just the port number.
|
Absolutely, this sounds ideal. And to be explicit, I'd include this header in the response even if opentracing was disabled.
That's fine by me, assuming the trace IDs are unique across workers (guessing they're a uuid of some kind?). |
|
I'm going to raise a new issue to summarise the discussion and close this. Many thanks all. |
We already have something like this when opentracing is enabled, see #10199. But that's only across federation.
When investigating test failures it's really useful to cross-reference HTTP responses with synapse logs. I propose exposing the request id to facilitate that. I've tried to change the request ids to be unique across all workers now, by including an instance name.
See also matrix-org/sytest#1144.
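As a rough illustration of the proposal, here is a sketch of building a request ID that is unique across workers and attaching it to a response. The ID format matches the `get_request_id` diff; the header name `X-Request-ID` and both helper functions are assumptions made for this example and are not taken from the PR.

```python
from typing import Dict

# Hypothetical header name, chosen only for illustration.
REQUEST_ID_HEADER = "X-Request-ID"


def make_request_id(instance_name: str, method: str, request_seq: int) -> str:
    # Same format as the diff: "<instance name>/<method>-<sequence number>".
    return f"{instance_name}/{method}-{request_seq}"


def with_request_id(headers: Dict[str, str], request_id: str) -> Dict[str, str]:
    # Return a copy of the response headers with the request ID added,
    # leaving the caller's dict untouched.
    updated = dict(headers)
    updated[REQUEST_ID_HEADER] = request_id
    return updated
```

With something like this, a client-side test log could record the header value and grep for it in the matching worker's log.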