Return request ID in HTTP response headers by DMRobertson · Pull Request #10854 · matrix-org/synapse

DMRobertson · 2021-09-20T14:21:05Z

We already have something like this when opentracing is enabled, see #10199. But that's only across federation.

When investigating test failures it's really useful to cross-reference HTTP responses with synapse logs. I propose exposing the request id to facilitate that. I've tried to change the request ids to be unique across all workers now, by including an instance name.

See also matrix-org/sytest#1144.

and include this information in the HTTP response headers

DMRobertson · 2021-09-20T14:22:50Z

-        self._processing_finished_time = None
+        self._processing_finished_time: Optional[float] = None

        # what time we finished sending the response to the client (or the connection
        # dropped)
-        self.finish_time = None
+        self.finish_time: Optional[float] = None


Not sure why mypy started being unhappy, but it was interpreting these as being of type NoneType throughout the program, causing problems later.

missed these

"generic_" wasn't useful

clokep · 2021-09-20T17:57:02Z

But that's only across federation.

I don't think this is true. inject_response_headers gets called from two places:

BaseFederationServlet -- used by federation code as you mention.
trace_servlet -- which is called by _AsyncResource which is how most of the C-S API is implemented.

It is only used if opentracing is enabled (and I'm not 100% sure what is included as the header).

reivilibre

I don't know if Patrick's comment above invalidates this PR or not, but I took a look.

Not entirely sure about the potential for log pollution; compare POST-424242 vs worker-event_persister18/POST-424242, especially when the worker ID is redundant/implied in the log file itself (as far as I know?).

reivilibre · 2021-09-21T09:22:22Z

@@ -0,0 +1 @@
+Include a request id in Synapse's HTTP responses to aid debugging.


Suggested change

Include a request id in Synapse's HTTP responses to aid debugging.

Include a request ID in Synapse's HTTP responses to aid debugging.

reivilibre · 2021-09-21T09:24:07Z

-    def get_request_id(self):
-        return "%s-%i" % (self.get_method(), self.request_seq)
+    def get_request_id(self) -> str:
+        return f"{self._instance_name}/{self.get_method()}-{self.request_seq}"


Is this the same thing that winds up in the logs?
I can see this perhaps making things a bit noisier, but not sure.

yes, this is what goes in the logs. I'm not enthusiastic about adding this noise to them.

reivilibre · 2021-09-21T09:25:13Z

+        # TODO can we avoid the cast? Maybe take finish_time as an explicit float param?
+        response_send_time = (
+            cast(float, self.finish_time) - self._processing_finished_time


I think this is slightly less evil.

Suggested change

# TODO can we avoid the cast? Maybe take finish_time as an explicit float param?

response_send_time = (

cast(float, self.finish_time) - self._processing_finished_time

assert self.finish_time is not None

response_send_time = (

self.finish_time - self._processing_finished_time

+1. Or raise an explicit Exception.

reivilibre · 2021-09-21T09:26:13Z

        self.assertCountEqual(log.keys(), expected_log_keys)
        self.assertEqual(log["log"], "Hello there, wally!")
-        self.assertTrue(log["request"].startswith("POST-"))
+        self.assertIn("POST-", log["request"])


or vs:

Suggested change

self.assertIn("POST-", log["request"])

self.assertTrue(log["request"].startswith("worker-test/POST-"))

DMRobertson · 2021-09-21T10:41:56Z

I don't know if Patrick's comment above invalidates this PR or not, but I took a look.

It sounds like it does.

Not entirely sure about the potential for log pollution; compare POST-424242 vs worker-event_persister18/POST-424242, especially when the worker ID is redundant/implied in the log file itself (as far as I know?).

What I want here is to be able quickly backtrack from a client HTTP response to the corresponding server-side processing. I've found myself wanting this when investigating sytests. A sytest run makes thousands of requests and it's a pain to have to cross-reference timestamps---never mind working out which worker's log file contains the relevant logs. I want a unique string I can bang into grep -r or rg and just find the logs I'm after.

I think this is useful to have, even in the absence of full opentracing telemetry. It might be that

reivilibre · 2021-09-21T13:10:48Z

I want a unique string I can bang into grep -r or rg and just find the logs I'm after.

I think this is a fair use-case. I personally feel somewhat inclined to say that the request ID in the logs themselves shouldn't make mention of the worker ID; but the header should.

If you make the header format X-Request-Id: POST-424242 event_persister18, then a rg POST-424242 event_persister18.log doesn't seem so bad (assuming the sytest log file names follow a sensible pattern like that).

I think thoughts from someone on the team who does a bit more of that might be useful.

richvdh · 2021-09-23T12:07:00Z

I'm afraid I'm a little unconvinced that we want to be adding this header as well as the existing Synapse-Trace-Id header, which, as @clokep has mentioned, is returned for C-S requests provided opentracing is enabled (even if the opentracing span doesn't end up being sent on to the Jaeger collector).

In order for Synapse-Trace-Id to be useful for your usecase, I think two things are needed:

We should log the value of the Synapse-Trace-Id header in the final "Processed" line for each request. I think you can just fish it out of self.responseHeaders in _finished_processing
We need to actually give requests a trace-id when running in sytest. One way to do this is to enable opentracing under sytest; another is to update synapse.logging.opentracing.trace_servlet and BaseFederationServlet to generate a trace ID internally even if opentracing isn't enabled. I think the latter would be generally useful and would favour that?

That still leaves you with the problem of figuring out which worker handled a particular request. A few thoughts here:

can you just grep across the likely logfiles for the right Trace-Id? Generally it shouldn't be too hard to find once you figure out how your Synapse instance is routing the different types of request?
we could include the worker name in another response header?

richvdh · 2021-09-23T10:35:02Z

                self.version_string,
                max_request_body_size=max_request_body_size(self.config),
                reactor=self.get_reactor(),
+                instance_name=f"worker-{site_tag}",


I don't think site_tag is quite what we want, is it? typically it's just the port number.

richvdh · 2021-09-23T10:41:46Z

-    def get_request_id(self):
-        return "%s-%i" % (self.get_method(), self.request_seq)
+    def get_request_id(self) -> str:
+        return f"{self._instance_name}/{self.get_method()}-{self.request_seq}"


yes, this is what goes in the logs. I'm not enthusiastic about adding this noise to them.

richvdh · 2021-09-23T10:54:14Z

+        # TODO can we avoid the cast? Maybe take finish_time as an explicit float param?
+        response_send_time = (
+            cast(float, self.finish_time) - self._processing_finished_time


+1. Or raise an explicit Exception.

DMRobertson · 2021-10-04T10:36:24Z

Many thanks all.

another is to update synapse.logging.opentracing.trace_servlet and BaseFederationServlet to generate a trace ID internally even if opentracing isn't enabled. I think the latter would be generally useful and would favour that?

Absolutely, this sounds ideal. And to be explicit, I'd include this header in the response even if opentracing was disabled.

can you just grep across the likely logfiles for the right Trace-Id? Generally it shouldn't be too hard to find once you figure out how your Synapse instance is routing the different types of request?

That's fine by me, assuming the trace IDs are unique across workers (guessing they're a uuid of some kind?).

DMRobertson · 2021-10-14T16:45:02Z

I'm going to raise a new issue to summarise the discussion and close this. Many thanks all.

David Robertson added 2 commits September 20, 2021 15:04

Report which instance a request runs on

e61b353

and include this information in the HTTP response headers

Make mypy happy

980efe4

DMRobertson mentioned this pull request Sep 20, 2021

Log request ids in do_json_request_for on failure matrix-org/sytest#1144

Closed

DMRobertson commented Sep 20, 2021

View reviewed changes

DMRobertson requested a review from a team September 20, 2021 14:23

Changelog

933d59d

DMRobertson removed the request for review from a team September 20, 2021 14:52

David Robertson added 3 commits September 20, 2021 17:00

Fix tests

34180a1

missed these

Tweak worker requests' instance_name

24c0d5e

"generic_" wasn't useful

Fixup some more tests

7ed6f16

DMRobertson requested a review from a team September 20, 2021 17:27

reivilibre assigned reivilibre and unassigned reivilibre Sep 21, 2021

reivilibre reviewed Sep 21, 2021

View reviewed changes

reivilibre requested a review from a team September 21, 2021 09:30

richvdh reviewed Sep 23, 2021

View reviewed changes

DMRobertson closed this Oct 14, 2021

DMRobertson mentioned this pull request Oct 14, 2021

(Optionally?) provide trace IDs in all requests, even if open tracing is disabled #11090

Open

matrixbot mentioned this pull request Dec 21, 2023

(Optionally?) provide trace IDs in all requests, even if open tracing is disabled element-hq/synapse#11090

Open

		@@ -0,0 +1 @@
		Include a request id in Synapse's HTTP responses to aid debugging. No newline at end of file

	self.assertIn("POST-", log["request"])
	self.assertTrue(log["request"].startswith("worker-test/POST-"))

Uh oh!

Conversation

DMRobertson commented Sep 20, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clokep commented Sep 20, 2021

Uh oh!

reivilibre left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DMRobertson commented Sep 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

reivilibre commented Sep 21, 2021

Uh oh!

richvdh commented Sep 23, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DMRobertson commented Oct 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DMRobertson commented Oct 14, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

DMRobertson commented Sep 21, 2021 •

edited

Loading

DMRobertson commented Oct 4, 2021 •

edited

Loading