Skip to content

Leader must reject reports with duplicate extensions#3999

Merged
jcjones merged 3 commits intomainfrom
jcj/3997-leader-extension-duplication
Aug 26, 2025
Merged

Leader must reject reports with duplicate extensions#3999
jcjones merged 3 commits intomainfrom
jcj/3997-leader-extension-duplication

Conversation

@jcjones
Copy link
Copy Markdown
Contributor

@jcjones jcjones commented Aug 22, 2025

Note there's an error response inconsistency between Helper and Leader -- Leader returns ReportRejectionReason::DuplicateExtension while Helper returns ReportError::InvalidMessage (line 235 in aggregation_job_init.rs).

Also note that this adds the telemetry to TaskUploadCounter, but we already have -- completely unused -- DuplicateExtension in TaskAggregationCounter.

Maybe this is putting the rejection in the wrong place for the Leader?

Fixes #3997

Note there's an error response inconsistency between Helper and Leader -- Leader
returns ReportRejectionReason::DuplicateExtension while Helper returns
ReportError::InvalidMessage (line 235 in aggregation_job_init.rs).

Also note that this adds the telemetry to TaskUploadCounter, but we already
have -- completely unused -- DuplicateExtension in TaskAggregationCounter.

Maybe this is putting the rejection in the wrong place for the Leader?

Fixes #3997
@jcjones jcjones added the allow-changed-migrations Override the ci-migrations check to allow migrations that have changed. label Aug 22, 2025
use reqwest::Client;
use std::{
borrow::Cow,
collections::HashSet,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to do this one as a HashSet since it's already imported here, and switch the Helper's implementation over to HashSet.

@jcjones jcjones marked this pull request as ready for review August 22, 2025 20:08
@jcjones jcjones requested a review from a team as a code owner August 22, 2025 20:08
);
metrics
.upload_decode_failure_counter
.add(1, &[KeyValue::new("type", "duplicate_extension")]);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a upload_decode_failure_counter.add(0, &[Keyvalue::new("type", "duplicate_extension")]); in Aggregator::new() to eagerly populate this time series at startup.

Copy link
Copy Markdown
Contributor

@tgeoghegan tgeoghegan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note there's an error response inconsistency between Helper and Leader -- Leader returns ReportRejectionReason::DuplicateExtension while Helper returns ReportError::InvalidMessage (line 235 in aggregation_job_init.rs).

ReportRejectionReason::DuplicateExtension doesn't seem right to me. The pertinent text from draft-ietf-ppm-dap-15:

If the same extension type appears more than once among the public extensions and the private extensions in the Leader's input share, then the Leader MUST ignore the report and MAY abort with error invalidMessage.

(See discussion of that MAY below) It seems fine for us to track this rejection reason internally to Janus (in particular so we can increment a counter) but what error does the client get? I'm not sure if the test you added lets us examine the HTTP response code and problem document a client would get. FWIW I think it'd be fine to add a test proving that ReportRejectionReason::DuplicateExtension gets rendered into the appropriate HTTP response. We don't necessarily need to rewrite the test to exercise the HTTP layer.

Also note that this adds the telemetry to TaskUploadCounter, but we already have -- completely unused -- DuplicateExtension in TaskAggregationCounter.

Maybe this is putting the rejection in the wrong place for the Leader?

The MAYs in the various upload path checks are intended to allow leaders to defer checks from the upload path (which is high-volume and keeps a client waiting), but Janus opts to do these checks as soon as possible. Or rather, we chose to do it in the more obvious place and figured we'd move it to the aggregation job path if it ever proved to be a performance problem. It never has -- what has caused us perf issues during upload is writing reports to the database (IO), not validating incoming messages (compute).

Anyway, we need that other country so that the helper can record such occurrences, since it never gets to see reports until it handles aggregation jobs. Admittedly the scenario of a helper getting duplicate extensions is quite unlikely -- you'd need a client to have correctly constructed the leader report share but mangled the helper's -- but such a bug being improbably makes tools for detecting it that much more valuable, I think.

report_too_early BIGINT NOT NULL DEFAULT 0, -- reports whose timestamp is too far in the future.
task_not_started BIGINT NOT NULL DEFAULT 0, -- reports sent to the task before it started.
task_ended BIGINT NOT NULL DEFAULT 0, -- reports sent to the task after it ended.
duplicate_extension BIGINT NOT NULL DEFAULT 0, -- reports with duplicate extensions
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

divviup-api will need matching changes: divviup/divviup-api#1884

@jcjones
Copy link
Copy Markdown
Contributor Author

jcjones commented Aug 26, 2025

ReportRejectionReason::DuplicateExtension gets rendered into the appropriate HTTP response. We don't necessarily need to rewrite the test to exercise the HTTP layer.

As it's a ReportRejectionReason, it's going to be rendered to a problem doc with DapProblemType::ReportRejected, but I see now that there's no technical reason why a Report Rejection can't yield InvalidMessage.

https://github.com/divviup/janus/blob/main/aggregator/src/aggregator/http_handlers.rs#L63-L76

@jcjones jcjones merged commit 8dbfbcd into main Aug 26, 2025
8 checks passed
@jcjones jcjones deleted the jcj/3997-leader-extension-duplication branch August 26, 2025 21:21
jcjones added a commit to divviup/divviup-api that referenced this pull request Aug 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

allow-changed-migrations Override the ci-migrations check to allow migrations that have changed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DAP-15: Resolve input share extension processing differences between Helper and Leader

3 participants