Skip to content

Comments

Integrating cosmos diagnostics with open telemetry tracer#22202

Merged
simplynaveen20 merged 18 commits intoAzure:mainfrom
simplynaveen20:user/nakumars/diagnoticOnTracer
Jul 15, 2021
Merged

Integrating cosmos diagnostics with open telemetry tracer#22202
simplynaveen20 merged 18 commits intoAzure:mainfrom
simplynaveen20:user/nakumars/diagnoticOnTracer

Conversation

@simplynaveen20
Copy link
Member

Cosmos public API end to end tracing is already added last year with PR #10265

This PR will add diagnostics information on the span event based in some threshold which can be configured via CosmosItemRequestOptions and CosmosQueryRequestOptions through setThresholdForDiagnosticsOnTracerInMS(). Default is 100 ms(CRUD) and 500ms(Query) . Metadata/Script crud request options does not have the option to set threshold and will take default. If needed we can add the option there as well

Note : -

  1. This is draft PR as we have bug in core tracer event api which is getting worked on ,due to which I have to add opentelemetry dependency explicitly in pom, once issue is fixed we will remove the changes from cosmos pom.
  2. This PR also changed the key of query metrics map back to logical id instead of partition range , see changes DefaultDocumentQueryExecutionContext.java and DocumentProducer.java

Some screen shot from Jaeger UI -
Point operation -
image
image

Query Operation -
image
image
image

@ghost ghost added the Cosmos label Jun 10, 2021
@simplynaveen20 simplynaveen20 changed the title Integrating cosmos diagnostic with open telemetry tracer Integrating cosmos diagnostics with open telemetry tracer Jun 10, 2021
@simplynaveen20 simplynaveen20 requested review from kushagraThapar, mbhaskar and moderakh and removed request for kushagraThapar June 10, 2021 14:41
@simplynaveen20 simplynaveen20 self-assigned this Jun 10, 2021
Copy link
Contributor

@moderakh moderakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level looks good. great work Naveen.

my comments:

  1. why are we introducing back pkRangeId? can't we rely on pkRange/feedRange instead?
  2. on the public api we expose period of time as duration elsewhere. we should try to be consistent.
  3. We should avoid adding more BridgeInternal methods if possible.

@check-enforcer
Copy link

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run java - [service] - ci

Copy link
Member

@mbhaskar mbhaskar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking time to discuss and add keeping the feedrange in metrics. LGTM

Copy link
Contributor

@moderakh moderakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simplynaveen20 great work. some minor comments/questions, other than that LGTM.

Arrays.asList(schedulingTimeSpanMap)
), pageResult.getActivityId());
BridgeInternal.putQueryMetricsIntoMap(pageResult, feedRange.getRange().toString(), qm);
String pkrId = pageResult.getResponseHeaders().get(HttpConstants.HttpHeaders.PARTITION_KEY_RANGE_ID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we rely on the targetRange please see here:
https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentProducer.java#L93

is that going be different than this one from the header response?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both feedrange and rangeId in query metric key , it will be easier to diagnose issue for On call, discussed with @mbhaskar please check this comment #22202 (comment)

BridgeInternal.putQueryMetricsIntoMap(pageResult, feedRange.getRange().toString(), qm);
String pkrId = pageResult.getResponseHeaders().get(HttpConstants.HttpHeaders.PARTITION_KEY_RANGE_ID);
if (StringUtils.isEmpty(pkrId)) {
pkrId = "0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on which scenario we come here? is this code reachable?

can't we rely on the targetRage see my above comment for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both feedrange and rangeId in query metric key , it will be easier to diagnose issue for On call, discussed with @mbhaskar please check this comment #22202 (comment)

Arrays.asList(schedulingTimeSpanMap)),
tFeedResponse.getActivityId());
BridgeInternal.putQueryMetricsIntoMap(tFeedResponse, DEFAULT_PARTITION_RANGE, qm);
String pkrId = tFeedResponse.getResponseHeaders().get(HttpConstants.HttpHeaders.PARTITION_KEY_RANGE_ID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we rely on the targetRange please see here:
https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentProducer.java#L93

is that going be different than this one from the header response?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both feedrange and rangeId in query metric key , it will be easier to diagnose issue for On call, discussed with @mbhaskar please check this comment #22202 (comment)

@simplynaveen20 simplynaveen20 marked this pull request as ready for review July 6, 2021 15:49
@simplynaveen20
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@simplynaveen20
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@simplynaveen20
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants