
Inter-operability of tracing systems operating with shorter trace-id #349

@SergeyKanzhelev

This issue tracks the overall resolution of inter-operability between tracing systems that operate with a shorter trace-id.


@bogdandrutu @adriancole @tylerbenson @jcchavezs @yurishkuro @nicmunroe @tedsuo @reyang @danielkhan (people who were involved in the PR discussion) and others, please review this issue. It is written based on my understanding of the problems identified in the spec. Let's agree that this is the list of problems we have to address and then split this single issue into smaller issues to discuss how to resolve them.

Please comment/react on this issue to indicate that it reflects your understanding of the problems identified in the spec.


There are existing tracing systems that currently operate with a 64bit trace-id but still want to use the Trace Context headers for cross-process communication. The spec attempts to give advice on how spec-compliant 128bit tracing systems may cooperate with these 64bit systems.

Let's call the logic by which a 64bit system generates a 128bit trace-id the backfill logic. While addressing the concern of adding more clarity to the spec regarding trace-id backfill logic (see #337), and whether this backfill logic must apply to new traces or to traces "in transition", more problems with the current spec were highlighted. See the discussion in PR #344.

Here are the issues that were identified.

1. Split trace problem

If app A calls two apps, B and C, with a 64bit trace-id, and B and C each backfill using a different algorithm, then the trace becomes split. It now has two separate trace IDs.

This is the original problem raised in #337. The spec must be very clear on how to backfill a trace-id when a new trace-id is generated, as compared to the backfill logic applied when a trace-id is propagated.
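
To make the split concrete, here is a minimal sketch. The two backfill functions are hypothetical illustrations of "different algorithms" (zero left-padding vs. random left-padding); neither is taken from the spec or from any particular tracer, and the sample trace-id is made up.

```python
import secrets


def backfill_zero_left(trace_id_64: str) -> str:
    """Hypothetical algorithm 1: pad the 16-hex-char id with zeroes on the left."""
    return trace_id_64.rjust(32, "0")


def backfill_random_left(trace_id_64: str) -> str:
    """Hypothetical algorithm 2: prepend 64 random bits (16 hex chars)."""
    return secrets.token_hex(8) + trace_id_64


incoming = "53ce929d0e0e4736"  # 64bit trace-id sent by app A (made-up value)

id_seen_by_b = backfill_zero_left(incoming)    # app B's 128bit view
id_seen_by_c = backfill_random_left(incoming)  # app C's 128bit view

# Both ids still end with the original 64 bits, but as full 128bit
# values they will almost certainly differ, so downstream systems
# see two traces instead of one.
print(id_seen_by_b)
print(id_seen_by_c)
```

Both results share the original 64 bits as a suffix, which is why systems that look up traces by the rightmost characters can still stitch them together, while systems that compare the full 128bit value cannot.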

2. Backfilling on left vs. right

Many existing systems are already dealing with the 64bit to 128bit transition. These systems typically have logic to backfill the trace-id and to look up traces given a subset of the trace-id bytes. It is common for these systems to implement trace-id lookups based on the right-most characters, since typical backfill logic puts zeroes in the first bytes of the trace-id.

This backfill logic contradicts another paragraph of the spec, which asks to put the randomness on the left side of the trace-id. So backfill logic, in order to follow that requirement, must backfill the rightmost bytes.

This creates confusion and breaks inter-operability with the existing 64bit tracing systems.
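
A small sketch of why the side matters, assuming a hypothetical `lookup_key` that mimics an existing system keying its lookups on the rightmost 16 hex characters (the function name and sample id are illustrative only):

```python
def lookup_key(trace_id_128: str) -> str:
    """Hypothetical lookup key: the rightmost 64 bits (16 hex chars)."""
    return trace_id_128[-16:]


original = "53ce929d0e0e4736"        # 64bit trace-id (made-up value)

left_filled = "0" * 16 + original    # zeroes in the first bytes
right_filled = original + "0" * 16   # original on the left, rightmost bytes backfilled

print(lookup_key(left_filled) == original)   # True
print(lookup_key(right_filled) == original)  # False
```

With a left backfill the existing lookup keeps working; with a right backfill the same lookup returns only the filler bytes.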

3. Left padding of randomness

The requirement to left-pad randomness creates another challenge for existing 64bit tracing systems. Keeping the rightmost bytes constant will break compatibility with tracing systems that operate on these rightmost characters.

This requirement must be either rewritten to make sure it works with 64bit systems or removed from the specification.
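
As an illustration of one way to read this problem (an assumption on my part, not wording from the spec): if a generator puts all randomness in the left 64 bits and keeps the right 64 bits constant, a system that only looks at the rightmost characters sees every trace as the same trace. `CONSTANT_RIGHT` and both function names below are hypothetical.

```python
import secrets

CONSTANT_RIGHT = "0" * 16  # hypothetical constant rightmost 64 bits


def new_trace_id_random_left() -> str:
    """Random left 64 bits, constant right 64 bits."""
    return secrets.token_hex(8) + CONSTANT_RIGHT


def key_64(trace_id_128: str) -> str:
    """What a system operating on the rightmost 16 hex chars would keep."""
    return trace_id_128[-16:]


a = new_trace_id_random_left()
b = new_trace_id_random_left()

# Two independent traces collapse to the same 64bit key.
print(key_64(a) == key_64(b))  # True
```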

4. Zero vs. random backfill

We can differentiate tracing systems into three types:

  • The 128bit system: operating with 128bit trace-id (fully compliant).
  • The 64bit system: operating with 64bit trace-id. Not capable of propagating tracestate or the extra 64 bits of trace-id.
  • The 128on64bit system: records a 64bit trace-id, but is capable of propagating the extra 64 bits of trace-id from the incoming request to the outgoing request.

Backfill logic that is currently described in the spec may work for 128on64bit systems, but it will not work for 64bit systems. Even on first trace-id generation, a 64bit system has no capability to preserve the extra bytes and re-use them for subsequent outgoing calls made from a single incoming call with the newly generated trace-id. Thus the requirements for 64bit systems may need to be different, or the current requirement can be changed to a simple zero backfill.
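
The three system types can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 64bit system is assumed to keep the rightmost 64 bits and to zero-backfill on the left when it sends the id onward, and the sample trace-id is made up.

```python
from typing import Tuple


def propagate_128(incoming: str) -> str:
    """128bit system: records and forwards the full trace-id unchanged."""
    return incoming


def propagate_64(incoming: str) -> str:
    """64bit system: keeps only the rightmost 64 bits (assumption), so the
    upper 64 bits are lost and zero-backfilled on the outgoing call."""
    return incoming[-16:].rjust(32, "0")


def propagate_128on64(incoming: str) -> Tuple[str, str]:
    """128on64bit system: records 64 bits but carries the extra 64 bits
    from the incoming request over to the outgoing request."""
    recorded = incoming[-16:]
    outgoing = incoming[:16] + recorded
    return recorded, outgoing


incoming = "4bf92f3577b34da6a3ce929d0e0e4736"

print(propagate_128(incoming))      # 4bf92f3577b34da6a3ce929d0e0e4736
print(propagate_64(incoming))       # 0000000000000000a3ce929d0e0e4736
print(propagate_128on64(incoming))  # ('a3ce929d0e0e4736', '4bf92f3577b34da6a3ce929d0e0e4736')
```

Only the 64bit system changes the trace-id seen downstream, which is why a random backfill cannot be kept stable across its outgoing calls while a zero backfill can.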

5. Spec should prescribe the behavior of a 64bit-only system

64bit systems (as opposed to 128on64bit) are quite harmful for inter-operability of tracing systems. Not only do these systems fail to propagate the entire trace-id, they also fail to propagate tracestate.

The specification currently declares these systems non-compliant: it uses "MUST" language for trace-id propagation. However, inter-operability with these tracing systems is quite important, so a note in the specification on how 64bit systems should operate would go a long way toward ensuring better inter-operability between various tracing systems.

The spec will benefit from being more explicit on how 64bit tracing systems must operate.
