
Spike: Open structured event pipeline for the backend #6918

@khvn26

Description

Our goal is to start tracking product events in the backend, potentially improving data quality (richer context, better tracking accuracy).

Our hypothesis is that we can piggyback on structured logging and OTel's event tracking by introducing an exporter that pushes events to our analytics platform (Amplitude) and, later, to other destinations of interest.
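To make the hypothesis concrete, here is a minimal, stdlib-only sketch of the shape of the idea: business logic emits an ordinary log call, and a handler turns it into a structured event. In the real pipeline this handler would be the OTel logging bridge shipping records over OTLP; the class and attribute names here are illustrative, not an actual API.

```python
import json
import logging


class EventHandler(logging.Handler):
    """Hypothetical handler that renders log records as structured
    analytics events, standing in for an OTel log exporter."""

    def __init__(self):
        super().__init__()
        self.events = []  # a real exporter would ship these over OTLP

    def emit(self, record):
        self.events.append({
            "name": record.getMessage(),
            "attributes": getattr(record, "event_attrs", {}),
        })


logger = logging.getLogger("app.events")
handler = EventHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Business logic emits a product event as an ordinary log call.
logger.info("segment_created", extra={"event_attrs": {"project_id": 42}})

print(json.dumps(handler.events[0]))
```

The point of the sketch is the developer experience: application code only knows about `logging`, while everything downstream (export, routing, redaction) lives in pipeline configuration.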

Possible outcomes of this spike include:

  • A working PoC, or an epic detailing the implementation plan for one.
  • An internal RFC that includes design proposals for the event taxonomy and the governance model.

Why now?

  • We're working on improving our product analytics stack and data capabilities. We want to learn and evolve together with our customers, and see how we can improve the product analytics space.
  • We'd like to try out a process where engineering thinks upfront about what kind of events their business logic should emit. OTel instrumentation model provides excellent developer experience for that — for free.
  • Frontend tracking has accuracy issues, and events emitted from the backend provide richer context (feature flag states, segments, server-side logic).
  • An increasing number of customers want (and implement) warehouse-native analytics. We are interested in seeing how we can contribute to this while continuing to use open standards.
  • Amplitude costs are growing; we need better control over what we track.

Considerations

  • Platform-agnosticism. Our target is Amplitude but the OTel Collector supports many exporters, allowing customers to route events to any CDP or data warehouse they use.
  • Our users need to be able to self-host this and integrate it into their data infrastructure with little friction.
  • GDPR compliance from the get-go: PII pseudonymisation/removal, event routing, consent handling.

What we'd like to validate as part of the PoC

Product:

  • Contribute to adoption and retention metrics for two features that we can't track today

Tech:

  • Prove we can emit structured events via Python/Django logging that reach OTel Collector
  • Demonstrate filtering/routing at Collector level (not application code)
  • Confirm events can reach both Amplitude AND a warehouse (e.g., local ClickHouse)
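The filtering/routing and dual-destination goals above map naturally onto a Collector pipeline. A rough sketch of what the config might look like, assuming the contrib distribution (the `filter` processor and `clickhouse` exporter exist in otelcol-contrib; there is no first-party Amplitude exporter, so the Amplitude leg is shown as a generic OTLP/HTTP endpoint with a placeholder URL):

```yaml
# Sketch only; the Amplitude endpoint is a placeholder, not a real ingestion URL.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  filter/product-events:
    logs:
      log_record:
        # drop everything not explicitly marked as a product event
        - 'attributes["event.domain"] != "product"'

exporters:
  clickhouse:
    endpoint: tcp://localhost:9000
  otlphttp/amplitude:
    endpoint: https://example.invalid/amplitude-bridge  # placeholder

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter/product-events]
      exporters: [clickhouse, otlphttp/amplitude]
```

This keeps all routing decisions out of application code: adding or removing a destination is an exporter-list change, not a deploy.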

Privacy/Compliance:

  • Demonstrate PII removal/hashing at Collector level
  • Show how to handle consent status in event pipeline
  • Prove data doesn't leave customer infrastructure in self-hosted mode
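PII handling at the Collector level could lean on the `transform` processor and OTTL from otelcol-contrib. A hedged fragment, with illustrative attribute names, showing pseudonymisation of a user identifier and removal of a raw email:

```yaml
processors:
  transform/pii:
    log_statements:
      - context: log
        statements:
          # pseudonymise the user identifier, drop the raw email entirely
          - set(attributes["user.id"], SHA256(attributes["user.id"]))
          - delete_key(attributes, "user.email")
```

Because this runs in the Collector, self-hosted deployments can enforce redaction before any event leaves their infrastructure.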

Developer Experience:

  • Single config file to control what events go where
  • Hot-reload config without restart
  • Clear examples of event taxonomy (technical vs business vs PII)

Non-Goals

  • Replace frontend tracking. Backend events are complementary, providing enriched context for existing frontend analytics.
  • Track high-volume events like flag evaluations or API calls.
  • Implement a full data warehouse pipeline. The focus is on routing events to existing infrastructure.
  • Create a new Analytics product offering. This is for internal product analytics to understand our users better, not a customer-facing feature — yet.