RADAR is an intelligent, Reinforcement Learning (RL) based solution for distributed trace selection. Using the REINFORCE (policy gradient) algorithm adapted to a continuous Multi-Armed Bandit problem, RADAR dynamically adjusts sampling policies to maximize the diversity (entropy) of the collected traces while strictly controlling the total volume to avoid excessive storage costs.
The system is modularized into three main components:
- **The RADAR Agent (`agent.py`)**: The decision-making core. It uses the REINFORCE algorithm to evaluate a set of available sampling policies (defined in `tail_sampling_policies.json`). It maintains an independent activation probability for each policy, updating them via gradient ascent on the received rewards and using an Exponential Moving Average baseline to stabilize learning.
- **Experiment Orchestrator (`manager.py`)**: The main controller that runs the continuous experiment lifecycle loop, acting as the bridge between RADAR's decisions and the real environment. It dynamically generates OpenTelemetry Collector configurations, injects the selected policies, and triggers zero-downtime rolling updates via the Kubernetes API.
- **Metrics Extractor and Entropy Calculator (`es_utils.py`)**: Evaluates the quality of the collected data. It queries Elasticsearch for the traces associated with the current experiment hash and, to calculate Shannon Entropy, converts complex span trees into comparable string representations through:
  - **Chronological & Hierarchical Sorting**: ensures deterministic trace sequencing.
  - **Noise Filtering (Blacklisting)**: removes high-cardinality attributes irrelevant to structural behavior (e.g., dynamic IPs, peer ports).
  - **Quantization**: discretizes continuous metrics such as latency into buckets to avoid artificial entropy inflation.
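As an illustration, the canonicalization steps above could look like the following sketch. The attribute names, blacklist contents, and bucket size here are hypothetical; the actual logic lives in `es_utils.py`:

```python
import math
from collections import Counter

# Hypothetical values: the real blacklist and bucket size live in es_utils.py.
TAG_BLACKLIST = {"net.peer.ip", "net.peer.port"}  # high-cardinality noise
LATENCY_BUCKET_MS = 50                            # quantization step


def canonicalize(trace):
    """Turn a span tree into a deterministic, comparable string."""
    parts = []
    # Chronological & hierarchical sorting for deterministic sequencing.
    for span in sorted(trace, key=lambda s: (s["depth"], s["start_ms"])):
        # Noise filtering: drop blacklisted, high-cardinality attributes.
        tags = {k: v for k, v in span.get("tags", {}).items()
                if k not in TAG_BLACKLIST}
        # Quantization: discretize latency into buckets.
        bucket = span["duration_ms"] // LATENCY_BUCKET_MS
        parts.append(f'{span["name"]}|{bucket}|{sorted(tags.items())}')
    return ";".join(parts)


def shannon_entropy(traces):
    """Shannon entropy (in bits) over canonical trace representations."""
    counts = Counter(canonicalize(t) for t in traces)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With this canonical form, two structurally identical traces collapse to the same string, so only genuinely distinct behaviors contribute to the entropy.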
RADAR learns which tail sampling policies yield the most informative traces. It balances two main conflicting objectives:
- Maximizing Information: Measured via the Shannon Entropy of the collected trace structures.
- Minimizing Cost: Measured via the total volume of traces collected.
The reward function combines normalized entropy with a sigmoid-based penalty for trace volume. It allows RADAR to explore freely within a budget but imposes severe penalties if the volume exceeds a specified threshold.
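A minimal sketch of such a reward, assuming hypothetical parameter names (`alpha`, `beta`, `steepness`; the actual `reward_function` in the code may differ):

```python
import math


def reward(entropy, volume, entropy_max, volume_threshold,
           alpha=1.0, beta=1.0, steepness=0.01):
    """Normalized entropy minus a sigmoid penalty on trace volume.

    The penalty stays near zero well below volume_threshold (free
    exploration within the budget) and ramps toward beta once the
    threshold is exceeded. Illustrative only.
    """
    entropy_norm = entropy / entropy_max if entropy_max else 0.0
    penalty = 1.0 / (1.0 + math.exp(-steepness * (volume - volume_threshold)))
    return alpha * entropy_norm - beta * penalty
```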
To apply new sampling policies in the environment without interrupting telemetry collection:
- `manager.py` generates a new OpenTelemetry Collector YAML configuration mapping the chosen policies.
- It tags the configuration with a unique `experiment_hash`.
- It patches the Kubernetes Deployment of the collector, modifying a metadata annotation to trigger a native Rolling Update.
- It waits for the new pods to become healthy and the old pods to terminate before instructing RADAR to measure the new environment state.
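The annotation-patch step can be sketched with the official Kubernetes Python client. The annotation keys and function names below are illustrative, not necessarily those used by `manager.py`:

```python
from datetime import datetime, timezone


def build_patch(experiment_hash):
    """Strategic-merge patch bumping pod-template annotations; changing
    them makes Kubernetes roll out new pods (keys are illustrative)."""
    return {"spec": {"template": {"metadata": {"annotations": {
        "radar/experiment-hash": experiment_hash,
        "radar/restarted-at": datetime.now(timezone.utc).isoformat(),
    }}}}}


def trigger_rolling_update(deployment, namespace, experiment_hash):
    """Apply the patch via the official `kubernetes` Python client."""
    from kubernetes import client, config  # requires the kubernetes package
    config.load_kube_config()              # uses ~/.kube/config
    client.AppsV1Api().patch_namespaced_deployment(
        deployment, namespace, build_patch(experiment_hash))
```

Because only a pod-template annotation changes, the Deployment controller performs its native zero-downtime rollout: new pods come up and pass health checks before old ones are terminated.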
To run, customize, or reproduce RADAR, the following environments and libraries are expected:
- Python 3.x
- A Kubernetes cluster (and the `kubernetes` Python client) for dynamic deployment of the OpenTelemetry Collector.
- Elasticsearch (and the Elasticsearch Python client) to store and query the generated telemetry traces.
- OpenTelemetry Collector running in the cluster.
- NumPy for numerical computations.
Ensure your Kubernetes context (`~/.kube/config`) is configured correctly and that the Elasticsearch endpoints are accessible.
- Review and define your base tail sampling policies in `tail_sampling_policies.json`.
- Configure your Elasticsearch cluster details and Kubernetes target deployment settings in the respective Python files (`es_utils.py`, `manager.py`).
- Start the experiment loop:

  ```shell
  python manager.py
  ```
- **Reward Tuning**: Adjust the $\alpha$ and $\beta$ weights in the `reward_function` to prioritize entropy extraction or storage cost savings. You can also change the sigmoid trigger threshold parameter.
- **Noise Filtering**: Modify the `tag_blacklist` in `es_utils.py` to strip out domain-specific span attributes that you do not want affecting the entropy calculations.
- **Baseline Tuning**: The Exponential Moving Average baseline used to calculate the advantage function $A_t = R_t - b_t$ uses a decay factor, `baseline_decay`, which can be tuned in `agent.py`.
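Putting the pieces together, the REINFORCE update with an EMA baseline described above can be sketched as follows. This is a minimal illustration; apart from `baseline_decay`, the names and internals are assumptions, not the actual `agent.py` code:

```python
import numpy as np

rng = np.random.default_rng(0)


class RadarAgentSketch:
    """One Bernoulli activation probability per tail sampling policy,
    updated by REINFORCE with an EMA baseline (illustrative)."""

    def __init__(self, n_policies, lr=0.1, baseline_decay=0.9):
        self.theta = np.zeros(n_policies)   # logits; sigmoid gives prob.
        self.lr = lr
        self.baseline_decay = baseline_decay
        self.baseline = 0.0                 # EMA of past rewards (b_t)

    def probs(self):
        return 1.0 / (1.0 + np.exp(-self.theta))

    def act(self):
        """Sample a 0/1 activation for each policy independently."""
        return (rng.random(self.theta.size) < self.probs()).astype(float)

    def update(self, actions, reward):
        advantage = reward - self.baseline  # A_t = R_t - b_t
        # grad of log Bernoulli(a | sigmoid(theta)) w.r.t. theta is (a - p)
        self.theta += self.lr * advantage * (actions - self.probs())
        # EMA baseline update controlled by baseline_decay
        self.baseline = (self.baseline_decay * self.baseline
                         + (1 - self.baseline_decay) * reward)
```

Gradient ascent raises the activation probability of policies whose presence correlates with above-baseline rewards and lowers it otherwise, while the baseline keeps the advantage centered and the updates stable.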
For the fully detailed research methodology, empirical experiments, and comprehensive results, please refer to the full text of the Final Project in Computer Science by Renan Martins Alves (in Portuguese).