RADAR is an intelligent, Reinforcement Learning (RL) based solution for distributed trace selection. Using the REINFORCE (policy gradient) algorithm adapted to a continuous Multi-Armed Bandit problem, RADAR dynamically adjusts sampling policies to maximize the diversity (entropy) of the collected traces while strictly controlling the total volume to avoid excessive storage costs.
The system is modularized into three main components:
- **The RADAR Agent (`agent.py`)**: The decision-making core. It uses the REINFORCE algorithm to evaluate a set of available sampling policies (defined in `tail_sampling_policies.json`). It maintains an independent activation probability for each policy, updating them via gradient ascent on the received rewards and using an Exponential Moving Average baseline to stabilize learning.
- **Experiment Orchestrator (`manager.py`)**: The main controller that runs the continuous experiment lifecycle loop, acting as the bridge between RADAR's decisions and the real environment. It dynamically generates OpenTelemetry Collector configurations, injects the selected policies, and triggers zero-downtime rolling updates via the Kubernetes API.
- **Metrics Extractor and Entropy Calculator (`es_utils.py`)**: Evaluates the quality of the collected data. It queries Elasticsearch for the traces associated with the current experiment hash and, to calculate Shannon Entropy, converts complex span trees into comparable string representations through:
  - **Chronological & Hierarchical Sorting**: ensures deterministic trace sequencing.
  - **Noise Filtering (Blacklisting)**: removes high-cardinality attributes irrelevant to structural behavior (e.g., dynamic IPs, peer ports).
  - **Quantization**: discretizes continuous metrics such as latency into buckets to avoid artificial entropy inflation.
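As an illustration, the canonicalization steps above could look like the following sketch. The attribute names, blacklist contents, and bucket size here are hypothetical; the actual logic lives in `es_utils.py`:

```python
import math
from collections import Counter

# Hypothetical values: the real blacklist and bucket size live in es_utils.py.
TAG_BLACKLIST = {"net.peer.ip", "net.peer.port"}  # high-cardinality noise
LATENCY_BUCKET_MS = 50                            # quantization step


def canonicalize(trace):
    """Turn a span tree into a deterministic, comparable string."""
    parts = []
    # Chronological & hierarchical sorting for deterministic sequencing.
    for span in sorted(trace, key=lambda s: (s["depth"], s["start_ms"])):
        # Noise filtering: drop blacklisted, high-cardinality attributes.
        tags = {k: v for k, v in span.get("tags", {}).items()
                if k not in TAG_BLACKLIST}
        # Quantization: discretize latency into buckets.
        bucket = span["duration_ms"] // LATENCY_BUCKET_MS
        parts.append(f'{span["name"]}|{bucket}|{sorted(tags.items())}')
    return ";".join(parts)


def shannon_entropy(traces):
    """Shannon entropy (in bits) over canonical trace representations."""
    counts = Counter(canonicalize(t) for t in traces)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With this canonical form, two structurally identical traces collapse to the same string, so only genuinely distinct behaviors contribute to the entropy.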
RADAR learns which tail sampling policies yield the most informative traces. It balances two main conflicting objectives:
- Maximizing Information: Measured via the Shannon Entropy of the collected trace structures.
- Minimizing Cost: Measured via the total volume of traces collected.
The reward function combines normalized entropy with a sigmoid-based penalty for trace volume. It allows RADAR to explore freely within a budget but imposes severe penalties if the volume exceeds a specified threshold.
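A minimal sketch of such a reward, assuming hypothetical parameter names (`alpha`, `beta`, `steepness`; the actual `reward_function` in the code may differ):

```python
import math


def reward(entropy, volume, entropy_max, volume_threshold,
           alpha=1.0, beta=1.0, steepness=0.01):
    """Normalized entropy minus a sigmoid penalty on trace volume.

    The penalty stays near zero well below volume_threshold (free
    exploration within the budget) and ramps toward beta once the
    threshold is exceeded. Illustrative only.
    """
    entropy_norm = entropy / entropy_max if entropy_max else 0.0
    penalty = 1.0 / (1.0 + math.exp(-steepness * (volume - volume_threshold)))
    return alpha * entropy_norm - beta * penalty
```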
To apply new sampling policies in the environment without interrupting telemetry collection:
- `manager.py` generates a new OpenTelemetry Collector YAML configuration mapping the chosen policies.
- It tags the configuration with a unique `experiment_hash`.
- It patches the Kubernetes Deployment of the collector, modifying a metadata annotation to trigger a native Rolling Update.
- It waits for the new pods to become healthy and the old pods to terminate before instructing RADAR to measure the new environment state.
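The annotation-patch step can be sketched with the official Kubernetes Python client. The annotation keys and function names below are illustrative, not necessarily those used by `manager.py`:

```python
from datetime import datetime, timezone


def build_patch(experiment_hash):
    """Strategic-merge patch bumping pod-template annotations; changing
    them makes Kubernetes roll out new pods (keys are illustrative)."""
    return {"spec": {"template": {"metadata": {"annotations": {
        "radar/experiment-hash": experiment_hash,
        "radar/restarted-at": datetime.now(timezone.utc).isoformat(),
    }}}}}


def trigger_rolling_update(deployment, namespace, experiment_hash):
    """Apply the patch via the official `kubernetes` Python client."""
    from kubernetes import client, config  # requires the kubernetes package
    config.load_kube_config()              # uses ~/.kube/config
    client.AppsV1Api().patch_namespaced_deployment(
        deployment, namespace, build_patch(experiment_hash))
```

Because only a pod-template annotation changes, the Deployment controller performs its native zero-downtime rollout: new pods come up and pass health checks before old ones are terminated.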
To run, customize, or reproduce RADAR, the following environments and libraries are expected:
- Python 3.x
- A Kubernetes cluster (and the `kubernetes` Python client) for dynamic deployment of the OpenTelemetry Collector.
- Elasticsearch (and the Elasticsearch Python client) to store and query the generated telemetry traces.
- OpenTelemetry Collector running in the cluster.
- NumPy for numerical computations.
Ensure your Kubernetes context (`~/.kube/config`) is configured correctly and that the Elasticsearch endpoints are accessible.
- Review and define your base tail sampling policies in `tail_sampling_policies.json`.
- Configure your Elasticsearch cluster details and Kubernetes target deployment settings in the respective Python files (`es_utils.py`, `manager.py`).
- Start the experiment loop:

  ```shell
  python manager.py
  ```
- **Reward Tuning**: Adjust the $\alpha$ and $\beta$ weights in the `reward_function` to prioritize entropy extraction or storage cost savings. You can also change the sigmoid trigger threshold parameter.
- **Noise Filtering**: Modify the `tag_blacklist` in `es_utils.py` to strip out domain-specific span attributes that you do not want affecting the entropy calculations.
- **Baseline Tuning**: The Exponential Moving Average baseline used to calculate the advantage function $A_t = R_t - b_t$ uses a decay factor, `baseline_decay`, which can be tuned in `agent.py`.
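Putting the pieces together, the REINFORCE update with an EMA baseline described above can be sketched as follows. This is a minimal illustration; apart from `baseline_decay`, the names and internals are assumptions, not the actual `agent.py` code:

```python
import numpy as np

rng = np.random.default_rng(0)


class RadarAgentSketch:
    """One Bernoulli activation probability per tail sampling policy,
    updated by REINFORCE with an EMA baseline (illustrative)."""

    def __init__(self, n_policies, lr=0.1, baseline_decay=0.9):
        self.theta = np.zeros(n_policies)   # logits; sigmoid gives prob.
        self.lr = lr
        self.baseline_decay = baseline_decay
        self.baseline = 0.0                 # EMA of past rewards (b_t)

    def probs(self):
        return 1.0 / (1.0 + np.exp(-self.theta))

    def act(self):
        """Sample a 0/1 activation for each policy independently."""
        return (rng.random(self.theta.size) < self.probs()).astype(float)

    def update(self, actions, reward):
        advantage = reward - self.baseline  # A_t = R_t - b_t
        # grad of log Bernoulli(a | sigmoid(theta)) w.r.t. theta is (a - p)
        self.theta += self.lr * advantage * (actions - self.probs())
        # EMA baseline update controlled by baseline_decay
        self.baseline = (self.baseline_decay * self.baseline
                         + (1 - self.baseline_decay) * reward)
```

Gradient ascent raises the activation probability of policies whose presence correlates with above-baseline rewards and lowers it otherwise, while the baseline keeps the advantage centered and the updates stable.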
For the fully detailed research methodology, empirical experiments, and comprehensive results, please refer to the full text of the Final Project in Computer Science by Renan Martins Alves (in Portuguese).