This repository contains the reference implementation of the Legacy Proxy System (LPS), together with experimental scripts and automation tools for research on LPS: investigating its implementation, performance characteristics, and practical applications in architectural refactoring.
Motivation, description, and experimental results are published at the following conferences:

- ICACS 2025: Accepted; in the publishing process.
If you want to work on the current version of the project, clone the repository and pull all git submodules with the following command. If you want to reproduce experiments presented at conferences or in journals, check out the respective conference/journal git branch.
git clone https://github.com/jtpgames/legacy-proxy-pattern.git && cd legacy-proxy-pattern && ./pull_all_submodules.sh

Because this repository relies heavily on git submodules, use the following command to make sure that all changes are properly pulled:
git pull && ./pull_all_submodules.sh

To enhance the reliability of our experimental results and ensure they robustly support our claims, we conduct our experiments on three different hosts. The first serves as the primary testbed, while the remaining two are used for validation and cross-platform comparison:
- Primary testbed: Fedora CoreOS 40 virtual machine running on a MacBook Pro with an Apple M2 Pro CPU and 16 GB RAM (VM configured with 4 vCPUs and 4 GB RAM)
- Validation testbed 1: Ubuntu 24.04 running bare-metal on a ThinkPad E14 equipped with an Intel Core i5-1135G7 (8 CPUs, 16 GB RAM)
- Validation testbed 2: Ubuntu 24.04 virtual machine running on an HP ProLiant DL380 G8 with dual Intel Xeon E5-2690 CPUs (VM configured with 16 vCPUs and 16 GB RAM)
The scripts require Bash 5.0 or higher. On macOS, install it via the Homebrew package manager:
# Install latest bash version
brew install bash
# Replace the built-in version of bash with the new version of bash
echo 'alias bash="/opt/homebrew/bin/bash"' >> ~/.zshrc

The scripts require Bash 5.0 or higher. Ubuntu 24.04 comes with this version preinstalled.
# Install ripgrep as a fast grep alternative
brew install ripgrep

# -- From docker documentation --
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
# -- End of docker documentation --
sudo apt-get install -y software-properties-common build-essential bc ripgrep git docker-compose docker-ce screen python3.12-venv python3.12-dev
sudo groupadd docker
sudo usermod -aG docker $USER

During load testing, intermittent delays of approximately 15 seconds were observed between the load tester sending requests and the first proxy service (ars-comp-1) receiving them. While the proxy services themselves processed requests efficiently (typically 40-50 ms), these network-layer delays significantly impacted overall response times.
- Request Tracing: By analyzing load tester logs and proxy logs with matching request IDs, we confirmed that:
  - The load tester sends a request at timestamp A
  - The proxy receives the same request at timestamp A + 15 seconds
  - The proxy processes the request normally in ~40 ms
- Infrastructure Investigation: Initial suspicions focused on:
  - Podman user-mode networking performance on macOS
  - FastAPI/Uvicorn configuration issues
  - Container resource limitations
- Root Cause Discovery: The issue was traced to TCP listen queue overflow behavior:
  - The system setting net.ipv4.tcp_abort_on_overflow = 0 causes silent packet drops
  - When the listen queue is full, SYN packets are dropped rather than rejected
  - Clients experience exponential backoff delays (3s, 6s, 12s…) waiting for retransmission
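As an illustration of how quickly these retransmission delays accumulate, the backoff schedule can be computed directly. The initial timeout of 1 s used below is the common Linux default and is an assumption here; actual timer values are kernel-dependent:

```python
def syn_retry_offsets(initial_timeout: float, retries: int) -> list:
    """Times, relative to the first SYN, at which a client retransmits
    when every earlier SYN is silently dropped; the timeout doubles on
    each attempt (exponential backoff)."""
    offsets, timeout, t = [], initial_timeout, 0.0
    for _ in range(retries):
        t += timeout
        offsets.append(t)
        timeout *= 2
    return offsets

# With an assumed initial SYN timeout of 1 s, the fourth retransmission
# happens 15 s after the first SYN -- consistent with the ~15 s delays
# observed between the load tester and ars-comp-1.
print(syn_retry_offsets(1.0, 4))  # [1.0, 3.0, 7.0, 15.0]
```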
Modify the TCP behavior in the Podman machine to immediately reject connections when the listen queue is full, rather than causing delays:
# Temporary fix (active until VM restart)
podman machine ssh sudo sysctl -w net.ipv4.tcp_abort_on_overflow=1
# Permanent fix (persists across reboots)
podman machine ssh "echo 'net.ipv4.tcp_abort_on_overflow = 1' | sudo tee -a /etc/sysctl.conf"
# Verify the change
podman machine ssh sysctl net.ipv4.tcp_abort_on_overflow

For improved performance under high connection loads, apply these additional TCP optimizations:
# Apply optimizations immediately
podman machine ssh "sudo sysctl -w net.core.netdev_max_backlog=2000"
podman machine ssh "sudo sysctl -w net.ipv4.tcp_max_syn_backlog=2048"
podman machine ssh "sudo sysctl -w net.ipv4.tcp_fin_timeout=30"
podman machine ssh "sudo sysctl -w net.ipv4.ip_local_port_range='1024 61000'"
# Make settings persistent across reboots
podman machine ssh "echo 'net.core.netdev_max_backlog = 2000' | sudo tee -a /etc/sysctl.conf"
podman machine ssh "echo 'net.ipv4.tcp_max_syn_backlog = 2048' | sudo tee -a /etc/sysctl.conf"
podman machine ssh "echo 'net.ipv4.tcp_fin_timeout = 30' | sudo tee -a /etc/sysctl.conf"
podman machine ssh "echo 'net.ipv4.ip_local_port_range = 1024 61000' | sudo tee -a /etc/sysctl.conf"

Settings Explanation:
- net.core.netdev_max_backlog = 2000: Increases the network device packet queue size (default: 1000)
- net.ipv4.tcp_max_syn_backlog = 2048: Expands the SYN packet backlog (default: 256)
- net.ipv4.tcp_fin_timeout = 30: Reduces connection cleanup time (default: 60 seconds)
- net.ipv4.ip_local_port_range = 1024 61000: Expands the available port range for outbound connections
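These settings can also be verified programmatically by reading the corresponding /proc/sys files, which is what the verification commands later in this document do. The following is an illustrative sketch (the expected values mirror the list above; the tab-separated format of ip_local_port_range matches how the kernel exposes it):

```python
from pathlib import Path

# Expected values, mirroring the optimizations listed above
EXPECTED_SYSCTLS = {
    "net.core.netdev_max_backlog": "2000",
    "net.ipv4.tcp_max_syn_backlog": "2048",
    "net.ipv4.tcp_fin_timeout": "30",
    "net.ipv4.ip_local_port_range": "1024\t61000",
}

def sysctl_proc_path(name: str) -> str:
    """Map a sysctl key to the /proc/sys file that exposes it."""
    return "/proc/sys/" + name.replace(".", "/")

def check_sysctls(read_file=lambda p: Path(p).read_text()) -> dict:
    """Return {key: (expected, actual)}; `read_file` is injectable so the
    check can also be run against captured `docker exec` output."""
    results = {}
    for key, expected in EXPECTED_SYSCTLS.items():
        try:
            actual = read_file(sysctl_proc_path(key)).strip()
        except OSError:
            actual = None
        results[key] = (expected, actual)
    return results
```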
The TCP optimizations must also be applied at the container level, as Docker/Podman containers use separate network namespaces that do not inherit host-level sysctl settings. Both docker-compose-legacy.yml and docker-compose-ng.yml have been configured with container-level TCP optimizations.
Container-Level Settings:
All Python-based proxy containers now include the following sysctl configurations:
sysctls:
- net.ipv4.tcp_max_syn_backlog=2048
- net.ipv4.tcp_fin_timeout=30
- net.ipv4.ip_local_port_range=1024 61000
- net.ipv4.tcp_abort_on_overflow=1

Note: The net.core.netdev_max_backlog setting cannot be applied at the container level, as it requires host-level privileges. This optimization remains effective at the Podman machine level for host-to-container traffic.
Verification:
To verify that container-level TCP settings are applied correctly:
# Start a test container
cd python && docker-compose -f docker-compose-legacy.yml up -d ars-comp-1-1
# Check settings inside the container
docker exec <container-id> cat /proc/sys/net/ipv4/tcp_max_syn_backlog
docker exec <container-id> cat /proc/sys/net/ipv4/tcp_abort_on_overflow
# Clean up
docker-compose -f docker-compose-legacy.yml down

The Podman machine environment uses Fedora CoreOS, which provides several advantages for TCP performance optimization:
System Information:

- OS: Fedora CoreOS 40.20241019.3.0
- Kernel: 6.11.3-200.fc40.aarch64
- Architecture: Container-optimized minimal OS
Performance Benefits:
- Modern TCP Stack: Kernel 6.11.3 includes the latest TCP improvements and RFC implementations
- Container-Optimized: Designed specifically for containerized workloads, with tuned networking defaults
- Minimal Overhead: Fewer background processes compared to desktop distributions
- Immutable Design: Persistent sysctl configurations survive reboots reliably
Network Stack Features:
# Current congestion control (can be optimized further)
podman machine ssh "sysctl net.ipv4.tcp_congestion_control"
# Output: net.ipv4.tcp_congestion_control = cubic
# Kernel version verification
podman machine ssh "uname -r"
# Output: 6.11.3-200.fc40.aarch64

Note: The modern kernel and container-focused design of Fedora CoreOS provide an ideal foundation for the TCP optimizations implemented in this project. All host-level and container-level settings are fully compatible and effective.
References:

- Fedora CoreOS Documentation: Official Fedora CoreOS Guide
- Linux kernel 6.11 TCP improvements: Linux 6.11 Release Notes
If connection errors occur after the above change, consider increasing the application-level backlog:
# In FastAPI/Uvicorn applications
uvicorn.run(app, host=host, port=port, backlog=8192, ...)

- Linux TCP implementation: tcp_abort_on_overflow documentation
- TCP listen queue behavior: Stevens, W. Richard. "TCP/IP Illustrated, Volume 1: The Protocols." Addison-Wesley, 2011.
- Uvicorn configuration: Uvicorn Settings Documentation
- TCP optimization for high-performance servers: DigitalOcean TCP Connection Limits
- Linux network performance tuning: ESnet Linux Tuning Guide
- TCP parameter tuning: Carder, Bradley. "Linux Network Performance Tuning." Linux Journal, 2016.
- High-performance networking: Linux Networking Stack Tuning
During high-volume performance testing with hundreds of concurrent HTTP requests, temporary DNS resolution failures occurred when Python HTTP clients (httpx, requests) attempted to resolve Docker service names. These errors manifested as:
- Error Type: socket.gaierror: [Errno -3] Temporary failure in name resolution
- Affected Services: Load testers (Locust), ars-comp-1 proxies, ars-comp-2 proxies, and legacy-proxy-2 (MQTT-to-HTTP bridge)
- Trigger: High request rates (100+ RPS) with multiple concurrent connections to Docker service hostnames
- Impact: Request failures, test instability, and unreliable performance measurements
Docker Embedded DNS Server Limitations
Docker provides an embedded DNS resolver for containers on user-defined networks. Each container typically lists 127.0.0.11 as its nameserver, and the Docker engine forwards these queries to the host’s configured DNS servers. While this design simplifies service discovery, the embedded resolver introduces several operational constraints:
- Finite Processing Capacity – The resolver is implemented within the Docker engine (via libnetwork) and therefore has limited throughput. Under very high query rates or concurrent workloads, users have reported timeouts and dropped queries due to resource contention.
- Limited Caching Behavior – The resolver applies time-to-live (TTL) values to responses and caches entries for their duration, but it does not maintain an extensive cache across all lookups. This can result in repeated upstream queries and increased latency under load.
- UDP-Dominant Transport – Most DNS traffic in Docker networks uses UDP on port 53. Although DNS can fall back to TCP for truncated or large responses, the embedded resolver has exhibited degraded performance or timeouts in such cases, especially under high concurrency.
- Shared Resource Model – All containers attached to the same Docker network rely on the same embedded resolver instance on the host, making DNS resolution a centralized and potentially contended resource.
Performance Test Characteristics
Our experiments involve:
- Service Topology: Multi-tier architecture with services like ars-comp-1-{1,2,3}, ars-comp-2-{1,2,3}, proxy1-{1,2,3}, proxy2-{1,2,3}, and ars-comp-3
- Request Volume: Performance tests generate 100-500 requests per second
- Connection Pattern: Each request may trigger DNS lookups for service names like ars-comp-2-1, ars-comp-2-2, mosquitto, etc.
- Concurrent Clients: Multiple load testing workers and proxy instances making simultaneous requests
When combining high request rates with frequent service name resolution, the embedded DNS server can exceed its capacity, resulting in temporary failures.
To mitigate these DNS resolution failures, we implemented a local DNS cache at the application level in each Python service. This approach resolves hostnames to IP addresses once and caches the results in memory, eliminating repeated queries to the Docker DNS server.
Implementation Details
Cache Structure:
import logging
import os
import socket
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# DNS cache for hostname-to-IP resolution
dns_cache = {}

def resolve_hostname_to_ip(url: str) -> str:
    """Resolve the hostname in a URL to an IP address and cache the result."""
    parsed = urlparse(url)
    hostname = parsed.hostname
    if hostname is None:
        return url

    # Skip DNS caching if the environment variable is set
    if os.getenv('SKIP_DNS_CACHE', '').lower() in ('1', 'true', 'yes'):
        return url

    if hostname in dns_cache:
        logger.debug(f"DNS cache hit for {hostname} -> {dns_cache[hostname]}")
        return url.replace(hostname, dns_cache[hostname])

    try:
        # Resolve the hostname to an IP address
        ip_address = socket.gethostbyname(hostname)
        dns_cache[hostname] = ip_address
        logger.info(f"DNS resolved {hostname} -> {ip_address}")
        return url.replace(hostname, ip_address)
    except socket.gaierror as e:
        logger.warning(f"DNS resolution failed for {hostname}: {e}. Using original URL.")
        return url

Services with DNS Caching:
- locust_scripts/common/common_locust.py - Load tester client (RepeatingHttpxClient)
- python/ars_comp_1_proxy.py - ARS Component 1 proxy
- python/ars_comp_2_proxy.py - ARS Component 2 proxy
- python/mqtt/legacy_proxy_2.py - MQTT-to-HTTP bridge
Activation Mechanism:
The DNS cache is currently enabled by default and can be disabled by setting the SKIP_DNS_CACHE environment variable.
Benefits:
- Reduced DNS Load: Eliminates repeated lookups for the same hostname
- Improved Reliability: Prevents transient DNS failures under high load
- Lower Latency: Avoids DNS query overhead for cached hostnames
- Consistent Performance: Enables stable, repeatable performance test results
DNS Cache Invalidation Issue

For failover experiments, the local DNS cache must be disabled, because when Docker containers are stopped and restarted during failover testing:

- IP Address Changes: Docker assigns a new IP address to the restarted container
- Stale Cache Entries: The DNS cache still contains the old IP address
- Connection Failures: Requests to cached IPs fail because the container is no longer reachable at that address
Disabling the DNS cache ensures that:

- DNS resolution happens on every request
- Container restarts with new IP addresses are handled correctly
- Failover behavior is tested accurately
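A possible middle ground between a permanent cache and no cache at all — not implemented in this repository, sketched here only for illustration — is a TTL-bounded cache: entries are re-resolved after a short lifetime, so a restarted container with a new IP is picked up within at most `ttl` seconds while most lookups are still absorbed:

```python
import socket
import time

class TTLDNSCache:
    """Hostname -> IP cache with a time-to-live (illustrative sketch)."""

    def __init__(self, ttl: float = 10.0, resolver=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl = ttl
        self.resolver = resolver  # injectable for testing
        self.clock = clock
        self._entries = {}        # hostname -> (ip, expiry)

    def resolve(self, hostname: str) -> str:
        now = self.clock()
        entry = self._entries.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]           # fresh cache hit
        ip = self.resolver(hostname)  # (re-)resolve after expiry
        self._entries[hostname] = (ip, now + self.ttl)
        return ip
```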
Recommended Approach:
For different experiment types, consider these configurations:
| Experiment Type | DNS Cache Status | Rationale |
|---|---|---|
| Performance Tests | Enabled | Maximizes throughput and prevents DNS-related failures |
| Failover Tests | Disabled (default) | Ensures proper handling of container restarts with new IPs |
Service Discovery Mechanism:
Docker Compose creates a default bridge network where each service is assigned:
- Hostname: The service name from docker-compose.yml (e.g., ars-comp-1-1, proxy1-2, mosquitto)
- DNS Entry: Automatically registered with the embedded DNS server at 127.0.0.11
- Dynamic IP: Assigned from the bridge network subnet (e.g., 172.18.0.5)
Example Service Names from Our Configuration:
Legacy Pattern (docker-compose-legacy.yml):
* ars-comp-1-1, ars-comp-1-2, ars-comp-1-3 - Client-facing proxies
* ars-comp-2-1, ars-comp-2-2, ars-comp-2-3 - Direct HTTP forwarding proxies
* ars-comp-3 - RAST simulator
New Generation Pattern (docker-compose-ng.yml):
* ars-comp-1-1, ars-comp-1-2, ars-comp-1-3 - Client-facing proxies
* proxy1-1, proxy1-2, proxy1-3 - HTTP-to-MQTT bridges
* proxy2-1, proxy2-2, proxy2-3 - MQTT-to-HTTP bridges
* mosquitto - MQTT broker
* ars-comp-3 - RAST simulator
- Serverfault forum: https://serverfault.com/questions/784495/poor-performance-with-docker-internal-dns
- Docker Embedded DNS: Docker DNS Services Documentation
- Python DNS Resolution: socket.gethostbyname() Documentation
LPS relies on MQTT shared subscriptions for horizontal scaling of the MQTT-to-HTTP bridge (legacy_proxy_2). In our Compose setup:
- Publishers (legacy_proxy_1, services proxy1-{1,2,3}) publish to per-instance topics:
  - proxy1/message, proxy2/message, proxy3/message
- Subscribers (legacy_proxy_2, services proxy2-{1,2,3}) subscribe using a shared subscription with a wildcard:
  - $share/legacy_proxy/+/message
  - (plus the retry topic $share/legacy_proxy/+/retry/message)
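To make the topology concrete, the following stdlib-only sketch shows which publish topics the shared subscription filter covers. The broker performs the real matching; this matcher only implements the `+`/`#` wildcard rules for illustration:

```python
def strip_share(subscription: str) -> str:
    """Remove the '$share/<group>/' prefix to obtain the topic filter."""
    if subscription.startswith("$share/"):
        return subscription.split("/", 2)[2]
    return subscription

def topic_matches(filter_: str, topic: str) -> bool:
    """Minimal MQTT topic-filter matcher ('+' = one level, '#' = rest)."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts) or (f != "+" and f != t_parts[i]):
            return False
    return len(f_parts) == len(t_parts)

subscription = "$share/legacy_proxy/+/message"
for topic in ("proxy1/message", "proxy2/message", "proxy3/message"):
    # every per-instance publish topic is covered by the one shared filter
    assert topic_matches(strip_share(subscription), topic)
```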
QoS is configured as QoS 2 in the application configuration; however, in packet-level logs we observed that HiveMQ may grant a lower subscription QoS (e.g., SUBACK “Granted QoS 1”) and deliver messages at QoS 1. This is documented HiveMQ behavior.
The expected behavior for our experiments is an approximately even distribution of messages across the three legacy_proxy_2 instances.
We reproduced the load-dependent distribution behavior described below with both HiveMQ Community Edition (hivemq-ce) and HiveMQ Edge (hivemq-edge).
In performance experiments, message distribution can collapse entirely:
- One legacy_proxy_2 instance (e.g., proxy2-1) receives essentially all messages.
- The other two instances (e.g., proxy2-2, proxy2-3) receive none.
We confirmed this by counting “Successfully forwarded …” log lines in the NG experiment logs (for example under Automations/NG_Experiment/…/LegacyProxy_Logs/proxy2-.log).
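This counting can be reproduced with a short script. The literal phrase "Successfully forwarded" matches the description above; the glob pattern for the per-instance file names is an assumption:

```python
from collections import Counter
from pathlib import Path

def count_forwarded(log_dir: str, pattern: str = "proxy2-*.log") -> Counter:
    """Count 'Successfully forwarded' lines per legacy_proxy_2 log file."""
    counts = Counter()
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with log_file.open() as f:
            counts[log_file.name] = sum(
                1 for line in f if "Successfully forwarded" in line
            )
    return counts

# An even distribution would show roughly equal counts per instance;
# in the collapsed case one file dominates and the others are near zero.
```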
To isolate the problem from our application logic, we created a minimal reproducible test harness in hivemq/:
- hivemq/test_shared_subscription.sh runs a controlled setup against HiveMQ and can switch:
  - subscriber implementation: mosquitto_sub vs paho-mqtt
  - publisher implementation: mosquitto_pub vs paho-mqtt vs aiomqtt (including a persistent connection mode)
  - QoS: --qos 1 or --qos 2
  - topics: matching the real NG topology described above
- Each execution stores full logs in a dedicated timestamped folder under hivemq/test_logs/.
These experiments showed that the distribution is strongly dependent on load and publish rate, not just on the MQTT client libraries.
- Low publish rate: With a low rate (e.g., ~1 message/second), distribution appears to work and messages are spread across subscribers.
- Moderate rate / burstiness: With a higher rate (e.g., ~3 messages/second), messages often become “chunked”:
  - all messages within a short time slice may be delivered to one subscriber
  - the next time slice may be delivered to another subscriber
- High broker load: Under sustained high throughput / heavy broker load, HiveMQ (CE and Edge) may become sticky and route almost everything to a single subscriber for an extended period (effectively disabling load balancing). See HiveMQ CPU Bottleneck: Single-Core Limitation.
- Client libraries are not the primary driver: We observed the load-dependent behavior across different combinations of publisher/subscriber clients (mosquitto CLI, paho-mqtt, aiomqtt). Changing the client implementation alone did not eliminate the effect.
- Client-side in-flight limits may influence, but not fix, the behavior: The relevant “max in-flight” setting is negotiated/controlled at the client level (e.g., MQTT v5 Receive Maximum or client library in-flight settings), not via a dedicated HiveMQ broker-side knob for shared-subscription fairness. Tuning it can change how uneven the distribution looks under load, but does not guarantee round-robin fairness.
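The effect of a client-side in-flight cap can be mimicked with a plain asyncio semaphore. This is an illustrative stdlib-only sketch, not the services' actual code; in the real clients the cap corresponds to the MQTT v5 Receive Maximum or the library's in-flight setting:

```python
import asyncio

async def forward(msg: str, results: list) -> None:
    await asyncio.sleep(0)  # stand-in for the HTTP forward to ars-comp-2
    results.append(msg)

async def consume(messages, max_in_flight: int) -> list:
    """Process messages with at most `max_in_flight` in flight at once,
    analogous to an MQTT v5 Receive Maximum of the same value."""
    sem = asyncio.Semaphore(max_in_flight)
    results: list = []

    async def handle(msg):
        async with sem:
            await forward(msg, results)

    await asyncio.gather(*(handle(m) for m in messages))
    return results

msgs = [f"proxy1/message:{i}" for i in range(10)]
print(len(asyncio.run(consume(msgs, max_in_flight=2))))  # 10
```

Lowering the cap throttles how fast a single greedy subscriber can drain the broker, which changes the observed distribution but, as noted above, does not make it fair.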
Based on these experiments and the persistent reports in the HiveMQ CE community, we treat this as a known limitation or an unresolved bug in HiveMQ shared subscription distribution under load (observed on both HiveMQ CE and HiveMQ Edge).
At the time of writing, the behavior described above matches reports in HiveMQ CE issue tracker discussions, and the relevant issue is still open. Therefore, for experiments or production-like workloads that require strict/robust load balancing, shared subscriptions in HiveMQ CE/Edge should be validated under realistic load, and application-level mitigations (rate limiting, partitioned topics per consumer group, or alternative broker choices) should be considered.
Problem Description
During performance experiments with queue-draining mechanisms (a mechanism we employ after each experiment to make sure that all messages are forwarded to the last service in the chain), we observe that HiveMQ’s message distribution issue correlates with a CPU bottleneck:
- Symptom: "Sticky" message distribution: only a single legacy_proxy_2 subscriber receives messages.
- CPU Usage: HiveMQ utilizes nearly 100% of a single CPU core, despite the virtual machine having 4 CPU cores available.
- Thread Analysis: Using top -H -p $(pgrep -f java) inside the HiveMQ container revealed that the hivemq-eventloop thread uses 100% CPU.
- System Resources: While the host VM has 4 CPU cores (verified with nproc), HiveMQ only saturates one core during high load.
Example Docker Stats Output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS TIME %CPU
b1a3b836ff8d python-hivemq-1 95.92% 1.079GB / 3.793GB 28.44% 314.2MB / 198.2MB 136MB / 3.15MB 51 1h8m32.88s 74.58%
c8c77c430d12 python-proxy1-1-1 0.01% 57.69MB / 3.793GB 1.52% 225.1MB / 179.1MB 5.947MB / 158.8MB 7 3m48.16s 4.14%
5017a60377b0 python-proxy2-1-1 1.81% 44.92MB / 3.793GB 1.18% 93.02MB / 67.47MB 1.176MB / 145MB 3 2m19.62s 2.53%
3b881671cd10 python-proxy2-2-1 0.00% 30.98MB / 3.793GB 0.82% 27.21MB / 749kB 823.3kB / 145.4MB 3 7.10s 0.13%

Note that python-hivemq-1 shows 95.92% CPU usage, which represents saturation of a single core on a 4-core system.
Thread-Level Analysis:
# Connect to HiveMQ container and inspect thread CPU usage
docker exec -it python-hivemq-1 sh
top -H -p $(pgrep -f java)
# Verify available CPU cores
nproc # Output: 4

The top -H output showed the hivemq-eventloop thread consuming 100% of a single core, indicating that message distribution processing is not parallelized across available cores.
Analysis
This CPU bottleneck provides additional context for the sticky distribution behavior observed under load:
- Single-Threaded Bottleneck: The hivemq-eventloop appears to handle shared subscription message distribution in a single-threaded manner, creating a serialization point.
- Load-Dependent Failure: As message throughput increases, the single-threaded eventloop becomes saturated, potentially causing HiveMQ to fall back to simpler (but less balanced) distribution strategies.
- Insufficient Parallelization: HiveMQ CE/Edge does not appear to leverage multiple CPU cores for shared subscription distribution, even when cores are available.
Implications
- The sticky distribution behavior observed in our experiments may be a consequence of CPU saturation in the eventloop thread.
- Increasing the CPU count alone will not resolve the issue if the distribution logic remains single-threaded.
- This reinforces the need for application-level mitigations or alternative broker choices for workloads requiring robust load balancing under high throughput. Moreover, a single-node HiveMQ instance is not a production setup, so our observations apply mainly to our particular test environment and may not transfer to production-like deployments.
- Further performance improvements in legacy_proxy_2 are pointless as long as the broker remains the bottleneck in the experiment.
- During an experiment lasting 1 hour and 48 minutes, we observed that the system behaves as an unstable M/M/1 queue (ρ > 1) despite being designed as an M/M/c system. The measured arrival rate was λ = 302.51 msg/s and the service rate μ = 22.36 msg/s, resulting in a utilization of ρ ≈ 13.53, indicating severe overload. However, our performance experiment does not crash and the queue limits are not exceeded, although this would likely happen if the experiment ran longer.
- Draining the queue after the load test takes, for example, 2707 seconds (~45 minutes).
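The reported utilization can be re-derived from the measured rates. λ and μ are the values above; the M/M/c check assumes c = 3, the number of legacy_proxy_2 instances in the NG setup:

```python
lam = 302.51  # measured arrival rate (msg/s)
mu = 22.36    # measured service rate per server (msg/s)
c = 3         # legacy_proxy_2 instances (assumed server count)

rho_mm1 = lam / mu        # utilization if one instance handles everything
rho_mmc = lam / (c * mu)  # utilization with ideal balancing across c servers

print(f"rho (M/M/1) = {rho_mm1:.2f}")  # ~13.53: severely overloaded
print(f"rho (M/M/c) = {rho_mmc:.2f}")  # ~4.51: unstable even if balanced
```

Since ρ > 1 even in the ideal M/M/c case, the backlog grows for the entire run, which explains the long drain times after each load test.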
Test Environment
- Host: Fedora CoreOS 40 VM running on a MacBook Pro M2 Pro
- VM Configuration: 4 vCPUs, 4 GB RAM
- Container Runtime: Docker/Podman
- HiveMQ Version: Community Edition (hivemq-ce) and Edge (hivemq-edge)

References:

- HiveMQ CE issue (still open): hivemq/hivemq-community-edition#558
- HiveMQ community report: https://community.hivemq.com/t/inconsistent-message-distribution-with-shared-subscriptions-in-hivemq-ce-on-kubernetes/2364
The proxy services in this project support HTTP/2 cleartext (h2c) connections through Granian, which provides performance benefits over HTTP/1.1 including request multiplexing and header compression.
The Docker Compose configurations (docker-compose-legacy.yml and docker-compose-ng.yml) already use the start_python_app_with_tc.sh script with the --use-granian flag to run all proxy services with HTTP/2 support:
# Services in docker-compose files use this format:
command: ./start_python_app_with_tc.sh ars_comp_1_proxy.py --use-granian

When you run the Docker Compose setup, all proxy services automatically start with Granian using HTTP/2 instead of the default Uvicorn with HTTP/1.1. The --http 2 flag is automatically passed to Granian in the script.
To manually run a service with HTTP/2 support outside of Docker Compose:
# Run a proxy service with HTTP/2 support using Granian
./start_python_app_with_tc.sh ars_comp_1_proxy.py --use-granian

When httpx clients communicate with HTTP/2 services, they must be configured to use HTTP/2 exclusively. This is achieved by setting both the http2=True and http1=False parameters when constructing the client:
import httpx
# For synchronous clients
client = httpx.Client(http2=True, http1=False)
# For asynchronous clients
async_client = httpx.AsyncClient(http2=True, http1=False)

Important: Without the http1=False parameter, httpx may attempt HTTP/1.1 negotiation, which can result in protocol errors like "illegal request line" when communicating with HTTP/2 services.
The httpx clients in proxy services can be configured to use HTTP/2 via the USE_HTTP_2 environment variable. This provides fine-grained control over which client connections use HTTP/2:
# In docker-compose files
environment:
- USE_HTTP_2=true # Enable HTTP/2 for httpx client

When USE_HTTP_2 is set to true, 1, or yes, the httpx client will be configured with:
* http2=True - Enable HTTP/2 protocol
* http1=False - Disable HTTP/1.1 to force HTTP/2 cleartext (h2c)
This setting controls the client-side HTTP protocol used for outbound requests, independent of how the service itself accepts incoming connections (which is controlled by the --use-granian flag).
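The pattern described above can be sketched as a small helper that turns the environment variable into httpx constructor arguments (illustrative; the services' actual parsing may differ in detail):

```python
import os

def httpx_client_kwargs() -> dict:
    """Map USE_HTTP_2 to httpx client protocol arguments."""
    if os.getenv("USE_HTTP_2", "").lower() in ("true", "1", "yes"):
        # Force HTTP/2 cleartext (h2c): no HTTP/1.1 fallback
        return {"http2": True, "http1": False}
    return {"http2": False, "http1": True}

# Usage (httpx import omitted here):
#   client = httpx.AsyncClient(**httpx_client_kwargs())
```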
Baseline Architecture:
┌──────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Locust │ HTTP/2 or │ ars-comp-1 │ HTTP/2 or │ ars-comp-2 │ HTTP/1.1 │ RAST │
│ Load Tester │────HTTP/1.1────▶│ (ARS SRV 1) │────HTTP/1.1────▶│ (ARS SRV 2) │────(forced)────▶│ Simulator │
│ │ (USE_HTTP_2) │ │ (USE_HTTP_2) │ │ │ │
└──────────────┘ └─────────────────┘ └─────────────────┘ └──────────────┘
│ │ │ │
│ │ │ │
Granian Granian Granian Ktor
(--use-granian) (--use-granian) (--use-granian) (HTTP/1.1)
HTTP/2 h2c HTTP/2 h2c HTTP/2 h2c
or Uvicorn or Uvicorn or Uvicorn
HTTP/1.1 HTTP/1.1 HTTP/1.1
httpx Client httpx Client httpx Client
HTTP/2 or HTTP/1.1 HTTP/2 or HTTP/1.1 HTTP/1.1 (forced)
(USE_HTTP_2) (USE_HTTP_2) (ignores USE_HTTP_2)

Refactored Architecture with LPS:
┌──────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌──────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Locust │ HTTP/2 or │ ars-comp-1 │ HTTP/2 or │ legacy-proxy-1 │ │ MQTT │ │ legacy-proxy-2 │ HTTP/2 or │ ars-comp-2 │ HTTP/1.1 │ RAST │
│ Load Tester │────HTTP/1.1────▶│ (ARS SRV 1) │────HTTP/1.1────▶│ (HTTP→MQTT) │─────▶│ Broker │─────▶│ (MQTT→HTTP) │────HTTP/1.1────▶│ (ARS SRV 2) │────(forced)────▶│ Simulator │
│ │ (USE_HTTP_2) │ │ (USE_HTTP_2) │ │ QoS2 │ MQTTv5 │ QoS2 │ │ (USE_HTTP_2) │ │ │ │
└──────────────┘ └─────────────────┘ └─────────────────┘ └──────────┘ └─────────────────┘ └─────────────────┘ └──────────────┘
│ │ │ │ │ │
│ │ │ │ │ │
Granian Granian Granian No Server Granian Ktor
(--use-granian) (--use-granian) (--use-granian) (MQTT Client) (--use-granian) (HTTP/1.1)
HTTP/2 h2c HTTP/2 h2c HTTP/2 h2c aiomqtt HTTP/2 h2c
or Uvicorn or Uvicorn or Uvicorn MQTTv5 or Uvicorn
HTTP/1.1 HTTP/1.1 HTTP/1.1 HTTP/1.1
httpx Client
httpx Client httpx Client aiomqtt HTTP/2 or HTTP/1.1 httpx Client
HTTP/2 or HTTP/1.1 HTTP/2 or HTTP/1.1 MQTTv5 (USE_HTTP_2) HTTP/1.1 (forced)
(USE_HTTP_2) (USE_HTTP_2) (publishes to MQTT) (ignores USE_HTTP_2)

Protocol Notes:
- Server-side protocol (incoming): Controlled by the --use-granian flag (HTTP/2) or the default Uvicorn (HTTP/1.1)
- Client-side protocol (outgoing): Controlled by the USE_HTTP_2 environment variable in httpx clients
- ars-comp-2 exception: Always uses HTTP/1.1 for RAST communication, regardless of the USE_HTTP_2 setting
- MQTT Bridge: legacy-proxy-2 is an MQTT subscriber that initiates HTTP requests (no incoming HTTP server)
Files Configured for HTTP/2:
- python/ars_comp_1_proxy.py - Uses HTTP/2 for downstream requests (configured via USE_HTTP_2)
- python/ars_comp_2_proxy.py - HTTP/2 DISABLED: Uses HTTP/1.1 only for downstream requests to the RAST simulator (hardcoded; does not respect USE_HTTP_2)
- python/mqtt/legacy_proxy_2.py - Uses HTTP/2 for downstream requests (configured via USE_HTTP_2)
- locust_scripts/common/common_locust.py - Load test client configured for HTTP/2 connections (configured via USE_HTTP_2)
This project utilized Warp AI (warp.dev) as an AI assistant in its development process. Specifically:
- Documentation: Warp AI was used to help write and improve portions of this README.adoc file, including troubleshooting guides and technical explanations.
- Code Development: Warp AI served as a coding assistant for developing and debugging bash and Python scripts.
While AI tools provided valuable assistance, all code and documentation have been reviewed, tested, and validated by the author. The final implementation decisions, architectural choices, and experimental design originate from the author.