
feat: Add Pool On-Empty Behavior Configuration for Redis Connections #1018

Merged
collin-lee merged 4 commits into envoyproxy:main from notdu:main
Dec 7, 2025


Contributor

@notdu notdu commented Dec 6, 2025

This PR is a follow-up to #987 (REDIS_TIMEOUT configuration).

Current Behavior

The ratelimit service uses radix for Redis connection pooling. The default radix behavior is PoolOnEmptyCreateAfter(1*time.Second):

  • REDIS_POOL_SIZE=50 → 50 persistent connections in the pool
  • When all 50 connections are in use → pool becomes "empty"
  • After 1 second wait → radix creates a new overflow connection (51st, 52nd, ...)
  • These overflow connections bypass the pool size limit

The ratelimit service currently passes no PoolOnEmpty options to radix, so there is no way to change this behavior.
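The create-after-wait rule described above can be illustrated with a toy model. This is a minimal sketch, not radix's implementation: `toyPool`, `newToyPool`, `get`, and `demoWait` are hypothetical names, the wait is shortened to 10ms, and the pool is used from a single goroutine only.

```go
package main

import (
	"fmt"
	"time"
)

// demoWait stands in for the 1s default wait; shortened so the demo runs quickly.
const demoWait = 10 * time.Millisecond

// toyPool is a deliberately simplified stand-in for a connection pool.
// It only models the on-empty rule and is not safe for concurrent use.
type toyPool struct {
	idle        chan int      // pooled connections, identified by number
	createAfter time.Duration // how long to wait before overflowing
	nextID      int           // ID given to the next overflow connection
}

func newToyPool(size int, createAfter time.Duration) *toyPool {
	p := &toyPool{idle: make(chan int, size), createAfter: createAfter, nextID: size + 1}
	for id := 1; id <= size; id++ {
		p.idle <- id
	}
	return p
}

// get returns a pooled connection if one is available within createAfter.
// Otherwise it hands out a brand-new overflow connection, ignoring the size.
func (p *toyPool) get() (int, bool) {
	timer := time.NewTimer(p.createAfter)
	defer timer.Stop()
	select {
	case id := <-p.idle:
		return id, false
	case <-timer.C:
		id := p.nextID
		p.nextID++
		return id, true
	}
}

func main() {
	p := newToyPool(2, demoWait)
	for i := 1; i <= 4; i++ {
		// Connections are never returned, as if Redis had stopped responding.
		id, overflow := p.get()
		fmt.Printf("request %d -> connection %d (overflow=%v)\n", i, id, overflow)
	}
}
```

With a pool size of 2, the third and fourth requests each wait out the timer and then receive overflow connections 3 and 4, which is the way the configured size gets exceeded.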

Problem

When Redis becomes unresponsive (paused, network partition, or slow), this default behavior causes connection storms:

  1. All pool connections become blocked waiting for Redis
  2. New requests find an empty pool
  3. After 1 second, radix creates overflow connections for each waiting request
  4. With high request rates (e.g., 1k req/s), this can spawn thousands of goroutines
  5. Memory usage spikes, potentially causing OOM

This is problematic because:

  • REDIS_POOL_SIZE appears to limit connections but doesn't enforce it
  • Operators have no control over pool exhaustion behavior
  • The service cannot fail-fast during Redis outages
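A rough back-of-the-envelope estimate shows why the goroutine count climbs so quickly. The sketch below applies Little's law (concurrency = arrival rate × time each request is held). It assumes, purely for illustration, that every request waits 1s for the pool and then holds an overflow connection for a further 1s before timing out; `inFlight` and `assumedHold` are hypothetical names, and real numbers depend on the workload.

```go
package main

import (
	"fmt"
	"time"
)

// assumedHold is an assumption: 1s waiting on the empty pool plus a 1s Redis timeout.
const assumedHold = 2 * time.Second

// inFlight estimates how many requests are blocked at the same time,
// using Little's law: arrival rate multiplied by the time each one is held.
func inFlight(reqPerSec float64, held time.Duration) int {
	return int(reqPerSec * held.Seconds())
}

func main() {
	// 1k req/s, each blocked for about 2s in total under the stated assumption.
	fmt.Println(inFlight(1000, assumedHold)) // prints 2000
}
```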

Proposed Solution

Introduce configurable pool on-empty behavior via environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| REDIS_POOL_ON_EMPTY_BEHAVIOR | ERROR, CREATE, or WAIT | "" (radix default: CREATE after 1s) |
| REDIS_POOL_ON_EMPTY_WAIT_DURATION | Duration before taking action | 0 (immediate) |

Behavior options:

  • ERROR: Return error after wait duration. Enforces strict pool size limit. Recommended for production to fail-fast during Redis outages.
  • CREATE: Create overflow connection after wait duration. Current default behavior.
  • WAIT: Block until a connection becomes available. Enforces pool size but risks goroutine buildup.
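One way the two settings could be validated and mapped to pool options is sketched below. This is not the PR's code: `onEmpty`, `parseOnEmpty`, and `radixOption` are hypothetical names, and the mapping to radix v3's `PoolOnEmptyErrAfter`, `PoolOnEmptyCreateAfter`, and `PoolOnEmptyWait` is an assumption based on the behavior descriptions above. The sketch only prints the option it would pick, so it runs without radix. Unlike the snippet under review, which logs a warning and falls back to the default, this version rejects unknown values.

```go
package main

import (
	"fmt"
	"time"
)

// onEmpty is a hypothetical, simplified view of the two settings.
type onEmpty struct {
	Behavior string        // "ERROR", "CREATE", or "WAIT"
	Wait     time.Duration // not used by WAIT
}

// parseOnEmpty validates the raw environment values.
// An empty behavior mirrors the radix default (CREATE after 1s).
func parseOnEmpty(behavior, wait string) (onEmpty, error) {
	var d time.Duration
	if wait != "" {
		parsed, err := time.ParseDuration(wait)
		if err != nil {
			return onEmpty{}, fmt.Errorf("invalid wait duration %q: %w", wait, err)
		}
		d = parsed
	}
	switch behavior {
	case "ERROR", "CREATE", "WAIT":
		return onEmpty{Behavior: behavior, Wait: d}, nil
	case "":
		return onEmpty{Behavior: "CREATE", Wait: time.Second}, nil
	default:
		return onEmpty{}, fmt.Errorf("unknown on-empty behavior %q", behavior)
	}
}

// radixOption names the radix v3 pool option assumed to match each behavior.
func (o onEmpty) radixOption() string {
	switch o.Behavior {
	case "ERROR":
		return fmt.Sprintf("radix.PoolOnEmptyErrAfter(%s)", o.Wait)
	case "WAIT":
		return "radix.PoolOnEmptyWait()"
	default:
		return fmt.Sprintf("radix.PoolOnEmptyCreateAfter(%s)", o.Wait)
	}
}

func main() {
	for _, in := range [][2]string{{"ERROR", "50ms"}, {"WAIT", ""}, {"", ""}} {
		o, err := parseOnEmpty(in[0], in[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q/%q -> %s\n", in[0], in[1], o.radixOption())
	}
}
```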

Test Result: REDIS_POOL_ON_EMPTY_BEHAVIOR

Configuration

Memory Limits:

  • Pod Memory Limit: 256 MB
  • GOMEMLIMIT: 200 MiB
  • GOGC: 300

Redis Configuration:

  • REDIS_TIMEOUT: 1000ms
  • REDIS_POOL_SIZE: 50

k6 Load Test:

  • rate: 1k requests/sec
  • timeout: 100ms
  • preAllocatedVUs: 50
  • maxVUs: 50

Redis: PAUSED for 5 minutes


Before REDIS_POOL_ON_EMPTY_BEHAVIOR config

[two screenshots]

After REDIS_POOL_ON_EMPTY_BEHAVIOR=ERROR config

REDIS_POOL_ON_EMPTY_BEHAVIOR=ERROR
REDIS_POOL_ON_EMPTY_WAIT_DURATION=50ms
[two screenshots]
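The fail-fast semantics of ERROR with a 50ms wait can be sketched with a toy acquire function. This is an illustration only and does not reproduce radix internals: `acquire`, `errPoolEmpty`, and `testWait` are hypothetical names, and a buffered channel stands in for the pool.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// testWait mirrors REDIS_POOL_ON_EMPTY_WAIT_DURATION=50ms from the test setup.
const testWait = 50 * time.Millisecond

// errPoolEmpty is a hypothetical error for the toy pool.
var errPoolEmpty = errors.New("toy pool: no connection available")

// acquire waits up to wait for an idle connection, then gives up with an
// error instead of creating an overflow connection.
func acquire(idle <-chan int, wait time.Duration) (int, error) {
	timer := time.NewTimer(wait)
	defer timer.Stop()
	select {
	case id := <-idle:
		return id, nil
	case <-timer.C:
		return 0, errPoolEmpty
	}
}

func main() {
	idle := make(chan int, 1)
	idle <- 1 // a pool of size 1 whose connection is never returned
	for i := 1; i <= 3; i++ {
		id, err := acquire(idle, testWait)
		fmt.Printf("request %d: conn=%d err=%v\n", i, id, err)
	}
}
```

In this model the first request gets the only connection, and every later request fails after roughly 50ms rather than holding a goroutine and a new connection until the Redis timeout.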

Changes

  • Added REDIS_POOL_ON_EMPTY_BEHAVIOR and REDIS_POOL_ON_EMPTY_WAIT_DURATION settings
  • Added REDIS_PERSECOND_POOL_ON_EMPTY_BEHAVIOR and REDIS_PERSECOND_POOL_ON_EMPTY_WAIT_DURATION settings
  • Updated Redis client to apply pool options based on configuration
  • Added unit tests for new settings
  • Updated README.md with new configuration options

Related

@notdu notdu force-pushed the main branch 2 times, most recently from c590474 to a505335 Compare December 6, 2025 11:53
Contributor Author

notdu commented Dec 6, 2025

Hi @collin-lee, @arkodg, could you please review this PR?

Comment thread README.md Outdated
Comment thread src/redis/driver_impl.go Outdated
Comment thread src/redis/driver_impl.go
Comment thread src/redis/driver_impl.go Outdated
Comment on lines +127 to +130
// Empty string = use radix default (PoolOnEmptyCreateAfter(1s))
if poolOnEmptyBehavior != "" {
	logger.Warnf("Redis pool %s: unknown on-empty behavior '%s', using default (CREATE after 1s)", maskedUrl, poolOnEmptyBehavior)
}

Can we make the default behavior CREATE and the default wait time 1s?

Contributor Author

What do you think, @collin-lee? Should we set the default value for REDIS_POOL_ON_EMPTY_BEHAVIOR to CREATE and REDIS_POOL_ON_EMPTY_WAIT_DURATION to 1s?

Contributor

Isn't it the default behavior when poolOnEmptyBehavior is empty?

// Empty string = use radix default (PoolOnEmptyCreateAfter(1s))

I'm not sure, but I think it's better to fail fast in production. What would be the use case for setting CREATE/1s?

Contributor Author

@notdu notdu Dec 7, 2025

CREATE is the default to maintain backward compatibility with radix's default behavior; ERROR is recommended for production, but changing the default would be a breaking change for existing deployments.

Contributor Author

I've refactored to make it explicit: CREATE with a 1s wait duration, which produces the same behavior but is clearer and removes the "magic empty string". Please take another look, @collin-lee.

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
notdu added 3 commits December 6, 2025 22:59
Signed-off-by: notdu <huudutg@gmail.com>
Signed-off-by: notdu <huudutg@gmail.com>
Signed-off-by: notdu <huudutg@gmail.com>
@notdu notdu requested a review from collin-lee December 7, 2025 07:57
@collin-lee collin-lee merged commit fc44670 into envoyproxy:main Dec 7, 2025
6 checks passed
3 participants