Skip to content

Conversation

@tanya732
Copy link
Contributor

@tanya732 tanya732 commented Jul 11, 2025

Changes

  1. JwkProvider Interface

Added a new default method getAll() that allows fetching all available JWKs at once. By default, it throws UnsupportedOperationException unless overridden.

  1. UrlJwkProvider Implementation
  • Introduced a thread-safe cache for JWKs using AtomicReference<List> cachedJwks.
  • Added setCachedJwks(List) for testability (to set the cache in tests).
  • Implemented getCachedJwks(), which returns cached JWKs if available, otherwise fetches them from the remote source and caches them.
  • Modified the get(String keyId) method:

Now attempts to find the key in the cache first.
If not found, it refreshes the cache from the remote JWKS endpoint and retries the lookup.
Throws SigningKeyNotFoundException if the key is still not found.
getAll() implementation remains, but now complements the new caching logic.

References

Please include relevant links supporting this change such as a:

  • support ticket
  • community post
  • StackOverflow post
  • support forum thread

Testing

Please describe how this can be tested by reviewers. Be specific about anything not tested and reasons why. If this library has unit and/or integration testing, tests should be added for new functionality and existing tests should complete without errors.

  • This change adds test coverage
  • This change has been tested on the latest version of Java or why not

Checklist

@jimmyjames
Copy link
Contributor

The following is from our gemini code review gem

Code Review

This pull request introduces a much-needed caching layer to the UrlJwkProvider, which is a great improvement for performance. However, there are some critical concurrency issues in the implementation that need to be addressed.

Category Location Issue Description Suggested Improvement
Bug 🐛 UrlJwkProvider.java, line 170-175 (getCachedJwks method) Incorrect Double-Checked Locking: The initial cache population logic does not re-check if the cache has been populated inside the synchronized block. If multiple threads call getCachedJwks() concurrently on a cold cache, they will all pass the if (jwks == null) check and queue up to execute the synchronized block, resulting in multiple unnecessary network requests to fetch the JWKS. The standard double-checked locking pattern should be used. Add a second null check for cachedJwks.get() immediately after entering the synchronized block to ensure the network call happens only once.
Performance ⚡️ UrlJwkProvider.java, line 188-204 (get method) Thundering Herd on Cache Miss: When a key is not found, a refresh is triggered inside a synchronized block. If multiple threads request different keys that are all missing from the cache, each thread will sequentially trigger a full network refresh. This is inefficient and can overload the JWKS endpoint. The refresh mechanism should prevent multiple concurrent refreshes. The same double-checked locking pattern used for the initial fetch should be applied here. Alternatively, consider a more advanced strategy like a time-based cache invalidation rather than refreshing on every miss.
Style 💅 UrlJwkProvider.java, lines 179-186 & 193-202 (get method) Duplicated Code: The logic to find a JWK by its keyId in a List<Jwk> is written twice: once for the cached list and once for the list fetched after a refresh. This violates the DRY (Don't Repeat Yourself) principle and makes the code harder to maintain. Extract the key-finding logic into a private helper method, like private Optional<Jwk> findKey(List<Jwk> jwks, String keyId), and call it for both the cached and the refreshed lists. This will simplify the get method and remove redundancy.
Style 🧹 UrlJwkProviderTest.java, line 348 Redundant Code in Test: The line provider.setCachedJwks(wrongJwks); is present twice consecutively in the shouldRefreshCacheIfKeyNotFound test. Remove the duplicate line to improve test code clarity.
Best Practice JwkProvider.java, line 15 Interface Default Method: Adding getAll() as a default method that throws UnsupportedOperationException is a valid way to extend an interface. However, it signals a shift in the provider's contract. Implementations that cannot fetch all keys at once may not support the new caching strategy. This is a minor design point, but consider adding a note in the Javadoc for getAll() explaining that it is intended to support caching provider implementations and may not be available for all providers. No code change is strictly necessary.

Summary

Top three areas for improvement:

  1. Fix Concurrency: The primary concern is the incorrect implementation of the double-checked locking pattern for the initial cache load, which can lead to bugs and performance degradation under load.
  2. Optimize Cache Refresh: The current refresh-on-miss strategy can cause a "thundering herd" problem. This should be made more robust to prevent redundant network calls.
  3. Refactor for Readability: The get method contains duplicated logic for finding a key. Refactoring this into a helper method would significantly improve the code's maintainability and clarity.

Confidence Rating: High

I am highly confident in this assessment. The identified issues, particularly the concurrency problems, are based on well-established Java programming patterns.

@tanya732 tanya732 marked this pull request as ready for review July 16, 2025 05:29
@tanya732 tanya732 requested a review from a team as a code owner July 16, 2025 05:29
@tanya732
Copy link
Contributor Author

@jimmyjames

Thank you for the brief summary, I have addressed the changes.

Copy link

@tusharpandey13 tusharpandey13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. A/C to this, cache-control headers for jwk_url return a default maxAge of 15s. We can take this as the default value of a configurable JWKs_refresh_interval for de-duplicating N/W calls.

  2. Why are we coupling local findKey with a network call on the same thread? Write operations are done on shared cache, why not delegate network calls to a dedicated network thread(pool), that uses a single-flight pattern to get data and update cache periodically.

The Single-Flight pattern is a concurrency design pattern that ensures only one execution of a function is in-flight for a given key at any given time

This can prevent deduplication of requests SDK-side and using JWKs_refresh_interval will ensure that rate limits are not hit.

@tanya732
Copy link
Contributor Author

  1. A/C to this, cache-control headers for jwk_url return a default maxAge of 15s. We can take this as the default value of a configurable JWKs_refresh_interval for de-duplicating N/W calls.
  2. Why are we coupling local findKey with a network call on the same thread? Write operations are done on shared cache, why not delegate network calls to a dedicated network thread(pool), that uses a single-flight pattern to get data and update cache periodically.

The Single-Flight pattern is a concurrency design pattern that ensures only one execution of a function is in-flight for a given key at any given time

This can prevent deduplication of requests SDK-side and using JWKs_refresh_interval will ensure that rate limits are not hit.

@tusharpandey13

Thank you for adding your inputs !!
The Single-Flight pattern is for preventing the "thundering herd" problem, especially on cache misses.
But the library's design already implements this pattern, and it's handled by a separate, higher-level class.

The UrlJwkProvider class, is a simple low-level provider that performs a synchronous, blocking network call. It's intentionally kept minimal without any caching or concurrency logic. This design allows it to be a flexible building block.

The single-flight pattern and thread management you've described are handled by the GuavaCachedJwkProvider class, which is a decorator that wraps the UrlJwkProvider. This is a common design pattern where a simple provider is decorated with powerful cross-cutting concerns like caching and concurrency.

Copy link

@tusharpandey13 tusharpandey13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tanya732 tanya732 merged commit 1e21c7e into master Aug 1, 2025
8 checks passed
@tanya732 tanya732 deleted the sdk-6221-improve-cache-design branch August 1, 2025 05:33
This was referenced Aug 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants