Skip to content

Conversation

@peterxcli
Copy link
Owner

What changes were proposed in this pull request?

Provide a one-liner summary of the changes in the PR Title field above.
It should be in the form of HDDS-1234. Short summary of the change.

Please describe your PR in detail:

  • What changes are proposed in the PR? and Why? It would be better if it is written from third person's
    perspective not just for the reviewer.
  • Provide as much context and rationale for the pull request as possible. It could be copy-paste from
    the Jira's description if the jira is well defined.
  • If it is complex code, describe the approach used to solve the issue. If possible attach design doc,
    issue investigation, github discussion, etc.

Examples of well-written pull requests:

What is the link to the Apache JIRA

Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull
request which starts with the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)

(Please replace this section with the link to the Apache JIRA)

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@peterxcli peterxcli marked this pull request as draft October 8, 2025 08:02
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

authorizer.getClass().getName(), e);
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Plugin Lifecycle Issues Cause Resource Leaks

The new plugin lifecycle management calls IAccessAuthorizerPlugin.start() but lacks a corresponding stop() during shutdown, leading to resource leaks. The start() call also happens prematurely, before OzoneManagerAuthorizer.configure() or SharedTmpDirAuthorizer wrapping. This can result in an uninitialized authorizer if configure() returns a new instance, or a started plugin that is wrapped and not properly stopped.

Fix in Cursor Fix in Web

if (cacheEnabled && decisionCache != null) {
decisionCache.asMap().keySet().removeIf(
key -> key.getResourcePath().startsWith(resourcePath));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Cache TTL Ignored, Over-Invalidation Risk

The put method in the authorization decision cache doesn't honor the ttlMillis parameter, always applying the default cache TTL, which limits setting specific expiration times. Additionally, invalidateResource uses startsWith for path matching, potentially causing unintended over-invalidation of cache entries.

Fix in Cursor Fix in Web

@gemini-code-assist
Copy link

Summary of Changes

Hello @peterxcli, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, extensible authorization framework for Ozone. It includes interfaces and base classes for pluggable authorizers, a dedicated caching mechanism for authorization decisions, and a comprehensive metrics system to monitor authorization performance and activity. The framework is designed to support modern IAM features and external authorization systems, providing lifecycle management, health checks, and configurable caching with detailed statistics.

Highlights

  • Intent: This pull request introduces a comprehensive authorization framework for Ozone, designed to support pluggable external authorization systems. The primary goal is to enhance performance and observability by integrating caching mechanisms and detailed metrics collection into the authorization process.
  • Key Changes: The PR adds new interfaces and base classes for authorization plugins, a dedicated authorization decision cache, and a metrics system. It also modifies the authorizer factory to properly initialize these new plugin types.
  • New Components: - "AuthorizationDecisionCache.java": Defines an interface for caching authorization decisions, including methods for getting, putting, invalidating by resource/subject/tenant, clearing, and retrieving statistics. It also includes nested classes for CacheKey (covering subject, resource, action, tenant, session token) and CacheStats.
  • "AuthorizerMetrics.java": Introduces a new metrics source for tracking authorization operations. It collects metrics such as total checks, allowed/denied decisions, errors, latency, cache hits/misses, and per-action/per-resource check counts.
  • "IAccessAuthorizerPlugin.java": An extended interface for pluggable authorization implementations, building upon IAccessAuthorizer. It adds lifecycle management (start, stop), health check capabilities (isHealthy, getStatus), metrics retrieval (getMetrics), and an optional policy refresh mechanism.
  • "BaseAuthorizerPlugin.java": An abstract base class that implements IAccessAuthorizerPlugin and AuthorizationDecisionCache. It provides common functionality like configurable decision caching (using Guava Cache), metrics collection, and basic lifecycle management. Subclasses only need to implement the core doStart, doCheckAccess, and doStop logic.
  • "OzoneAuthorizationContext.java": A new, extended authorization context class that wraps the existing RequestContext. It adds support for modern IAM features by including fields for tenantId, sessionToken, and a flexible map of custom attributes (e.g., OIDC claims).
  • Modified Components: - "OzoneAuthorizerFactory.java": Modified to detect and properly initialize IAccessAuthorizerPlugin implementations by calling their start() method during authorizer creation, ensuring that plugins with lifecycle requirements are correctly set up.
  • Benefits: This framework provides a standardized way to integrate advanced authorization logic, improves performance by caching authorization decisions, and offers better visibility into authorization activity through comprehensive metrics. It also simplifies the development of custom authorizers by providing a robust base class with built-in features.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a solid framework for pluggable authorization, including caching and metrics. The new interfaces and base classes are well-designed. I've identified a few areas for improvement, mainly concerning correctness in metrics collection, cache implementation details, and a potential resource leak. Please see my detailed comments below.

Comment on lines 92 to 101
if (authorizer instanceof IAccessAuthorizerPlugin) {
try {
((IAccessAuthorizerPlugin) authorizer).start(conf);
LOG.info("Started authorizer plugin: {}", authorizer.getClass().getName());
} catch (Exception e) {
LOG.error("Failed to start authorizer plugin", e);
throw new RuntimeException("Failed to start authorizer plugin: " +
authorizer.getClass().getName(), e);
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This logic correctly handles the startup of IAccessAuthorizerPlugin instances by calling start(). However, there is no corresponding call to stop() (or close()) for these plugins when the OzoneManager shuts down.

The IAccessAuthorizerPlugin interface extends Closeable, which implies that implementations may hold resources (like network connections or thread pools) that must be released. Failing to call stop() will lead to resource leaks.

Please ensure that the OzoneManager's shutdown sequence is updated to call stop() on the authorizer if it is an instance of IAccessAuthorizerPlugin.

@Override
public void put(CacheKey key, boolean decision, long ttlMillis) {
if (cacheEnabled && decisionCache != null) {
decisionCache.put(key, decision);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The put method implementation ignores the ttlMillis parameter. The cache is configured with a single expireAfterWrite duration for all entries. This contradicts the design of the AuthorizationDecisionCache interface, which is intended to support per-entry TTLs. The interface's Javadoc even suggests using different TTLs for 'allow' and 'deny' decisions, which is a common and useful pattern.

To properly implement per-entry TTL with Guava Cache, you would need to use CacheBuilder.expireAfter(Expiry). Since this implementation doesn't honor the interface contract, it could lead to unexpected caching behavior.

Comment on lines +102 to +104
private final AtomicLong cacheHits = new AtomicLong(0);
private final AtomicLong cacheMisses = new AtomicLong(0);
private final AtomicLong cacheEvictions = new AtomicLong(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These AtomicLong counters for cache statistics are redundant. The code is already using AuthorizerMetrics for the Hadoop metrics system, and the Guava cache is configured with .recordStats(), which provides its own, more efficient and reliable statistics via decisionCache.stats().

Using three different ways to track cache stats increases complexity and can lead to inconsistencies.

I recommend removing these AtomicLong fields and refactoring getMetrics() and getStats() to use decisionCache.stats() for hit, miss, and eviction counts. This would also require removing the manual increments in checkAccess().

.maximumSize(cacheSize)
.expireAfterWrite(cacheTtlMs, TimeUnit.MILLISECONDS)
.recordStats()
.removalListener(notification -> cacheEvictions.incrementAndGet())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of the removalListener incorrectly counts all removals as evictions. The cacheEvictions counter is incremented for any removal, including explicit invalidations (invalidate()/invalidateAll()), not just for evictions due to size, expiration, or garbage collection.

To accurately count only evictions, you should check notification.wasEvicted().

          .removalListener(notification -> {
            if (notification.wasEvicted()) {
              cacheEvictions.incrementAndGet();
            }
          })

Comment on lines 298 to 299
decisionCache.asMap().keySet().removeIf(
key -> key.getResourcePath().startsWith(resourcePath));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cache invalidation logic in invalidateResource, invalidateSubject, and invalidateTenant involves iterating over the entire key set of the cache (decisionCache.asMap().keySet().removeIf(...)). This is an O(N) operation, where N is the cache size. For a large cache, this can be slow and may impact performance by blocking other operations, especially if policy or user updates are frequent.

For a more scalable solution, you might consider maintaining secondary indexes to allow for faster lookups of keys to invalidate.

…fecycle of access authorizer plugins. Added closing logic for Closeable authorizers in OzoneManager and improved configuration handling in OzoneAuthorizerFactory.
…ry TTL for authorization decisions. Introduced CachedDecision class to manage expiration and updated cache retrieval logic to invalidate expired entries. Enhanced resource invalidation to handle hierarchical matches.
@github-actions
Copy link

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 14, 2025
@peterxcli peterxcli closed this Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants