Conversation


@yaya1738 yaya1738 commented Dec 4, 2025

Summary

Completes the implementation of accelerator-aware resource limits for AI workloads.

Closes #222

Changes

  • Full cgroups v2 integration with unified hierarchy support
  • Workload presets (inference, training, batch, interactive)
  • CPU quota/weight controls with proper cgroup interface
  • Memory hard (max) and soft (high) limits
  • GPU isolation via CUDA_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, ONEAPI_DEVICE_SELECTOR
  • OOM score adjustment for process priority protection
  • User mode delegation support for non-root usage
  • SQLite-based profile storage
  • Complete CLI with create/apply/env/status/list/delete/presets commands
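
The GPU isolation bullet above works by restricting which devices each vendor's runtime can see. As a rough illustrative sketch (not the PR's actual code; the `level_zero` backend prefix for oneAPI is an assumption), generating those variables from a list of device IDs could look like:

```python
# Illustrative sketch of vendor-specific GPU visibility variables.
# Not the PR's implementation; the oneAPI backend prefix is an assumption.
def gpu_isolation_env(gpu_ids):
    """Build per-vendor device-visibility variables for the given GPU indices."""
    ids = ",".join(str(i) for i in gpu_ids)
    return {
        "CUDA_VISIBLE_DEVICES": ids,                    # NVIDIA CUDA
        "HIP_VISIBLE_DEVICES": ids,                     # AMD ROCm
        "ONEAPI_DEVICE_SELECTOR": f"level_zero:{ids}",  # Intel oneAPI
    }

print(gpu_isolation_env([0, 1]))
```

Each runtime only enumerates the listed devices, so the process cannot touch GPUs outside its profile.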

Files

  • cortex/kernel_features/accelerator_limits.py (718 lines) - Complete implementation
  • tests/test_accelerator_limits.py (394 lines) - 32 unit tests
  • README_ACCELERATOR_LIMITS.md - Documentation

Acceptance Criteria

  • Profiles can be created with presets
  • Limits are correctly applied to cgroups
  • GPU environment variables are generated
  • Works in user mode with delegation
  • Unit tests pass (32/32)

Test Plan

```shell
pytest tests/test_accelerator_limits.py -v
# All 32 tests pass
```

Usage Examples

```shell
cortex limits create inference-job --preset inference --gpus 2
cortex limits apply inference-job --pid 12345
eval $(cortex limits env inference-job)
cortex limits status
```

Summary by CodeRabbit

  • New Features

    • Added accelerator-limits functionality to manage GPU, CPU, and memory resources.
    • Introduced workload presets: inference, training, batch, and interactive.
    • New CLI commands: create, apply, env, status, delete, and presets for resource profile management.
    • Support for environment variable generation and non-root user delegation.
  • Documentation

    • Added accelerator-limits documentation with quick-start guide and architecture overview.


Copilot AI review requested due to automatic review settings December 4, 2025 05:17

coderabbitai bot commented Dec 4, 2025

Walkthrough

Introduces accelerator-limits functionality for managing GPU, CPU, and memory resources for AI workloads via cgroups v2. Adds comprehensive resource management classes (AcceleratorLimitsManager, LimitsDatabase, CgroupsV2Controller, OOMScoreManager) with a CLI supporting profile creation, application to live processes, environment variable generation, status reporting, and preset-based configuration.

Changes

  • Documentation (README_ACCELERATOR_LIMITS.md): Documents the accelerator-limits feature including quick-start commands (create, apply, env, status, delete), environment variables, preset configurations (inference, training, batch, interactive), architecture components, related files, and usage guidelines.
  • Core Implementation (cortex/kernel_features/accelerator_limits.py): Introduces ResourceLimits dataclass with factory methods; LimitsDatabase for persistent profile storage with PID tracking; CgroupsV2Controller for cgroups v2 operations (creation, CPU/memory limits, PID management); OOMScoreManager for OOM score adjustments; AcceleratorLimitsManager as central orchestrator for create/apply/status/delete operations; comprehensive CLI entry point with subcommands and preset selection.
  • Tests (tests/test_accelerator_limits.py): Comprehensive unit test module covering ResourceLimits validation, LimitsDatabase CRUD operations, CgroupsV2Controller behavior, OOMScoreManager functionality, AcceleratorLimitsManager lifecycle, preset definitions, and CLI integration.
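
To make the walkthrough concrete, here is a minimal sketch of what a preset-driven ResourceLimits factory could look like. The field names and numeric defaults below are illustrative placeholders, not the values in the PR:

```python
# Hypothetical approximation of the preset factory described above.
# Field names and preset numbers are placeholders, not the PR's actual values.
from dataclasses import dataclass, field

PRESETS = {
    "inference":   {"cpu_quota": 400, "memory_max_gb": 16, "oom_score_adj": -500},
    "training":    {"cpu_quota": 800, "memory_max_gb": 64, "oom_score_adj": -200},
    "batch":       {"cpu_quota": 200, "memory_max_gb": 8,  "oom_score_adj": 200},
    "interactive": {"cpu_quota": 400, "memory_max_gb": 16, "oom_score_adj": 0},
}

@dataclass
class ResourceLimits:
    name: str
    cpu_quota: int = 0           # percent of one core; 400 == 4 cores
    memory_max_gb: int = 0
    oom_score_adj: int = 0
    gpu_ids: list = field(default_factory=list)

    @classmethod
    def from_preset(cls, name, preset, gpus=0):
        """Build limits from a named preset, assigning the first `gpus` devices."""
        return cls(name=name, gpu_ids=list(range(gpus)), **PRESETS[preset])

limits = ResourceLimits.from_preset("inference-job", "inference", gpus=2)
```

A preset keyed by workload type keeps the CLI surface small: one flag selects a coherent bundle of CPU, memory, and OOM settings.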

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI<br/>(main)
    participant ALM as AcceleratorLimitsManager
    participant DB as LimitsDatabase
    participant Cgroups as CgroupsV2Controller
    participant OOM as OOMScoreManager
    participant SysFS as Linux cgroups v2

    User->>CLI: cortex limits apply profile-name --pid 12345
    CLI->>ALM: apply(profile_name, pid)
    ALM->>DB: get(profile_name)
    DB-->>ALM: ResourceLimits
    ALM->>Cgroups: create_cgroup(path)
    Cgroups->>SysFS: mkdir cgroup hierarchy
    SysFS-->>Cgroups: success
    Cgroups-->>ALM: true
    ALM->>Cgroups: apply_cpu_limits(path, quota, weight)
    Cgroups->>SysFS: write cpu.max, cpu.weight
    SysFS-->>Cgroups: success
    Cgroups-->>ALM: true
    ALM->>Cgroups: apply_memory_limits(path, max, high)
    Cgroups->>SysFS: write memory.max, memory.high
    SysFS-->>Cgroups: success
    Cgroups-->>ALM: true
    ALM->>Cgroups: add_pid(path, pid)
    Cgroups->>SysFS: write cgroup.procs
    SysFS-->>Cgroups: success
    Cgroups-->>ALM: true
    ALM->>OOM: set_oom_score_adj(pid, score)
    OOM->>SysFS: write /proc/[pid]/oom_score_adj
    SysFS-->>OOM: success
    OOM-->>ALM: true
    ALM->>DB: record_pid(pid, profile_name)
    DB-->>ALM: recorded
    ALM-->>CLI: true (success)
    CLI-->>User: Profile applied successfully

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • CgroupsV2Controller implementation: Verify system-level cgroups v2 operations (file I/O, error handling, permission delegation), path construction, and availability checks
  • LimitsDatabase schema and CRUD logic: Ensure database initialization, profile persistence, and PID tracking are correct and handle edge cases
  • AcceleratorLimitsManager orchestration: Validate component interactions, error propagation, and graceful fallbacks across create/apply/delete operations
  • CLI parameter validation: Review CLI argument parsing, preset selection, and override handling (CPU/memory/GPU/OOM options)
  • Integration points: Verify data flow between manager, database, cgroups controller, and OOM manager; check for race conditions or missing cleanup

Possibly related PRs

Suggested labels

enhancement, kernel-features

Poem

🐰 New limits bloom in cgroups deep,
GPU cores and memory to keep,
Presets dance from train to play,
Resources managed AI's way!
hops in approval

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

  • Title check ✅ Passed: The title clearly and specifically summarizes the main change: completing the accelerator-aware resource limits implementation, with a reference to the linked issue.
  • Description check ✅ Passed: The PR description is comprehensive, with summary, changes, files, acceptance criteria, test plan, and usage examples. It exceeds template requirements and provides clear context.
  • Linked Issues check ✅ Passed: All coding requirements from issue #222 are met: cgroups v2 integration, workload presets (inference/training/batch/interactive), GPU isolation via environment variables, OOM score adjustment, CPU controls, memory limits, user-mode delegation, and 32 passing unit tests.
  • Out of Scope Changes check ✅ Passed: All changes are directly aligned with issue #222 objectives: new accelerator_limits.py module, comprehensive tests, and documentation. No out-of-scope modifications detected.
  • Docstring Coverage ✅ Passed: Docstring coverage is 94.37%, which exceeds the required threshold of 80.00%.

@sonarqubecloud

sonarqubecloud bot commented Dec 4, 2025


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
cortex/kernel_features/accelerator_limits.py (4)

29-37: Avoid configuring global logging at import time in a library module

Calling logging.basicConfig at module import can interfere with applications’ own logging configuration. Prefer configuring logging in the CLI entrypoint (or the top-level app) and keep this module to just creating a logger with getLogger(__name__).

```diff
-# Set up logging
-logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
-logger = logging.getLogger(__name__)
+# Set up module-level logger; configuration should be done by the caller/CLI.
+logger = logging.getLogger(__name__)
```

You can then configure logging inside main() if desired, e.g. logging.basicConfig(... ) there rather than at import time.


243-266: Tighten broad except Exception usage and improve logging

Several methods (create_cgroup, apply_cpu_limits, apply_memory_limits, add_pid, get_pids, and the OOMScoreManager methods) catch bare Exception, sometimes with pass, which can hide unexpected errors and complicate debugging.

  • Narrow the caught exceptions where feasible (OSError, IOError, ValueError, etc.).
  • Where you do intend to catch “anything” for robustness, log with logger.exception(...) instead of logger.error(...) or pass so stack traces are preserved.
    For example:
```diff
-        except Exception as e:
-            logger.error(f"Failed to set CPU limits: {e}")
+        except Exception:
+            logger.exception("Failed to set CPU limits")
```

And for try/except blocks that currently just pass, at least log at debug level so silent failures are observable when needed.

Also applies to: 284-300, 313-329, 336-344, 351-357, 386-394, 400-404
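
A self-contained sketch of the narrower pattern (the helper is hypothetical, not code from the PR): catch OSError, the realistic failure mode for cgroup file writes, and preserve the traceback with logger.exception.

```python
import logging

logger = logging.getLogger(__name__)

def write_cgroup_file(path: str, value: str) -> bool:
    """Write one cgroup control file; log with traceback and return False on failure."""
    try:
        with open(path, "w") as f:
            f.write(value)
    except OSError:
        # OSError covers FileNotFoundError, PermissionError, and EINVAL writes.
        logger.exception("Failed to write %s", path)
        return False
    return True
```

Returning a bool keeps the caller's control flow simple while the log record carries the full stack trace for debugging.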


467-471: Remove unnecessary f-string in log message

The print call here uses an f-string without any interpolation.

```diff
-        else:
-            print(f"Could not set OOM score (may need root)")
+        else:
+            print("Could not set OOM score (may need root)")
             success = False
```

This also addresses the Ruff/Sonar warning about an f-string with no replacement fields.


638-672: Optional: deduplicate repeated "Profile name" help text

The literal "Profile name" is repeated across multiple subparsers; Sonar flags this for duplication. Not critical, but you could factor it into a small constant to centralize future changes.

```python
PROFILE_NAME_HELP = "Profile name"

create_parser.add_argument("name", help=PROFILE_NAME_HELP)
apply_parser.add_argument("name", help=PROFILE_NAME_HELP)
delete_parser.add_argument("name", help=PROFILE_NAME_HELP)
env_parser.add_argument("name", help=PROFILE_NAME_HELP)
```

This is purely cosmetic and can be deferred if you prefer.

tests/test_accelerator_limits.py (2)

1-12: Consider removing the shebang from this test module

The file has a #!/usr/bin/env python3 shebang but is not typically executed directly (it’s run via pytest/unittest). This triggers EXE001 because the file isn’t executable and is unnecessary for a test module.

```diff
-#!/usr/bin/env python3
-"""
-Unit tests for Cortex Accelerator-Aware Resource Limits
-"""
+"""
+Unit tests for Cortex Accelerator-Aware Resource Limits
+"""
```

Keeping tests as plain modules avoids the linter warning and matches common practice.


212-223: Use _ for the unused path variable in cgroup creation test

In test_create_cgroup_mock, path is unpacked but never used, which Ruff flags as RUF059. Using _ clarifies that the second return value is intentionally ignored.

```diff
-            controller = CgroupsV2Controller(root)
-            success, path = controller.create_cgroup("cortex.slice/test-job")
+            controller = CgroupsV2Controller(root)
+            success, _ = controller.create_cgroup("cortex.slice/test-job")
```

This keeps the test behavior unchanged while addressing the linter warning.

README_ACCELERATOR_LIMITS.md (1)

124-135: Add a language to the architecture fenced code block

markdownlint flags this block for missing a language identifier. Since it’s ASCII art / diagram, text (or none) would be appropriate.

````diff
-```
+```text
 AcceleratorLimitsManager
 ├── LimitsDatabase (SQLite storage)
 ├── CgroupsV2Controller (cgroups interface)
 └── OOMScoreManager (OOM score adjustment)
@@
 ├── GPU: device IDs, percentage
 └── OOM: score adjustment
````

This keeps rendering unchanged while satisfying MD040.

📜 Review details

**Configuration used**: CodeRabbit UI
**Review profile**: CHILL
**Plan**: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da3e6355b164ac48826016b8745cbdb2a59df02b and e7ec868a2a3b2206c6e3d5af92fe60faa8713c97.

📒 Files selected for processing (3)

  • README_ACCELERATOR_LIMITS.md (1 hunks)
  • cortex/kernel_features/accelerator_limits.py (1 hunks)
  • tests/test_accelerator_limits.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1): tests/test_accelerator_limits.py references these symbols from cortex/kernel_features/accelerator_limits.py (24):

  • Classes: ResourceLimits (90-135), LimitsDatabase (138-213), CgroupsV2Controller (216-368), OOMScoreManager (371-405), AcceleratorLimitsManager (408-613), WorkloadPreset (39-44)
  • ResourceLimits methods: from_preset (113-126), to_dict (128-130), from_dict (133-135)
  • Database methods: save (166-173), get (175-183), list_all (192-196), delete (185-190), record_pid (198-204), get_pids (206-213)
  • Controller and manager methods: is_available (223-225), get_available_controllers (236-241), create_cgroup (243-266), get_pids (346-358), get_oom_score_adj (397-405), create (416-443), delete (587-600), get_env (478-507), main (616-714)

🪛 GitHub Check: SonarCloud Code Analysis

tests/test_accelerator_limits.py

  • [warning] 219: Replace the unused local variable "path" with "_". See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrny9ER6oUfBTHzwt2H&open=AZrny9ER6oUfBTHzwt2H&pullRequest=230

cortex/kernel_features/accelerator_limits.py

  • [warning] 470: Add replacement fields or use a normal string instead of an f-string. See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrny9A_6oUfBTHzwt2G&open=AZrny9A_6oUfBTHzwt2G&pullRequest=230
  • [failure] 243: Refactor this function to reduce its Cognitive Complexity from 17 to the 15 allowed. See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrny9A_6oUfBTHzwt2F&open=AZrny9A_6oUfBTHzwt2F&pullRequest=230
  • [failure] 639: Define a constant instead of duplicating this literal "Profile name" 4 times. See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrny9A_6oUfBTHzwt2E&open=AZrny9A_6oUfBTHzwt2E&pullRequest=230

🪛 markdownlint-cli2 (0.18.1)

README_ACCELERATOR_LIMITS.md

  • 124: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🪛 Ruff (0.14.7)

tests/test_accelerator_limits.py

  • 1: Shebang is present but file is not executable (EXE001)
  • 219: Unpacked variable `path` is never used; prefix it with an underscore or another dummy variable pattern (RUF059)

cortex/kernel_features/accelerator_limits.py

  • 265, 298, 327, 342, 356, 392, 403: Do not catch blind exception: `Exception` (BLE001)
  • 294, 323, 366: Consider moving this statement to an `else` block (TRY300)
  • 299, 328, 343, 393: Use `logging.exception` instead of `logging.error` (TRY400)
  • 356-357, 403-404: `try`-`except`-`pass` detected, consider logging the exception (S110)
  • 470: f-string without any placeholders; remove extraneous `f` prefix (F541)

⏰ Context from checks skipped due to timeout of 90000ms (1). You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms).

  • GitHub Check: Agent

Comment on lines 104 to +111:

```python
def __post_init__(self):
    self.gpu_ids = self.gpu_ids or []

    if self.gpu_ids is None:
        self.gpu_ids = []
    if self.cpu_affinity is None:
        self.cpu_affinity = []
    if not self.cgroup_path:
        self.cgroup_path = f"{CORTEX_CGROUP}/{self.name}"
```


⚠️ Potential issue | 🟠 Major

Sanitize profile names before embedding them into cgroup paths

cgroup_path is built directly from name and later used as a relative path under /sys/fs/cgroup. If a caller passes characters like .. or path separators, this can escape the intended CORTEX_CGROUP subtree and potentially alter limits of unrelated cgroups when run as root.

```diff
     def __post_init__(self):
         if self.gpu_ids is None:
             self.gpu_ids = []
         if self.cpu_affinity is None:
             self.cpu_affinity = []
         if not self.cgroup_path:
-            self.cgroup_path = f"{CORTEX_CGROUP}/{self.name}"
+            # Sanitize profile name to avoid escaping the Cortex cgroup subtree.
+            safe_name = "".join(
+                c for c in self.name
+                if c.isalnum() or c in ("-", "_", ".")
+            )
+            if not safe_name:
+                raise ValueError(f"Invalid profile name {self.name!r}")
+            self.cgroup_path = f"{CORTEX_CGROUP}/{safe_name}"
```

This keeps existing tests (which use simple names) passing while preventing path traversal via the profile name.

🤖 Prompt for AI Agents
In cortex/kernel_features/accelerator_limits.py around lines 104-111, sanitize
self.name before embedding it into cgroup_path: compute a safe_name (e.g., use
os.path.basename(self.name) or strip all path separators and '..', then replace
any non-alphanumeric/_/- characters with a safe substitute or fallback to a
generated slug), ensure it has no leading '/' and no '..' segments, then build
cgroup_path using os.path.join(CORTEX_CGROUP, safe_name), normalize with
os.path.normpath, and finally assert or raise if the resulting path does not
start with CORTEX_CGROUP to prevent path traversal.
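
One possible implementation of that sanitization, following the normpath approach sketched in the prompt (illustrative; only the CORTEX_CGROUP value "cortex.slice" is taken from the module, the helper name is ours):

```python
import os

CORTEX_CGROUP = "cortex.slice"

def safe_cgroup_path(name: str) -> str:
    """Map a profile name to a cgroup path that cannot escape CORTEX_CGROUP."""
    # Drop any path components, then keep only clearly safe characters.
    base = os.path.basename(name.replace("\\", "/"))
    safe = "".join(c for c in base if c.isalnum() or c in "-_.")
    if not safe or set(safe) == {"."}:  # rejects "", ".", "..", "..."
        raise ValueError(f"Invalid profile name {name!r}")
    path = os.path.normpath(os.path.join(CORTEX_CGROUP, safe))
    # Belt-and-braces: the final path must stay inside the Cortex subtree.
    if not path.startswith(CORTEX_CGROUP + os.sep):
        raise ValueError(f"Profile name escapes cgroup subtree: {name!r}")
    return path
```

Simple names pass through unchanged, while traversal attempts are either neutralized or rejected.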

Comment on lines +500 to +506
```python
# CPU hints
if limits.cpu_quota > 0:
    cores = int(limits.cpu_quota / 100)
    env["OMP_NUM_THREADS"] = str(cores)
    env["MKL_NUM_THREADS"] = str(cores)
    env["OPENBLAS_NUM_THREADS"] = str(cores)
```


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard against OMP_NUM_THREADS=0 when cpu_quota < 100

cores = int(limits.cpu_quota / 100) will produce 0 if a user supplies a cpu_quota lower than 100%, leading to OMP_NUM_THREADS=0 and similar values, which are invalid for most runtimes.

```diff
-        if limits.cpu_quota > 0:
-            cores = int(limits.cpu_quota / 100)
+        if limits.cpu_quota > 0:
+            # Ensure at least 1 thread even for sub-100% quotas.
+            cores = max(1, int(limits.cpu_quota / 100))
             env["OMP_NUM_THREADS"] = str(cores)
             env["MKL_NUM_THREADS"] = str(cores)
             env["OPENBLAS_NUM_THREADS"] = str(cores)
```

This keeps current behavior for all presets (multiples of 100) while making custom low-CPU profiles safe.

🤖 Prompt for AI Agents
In cortex/kernel_features/accelerator_limits.py around lines 500 to 506, the
calculation cores = int(limits.cpu_quota / 100) can produce 0 for cpu_quota <
100 causing invalid OMP_NUM_THREADS=0; change the calculation to ensure at least
1 thread by using cores = max(1, int(limits.cpu_quota / 100)) (keeping the
existing if limits.cpu_quota > 0 guard) so presets (multiples of 100) behave the
same while low custom cpu_quota values result in a safe minimum of 1.
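
The guard from the suggestion, shown as a standalone helper for clarity (the helper name is ours, not the PR's):

```python
def omp_threads(cpu_quota: float) -> int:
    """Thread-count hint from a CPU quota percentage (400 means four cores)."""
    return max(1, int(cpu_quota / 100))

print(omp_threads(50))  # sub-100% quota still yields one thread, not zero
```

Quotas that are exact multiples of 100 behave as before; only the previously broken sub-100% case changes.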


Copilot AI left a comment


Pull request overview

This PR implements a comprehensive accelerator-aware resource limits system for AI workloads using cgroups v2. It introduces a Python-based wrapper that allows users to create and manage resource profiles with predefined workload presets (inference, training, batch, interactive) and apply them to processes. The implementation includes CPU quota/weight controls, memory limits (hard and soft), GPU device isolation via environment variables for multiple vendors (NVIDIA, AMD, Intel), OOM score adjustments, and SQLite-based profile storage with a full CLI interface.

Key Changes:

  • Complete cgroups v2 controller implementation with unified hierarchy support and user mode delegation
  • Four workload presets optimized for different AI use cases with sensible CPU, memory, GPU, and OOM defaults
  • Multi-vendor GPU isolation supporting NVIDIA CUDA, AMD ROCm, and Intel oneAPI through environment variables

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 17 comments.

  • cortex/kernel_features/accelerator_limits.py: Complete implementation with ResourceLimits dataclass, LimitsDatabase (SQLite), CgroupsV2Controller, OOMScoreManager, AcceleratorLimitsManager, and CLI (718 lines)
  • tests/test_accelerator_limits.py: Comprehensive unit tests covering all major components with 32 test cases including preset validation, database operations, cgroups mocking, and CLI testing (394 lines)
  • README_ACCELERATOR_LIMITS.md: Complete documentation with quick start guide, preset descriptions, command reference, environment variables, user mode setup, architecture overview, and testing instructions
Comments suppressed due to low confidence (1)

cortex/kernel_features/accelerator_limits.py:22

  • Import of 'shutil' is not used.
import shutil


```python
import logging

# Configuration
CORTEX_DB = Path.home() / ".cortex/limits.db"
```

Copilot AI Dec 4, 2025


The database path uses Path.home() which resolves to the home directory of the user running the process. If the code is run with sudo or as a different user, this could lead to inconsistent database locations (e.g., /root/.cortex/limits.db vs /home/user/.cortex/limits.db). This could cause profiles created by one user to be invisible to another user, even when they have the necessary permissions to manage cgroups.

Consider documenting this behavior or using a consistent system-wide location (e.g., /var/lib/cortex/limits.db) when running as root, with fallback to user home directory for non-root users.

Suggested change:

```diff
-CORTEX_DB = Path.home() / ".cortex/limits.db"
+def get_cortex_db_path():
+    if hasattr(os, "geteuid") and os.geteuid() == 0:
+        db_path = Path("/var/lib/cortex/limits.db")
+    else:
+        db_path = Path.home() / ".cortex/limits.db"
+    db_path.parent.mkdir(parents=True, exist_ok=True)
+    return db_path
+
+CORTEX_DB = get_cortex_db_path()
```

```text
Features:
- cgroups v2 unified hierarchy support
- Workload presets with sensible defaults
- NVIDIA GPU isolation via environment variables
```

Copilot AI Dec 4, 2025


The documentation in the docstring at line 11 states "NVIDIA GPU isolation via environment variables", but the implementation also supports AMD ROCm (HIP_VISIBLE_DEVICES) and Intel oneAPI (ONEAPI_DEVICE_SELECTOR) as seen in lines 488-492. The docstring should be updated to reflect this multi-vendor support.

Suggested change:

```diff
-- NVIDIA GPU isolation via environment variables
+- GPU isolation via CUDA_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, ONEAPI_DEVICE_SELECTOR
```

Comment on lines +516 to +517
```python
for key, value in env.items():
    print(f"export {key}={value}")
```

Copilot AI Dec 4, 2025


The environment variable values are printed without any shell escaping, which could lead to shell injection vulnerabilities if a malicious profile name or GPU IDs are stored in the database. While the values are currently generated from integers and controlled data, the profile name comes from user input and is displayed in error messages.

Consider using shlex.quote() to properly escape the values, especially since this output is intended to be evaluated with eval $(cortex limits env ...):

```python
import shlex
# ...
for key, value in env.items():
    print(f"export {key}={shlex.quote(value)}")
```

Additionally, the profile_name used in the error message should be escaped or validated.
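
A self-contained sketch of the quoting fix (the env dict below is a hypothetical stand-in for a profile's generated variables, including a deliberately hostile value):

```python
import shlex

env = {
    "CUDA_VISIBLE_DEVICES": "0,1",
    "NOTE": "value with spaces; rm -rf /",  # deliberately hostile value
}
# shlex.quote leaves simple values untouched and single-quotes anything
# the shell could misinterpret, so eval $(...) stays safe.
lines = [f"export {key}={shlex.quote(value)}" for key, value in env.items()]
print("\n".join(lines))
```

The simple device list passes through unquoted, while the hostile value is wrapped in single quotes and can no longer be executed by the shell.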

```python
CORTEX_CGROUP = "cortex.slice"

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
```

Copilot AI Dec 4, 2025


The logging.basicConfig() call at module level can interfere with the logging configuration of other modules or the main application. This is because basicConfig() only works if the root logger hasn't been configured yet, and calling it in a library module is generally discouraged.

Consider removing this line and letting the application configure logging, or check if the root logger is already configured before calling basicConfig():

```python
if not logging.getLogger().hasHandlers():
    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
```

Suggested change:

```diff
-logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
```

Comment on lines +684 to +685:

```python
if args.command == "create":
    limits = ResourceLimits.from_preset(args.name, args.preset, args.gpus)
```

Copilot AI Dec 4, 2025


The name parameter from user input is used directly in f-strings for cgroup paths and database operations without validation. This could potentially lead to path traversal or SQL injection issues (though SQLite with parameterized queries provides some protection).

Consider adding validation to ensure the profile name only contains safe characters:

```python
import re

def validate_profile_name(name: str) -> bool:
    """Validate profile name contains only safe characters."""
    return bool(re.match(r'^[a-zA-Z0-9_-]+$', name))
```

This should be called in the create() method and other places where profile names are accepted from user input.

```python
import os
import sys
import json
```

Copilot AI Dec 4, 2025


Import of 'json' is not used.

Suggested change:

```diff
-import json
```

```python
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch, MagicMock
```

Copilot AI Dec 4, 2025


Import of 'MagicMock' is not used.

Suggested change:

```diff
-from unittest.mock import patch, MagicMock
+from unittest.mock import patch
```

```python
try:
    subtree_control.write_text(f"+{ctrl}")
except PermissionError:
    pass
```

Copilot AI Dec 4, 2025


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```diff
-    pass
+    logger.warning(f"Permission denied enabling controller '{ctrl}' in {subtree_control}")
```

Comment on lines +356 to +357:

```python
except Exception:
    pass
```

Copilot AI Dec 4, 2025


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```diff
-except Exception:
-    pass
+except Exception as e:
+    logger.error(f"Failed to read PIDs from {procs_file}: {e}")
```

Comment on lines +403 to +404:

```python
except Exception:
    pass
```

Copilot AI Dec 4, 2025


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:

```diff
-except Exception:
-    pass
+except Exception as e:
+    logger.error(f"Failed to get OOM score for PID {pid}: {e}")
```

@mikejmorgan-ai
Member

Closing as duplicate - Issue #222 was already implemented in commit 7c2cb79. Thanks for the contribution!


Development

Successfully merging this pull request may close these issues.

[Kernel Feature] Accelerator-Aware Resource Limits - cgroups v2 Wrapper for AI
