
Conversation

@mikejmorgan-ai
Member

@mikejmorgan-ai mikejmorgan-ai commented Dec 2, 2025

Summary

This PR adds four kernel-level AI features implemented in user-space:

| Feature | Description | Lines |
| --- | --- | --- |
| Model Lifecycle Manager | systemd-based LLM services | ~300 |
| KV-Cache Manager | shared memory cache pools | ~300 |
| Accelerator Limits | cgroups v2 wrapper | ~250 |
| /dev/llm Device | FUSE LLM interface | ~300 |

Why This Matters

These are Tier 1 features from our kernel enhancement strategy:

  • Run on current Ubuntu 24.04 (no kernel patches required)
  • Demonstrate kernel-level thinking
  • Support patent portfolio (especially KV-Cache)
  • Differentiate from 'just a wrapper' positioning

Usage

```bash
# Model management
cortex model register llama --path meta-llama/Llama-2-7b --backend vllm
cortex model start llama

# Cache management
cortex cache create my-cache --size 16G

# Resource limits
cortex limits create inference-job --preset inference --gpus 2

# File-based LLM access
cortex-llm-device mount /mnt/llm
echo 'Hello' > /mnt/llm/claude/prompt
cat /mnt/llm/claude/response
```

Testing

```bash
pytest tests/kernel_features/ -v
```

Next Steps

  • Add comprehensive tests
  • Integration with main Cortex CLI
  • Documentation in main README
  • Create GitHub issues for bounties

Part of Kernel Enhancement Roadmap - Tier 1 (User-Space)

Summary by CodeRabbit

Release Notes

  • New Features

    • Model Lifecycle Manager for registering and managing LLM services
    • KV-Cache Manager for creating and managing inference cache pools
    • Accelerator Limits Manager for configuring resource constraints and GPU allocation
    • Virtual LLM Device interface for model interaction
  • Documentation

    • Added comprehensive Kernel Features guide with CLI usage examples
  • Tests

    • Added test coverage for new features


Implements user-space versions of kernel-level concepts:

- Model Lifecycle Manager: systemd-based LLM service management
- KV-Cache Manager: shared memory cache pools for inference
- Accelerator Limits: cgroups v2 wrapper for AI workloads
- /dev/llm Device: FUSE-based file interface to LLMs

These demonstrate kernel-level thinking while shipping on Ubuntu 24.04.
Part of our patent portfolio for AI-native operating systems.

Total: ~1,500 lines of production code
@mikejmorgan-ai mikejmorgan-ai added the labels enhancement (New feature or request) and kernel-features (Kernel-level AI enhancements) on Dec 2, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

This PR introduces the cortex.kernel_features package, implementing four kernel-level feature managers: Model Lifecycle Manager for systemd-based LLM service management, KV-Cache Manager for shared memory cache pools with configurable policies, Accelerator Limits for cgroups v2-based resource constraints, and a FUSE-based /dev/llm virtual device interface. Tests and documentation are included.

Changes

Cohort / File(s) — Summary

  • Package initialization — cortex/kernel_features/__init__.py: Exports six public entities (ModelLifecycleManager, ModelConfig, KVCacheManager, CacheConfig, AcceleratorLimitsManager, ResourceLimits) imported from three sub-modules; defines __all__ for the public API surface.
  • Model Lifecycle Manager — cortex/kernel_features/model_lifecycle.py: Implements systemd user-service lifecycle management for LLM models. Includes the ModelConfig dataclass, ModelDatabase for SQLite persistence (~/.cortex/models.db), ServiceGenerator for backend-specific systemd unit templates (vllm, llamacpp, ollama), the ModelLifecycleManager orchestrator, and a CLI entry point supporting register/start/stop/status/list commands.
  • KV-Cache Manager — cortex/kernel_features/kv_cache_manager.py: Implements user-space KV-cache pool management using POSIX shared memory. Includes the CachePolicy enum (LRU/LFU/FIFO), CacheConfig and CacheEntry dataclasses, CacheDatabase for metadata persistence, SharedMemoryPool for lifecycle management, the KVCacheManager coordinator, and a CLI supporting create/destroy/status commands with size parsing (K/M/G suffixes).
  • Accelerator Limits Manager — cortex/kernel_features/accelerator_limits.py: Implements cgroups v2-based resource profile management for workloads. Defines the WorkloadPreset enum with a PRESETS dictionary (inference, training, batch, interactive), the ResourceLimits dataclass, LimitsDatabase for SQLite persistence (~/.cortex/limits.db), the AcceleratorLimitsManager wrapper, and a CLI supporting create/env/status/list commands.
  • FUSE Virtual Device — cortex/kernel_features/llm_device.py: Implements a FUSE-backed /dev/llm virtual file system exposing models and sessions as files/directories. Includes the Session dataclass, a MockLLM fallback, LLMDevice with path parsing and file-system operations (getattr, readdir, read, write, truncate, open, create), a mount/unmount CLI, and optional Anthropic API integration.
  • Documentation — docs/KERNEL_FEATURES.md: User-facing documentation describing the four kernel-feature components, architecture positioning, patent note, and command-line usage examples.
  • Tests — tests/kernel_features/test_kv_cache.py, tests/kernel_features/test_model_lifecycle.py: Test modules validating CacheConfig defaults/construction and ModelConfig serialization round-trips (to_dict/from_dict).
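The size parsing mentioned above (K/M/G suffixes) is simple to sketch. This is an illustrative stand-in, not the PR's actual code; the helper name `parse_size` is an assumption:

```python
# Hedged sketch of K/M/G suffix parsing; parse_size is a hypothetical
# name, not necessarily what kv_cache_manager.py calls it.
def parse_size(text: str) -> int:
    """Convert '16G', '512M', '64K', or a plain byte count to an int."""
    multipliers = {"K": 1024, "M": 1024**2, "G": 1024**3}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)
```

For example, `parse_size("16G")` yields the byte count that `cortex cache create my-cache --size 16G` would need.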

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • model_lifecycle.py: Requires understanding of systemd service generation, template injection, and user-level systemctl integration; review focus on service file correctness and error handling.
  • kv_cache_manager.py: Requires verification of POSIX shared memory lifecycle, database schema consistency, and size parsing logic.
  • llm_device.py: Requires familiarity with FUSE file-system semantics, path parsing correctness, and integration with optional Anthropic API.
  • accelerator_limits.py: Requires validation of cgroups v2 configuration and preset definitions.
  • Cross-cutting concerns: SQLite schema design and serialization (to_dict/from_dict) consistency across modules; CLI argument parsing patterns and error reporting.
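The to_dict/from_dict consistency concern above can be addressed with a small dataclass idiom. This is an illustrative sketch only; the field defaults shown are assumptions, not the PR's actual values:

```python
from dataclasses import dataclass, asdict

# Illustrative round-trip pattern; field names mirror the PR's ModelConfig,
# but this class (and its defaults) is a stand-in, not the implementation.
@dataclass
class ModelConfig:
    name: str
    model_path: str
    backend: str = "vllm"
    port: int = 8000

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "ModelConfig":
        return cls(**data)
```

Using `asdict` plus `cls(**data)` keeps serialization symmetric by construction, so round-trip tests reduce to a single equality check.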

Possibly related issues

  • Implements the systemd-based Model Lifecycle Manager with ModelConfig, ModelDatabase, and service generation as described in issue #220.
  • Adds the KV-Cache Manager (KVCacheManager, CacheConfig, SharedMemoryPool, CacheDatabase) and related tests as specified in issue #221.
  • Directly implements the systemd-based model lifecycle functionality outlined in issues #216 and #217.
  • Adds the KV-cache component with full POSIX shared memory and database persistence as detailed in issue #218.
  • Implements the FUSE-backed /dev/llm virtual device (LLMDevice, mount CLI) as described in issue #223.
  • Adds the accelerator_limits module with cgroups v2 presets and resource profile management as described in issues #219 and #222.

Poem

🐰 Four features bloom, a rabbit's delight!
Services orchestrate, cache pools take flight,
Limits are managed, resources aligned,
A virtual device and a filesystem combined!
Hop into Cortex, where kernels are tamed,
With FUSE and cgroups, the workloads are framed! 🚀

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Title check ✅ Passed — The title 'Add Kernel-Level AI Features (Tier 1)' is clear, specific, and accurately describes the main addition of four kernel-level AI features implemented in user-space.
  • Description check ✅ Passed — The PR description covers the Summary, provides context on Why This Matters, and includes Usage examples, Testing instructions, and Next Steps, exceeding the template requirements.

@sonarqubecloud

sonarqubecloud bot commented Dec 2, 2025

[Quality-gate comment truncated: the original page embedded garbled, overlapping previews of the module headers (docstrings and imports) for accelerator_limits.py, kv_cache_manager.py, llm_device.py, and model_lifecycle.py.]
@mikejmorgan-ai mikejmorgan-ai merged commit b77a004 into main Dec 2, 2025
7 of 12 checks passed
@mikejmorgan-ai mikejmorgan-ai deleted the feature/kernel-features branch December 2, 2025 10:58
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (8)
cortex/kernel_features/accelerator_limits.py (2)

33-49: Fix type hint for gpu_ids to match the default value.

The type hint declares List[int] but the default is None. This violates PEP 484 and triggers static analysis warnings.

 @dataclass
 class ResourceLimits:
     name: str
     preset: str = "inference"
     cpu_quota: float = 400.0
     memory_max: int = 32 * 1024**3
-    gpu_ids: List[int] = None
+    gpu_ids: Optional[List[int]] = None
     oom_score_adj: int = 0

97-124: Consider adding help output when no subcommand is provided.

When invoked without a subcommand, the CLI exits silently. Printing help as a fallback would improve UX; the other managers in this PR exit silently in the same way, so this is optional.

     elif args.cmd in ("status", "list"):
         mgr.status()
+    else:
+        parser.print_help()
 
 
 if __name__ == "__main__":
cortex/kernel_features/llm_device.py (1)

136-138: Consider validating mountpoint and using absolute path for fusermount.

The mountpoint argument is passed directly to subprocess.run(). While command injection is limited here, using the absolute path to fusermount and validating the mountpoint would improve security.

     elif args.cmd == "umount":
         import subprocess
-        subprocess.run(["fusermount", "-u", args.mountpoint])
+        from pathlib import Path
+        mp = Path(args.mountpoint).resolve()
+        subprocess.run(["/bin/fusermount", "-u", str(mp)], check=True)
cortex/kernel_features/kv_cache_manager.py (1)

118-125: Fix implicit Optional and inconsistent tuple handling in status().

The name parameter should be Optional[str]. Also, the tuple handling is inconsistent: get_pool returns (CacheConfig, str) but list_pools returns [CacheConfig], requiring awkward wrapping at line 119.

-    def status(self, name: str = None):
-        pools = [self.db.get_pool(name)] if name else [(p, "") for p in self.db.list_pools()]
+    def status(self, name: Optional[str] = None):
+        if name:
+            result = self.db.get_pool(name)
+            pools = [result[0]] if result else []
+        else:
+            pools = self.db.list_pools()
         print(f"\n{'POOL':<20} {'SIZE':<12} {'POLICY':<10}")
         print("-" * 50)
-        for item in pools:
-            if item:
-                cfg = item[0] if isinstance(item, tuple) else item
-                print(f"{cfg.name:<20} {cfg.size_bytes/1e9:.1f}G{'':<6} {cfg.policy:<10}")
+        for cfg in pools:
+            print(f"{cfg.name:<20} {cfg.size_bytes/1e9:.1f}G{'':<6} {cfg.policy:<10}")
cortex/kernel_features/model_lifecycle.py (1)

134-142: Fix implicit Optional type hint.

The name parameter defaults to None but is typed as str.

-    def status(self, name: str = None):
+    def status(self, name: Optional[str] = None):
tests/kernel_features/test_model_lifecycle.py (1)

9-14: Incomplete round-trip validation.

The test only verifies name and backend after deserialization, but doesn't check other fields like model_path ("/model"), port (8080), or the remaining ModelConfig attributes. A thorough round-trip test should verify all fields are preserved.

Apply this diff to verify all key fields:

 def test_config_roundtrip():
     cfg = ModelConfig("test", "/model", "llamacpp", 8080)
     data = cfg.to_dict()
     restored = ModelConfig.from_dict(data)
     assert restored.name == cfg.name
+    assert restored.model_path == cfg.model_path
     assert restored.backend == cfg.backend
+    assert restored.port == cfg.port
tests/kernel_features/test_kv_cache.py (1)

4-7: Basic config validation is correct, but coverage is minimal.

The test correctly validates CacheConfig defaults. However, the test suite lacks coverage for KVCacheManager operations (create_pool, destroy_pool, status) and actual cache functionality. Consider adding tests for the manager's core operations when expanding test coverage.

cortex/kernel_features/__init__.py (1)

15-19: Static analysis flags unsorted __all__, but current grouping is logical.

Ruff reports that __all__ is not alphabetically sorted. However, the current organization groups related exports (ModelLifecycle, KVCache, AcceleratorLimits), which may be more maintainable than strict alphabetical order.

If you prefer to suppress the warning while keeping the logical grouping, add this comment:

 __all__ = [
+    # ruff: noqa: RUF022
     'ModelLifecycleManager', 'ModelConfig',
     'KVCacheManager', 'CacheConfig', 
     'AcceleratorLimitsManager', 'ResourceLimits',
 ]

Alternatively, sort alphabetically:

 __all__ = [
+    'AcceleratorLimitsManager',
+    'CacheConfig',
+    'KVCacheManager',
+    'ModelConfig',
     'ModelLifecycleManager',
-    'ModelConfig',
-    'KVCacheManager',
-    'CacheConfig', 
-    'AcceleratorLimitsManager',
     'ResourceLimits',
 ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae97db8 and 5dc1038.

📒 Files selected for processing (8)
  • cortex/kernel_features/__init__.py (1 hunks)
  • cortex/kernel_features/accelerator_limits.py (1 hunks)
  • cortex/kernel_features/kv_cache_manager.py (1 hunks)
  • cortex/kernel_features/llm_device.py (1 hunks)
  • cortex/kernel_features/model_lifecycle.py (1 hunks)
  • docs/KERNEL_FEATURES.md (1 hunks)
  • tests/kernel_features/test_kv_cache.py (1 hunks)
  • tests/kernel_features/test_model_lifecycle.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cortex/kernel_features/__init__.py (3)
cortex/kernel_features/model_lifecycle.py (2)
  • ModelLifecycleManager (107-142)
  • ModelConfig (22-40)
cortex/kernel_features/kv_cache_manager.py (2)
  • KVCacheManager (97-125)
  • CacheConfig (28-32)
cortex/kernel_features/accelerator_limits.py (2)
  • AcceleratorLimitsManager (73-94)
  • ResourceLimits (34-49)
tests/kernel_features/test_kv_cache.py (1)
cortex/kernel_features/kv_cache_manager.py (2)
  • CacheConfig (28-32)
  • KVCacheManager (97-125)
tests/kernel_features/test_model_lifecycle.py (1)
cortex/kernel_features/model_lifecycle.py (4)
  • ModelConfig (22-40)
  • ModelLifecycleManager (107-142)
  • to_dict (35-36)
  • from_dict (39-40)
cortex/kernel_features/accelerator_limits.py (3)
cortex/kernel_features/llm_device.py (2)
  • create (110-110)
  • main (124-138)
cortex/kernel_features/kv_cache_manager.py (2)
  • status (118-125)
  • main (128-153)
cortex/kernel_features/model_lifecycle.py (2)
  • status (134-142)
  • main (145-175)
cortex/kernel_features/kv_cache_manager.py (3)
cortex/kernel_features/accelerator_limits.py (4)
  • create (77-80)
  • status (88-94)
  • main (97-120)
  • get (63-66)
cortex/kernel_features/llm_device.py (3)
  • create (110-110)
  • main (124-138)
  • getattr (66-73)
cortex/kernel_features/model_lifecycle.py (2)
  • status (134-142)
  • main (145-175)
🪛 GitHub Check: SonarCloud Code Analysis
cortex/kernel_features/model_lifecycle.py

[failure] 62-62: Don't use datetime.datetime.utcnow to create this datetime object.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQQLi-4iu7DOY4jo&open=AZretQQLi-4iu7DOY4jo&pullRequest=224

cortex/kernel_features/llm_device.py

[warning] 45-45: Remove the unused function parameter "max_tokens".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jg&open=AZretQMui-4iu7DOY4jg&pullRequest=224


[warning] 45-45: Remove the unused function parameter "temp".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4ji&open=AZretQMui-4iu7DOY4ji&pullRequest=224


[warning] 67-67: Replace the unused local variable "m" with "_".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jj&open=AZretQMui-4iu7DOY4jj&pullRequest=224


[warning] 45-45: Remove the unused function parameter "system".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jf&open=AZretQMui-4iu7DOY4jf&pullRequest=224


[warning] 85-85: Replace the unused local variable "m" with "_".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jl&open=AZretQMui-4iu7DOY4jl&pullRequest=224


[warning] 45-45: Remove the unused function parameter "model".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jh&open=AZretQMui-4iu7DOY4jh&pullRequest=224


[warning] 76-76: Replace the unused local variable "f" with "_".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQMui-4iu7DOY4jk&open=AZretQMui-4iu7DOY4jk&pullRequest=224

cortex/kernel_features/accelerator_limits.py

[warning] 39-39: Replace the type hint "list[int]" with "Optional[list[int]]" or don't assign "None" to "gpu_ids"

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQQai-4iu7DOY4jp&open=AZretQQai-4iu7DOY4jp&pullRequest=224

cortex/kernel_features/kv_cache_manager.py

[failure] 93-93: Specify an exception class to catch or reraise the exception

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQP9i-4iu7DOY4jn&open=AZretQP9i-4iu7DOY4jn&pullRequest=224


[failure] 80-80: Specify an exception class to catch or reraise the exception

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQP9i-4iu7DOY4jm&open=AZretQP9i-4iu7DOY4jm&pullRequest=224

🪛 Ruff (0.14.7)
cortex/kernel_features/__init__.py

15-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

cortex/kernel_features/model_lifecycle.py

1-1: Shebang is present but file is not executable

(EXE001)


81-85: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


113-113: subprocess call: check for execution of untrusted input

(S603)


113-113: Consider ["systemctl", "--user", *list(args)] instead of concatenation

Replace with ["systemctl", "--user", *list(args)]

(RUF005)


134-134: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

cortex/kernel_features/llm_device.py

1-1: Shebang is present but file is not executable

(EXE001)


24-24: Multiple statements on one line (colon)

(E701)


45-45: Unused method argument: model

(ARG002)


45-45: Unused method argument: max_tokens

(ARG002)


45-45: Unused method argument: temp

(ARG002)


45-45: Unused method argument: system

(ARG002)


50-50: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


60-60: Multiple statements on one line (colon)

(E701)


61-61: Multiple statements on one line (colon)

(E701)


62-62: Multiple statements on one line (colon)

(E701)


63-63: Multiple statements on one line (colon)

(E701)


66-66: Unused method argument: fh

(ARG002)


67-67: Unpacked variable m is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


75-75: Unused method argument: fh

(ARG002)


76-76: Unpacked variable f is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


78-78: Multiple statements on one line (colon)

(E701)


79-79: Multiple statements on one line (colon)

(E701)


79-79: Consider [*base, 'prompt', 'response', 'config'] instead of concatenation

Replace with [*base, 'prompt', 'response', 'config']

(RUF005)


80-80: Multiple statements on one line (colon)

(E701)


81-81: Multiple statements on one line (colon)

(E701)


81-81: Consider [*base, 'prompt', 'response', 'history'] instead of concatenation

Replace with [*base, 'prompt', 'response', 'history']

(RUF005)


84-84: Unused method argument: fh

(ARG002)


85-85: Unpacked variable m is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


87-87: Multiple statements on one line (colon)

(E701)


88-88: Multiple statements on one line (colon)

(E701)


91-91: Unused method argument: offset

(ARG002)


91-91: Unused method argument: fh

(ARG002)


101-101: Do not catch blind exception: Exception

(BLE001)


108-108: Unused method argument: path

(ARG002)


108-108: Unused method argument: length

(ARG002)


108-108: Unused method argument: fh

(ARG002)


109-109: Unused method argument: path

(ARG002)


109-109: Unused method argument: flags

(ARG002)


110-110: Unused method argument: path

(ARG002)


110-110: Unused method argument: mode

(ARG002)


110-110: Unused method argument: fi

(ARG002)


138-138: subprocess call: check for execution of untrusted input

(S603)


138-138: Starting a process with a partial executable path

(S607)

cortex/kernel_features/accelerator_limits.py

1-1: Shebang is present but file is not executable

(EXE001)

cortex/kernel_features/kv_cache_manager.py

1-1: Shebang is present but file is not executable

(EXE001)


80-80: Do not use bare except

(E722)


80-81: try-except-pass detected, consider logging the exception

(S110)


93-93: Do not use bare except

(E722)


93-94: try-except-pass detected, consider logging the exception

(S110)


118-118: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🔇 Additional comments (6)
docs/KERNEL_FEATURES.md (1)

1-48: Documentation looks good and aligns with implementations.

The documentation provides clear usage examples for all four kernel features. The CLI commands shown match the implementations in the corresponding Python modules.

cortex/kernel_features/accelerator_limits.py (1)

52-95: Database and manager implementation looks good.

The LimitsDatabase and AcceleratorLimitsManager classes follow consistent patterns with other kernel feature managers in this PR. The SQLite persistence and status output formatting are well-implemented.
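For reference, the ResourceLimits shape (field names and defaults taken from the diff earlier in this review) together with a preset table might look like the sketch below. The PRESETS values and the `limits_for` helper are illustrative assumptions, not the PR's actual numbers or API:

```python
from dataclasses import dataclass
from typing import Optional, List

# Field names and defaults follow the ResourceLimits diff earlier in this
# review; everything else here is an illustrative guess.
@dataclass
class ResourceLimits:
    name: str
    preset: str = "inference"
    cpu_quota: float = 400.0          # cgroups v2 cpu.max style: percent of one CPU (400 = 4 cores)
    memory_max: int = 32 * 1024**3    # 32 GiB
    gpu_ids: Optional[List[int]] = None
    oom_score_adj: int = 0

# Hypothetical preset table keyed like the PR's PRESETS dict; the values
# are made up for illustration.
PRESETS = {
    "inference":   dict(cpu_quota=400.0,  memory_max=32 * 1024**3),
    "training":    dict(cpu_quota=1600.0, memory_max=128 * 1024**3),
    "batch":       dict(cpu_quota=800.0,  memory_max=64 * 1024**3),
    "interactive": dict(cpu_quota=200.0,  memory_max=16 * 1024**3),
}

def limits_for(name: str, preset: str) -> ResourceLimits:
    """Build a ResourceLimits profile from a named preset."""
    return ResourceLimits(name=name, preset=preset, **PRESETS[preset])
```

Keeping presets as plain dicts that feed the dataclass constructor means adding a preset never requires touching the manager code.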

cortex/kernel_features/llm_device.py (1)

66-89: FUSE operations implementation looks correct.

The getattr, readdir, and read methods properly implement the FUSE interface. The unused parameter warnings from static analysis are expected for FUSE callback signatures.

cortex/kernel_features/kv_cache_manager.py (1)

45-68: Database layer implementation looks solid.

The CacheDatabase class properly initializes tables and provides consistent CRUD operations with proper SQLite connection handling.

cortex/kernel_features/model_lifecycle.py (1)

21-41: ModelConfig dataclass is well-structured.

Good use of dataclass with sensible defaults and proper serialization methods.

tests/kernel_features/test_model_lifecycle.py (1)

4-7: LGTM: Basic defaults validated correctly.

The test appropriately verifies the default backend and port values match the ModelConfig specification.

Comment on lines +71 to +94
class SharedMemoryPool:
    def __init__(self, name: str, size: int, create: bool = True):
        self.name = f"{SHM_PREFIX}{name}"
        self.size = size
        if create:
            try:
                old = shared_memory.SharedMemory(name=self.name)
                old.close()
                old.unlink()
            except:
                pass
            self.shm = shared_memory.SharedMemory(name=self.name, create=True, size=size + 8192)
        else:
            self.shm = shared_memory.SharedMemory(name=self.name)

    def get_usage(self):
        return self.size, 0, 0  # Simplified

    def destroy(self):
        self.shm.close()
        try:
            self.shm.unlink()
        except:
            pass

⚠️ Potential issue | 🟡 Minor

Replace bare except clauses with specific exception types.

Bare except: catches all exceptions including KeyboardInterrupt and SystemExit, which can mask critical errors. The static analysis tools flag this as a failure.

 class SharedMemoryPool:
     def __init__(self, name: str, size: int, create: bool = True):
         self.name = f"{SHM_PREFIX}{name}"
         self.size = size
         if create:
             try:
                 old = shared_memory.SharedMemory(name=self.name)
                 old.close()
                 old.unlink()
-            except:
+            except FileNotFoundError:
                 pass
             self.shm = shared_memory.SharedMemory(name=self.name, create=True, size=size + 8192)
         else:
             self.shm = shared_memory.SharedMemory(name=self.name)
     
     def get_usage(self):
         return self.size, 0, 0  # Simplified
     
     def destroy(self):
         self.shm.close()
         try:
             self.shm.unlink()
-        except:
+        except FileNotFoundError:
             pass

🤖 Prompt for AI Agents
In cortex/kernel_features/kv_cache_manager.py around lines 71 to 94, replace the
bare except blocks with specific exception types: when attempting to open/unlink
an existing shared memory segment (the first try), catch FileNotFoundError and
PermissionError (or OSError) only, and let other exceptions propagate; when
unlinking in destroy(), catch FileNotFoundError (and optionally
PermissionError/OSError) instead of a bare except. If you want to be explicit,
catch (FileNotFoundError, PermissionError, OSError) and avoid swallowing
KeyboardInterrupt/SystemExit by not using a bare except.

Comment on lines +109 to +116
    def destroy_pool(self, name: str) -> bool:
        if name in self.pools:
            self.pools[name].destroy()
            del self.pools[name]
        with sqlite3.connect(CORTEX_DB) as conn:
            conn.execute("DELETE FROM pools WHERE name=?", (name,))
        print(f"✅ Destroyed pool '{name}'")
        return True

⚠️ Potential issue | 🟠 Major

Resource leak: shared memory not cleaned up if pool not in current instance.

If a pool was created by a different process instance, it won't be in self.pools. The method deletes the DB record but leaves the shared memory segment orphaned.

     def destroy_pool(self, name: str) -> bool:
         if name in self.pools:
             self.pools[name].destroy()
             del self.pools[name]
+        else:
+            # Try to clean up SHM created by another process
+            try:
+                pool = SharedMemoryPool(name, 0, create=False)
+                pool.destroy()
+            except FileNotFoundError:
+                pass
         with sqlite3.connect(CORTEX_DB) as conn:
             conn.execute("DELETE FROM pools WHERE name=?", (name,))
         print(f"✅ Destroyed pool '{name}'")
         return True

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +44 to +46
class MockLLM:
    def complete(self, model, messages, max_tokens, temp, system=None):
        return f"[Mock] Response to: {messages[-1]['content'][:50]}..."

⚠️ Potential issue | 🟠 Major

MockLLM.complete() interface doesn't match actual Anthropic client usage.

MockLLM.complete() is called with positional args (model, messages, etc.), but the real Anthropic client at line 98 uses self.llm.messages.create(). This interface mismatch will cause issues when testing or when the API key is unavailable.

 class MockLLM:
-    def complete(self, model, messages, max_tokens, temp, system=None):
-        return f"[Mock] Response to: {messages[-1]['content'][:50]}..."
+    class messages:
+        @staticmethod
+        def create(model, messages, max_tokens, **kwargs):
+            class Response:
+                content = [type('obj', (object,), {'text': f"[Mock] Response to: {messages[-1]['content'][:50]}..."})()]
+            return Response()

Alternatively, unify the call pattern at lines 98-100 to use a consistent interface.
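One possible sketch of that second option, wrapping both backends behind a single complete() method so call sites never branch on HAS_API. The class name LLMClient and its signature are assumptions for illustration, not the PR's code:

```python
# Hypothetical adapter unifying the real Anthropic client and the mock;
# callers always invoke .complete() and always receive a plain string.
class LLMClient:
    def __init__(self, anthropic_client=None):
        self._client = anthropic_client  # None means mock mode

    def complete(self, model, messages, max_tokens):
        if self._client is None:
            return f"[Mock] Response to: {messages[-1]['content'][:50]}..."
        resp = self._client.messages.create(
            model=model, messages=messages, max_tokens=max_tokens
        )
        return resp.content[0].text
```

With this shape, write() collapses to a single `self.llm.complete(...)` call and the response-unwrapping branch disappears.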

🤖 Prompt for AI Agents
In cortex/kernel_features/llm_device.py around lines 44-46,
MockLLM.complete(...) does not match how the real client is used at lines
~98-100 (self.llm.messages.create(...)); update the mock to mirror the real
Anthropic client shape by adding a messages attribute with a create(...) method
(accepting messages, model=None, max_tokens=None, temperature=None, system=None,
**kwargs) that returns the same mock response format, or alternatively change
the call sites at lines 98-100 to call MockLLM.complete(...) consistently; pick
one approach and implement the matching signature so tests and runtime code use
the same interface.

Comment on lines +91 to +106
def write(self, path, data, offset, fh):
t, m, f = self._parse(path)
if t == 'model' and f == 'prompt':
s = self.sessions["default"]
s.prompt = data.decode().strip()
s.messages.append({"role": "user", "content": s.prompt})
try:
resp = self.llm.messages.create(model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
max_tokens=s.max_tokens, messages=s.messages) if HAS_API else self.llm.complete(m, s.messages, s.max_tokens, s.temp)
s.response = resp.content[0].text if HAS_API else resp
except Exception as e:
s.response = f"Error: {e}"
s.messages.append({"role": "assistant", "content": s.response})
self.requests += 1
return len(data)
raise FuseOSError(errno.EACCES)
⚠️ Potential issue | 🔴 Critical

Runtime error when using MockLLM fallback.

The if HAS_API check guards the wrong condition. HAS_API only records whether the anthropic package imported; self.llm can still be a MockLLM instance (for example, when no API key is configured). In that case the HAS_API branch calls self.llm.messages.create(), but MockLLM has no messages attribute—it only implements complete()—so the write handler raises AttributeError.

The current logic:

resp = self.llm.messages.create(...) if HAS_API else self.llm.complete(...)
s.response = resp.content[0].text if HAS_API else resp

Both the call and the response extraction need to branch on the actual client in use, not on HAS_API alone.

             try:
-                resp = self.llm.messages.create(model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
-                    max_tokens=s.max_tokens, messages=s.messages) if HAS_API else self.llm.complete(m, s.messages, s.max_tokens, s.temp)
-                s.response = resp.content[0].text if HAS_API else resp
+                if HAS_API and not isinstance(self.llm, MockLLM):
+                    resp = self.llm.messages.create(
+                        model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
+                        max_tokens=s.max_tokens,
+                        messages=s.messages
+                    )
+                    s.response = resp.content[0].text
+                else:
+                    s.response = self.llm.complete(m, s.messages, s.max_tokens, s.temp)
🧰 Tools
🪛 Ruff (0.14.7)

91-91: Unused method argument: offset

(ARG002)


91-91: Unused method argument: fh

(ARG002)


101-101: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In cortex/kernel_features/llm_device.py around lines 91 to 106, the code calls
self.llm.messages.create only when HAS_API is True but still tries to access
resp.content[0].text unconditionally; when HAS_API is False and self.llm is a
MockLLM (which only implements complete()), this raises an attribute error. Fix
by branching the call and the response extraction together: if HAS_API call
self.llm.messages.create(...) and set s.response = resp.content[0].text (or the
appropriate API response field), else call self.llm.complete(...) and set
s.response to the returned string; keep the existing try/except, append the
assistant message, increment requests, and return len(data).

Comment on lines +58 to +63
def save_model(self, config: ModelConfig):
with sqlite3.connect(CORTEX_DB_PATH) as conn:
conn.execute(
"INSERT OR REPLACE INTO models VALUES (?, ?, ?)",
(config.name, json.dumps(config.to_dict()), datetime.utcnow().isoformat())
)
⚠️ Potential issue | 🟡 Minor

Replace deprecated datetime.utcnow() with timezone-aware alternative.

datetime.utcnow() is deprecated in Python 3.12+ and returns a naive datetime. Use datetime.now(timezone.utc) instead.

+from datetime import datetime, timezone
-from datetime import datetime
 
 ...
 
     def save_model(self, config: ModelConfig):
         with sqlite3.connect(CORTEX_DB_PATH) as conn:
             conn.execute(
                 "INSERT OR REPLACE INTO models VALUES (?, ?, ?)",
-                (config.name, json.dumps(config.to_dict()), datetime.utcnow().isoformat())
+                (config.name, json.dumps(config.to_dict()), datetime.now(timezone.utc).isoformat())
             )
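For illustration, the two calls side by side in a standalone snippet (independent of the module):

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12: returns a naive datetime, no tzinfo attached.
naive = datetime.utcnow()
assert naive.tzinfo is None

# Recommended replacement: timezone-aware UTC timestamp.
aware = datetime.now(timezone.utc)
assert aware.tzinfo is not None

# The aware ISO string carries an explicit +00:00 offset,
# e.g. 2025-12-02T12:00:00.123456+00:00
print(aware.isoformat())
```

Storing the aware form means the offset survives the round trip through SQLite, so later comparisons don't silently mix naive and aware values.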
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis

[failure] 62-62: Don't use datetime.datetime.utcnow to create this datetime object.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZretQQLi-4iu7DOY4jo&open=AZretQQLi-4iu7DOY4jo&pullRequest=224

🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 58 to 63, the
save_model method uses the deprecated naive datetime.utcnow(); replace it with a
timezone-aware timestamp by calling datetime.now(timezone.utc).isoformat()
(import timezone from datetime if missing) so the inserted timestamp is aware
and compliant with Python 3.12+.

Comment on lines +80 to +104
class ServiceGenerator:
BACKENDS = {
"vllm": "python -m vllm.entrypoints.openai.api_server --model {model_path} --port {port}",
"llamacpp": "llama-server -m {model_path} --port {port}",
"ollama": "ollama serve",
}

def generate(self, config: ModelConfig) -> str:
cmd = self.BACKENDS.get(config.backend, self.BACKENDS["vllm"]).format(**asdict(config))
return f"""[Unit]
Description=Cortex Model: {config.name}
After=network.target

[Service]
Type=simple
ExecStart={cmd}
Environment=CUDA_VISIBLE_DEVICES={','.join(map(str, config.gpu_ids))}
CPUQuota={int(config.cpu_limit * 100)}%
MemoryMax={config.memory_limit}
Restart={config.restart_policy}
NoNewPrivileges=true

[Install]
WantedBy=default.target
"""
⚠️ Potential issue | 🔴 Critical

Command injection risk in generated systemd service.

The model_path from user input is directly interpolated into the ExecStart command without sanitization. systemd does not run ExecStart through a shell, but it does treat a standalone ; token as a command separator, so a malicious path like foo ; rm -rf / would still execute rm -rf / when the service starts, and a path containing spaces can inject arbitrary arguments.

Consider validating model_path to ensure it's a valid file path without shell metacharacters:

+import shlex
+import re
+
 class ServiceGenerator:
     BACKENDS = {
         "vllm": "python -m vllm.entrypoints.openai.api_server --model {model_path} --port {port}",
         "llamacpp": "llama-server -m {model_path} --port {port}",
         "ollama": "ollama serve",
     }
     
     def generate(self, config: ModelConfig) -> str:
+        # Validate model_path doesn't contain shell metacharacters
+        if not re.match(r'^[\w\-./]+$', config.model_path):
+            raise ValueError(f"Invalid model_path: {config.model_path}")
         cmd = self.BACKENDS.get(config.backend, self.BACKENDS["vllm"]).format(**asdict(config))

Alternatively, use shlex.quote() on user-provided values, though systemd ExecStart has its own quoting rules.
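A minimal sketch of the allow-list check; the pattern, function name, and error message are illustrative, not the project's actual code:

```python
import re

_SAFE_PATH = re.compile(r'^[\w\-./]+$')  # letters, digits, _, -, ., /


def validate_model_path(path: str) -> str:
    """Reject paths containing whitespace or shell/systemd metacharacters.

    systemd splits ExecStart on whitespace and treats a standalone ';' as a
    command separator, so an allow-list is safer than trying to escape.
    """
    if not _SAFE_PATH.match(path):
        raise ValueError(f"Invalid model_path: {path!r}")
    return path


validate_model_path("meta-llama/Llama-2-7b")  # passes
try:
    validate_model_path("foo ; rm -rf /")
except ValueError as e:
    print(e)  # rejected before it ever reaches a unit file
```

An allow-list like this is deliberately strict: it rejects some exotic-but-legal paths, which is usually the right trade-off for values written into a root-managed unit file.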

🧰 Tools
🪛 Ruff (0.14.7)

81-85: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 80-104 the generated
systemd ExecStart interpolates config.model_path directly into a shell command
allowing command-injection (e.g., "foo; rm -rf /"); fix by validating and
sanitizing model_path before use: ensure it is an absolute or relative
filesystem path that exists (or matches an allowed pattern), reject or normalize
unexpected characters, and strip/escape any shell metacharacters; then build
ExecStart using a safe, quoted form (e.g., apply shlex.quote or the appropriate
systemd ExecStart escaping rules) or avoid a shell invocation entirely by
constructing ExecStart as a command+argv list so untrusted values are never
interpreted by a shell.

Comment on lines +157 to +175
for cmd in ["start", "stop", "unregister"]:
p = sub.add_parser(cmd)
p.add_argument("name")

sub.add_parser("status").add_argument("name", nargs="?")
sub.add_parser("list")

args = parser.parse_args()
mgr = ModelLifecycleManager()

if args.cmd == "register":
mgr.register(ModelConfig(args.name, args.path, args.backend, args.port,
gpu_ids=[int(x) for x in args.gpus.split(",")]))
elif args.cmd == "start":
mgr.start(args.name)
elif args.cmd == "stop":
mgr.stop(args.name)
elif args.cmd in ("status", "list"):
mgr.status(getattr(args, 'name', None))
⚠️ Potential issue | 🟠 Major

Missing handler for unregister command.

The unregister subcommand is defined in the parser (line 157) but not handled in the if/elif chain, causing it to silently do nothing.

     elif args.cmd == "stop":
         mgr.stop(args.name)
+    elif args.cmd == "unregister":
+        mgr.unregister(args.name)
     elif args.cmd in ("status", "list"):
         mgr.status(getattr(args, 'name', None))

You'll also need to implement the unregister method in ModelLifecycleManager:

def unregister(self, name: str) -> bool:
    service_path = CORTEX_SERVICE_DIR / f"cortex-{name}.service"
    self._systemctl("stop", f"cortex-{name}.service")
    self._systemctl("disable", f"cortex-{name}.service")
    service_path.unlink(missing_ok=True)
    self.db.delete_model(name)
    self._systemctl("daemon-reload")
    print(f"✅ Unregistered model '{name}'")
    return True
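One way to make this class of bug harder to reintroduce is to bind each subparser to its handler with set_defaults(func=...), so registering a command without a handler becomes impossible. A standalone sketch — DummyMgr and the handler bodies are illustrative, not the module's actual code:

```python
import argparse


def make_parser(mgr):
    parser = argparse.ArgumentParser(prog="cortex-model")
    sub = parser.add_subparsers(dest="cmd", required=True)

    for cmd, handler in [("start", mgr.start), ("stop", mgr.stop),
                         ("unregister", mgr.unregister)]:
        p = sub.add_parser(cmd)
        p.add_argument("name")
        p.set_defaults(func=handler)  # every subcommand carries its handler

    return parser


class DummyMgr:
    def start(self, name): return f"started {name}"
    def stop(self, name): return f"stopped {name}"
    def unregister(self, name): return f"unregistered {name}"


args = make_parser(DummyMgr()).parse_args(["unregister", "llama"])
print(args.func(args.name))  # → unregistered llama
```

The if/elif dispatch chain then collapses to a single `args.func(args.name)` call, so a subcommand can never again be parsed but silently ignored.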
🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 157 to 175, the CLI
defines an "unregister" subcommand but the if/elif chain never handles args.cmd
== "unregister", so the command is a no-op; add an elif branch to call
mgr.unregister(args.name) alongside the existing start/stop/status/list
handlers, and implement ModelLifecycleManager.unregister(name: str) to stop and
disable the systemd service, remove the service file, delete the model from the
DB, reload systemd daemon, print a success message and return True (ensure
service file removal uses a safe unlink and that systemctl
stop/disable/daemon-reload calls use the existing _systemctl helper).
