Add Kernel-Level AI Features (Tier 1) #224
Conversation
Implements user-space versions of kernel-level concepts:
- Model Lifecycle Manager: systemd-based LLM service management
- KV-Cache Manager: shared memory cache pools for inference
- Accelerator Limits: cgroups v2 wrapper for AI workloads
- /dev/llm Device: FUSE-based file interface to LLMs

These demonstrate kernel-level thinking while shipping on Ubuntu 24.04. Part of our patent portfolio for AI-native operating systems.

Total: ~1,500 lines of production code
Walkthrough

This PR introduces the cortex/kernel_features package: four user-space modules (a model lifecycle manager, a KV-cache manager, accelerator limits, and a /dev/llm FUSE device), plus documentation and tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
[Inline diff excerpts trimmed; the hunks covered the module headers and imports of accelerator_limits.py ("cgroups v2 wrapper for AI workloads."), kv_cache_manager.py ("User-space KV-cache management for LLM inference optimization."), model_lifecycle.py ("Manages LLM models as first-class system services using systemd."), and llm_device.py.]
Actionable comments posted: 7
🧹 Nitpick comments (8)
cortex/kernel_features/accelerator_limits.py (2)
33-49: Fix type hint for `gpu_ids` to match the default value.

The type hint declares `List[int]` but the default is `None`. This violates PEP 484 and triggers static analysis warnings.

```diff
 @dataclass
 class ResourceLimits:
     name: str
     preset: str = "inference"
     cpu_quota: float = 400.0
     memory_max: int = 32 * 1024**3
-    gpu_ids: List[int] = None
+    gpu_ids: Optional[List[int]] = None
     oom_score_adj: int = 0
```
97-124: Consider adding help output when no subcommand is provided.

When invoked without a subcommand, the CLI silently exits. Adding a fallback to print help would improve UX. This is consistent with other managers in this PR, so it's optional.

```diff
     elif args.cmd in ("status", "list"):
         mgr.status()
+    else:
+        parser.print_help()

 if __name__ == "__main__":
```

cortex/kernel_features/llm_device.py (1)
136-138: Consider validating the mountpoint and using an absolute path for `fusermount`.

The `mountpoint` argument is passed directly to `subprocess.run()`. While command injection is limited here, using the absolute path to `fusermount` and validating the mountpoint would improve security.

```diff
     elif args.cmd == "umount":
         import subprocess
-        subprocess.run(["fusermount", "-u", args.mountpoint])
+        from pathlib import Path
+        mp = Path(args.mountpoint).resolve()
+        subprocess.run(["/bin/fusermount", "-u", str(mp)], check=True)
```

cortex/kernel_features/kv_cache_manager.py (1)
118-125: Fix implicit `Optional` and inconsistent tuple handling in `status()`.

The `name` parameter should be `Optional[str]`. Also, the tuple handling is inconsistent: `get_pool` returns `(CacheConfig, str)` but `list_pools` returns `[CacheConfig]`, requiring awkward wrapping at line 119.

```diff
-    def status(self, name: str = None):
-        pools = [self.db.get_pool(name)] if name else [(p, "") for p in self.db.list_pools()]
+    def status(self, name: Optional[str] = None):
+        if name:
+            result = self.db.get_pool(name)
+            pools = [result[0]] if result else []
+        else:
+            pools = self.db.list_pools()
         print(f"\n{'POOL':<20} {'SIZE':<12} {'POLICY':<10}")
         print("-" * 50)
-        for item in pools:
-            if item:
-                cfg = item[0] if isinstance(item, tuple) else item
-                print(f"{cfg.name:<20} {cfg.size_bytes/1e9:.1f}G{'':<6} {cfg.policy:<10}")
+        for cfg in pools:
+            print(f"{cfg.name:<20} {cfg.size_bytes/1e9:.1f}G{'':<6} {cfg.policy:<10}")
```

cortex/kernel_features/model_lifecycle.py (1)
134-142: Fix implicit `Optional` type hint.

The `name` parameter defaults to `None` but is typed as `str`.

```diff
-    def status(self, name: str = None):
+    def status(self, name: Optional[str] = None):
```

tests/kernel_features/test_model_lifecycle.py (1)
9-14: Incomplete round-trip validation.

The test only verifies `name` and `backend` after deserialization, but doesn't check other fields like `model_path` ("/model"), `port` (8080), or the remaining `ModelConfig` attributes. A thorough round-trip test should verify all fields are preserved.

Apply this diff to verify all key fields:

```diff
 def test_config_roundtrip():
     cfg = ModelConfig("test", "/model", "llamacpp", 8080)
     data = cfg.to_dict()
     restored = ModelConfig.from_dict(data)
     assert restored.name == cfg.name
+    assert restored.model_path == cfg.model_path
     assert restored.backend == cfg.backend
+    assert restored.port == cfg.port
```

tests/kernel_features/test_kv_cache.py (1)
4-7: Basic config validation is correct, but coverage is minimal.

The test correctly validates `CacheConfig` defaults. However, the test suite lacks coverage for `KVCacheManager` operations (`create_pool`, `destroy_pool`, `status`) and actual cache functionality. Consider adding tests for the manager's core operations when expanding test coverage.
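When that coverage is added, tests along these lines could work; the `create_pool` signature and the no-argument `KVCacheManager()` constructor are assumptions not confirmed by the excerpts in this review, so adjust to the real API:

```python
# Hypothetical coverage sketch for KVCacheManager; signatures are assumed.
from cortex.kernel_features.kv_cache_manager import KVCacheManager

def test_pool_create_and_destroy():
    mgr = KVCacheManager()                 # assumed no-arg constructor
    mgr.create_pool("t0", 1024 * 1024)     # assumed (name, size_bytes) signature
    assert mgr.destroy_pool("t0") is True  # destroy_pool returns True per this PR

def test_status_runs_without_pools():
    # status() prints a table; asserting it doesn't raise is a weak but
    # useful smoke test given the current print-based interface.
    KVCacheManager().status()
```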
cortex/kernel_features/__init__.py (1)
15-19: Static analysis flags unsorted `__all__`, but the current grouping is logical.

Ruff reports that `__all__` is not alphabetically sorted. However, the current organization groups related exports (ModelLifecycle, KVCache, AcceleratorLimits), which may be more maintainable than strict alphabetical order.

If you prefer to suppress the warning while keeping the logical grouping, add this comment:

```diff
 __all__ = [
+    # ruff: noqa: RUF022
     'ModelLifecycleManager',
     'ModelConfig',
     'KVCacheManager',
     'CacheConfig',
     'AcceleratorLimitsManager',
     'ResourceLimits',
 ]
```

Alternatively, sort alphabetically:

```diff
 __all__ = [
+    'AcceleratorLimitsManager',
+    'CacheConfig',
+    'KVCacheManager',
+    'ModelConfig',
     'ModelLifecycleManager',
-    'ModelConfig',
-    'KVCacheManager',
-    'CacheConfig',
-    'AcceleratorLimitsManager',
     'ResourceLimits',
 ]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- cortex/kernel_features/__init__.py (1 hunks)
- cortex/kernel_features/accelerator_limits.py (1 hunks)
- cortex/kernel_features/kv_cache_manager.py (1 hunks)
- cortex/kernel_features/llm_device.py (1 hunks)
- cortex/kernel_features/model_lifecycle.py (1 hunks)
- docs/KERNEL_FEATURES.md (1 hunks)
- tests/kernel_features/test_kv_cache.py (1 hunks)
- tests/kernel_features/test_model_lifecycle.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cortex/kernel_features/__init__.py (3)
- cortex/kernel_features/model_lifecycle.py (2): ModelLifecycleManager (107-142), ModelConfig (22-40)
- cortex/kernel_features/kv_cache_manager.py (2): KVCacheManager (97-125), CacheConfig (28-32)
- cortex/kernel_features/accelerator_limits.py (2): AcceleratorLimitsManager (73-94), ResourceLimits (34-49)

tests/kernel_features/test_kv_cache.py (1)
- cortex/kernel_features/kv_cache_manager.py (2): CacheConfig (28-32), KVCacheManager (97-125)

tests/kernel_features/test_model_lifecycle.py (1)
- cortex/kernel_features/model_lifecycle.py (4): ModelConfig (22-40), ModelLifecycleManager (107-142), to_dict (35-36), from_dict (39-40)

cortex/kernel_features/accelerator_limits.py (3)
- cortex/kernel_features/llm_device.py (2): create (110-110), main (124-138)
- cortex/kernel_features/kv_cache_manager.py (2): status (118-125), main (128-153)
- cortex/kernel_features/model_lifecycle.py (2): status (134-142), main (145-175)

cortex/kernel_features/kv_cache_manager.py (3)
- cortex/kernel_features/accelerator_limits.py (4): create (77-80), status (88-94), main (97-120), get (63-66)
- cortex/kernel_features/llm_device.py (3): create (110-110), main (124-138), getattr (66-73)
- cortex/kernel_features/model_lifecycle.py (2): status (134-142), main (145-175)
🪛 GitHub Check: SonarCloud Code Analysis
cortex/kernel_features/model_lifecycle.py
[failure] 62-62: Don't use datetime.datetime.utcnow to create this datetime object.
cortex/kernel_features/llm_device.py
[warning] 45-45: Remove the unused function parameter "max_tokens".
[warning] 45-45: Remove the unused function parameter "temp".
[warning] 67-67: Replace the unused local variable "m" with "_".
[warning] 45-45: Remove the unused function parameter "system".
[warning] 85-85: Replace the unused local variable "m" with "_".
[warning] 45-45: Remove the unused function parameter "model".
[warning] 76-76: Replace the unused local variable "f" with "_".
cortex/kernel_features/accelerator_limits.py
[warning] 39-39: Replace the type hint "list[int]" with "Optional[list[int]]" or don't assign "None" to "gpu_ids"
cortex/kernel_features/kv_cache_manager.py
[failure] 93-93: Specify an exception class to catch or reraise the exception
[failure] 80-80: Specify an exception class to catch or reraise the exception
🪛 Ruff (0.14.7)
cortex/kernel_features/__init__.py
15-19: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
cortex/kernel_features/model_lifecycle.py
1-1: Shebang is present but file is not executable
(EXE001)
81-85: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
113-113: subprocess call: check for execution of untrusted input
(S603)
113-113: Consider ["systemctl", "--user", *list(args)] instead of concatenation
Replace with ["systemctl", "--user", *list(args)]
(RUF005)
134-134: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
cortex/kernel_features/llm_device.py
1-1: Shebang is present but file is not executable
(EXE001)
24-24: Multiple statements on one line (colon)
(E701)
45-45: Unused method argument: model
(ARG002)
45-45: Unused method argument: max_tokens
(ARG002)
45-45: Unused method argument: temp
(ARG002)
45-45: Unused method argument: system
(ARG002)
50-50: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Multiple statements on one line (colon)
(E701)
61-61: Multiple statements on one line (colon)
(E701)
62-62: Multiple statements on one line (colon)
(E701)
63-63: Multiple statements on one line (colon)
(E701)
66-66: Unused method argument: fh
(ARG002)
67-67: Unpacked variable m is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
75-75: Unused method argument: fh
(ARG002)
76-76: Unpacked variable f is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
78-78: Multiple statements on one line (colon)
(E701)
79-79: Multiple statements on one line (colon)
(E701)
79-79: Consider [*base, 'prompt', 'response', 'config'] instead of concatenation
Replace with [*base, 'prompt', 'response', 'config']
(RUF005)
80-80: Multiple statements on one line (colon)
(E701)
81-81: Multiple statements on one line (colon)
(E701)
81-81: Consider [*base, 'prompt', 'response', 'history'] instead of concatenation
Replace with [*base, 'prompt', 'response', 'history']
(RUF005)
84-84: Unused method argument: fh
(ARG002)
85-85: Unpacked variable m is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
87-87: Multiple statements on one line (colon)
(E701)
88-88: Multiple statements on one line (colon)
(E701)
91-91: Unused method argument: offset
(ARG002)
91-91: Unused method argument: fh
(ARG002)
101-101: Do not catch blind exception: Exception
(BLE001)
108-108: Unused method argument: path
(ARG002)
108-108: Unused method argument: length
(ARG002)
108-108: Unused method argument: fh
(ARG002)
109-109: Unused method argument: path
(ARG002)
109-109: Unused method argument: flags
(ARG002)
110-110: Unused method argument: path
(ARG002)
110-110: Unused method argument: mode
(ARG002)
110-110: Unused method argument: fi
(ARG002)
138-138: subprocess call: check for execution of untrusted input
(S603)
138-138: Starting a process with a partial executable path
(S607)
cortex/kernel_features/accelerator_limits.py
1-1: Shebang is present but file is not executable
(EXE001)
cortex/kernel_features/kv_cache_manager.py
1-1: Shebang is present but file is not executable
(EXE001)
80-80: Do not use bare except
(E722)
80-81: try-except-pass detected, consider logging the exception
(S110)
93-93: Do not use bare except
(E722)
93-94: try-except-pass detected, consider logging the exception
(S110)
118-118: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🔇 Additional comments (6)
docs/KERNEL_FEATURES.md (1)
1-48: Documentation looks good and aligns with implementations.

The documentation provides clear usage examples for all four kernel features. The CLI commands shown match the implementations in the corresponding Python modules.
cortex/kernel_features/accelerator_limits.py (1)
52-95: Database and manager implementation looks good.

The `LimitsDatabase` and `AcceleratorLimitsManager` classes follow consistent patterns with other kernel feature managers in this PR. The SQLite persistence and status output formatting are well-implemented.

cortex/kernel_features/llm_device.py (1)
66-89: FUSE operations implementation looks correct.

The `getattr`, `readdir`, and `read` methods properly implement the FUSE interface. The unused parameter warnings from static analysis are expected for FUSE callback signatures; a short illustration follows.
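For illustration, here is why those parameters must stay: fusepy's `Operations` fixes the callback signatures, so arguments like `fh` are received whether or not a filesystem uses them. This is a generic sketch assuming the fusepy package (which the `FuseOSError`/`errno` usage in this PR suggests), not code from the PR:

```python
import errno
import stat
from fuse import FuseOSError, Operations  # fusepy

class TinyFS(Operations):
    def getattr(self, path, fh=None):  # fh is part of the contract, often unused
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path == "/hello":
            return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1, "st_size": 5}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):  # fh likewise unused here
        return [".", "..", "hello"]

    def read(self, path, size, offset, fh):  # data served from memory; fh ignored
        return b"hello"[offset:offset + size]
```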
cortex/kernel_features/kv_cache_manager.py (1)

45-68: Database layer implementation looks solid.

The `CacheDatabase` class properly initializes tables and provides consistent CRUD operations with proper SQLite connection handling (the pattern is sketched below).
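That pattern in isolation (a generic sketch, not the PR's exact code): `sqlite3.connect` used as a context manager commits on success and rolls back if an exception escapes, which keeps each CRUD helper to a few lines. Note it does not close the connection:

```python
import sqlite3

DB_PATH = "example.db"  # illustrative; the PR uses its own CORTEX_DB path

def save_item(name: str, payload: str) -> None:
    # The context manager scopes a transaction: commit on clean exit,
    # rollback on exception. Closing the connection is a separate concern.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT PRIMARY KEY, payload TEXT)")
        conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?)", (name, payload))
```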
cortex/kernel_features/model_lifecycle.py (1)

21-41: `ModelConfig` dataclass is well-structured.

Good use of a dataclass with sensible defaults and proper serialization methods.

tests/kernel_features/test_model_lifecycle.py (1)

4-7: LGTM: Basic defaults validated correctly.

The test appropriately verifies that the default backend and port values match the `ModelConfig` specification.
```python
class SharedMemoryPool:
    def __init__(self, name: str, size: int, create: bool = True):
        self.name = f"{SHM_PREFIX}{name}"
        self.size = size
        if create:
            try:
                old = shared_memory.SharedMemory(name=self.name)
                old.close()
                old.unlink()
            except:
                pass
            self.shm = shared_memory.SharedMemory(name=self.name, create=True, size=size + 8192)
        else:
            self.shm = shared_memory.SharedMemory(name=self.name)

    def get_usage(self):
        return self.size, 0, 0  # Simplified

    def destroy(self):
        self.shm.close()
        try:
            self.shm.unlink()
        except:
            pass
```
Replace bare except clauses with specific exception types.
Bare `except:` catches all exceptions, including `KeyboardInterrupt` and `SystemExit`, which can mask critical errors. The static analysis tools flag this as a failure.
```diff
 class SharedMemoryPool:
     def __init__(self, name: str, size: int, create: bool = True):
         self.name = f"{SHM_PREFIX}{name}"
         self.size = size
         if create:
             try:
                 old = shared_memory.SharedMemory(name=self.name)
                 old.close()
                 old.unlink()
-            except:
+            except FileNotFoundError:
                 pass
             self.shm = shared_memory.SharedMemory(name=self.name, create=True, size=size + 8192)
         else:
             self.shm = shared_memory.SharedMemory(name=self.name)

     def get_usage(self):
         return self.size, 0, 0  # Simplified

     def destroy(self):
         self.shm.close()
         try:
             self.shm.unlink()
-        except:
+        except FileNotFoundError:
             pass
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
class SharedMemoryPool:
    def __init__(self, name: str, size: int, create: bool = True):
        self.name = f"{SHM_PREFIX}{name}"
        self.size = size
        if create:
            try:
                old = shared_memory.SharedMemory(name=self.name)
                old.close()
                old.unlink()
            except FileNotFoundError:
                pass
            self.shm = shared_memory.SharedMemory(name=self.name, create=True, size=size + 8192)
        else:
            self.shm = shared_memory.SharedMemory(name=self.name)

    def get_usage(self):
        return self.size, 0, 0  # Simplified

    def destroy(self):
        self.shm.close()
        try:
            self.shm.unlink()
        except FileNotFoundError:
            pass
```
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[failure] 93-93: Specify an exception class to catch or reraise the exception
[failure] 80-80: Specify an exception class to catch or reraise the exception
🪛 Ruff (0.14.7)
80-80: Do not use bare except
(E722)
80-81: try-except-pass detected, consider logging the exception
(S110)
93-93: Do not use bare except
(E722)
93-94: try-except-pass detected, consider logging the exception
(S110)
🤖 Prompt for AI Agents
In cortex/kernel_features/kv_cache_manager.py around lines 71 to 94, replace the
bare except blocks with specific exception types: when attempting to open/unlink
an existing shared memory segment (the first try), catch FileNotFoundError and
PermissionError (or OSError) only, and let other exceptions propagate; when
unlinking in destroy(), catch FileNotFoundError (and optionally
PermissionError/OSError) instead of a bare except. If you want to be explicit,
catch (FileNotFoundError, PermissionError, OSError) and avoid swallowing
KeyboardInterrupt/SystemExit by not using a bare except.
```python
    def destroy_pool(self, name: str) -> bool:
        if name in self.pools:
            self.pools[name].destroy()
            del self.pools[name]
        with sqlite3.connect(CORTEX_DB) as conn:
            conn.execute("DELETE FROM pools WHERE name=?", (name,))
        print(f"✅ Destroyed pool '{name}'")
        return True
```
Resource leak: shared memory not cleaned up if pool not in current instance.
If a pool was created by a different process instance, it won't be in self.pools. The method deletes the DB record but leaves the shared memory segment orphaned.
```diff
     def destroy_pool(self, name: str) -> bool:
         if name in self.pools:
             self.pools[name].destroy()
             del self.pools[name]
+        else:
+            # Try to clean up SHM created by another process
+            try:
+                pool = SharedMemoryPool(name, 0, create=False)
+                pool.destroy()
+            except FileNotFoundError:
+                pass
         with sqlite3.connect(CORTEX_DB) as conn:
             conn.execute("DELETE FROM pools WHERE name=?", (name,))
         print(f"✅ Destroyed pool '{name}'")
         return True
```

Committable suggestion skipped: line range outside the PR's diff.
```python
class MockLLM:
    def complete(self, model, messages, max_tokens, temp, system=None):
        return f"[Mock] Response to: {messages[-1]['content'][:50]}..."
```
MockLLM.complete() interface doesn't match actual Anthropic client usage.
MockLLM.complete() is called with positional args (model, messages, etc.), but the real Anthropic client at line 98 uses self.llm.messages.create(). This interface mismatch will cause issues when testing or when the API key is unavailable.
```diff
 class MockLLM:
-    def complete(self, model, messages, max_tokens, temp, system=None):
-        return f"[Mock] Response to: {messages[-1]['content'][:50]}..."
+    class messages:
+        @staticmethod
+        def create(model, messages, max_tokens, **kwargs):
+            class Response:
+                content = [type('obj', (object,), {'text': f"[Mock] Response to: {messages[-1]['content'][:50]}..."})()]
+            return Response()
```
+ return Response()Alternatively, unify the call pattern at lines 98-100 to use a consistent interface.
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[warning] 45-45: Remove the unused function parameter "max_tokens".
[warning] 45-45: Remove the unused function parameter "temp".
[warning] 45-45: Remove the unused function parameter "system".
[warning] 45-45: Remove the unused function parameter "model".
🪛 Ruff (0.14.7)
45-45: Unused method argument: model
(ARG002)
45-45: Unused method argument: max_tokens
(ARG002)
45-45: Unused method argument: temp
(ARG002)
45-45: Unused method argument: system
(ARG002)
🤖 Prompt for AI Agents
In cortex/kernel_features/llm_device.py around lines 44-46,
MockLLM.complete(...) does not match how the real client is used at lines
~98-100 (self.llm.messages.create(...)); update the mock to mirror the real
Anthropic client shape by adding a messages attribute with a create(...) method
(accepting messages, model=None, max_tokens=None, temperature=None, system=None,
**kwargs) that returns the same mock response format, or alternatively change
the call sites at lines 98-100 to call MockLLM.complete(...) consistently; pick
one approach and implement the matching signature so tests and runtime code use
the same interface.
```python
    def write(self, path, data, offset, fh):
        t, m, f = self._parse(path)
        if t == 'model' and f == 'prompt':
            s = self.sessions["default"]
            s.prompt = data.decode().strip()
            s.messages.append({"role": "user", "content": s.prompt})
            try:
                resp = self.llm.messages.create(model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
                                                max_tokens=s.max_tokens, messages=s.messages) if HAS_API else self.llm.complete(m, s.messages, s.max_tokens, s.temp)
                s.response = resp.content[0].text if HAS_API else resp
            except Exception as e:
                s.response = f"Error: {e}"
            s.messages.append({"role": "assistant", "content": s.response})
            self.requests += 1
            return len(data)
        raise FuseOSError(errno.EACCES)
```
Runtime error when using MockLLM fallback.

When the mock fallback is active, `self.llm.complete()` returns a plain string, yet the response handling is gated on `HAS_API` rather than on which client object is actually in use. The current logic:

```python
resp = self.llm.messages.create(...) if HAS_API else self.llm.complete(...)
s.response = resp.content[0].text if HAS_API else resp
```

But `self.llm` is a `MockLLM` whenever the real client is unavailable (no API key, or in tests), and `MockLLM` has no `messages` attribute, only `complete()`. The `if HAS_API` check is on the wrong condition for choosing the method call.
```diff
         try:
-            resp = self.llm.messages.create(model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
-                max_tokens=s.max_tokens, messages=s.messages) if HAS_API else self.llm.complete(m, s.messages, s.max_tokens, s.temp)
-            s.response = resp.content[0].text if HAS_API else resp
+            if HAS_API and not isinstance(self.llm, MockLLM):
+                resp = self.llm.messages.create(
+                    model=self.MODELS.get(m, "claude-3-sonnet-20240229"),
+                    max_tokens=s.max_tokens,
+                    messages=s.messages
+                )
+                s.response = resp.content[0].text
+            else:
+                s.response = self.llm.complete(m, s.messages, s.max_tokens, s.temp)
```
+ s.response = self.llm.complete(m, s.messages, s.max_tokens, s.temp)🧰 Tools
🪛 Ruff (0.14.7)
91-91: Unused method argument: offset
(ARG002)
91-91: Unused method argument: fh
(ARG002)
101-101: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In cortex/kernel_features/llm_device.py around lines 91 to 106, the code calls
self.llm.messages.create only when HAS_API is True but still tries to access
resp.content[0].text unconditionally; when HAS_API is False and self.llm is a
MockLLM (which only implements complete()), this raises an attribute error. Fix
by branching the call and the response extraction together: if HAS_API call
self.llm.messages.create(...) and set s.response = resp.content[0].text (or the
appropriate API response field), else call self.llm.complete(...) and set
s.response to the returned string; keep the existing try/except, append the
assistant message, increment requests, and return len(data).
```python
    def save_model(self, config: ModelConfig):
        with sqlite3.connect(CORTEX_DB_PATH) as conn:
            conn.execute(
                "INSERT OR REPLACE INTO models VALUES (?, ?, ?)",
                (config.name, json.dumps(config.to_dict()), datetime.utcnow().isoformat())
            )
```
Replace deprecated datetime.utcnow() with timezone-aware alternative.
datetime.utcnow() is deprecated in Python 3.12+ and returns a naive datetime. Use datetime.now(timezone.utc) instead.
```diff
+from datetime import datetime, timezone
-from datetime import datetime
 ...
     def save_model(self, config: ModelConfig):
         with sqlite3.connect(CORTEX_DB_PATH) as conn:
             conn.execute(
                 "INSERT OR REPLACE INTO models VALUES (?, ?, ?)",
-                (config.name, json.dumps(config.to_dict()), datetime.utcnow().isoformat())
+                (config.name, json.dumps(config.to_dict()), datetime.now(timezone.utc).isoformat())
             )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from datetime import datetime, timezone

    def save_model(self, config: ModelConfig):
        with sqlite3.connect(CORTEX_DB_PATH) as conn:
            conn.execute(
                "INSERT OR REPLACE INTO models VALUES (?, ?, ?)",
                (config.name, json.dumps(config.to_dict()), datetime.now(timezone.utc).isoformat())
            )
```
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis
[failure] 62-62: Don't use datetime.datetime.utcnow to create this datetime object.
🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 58 to 63, the
save_model method uses the deprecated naive datetime.utcnow(); replace it with a
timezone-aware timestamp by calling datetime.now(timezone.utc).isoformat()
(import timezone from datetime if missing) so the inserted timestamp is aware
and compliant with Python 3.12+.
```python
class ServiceGenerator:
    BACKENDS = {
        "vllm": "python -m vllm.entrypoints.openai.api_server --model {model_path} --port {port}",
        "llamacpp": "llama-server -m {model_path} --port {port}",
        "ollama": "ollama serve",
    }

    def generate(self, config: ModelConfig) -> str:
        cmd = self.BACKENDS.get(config.backend, self.BACKENDS["vllm"]).format(**asdict(config))
        return f"""[Unit]
Description=Cortex Model: {config.name}
After=network.target

[Service]
Type=simple
ExecStart={cmd}
Environment=CUDA_VISIBLE_DEVICES={','.join(map(str, config.gpu_ids))}
CPUQuota={int(config.cpu_limit * 100)}%
MemoryMax={config.memory_limit}
Restart={config.restart_policy}
NoNewPrivileges=true

[Install]
WantedBy=default.target
"""
```
Command injection risk in generated systemd service.
The model_path from user input is directly interpolated into the ExecStart command without sanitization. A malicious path like foo; rm -rf / would execute arbitrary commands when the service starts.
Consider validating model_path to ensure it's a valid file path without shell metacharacters:
```diff
+import shlex
+import re
+
 class ServiceGenerator:
     BACKENDS = {
         "vllm": "python -m vllm.entrypoints.openai.api_server --model {model_path} --port {port}",
         "llamacpp": "llama-server -m {model_path} --port {port}",
         "ollama": "ollama serve",
     }

     def generate(self, config: ModelConfig) -> str:
+        # Validate model_path doesn't contain shell metacharacters
+        if not re.match(r'^[\w\-./]+$', config.model_path):
+            raise ValueError(f"Invalid model_path: {config.model_path}")
         cmd = self.BACKENDS.get(config.backend, self.BACKENDS["vllm"]).format(**asdict(config))
```

Alternatively, use `shlex.quote()` on user-provided values, though systemd ExecStart has its own quoting rules.
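A sketch of the `shlex.quote()` variant; this is illustrative, and since systemd parses ExecStart with its own quoting rules rather than a POSIX shell, combining it with the validation above is the safer choice:

```python
import shlex
from dataclasses import asdict

def render_cmd(template: str, config) -> str:
    # Quote user-supplied strings before substituting them into the
    # ExecStart template; non-string fields (e.g. port) pass through.
    safe = {k: shlex.quote(v) if isinstance(v, str) else v
            for k, v in asdict(config).items()}
    return template.format(**safe)
```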
🧰 Tools
🪛 Ruff (0.14.7)
81-85: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 80-104 the generated
systemd ExecStart interpolates config.model_path directly into a shell command
allowing command-injection (e.g., "foo; rm -rf /"); fix by validating and
sanitizing model_path before use: ensure it is an absolute or relative
filesystem path that exists (or matches an allowed pattern), reject or normalize
unexpected characters, and strip/escape any shell metacharacters; then build
ExecStart using a safe, quoted form (e.g., apply shlex.quote or the appropriate
systemd ExecStart escaping rules) or avoid a shell invocation entirely by
constructing ExecStart as a command+argv list so untrusted values are never
interpreted by a shell.
```python
    for cmd in ["start", "stop", "unregister"]:
        p = sub.add_parser(cmd)
        p.add_argument("name")

    sub.add_parser("status").add_argument("name", nargs="?")
    sub.add_parser("list")

    args = parser.parse_args()
    mgr = ModelLifecycleManager()

    if args.cmd == "register":
        mgr.register(ModelConfig(args.name, args.path, args.backend, args.port,
                                 gpu_ids=[int(x) for x in args.gpus.split(",")]))
    elif args.cmd == "start":
        mgr.start(args.name)
    elif args.cmd == "stop":
        mgr.stop(args.name)
    elif args.cmd in ("status", "list"):
        mgr.status(getattr(args, 'name', None))
```
Missing handler for unregister command.
The unregister subcommand is defined in the parser (line 157) but not handled in the if/elif chain, causing it to silently do nothing.
elif args.cmd == "stop":
mgr.stop(args.name)
+ elif args.cmd == "unregister":
+ mgr.unregister(args.name)
elif args.cmd in ("status", "list"):
mgr.status(getattr(args, 'name', None))You'll also need to implement the unregister method in ModelLifecycleManager:
```python
def unregister(self, name: str) -> bool:
    service_path = CORTEX_SERVICE_DIR / f"cortex-{name}.service"
    self._systemctl("stop", f"cortex-{name}.service")
    self._systemctl("disable", f"cortex-{name}.service")
    service_path.unlink(missing_ok=True)
    self.db.delete_model(name)
    self._systemctl("daemon-reload")
    print(f"✅ Unregistered model '{name}'")
    return True
```
return True🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 157 to 175, the CLI
defines an "unregister" subcommand but the if/elif chain never handles args.cmd
== "unregister", so the command is a no-op; add an elif branch to call
mgr.unregister(args.name) alongside the existing start/stop/status/list
handlers, and implement ModelLifecycleManager.unregister(name: str) to stop and
disable the systemd service, remove the service file, delete the model from the
DB, reload systemd daemon, print a success message and return True (ensure
service file removal uses a safe unlink and that systemctl
stop/disable/daemon-reload calls use the existing _systemctl helper).



Summary
This PR adds four kernel-level AI features implemented in user-space:
- Model Lifecycle Manager: systemd-based LLM service management
- KV-Cache Manager: shared memory cache pools for inference
- Accelerator Limits: cgroups v2 wrapper for AI workloads
- /dev/llm Device: FUSE-based file interface to LLMs
Why This Matters
These are Tier 1 features from our kernel enhancement strategy:
Usage
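A minimal sketch of driving the lifecycle manager from Python, based on the `register`/`start`/`status` calls visible in `main()` above; the model name, path, and expected side effects noted in the comments are illustrative assumptions:

```python
from cortex.kernel_features import ModelConfig, ModelLifecycleManager

mgr = ModelLifecycleManager()
# Register a model as a systemd-managed service (values are examples).
mgr.register(ModelConfig("demo", "/models/demo.gguf", "llamacpp", 8080,
                         gpu_ids=[0]))
mgr.start("demo")   # expected to run: systemctl --user start cortex-demo.service
mgr.status()        # prints the registered-model table
```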
Testing
Next Steps
Part of Kernel Enhancement Roadmap - Tier 1 (User-Space)