Skip to content

Conversation

@yaya1738
Copy link

@yaya1738 yaya1738 commented Dec 4, 2025

Summary

Complete implementation of issue #220 - Model Lifecycle Manager ($150 bounty).

Brings "systemctl for AI models" to Cortex Linux with:

  • Systemd service generation for any LLM backend
  • Multi-backend support: vLLM, llama.cpp, Ollama, TGI
  • Health check monitoring with auto-restart on failure
  • Resource limits via systemd (CPU, memory, I/O, tasks)
  • Security hardening (NoNewPrivileges, ProtectSystem, etc.)
  • SQLite persistence for configuration and event logging
  • Boot auto-start via systemd enable

Usage

# Register a model
cortex-model register llama-70b --path meta-llama/Llama-2-70b-hf --backend vllm --gpus 0,1

# Start/stop/restart
cortex-model start llama-70b
cortex-model stop llama-70b

# Enable auto-start on boot
cortex-model enable llama-70b

# View status and logs
cortex-model status
cortex-model logs llama-70b -f

Files Changed

File Lines Description
cortex/kernel_features/model_lifecycle.py 993 Main implementation
tests/test_model_lifecycle.py 907 63 unit tests
README_MODEL_LIFECYCLE.md 297 Documentation

Test Results

============================= test session starts ==============================
collected 63 items
tests/test_model_lifecycle.py ............................ [100%]
============================== 63 passed in 0.79s ==============================

Acceptance Criteria

  • cortex model register creates valid systemd service
  • cortex model start/stop works reliably
  • Health checks trigger auto-restart on failure
  • Unit tests pass with >80% coverage (63 tests)

Closes #220

Summary by CodeRabbit

  • New Features

    • Added Model Lifecycle Manager with systemd service integration for model deployment and lifecycle management
    • Support for register, start, stop, restart, enable/disable workflows via CLI
    • Health monitoring with automatic restart on service failures
    • Persistent storage for model configurations and operational events
    • Multi-backend support (vLLM, Ollama, LlamaCpp, TGI) with advanced configuration options
    • Service status reporting and log access capabilities
  • Documentation

    • Added comprehensive Model Lifecycle documentation with quick start guide and architecture overview

✏️ Tip: You can customize this high-level summary in your review settings.

…anagement

Complete implementation of issue cortexlinux#220 ($150 bounty):

Features:
- Systemd service generation for any LLM backend
- Multi-backend support (vLLM, llama.cpp, Ollama, TGI)
- Health check monitoring with auto-restart on failure
- Resource limits via systemd (CPU, memory, I/O, tasks)
- Security hardening (NoNewPrivileges, ProtectSystem, etc.)
- SQLite database for configuration persistence
- Event logging for audit trail
- Boot auto-start via systemd enable

Files:
- model_lifecycle.py: 993 lines of implementation
- test_model_lifecycle.py: 907 lines, 63 tests (all passing)
- README_MODEL_LIFECYCLE.md: Complete documentation

Closes cortexlinux#220
Copilot AI review requested due to automatic review settings December 4, 2025 05:56
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

Introduces a comprehensive Model Lifecycle Manager subsystem enabling systemd-based LLM service management for Cortex. Adds persistent SQLite storage, multi-backend support (vLLM, llama.cpp, TGI, Ollama), health monitoring with auto-restart, resource configuration, and rich CLI commands for model registration, start/stop, enable/disable workflows.

Changes

Cohort / File(s) Summary
Documentation
README_MODEL_LIFECYCLE.md
New documentation file introducing systemd-based model lifecycle management, including quick start, supported backends, register/start/stop/restart workflows, status monitoring, usage examples, architecture overview, service files, and testing guidance.
Core Implementation
cortex/kernel_features/model_lifecycle.py
Comprehensive model lifecycle subsystem: introduces ModelState and EventType enums; adds nested configuration dataclasses (HealthCheckConfig, ResourceLimits, SecurityConfig); implements ModelDatabase for SQLite-backed persistence; creates ServiceGenerator for multi-backend systemd service generation; implements HealthChecker for HTTP health monitoring with auto-restart; adds ModelLifecycleManager orchestrating registration, start/stop/restart, enable/disable, status, logs, and events; expands CLI with rich command handlers.
Test Suite
tests/test_model_lifecycle.py
Comprehensive test suite covering dataclass validation/serialization, database CRUD operations and event logging, service generation across backends with health endpoints and resource/security settings, lifecycle manager operations, health checker logic, enum coverage, configuration round-trips, and edge cases.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Manager as ModelLifecycleManager
    participant DB as ModelDatabase
    participant Gen as ServiceGenerator
    participant Systemd as systemd
    participant HC as HealthChecker
    participant Service as LLM Service

    rect rgb(200, 220, 255)
    note over User,Systemd: Model Registration & Start Flow
    User->>Manager: register(config)
    Manager->>DB: save_model(config)
    DB-->>Manager: ✓ saved
    Manager->>Gen: generate(config)
    Gen-->>Manager: systemd unit file
    Manager->>Systemd: write unit + daemon-reload
    Systemd-->>Manager: ✓ loaded
    Manager->>DB: log_event(REGISTERED)
    DB-->>Manager: ✓ logged
    Manager-->>User: registration complete
    
    User->>Manager: start(name)
    Manager->>Systemd: systemctl start service
    Systemd->>Service: launch LLM backend
    Service-->>Systemd: running
    Systemd-->>Manager: ✓ started
    Manager->>DB: log_event(STARTED)
    Manager-->>User: ✓ started
    end
    
    rect rgb(220, 240, 220)
    note over HC,Service: Health Monitoring & Auto-Restart
    HC->>HC: monitor interval elapsed
    HC->>Service: GET /health
    Service-->>HC: 200 OK
    HC->>DB: log_event(HEALTH_CHECK_PASSED)
    DB-->>HC: ✓ logged
    
    HC->>Service: GET /health (later)
    Service-->>HC: timeout/error
    HC->>HC: increment failure count
    alt Failure threshold reached
        HC->>Systemd: systemctl restart service
        Systemd->>Service: restart LLM backend
        Service-->>Systemd: running (restarted)
        HC->>DB: log_event(AUTO_RESTARTED)
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • ModelDatabase SQLite integration: Verify schema, query correctness, transaction handling, and serialization/deserialization logic for nested dataclasses.
  • ServiceGenerator backend support: Review correctness of generated systemd ExecStart commands for each backend (vLLM, llamacpp, TGI, Ollama), resource limit blocks, security hardening, and health endpoint mappings.
  • HealthChecker state machine: Validate HTTP health check logic, failure counting, restart triggering, and integration with the lifecycle manager to prevent race conditions.
  • ModelLifecycleManager orchestration: Examine event logging consistency across all operations, systemd service file lifecycle (creation, deletion, reload), state transitions, and error handling paths.
  • CLI command wiring: Ensure all CLI arguments are properly parsed, nested configurations constructed correctly, and output formatting is user-friendly.
  • Test coverage: Verify test suite exercises happy paths, error conditions, edge cases (special characters, multi-GPU), and interactions between components.

Possibly related PRs

Suggested labels

enhancement, kernel-features

Poem

🐰 Hoppy systemd hops around,
Models service-d with joyful sound,
Auto-restarts when health checks fail,
SQLite stores each tiny tale,
AI services now have class—
"cortex model start" does pass!

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: implementing a Model Lifecycle Manager with systemd-based service management for LLM models, matching the PR's core objective.
Description check ✅ Passed The description includes a summary, usage examples, files changed, test results, acceptance criteria, and closes statement, though it doesn't explicitly follow the template's checklist format.
Linked Issues check ✅ Passed The PR implementation comprehensively addresses all requirements from issue #220: systemd service generation, multi-backend support (vLLM, llama.cpp, Ollama, TGI), health checks with auto-restart, resource limits, security hardening, SQLite persistence, and 63 passing unit tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #220: the main implementation file, comprehensive test suite, and documentation are all focused on the Model Lifecycle Manager feature without unrelated modifications.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 4, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (8)
README_MODEL_LIFECYCLE.md (1)

188-206: Add language specifier to fenced code block.

The architecture diagram code block lacks a language specifier. While this is plaintext, adding a specifier improves rendering consistency.

-```
+```text
 ModelLifecycleManager
tests/test_model_lifecycle.py (1)

355-361: Rename unused loop variable to _.

The loop variable i is not used within the loop body. Per Python convention, rename it to _ to indicate it's intentionally unused.

-        for i in range(10):
+        for _ in range(10):
             self.db.log_event("test", EventType.STARTED)
cortex/kernel_features/model_lifecycle.py (6)

239-246: Use explicit Optional type hint.

PEP 484 prohibits implicit Optional. The details parameter should be explicitly typed.

-    def log_event(self, model_name: str, event_type: EventType, details: str = None) -> None:
+    def log_event(self, model_name: str, event_type: EventType, details: Optional[str] = None) -> None:

248-267: Use explicit Optional type hint for model_name parameter.

-    def get_events(self, model_name: str = None, limit: int = 100) -> List[Dict[str, Any]]:
+    def get_events(self, model_name: Optional[str] = None, limit: int = 100) -> List[Dict[str, Any]]:

270-317: Annotate class-level mutable attributes with ClassVar.

The BACKENDS and HEALTH_ENDPOINTS dictionaries are class-level constants and should be annotated with ClassVar to clarify they are not instance attributes.

+from typing import ClassVar
+
 class ServiceGenerator:
     """Generate systemd service files for LLM backends."""
 
     # Backend command templates
-    BACKENDS = {
+    BACKENDS: ClassVar[Dict[str, str]] = {
         ...
     }
 
     # Health check endpoints by backend
-    HEALTH_ENDPOINTS = {
+    HEALTH_ENDPOINTS: ClassVar[Dict[str, str]] = {
         ...
     }

331-336: Deduplicate identical branches for dtype handling.

The vllm and tgi branches have identical logic for dtype argument construction.

         dtype_arg = ""
         if config.dtype != "auto":
-            if config.backend == "vllm":
-                dtype_arg = f"--dtype {config.dtype}"
-            elif config.backend == "tgi":
+            if config.backend in ("vllm", "tgi"):
                 dtype_arg = f"--dtype {config.dtype}"

502-544: High cognitive complexity in monitor_loop - consider refactoring.

The nested monitor_loop closure has high cognitive complexity (17 vs allowed 15 per SonarCloud). Consider extracting the health check result handling into a separate method.

def _handle_health_result(self, name: str, healthy: bool, msg: str, hc: HealthCheckConfig) -> float:
    """Handle health check result. Returns sleep duration."""
    if healthy:
        if self._failure_counts.get(name, 0) > 0:
            self.manager.db.log_event(name, EventType.HEALTH_CHECK_PASSED, msg)
        self._failure_counts[name] = 0
        return hc.interval_seconds
    
    self._failure_counts[name] = self._failure_counts.get(name, 0) + 1
    self.manager.db.log_event(
        name, EventType.HEALTH_CHECK_FAILED,
        f"Failure {self._failure_counts[name]}/{hc.max_failures}: {msg}"
    )
    
    if self._failure_counts[name] >= hc.max_failures:
        self.manager.db.log_event(name, EventType.AUTO_RESTARTED)
        self.manager.restart(name, log_event=False)
        self._failure_counts[name] = 0
        return hc.startup_delay_seconds
    
    return hc.interval_seconds

600-605: Narrow the exception handler to specific exceptions.

Catching bare Exception can mask programming errors. Consider catching specific exceptions like OSError, sqlite3.Error, etc.

-        except Exception as e:
+        except (OSError, sqlite3.Error, subprocess.SubprocessError) as e:
             self.db.log_event(config.name, EventType.ERROR, str(e))
             print(f"Failed to register '{config.name}': {e}")
             return False

Apply similar changes to unregister() at lines 635-638.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da3e635 and ea0f124.

📒 Files selected for processing (3)
  • README_MODEL_LIFECYCLE.md (1 hunks)
  • cortex/kernel_features/model_lifecycle.py (1 hunks)
  • tests/test_model_lifecycle.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: SonarCloud Code Analysis
tests/test_model_lifecycle.py

[warning] 357-357: Replace the unused loop index "i" with "_".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7288Tmn1wGm3II4E&open=AZrn7288Tmn1wGm3II4E&pullRequest=234

cortex/kernel_features/model_lifecycle.py

[failure] 68-68: Define a constant instead of duplicating this literal "/health" 5 times.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7261Tmn1wGm3II39&open=AZrn7261Tmn1wGm3II39&pullRequest=234


[warning] 336-336: Either merge this branch with the identical one on line "334" or change one of the implementations.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7261Tmn1wGm3II4B&open=AZrn7261Tmn1wGm3II4B&pullRequest=234


[failure] 860-860: Define a constant instead of duplicating this literal "Model name" 9 times.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7261Tmn1wGm3II3-&open=AZrn7261Tmn1wGm3II3-&pullRequest=234


[failure] 502-502: Refactor this function to reduce its Cognitive Complexity from 17 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7261Tmn1wGm3II4C&open=AZrn7261Tmn1wGm3II4C&pullRequest=234


[failure] 839-839: Refactor this function to reduce its Cognitive Complexity from 21 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrn7261Tmn1wGm3II4D&open=AZrn7261Tmn1wGm3II4D&pullRequest=234

🪛 markdownlint-cli2 (0.18.1)
README_MODEL_LIFECYCLE.md

188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)
tests/test_model_lifecycle.py

1-1: Shebang is present but file is not executable

(EXE001)


182-182: Possible binding to all interfaces

(S104)


357-357: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


541-541: Unused method argument: mock_run

(ARG002)


756-756: Possible binding to all interfaces

(S104)

cortex/kernel_features/model_lifecycle.py

239-239: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


248-248: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


274-309: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


312-317: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


492-492: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


493-493: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


499-499: Do not catch blind exception: Exception

(BLE001)


560-560: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


569-569: subprocess call: check for execution of untrusted input

(S603)


570-570: Consider ["systemctl", "--user", *list(args)] instead of concatenation

Replace with ["systemctl", "--user", *list(args)]

(RUF005)


601-601: Consider moving this statement to an else block

(TRY300)


602-602: Do not catch blind exception: Exception

(BLE001)


634-634: Consider moving this statement to an else block

(TRY300)


635-635: Do not catch blind exception: Exception

(BLE001)


782-782: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


820-820: subprocess call: check for execution of untrusted input

(S603)


822-822: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (8)
tests/test_model_lifecycle.py (3)

540-544: Unused mock_run parameter is intentional.

The mock_run parameter appears unused but is necessary to prevent actual subprocess.run calls during the test. The @patch decorator requires a corresponding parameter. This is correct.


656-712: Good coverage of health check scenarios.

The HealthChecker tests cover success, bad status codes, and connection errors with proper mocking of urllib.request.urlopen. The context manager mock pattern is correctly implemented.


746-831: Thorough serialization round-trip testing.

The configuration serialization tests verify nested dataclass handling and JSON compatibility. This ensures database persistence integrity.

cortex/kernel_features/model_lifecycle.py (5)

64-79: Well-designed dataclass with serialization support.

The HealthCheckConfig dataclass provides sensible defaults and proper to_dict/from_dict methods for persistence. The field filtering in from_dict handles forward compatibility gracefully.


173-237: SQLite database implementation is solid.

Good use of:

  • UTC timestamps for consistency
  • Upsert pattern with ON CONFLICT
  • Index creation for query performance
  • Parameterized queries preventing SQL injection

431-470: Service file generation is well-structured.

The generated systemd unit file includes proper sections for resources, security hardening, restart policies, and logging. The template approach is clean and maintainable.


814-820: Subprocess call in logs() should use explicit argument list.

The subprocess.run() call passes args directly. While the list is constructed internally (safe from injection), the name parameter should be validated to ensure it doesn't contain malicious characters before being passed to _service_name().


567-577: Validate model names to prevent systemd service naming issues.

Model names are passed to systemctl via subprocess without validation. While subprocess.run with a list avoids shell injection, special characters in service names could cause issues with systemd service registration and management.

Consider adding name validation in register() or ModelConfig to ensure model names conform to systemd service naming requirements:

import re

def _validate_name(name: str) -> bool:
    """Validate model name for safe systemd service naming."""
    # Allow alphanumeric, dash, underscore only
    return bool(re.match(r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$', name))

Comment on lines +486 to +500
def check_health(self, config: ModelConfig) -> Tuple[bool, str]:
"""Perform a single health check."""
url = config.get_health_url()
timeout = config.health_check.timeout_seconds

try:
req = urllib.request.Request(url, method='GET')
with urllib.request.urlopen(req, timeout=timeout) as response:
if response.status == 200:
return True, "OK"
return False, f"HTTP {response.status}"
except urllib.error.URLError as e:
return False, f"Connection failed: {e.reason}"
except Exception as e:
return False, str(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate URL scheme before opening to prevent SSRF.

The check_health method opens URLs constructed from user-provided host values without validating the scheme. This could allow file:// or other dangerous schemes if a malicious host value is provided.

     def check_health(self, config: ModelConfig) -> Tuple[bool, str]:
         """Perform a single health check."""
         url = config.get_health_url()
+        # Validate URL scheme to prevent SSRF
+        if not url.startswith(('http://', 'https://')):
+            return False, f"Invalid URL scheme: {url}"
         timeout = config.health_check.timeout_seconds
 
         try:
             req = urllib.request.Request(url, method='GET')
             with urllib.request.urlopen(req, timeout=timeout) as response:
                 if response.status == 200:
                     return True, "OK"
                 return False, f"HTTP {response.status}"
         except urllib.error.URLError as e:
             return False, f"Connection failed: {e.reason}"
-        except Exception as e:
-            return False, str(e)
+        except (TimeoutError, OSError) as e:
+            return False, f"Connection error: {e}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def check_health(self, config: ModelConfig) -> Tuple[bool, str]:
"""Perform a single health check."""
url = config.get_health_url()
timeout = config.health_check.timeout_seconds
try:
req = urllib.request.Request(url, method='GET')
with urllib.request.urlopen(req, timeout=timeout) as response:
if response.status == 200:
return True, "OK"
return False, f"HTTP {response.status}"
except urllib.error.URLError as e:
return False, f"Connection failed: {e.reason}"
except Exception as e:
return False, str(e)
def check_health(self, config: ModelConfig) -> Tuple[bool, str]:
"""Perform a single health check."""
url = config.get_health_url()
# Validate URL scheme to prevent SSRF
if not url.startswith(('http://', 'https://')):
return False, f"Invalid URL scheme: {url}"
timeout = config.health_check.timeout_seconds
try:
req = urllib.request.Request(url, method='GET')
with urllib.request.urlopen(req, timeout=timeout) as response:
if response.status == 200:
return True, "OK"
return False, f"HTTP {response.status}"
except urllib.error.URLError as e:
return False, f"Connection failed: {e.reason}"
except (TimeoutError, OSError) as e:
return False, f"Connection error: {e}"
🧰 Tools
🪛 Ruff (0.14.7)

492-492: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


493-493: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


499-499: Do not catch blind exception: Exception

(BLE001)

Comment on lines +930 to +950
if args.command == "register":
config = ModelConfig(
name=args.name,
model_path=args.path,
backend=args.backend,
port=args.port,
host=args.host,
gpu_ids=[int(x) for x in args.gpus.split(",")],
max_model_len=args.max_model_len,
tensor_parallel_size=args.tensor_parallel,
quantization=args.quantization,
extra_args=args.extra_args,
health_check=HealthCheckConfig(
enabled=not args.no_health_check,
endpoint=ServiceGenerator().get_default_health_endpoint(args.backend)
),
resources=ResourceLimits(
memory_max=args.memory,
cpu_quota=args.cpu
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

GPU ID parsing lacks validation.

The GPU ID parsing splits on comma and converts to int without validation. Invalid input like "0,abc,1" will raise an unhandled ValueError.

+def parse_gpu_ids(gpus_str: str) -> List[int]:
+    """Parse comma-separated GPU IDs with validation."""
+    try:
+        ids = [int(x.strip()) for x in gpus_str.split(",") if x.strip()]
+        if not ids:
+            raise ValueError("No valid GPU IDs provided")
+        if any(id < 0 for id in ids):
+            raise ValueError("GPU IDs must be non-negative")
+        return ids
+    except ValueError as e:
+        raise argparse.ArgumentTypeError(f"Invalid GPU IDs '{gpus_str}': {e}")

 # In register command setup:
-reg.add_argument("--gpus", default="0", help="Comma-separated GPU IDs")
+reg.add_argument("--gpus", default="0", type=parse_gpu_ids, help="Comma-separated GPU IDs")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cortex/kernel_features/model_lifecycle.py around lines 930 to 950, the GPU ID
parsing (gpu_ids=[int(x) for x in args.gpus.split(",")]) lacks validation and
will raise on invalid entries; change it to robustly parse by splitting on
commas, stripping whitespace, ignoring empty entries, and validating each token
is a non-negative integer (or raise a clear ValueError with a helpful message),
or alternatively collect only valid ints and log/warn about skipped invalid
tokens; ensure the final gpu_ids is a list of ints and handle the case of an
empty args.gpus string.

Comment on lines +36 to +42
| Backend | Command | Health Endpoint |
|---------|---------|-----------------||
| vLLM | `python -m vllm.entrypoints.openai.api_server` | `/health` |
| llama.cpp | `llama-server` | `/health` |
| Ollama | `ollama serve` | `/api/tags` |
| TGI | `text-generation-launcher` | `/health` |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Malformed Markdown table syntax.

Line 37 has a double pipe || at the end of the header separator row which will break table rendering in most Markdown processors.

-|---------|---------|-----------------||
+|---------|---------|-----------------|
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| Backend | Command | Health Endpoint |
|---------|---------|-----------------||
| vLLM | `python -m vllm.entrypoints.openai.api_server` | `/health` |
| llama.cpp | `llama-server` | `/health` |
| Ollama | `ollama serve` | `/api/tags` |
| TGI | `text-generation-launcher` | `/health` |
| Backend | Command | Health Endpoint |
|---------|---------|-----------------|
| vLLM | `python -m vllm.entrypoints.openai.api_server` | `/health` |
| llama.cpp | `llama-server` | `/health` |
| Ollama | `ollama serve` | `/api/tags` |
| TGI | `text-generation-launcher` | `/health` |
🤖 Prompt for AI Agents
In README_MODEL_LIFECYCLE.md around lines 36 to 42, the Markdown table header
separator row contains an extra trailing pipe ("||") which breaks table
rendering; remove the extra pipe so the separator row has exactly three columns
(e.g., |---------|---------|-----------------|) and verify all data rows have
three pipe-separated columns to restore correct table formatting.

Comment on lines +291 to +293
- `cortex/kernel_features/model_lifecycle.py` - Main implementation (~1000 lines)
- `tests/test_model_lifecycle.py` - Unit tests (~650 lines, 50+ tests)
- `README_MODEL_LIFECYCLE.md` - This documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation line counts are outdated.

The test file is described as "~650 lines, 50+ tests" but the actual file has 907 lines and 63 tests according to the PR description. Consider updating to reflect current state.

-- `tests/test_model_lifecycle.py` - Unit tests (~650 lines, 50+ tests)
+- `tests/test_model_lifecycle.py` - Unit tests (~900 lines, 63 tests)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- `cortex/kernel_features/model_lifecycle.py` - Main implementation (~1000 lines)
- `tests/test_model_lifecycle.py` - Unit tests (~650 lines, 50+ tests)
- `README_MODEL_LIFECYCLE.md` - This documentation
- `cortex/kernel_features/model_lifecycle.py` - Main implementation (~1000 lines)
- `tests/test_model_lifecycle.py` - Unit tests (~900 lines, 63 tests)
- `README_MODEL_LIFECYCLE.md` - This documentation
🤖 Prompt for AI Agents
In README_MODEL_LIFECYCLE.md around lines 291 to 293, the test file line counts
are stale; update the bullet to reflect the current test file size and count by
replacing "~650 lines, 50+ tests" with the current values (907 lines, 63 tests)
so the entry reads "- `tests/test_model_lifecycle.py` - Unit tests (~907 lines,
63 tests)". Ensure formatting and punctuation match the surrounding list.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive Model Lifecycle Manager that brings systemd-based service management to LLM models in Cortex Linux. The implementation provides a production-ready solution for managing AI models as first-class system services with health monitoring, resource limits, and security hardening.

Key Changes

  • Core Implementation: 993-line model lifecycle manager with SQLite persistence, systemd service generation for 4 backends (vLLM, llama.cpp, Ollama, TGI), and HTTP health check monitoring
  • Comprehensive Testing: 907-line test suite with 63 unit tests covering all major functionality including configuration, database operations, service generation, and lifecycle management
  • Documentation: Complete user guide with usage examples, architecture overview, and configuration reference

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 18 comments.

File Description
cortex/kernel_features/model_lifecycle.py Main implementation providing ModelConfig dataclasses, SQLite database layer, systemd service file generation, lifecycle operations (register/start/stop/enable), health monitoring with auto-restart, and CLI interface
tests/test_model_lifecycle.py Comprehensive test suite with 63 tests covering dataclass serialization, database CRUD operations, service generation for all backends, lifecycle management, health checking, and edge cases
README_MODEL_LIFECYCLE.md User documentation with quick start guide, command reference, usage examples for all backends, configuration details for resources/security/health checks, and architecture overview

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +805 to +812
# Color-code state
state_str = state.value
if state == ModelState.ACTIVE:
state_str = f"\033[32m{state_str}\033[0m" # Green
elif state == ModelState.FAILED:
state_str = f"\033[31m{state_str}\033[0m" # Red

print(f"{m.name:<20} {state_str:<21} {enabled:<8} {m.backend:<10} {m.port:<6}")
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status output uses ANSI color codes directly (lines 808-810) but doesn't check if the output is a TTY. When piping output or redirecting to a file, the color codes will appear as raw escape sequences, making the output harder to read. Consider checking sys.stdout.isatty() before applying colors or using a library like colorama for cross-platform color support.

Copilot uses AI. Check for mistakes.
## Supported Backends

| Backend | Command | Health Endpoint |
|---------|---------|-----------------||
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table header format in the documentation (line 37) has an extra pipe character at the end. The correct markdown table format should be:

| Backend | Command | Health Endpoint |
|---------|---------|-----------------|

Without the trailing | on the separator line.

Suggested change
|---------|---------|-----------------||
|---------|---------|-----------------|

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,907 @@
#!/usr/bin/env python3
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test file location inconsistency: There's already a test file at tests/kernel_features/test_model_lifecycle.py (15 lines), but this PR creates a new test file at tests/test_model_lifecycle.py (907 lines). This creates duplicate test files with the same name in different locations, which could cause confusion. Consider consolidating these tests or clarifying the intended structure.

Copilot uses AI. Check for mistakes.
Comment on lines +534 to +538
if self._failure_counts[name] >= hc.max_failures:
self.manager.db.log_event(name, EventType.AUTO_RESTARTED)
self.manager.restart(name, log_event=False)
self._failure_counts[name] = 0
time.sleep(hc.startup_delay_seconds)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage: The health monitoring auto-restart functionality (lines 534-538) is a critical feature but lacks comprehensive integration testing. While there are unit tests for individual health check methods, there's no test verifying the complete flow: health check fails → failure count increases → max failures reached → auto-restart triggered. Consider adding an integration test for this critical path.

Copilot uses AI. Check for mistakes.
Comment on lines +420 to +430
"""Generate health check watchdog configuration."""
if not config.health_check.enabled:
return ""

hc = config.health_check
# Use systemd watchdog for health monitoring
return f"""
# Health check via systemd watchdog
WatchdogSec={hc.interval_seconds}
"""

Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The watchdog health check configuration is generated but systemd's WatchdogSec requires the service to actively send watchdog signals (via sd_notify). The current implementation doesn't send these signals, so this watchdog configuration won't actually monitor health. Either implement proper sd_notify integration or remove this misleading configuration and rely solely on the Python-based health checker.

Suggested change
"""Generate health check watchdog configuration."""
if not config.health_check.enabled:
return ""
hc = config.health_check
# Use systemd watchdog for health monitoring
return f"""
# Health check via systemd watchdog
WatchdogSec={hc.interval_seconds}
"""
"""Health monitoring is handled by Python, not systemd watchdog."""
return ""

Copilot uses AI. Check for mistakes.
Comment on lines +602 to +605
except Exception as e:
self.db.log_event(config.name, EventType.ERROR, str(e))
print(f"Failed to register '{config.name}': {e}")
return False
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling in the register method catches all exceptions generically (line 602), which makes debugging more difficult. The same pattern is used in unregister (line 635). Consider catching specific exception types (e.g., IOError, OSError, subprocess.CalledProcessError) separately to provide more informative error messages, or at least include the exception type in the log message.

Copilot uses AI. Check for mistakes.
if follow:
args.append("-f")

subprocess.run(args)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logs method (lines 814-820) directly calls subprocess.run() without handling potential exceptions. If journalctl is not available or the command fails, this will raise an unhandled exception. Consider adding error handling and a user-friendly error message.

Suggested change
subprocess.run(args)
try:
subprocess.run(args, check=True)
except FileNotFoundError:
print("Error: 'journalctl' command not found. Please ensure systemd and journalctl are installed.")
except subprocess.CalledProcessError as e:
print(f"Error: Failed to retrieve logs for '{name}'.\nDetails: {e}")

Copilot uses AI. Check for mistakes.
- CLI parsing
"""

import os
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
import unittest
from pathlib import Path
from unittest.mock import patch, MagicMock
from dataclasses import asdict
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'asdict' is not used.

Suggested change
from dataclasses import asdict

Copilot uses AI. Check for mistakes.
ModelDatabase,
ServiceGenerator,
ModelLifecycleManager,
HealthChecker,
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'HealthChecker' is not used.

Suggested change
HealthChecker,

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Kernel Feature] Model Lifecycle Manager - Systemd-Based LLM Service Management

2 participants