Conversation

Collaborator

@Sahilbhatane Sahilbhatane commented Nov 14, 2025

Summary

  • add native Kimi K2 and fake providers to the command interpreter and CLI

Testing

  • C:/Python314/python.exe test\run_all_tests.py

This pull request adds the test files needed for issue #40, and in doing so it also fixes #16.

Summary by CodeRabbit

  • New Features

    • Added support for Moonshot AI's Kimi K2 provider.
    • Added runtime dependencies for HTTP requests and YAML config handling.
  • Documentation

    • Contributor guide with coding standards and PR checklist.
    • Testing strategy and setup docs.
    • Detailed Kimi K2 integration guide.
  • Tests

    • Docker-backed end-to-end integration tests.
    • Expanded CLI unit tests and a unified test runner.
  • Chores

    • Relaxed strict permission checks on non-POSIX platforms.


Contributor

coderabbitai bot commented Nov 14, 2025

Warning

Rate limit exceeded

@Sahilbhatane has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between fb4e654 and b059c82.

📒 Files selected for processing (6)
  • .github/workflows/ci.yml (1 hunks)
  • cortex/sandbox/sandbox_executor.py (2 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • requirements-dev.txt (1 hunks)
  • tests/test_cli.py (10 hunks)
  • tests/test_llm_router.py (3 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds testing and contribution docs, a unittest test runner, Docker-backed integration test utilities and end-to-end tests, expanded unit tests, a minor platform-specific security short-circuit, and two runtime dependencies in requirements.txt. No production API signatures were removed.

Changes

Cohort / File(s) Summary
Documentation
contribution.md, test.md, docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md
New contributor guide, testing strategy guide, and provider implementation doc; purely documentation additions.
Test Runner & Harness
test/run_all_tests.py, test/integration/__init__.py
Adds unittest discovery runner (discover_tests, main) and integration package initializer with module docstring.
Docker Integration Utilities
test/integration/docker_utils.py
New Docker helpers: DockerRunResult dataclass, docker_available() check, and run_in_docker() to execute commands with mounts/env and return structured results.
Docker-backed Integration Tests
test/integration/test_end_to_end.py
New end-to-end test class TestEndToEndWorkflows exercising CLI and installation flows inside disposable Docker containers.
Unit Tests — CLI & Interpreter
test/test_cli.py, tests/test_interpreter.py, tests/test_hardware_detection.py
Extensive CLI unit tests, updated CommandInterpreter constructor signature adding optional cache param, and small test robustness improvements.
Dependencies
requirements.txt
Adds requests>=2.32.4 and PyYAML>=6.0.0.
Config behavior
cortex/config_manager.py
Short-circuits directory security ownership/permission enforcement on non-POSIX or when uid/gid unavailable.

Sequence Diagram(s)

sequenceDiagram
    participant TR as Test Runner
    participant DU as docker_utils
    participant DC as Docker Engine
    participant CT as Container (bash)
    participant RM as Repo Mount

    TR->>DU: docker_available()
    DU->>DC: run `docker --version` and `docker info`
    DC-->>DU: return status (ok / fail)

    TR->>DU: run_in_docker(image, command, mounts, env)
    DU->>DC: docker run --rm [--env ...] [--mount ...] image bash -lc "cd /workspace && <command>"
    DC->>RM: mount host repo into /workspace
    DC->>CT: start container shell with env and workdir
    CT->>CT: bootstrap (pip install) and run CLI/tests
    CT-->>DU: stdout, stderr, exit code
    DU-->>TR: DockerRunResult(returncode, stdout, stderr)
    TR->>TR: assert results / mark test pass/fail

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Pay attention to:
    • test/integration/docker_utils.py — command construction, shell-escaping, mount handling, and timeouts.
    • test/integration/test_end_to_end.py — Docker lifecycle, bootstrap commands, environment propagation, and test isolation.
    • test/test_cli.py — many mocked paths; ensure mocks align with production behavior.
    • cortex/config_manager.py — platform-specific security short-circuit implications.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • dhvll

Poem

🐰
With tiny paws I scrub and test,
Through Docker fields I build my nest.
CLI hops, containers sing,
New docs and tests — a springtime spring! 🌱

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete against the template. It lacks the 'Related Issue' section header/structure, the Closes directive is missing, and the Checklist section is not present. Add 'Closes #40' (and #16 if intended), include a 'Checklist' section with the required items, and ensure the summary clearly states all major changes (Kimi K2 provider, integration tests, Docker utilities, dependencies).
Docstring Coverage ⚠️ Warning Docstring coverage is 40.49% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Added Kimi K2 provider tests Fixes #40' addresses the main code changes including Kimi K2 provider support and testing infrastructure, but is somewhat vague about the comprehensive scope of changes (documentation, integration tests, dependencies).
Linked Issues check ✅ Passed The PR substantially addresses issue #16 requirements: delivers 6 end-to-end tests (test_cli_help_executes, test_cli_dry_run_with_fake_provider, test_cli_execute_with_fake_provider, test_coordinator_executes_in_container, test_project_tests_run_inside_container, plus integration test infrastructure), uses Docker for clean environments, includes comprehensive testing documentation, and establishes CI infrastructure foundation.
Out of Scope Changes check ✅ Passed All changes align with issue #16 objectives: integration test suite (test files, Docker utilities), documentation (test.md, contribution.md), CLI/provider support (CommandInterpreter.cache parameter, fake provider), and dependency additions (requests, PyYAML for Docker/YAML config). No unrelated changes detected.


@Sahilbhatane Sahilbhatane requested a review from dhvll November 14, 2025 16:40
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (20)
cortex/__init__.py (1)

1-2: Avoid duplicating the version string in multiple places.

__version__ = "0.1.0" here and version="0.1.0" in setup.py will drift over time. Consider defining the version in a single module (e.g., cortex/_version.py) and importing it both here and in setup.py.
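A minimal sketch of the single-source approach (the module name cortex/_version.py is an assumption, not the repo's current layout):

# cortex/_version.py
__version__ = "0.1.0"

# cortex/__init__.py
from cortex._version import __version__  # re-exported for callers

# setup.py: read the version without importing the package at build time
version_ns = {}
with open("cortex/_version.py", encoding="utf-8") as fh:
    exec(fh.read(), version_ns)
# then pass version=version_ns["__version__"] to setup()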

setup.py (1)

7-8: Make requirements parsing robust to indented comments.

Current list comprehension:

requirements = [line.strip() for line in fh if line.strip() and not line.startswith("#")]

will treat lines like " # comment" as requirements, because line.startswith("#") is checked against the unstripped line and therefore misses comments with leading whitespace. Consider:

requirements = []
for line in fh:
    stripped = line.strip()
    if stripped and not stripped.startswith("#"):
        requirements.append(stripped)

to avoid accidentally treating indented comments as dependencies.

cortex/coordinator.py (5)

12-39: Use explicit None checks in duration for clarity.

To avoid relying on truthiness of floats and to be explicit about the sentinel value, consider:

def duration(self) -> Optional[float]:
    if self.start_time is not None and self.end_time is not None:
        return self.end_time - self.start_time
    return None

This is a minor readability/robustness improvement.


87-97: Avoid fully silent failures in _log.

Right now any error opening/writing the log file is swallowed:

try:
    ...
except Exception:
    pass

Consider at least emitting a minimal print(..., file=sys.stderr) or narrowing the exception type so that unexpected errors don’t fail silently. This keeps installation resilient while still giving a hint when logging is misconfigured.


99-141: Be explicit about shell=True and broad exception handling trade-offs.

Given these commands are arbitrary shell lines, shell=True is probably required, but:

  • It’s a security footgun if ever exposed to untrusted input or run under elevated privileges.
  • Catching bare Exception here may hide unexpected programmer errors.

At minimum, consider documenting that InstallationCoordinator is intended for local, trusted environments and not for executing untrusted commands. If feasible, narrow the except Exception to specific failures (e.g., OSError) so unexpected errors surface during development.
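A sketch of the narrower handling, as a method-body fragment (step.command and step.error_message are assumed names based on the surrounding review):

import subprocess

try:
    completed = subprocess.run(
        step.command,        # arbitrary shell line, by design
        shell=True,          # trusted, local input only; see note above
        capture_output=True,
        text=True,
    )
except (OSError, subprocess.SubprocessError) as exc:
    # Programmer errors still propagate; only execution failures are caught.
    step.error_message = str(exc)
    return False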


166-197: Progress callback is invoked before status is updated, so callers never see final state.

progress_callback is called before _execute_command, while step.status is still PENDING. The CLI’s callback uses step.status to print [OK]/[FAIL], so it will only ever see [PENDING].

Consider instead:

for i, step in enumerate(self.steps):
    if self.progress_callback:
        self.progress_callback(i + 1, len(self.steps), step)  # optional “starting” hook

    success = self._execute_command(step)

    if self.progress_callback:
        self.progress_callback(i + 1, len(self.steps), step)  # “completed” hook with final status

or move the callback to after _execute_command if you only want the completed status. This will make CLI progress output reflect real step outcomes.


269-307: install_docker is tightly coupled to Debian/Ubuntu + systemd environments.

The hard-coded apt commands, add-apt-repository, and systemctl checks make this helper effectively Ubuntu/Debian with systemd only. That’s fine for container-based tests, but if you plan to reuse it on other distros (e.g., Arch, Fedora, non-systemd setups), it may fail in surprising ways.

Consider either documenting this assumption clearly or wiring distro detection into the higher-level CLI so alternate installers can be plugged in later.
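If distro detection does get wired in, a minimal sketch (the helper name is hypothetical):

from pathlib import Path

def detect_distro_id() -> str:
    """Return the ID field from /etc/os-release, e.g. 'ubuntu' or 'fedora'."""
    try:
        for line in Path("/etc/os-release").read_text().splitlines():
            if line.startswith("ID="):
                return line.split("=", 1)[1].strip().strip('"')
    except OSError:
        pass
    return "unknown"

The higher-level CLI could then map the ID to an installer backend and fail with a clear "unsupported distro" error instead of partway through apt commands.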

test/run_all_tests.py (1)

11-33: Test discovery correctly scoped to test/ but won't pick up tests outside it

discover_tests() starts from os.path.dirname(__file__), so only tests under the test directory are discovered, which matches how cortex/cli.py locates run_all_tests.py. If you ever want LLM/test_interpreter.py (or other top-level tests) to be run via this entry point, you’ll need to broaden start_dir (e.g., repo root) or mirror those tests under test/.
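One way to keep the current default while allowing a wider net (this signature is a sketch, not the file's actual API):

import os
import unittest
from typing import Optional

def discover_tests(start_dir: Optional[str] = None) -> unittest.TestSuite:
    # Defaults to this file's directory; pass the repo root to widen discovery.
    start_dir = start_dir or os.path.dirname(os.path.abspath(__file__))
    return unittest.defaultTestLoader.discover(start_dir, pattern="test_*.py")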

README.md (1)

93-95: Contribution and testing references are helpful—verify link targets

Pointing contributors to Issues, contribution.md, and test.md is good for onboarding and testing discipline. Just double-check that these relative links resolve correctly in the GitHub UI from this README.

LLM/test_interpreter.py (1)

97-103: Remove redundant @patch('openai.OpenAI') to satisfy Ruff and simplify the test

In test_parse_empty_input, the mock_openai argument is unused, and setUp already stubs the openai module globally. You can safely drop the patch decorator and argument—this will also resolve Ruff’s ARG002 warning.

-    @patch('openai.OpenAI')
-    def test_parse_empty_input(self, mock_openai):
+    def test_parse_empty_input(self):
         interpreter = CommandInterpreter(api_key=self.api_key, provider="openai")
 
         with self.assertRaises(ValueError):
             interpreter.parse("")
test/test_cli.py (2)

32-37: Strengthen test_get_api_key_not_found by asserting error output

You already patch sys.stderr here, but the test only checks that the API key is None, and mock_stderr is otherwise unused (Ruff ARG002).

To fully exercise _get_api_key’s error path and use the mock, consider also asserting that something was written to stderr, e.g.:

@patch.dict(os.environ, {}, clear=True)
@patch('sys.stderr')
def test_get_api_key_not_found(self, mock_stderr) -> None:
    api_key = self.cli._get_api_key('openai')
    self.assertIsNone(api_key)
+    self.assertTrue(mock_stderr.write.called)

This both validates the user-visible error and clears the static-analysis warning.


90-196: Avoid real spinner sleeps in install-path unit tests (optional)

All install tests that reach CortexCLI.install with a valid API key will run the 10‑iteration spinner loop, each iteration sleeping for 0.1s. As the number of tests grows, this can noticeably slow the suite.

If runtime becomes an issue, consider patching _animate_spinner for these tests:

@patch.object(CortexCLI, "_animate_spinner", return_value=None)
@patch.dict(os.environ, {'OPENAI_API_KEY': 'test-key'}, clear=True)
@patch('cortex.cli.CommandInterpreter')
def test_install_dry_run(self, mock_interpreter_class, _mock_spinner) -> None:
    ...

This keeps behaviour identical from the CLI’s perspective while making tests faster and less timing‑sensitive.

cortex/test_coordinator.py (1)

36-321: Coordinator tests look comprehensive; consider making test_step_timing more robust

The coordinator test suite here is thorough and matches the coordinator’s behaviour well (init, stop/continue on error, rollback, logging, verification, summaries, and JSON export).

One minor robustness point: in test_step_timing, you assert step.end_time > step.start_time. On platforms with coarse time.time() resolution, two consecutive calls can occasionally return the same value, which would make this test intermittently fail even though the coordinator is behaving correctly.

If you want to eliminate that flakiness risk, you could either:

  • Relax to >= or
  • Patch time.time() to return controlled, strictly increasing values during the test.
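The second option could look like this (the patch target assumes the coordinator calls time.time via a module-level import of time; adjust to the real import site):

from unittest.mock import patch

# side_effect yields one value per call; extend the list if time.time is
# called more than twice during the step.
with patch("cortex.coordinator.time.time", side_effect=[100.0, 100.5]):
    coordinator._execute_command(step)
self.assertGreater(step.end_time, step.start_time)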

Otherwise, this module looks solid.

LLM/interpreter.py (1)

154-185: Minor cleanup in _call_fake: avoid double JSON parsing of CORTEX_FAKE_COMMANDS

The fake provider path correctly validates that CORTEX_FAKE_COMMANDS is JSON with a top‑level "commands" key, but then discards the parsed data and calls _parse_commands(payload), which re‑parses the same JSON string.

Functionally this is fine, but you can simplify and avoid double parsing by reusing data, for example:

        payload = os.environ.get("CORTEX_FAKE_COMMANDS")
        if payload:
            try:
                data = json.loads(payload)
            except json.JSONDecodeError as exc:
                raise ValueError("CORTEX_FAKE_COMMANDS must contain valid JSON") from exc
            if not isinstance(data, dict) or "commands" not in data:
                raise ValueError("CORTEX_FAKE_COMMANDS must define a 'commands' list")
-            return self._parse_commands(payload)
+            commands = data["commands"]
+            if not isinstance(commands, list):
+                raise ValueError("CORTEX_FAKE_COMMANDS 'commands' must be a list")
+            return [cmd for cmd in commands if cmd and isinstance(cmd, str)]

parse() will still run _validate_commands on this list, so you keep the safety guarantees while reducing redundant work.

test/integration/docker_utils.py (1)

55-108: Mounted paths are documented as read‑only but are actually read‑write

In run_in_docker() the mounts parameter is documented as:

“Iterable of host Path instances mounted read-only to the same location within the container.”

However, the volume flag is built as:

docker_cmd.extend([
    "-v",
    f"{str(host_path.resolve())}:{container_path}",
])

Without a :ro suffix, these mounts are read‑write by default. For integration tests this may be intentional (e.g., writing artifacts back to the host), but it contradicts the docstring and slightly weakens isolation.

Consider either:

  • Appending :ro to the volume spec to enforce read‑only mounts, or
  • Updating the docstring to reflect that mounts are read‑write and, if needed, adding an option to request read‑only behaviour.
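A sketch of the opt-in read-only variant (the read_only parameter is hypothetical):

mode = ":ro" if read_only else ""
docker_cmd.extend([
    "-v",
    f"{host_path.resolve()}:{container_path}{mode}",
])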
cortex/cli.py (5)

22-36: Provider fallback could cause confusing error messages.

Line 36 returns 'openai' as the default even when no OPENAI_API_KEY is present. This will cause a delayed error in _get_api_key rather than failing fast with a clear message that no provider is configured.

Consider one of these approaches:

Option 1: Return None and handle in caller

         return 'fake'
-    return 'openai'
+    return None  # No provider configured

Then check in the install method:

provider = self._get_provider()
if provider is None:
    self._print_error("No LLM provider configured. Set OPENAI_API_KEY, ANTHROPIC_API_KEY, or KIMI_API_KEY.")
    return 1

Option 2: Emit clear error immediately

         return 'fake'
-    return 'openai'
+    self._print_error("No LLM provider configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or KIMI_API_KEY.")
+    sys.exit(1)

92-108: Consider removing redundant "install" prefix.

Line 103 prefixes the software parameter with "install " when calling the interpreter. Since the CLI command is already install and the parameter is named software, this might be redundant or confusing. Users might naturally include verbs in their input (e.g., cortex install "set up docker").

Consider passing the software parameter directly:

-            commands = interpreter.parse(f"install {software}")
+            commands = interpreter.parse(software)

Or make the prompt more explicit:

-            commands = interpreter.parse(f"install {software}")
+            commands = interpreter.parse(f"Generate shell commands to install {software}")

118-156: Execute mode logic is sound, but consider adding else block.

The execution flow with progress callbacks and error handling is well-structured. However, Lines 153-155 should be in an else block for clarity, as flagged by static analysis (TRY300).

Apply this diff to improve structure:

                     print(f"  Error: {result.error_message}", file=sys.stderr)
                 return 1
+            else:
+                print("\nTo execute these commands, run with --execute flag")
+                print("Example: cortex install docker --execute")
+                return 0
 
-            print("\nTo execute these commands, run with --execute flag")
-            print("Example: cortex install docker --execute")
-            return 0
-
         except ValueError as exc:

157-166: Exception handling is appropriate for CLI entry point.

The tiered exception handling (ValueError for input errors, RuntimeError for API failures, and Exception as fallback) is suitable for a CLI entry point where graceful error messages are important. The blind Exception catch on Line 163 prevents unexpected tracebacks from reaching end users, which is acceptable in this context.

Static analysis suggests using explicit conversion flags in f-strings (RUF010):

-            self._print_error(f"API call failed: {str(exc)}")
+            self._print_error(f"API call failed: {exc!s}")
-            self._print_error(f"Unexpected error: {str(exc)}")
+            self._print_error(f"Unexpected error: {exc!s}")

These are minor style improvements and not critical.


197-198: Consider making --execute and --dry-run mutually exclusive.

Both flags can currently be specified together, and dry_run takes precedence (checked first on Line 114). This could confuse users who expect both flags to trigger an error.

Make the flags mutually exclusive:

     install_parser = subparsers.add_parser('install', help='Install software using natural language')
     install_parser.add_argument('software', type=str, help='Software to install (natural language)')
-    install_parser.add_argument('--execute', action='store_true', help='Execute the generated commands')
-    install_parser.add_argument('--dry-run', action='store_true', help='Show commands without executing')
+    
+    exec_group = install_parser.add_mutually_exclusive_group()
+    exec_group.add_argument('--execute', action='store_true', help='Execute the generated commands')
+    exec_group.add_argument('--dry-run', action='store_true', help='Show commands without executing')
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90236f7 and fe37193.

📒 Files selected for processing (19)
  • .gitignore (1 hunks)
  • LLM/interpreter.py (9 hunks)
  • LLM/requirements.txt (1 hunks)
  • LLM/test_interpreter.py (11 hunks)
  • MANIFEST.in (1 hunks)
  • README.md (2 hunks)
  • contribution.md (1 hunks)
  • cortex/__init__.py (1 hunks)
  • cortex/cli.py (1 hunks)
  • cortex/coordinator.py (1 hunks)
  • cortex/test_cli.py (1 hunks)
  • cortex/test_coordinator.py (1 hunks)
  • setup.py (1 hunks)
  • test.md (1 hunks)
  • test/integration/__init__.py (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/run_all_tests.py (1 hunks)
  • test/test_cli.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
test/run_all_tests.py (1)
cortex/cli.py (1)
  • main (168-222)
cortex/cli.py (3)
LLM/interpreter.py (2)
  • CommandInterpreter (18-268)
  • parse (224-243)
cortex/coordinator.py (3)
  • InstallationCoordinator (53-266)
  • StepStatus (12-19)
  • execute (166-213)
test/run_all_tests.py (1)
  • main (19-33)
LLM/test_interpreter.py (1)
LLM/interpreter.py (4)
  • CommandInterpreter (18-268)
  • APIProvider (9-15)
  • parse (224-243)
  • _call_kimi (118-152)
test/integration/test_end_to_end.py (3)
test/integration/docker_utils.py (3)
  • docker_available (26-52)
  • run_in_docker (55-108)
  • succeeded (21-23)
cortex/cli.py (1)
  • main (168-222)
test/run_all_tests.py (1)
  • main (19-33)
cortex/test_coordinator.py (1)
cortex/coordinator.py (11)
  • InstallationCoordinator (53-266)
  • InstallationStep (23-39)
  • InstallationResult (43-50)
  • StepStatus (12-19)
  • install_docker (269-307)
  • duration (35-39)
  • execute (166-213)
  • add_rollback_command (162-164)
  • verify_installation (215-237)
  • get_summary (239-261)
  • export_log (263-266)
cortex/test_cli.py (1)
cortex/cli.py (9)
  • CortexCLI (14-165)
  • main (168-222)
  • _get_api_key (38-54)
  • _get_provider (22-36)
  • _print_status (56-58)
  • _print_error (60-62)
  • _print_success (64-66)
  • install (80-165)
  • _animate_spinner (68-73)
cortex/coordinator.py (1)
cortex/cli.py (1)
  • progress_callback (119-126)
cortex/__init__.py (1)
cortex/cli.py (1)
  • main (168-222)
test/test_cli.py (1)
cortex/cli.py (9)
  • CortexCLI (14-165)
  • main (168-222)
  • _get_api_key (38-54)
  • _get_provider (22-36)
  • _print_status (56-58)
  • _print_error (60-62)
  • _print_success (64-66)
  • install (80-165)
  • _animate_spinner (68-73)
🪛 Ruff (0.14.4)
test/integration/docker_utils.py

34-34: subprocess call: check for execution of untrusted input

(S603)


42-42: subprocess call: check for execution of untrusted input

(S603)


50-50: Consider moving this statement to an else block

(TRY300)


91-91: Use explicit conversion flag

Replace with conversion flag

(RUF010)


99-99: subprocess call: check for execution of untrusted input

(S603)

cortex/cli.py

155-155: Consider moving this statement to an else block

(TRY300)


161-161: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Do not catch blind exception: Exception

(BLE001)


164-164: Use explicit conversion flag

Replace with conversion flag

(RUF010)


210-210: subprocess call: check for execution of untrusted input

(S603)

LLM/test_interpreter.py

98-98: Unused method argument: mock_openai

(ARG002)

cortex/test_cli.py

28-28: Unused method argument: mock_stderr

(ARG002)


48-48: Unused method argument: mock_stderr

(ARG002)


53-53: Unused method argument: mock_stdout

(ARG002)

cortex/coordinator.py

75-75: Avoid specifying long messages outside the exception class

(TRY003)


96-97: try-except-pass detected, consider logging the exception

(S110)


96-96: Do not catch blind exception: Exception

(BLE001)


107-107: subprocess call with shell=True identified, security issue

(S602)


136-136: Do not catch blind exception: Exception

(BLE001)


140-140: Use explicit conversion flag

Replace with conversion flag

(RUF010)


153-153: subprocess call with shell=True identified, security issue

(S602)


159-159: Do not catch blind exception: Exception

(BLE001)


160-160: Use explicit conversion flag

Replace with conversion flag

(RUF010)


223-223: subprocess call with shell=True identified, security issue

(S602)


233-233: Do not catch blind exception: Exception

(BLE001)


235-235: Use explicit conversion flag

Replace with conversion flag

(RUF010)

LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


146-146: Abstract raise to an inner function

(TRY301)


146-146: Avoid specifying long messages outside the exception class

(TRY003)


149-149: Abstract raise to an inner function

(TRY301)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Use explicit conversion flag

Replace with conversion flag

(RUF010)


162-162: Avoid specifying long messages outside the exception class

(TRY003)


164-164: Avoid specifying long messages outside the exception class

(TRY003)


227-227: Avoid specifying long messages outside the exception class

(TRY003)

test/test_cli.py

34-34: Unused method argument: mock_stderr

(ARG002)

🔇 Additional comments (28)
.gitignore (1)

1-30: Solid foundation for Python project hygiene.

The .gitignore file appropriately covers Python bytecode, build artifacts, virtual environments, and testing/coverage outputs—all essential for the multi-provider LLM CLI and Docker-backed integration test infrastructure this PR introduces. Patterns are syntactically correct and follow standard conventions.

MANIFEST.in (1)

1-5: Packaging manifest looks appropriate for the current project layout.

README, LICENSE, Python sources, and LLM requirements are included; this is a reasonable minimal set for distribution.

contribution.md (1)

12-20: Confirm src/requirements.txt path and keep install docs aligned.

The setup instructions depend on src/requirements.txt. Please double-check that this file exists in the repo and that other docs (e.g., test.md, README) reference the same path so new contributors don’t hit a missing-file error.

test.md (1)

1-67: Testing guide is clear and aligns with the described test suites.

Unit vs integration responsibilities, Docker usage, and CI recommendations are well scoped and actionable; this should make it straightforward for contributors to run the right tests locally and in CI.

test/integration/__init__.py (1)

1-1: Clear package-level docstring

The docstring succinctly documents the purpose of the integration test package; no changes needed.

README.md (1)

61-76: Provider table and AI layer docs align with implementation

The updated AI stack line and the “Supported LLM Providers” table accurately reflect the current providers and defaults (OPENAI_API_KEY, ANTHROPIC_API_KEY, KIMI_API_KEY, CORTEX_PROVIDER=fake with CORTEX_FAKE_COMMANDS). This should make configuring OpenAI/Claude/Kimi/Fake much clearer for users.

LLM/test_interpreter.py (7)

2-10: Test module setup is pragmatic and isolates interpreter dependencies

Using sys.path.insert plus sys.modules stubs for openai and anthropic is a practical way to exercise CommandInterpreter without requiring real SDKs. This keeps tests self-contained and avoids network calls while still letting you assert initialization behavior.


15-24: Shared stubs in setUp keep provider imports safe across tests

The setUp method’s patch.dict(sys.modules, {"openai": ..., "anthropic": ...}) ensures that every test uses stubs instead of real clients, which nicely centralizes mocking and avoids per-test boilerplate for import safety.


49-55: Kimi initialization test cleanly validates defaults and dependency handling

The test_initialization_kimi case, with @patch.dict(os.environ, {}, clear=True) and a stubbed requests module, correctly validates that the interpreter:

  • Selects APIProvider.KIMI
  • Falls back to the kimi-k2 default model when KIMI_DEFAULT_MODEL is unset
  • Requires requests to be importable

This is a solid pattern for providers that use plain HTTP clients rather than SDKs.


104-199: OpenAI/Claude call and parsing tests comprehensively cover core flows

The tests around _call_openai, _call_claude, parse, and parse_with_context (including success and failure branches, validation on/off, and context injection) exercise the main provider paths and the dangerous-command filtering well. They should catch most regressions in prompt formatting, JSON parsing, and validation logic.


221-242: Docker installation parsing test validates realistic multi-command output

test_parse_docker_installation ensures that the interpreter returns a non-empty list and that at least one command mentions docker, using a realistic multi-command JSON payload. This is a good high-level sanity check on both parsing and basic intent preservation.


243-254: Fake provider tests nicely pin env-driven and default behaviors

The FAKE provider tests:

  • Confirm that CORTEX_FAKE_COMMANDS JSON is honored exactly when set.
  • Fall back to the safe default docker command sequence when the env var is absent.

This gives you deterministic offline behavior and protects against regressions in the fake provider path.


255-284: Kimi HTTP tests correctly isolate requests and exercise success/error paths

The Kimi tests stub requests via sys.modules, then:

  • Validate happy-path JSON handling and command extraction in _call_kimi.
  • Ensure any underlying exception from post is surfaced as a RuntimeError.

This pattern cleanly decouples tests from the real HTTP client and should make future refactors to Kimi integration safer.

test/integration/test_end_to_end.py (8)

1-21: LGTM! Well-structured test module setup.

The module docstring, imports, and constants are well-organized. The Docker image selection via environment variable provides good flexibility, and the base environment variables are appropriate for containerized testing.


23-37: LGTM! Clean helper method for Docker execution.

The test class properly skips when Docker is unavailable, and the _run helper method correctly merges environments and prepends pip bootstrapping. The use of dict | None syntax with from __future__ import annotations ensures compatibility.


39-44: LGTM! Effective smoke test.

This test provides a good baseline check that the CLI entry point is functional in a clean container environment.


46-64: LGTM! Comprehensive dry-run test.

The test properly configures the fake provider and validates that dry-run mode displays generated commands without executing them. The JSON structure correctly matches the expected format.


65-82: LGTM! Good coverage of execution flow.

This test validates the execute path without requiring real API calls or executing potentially dangerous commands. The assertion checks for the expected success message.


83-99: LGTM! Good component-level test.

The test properly isolates the coordinator component and uses secure heredoc syntax to prevent variable expansion. This validates the coordinator works correctly inside the container environment.


113-114: LGTM! Standard unittest main block.

The main block follows standard Python testing patterns and is appropriately marked for coverage exclusion.


100-111: Assertion is valid and test is well-structured.

The unittest runner correctly outputs "OK" in the last line of stdout, as verified. The assertion self.assertIn("OK", result.stdout.splitlines()[-1]) is sound and will reliably detect successful test runs even when additional details like skipped test counts are present.

cortex/cli.py (7)

1-12: LGTM! Clean module setup.

The imports and module docstring are well-organized and appropriate for the CLI implementation.


14-21: LGTM! Clean initialization.

The spinner initialization is straightforward and uses appropriate Unicode braille patterns for visual feedback.


38-54: LGTM! Clean API key retrieval with helpful error messages.

The method properly maps providers to environment variables and provides clear guidance when keys are missing.


56-79: LGTM! Well-designed output helpers.

The helper methods provide clean abstractions for console output with appropriate use of stdout/stderr and ANSI escape sequences.


109-117: LGTM! Clear command display and dry-run handling.

The generated commands are displayed in a user-friendly numbered format, and dry-run mode exits appropriately with helpful guidance.


202-211: LGTM! Test runner integration is secure.

The test flag handling correctly constructs the path to the test runner and executes it using the current Python interpreter. The static analysis warning on Line 210 (S603) is a false positive—the test_runner path is derived from __file__, not user input.


213-226: LGTM! Clean command dispatch and entry point.

The command dispatch logic is straightforward, and the main block follows standard Python patterns.

Collaborator

dhvll commented Nov 14, 2025

@Sahilbhatane why are you still solving multiple issues in a single PR? Limit each PR to one issue.
Also give some kind of demonstration that it's working; a video is best.
Too much AI slop is not good. You can use AI, but understand the code and keep the files structured.
Do a self-review before raising PRs.

@dhvll dhvll closed this Nov 14, 2025
@Sahilbhatane
Collaborator Author

Hey @dhvll, here is the video you asked for:
https://www.loom.com/share/338206862fd94b1db8178dc21366471f

I will also share this on your Discord. The video shows the CLI implementation and the Kimi K2 provider implementation as well. We can handle the other details, reviews, and discussion on Discord.

@mikejmorgan-ai
Member

🎯 Closing for MVP Focus

This issue is being closed to help the team focus on MVP-critical features (#1-45).

This is NOT abandoned - it's an important feature we'll revisit after MVP completion.

Timeline:

Want to work on this anyway?
Comment below and we can discuss! We're always open to great contributions.

Tracking: Labeled as post-mvp for easy filtering when we reopen.

Thanks for understanding! 🚀

— Mike (@mikejmorgan-ai)

@Sahilbhatane Sahilbhatane reopened this Dec 2, 2025
@Sahilbhatane Sahilbhatane changed the title from "Added Kimi K2 provider and Docker integration tests Fixes #40 and #16(optional)" to "Added Kimi K2 provider tests Fixes #40" Dec 2, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
LLM/interpreter.py (1)

262-271: Default model identifiers have been updated appropriately.

The defaults have been updated from the previous review:

  • OpenAI: "gpt-4o" (updated from deprecated "gpt-4")
  • Kimi: "kimi-k2-turbo-preview" with env override support

This addresses the concerns from the previous review about deprecated model identifiers.

🧹 Nitpick comments (11)
test/test_coordinator.py (2)

5-5: Unused import.

The time module is imported but doesn't appear to be used in this file.

-import time

319-319: Prefer assertGreater over assertTrue with comparison.

As flagged by SonarCloud, assertGreater provides better failure messages and is the idiomatic choice for this comparison.

-            self.assertTrue(step.end_time > step.start_time)
+            self.assertGreater(step.end_time, step.start_time)
test/test_cli.py (1)

31-35: Consider prefixing unused mock argument.

The mock_stderr parameter is used only to suppress stderr output during the test. Prefixing with underscore signals intentional non-use.

     @patch.dict(os.environ, {}, clear=True)
     @patch('sys.stderr')
-    def test_get_api_key_not_found(self, mock_stderr):
+    def test_get_api_key_not_found(self, _mock_stderr):
         api_key = self.cli._get_api_key('openai')
         self.assertIsNone(api_key)
setup.py (1)

6-6: Minor typo in comment.

The comment has an extra # character.

-# # Try to read requirements from root, fallback to LLM directory
+# Try to read requirements from root, fallback to LLM directory
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

190-194: Add language specifier to fenced code block.

The test output code block (lines 190-194) is missing a language specifier. Add text or plaintext for better markdown rendering.

-```
+```text
 Ran 143 tests in 10.136s
 
 OK (skipped=5)

LLM/interpreter.py (1)

135-136: Redundant import of requests inside _call_kimi.

The requests module is already imported and validated during _initialize_client() (line 53). This second import inside _call_kimi is unnecessary, and the ImportError handling at lines 152-153 will never trigger if initialization succeeded.

        try:
-            import requests
             response = requests.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

Note: This also means removing the ImportError catch at lines 152-153 since initialization already guarantees requests is available.

cortex/user_preferences.py (5)

201-205: Silent exception swallowing hides initialization errors.

The try-except-pass pattern silently ignores all exceptions during initial load. This can hide configuration corruption or permission issues.

Consider logging the exception or at least catching a more specific exception type:

         try:
             self.load()
-        except Exception:
-            # Ignore load errors here; callers can handle later
-            pass
+        except (IOError, ValueError) as e:
+            # Log but don't fail - callers can handle missing config
+            import logging
+            logging.debug(f"Initial config load skipped: {e}")

328-342: Unused key_parts parameter in _coerce_value_for_set.

The key_parts parameter is defined but never used in the method body. Either remove it or use it to provide context-aware coercion (e.g., knowing the target type based on the key path).

-    def _coerce_value_for_set(self, key_parts: List[str], value: Any) -> Any:
+    def _coerce_value_for_set(self, value: Any) -> Any:
         """Coerce string inputs into appropriate types for set()."""

Then update the call site at line 361:

-            coerced = self._coerce_value_for_set(parts, value)
+            coerced = self._coerce_value_for_set(value)

344-432: set() method has excessive cognitive complexity (56 vs. allowed 15).

This method handles many different preference types with nested conditionals, making it difficult to maintain and test. Consider refactoring into a dispatch pattern or strategy-based approach.

One approach is to use a handler registry:

def set(self, key: str, value: Any) -> bool:
    if self._preferences is None:
        self.load()
    
    parts = key.split(".")
    coerced = self._coerce_value_for_set(value)
    
    handlers = {
        "verbosity": self._set_verbosity,
        "confirmations": self._set_confirmations,
        "auto_update": self._set_auto_update,
        "ai": self._set_ai,
        "packages": self._set_packages,
        "conflicts": self._set_conflicts,
    }
    
    handler = handlers.get(parts[0])
    if handler:
        handler(parts, coerced, value)
    elif parts[0] in ["theme", "language", "timezone"]:
        setattr(self._preferences, parts[0], coerced)
    else:
        raise ValueError(f"Unknown preference key: {key}")
    
    self.save()
    return True

Then extract each branch into a dedicated _set_* method.


434-461: reset() always returns True; there is no False path.

The method signature promises a boolean return indicating success, but it always returns True or raises an exception. This is fine semantically (exceptions for errors, True for success), but SonarCloud flags it because there's no False return path.

If the intent is that errors always raise exceptions, consider changing the return type to None or documenting that exceptions are the error signaling mechanism.


131-166: from_dict has broad exception handling that masks errors.

Lines 141-142 and 151-152 catch bare Exception and silently default to fallback values. This can hide data corruption or invalid enum values without any indication to the caller.

Consider catching the specific ValueError that Enum() raises:

             try:
                 ai_data['creativity'] = AICreativity(ai_data['creativity'])
-            except Exception:
+            except ValueError:
                 ai_data['creativity'] = AICreativity.BALANCED

Similarly for verbosity at lines 149-152.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe37193 and 9b9a36e.

📒 Files selected for processing (11)
  • LLM/interpreter.py (9 hunks)
  • LLM/requirements.txt (1 hunks)
  • LLM/test_interpreter.py (5 hunks)
  • cortex/cli.py (3 hunks)
  • cortex/user_preferences.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • requirements-dev.txt (1 hunks)
  • requirements.txt (1 hunks)
  • setup.py (3 hunks)
  • test/test_cli.py (3 hunks)
  • test/test_coordinator.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • LLM/requirements.txt
🧰 Additional context used
🧬 Code graph analysis (4)
test/test_cli.py (1)
cortex/cli.py (3)
  • _get_api_key (47-63)
  • _get_provider (31-45)
  • install (84-212)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-155)
  • parse (227-246)
test/test_coordinator.py (1)
cortex/coordinator.py (5)
  • InstallationCoordinator (65-350)
  • InstallationStep (40-53)
  • InstallationResult (57-62)
  • StepStatus (31-36)
  • install_docker (353-407)
cortex/cli.py (1)
cortex/user_preferences.py (1)
  • get (288-315)
🪛 GitHub Actions: Cortex Automation
cortex/user_preferences.py

[error] 12-12: ModuleNotFoundError: No module named 'yaml'. PyYAML is not installed; importing 'yaml' fails during test run.

🪛 GitHub Check: SonarCloud Code Analysis
test/test_coordinator.py

[warning] 319-319: Consider using "assertGreater" instead.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZqDPfPMNlR5npI5tRyC&open=AZqDPfPMNlR5npI5tRyC&pullRequest=192

cortex/user_preferences.py

[warning] 328-328: Remove the unused function parameter "key_parts".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apy&open=AZrfnHzVPc78BrE-3Apy&pullRequest=192


[failure] 344-344: Refactor this function to reduce its Cognitive Complexity from 56 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apz&open=AZrfnHzVPc78BrE-3Apz&pullRequest=192


[failure] 434-434: Refactor this method to not always return the same value.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Ap0&open=AZrfnHzVPc78BrE-3Ap0&pullRequest=192

🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.7)
test/test_cli.py

33-33: Unused method argument: mock_stderr

(ARG002)


80-80: Unused method argument: mock_history

(ARG002)

LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Avoid specifying long messages outside the exception class

(TRY003)


155-155: Use explicit conversion flag

Replace with conversion flag

(RUF010)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


167-167: Avoid specifying long messages outside the exception class

(TRY003)


230-230: Avoid specifying long messages outside the exception class

(TRY003)

cortex/user_preferences.py

141-141: Do not catch blind exception: Exception

(BLE001)


151-151: Do not catch blind exception: Exception

(BLE001)


203-205: try-except-pass detected, consider logging the exception

(S110)


203-203: Do not catch blind exception: Exception

(BLE001)


226-226: Consider moving this statement to an else block

(TRY300)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


228-228: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Use explicit conversion flag

Replace with conversion flag

(RUF010)


250-250: Consider moving this statement to an else block

(TRY300)


253-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


253-253: Avoid specifying long messages outside the exception class

(TRY003)


253-253: Use explicit conversion flag

Replace with conversion flag

(RUF010)


254-254: Do not catch blind exception: Exception

(BLE001)


255-255: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


255-255: Avoid specifying long messages outside the exception class

(TRY003)


255-255: Use explicit conversion flag

Replace with conversion flag

(RUF010)


283-283: Consider moving this statement to an else block

(TRY300)


285-285: Do not catch blind exception: Exception

(BLE001)


286-286: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


286-286: Avoid specifying long messages outside the exception class

(TRY003)


286-286: Use explicit conversion flag

Replace with conversion flag

(RUF010)


328-328: Unused method argument: key_parts

(ARG002)


370-370: Do not catch blind exception: Exception

(BLE001)


371-371: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


371-371: Avoid specifying long messages outside the exception class

(TRY003)


375-375: Abstract raise to an inner function

(TRY301)


375-375: Avoid specifying long messages outside the exception class

(TRY003)


377-377: Abstract raise to an inner function

(TRY301)


377-377: Avoid specifying long messages outside the exception class

(TRY003)


382-382: Abstract raise to an inner function

(TRY301)


382-382: Avoid specifying long messages outside the exception class

(TRY003)


385-385: Abstract raise to an inner function

(TRY301)


385-385: Avoid specifying long messages outside the exception class

(TRY003)


387-387: Abstract raise to an inner function

(TRY301)


387-387: Avoid specifying long messages outside the exception class

(TRY003)


392-392: Abstract raise to an inner function

(TRY301)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Do not catch blind exception: Exception

(BLE001)


400-400: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


403-403: Abstract raise to an inner function

(TRY301)


403-403: Avoid specifying long messages outside the exception class

(TRY003)


410-410: Abstract raise to an inner function

(TRY301)


410-410: Avoid specifying long messages outside the exception class

(TRY003)


412-412: Abstract raise to an inner function

(TRY301)


412-412: Avoid specifying long messages outside the exception class

(TRY003)


417-417: Abstract raise to an inner function

(TRY301)


417-417: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Abstract raise to an inner function

(TRY301)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


426-426: Abstract raise to an inner function

(TRY301)


426-426: Avoid specifying long messages outside the exception class

(TRY003)


430-430: Consider moving this statement to an else block

(TRY300)


432-432: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


432-432: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Use explicit conversion flag

Replace with conversion flag

(RUF010)


457-457: Avoid specifying long messages outside the exception class

(TRY003)


539-539: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (18)
LLM/test_interpreter.py (4)

15-24: LGTM!

The setUp correctly patches sys.modules to stub external dependencies and ensures cleanup via addCleanup.


49-54: LGTM!

The Kimi provider initialization test correctly verifies the default model kimi-k2-turbo-preview is set when no KIMI_DEFAULT_MODEL environment variable is present.


56-72: LGTM!

The test properly verifies successful Kimi API calls, including the Authorization header format and command parsing.


74-84: LGTM!

The failure test correctly verifies that HTTP errors are caught and re-raised as RuntimeError.

test/test_coordinator.py (1)

1-16: LGTM on import updates.

The added imports for Mock, patch, call, InstallationResult, and install_docker are appropriate for the test file's needs.

test/test_cli.py (3)

26-29: LGTM!

Good addition of the KIMI provider API key test, maintaining consistency with the other provider tests.


47-55: LGTM!

The KIMI provider detection and provider override tests correctly verify the _get_provider() logic for KIMI and the CORTEX_PROVIDER override behavior.


77-88: LGTM!

This test correctly verifies that the fake provider path bypasses API key validation and proceeds with command generation.

Note: The mock_history parameter (line 80) is unused but necessary to prevent InstallationHistory from performing file I/O. Consider prefixing with underscore (_mock_history) to signal intentional non-use.

setup.py (3)

15-15: LGTM on fallback requirements.

The fallback requirements list appropriately includes requests>=2.32.4 to support the new Kimi provider HTTP integration when no requirements file exists.


35-42: Verify Python 3.8 and 3.9 compatibility.

The minimum Python version was lowered from 3.10 to 3.8. Ensure the codebase doesn't use Python 3.10+ features (e.g., match statements, union types with | syntax).
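Union annotations are the easiest regression to grep for. The typing.Optional spelling below runs on 3.8, while the commented variant raises at import time before 3.10 (unless annotations are deferred with from __future__ import annotations):

from typing import Optional

def lookup(name: Optional[str] = None) -> Optional[str]:  # 3.8-compatible
    return name

# def lookup(name: str | None = None) -> str | None:  # needs Python 3.10+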


33-33: Verify license change is intentional.

The license classifier was changed from Apache Software License to MIT License. Ensure this change is intentional and that the LICENSE file in the repository reflects this update.

requirements-dev.txt (1)

10-10: Version requests>=2.32.4 is valid.

The requests>=2.32.4 dependency is a released PyPI package (includes security fix for CVE-2024-47081). However, verify that this dependency is not already included in the root requirements.txt referenced at line 4 to avoid duplication.

cortex/cli.py (4)

47-63: LGTM! API key retrieval is clean and well-structured.

The provider-to-env-var mapping is correct and the error messaging guides users appropriately. Returning None for unknown providers (line 57) is acceptable since those would be handled elsewhere or caught in validation.


87-92: Fake provider handling is correct.

The special case for fake provider using CORTEX_FAKE_API_KEY or the default 'fake-api-key' is appropriate for testing scenarios. The early return on line 92 when the API key is missing prevents further execution with invalid credentials.


536-537: Documentation updates are helpful for discoverability.

Adding KIMI_API_KEY and CORTEX_PROVIDER to the CLI help text ensures users can discover these options via --help.


31-45: Provider detection logic looks good, but consider validating the CORTEX_PROVIDER override.

The logic correctly prioritizes CORTEX_PROVIDER override, then falls back to detecting keys in order: OpenAI → Claude → Kimi → Fake. The fallback to 'openai' at line 45 is reasonable as a default when no keys or override are found.

If CORTEX_PROVIDER is set to an invalid value (e.g., "groq"), it will be passed through and fail later in CommandInterpreter. Consider validating the override value against supported providers here to provide earlier, clearer feedback to users.

LLM/interpreter.py (2)

51-61: Kimi initialization correctly validates the requests dependency.

The try/except for the requests import provides a clear error message if the package is missing. Setting self.client = None for the FAKE provider is appropriate.


157-187: Fake provider implementation is well-designed for testing.

The implementation:

  • Supports environment-driven commands via CORTEX_FAKE_COMMANDS
  • Validates JSON structure properly
  • Provides sensible keyword-based defaults for common test cases
  • Has a fallback for unknown inputs

This enables deterministic testing without API costs.

Copilot AI review requested due to automatic review settings December 2, 2025 15:18

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

25-26: Kimi base URL documented inconsistently across sections

Earlier sections state the default KIMI_API_BASE_URL as https://api.moonshot.ai (and the advanced config example uses that), but the “API Request Format” section uses https://api.moonshot.cn/v1/chat/completions.

Please pick the correct default (ideally whatever CommandInterpreter actually uses by default) and make all occurrences consistent, or explicitly call out that one is the default and the other is only an example.

Also applies to: 48-49, 106-111, 129-145

🧹 Nitpick comments (7)
LLM/test_interpreter.py (3)

49-55: Avoid clearing the entire environment in Kimi init test

@patch.dict(os.environ, {}, clear=True) guarantees a clean slate for Kimi-related env vars, but it also temporarily removes all environment variables for this test, which can cause surprising behavior if CommandInterpreter or any imported code ever depends on unrelated env settings (e.g., HOME, PATH). It’s safer to patch only the keys you care about.

Consider:

-    @patch.dict(os.environ, {}, clear=True)
+    @patch.dict(os.environ, {
+        "KIMI_API_KEY": "",
+        "KIMI_API_BASE_URL": "",
+        "KIMI_DEFAULT_MODEL": "",
+    }, clear=False)

This keeps non-Kimi env intact while still testing default Kimi behavior.


274-341: Integration tests are well-gated; custom-model env handling now safe

The TestKimiK2Integration class:

  • Uses @unittest.skipUnless(RUN_KIMI_INTEGRATION_TESTS == "1") to keep real Kimi calls out of normal CI.
  • Validates basic, complex, and validated-command flows against the live API.
  • Uses @patch.dict(os.environ, {'KIMI_DEFAULT_MODEL': 'kimi-k2-0905-preview'}) so the custom-model test no longer leaks env state after failures (addressing the earlier cleanup concern).

Two minor suggestions:

  1. Since these are external-network integration tests, consider moving them to a dedicated integration module/package to keep this unit-test file focused on pure unit tests.
  2. The doc file currently describes the default Kimi model as kimi-k2, while these tests (and likely the code) use kimi-k2-turbo-preview as the default/custom choice; please align the documentation with the actual default behavior.

Both are non-blocking; current structure is functionally correct.
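For reference, the gating pattern described above looks roughly like this (reason strings are illustrative):

import os
import unittest

RUN_KIMI_INTEGRATION_TESTS = os.environ.get("RUN_KIMI_INTEGRATION_TESTS", "0")

@unittest.skipUnless(
    RUN_KIMI_INTEGRATION_TESTS == "1",
    "Set RUN_KIMI_INTEGRATION_TESTS=1 to run live Kimi API tests",
)
class TestKimiK2Integration(unittest.TestCase):
    def setUp(self):
        # Skip individual tests when no real key is available.
        if not os.environ.get("KIMI_API_KEY"):
            self.skipTest("KIMI_API_KEY not set")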


56-73: Kimi success test is solid; asserting request URL and payload structure is a valuable strengthening—Kimi API is stable and OpenAI-compatible

The success path correctly verifies:

  • Parsed commands from _call_kimi
  • That requests.post is called
  • Presence and value of the Authorization header

The suggested assertion for the HTTP contract is well-founded. Kimi's API uses a documented, stable OpenAI-compatible endpoint at /v1/chat/completions with standard payload structure (model field, messages array). Adding assertions for the URL path and key JSON fields will reliably catch regressions in _call_kimi's request shape without becoming brittle:

call_args, call_kwargs = mock_post.call_args
self.assertIn("/v1/chat/completions", call_args[0])
self.assertEqual(call_kwargs["json"]["model"], interpreter.model)
self.assertIn("messages", call_kwargs["json"])

This strengthens test coverage by locking in the Kimi API contract, which is officially documented and widely implemented by third-party integrators.

docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (4)

189-194: Add language hint to fenced code in Testing Results

The test output block uses a bare fenced code block. To satisfy markdownlint (MD040) and improve rendering, you can add a language, e.g.:

-```
+```text
 Ran 143 tests in 10.136s

 OK (skipped=5)

3-6: Markdownlint nits: bare URLs and minor wording

A few small polish items from static analysis:

  • Replace “CLI interface improvements” with just “CLI improvements” to avoid tautology.
  • Wrap bare URLs in Markdown links or <...> to satisfy MD034 (e.g., [Moonshot AI Platform](https://platform.moonshot.ai/)).

These are purely cosmetic and can be deferred.

Also applies to: 83-87, 190-196, 234-236


5-7: Status/Date fields may need updating when PR merges

The header states Status: ✅ Implemented and a specific “Date Completed”, but this PR is still open. To avoid confusion for future readers, consider phrasing this as “Planned/Implemented in PR #192 (pending merge)” or updating the status/date when the PR actually lands.


50-55: Consider updating requests version constraint and documenting versioning approach

The CVE information for requests>=2.32.4 is accurate (all three CVEs are properly fixed at that version). However, the latest requests release is 2.32.5 (August 2025), making the constraint outdated within months. To make this section age more gracefully, consider:

  • Updating to requests>=2.32.5 for the latest patch, or
  • Noting "(minimum secure version as of [date])" and/or linking to upstream security advisories rather than hard-coding CVE IDs inline.

This ensures the documentation remains correct without requiring constant updates as newer patch versions are released.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b9a36e and e0faa7b.

📒 Files selected for processing (2)
  • LLM/test_interpreter.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-155)
  • parse (227-246)
🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (3)
LLM/test_interpreter.py (3)

3-6: Sys-module stubbing for OpenAI/Anthropic looks correct and contained

Using SimpleNamespace plus patch.dict(sys.modules, ...) in setUp cleanly isolates tests from real openai/anthropic installs and avoids ImportError, and addCleanup(self.sys_modules_patcher.stop) ensures restoration after each test. This pattern is sound for these unit tests.

Also applies to: 15-24


26-31: OpenAI default model assertion aligns with intended behavior

Changing the expected default model to "gpt-4o" in test_initialization_openai matches the stated best-practice default elsewhere in the PR and keeps this test tightly coupled to the current interpreter default. No issues here.


74-85: Kimi failure path test matches _call_kimi error behavior

Simulating a 401 via raise_for_status.side_effect and asserting a RuntimeError from _call_kimi correctly exercises the generic exception-wrapping logic. This gives good coverage of the negative path without overfitting on the error message.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR successfully integrates Moonshot AI's Kimi K2 as a new LLM provider for Cortex, expanding multi-LLM capabilities alongside OpenAI and Anthropic. It also introduces a FAKE provider for cost-free testing and adds comprehensive Docker-based integration tests.

Key Changes

  • Provider Integration: Added native Kimi K2 support with HTTP-based API calls, configurable via KIMI_API_KEY environment variable
  • Testing Infrastructure: Introduced FAKE provider for offline testing and Docker-based integration test suite with helper utilities
  • Documentation: Added comprehensive testing strategy guide, contribution guidelines, and Kimi K2 implementation reference documentation

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 16 comments.

Show a summary per file

  • test/test_coordinator.py: Removed obsolete test methods and added unused imports that should be cleaned up
  • test/test_cli.py: Added Kimi and FAKE provider test coverage for API key and provider detection
  • test/integration/test_end_to_end.py: New Docker-based integration tests for end-to-end CLI workflows
  • test/integration/docker_utils.py: Docker container execution utilities for integration testing
  • test/integration/__init__.py: Integration test package initialization
  • LLM/test_interpreter.py: Added Kimi K2 integration tests with real API call validation (conditionally skipped)
  • LLM/interpreter.py: Core Kimi K2 and FAKE provider implementation with API call methods
  • cortex/cli.py: Enhanced provider detection with override support and provider-aware API key handling
  • cortex/user_preferences.py: Major refactoring with improved validation and reduced language support list
  • setup.py: Added requests dependency, updated Python version support (3.8+), corrected license
  • requirements.txt: Added requests and PyYAML dependencies
  • requirements-dev.txt: Added requests to dev dependencies
  • LLM/requirements.txt: Added requests for Kimi HTTP client
  • test.md: Comprehensive testing strategy documentation with Docker requirements
  • contribution.md: New contributor guidelines with coding standards and workflow
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md: Detailed implementation reference with API examples and configuration
Comments suppressed due to low confidence (2)

test/test_coordinator.py:2

  • Import of 'call' is not used.
from unittest.mock import Mock, patch, call

test/test_coordinator.py:5

  • Import of 'time' is not used.
import time


Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

21-22: Default Kimi model in documentation doesn't match code.

The documentation states kimi-k2 as the default model (lines 21, 48, 138), but the code in LLM/interpreter.py uses kimi-k2-turbo-preview as the actual default (line 268). This inconsistency may confuse users.

Update the documentation to consistently use kimi-k2-turbo-preview or change the code default if kimi-k2 is the intended value.

Also applies to: 48-48, 138-138

🧹 Nitpick comments (9)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

190-194: Add language specifier to fenced code block.

The test results code block is missing a language specifier which aids syntax highlighting and accessibility.

-```
+```text
 Ran 143 tests in 10.136s

 OK (skipped=5)

cortex/user_preferences.py (3)

328-342: Remove unused key_parts parameter.

The key_parts parameter is never used in _coerce_value_for_set(). Either remove it or implement the intended type coercion logic based on the key path.

-    def _coerce_value_for_set(self, key_parts: List[str], value: Any) -> Any:
+    def _coerce_value_for_set(self, value: Any) -> Any:
         """Coerce string inputs into appropriate types for set()."""

And update the call site at line 361:

-            coerced = self._coerce_value_for_set(parts, value)
+            coerced = self._coerce_value_for_set(value)

202-206: Silent exception swallowing may hide configuration issues.

The try-except-pass block silently ignores all load errors during initialization. This could mask configuration file corruption or permission issues that users should be aware of.

Consider logging the exception or storing the error state:

         try:
             self.load()
         except Exception:
-            # Ignore load errors here; callers can handle later
-            pass
+            # Store the error; callers can check or load() will re-raise
+            self._preferences = UserPreferences()

Alternatively, if errors truly should be ignored, at least log them at debug level for troubleshooting.


252-256: Chain exceptions to preserve traceback context.

When re-raising exceptions, use raise ... from e to preserve the original traceback for easier debugging.

         except yaml.YAMLError as e:
-            raise ValueError(f"Invalid YAML in config file: {str(e)}")
+            raise ValueError(f"Invalid YAML in config file: {e!s}") from e
         except Exception as e:
-            raise IOError(f"Failed to load config file: {str(e)}")
+            raise IOError(f"Failed to load config file: {e!s}") from e

The same pattern applies to other exception handlers in save() (line 286), _create_backup() (line 228), and set() (line 432).

test/integration/test_end_to_end.py (1)

107-110: Potential fragile assertion on last output line.

The assertion self.assertIn("OK", result.stdout.splitlines()[-1]) will raise an IndexError if stdout is empty. Consider adding a guard or using a more robust check.

         result = self._run("python test/run_all_tests.py", env=env)

         self.assertTrue(result.succeeded(), msg=result.stderr)
-        self.assertIn("OK", result.stdout.splitlines()[-1])
+        lines = result.stdout.splitlines()
+        self.assertTrue(lines, "Expected test output but stdout was empty")
+        self.assertIn("OK", lines[-1])
LLM/test_interpreter.py (1)

56-72: Mock response missing raise_for_status stub.

The mock response in test_call_kimi_success doesn't stub raise_for_status(). While this works because Mock() returns another mock by default, explicitly stubbing it ensures the test intent is clear and protects against future changes.

         mock_response = Mock()
         mock_response.status_code = 200
+        mock_response.raise_for_status = Mock()  # Explicitly stub to avoid accidental side effects
         mock_response.json.return_value = {
             "choices": [{"message": {"content": '{"commands": ["apt update", "apt install curl"]}'}}]
         }
LLM/interpreter.py (1)

136-137: Redundant requests import inside _call_kimi.

The requests module is already imported (and validated) during _initialize_client() for the KIMI provider (lines 52-58). The import at line 136 is redundant and adds unnecessary overhead on each API call.

         try:
-            import requests
             response = requests.post(

Since self.client is set to the requests module in _initialize_client(), you can use self.client.post(...) instead for consistency:

-            import requests
-            response = requests.post(
+            response = self.client.post(
test/integration/docker_utils.py (2)

24-50: docker_available() is robust; subprocess security lint is low risk here

The availability check via docker --version and docker info with small timeouts and shell=False is solid for a test helper. Ruff’s S603 warnings here are low risk since arguments are constant and not user-controlled; if those warnings are noisy in CI, you can either # noqa: S603 these calls or disable that rule for test modules.
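For reference, a minimal sketch of the two-step check (timeouts and error handling are illustrative):

import shutil
import subprocess

def docker_available() -> bool:
    docker_path = shutil.which("docker")
    if not docker_path:
        return False
    try:
        # Step 1: the CLI is installed. Step 2: the daemon is actually reachable.
        subprocess.run([docker_path, "--version"], check=True,
                       capture_output=True, text=True, timeout=5)
        subprocess.run([docker_path, "info"], check=True,
                       capture_output=True, text=True, timeout=5)
    except (subprocess.SubprocessError, OSError):
        return False
    return True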


53-106: run_in_docker helper looks good; minor f-string / lint tweak and trust boundary note

The argv-based docker invocation plus bash -lc inside the container is appropriate for an integration-test harness and avoids host-side shell injection; just ensure callers only pass trusted command strings since they execute with full shell semantics in the container. To address the RUF010 hint and avoid str() inside an f-string, you can simplify the volume mount construction:

-    for host_path, container_path in mounts or []:
-        docker_cmd.extend([
-            "-v",
-            f"{str(host_path.resolve())}:{container_path}",
-        ])
+    for host_path, container_path in mounts or []:
+        docker_cmd.extend(
+            [
+                "-v",
+                f"{host_path.resolve()!s}:{container_path}",
+            ]
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0faa7b and 9746f92.

📒 Files selected for processing (7)
  • LLM/interpreter.py (9 hunks)
  • LLM/test_interpreter.py (5 hunks)
  • cortex/user_preferences.py (5 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/test_coordinator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/test_coordinator.py
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/test_interpreter.py (1)
LLM/interpreter.py (3)
  • APIProvider (9-15)
  • _call_kimi (118-153)
  • parse (225-244)
🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.7)
LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Avoid specifying long messages outside the exception class

(TRY003)

test/integration/docker_utils.py

32-32: subprocess call: check for execution of untrusted input

(S603)


40-40: subprocess call: check for execution of untrusted input

(S603)


48-48: Consider moving this statement to an else block

(TRY300)


89-89: Use explicit conversion flag

Replace with conversion flag

(RUF010)


97-97: subprocess call: check for execution of untrusted input

(S603)

cortex/user_preferences.py

142-142: Do not catch blind exception: Exception

(BLE001)


152-152: Do not catch blind exception: Exception

(BLE001)


204-206: try-except-pass detected, consider logging the exception

(S110)


204-204: Do not catch blind exception: Exception

(BLE001)


226-226: Consider moving this statement to an else block

(TRY300)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


228-228: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Use explicit conversion flag

Replace with conversion flag

(RUF010)


250-250: Consider moving this statement to an else block

(TRY300)


253-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


253-253: Avoid specifying long messages outside the exception class

(TRY003)


253-253: Use explicit conversion flag

Replace with conversion flag

(RUF010)


254-254: Do not catch blind exception: Exception

(BLE001)


255-255: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


255-255: Avoid specifying long messages outside the exception class

(TRY003)


255-255: Use explicit conversion flag

Replace with conversion flag

(RUF010)


283-283: Consider moving this statement to an else block

(TRY300)


285-285: Do not catch blind exception: Exception

(BLE001)


286-286: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


286-286: Avoid specifying long messages outside the exception class

(TRY003)


286-286: Use explicit conversion flag

Replace with conversion flag

(RUF010)


328-328: Unused method argument: key_parts

(ARG002)


370-370: Do not catch blind exception: Exception

(BLE001)


371-371: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


371-371: Avoid specifying long messages outside the exception class

(TRY003)


375-375: Abstract raise to an inner function

(TRY301)


375-375: Avoid specifying long messages outside the exception class

(TRY003)


377-377: Abstract raise to an inner function

(TRY301)


377-377: Avoid specifying long messages outside the exception class

(TRY003)


382-382: Abstract raise to an inner function

(TRY301)


382-382: Avoid specifying long messages outside the exception class

(TRY003)


385-385: Abstract raise to an inner function

(TRY301)


385-385: Avoid specifying long messages outside the exception class

(TRY003)


387-387: Abstract raise to an inner function

(TRY301)


387-387: Avoid specifying long messages outside the exception class

(TRY003)


392-392: Abstract raise to an inner function

(TRY301)


392-392: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Do not catch blind exception: Exception

(BLE001)


400-400: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


403-403: Abstract raise to an inner function

(TRY301)


403-403: Avoid specifying long messages outside the exception class

(TRY003)


410-410: Abstract raise to an inner function

(TRY301)


410-410: Avoid specifying long messages outside the exception class

(TRY003)


412-412: Abstract raise to an inner function

(TRY301)


412-412: Avoid specifying long messages outside the exception class

(TRY003)


417-417: Abstract raise to an inner function

(TRY301)


417-417: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Abstract raise to an inner function

(TRY301)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


426-426: Abstract raise to an inner function

(TRY301)


426-426: Avoid specifying long messages outside the exception class

(TRY003)


430-430: Consider moving this statement to an else block

(TRY300)


432-432: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


432-432: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Use explicit conversion flag

Replace with conversion flag

(RUF010)


457-457: Avoid specifying long messages outside the exception class

(TRY003)


539-539: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
test/integration/test_end_to_end.py (1)

27-37: Return type annotation on _run method.

The method was originally flagged for a missing return type annotation, but the signature already includes -> DockerRunResult, so this appears to be addressed; no change needed.

LLM/test_interpreter.py (1)

274-285: LGTM - Integration test class is well-guarded.

The skip decorator with RUN_KIMI_INTEGRATION_TESTS environment variable check and the setUp method's skipTest for missing API key provide appropriate guards for real API tests. The docstring correctly shows the unittest command.

LLM/interpreter.py (2)

155-166: LGTM - _call_fake correctly returns commands directly.

The implementation properly validates that commands is a list and returns it directly, avoiding the redundant JSON parsing that was previously flagged.


260-269: LGTM - Default model resolution is well-implemented.

The _default_model method correctly provides current model defaults for each provider, with the Kimi provider supporting environment variable override via KIMI_DEFAULT_MODEL. The OpenAI default has been updated to gpt-4o addressing the previous deprecation concern.

test/integration/docker_utils.py (1)

11-21: DockerRunResult dataclass is a clean abstraction

The dataclass and succeeded() helper neatly wrap subprocess results; no changes needed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
LLM/interpreter.py (1)

30-34: Use _default_model() in __init__ to fix incorrect defaults for KIMI/FAKE

Right now, __init__ hard-codes default models:

if model:
    self.model = model
else:
    self.model = "gpt-4" if self.provider == APIProvider.OPENAI else "claude-sonnet-4-20250514"

You’ve introduced _default_model() with provider-specific defaults (including KIMI and FAKE), but __init__ doesn’t call it. As a result:

  • KIMI and FAKE instances created without an explicit model will incorrectly get "claude-sonnet-4-20250514" (the else branch), never using the KIMI/FAKE defaults defined in _default_model().
  • There are now two divergent sources of truth for default models, which is brittle.

This is a functional bug for KIMI/FAKE defaults and a maintainability issue overall. Suggest centralizing on _default_model():

     def __init__(
         self,
         api_key: str,
         provider: str = "openai",
         model: Optional[str] = None
     ):
         self.api_key = api_key
         self.provider = APIProvider(provider.lower())
-        
-        if model:
-            self.model = model
-        else:
-            self.model = "gpt-4" if self.provider == APIProvider.OPENAI else "claude-sonnet-4-20250514"
-        
+
+        # Use provider-specific defaults, overridden by explicit `model` if provided.
+        self.model = model or self._default_model()
+
         self._initialize_client()

With this, KIMI and FAKE will correctly pick up "kimi-k2-turbo-preview" (or KIMI_DEFAULT_MODEL) and "fake-local-model" when no explicit model is passed, and there’s a single place to adjust defaults going forward.

Also applies to: 260-269

🧹 Nitpick comments (1)
LLM/interpreter.py (1)

37-62: Unify KIMI client setup and HTTP call; avoid re-importing requests

For the KIMI path, _initialize_client() imports requests, assigns it to self.client, and sets _kimi_base_url, but _call_kimi() re-imports requests and doesn’t use self.client. This duplication slightly complicates testability and makes the client setup pattern inconsistent with the other providers.

You can simplify and make this more consistent by relying on self.client in _call_kimi() and dropping the second import:

     def _call_kimi(self, user_input: str) -> List[str]:
         """Call the Kimi K2 HTTP API and parse the response body."""
@@
-        try:
-            import requests
-            response = requests.post(
+        try:
+            response = self.client.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

(Optionally, you could make self.client a requests.Session() in _initialize_client() if you want connection pooling, but that’s not required for correctness.)

Also, please double-check that the endpoint path (/v1/chat/completions) and response shape (choices[0].message.content) still match the current Kimi K2 API.

Also applies to: 118-154

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9746f92 and f8038b9.

📒 Files selected for processing (1)
  • LLM/interpreter.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
LLM/interpreter.py (1)
cortex/user_preferences.py (2)
  • get (288-315)
  • validate (463-500)
🪛 Ruff (0.14.7)
LLM/interpreter.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Abstract raise to an inner function

(TRY301)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Abstract raise to an inner function

(TRY301)


150-150: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


165-165: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
LLM/interpreter.py (2)

10-15: Enum additions for KIMI/FAKE look consistent with the rest of the interpreter

The new APIProvider.KIMI and APIProvider.FAKE members align with the branches in _initialize_client and parse(), and the string values match the provider argument usage. No changes needed here.


64-80: Docstrings and high-level flow for parsing/validation are clear

The added/updated docstrings around _get_system_prompt, provider-specific call methods, _parse_commands, _validate_commands, parse, and parse_with_context accurately describe the behavior and make the NL→commands pipeline much easier to follow. No issues here.

Also applies to: 82-98, 101-116, 187-203, 205-223, 225-244, 246-258

@dhvll
Copy link
Collaborator

dhvll commented Dec 5, 2025

I saw both of your videos. There is no clear showcase of Kimi usage. Have you even used an API key to test that your code works? Create a separate video that just showcases Kimi usage, or shows how you tested it on your system. Testing here means proper usage, not just running test files, FYI.

@Sahilbhatane
Copy link
Collaborator Author

Sahilbhatane commented Dec 5, 2025

I can run an echo command to show the API key is in the environment, or I can add a short function to the CLI that shows which LLM is used for command execution.
Which method would you like?
The second can be turned into a proper cortex command like "cortex check-LLM".
If the first, I will only add a screenshot, since making a video for it isn't necessary; for the second, I can make a video showing the function and its execution. I will also show that the API key is in the environment via an echo command in the video 👍🏻

@dhvll
Copy link
Collaborator

dhvll commented Dec 5, 2025

Echo is more like a print statement.
Show your implementation using Postman.
I am not asking whether the API key is in the environment or where your code is; I can read the codebase. I am asking how you know your Kimi implementation is working.
Add a recorded video under this comment showing only Kimi API usage; don't add unnecessary extras.

@Sahilbhatane
Copy link
Collaborator Author

@mikejmorgan-ai
Copy link
Member

@dhvll Could you review this PR? Would appreciate your testing perspective on the Kimi K2 provider. Thanks!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
LLM/interpreter.py (1)

31-40: Missing default model assignment for KIMI and FAKE providers.

When model is not explicitly provided and provider is "kimi" or "fake", self.model is never assigned. This causes an AttributeError in _call_kimi (line 136) when building the payload.

Apply this diff to add default models for the new providers:

         if model:
             self.model = model
         else:
             if self.provider == APIProvider.OPENAI:
                 self.model = "gpt-4"
             elif self.provider == APIProvider.CLAUDE:
                 self.model = "claude-sonnet-4-20250514"
+            elif self.provider == APIProvider.KIMI:
+                self.model = os.environ.get("KIMI_DEFAULT_MODEL", "kimi-k2-turbo-preview")
+            elif self.provider == APIProvider.FAKE:
+                self.model = "fake-local-model"
             elif self.provider == APIProvider.OLLAMA:
                 self.model = "llama3.2"  # Default Ollama model

Alternatively, consolidate this logic by calling self._default_model() (defined at line 306) instead of duplicating defaults.

♻️ Duplicate comments (1)
LLM/interpreter.py (1)

169-176: Missing key existence check before accessing data["commands"].

If the JSON in CORTEX_FAKE_COMMANDS is valid but lacks the "commands" key, line 174 raises a bare KeyError instead of a descriptive ValueError.

This was flagged in a previous review. Add an explicit key check:

             try:
                 data = json.loads(payload)
             except json.JSONDecodeError as exc:
                 raise ValueError("CORTEX_FAKE_COMMANDS must contain valid JSON") from exc
+            if "commands" not in data:
+                raise ValueError("CORTEX_FAKE_COMMANDS must contain a 'commands' key")
             if not isinstance(data["commands"], list):  
                 raise ValueError("'commands' must be a list in CORTEX_FAKE_COMMANDS")  
             return data["commands"]
🧹 Nitpick comments (2)
LLM/interpreter.py (2)

145-152: Redundant requests import; use the stored client instead.

requests is already imported and stored in self.client during _initialize_client() (line 63). The re-import here is unnecessary.

         try:
-            import requests
-            response = requests.post(
+            response = self.client.post(
                 f"{self._kimi_base_url.rstrip('/')}/v1/chat/completions",
                 headers=headers,
                 json=payload,
                 timeout=60,
             )

306-315: _default_model() is defined but never called.

This method duplicates the default model logic already present in __init__ (lines 34-39) but is never invoked. This creates maintenance burden and potential drift between the two locations.

Either:

  1. Remove _default_model() if it was added speculatively, or
  2. Refactor __init__ to use it:
         if model:
             self.model = model
         else:
-            if self.provider == APIProvider.OPENAI:
-                self.model = "gpt-4"
-            elif self.provider == APIProvider.CLAUDE:
-                self.model = "claude-sonnet-4-20250514"
-            elif self.provider == APIProvider.OLLAMA:
-                self.model = "llama3.2"  # Default Ollama model
+            self.model = self._default_model()

Note: This requires moving the method definition before __init__ or deferring the call until after full initialization.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b32082 and 791427e.

📒 Files selected for processing (3)
  • LLM/interpreter.py (10 hunks)
  • cortex/cli.py (3 hunks)
  • requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
cortex/cli.py (1)
cortex/branding.py (1)
  • cx_print (55-75)
🪛 Ruff (0.14.7)
LLM/interpreter.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)


157-157: Abstract raise to an inner function

(TRY301)


157-157: Avoid specifying long messages outside the exception class

(TRY003)


160-160: Abstract raise to an inner function

(TRY301)


160-160: Avoid specifying long messages outside the exception class

(TRY003)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


163-163: Use explicit conversion flag

Replace with conversion flag

(RUF010)


173-173: Avoid specifying long messages outside the exception class

(TRY003)


175-175: Avoid specifying long messages outside the exception class

(TRY003)


272-272: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
LLM/interpreter.py (3)

57-64: LGTM for KIMI client initialization.

The import check and base URL configuration are properly handled. The fallback to https://api.moonshot.ai is reasonable.


178-195: LGTM for fake provider defaults.

The safe defaults use only echo commands, making them safe for dry-run and integration tests. The keyword matching and fallback logic are reasonable.


278-281: LGTM for new provider routing.

The KIMI and FAKE providers are correctly routed to their respective handler methods, consistent with the existing pattern.

cortex/cli.py (4)

68-71: Auto-detection of fake provider via CORTEX_FAKE_COMMANDS presence.

This implicit detection allows tests to set CORTEX_FAKE_COMMANDS without needing CORTEX_PROVIDER=fake. While convenient for testing, it could surprise users who set the variable for other reasons (e.g., debugging) without intending to switch providers.

Consider documenting this behavior or requiring explicit CORTEX_PROVIDER=fake for production safety.
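One hedged way to make that opt-in explicit; the CORTEX_ALLOW_FAKE variable is purely hypothetical, named here only for illustration:

import os

def detect_provider() -> str:
    override = os.environ.get("CORTEX_PROVIDER")
    if override:
        return override
    # Only treat CORTEX_FAKE_COMMANDS as a provider switch when the user
    # has opted in explicitly (hypothetical guard variable).
    if os.environ.get("CORTEX_FAKE_COMMANDS") and os.environ.get("CORTEX_ALLOW_FAKE") == "1":
        return "fake"
    return "openai"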


76-99: LGTM for provider-aware API key retrieval.

The method correctly maps providers to their environment variables and provides helpful guidance when keys are missing. The Ollama placeholder and structured error messaging are appropriate.


137-143: LGTM for fake provider handling in install flow.

The fake provider correctly bypasses real API key requirements with a sensible default, enabling offline testing without additional configuration.


830-831: LGTM for CLI help documentation.

The epilog correctly documents the new KIMI_API_KEY and CORTEX_PROVIDER environment variables, including all supported provider options.

@Sahilbhatane Sahilbhatane self-assigned this Dec 10, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 791427e and ad1080d.

📒 Files selected for processing (1)
  • requirements-dev.txt (1 hunks)
🔇 Additional comments (1)
requirements-dev.txt (1)

4-5: New dev dependencies appropriately support test infrastructure.

The addition of pytest-mock, requests, PyYAML, and pre-commit aligns well with the PR's integration test suite and multi-provider support (Kimi K2, FAKE). These dependencies enable mocking, HTTP interactions, YAML configuration handling, and pre-commit consistency checks.

Also applies to: 8-8, 25-25

@Sahilbhatane Sahilbhatane linked an issue Dec 11, 2025 that may be closed by this pull request
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (3)
cortex/user_preferences.py (2)

193-214: Don’t silently swallow all load errors in PreferencesManager.__init__() (at least store/report them).
Right now a permissions/parse error can be masked until much later. Consider capturing the exception into self.last_load_error and/or only swallowing expected “file missing” cases.
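A minimal sketch of that approach, assuming a last_load_error attribute (a suggestion, not existing API):

from pathlib import Path

class PreferencesManager:
    def __init__(self, path: str = "~/.config/cortex/preferences.yaml") -> None:
        self.path = Path(path).expanduser()
        self.last_load_error: Exception | None = None
        try:
            self.load()
        except FileNotFoundError:
            pass  # a missing config file is expected on first run
        except Exception as exc:
            self.last_load_error = exc  # surface later instead of hiding it

    def load(self) -> None:
        # Placeholder for the real YAML load/validate logic.
        self.raw = self.path.read_text()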


359-448: Extract per-root-key handlers from PreferencesManager.set() to improve maintainability.
This method handles coercion, validation, routing, mutation, and persistence in a single large if/elif chain, making it difficult to maintain and extend. Consider extracting handlers for each root key (e.g., _set_ai(), _set_packages(), …) and using a dispatch table like { "ai": self._set_ai, ... } to route to the appropriate handler. This pattern is already demonstrated effectively in the get() method, which uses a simple loop instead of branching.
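A minimal dispatch-table sketch under those assumptions (handler names and signatures are illustrative):

from typing import Any, Callable

class PreferencesManager:
    def __init__(self) -> None:
        self._setters: dict[str, Callable[[list[str], Any], None]] = {
            "ai": self._set_ai,
            "packages": self._set_packages,
        }

    def set(self, key: str, value: Any) -> None:
        root, *rest = key.split(".")
        handler = self._setters.get(root)
        if handler is None:
            raise KeyError(f"Unknown preference root: {root!r}")
        handler(rest, value)

    def _set_ai(self, parts: list[str], value: Any) -> None:
        print(f"set ai.{'.'.join(parts)} = {value!r}")  # stand-in for real mutation

    def _set_packages(self, parts: list[str], value: Any) -> None:
        print(f"set packages.{'.'.join(parts)} = {value!r}")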

test/test_cli.py (1)

48-57: Provider selection tests cover both detection and explicit override.
Good coverage for KIMI_API_KEY detection and CORTEX_PROVIDER=fake override behavior.

Optional: add a test for _get_api_key() (no provider arg) when only KIMI_API_KEY is set, to cover the auto-detect + lookup path end-to-end.
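A hedged sketch of such a test, following the patterns already used in test/test_cli.py (constructor and method signatures are assumptions):

import unittest
from unittest.mock import patch

from cortex.cli import CortexCLI

class TestProviderAutoDetect(unittest.TestCase):
    @patch.dict("os.environ", {"KIMI_API_KEY": "test-key"}, clear=True)
    def test_get_api_key_autodetects_kimi(self):
        cli = CortexCLI()
        # Auto-detect should land on Kimi, and the key lookup should follow it.
        self.assertEqual(cli._get_provider(), "kimi")
        self.assertEqual(cli._get_api_key(), "test-key")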

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad1080d and 5fcebfe.

📒 Files selected for processing (5)
  • cortex/cli.py (2 hunks)
  • cortex/user_preferences.py (4 hunks)
  • requirements-dev.txt (1 hunks)
  • requirements.txt (1 hunks)
  • test/test_cli.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cortex/cli.py
  • requirements-dev.txt
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • test/test_cli.py
  • cortex/user_preferences.py
🧬 Code graph analysis (1)
test/test_cli.py (1)
cortex/cli.py (2)
  • _get_api_key (77-104)
  • _get_provider (57-75)
🪛 GitHub Check: SonarCloud Code Analysis
cortex/user_preferences.py

[warning] 343-343: Remove the unused function parameter "key_parts".

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apy&open=AZrfnHzVPc78BrE-3Apy&pullRequest=192


[failure] 359-359: Refactor this function to reduce its Cognitive Complexity from 56 to the 15 allowed.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Apz&open=AZrfnHzVPc78BrE-3Apz&pullRequest=192


[failure] 449-449: Refactor this method to not always return the same value.

See more on https://sonarcloud.io/project/issues?id=cortexlinux_cortex&issues=AZrfnHzVPc78BrE-3Ap0&open=AZrfnHzVPc78BrE-3Ap0&pullRequest=192

🔇 Additional comments (2)
test/test_cli.py (1)

26-30: KIMI API key test looks solid and isolates env correctly.
Directly passing 'kimi' also avoids ambiguity around provider auto-detection.

requirements.txt (1)

3-7: No action needed. Both requests>=2.32.4 (supports Python 3.9+) and PyYAML>=6.0.0 (supports Python 3.6+) are fully compatible with the project's minimum Python 3.10 requirement and Python 3.14 pre-release testing.

@Sahilbhatane
Copy link
Collaborator Author

Tests are failing because of the automation: the ci.yml file doesn't install requirements properly. It just needs a pip install -r requirements.txt / requirements-dev.txt step.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

21-22: Default model inconsistency remains unresolved.

The documentation still shows conflicting default models: Line 21-22 mentions kimi-k2/kimi-k2-turbo-preview, line 48 states default is kimi-k2, and line 138 shows kimi-k2 in the example. Per previous review feedback, the tests expect kimi-k2-turbo-preview as the default.

Please ensure all references use the same default model that matches the actual code implementation.

Also applies to: 48-48, 138-138

🧹 Nitpick comments (2)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1)

190-194: Optional: Address markdown linting issues.

Static analysis flagged a few minor markdown issues:

  • Line 190: Code block is missing a language identifier (add bash or text)
  • Line 214: "CLI interface" is redundant since CLI already means "Command Line Interface" (use just "CLI")

Also applies to: 214-214

test/integration/docker_utils.py (1)

32-47: Recommended: Use capture_output=True for cleaner subprocess calls.

The subprocess calls use stdout=subprocess.PIPE, stderr=subprocess.PIPE, but Python 3.7+ provides the cleaner capture_output=True parameter.

Apply these diffs:

Lines 32-39:

         subprocess.run(
             [docker_path, "--version"],
             check=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            capture_output=True,
             text=True,
             timeout=5,
         )

Lines 40-47:

         subprocess.run(
             [docker_path, "info"],
             check=True,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            capture_output=True,
             text=True,
             timeout=5,
         )

Lines 97-104:

     result = subprocess.run(
         docker_cmd,
         check=False,
-        stdout=subprocess.PIPE,
-        stderr=subprocess.PIPE,
+        capture_output=True,
         text=True,
         timeout=timeout,
     )

Also applies to: 97-104

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 469f4cb and 8cfa7af.

📒 Files selected for processing (9)
  • contribution.md (1 hunks)
  • docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md (1 hunks)
  • requirements.txt (1 hunks)
  • test.md (1 hunks)
  • test/integration/__init__.py (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/run_all_tests.py (1 hunks)
  • test/test_cli.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • requirements.txt
  • test/integration/test_end_to_end.py
  • contribution.md
  • test.md
  • test/run_all_tests.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • test/integration/__init__.py
  • test/integration/docker_utils.py
  • test/test_cli.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • test/integration/__init__.py
🧬 Code graph analysis (1)
test/test_cli.py (1)
cortex/cli.py (9)
  • CortexCLI (32-538)
  • main (587-704)
  • _get_api_key (45-59)
  • _get_provider (61-74)
  • _print_status (76-85)
  • _print_error (87-88)
  • _print_success (90-91)
  • install (174-307)
  • _animate_spinner (93-97)
🪛 GitHub Actions: CI
test/integration/docker_utils.py

[error] 3-3: Ruff lint error I001: Import block is un-sorted or un-formatted.

🪛 GitHub Check: Lint
test/integration/docker_utils.py

[failure] 58-58: Ruff (UP045)
test/integration/docker_utils.py:58:13: UP045 Use X | None for type annotations


[failure] 57-57: Ruff (UP006)
test/integration/docker_utils.py:57:19: UP006 Use dict instead of Dict for type annotation


[failure] 57-57: Ruff (UP045)
test/integration/docker_utils.py:57:10: UP045 Use X | None for type annotations


[failure] 40-47: Ruff (UP022)
test/integration/docker_utils.py:40:9: UP022 Prefer capture_output over sending stdout and stderr to PIPE


[failure] 32-39: Ruff (UP022)
test/integration/docker_utils.py:32:9: UP022 Prefer capture_output over sending stdout and stderr to PIPE


[failure] 8-8: Ruff (UP035)
test/integration/docker_utils.py:8:1: UP035 typing.Tuple is deprecated, use tuple instead


[failure] 8-8: Ruff (UP035)
test/integration/docker_utils.py:8:1: UP035 typing.List is deprecated, use list instead


[failure] 8-8: Ruff (UP035)
test/integration/docker_utils.py:8:1: UP035 typing.Dict is deprecated, use dict instead


[failure] 8-8: Ruff (UP035)
test/integration/docker_utils.py:8:1: UP035 Import from collections.abc instead: Iterable


[failure] 3-8: Ruff (I001)
test/integration/docker_utils.py:3:1: I001 Import block is un-sorted or un-formatted

🪛 LanguageTool
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

[style] ~214-~214: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ... addressed in PR #192) - Issue #11: CLI interface improvements (referenced in commits) - ...

(ACRONYM_TAUTOLOGY)

🪛 markdownlint-cli2 (0.18.1)
docs/ISSUE_40_KIMI_K2_IMPLEMENTATION.md

3-3: Bare URL used

(MD034, no-bare-urls)


4-4: Bare URL used

(MD034, no-bare-urls)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Bare URL used

(MD034, no-bare-urls)


235-235: Bare URL used

(MD034, no-bare-urls)


236-236: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (1)
test/integration/__init__.py (1)

1-1: LGTM!

Clean package initializer with a clear docstring describing the integration test suite.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/test_interpreter.py (2)

18-29: Add cache mocking to initialization tests for proper isolation.

These initialization tests create CommandInterpreter instances without passing a cache parameter. According to the constructor implementation, this will attempt to initialize a real SemanticCache, which could:

  • Create actual cache files in the file system
  • Cause test instability if initialization behavior varies across environments
  • Introduce non-determinism in unit tests

Add cache mocking to ensure test isolation:

 @patch("openai.OpenAI")
 def test_initialization_openai(self, mock_openai):
-    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai")
+    mock_cache = Mock()
+    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai", cache=mock_cache)
     self.assertEqual(interpreter.provider, APIProvider.OPENAI)
     self.assertEqual(interpreter.model, "gpt-4")
     mock_openai.assert_called_once_with(api_key=self.api_key)

 @patch("anthropic.Anthropic")
 def test_initialization_claude(self, mock_anthropic):
-    interpreter = CommandInterpreter(api_key=self.api_key, provider="claude")
+    mock_cache = Mock()
+    interpreter = CommandInterpreter(api_key=self.api_key, provider="claude", cache=mock_cache)
     self.assertEqual(interpreter.provider, APIProvider.CLAUDE)
     self.assertEqual(interpreter.model, "claude-sonnet-4-20250514")
     mock_anthropic.assert_called_once_with(api_key=self.api_key)

32-36: Add cache mocking to these tests for consistency.

Similar to the initialization tests, these methods create CommandInterpreter instances without mocking the cache parameter, which may trigger unintended SemanticCache initialization.

Apply this pattern:

 @patch("openai.OpenAI")
 def test_initialization_custom_model(self, mock_openai):
+    mock_cache = Mock()
     interpreter = CommandInterpreter(
-        api_key=self.api_key, provider="openai", model="gpt-4-turbo"
+        api_key=self.api_key, provider="openai", model="gpt-4-turbo", cache=mock_cache
     )
     self.assertEqual(interpreter.model, "gpt-4-turbo")

 @patch("openai.OpenAI")
 def test_parse_empty_input(self, mock_openai):
-    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai")
+    mock_cache = Mock()
+    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai", cache=mock_cache)

     with self.assertRaises(ValueError):
         interpreter.parse("")

Also applies to: 80-84

🧹 Nitpick comments (1)
tests/test_interpreter.py (1)

1-241: Recommend adding tests for cache hit scenarios and cache write verification.

The current tests only cover cache miss scenarios (when cache.get_commands returns None). To ensure comprehensive coverage of the new cache parameter, consider adding tests for:

  1. Cache hit scenario: Verify that when cache.get_commands returns cached commands, the LLM provider is not called.
  2. Cache write verification: Verify that cache.put_commands is called after a successful LLM response.
  3. Offline mode with cache: Test that offline mode works when cache has a response.
  4. Offline mode without cache: Test that offline mode raises RuntimeError when cache misses.

Example test for cache hit scenario:

@patch("openai.OpenAI")
def test_parse_with_cache_hit(self, mock_openai):
    mock_client = Mock()
    mock_cache = Mock()
    cached_commands = ["apt update", "apt install docker"]
    mock_cache.get_commands.return_value = cached_commands
    
    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai", cache=mock_cache)
    interpreter.client = mock_client
    
    result = interpreter.parse("install docker")
    
    # Verify cached commands are returned
    self.assertEqual(result, cached_commands)
    # Verify LLM was NOT called
    mock_client.chat.completions.create.assert_not_called()
    # Verify cache.put_commands was NOT called (no new response to cache)
    mock_cache.put_commands.assert_not_called()

Example test for cache write verification:

@patch("openai.OpenAI")
def test_parse_caches_response(self, mock_openai):
    mock_client = Mock()
    mock_response = Mock()
    mock_response.choices = [Mock()]
    mock_response.choices[0].message.content = '{"commands": ["apt update"]}'
    mock_client.chat.completions.create.return_value = mock_response
    
    mock_cache = Mock()
    mock_cache.get_commands.return_value = None
    
    interpreter = CommandInterpreter(api_key=self.api_key, provider="openai", cache=mock_cache)
    interpreter.client = mock_client
    
    result = interpreter.parse("install docker", validate=True)
    
    # Verify cache.put_commands was called with correct arguments
    mock_cache.put_commands.assert_called_once()
    call_kwargs = mock_cache.put_commands.call_args[1]
    self.assertEqual(call_kwargs["prompt"], "install docker")
    self.assertEqual(call_kwargs["provider"], "openai")
    self.assertEqual(call_kwargs["commands"], ["apt update"])
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cfa7af and fb4e654.

📒 Files selected for processing (7)
  • cortex/config_manager.py (1 hunks)
  • test/integration/docker_utils.py (1 hunks)
  • test/integration/test_end_to_end.py (1 hunks)
  • test/run_all_tests.py (1 hunks)
  • test/test_cli.py (1 hunks)
  • tests/test_hardware_detection.py (2 hunks)
  • tests/test_interpreter.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/integration/test_end_to_end.py
  • test/run_all_tests.py
  • test/test_cli.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_hardware_detection.py
  • test/integration/docker_utils.py
  • cortex/config_manager.py
  • tests/test_interpreter.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_hardware_detection.py
  • tests/test_interpreter.py
🧬 Code graph analysis (1)
tests/test_interpreter.py (2)
cortex/semantic_cache.py (1)
  • get_commands (200-290)
cortex/llm/interpreter.py (1)
  • CommandInterpreter (17-275)
🔇 Additional comments (4)
tests/test_hardware_detection.py (1)

287-287: LGTM! Cross-platform test robustness improvement.

Adding create=True to the @patch decorators ensures these tests can run on platforms where os.statvfs and os.uname don't exist (e.g., Windows), aligning with the PR's cross-platform testing goals.

Also applies to: 320-320

cortex/config_manager.py (1)

75-78: LGTM! Platform-aware security enforcement.

The short-circuit correctly handles non-POSIX platforms where os.getuid, os.getgid, and os.chown are unavailable. The defensive hasattr checks add robustness, and the comment clearly explains the rationale. This aligns with the PR's cross-platform testing strategy while maintaining security on Linux (the target platform).
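For reference, a minimal sketch of the short-circuit pattern (the real check in cortex/config_manager.py may differ in detail):

import os

def enforce_posix_permissions(path: str) -> None:
    # Strict ownership checks only make sense on POSIX hosts.
    if os.name != "posix" or not all(
        hasattr(os, attr) for attr in ("getuid", "getgid", "chown")
    ):
        return
    stat_info = os.stat(path)
    if stat_info.st_uid != os.getuid():
        raise PermissionError(f"{path} is not owned by the current user")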

test/integration/docker_utils.py (1)

1-106: LGTM! Clean Docker integration utilities.

The implementation is well-structured with:

  • Modern type hints following Python 3.10+ guidelines
  • Comprehensive error handling in docker_available()
  • Proper documentation for all public APIs
  • Two-step Docker availability check (CLI + daemon)

All previous review comments have been addressed:

  • ✓ Docstring for mounts parameter now correctly describes the tuple format
  • ✓ Unused imports removed
  • ✓ Type hints modernized (using dict, list, tuple, and collections.abc.Iterable)
  • ✓ Imports properly sorted
tests/test_interpreter.py (1)

144-147: LGTM! Proper cache mocking for parse() tests.

The mock cache setup correctly simulates cache misses, ensuring the LLM provider is called in these tests. The pattern is consistent and aligns well with the updated CommandInterpreter constructor.

Also applies to: 161-164, 178-181, 229-232

@sonarqubecloud
Copy link

@Sahilbhatane
Copy link
Collaborator Author

@mikejmorgan-ai we can merge this PR, I have rebased and resolved conflicts.

Copy link
Member

@mikejmorgan-ai mikejmorgan-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kimi K2 support expands our LLM options. Approved.

@mikejmorgan-ai
Copy link
Member

Approved! Please rebase against main: git fetch origin && git rebase origin/main && git push -f


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Kimi K2 API Integration
  • End-to-end integration test suite

3 participants