Skip to content

Conversation

@hyaku0121
Copy link
Contributor

@hyaku0121 hyaku0121 commented Dec 14, 2025

Related Issue

Closes #125

Summary

Implements the Smart Cleanup and Disk Space Optimizer feature with automatic scheduling support.

Changes

  • Add CleanupScanner for identifying cleanup opportunities
  • Add DiskCleaner for executing cleanup operations
  • Add CleanupManager for quarantine/undo functionality
  • Add CleanupScheduler for automatic scheduling (systemd/cron)
  • Add schedule subcommand to cortex cleanup CLI
  • Add comprehensive unit tests (88% coverage)
  • Add CLEANUP_GUIDE.md documentation

Features

  • Scan for cleanable items (package cache, orphaned packages, temp files, old logs)
  • Safe cleanup mode with quarantine (undo capability)
  • Log compression (gzip)
  • Automatic cleanup scheduling (daily/weekly/monthly)

Example Usage

$ cortex cleanup scan
$ cortex cleanup run --safe --yes
$ cortex cleanup undo
$ cortex cleanup schedule --enable --interval weekly

Summary by CodeRabbit

Release Notes

  • New Features

    • Disk cleanup scanning for package cache, orphaned packages, temporary files, and old logs
    • Cleanup execution with dry-run and safe mode options
    • File quarantine and restoration (undo) capability
    • Scheduled cleanup tasks via system timers or cron
  • Documentation

    • New cleanup guide with command reference and troubleshooting
  • Chores

    • Enhanced command validation for sudo command handling

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 14, 2025

Walkthrough

This PR introduces a comprehensive disk cleanup feature for Cortex, enabling scanning for cleanup opportunities across package cache, orphaned packages, temporary files, and logs; executing cleanup with safe mode and dry-run options; quarantining cleaned files with restore capability; and scheduling automated cleanup via systemd timers or cron jobs.

Changes

Cohort / File(s) Summary
Core cleanup modules
cortex/cleanup/scanner.py, cortex/cleanup/cleaner.py, cortex/cleanup/manager.py, cortex/cleanup/scheduler.py
Introduces CleanupScanner to identify cleanup opportunities; DiskCleaner to execute per-category cleanup (package cache, orphaned packages, temporary files, log compression); CleanupManager with QuarantineItem for file quarantine/restore with atomic metadata handling; CleanupScheduler for configuring systemd timers or cron-based automation with support for DAILY/WEEKLY/MONTHLY intervals.
Package initialization
cortex/cleanup/__init__.py
Exposes public API: CleanupScanner, ScanResult, DiskCleaner, CleanupManager, QuarantineItem.
CLI integration
cortex/cli.py
Adds cleanup command with subcommands: scan (identify opportunities), run (execute cleanup with safety/dry-run/yes options), undo (restore quarantined items), schedule (enable/disable/show scheduling). Includes _format_size helper for human-readable size formatting and new demo command.
Tests
tests/test_cleanup_*.py (scanner, cleaner, manager, scheduler)
Comprehensive unit tests covering initialization, per-category scanning, cleanup execution, quarantine/restore flow, scheduling configuration, edge cases, and error handling.
Utilities
cortex/utils/commands.py
Updates validate_command to skip over sudo flags when validating the actual command token.
Documentation
docs/CLEANUP_GUIDE.md
Detailed cleanup guide including quick-start examples, command reference, cleanup categories, safety features (quarantine, safe mode, dry-run), scheduling backends, troubleshooting.

Sequence Diagram

sequenceDiagram
    participant User
    participant CortexCLI
    participant CleanupScanner
    participant DiskCleaner
    participant CleanupManager
    participant FileSystem

    User->>CortexCLI: cleanup scan
    CortexCLI->>CleanupScanner: scan_all()
    CleanupScanner->>FileSystem: inspect cache/packages/temp/logs
    CleanupScanner-->>CortexCLI: list[ScanResult]
    CortexCLI-->>User: display opportunities & space

    User->>CortexCLI: cleanup run --safe
    CortexCLI->>DiskCleaner: run_cleanup(results)
    
    par Category Processing
        DiskCleaner->>DiskCleaner: clean_package_cache()
        DiskCleaner->>DiskCleaner: remove_orphaned_packages()
        DiskCleaner->>DiskCleaner: clean_temp_files()
        DiskCleaner->>DiskCleaner: compress_logs()
    end
    
    DiskCleaner->>CleanupManager: quarantine_file(path)
    CleanupManager->>FileSystem: move file to ~/.cortex/trash
    CleanupManager->>FileSystem: save metadata.json
    CleanupManager-->>DiskCleaner: item_id
    
    DiskCleaner-->>CortexCLI: dict[category → bytes_freed]
    CortexCLI-->>User: display cleanup summary

    User->>CortexCLI: cleanup undo <item_id>
    CortexCLI->>CleanupManager: restore_item(item_id)
    CleanupManager->>FileSystem: move file back to original_path
    CleanupManager-->>CortexCLI: success status
    CortexCLI-->>User: restore confirmation
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Parser logic: ReDoS mitigation in _extract_size (non-regex approach) and _parse_autoremove_output should be validated for correctness and edge case handling (KB, MB, GB parsing, malformed output)
  • File operations atomicity: Verify atomic metadata writes in CleanupManager (especially _save_metadata with 600 permissions) and file move semantics in quarantine_file
  • Systemd/cron integration: Confirm subprocess command construction and error fallback logic in CleanupScheduler's enable_schedule, timer/cron setup, and status checks
  • Dry-run logic: Ensure all cleaner methods respect self.dry_run and return accurate freed-space estimates without mutation

Possibly related PRs

  • PR #208: Modifies cortex/utils/commands.py validate_command for sudo flag handling, with same file affected in this PR for command validation updates.

Poem

🐰 The Cleanup Hop

With scanner keen and cleaner spry,
I hop through cache with watchful eye!
Quarantine safe, restore with glee,
Schedule tasks—wild, wild and free!
Disk space reclaimed, oh what a sight, ✨
Your system springs back, shiny and bright! 🪺

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(cleanup): Smart Cleanup and Disk Space Optimizer (#125)' clearly summarizes the main change - implementing a comprehensive cleanup and disk space optimization feature with automatic scheduling support.
Description check ✅ Passed The PR description includes all required template sections: Related Issue (Closes #125), Summary describing what the PR does, and a checklist. It also provides comprehensive detail about components added, features implemented, and example usage.
Linked Issues check ✅ Passed All primary coding requirements from issue #125 are met: CleanupScanner identifies cleanup opportunities across four categories; DiskCleaner executes cleanup with safe mode support; CleanupManager provides quarantine/undo capability; CleanupScheduler enables automatic scheduling; comprehensive unit tests with 88% coverage; CLEANUP_GUIDE.md documentation provided.
Out of Scope Changes check ✅ Passed All changes directly support the cleanup feature: new cleanup modules (scanner, cleaner, manager, scheduler), CLI integration, unit tests, and documentation are all aligned with issue #125 requirements. Minor update to cortex/utils/commands.py for sudo flag handling is a supporting change that enables safer command validation.
Docstring Coverage ✅ Passed Docstring coverage is 93.23% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Nitpick comments (18)
docs/CLEANUP_GUIDE.md (1)

127-150: Add a short “platform assumptions” note (apt + system directories) to reduce surprise on non-Debian systems.
Right now the guide reads distro-agnostic, but it hardcodes apt paths and /var/log behaviors.

Also applies to: 185-190

cortex/cleanup/scanner.py (2)

28-45: Make CleanupScanner configurable + add missing type hints on public APIs.
This improves testability and portability (non-apt systems, custom temp dirs).

-class CleanupScanner:
+class CleanupScanner:
@@
-    def __init__(self):
-        self.apt_cache_dir = Path("/var/cache/apt/archives")
-        self.log_dir = Path("/var/log")
-        self.temp_dirs = [Path("/tmp"), Path.home() / ".cache"]
+    def __init__(
+        self,
+        apt_cache_dir: Path | None = None,
+        log_dir: Path | None = None,
+        temp_dirs: list[Path] | None = None,
+    ) -> None:
+        self.apt_cache_dir = apt_cache_dir or Path("/var/cache/apt/archives")
+        self.log_dir = log_dir or Path("/var/log")
+        self.temp_dirs = temp_dirs or [Path("/tmp"), Path.home() / ".cache"]

161-175: Avoid following symlinks / unbounded traversal in rglob scans.
On real systems this can explode in runtime and unintentionally scan outside intended dirs via symlinks.

Also applies to: 200-210

cortex/cleanup/scheduler.py (2)

100-116: Make save_config() atomic to prevent corrupt JSON on partial writes.
Write to a temp file in the same dir and replace() it.


284-292: Write unit files with explicit encoding (and consider perms).
At least set encoding="utf-8"; optionally chmod(0o600) since it’s user-controlled executable metadata.

cortex/cleanup/__init__.py (1)

11-21: Consider re-exporting scheduler types for a complete cleanup public API.
If cortex.cleanup.scheduler.CleanupScheduler is meant to be public, add it (and ScheduleInterval / ScheduleConfig) to __all__.

tests/test_cleanup_manager.py (3)

57-68: Align metadata round-trip test with the real quarantine schema.
Right now it validates JSON IO, but not that metadata matches what list_items() / QuarantineItem(**v) expects.


145-163: Avoid time.sleep() in tests; patch time for deterministic ordering.
This will reduce flakes under load/CI jitter.


215-216: Drop the __main__ pytest runner in test modules.
Repo test execution should be driven by pytest, not ad-hoc module execution.

tests/test_cleanup_scanner.py (1)

235-236: Drop the __main__ runner in pytest suites (optional).
Not harmful, but it’s uncommon in repo test modules and can confuse execution patterns.

cortex/cli.py (2)

625-672: Scheduling enable hardcodes safe_mode=True (no CLI control).
If “unsafe/aggressive scheduled cleanup” is a non-goal, OK; otherwise expose a flag (e.g., --unsafe) and plumb it through.


676-682: _format_size(): clamp/guard negatives.
If a future computation yields a negative delta, this will print negative sizes; consider size_bytes = max(size_bytes, 0).

tests/test_cleanup_scheduler.py (2)

87-96: Avoid bypassing __init__ in tests; prefer an injectable config path.
Patching CleanupScheduler.__init__ + using __new__ is brittle (future init logic changes won’t be exercised). Consider updating CleanupScheduler.__init__(config_dir: Optional[Path]=None) and use that in tests.


174-210: Systemd/cron behavior needs at least one “assert command called with …” test.
Right now it mostly asserts method output; adding a minimal assertion on subprocess.run call args would prevent regressions in the systemd detection / cron install commands.

cortex/cleanup/cleaner.py (1)

52-78: remove_orphaned_packages(packages) ignores packages input.
If intentional (apt decides what’s orphaned), drop the parameter to avoid implying the list is honored; otherwise include the packages explicitly (with correct apt semantics).

tests/test_cleanup_cleaner.py (2)

129-152: Add a non-dry-run temp-file test for quarantine success + freed bytes.
Once clean_temp_files() is fixed to count bytes on successful quarantine, add a test that patches CleanupManager.quarantine_file to return an id and asserts returned bytes > 0.


190-217: Consider testing run_cleanup(..., safe=False) once behavior is implemented.
This will prevent the safe knob from regressing again (currently unused).

cortex/cleanup/manager.py (1)

10-27: Consider avoiding id as a field name and tightening the dataclass.
Using id can be a little confusing vs Python’s built-in id(); item_id is clearer. Also consider slots=True (and possibly frozen=True) for a metadata model.

-@dataclass
+@dataclass(slots=True)
 class QuarantineItem:
@@
-    id: str
+    item_id: str
     original_path: str
     quarantine_path: str
     timestamp: float
     size_bytes: int
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8a042 and d298e01.

📒 Files selected for processing (11)
  • cortex/cleanup/__init__.py (1 hunks)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/manager.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
  • cortex/cleanup/scheduler.py (1 hunks)
  • cortex/cli.py (3 hunks)
  • docs/CLEANUP_GUIDE.md (1 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_manager.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
  • tests/test_cleanup_scheduler.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/__init__.py
  • tests/test_cleanup_cleaner.py
  • tests/test_cleanup_scheduler.py
  • cortex/cleanup/cleaner.py
  • cortex/cleanup/manager.py
  • tests/test_cleanup_manager.py
  • tests/test_cleanup_scanner.py
  • cortex/cli.py
  • cortex/cleanup/scanner.py
  • cortex/cleanup/scheduler.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • cortex/cleanup/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cleanup_cleaner.py
  • tests/test_cleanup_scheduler.py
  • tests/test_cleanup_manager.py
  • tests/test_cleanup_scanner.py
🧬 Code graph analysis (6)
cortex/cleanup/__init__.py (3)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (24-216)
  • ScanResult (9-22)
cortex/cleanup/cleaner.py (1)
  • DiskCleaner (12-227)
cortex/cleanup/manager.py (2)
  • CleanupManager (28-176)
  • QuarantineItem (11-26)
tests/test_cleanup_cleaner.py (2)
cortex/cleanup/cleaner.py (6)
  • clean_package_cache (28-50)
  • remove_orphaned_packages (52-77)
  • _parse_freed_space (79-107)
  • clean_temp_files (109-151)
  • compress_logs (153-194)
  • run_cleanup (196-227)
cortex/cleanup/scanner.py (1)
  • ScanResult (9-22)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (2)
  • ScanResult (9-22)
  • scan_package_cache (47-71)
cortex/cleanup/manager.py (1)
  • quarantine_file (57-101)
tests/test_cleanup_manager.py (1)
cortex/cleanup/manager.py (8)
  • QuarantineItem (11-26)
  • _ensure_dir (37-40)
  • _load_metadata (42-50)
  • _save_metadata (52-55)
  • quarantine_file (57-101)
  • restore_item (103-135)
  • list_items (137-148)
  • cleanup_old_items (150-176)
cortex/cli.py (4)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (24-216)
  • scan_all (33-45)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (12-227)
  • run_cleanup (196-227)
cortex/cleanup/manager.py (3)
  • CleanupManager (28-176)
  • list_items (137-148)
  • restore_item (103-135)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (61-448)
  • ScheduleInterval (18-22)
  • get_status (188-204)
  • disable_schedule (166-186)
  • enable_schedule (118-164)
cortex/cleanup/scanner.py (1)
src/sandbox_executor.py (1)
  • success (52-54)
🪛 markdownlint-cli2 (0.18.1)
docs/CLEANUP_GUIDE.md

34-34: Dollar signs used before commands without showing output

(MD014, commands-show-output)


38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


221-221: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (1)
cortex/cleanup/scanner.py (1)

80-90: Remove the critical bug assertion; the result.success check is correct as written.

The CommandResult class defines success as a bool attribute (not a method), and run_command() returns CommandResult with success=result.returncode == 0. All existing callers throughout the codebase use if result.success: as a boolean check. The defensive getattr/callable pattern is unnecessary.

However, update the inline comment: it states "We use strict=False" but the code passes validate=True to run_command(). Either correct the comment or clarify the intent.

@hyaku0121 hyaku0121 changed the title feat(cleanup): Smart Cleanup and Disk Space Optimizer feat(cleanup): Smart Cleanup and Disk Space Optimizer (#125) Dec 14, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (9)
cortex/cleanup/scheduler.py (1)

118-164: LGTM: enable_schedule now correctly passes safe_mode to backend methods.

The previous review concern about safe_mode being ignored has been addressed. Both _setup_systemd_timer and _setup_cron now receive the safe_mode parameter.

docs/CLEANUP_GUIDE.md (1)

1-50: LGTM: Documentation structure and quick start section are well-organized.

The markdown formatting issues from previous reviews have been addressed - bash blocks no longer have $ prompts, and output blocks use the text language identifier.

cortex/cli.py (2)

523-536: LGTM: Missing subcommand handling has been addressed.

The early check for args.cleanup_action with a user-friendly error message addresses the previous review concern.


562-596: The --safe flag is still ineffective and misleading.

At line 566, is_safe = not args.force effectively ignores args.safe. Combined with the CLI definition at line 793 where --safe has action='store_true', default=True, the --safe flag can never be false and serves no purpose.

Consider either removing --safe entirely (since it's always true unless --force is used), or use a mutually exclusive group:

-    run_parser.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)')
-    run_parser.add_argument('--force', '--all', action='store_true', help='Clean all found items')
+    mode_group = run_parser.add_mutually_exclusive_group()
+    mode_group.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)')
+    mode_group.add_argument('--force', '--all', action='store_true', help='Clean all found items')
cortex/cleanup/cleaner.py (3)

28-50: Previous review concern addressed: sudo -n flag added.

The command now uses sudo -n to prevent hanging on password prompts in non-interactive contexts.


52-78: Previous review concern addressed: sudo -n flag added for autoremove.

Consistent with clean_package_cache, this method now uses sudo -n to prevent interactive prompts.


110-144: Previous review concern addressed: clean_temp_files now correctly computes freed bytes.

The method now stats the file before quarantine and accumulates freed_bytes when quarantine_file() succeeds.

cortex/cleanup/manager.py (2)

69-113: Critical: Address unresolved safety issues from previous review.

Several major issues from the previous review remain unaddressed:

  1. No file type validation (lines 79-81): The method can quarantine directories or symlinks, which may cause issues during restore or cleanup operations.

  2. TOCTOU vulnerability (lines 92-93): The os.access check creates a time-of-check to time-of-use race condition.

  3. No rollback on metadata save failure (lines 105-107): If _save_metadata fails after the file has been moved (line 95), the file is orphaned in quarantine with no undo record.

  4. UUID collision risk (line 83): Using only 8 characters from the UUID significantly increases collision probability compared to the full hex representation.

Apply this diff to address these issues:

     def quarantine_file(self, filepath_str: str) -> Optional[str]:
         """
         Move a file to quarantine and return its ID.
         
         Args:
             filepath_str (str): Path to the file to quarantine.
             
         Returns:
             Optional[str]: ID of the quarantined item, or None if failed.
         """
         filepath = Path(filepath_str)
         if not filepath.exists():
             return None
+        # Only quarantine regular files, not directories or symlinks
+        if not filepath.is_file():
+            return None
+        if filepath.is_symlink():
+            return None
 
-        item_id = str(uuid.uuid4())[:8]
+        item_id = uuid.uuid4().hex  # Use full UUID to avoid collisions
         filename = filepath.name
         quarantine_path = self.quarantine_dir / f"{item_id}_{filename}"
         
         try:
             # Get file stats before moving
             size = filepath.stat().st_size
             
-            # Check if we have write access to the file
-            if not os.access(filepath, os.W_OK):
-                return None
-
             shutil.move(str(filepath), str(quarantine_path))
             
             item = QuarantineItem(
                 id=item_id,
                 original_path=str(filepath),
                 quarantine_path=str(quarantine_path),
                 timestamp=time.time(),
                 size_bytes=size
             )
             
             metadata = self._load_metadata()
             metadata[item_id] = asdict(item)
-            self._save_metadata(metadata)
+            try:
+                self._save_metadata(metadata)
+            except OSError:
+                # Rollback to avoid orphaned file
+                try:
+                    shutil.move(str(quarantine_path), str(filepath))
+                except Exception:
+                    pass
+                return None
             
             return item_id
             
         except Exception:
-            # Log error?
             return None

115-147: Critical: Path validation and overwrite behavior remain unaddressed.

Two critical issues from the previous review are still present:

  1. No path containment validation (lines 129-131): The method trusts quarantine_path from metadata without verifying it's actually within the quarantine directory. If metadata is tampered with, this could restore files from arbitrary locations or overwrite arbitrary paths.

  2. Undefined overwrite behavior (line 141): If original_path already exists, the behavior is OS-dependent and may silently overwrite existing files, causing data loss.

Apply this diff to validate paths and prevent overwrites:

     def restore_item(self, item_id: str) -> bool:
         """
         Restore a file from quarantine.
         
         Args:
             item_id (str): ID of the item to restore.
             
         Returns:
             bool: True if restored successfully, False otherwise.
         """
         metadata = self._load_metadata()
         if item_id not in metadata:
             return False
             
         item_data = metadata[item_id]
         original_path = Path(item_data['original_path'])
         quarantine_path = Path(item_data['quarantine_path'])
+        
+        # Validate quarantine_path is within quarantine_dir
+        try:
+            qdir = self.quarantine_dir.resolve()
+            qpath = quarantine_path.resolve()
+            if not qpath.is_relative_to(qdir):
+                return False
+        except Exception:
+            return False
+        
+        # Refuse to overwrite existing files
+        if original_path.exists():
+            return False
         
         if not quarantine_path.exists():
             return False
             
         try:
             # Ensure parent dir exists
-            if not original_path.parent.exists():
-                original_path.parent.mkdir(parents=True)
+            original_path.parent.mkdir(parents=True, exist_ok=True)
                 
             shutil.move(str(quarantine_path), str(original_path))
             
             del metadata[item_id]
             self._save_metadata(metadata)
             return True
         except Exception:
             return False
🧹 Nitpick comments (5)
docs/CLEANUP_GUIDE.md (1)

86-99: Remove shell prompts from the example output block.

The example at lines 89 and 96 still contains $ prompts which can trigger MD014 lint warnings. Consider removing them for consistency with the rest of the documentation.

 **Example:**
 
 ```text
-$ cortex cleanup undo
+cortex cleanup undo
 ID       File              Size       Date
 abc123   temp_file.txt     1.2 MB     2024-01-15 10:30
 def456   old_log.log       500 KB     2024-01-15 10:30
 
 Run 'cortex cleanup undo <id>' to restore.
 
-$ cortex cleanup undo abc123
+cortex cleanup undo abc123
 ✓ Restored item abc123

</blockquote></details>
<details>
<summary>tests/test_cleanup_cleaner.py (1)</summary><blockquote>

`190-221`: **Consider adding assertions for all categories in `test_run_cleanup_empty`.**

The test only asserts `Package Cache` and `Orphaned Packages`. For completeness, consider asserting all four categories.


```diff
     def test_run_cleanup_empty(self, cleaner):
         """Test run_cleanup with empty results."""
         summary = cleaner.run_cleanup([])
         
         assert summary["Package Cache"] == 0
         assert summary["Orphaned Packages"] == 0
+        assert summary["Temporary Files"] == 0
+        assert summary["Old Logs"] == 0
cortex/cleanup/cleaner.py (1)

189-222: safe parameter now partially used but could be more comprehensive.

The safe parameter now controls whether orphaned packages are removed (good!). However, consider whether other categories should also respect this flag. For instance, in --force mode, temp files could be permanently deleted rather than quarantined.

The current behavior is reasonable - orphaned package removal is the riskiest operation. If the intent is to keep temp files always quarantined for undo capability, this is a valid design choice. Consider adding a comment clarifying the design decision.

cortex/cleanup/manager.py (2)

149-160: Make list_items resilient to malformed metadata entries.

The method will fail completely if any metadata entry is missing required keys or has been modified. This was flagged in the previous review but remains unaddressed.

Apply this diff to skip malformed entries gracefully:

     def list_items(self) -> List[QuarantineItem]:
         """
         List all items in quarantine.
         
         Returns:
             List[QuarantineItem]: List of quarantined items sorted by date.
         """
         metadata = self._load_metadata()
-        items = []
+        items: List[QuarantineItem] = []
         for k, v in metadata.items():
-            items.append(QuarantineItem(**v))
+            try:
+                items.append(QuarantineItem(**v))
+            except (TypeError, ValueError):
+                # Skip malformed entries
+                continue
         return sorted(items, key=lambda x: x.timestamp, reverse=True)

185-192: Defensively access 'quarantine_path' to avoid KeyError.

If a metadata entry is missing the quarantine_path key, line 186 will raise a KeyError and halt the cleanup process partway through.

Apply this diff to handle missing keys gracefully:

         for item_id in to_remove:
-            path = Path(metadata[item_id]['quarantine_path'])
+            qpath = metadata[item_id].get('quarantine_path')
+            if qpath is None:
+                del metadata[item_id]
+                continue
+            path = Path(qpath)
             if path.exists():
                 try:
                     path.unlink()
                 except OSError:
                     pass
             del metadata[item_id]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d298e01 and da14bd9.

📒 Files selected for processing (7)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/manager.py (1 hunks)
  • cortex/cleanup/scheduler.py (1 hunks)
  • cortex/cli.py (3 hunks)
  • docs/CLEANUP_GUIDE.md (1 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cleanup_scanner.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_cleanup_cleaner.py
  • cortex/cleanup/manager.py
  • cortex/cleanup/scheduler.py
  • cortex/cleanup/cleaner.py
  • cortex/cli.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cleanup_cleaner.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation

Applied to files:

  • cortex/cleanup/cleaner.py
🧬 Code graph analysis (3)
tests/test_cleanup_cleaner.py (2)
cortex/cleanup/cleaner.py (7)
  • DiskCleaner (12-222)
  • clean_package_cache (28-50)
  • remove_orphaned_packages (52-78)
  • _parse_freed_space (80-108)
  • clean_temp_files (110-144)
  • compress_logs (146-187)
  • run_cleanup (189-222)
cortex/cleanup/scanner.py (1)
  • ScanResult (9-22)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (2)
  • ScanResult (9-22)
  • scan_package_cache (47-71)
cortex/cleanup/manager.py (2)
  • CleanupManager (28-195)
  • quarantine_file (69-113)
cortex/cli.py (4)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (24-216)
  • scan_all (33-45)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (12-222)
  • run_cleanup (189-222)
cortex/cleanup/manager.py (3)
  • CleanupManager (28-195)
  • list_items (149-160)
  • restore_item (115-147)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (61-454)
  • ScheduleInterval (18-22)
  • get_status (188-204)
  • disable_schedule (166-186)
  • enable_schedule (118-164)
🔇 Additional comments (19)
cortex/cleanup/scheduler.py (5)

1-16: LGTM: Module setup and imports are well-organized.

Type hints, logging setup, and imports follow best practices.


18-59: LGTM: ScheduleInterval enum and ScheduleConfig dataclass are well-designed.

The serialization methods (to_dict/from_dict) handle the enum conversion properly, and defaults are sensible.


61-116: LGTM: CleanupScheduler initialization and config persistence.

Good defensive error handling in load_config with fallback to defaults on parse errors. The config directory creation with mode=0o700 ensures privacy.


240-316: LGTM: systemd timer setup correctly uses safe_mode.

The implementation properly generates the mode flag based on safe_mode and includes good error handling with timeouts.


357-410: LGTM: cron setup correctly uses safe_mode.

The cron entry management properly handles existing entries and generates the correct command with the mode flag.

docs/CLEANUP_GUIDE.md (2)

129-181: LGTM: Cleanup categories and safety features are well-documented.

Clear explanations of what each category targets, the quarantine system, and safety modes.


183-223: LGTM: Troubleshooting and configuration sections are helpful.

Good coverage of common issues and their solutions. The final link is properly formatted as a markdown link.

cortex/cli.py (2)

680-685: LGTM: _format_size helper is correctly implemented.

The size formatting logic handles the full range of units appropriately.


784-807: LGTM: Cleanup CLI argument parsing is well-structured.

The subcommand structure follows the existing pattern established by the notify command.

tests/test_cleanup_cleaner.py (5)

1-16: LGTM: Test file setup and imports are correct.

Proper imports for pytest, mocking utilities, and the modules under test.


17-38: LGTM: Test class and fixtures are well-defined.

The fixtures provide clean instances for both normal and dry-run modes.


40-78: LGTM: Package cache tests cover success, failure, and dry-run scenarios.

Good coverage of the three main code paths.


79-128: LGTM: Orphaned packages and parsing tests are thorough.

Tests cover empty list, success path, dry-run mode, and parsing with different units (MB, KB, no match).


129-189: LGTM: Temp file and log compression tests are comprehensive.

Good use of tmp_path fixture for creating real temporary files. Tests verify both the returned values and filesystem state.

cortex/cleanup/cleaner.py (3)

1-11: LGTM: Imports and module setup are correct.

Type hints are present and logging is properly configured.


12-27: LGTM: DiskCleaner initialization with proper type hints and docstring.

The class docstring and __init__ method follow PEP 8 guidelines.


146-187: LGTM: Log compression with proper error handling.

The implementation correctly calculates freed space as the difference between original and compressed sizes, with appropriate dry-run estimation.

cortex/cleanup/manager.py (2)

32-46: LGTM! Permission hardening properly implemented.

The initialization and directory setup correctly address the previous critical issue by enforcing secure permissions even when the quarantine directory pre-exists.


48-68: LGTM! Metadata operations are robust and atomic.

The implementation properly addresses the previous major issue with atomic writes, comprehensive exception handling, and restricted file permissions.

cortex/cli.py Outdated
Comment on lines 658 to 665
if args.enable:
interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
result = scheduler.enable_schedule(interval=interval, safe_mode=True)
if result['success']:
self._print_success(result['message'])
else:
self._print_error(result.get('message', 'Failed to enable schedule'))
return 0 if result['success'] else 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Schedule enable hardcodes safe_mode=True, ignoring user preference.

The CLI has no option to disable safe mode for scheduled cleanup. If this is intentional for safety, consider documenting it. Otherwise, add a --no-safe or --force option to the schedule subcommand.

             if args.enable:
                 interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
-                result = scheduler.enable_schedule(interval=interval, safe_mode=True)
+                # Use safe mode by default for scheduled cleanup
+                safe = not getattr(args, 'force', False)
+                result = scheduler.enable_schedule(interval=interval, safe_mode=safe)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if args.enable:
interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
result = scheduler.enable_schedule(interval=interval, safe_mode=True)
if result['success']:
self._print_success(result['message'])
else:
self._print_error(result.get('message', 'Failed to enable schedule'))
return 0 if result['success'] else 1
if args.enable:
interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
# Use safe mode by default for scheduled cleanup
safe = not getattr(args, 'force', False)
result = scheduler.enable_schedule(interval=interval, safe_mode=safe)
if result['success']:
self._print_success(result['message'])
else:
self._print_error(result.get('message', 'Failed to enable schedule'))
return 0 if result['success'] else 1
🤖 Prompt for AI Agents
In cortex/cli.py around lines 658 to 665 the call to scheduler.enable_schedule
currently hardcodes safe_mode=True which ignores user preference; add a CLI flag
(e.g. --no-safe or --force) to the schedule subcommand, parse it into args
(default preserving current safe_mode=True unless user passes the flag), convert
the flag into a boolean safe_mode value and pass that value to
scheduler.enable_schedule, and update the command help text and any docs to
reflect the new option.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
cortex/cleanup/manager.py (3)

115-148: Path traversal vulnerability and undefined overwrite behavior in restore_item.

The method trusts metadata paths without validation, creating a critical security risk if metadata is tampered (weak permissions or corruption). Additionally, restoring when original_path exists has undefined behavior.

Apply this diff to validate quarantine path containment and prevent overwrites:

         item_data = metadata[item_id]
         original_path = Path(item_data['original_path'])
         quarantine_path = Path(item_data['quarantine_path'])
         
+        # Validate quarantine_path is actually in quarantine_dir
+        try:
+            qdir = self.quarantine_dir.resolve()
+            qpath = quarantine_path.resolve()
+            if not qpath.is_relative_to(qdir):
+                return False
+        except Exception:
+            return False
+        
         if not quarantine_path.exists():
             return False
+        
+        # Refuse to overwrite existing files
+        if original_path.exists():
+            return False
             
         try:
             # Ensure parent dir exists
-            if not original_path.parent.exists():
-                original_path.parent.mkdir(parents=True)
+            original_path.parent.mkdir(parents=True, exist_ok=True)

69-114: Critical security and correctness issues remain unaddressed in quarantine_file.

Despite past feedback, several major problems persist:

  1. Line 83: 8-character UUID truncation creates collision risk. Use full UUID hex.
  2. Lines 80-81: Missing file type validation—can accept directories or symlinks, breaking quarantine assumptions.
  3. Line 92: os.access() is a TOCTOU vulnerability; permissions can change between check and move.
  4. Line 107: If _save_metadata() fails, the file is moved but undo metadata is lost (orphaned in quarantine).

Apply this diff to address all issues:

-        item_id = str(uuid.uuid4())[:8]
+        item_id = uuid.uuid4().hex
         filename = filepath.name
         quarantine_path = self.quarantine_dir / f"{item_id}_{filename}"
         
         try:
+            # Reject non-files and symlinks
+            if not filepath.is_file() or filepath.is_symlink():
+                return None
+            
             # Get file stats before moving
             size = filepath.stat().st_size
             
-            # Check if we have write access to the file
-            if not os.access(filepath, os.W_OK):
-                return None
-
             shutil.move(str(filepath), str(quarantine_path))
             
             item = QuarantineItem(
@@ -104,7 +104,15 @@
             
             metadata = self._load_metadata()
             metadata[item_id] = asdict(item)
-            self._save_metadata(metadata)
+            try:
+                self._save_metadata(metadata)
+            except OSError:
+                # Rollback move to avoid orphaned quarantine file
+                try:
+                    shutil.move(str(quarantine_path), str(filepath))
+                except Exception:
+                    pass
+                return None
             
             return item_id

149-161: Handle malformed metadata entries defensively in list_items.

QuarantineItem(**v) will raise TypeError if metadata entries have missing or renamed keys, breaking the entire listing. This is especially problematic since you're using id as the field name, which could be changed to item_id for consistency with the generated UUID.

Apply this diff to skip malformed entries:

         metadata = self._load_metadata()
-        items = []
+        items: List[QuarantineItem] = []
         for k, v in metadata.items():
-            items.append(QuarantineItem(**v))
+            try:
+                items.append(QuarantineItem(**v))
+            except TypeError:
+                logger.warning(f"Skipping malformed metadata entry: {k}")
+                continue
         return sorted(items, key=lambda x: x.timestamp, reverse=True)
🧹 Nitpick comments (4)
tests/test_cleanup_cleaner.py (2)

161-177: Remove unused variable or utilize it in assertion.

Line 167 marks original_size as noqa: F841 claiming it's "used for documentation," but it's not referenced anywhere. Either remove it or add an assertion to verify compression effectiveness.

Apply this diff:

         log_file.write_bytes(log_content)
-        original_size = log_file.stat().st_size  # noqa: F841 - used for documentation
         
         freed = cleaner.compress_logs([str(log_file)])
         
         # Original should be gone
         assert not log_file.exists()
         # Compressed should exist
         gz_file = tmp_path / "test.log.gz"
         assert gz_file.exists()
         # Should have freed some space
         assert freed > 0

190-218: Add test coverage for safe mode behavior on temporary files.

Based on past review comments (lines 206-229 in cleaner.py), the safe parameter should affect temp file handling (quarantine vs. permanent delete), but current tests don't verify this distinction. The test at line 204 uses safe=False only to enable orphaned package removal.

Add a test case to verify safe mode affects temp file cleanup behavior:

def test_run_cleanup_safe_mode_temp_files(self, cleaner):
    """Test that safe mode uses quarantine for temp files."""
    scan_results = [
        ScanResult("Temporary Files", 500, 2, ["/tmp/f1", "/tmp/f2"]),
    ]
    
    with patch.object(cleaner, 'clean_temp_files', return_value=500) as mock_clean:
        # safe=True should still call clean_temp_files (which uses quarantine)
        summary = cleaner.run_cleanup(scan_results, safe=True)
        
        mock_clean.assert_called_once()
        assert summary["Temporary Files"] == 500
cortex/cleanup/cleaner.py (1)

206-251: Clarify safe mode semantics in documentation.

The safe parameter (Line 212) currently only affects orphaned packages (Line 246), but temp files (Line 248) always use quarantine regardless of safe mode. This creates confusion about what "safe mode" actually controls.

Update the docstring to clarify safe mode behavior:

     def run_cleanup(self, scan_results: List[ScanResult], safe: bool = True) -> Dict[str, int]:
         """
         Run cleanup based on scan results.
         
         Args:
             scan_results (List[ScanResult]): Results from scanner.
-            safe (bool): If True, perform safe cleanup (default).
+            safe (bool): If True, skip orphaned package removal (default).
+                Note: Temp files always use quarantine for undo capability.
             
         Returns:
             Dict[str, int]: Summary of bytes freed per category.
         """
cortex/cleanup/scheduler.py (1)

291-299: Set restrictive permissions on systemd service and timer files.

The systemd service file (Line 297) contains the cleanup command with flags. While these files are in the user directory, they should still have restrictive permissions (0o600) to prevent other users from reading the configuration.

Apply this diff:

             service_path.write_text(service_content)
             timer_path.write_text(timer_content)
+            
+            try:
+                service_path.chmod(0o600)
+                timer_path.chmod(0o600)
+            except OSError:
+                pass
             
             # Reload and enable timer
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c61f3f3 and 25933b0.

📒 Files selected for processing (9)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/manager.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
  • cortex/cleanup/scheduler.py (1 hunks)
  • cortex/cli.py (3 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_manager.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
  • tests/test_cleanup_scheduler.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/test_cleanup_scanner.py
  • tests/test_cleanup_scheduler.py
  • tests/test_cleanup_manager.py
  • cortex/cli.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/manager.py
  • cortex/cleanup/scheduler.py
  • cortex/cleanup/scanner.py
  • tests/test_cleanup_cleaner.py
  • cortex/cleanup/cleaner.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cleanup_cleaner.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation

Applied to files:

  • cortex/cleanup/cleaner.py
🧬 Code graph analysis (2)
tests/test_cleanup_cleaner.py (2)
cortex/cleanup/cleaner.py (7)
  • DiskCleaner (26-251)
  • clean_package_cache (42-64)
  • remove_orphaned_packages (66-92)
  • _parse_freed_space (94-107)
  • clean_temp_files (127-161)
  • compress_logs (163-204)
  • run_cleanup (206-229)
cortex/cleanup/scanner.py (1)
  • ScanResult (16-29)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (3)
  • CleanupScanner (31-233)
  • ScanResult (16-29)
  • scan_package_cache (54-78)
cortex/cleanup/manager.py (2)
  • CleanupManager (28-195)
  • quarantine_file (69-113)

Comment on lines 163 to 205
def scan_temp_files(self, days_old: int = 7) -> ScanResult:
"""
Scan for temporary files older than X days.
Args:
days_old (int): Minimum age of files in days to include.
Returns:
ScanResult: Result containing size and count of old temp files.
"""
total_size = 0
files = []
now = time.time()
cutoff = now - (days_old * 86400)

for temp_dir in self.temp_dirs:
if not temp_dir.exists():
continue

for filepath in temp_dir.rglob("*"):
if filepath.is_file():
try:
stat = filepath.stat()
# Check if file is older than cutoff
if stat.st_mtime < cutoff:
total_size += stat.st_size
files.append(str(filepath))
except OSError:
pass

return ScanResult(
category="Temporary Files",
size_bytes=total_size,
count=len(files),
items=files
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate days_old parameter to prevent unintended behavior.

Negative days_old values will set cutoff in the future (Line 176), causing all files to be considered "old" and included in the scan. This could lead to unintended data loss if cleanup is run with invalid parameters.

Apply this diff to validate the parameter:

     def scan_temp_files(self, days_old: int = 7) -> ScanResult:
         """
         Scan for temporary files older than X days.
         
         Args:
             days_old (int): Minimum age of files in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old temp files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def scan_temp_files(self, days_old: int = 7) -> ScanResult:
"""
Scan for temporary files older than X days.
Args:
days_old (int): Minimum age of files in days to include.
Returns:
ScanResult: Result containing size and count of old temp files.
"""
total_size = 0
files = []
now = time.time()
cutoff = now - (days_old * 86400)
for temp_dir in self.temp_dirs:
if not temp_dir.exists():
continue
for filepath in temp_dir.rglob("*"):
if filepath.is_file():
try:
stat = filepath.stat()
# Check if file is older than cutoff
if stat.st_mtime < cutoff:
total_size += stat.st_size
files.append(str(filepath))
except OSError:
pass
return ScanResult(
category="Temporary Files",
size_bytes=total_size,
count=len(files),
items=files
)
def scan_temp_files(self, days_old: int = 7) -> ScanResult:
"""
Scan for temporary files older than X days.
Args:
days_old (int): Minimum age of files in days to include.
Returns:
ScanResult: Result containing size and count of old temp files.
"""
if days_old < 0:
raise ValueError("days_old must be >= 0")
total_size = 0
files = []
now = time.time()
cutoff = now - (days_old * 86400)
for temp_dir in self.temp_dirs:
if not temp_dir.exists():
continue
for filepath in temp_dir.rglob("*"):
if filepath.is_file():
try:
stat = filepath.stat()
# Check if file is older than cutoff
if stat.st_mtime < cutoff:
total_size += stat.st_size
files.append(str(filepath))
except OSError:
pass
return ScanResult(
category="Temporary Files",
size_bytes=total_size,
count=len(files),
items=files
)
🤖 Prompt for AI Agents
In cortex/cleanup/scanner.py around lines 163 to 198, validate the days_old
parameter at the start of scan_temp_files to prevent negative values from
producing a future cutoff; add a guard that raises a ValueError (or TypeError if
non-int) when days_old is negative (and optionally coerce/convert non-int
numeric types), so the function only proceeds with days_old >= 0 and computes
cutoff = now - (days_old * 86400) safely.

Comment on lines 200 to 233
def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
"""
Scan for large, old log files.
Args:
min_size_mb (int): Minimum size in MB to include.
days_old (int): Minimum age in days to include.
Returns:
ScanResult: Result containing size and count of old log files.
"""
total_size = 0
files = []
now = time.time()
cutoff = now - (days_old * 86400)
min_size = min_size_mb * 1024 * 1024

if self.log_dir.exists():
for filepath in self.log_dir.rglob("*.log"):
if filepath.is_file():
try:
stat = filepath.stat()
if stat.st_size > min_size and stat.st_mtime < cutoff:
total_size += stat.st_size
files.append(str(filepath))
except OSError:
pass

return ScanResult(
category="Old Logs",
size_bytes=total_size,
count=len(files),
items=files
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate parameters to prevent unintended scan behavior.

Similar to scan_temp_files, negative days_old or min_size_mb values can cause unexpected behavior:

  • Negative days_old sets cutoff in the future, matching all files.
  • Negative min_size_mb sets min_size to a negative value, matching all files.

Apply this diff to validate parameters:

     def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
         """
         Scan for large, old log files.
         
         Args:
             min_size_mb (int): Minimum size in MB to include.
             days_old (int): Minimum age in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old log files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        if min_size_mb < 0:
+            raise ValueError("min_size_mb must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
🤖 Prompt for AI Agents
In cortex/cleanup/scanner.py around lines 200 to 233, the scan_logs method does
not validate its input parameters so negative days_old or min_size_mb produce
unintended matches; add parameter validation at the start of the method (e.g.,
if min_size_mb < 0 or days_old < 0) and either raise a ValueError with a clear
message or coerce negatives to safe defaults (0) consistent with scan_temp_files
behavior, then proceed using the validated values so cutoff and min_size are
never negative and the scan only matches intended files.

Comment on lines 103 to 121
def save_config(self, config: ScheduleConfig) -> bool:
"""
Save schedule configuration to file.
Args:
config: Configuration to save.
Returns:
bool: True if saved successfully.
"""
try:
with open(self.config_file, 'w', encoding='utf-8') as f:
json.dump(config.to_dict(), f, indent=2)
return True
except OSError as e:
logger.error(f"Failed to save schedule config: {e}")
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Set restrictive permissions on configuration file.

Similar to manager.py (Lines 64-67), the schedule configuration file should have restrictive permissions (0o600) since it contains potentially sensitive settings. Currently, permissions depend on umask.

Apply this diff:

     def save_config(self, config: ScheduleConfig) -> bool:
         """
         Save schedule configuration to file.
         
         Args:
             config: Configuration to save.
             
         Returns:
             bool: True if saved successfully.
         """
         try:
             with open(self.config_file, 'w', encoding='utf-8') as f:
                 json.dump(config.to_dict(), f, indent=2)
+            try:
+                self.config_file.chmod(0o600)
+            except OSError:
+                pass
             return True
         except OSError as e:
             logger.error(f"Failed to save schedule config: {e}")
             return False
🤖 Prompt for AI Agents
In cortex/cleanup/scheduler.py around lines 103 to 119, the config file is
written without forcing file permissions, leaving permissions dependent on the
process umask; after successfully writing the JSON file, set restrictive
permissions explicitly by calling os.chmod(self.config_file, 0o600) (and import
os if missing), and handle any OSError from chmod similarly to the existing
exception handling so the function still returns False on failure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
cortex/cleanup/scanner.py (2)

166-201: Validate days_old parameter to prevent unintended behavior.

Negative days_old values will set cutoff in the future (line 179), causing all files to be considered "old" and included in the scan, potentially leading to unintended data loss during cleanup.

Apply this diff to validate the parameter:

 def scan_temp_files(self, days_old: int = 7) -> ScanResult:
     """
     Scan for temporary files older than X days.
     
     Args:
         days_old (int): Minimum age of files in days to include.
         
     Returns:
         ScanResult: Result containing size and count of old temp files.
     """
+    if days_old < 0:
+        raise ValueError("days_old must be >= 0")
+    
     total_size = 0
     files = []
     now = time.time()

As per past review comments, this issue remains unresolved.


203-236: Validate parameters to prevent unintended scan behavior.

Similar to scan_temp_files, negative parameter values can cause unexpected behavior:

  • Negative days_old sets cutoff in the future (line 217), matching all files.
  • Negative min_size_mb sets min_size to a negative value (line 218), matching all files.

Apply this diff to validate parameters:

 def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
     """
     Scan for large, old log files.
     
     Args:
         min_size_mb (int): Minimum size in MB to include.
         days_old (int): Minimum age in days to include.
         
     Returns:
         ScanResult: Result containing size and count of old log files.
     """
+    if days_old < 0:
+        raise ValueError("days_old must be >= 0")
+    if min_size_mb < 0:
+        raise ValueError("min_size_mb must be >= 0")
+    
     total_size = 0
     files = []
     now = time.time()

As per past review comments, this issue remains unresolved.

🧹 Nitpick comments (1)
cortex/cleanup/cleaner.py (1)

207-252: Consider clarifying the safe parameter's scope or extending its behavior.

The safe parameter currently only affects orphaned packages (line 246-247: packages are removed only in non-safe mode). For temporary files (line 249), clean_temp_files always uses quarantine regardless of the safe flag, which means temp files always get safe treatment.

This is functionally correct but may be inconsistent with the API's intent. Consider:

  1. Option A (extend behavior): Add a non-safe path for temp files that permanently deletes without quarantine when safe=False.
  2. Option B (document intent): Update the docstring to clarify that safe only controls orphaned package removal, and temp files are always quarantined.

Based on past review comments, this concern was previously raised but appears unresolved.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25933b0 and a2624aa.

📒 Files selected for processing (2)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/scanner.py
  • cortex/cleanup/cleaner.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation

Applied to files:

  • cortex/cleanup/cleaner.py
🧬 Code graph analysis (1)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (2)
  • ScanResult (16-29)
  • scan_package_cache (56-80)
cortex/cleanup/manager.py (1)
  • CleanupManager (28-195)

@mikejmorgan-ai
Copy link
Member

CI Failing - Action Required

Hey @hyaku0121! Thanks for this contribution. Before we can merge, please fix the failing checks:

  • CodeQL Analysis: failing
  • SonarCloud: failing

Please push fixes and let us know when ready for re-review!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
cortex/cleanup/scanner.py (2)

169-204: Parameter validation not implemented (per past review comments).

Past review comments specifically requested validation for days_old parameter to prevent negative values. Line 182 computes cutoff = now - (days_old * 86400). If days_old is negative, cutoff will be in the future, causing all files to match and potentially be deleted.

Apply this diff to add validation as recommended in past reviews:

     def scan_temp_files(self, days_old: int = 7) -> ScanResult:
         """
         Scan for temporary files older than X days.
         
         Args:
             days_old (int): Minimum age of files in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old temp files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
         cutoff = now - (days_old * 86400)

206-239: Parameter validation not implemented (per past review comments).

Past review comments specifically requested validation for both days_old and min_size_mb parameters. Lines 220-221 compute values without validation:

  • Negative days_old makes cutoff future, matching all files
  • Negative min_size_mb makes min_size negative, matching all files (since sizes are always > negative numbers)

Apply this diff to add validation as recommended in past reviews:

     def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
         """
         Scan for large, old log files.
         
         Args:
             min_size_mb (int): Minimum size in MB to include.
             days_old (int): Minimum age in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old log files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        if min_size_mb < 0:
+            raise ValueError("min_size_mb must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
         cutoff = now - (days_old * 86400)
         min_size = min_size_mb * 1024 * 1024
🧹 Nitpick comments (5)
tests/test_cleanup_manager.py (4)

20-33: Incomplete test assertions.

The test creates a QuarantineItem with 5 fields but only asserts 3 of them (lines 31-33). Consider adding assertions for quarantine_path and timestamp to fully validate the dataclass behavior.

Apply this diff to add complete assertions:

         assert item.id == "abc123"
         assert item.original_path == temp_path
+        assert item.quarantine_path == "/home/user/.cortex/trash/abc123_test.txt"
+        assert item.timestamp == 1234567890.0
         assert item.size_bytes == 1024

39-47: Fragile test fixture bypassing actual initialization.

The fixture bypasses CleanupManager.__init__ entirely (line 42) and manually sets attributes. This approach is fragile because:

  • It won't catch bugs in the actual __init__ method
  • Changes to __init__ won't be reflected in tests
  • It doesn't test the default quarantine directory setup

Consider using the actual constructor with a custom quarantine directory:

     @pytest.fixture
     def manager(self, tmp_path):
         """Create a manager instance with temp quarantine directory."""
-        with patch.object(CleanupManager, '__init__', lambda self: None):
-            mgr = CleanupManager.__new__(CleanupManager)
-            mgr.quarantine_dir = tmp_path / "trash"
-            mgr.metadata_file = mgr.quarantine_dir / "metadata.json"
-            mgr._ensure_dir()
-            return mgr
+        # If CleanupManager accepts a custom quarantine_dir parameter:
+        return CleanupManager(quarantine_dir=tmp_path / "trash")
+        # Or patch the default location:
+        # with patch('cortex.cleanup.manager.Path.home', return_value=tmp_path):
+        #     return CleanupManager()

147-164: Brittle timing test using sleep.

Line 156 uses time.sleep(0.1) to ensure different timestamps between quarantined items. This approach:

  • Makes tests slower
  • Can be flaky if system is under load
  • Doesn't clearly express the test intent

Consider mocking time.time() to have explicit control over timestamps:

     def test_list_items_sorted(self, manager, tmp_path):
         """Test listing items sorted by timestamp."""
+        from unittest.mock import patch
+        
         # Create and quarantine multiple files
         file1 = tmp_path / "file1.txt"
         file2 = tmp_path / "file2.txt"
         file1.write_text("1")
         file2.write_text("2")
         
-        id1 = manager.quarantine_file(str(file1))
-        time.sleep(0.1)
-        id2 = manager.quarantine_file(str(file2))
+        with patch('time.time', return_value=1000.0):
+            id1 = manager.quarantine_file(str(file1))
+        with patch('time.time', return_value=2000.0):
+            id2 = manager.quarantine_file(str(file2))
         
         items = manager.list_items()
         
         assert len(items) == 2
         # Most recent should be first
         assert items[0].id == id2
         assert items[1].id == id1

166-202: Missing test for parameter validation.

The cleanup_old_items method raises ValueError for negative days (per cortex/cleanup/manager.py line 172), but there's no test validating this behavior. This is important for ensuring proper error handling.

Add a test for parameter validation:

def test_cleanup_old_items_negative_days(self, manager):
    """Test cleanup_old_items with negative days parameter."""
    with pytest.raises(ValueError, match="days must be >= 0"):
        manager.cleanup_old_items(days=-1)
tests/test_cleanup_cleaner.py (1)

212-217: Incomplete test assertions for empty cleanup.

The test test_run_cleanup_empty only asserts two categories (Package Cache and Orphaned Packages) but the run_cleanup method should return all four categories in the summary dict. The test should verify all expected keys are present with zero values.

Apply this diff to add complete assertions:

     def test_run_cleanup_empty(self, cleaner):
         """Test run_cleanup with empty results."""
         summary = cleaner.run_cleanup([])
         
         assert summary["Package Cache"] == 0
         assert summary["Orphaned Packages"] == 0
+        assert summary["Temporary Files"] == 0
+        assert summary["Old Logs"] == 0
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2624aa and cf32829.

📒 Files selected for processing (5)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_manager.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cortex/cleanup/cleaner.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_cleanup_manager.py
  • tests/test_cleanup_scanner.py
  • tests/test_cleanup_cleaner.py
  • cortex/cleanup/scanner.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cleanup_manager.py
  • tests/test_cleanup_scanner.py
  • tests/test_cleanup_cleaner.py
🧬 Code graph analysis (2)
tests/test_cleanup_manager.py (1)
cortex/cleanup/manager.py (8)
  • QuarantineItem (11-26)
  • _ensure_dir (38-46)
  • _load_metadata (48-56)
  • _save_metadata (58-67)
  • quarantine_file (69-113)
  • restore_item (115-147)
  • list_items (149-160)
  • cleanup_old_items (162-195)
tests/test_cleanup_cleaner.py (3)
cortex/cleanup/cleaner.py (7)
  • DiskCleaner (26-255)
  • clean_package_cache (42-64)
  • remove_orphaned_packages (66-92)
  • _parse_freed_space (94-107)
  • clean_temp_files (131-165)
  • compress_logs (167-208)
  • run_cleanup (210-233)
cortex/cleanup/scanner.py (1)
  • ScanResult (17-30)
src/sandbox_executor.py (1)
  • success (52-54)
🔇 Additional comments (6)
tests/test_cleanup_scanner.py (2)

189-204: Past review feedback properly addressed.

The test now creates a small 2MB file (line 195) and uses min_size_mb=1 (line 201) instead of the previous 150MB file, which resolves the CI time/memory concerns from the earlier review.


189-204: Add parameter validation to scan_logs method.

The scan_logs method is missing validation for days_old >= 0 and min_size_mb >= 0. Without this, negative values would cause logical errors in the cutoff time and file size calculations. This validation pattern already exists in cortex/cleanup/manager.py. Add validation checks to both parameters and include tests verifying that ValueError is raised with appropriate messages when negative values are passed.

cortex/cleanup/scanner.py (3)

146-167: LGTM: Good ReDoS prevention.

Lines 159-161 implement a pre-check to avoid running the regex on lines without size units, preventing potential ReDoS attacks. Line 162's regex uses a bounded quantifier {0,20}, which is safe and efficient.


56-80: Missing return type hint.

The method scan_package_cache lacks a return type hint. According to coding guidelines, type hints are required in Python code.

This applies to all methods in the file. Apply type hints consistently:

-    def scan_package_cache(self) -> ScanResult:
+    # Already has return type hint - this is correct

Note: After checking, line 56 already has -> ScanResult. Verify all other methods have proper type hints.

Likely an incorrect or invalid review comment.


82-105: No permission issue with apt-get autoremove --simulate.

The --simulate flag performs a simulation without changing the system and disables locking, allowing simulations to be executed by non-root users. The command should succeed without sudo in most environments. If you need actual package removal, then sudo would be required, but for scanning orphaned packages, the simulate mode is appropriate and functional without elevated permissions.

tests/test_cleanup_cleaner.py (1)

40-53: [Your rewritten review comment text here]
[Exactly ONE classification tag]

Comment on lines 163 to 172
def test_scan_temp_files_with_old_files(self, scanner, tmp_path):
"""Test scan_temp_files with old files."""
scanner.temp_dirs = [tmp_path]

old_file = tmp_path / "old_file.txt"
old_file.write_bytes(b"x" * 500)

# Modify mtime to be old
old_time = time.time() - (10 * 86400) # 10 days ago
import os
os.utime(old_file, (old_time, old_time))

result = scanner.scan_temp_files(days_old=7)

assert result.count == 1
assert result.size_bytes == 500
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find scanner.py file
find . -name "scanner.py" -type f | head -20

Repository: cortexlinux/cortex

Length of output: 89


🏁 Script executed:

#!/bin/bash
# Read scanner.py to find scan_temp_files method
cat -n ./cortex/cleanup/scanner.py | grep -A 30 "def scan_temp_files"

Repository: cortexlinux/cortex

Length of output: 1385


🏁 Script executed:

#!/bin/bash
# Get full scan_temp_files method
cat -n ./cortex/cleanup/scanner.py | sed -n '169,210p'

Repository: cortexlinux/cortex

Length of output: 1769


🏁 Script executed:

#!/bin/bash
# Search for validation tests or negative parameter tests
grep -n "negative\|validation\|ValueError\|-1\|days_old" ./tests/test_cleanup_scanner.py | head -20

Repository: cortexlinux/cortex

Length of output: 243


Parameter validation missing in source code.

The scan_temp_files method in scanner.py does not validate that days_old >= 0. Negative values would create a future cutoff date, causing all files to match instead of old ones. Add validation to scanner.py first:

def scan_temp_files(self, days_old: int = 7) -> ScanResult:
    if days_old < 0:
        raise ValueError("days_old must be >= 0")
    # ... rest of implementation

Then add a test to verify this validation:

def test_scan_temp_files_negative_days(self, scanner):
    """Test scan_temp_files with negative days_old parameter."""
    with pytest.raises(ValueError, match="days_old must be >= 0"):
        scanner.scan_temp_files(days_old=-1)
🤖 Prompt for AI Agents
In tests/test_cleanup_scanner.py around lines 163 to 178: the review points out
missing parameter validation for scan_temp_files(days_old) in scanner.py; update
scanner.py's scan_temp_files to raise ValueError when days_old < 0 (e.g.,
validate at the start: if days_old < 0: raise ValueError("days_old must be >=
0")), and add a unit test in tests/test_cleanup_scanner.py that asserts a
ValueError is raised for scanner.scan_temp_files(days_old=-1) with the message
"days_old must be >= 0".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
cortex/cleanup/scanner.py (2)

175-210: Validate days_old parameter to prevent unintended behavior.

This issue was raised in a previous review and has not been addressed. Negative days_old values will set cutoff in the future (Line 188), causing all files to be considered "old" and included in the scan, potentially leading to unintended data loss if cleanup is run with invalid parameters.

Apply this diff to validate the parameter:

     def scan_temp_files(self, days_old: int = 7) -> ScanResult:
         """
         Scan for temporary files older than X days.
         
         Args:
             days_old (int): Minimum age of files in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old temp files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()

212-245: Validate parameters to prevent unintended scan behavior.

This issue was raised in a previous review and has not been addressed. Similar to scan_temp_files:

  • Negative days_old sets cutoff in the future, matching all files.
  • Negative min_size_mb sets min_size to a negative value, matching all files.

Apply this diff to validate parameters:

     def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
         """
         Scan for large, old log files.
         
         Args:
             min_size_mb (int): Minimum size in MB to include.
             days_old (int): Minimum age in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old log files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        if min_size_mb < 0:
+            raise ValueError("min_size_mb must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
🧹 Nitpick comments (4)
cortex/cleanup/scanner.py (4)

2-2: Remove unused import.

The re module is imported but never used in the code. Line 159 comments about avoiding regex, but no regex operations are present.

Apply this diff:

-import re

36-40: Add type hints for instance attributes.

The instance attributes lack type hints, which are required per coding guidelines. While type checkers can infer these, explicit hints improve code clarity and maintainability.

Apply this diff:

     def __init__(self):
-        self.apt_cache_dir = Path("/var/cache/apt/archives")
-        self.log_dir = Path("/var/log")
+        self.apt_cache_dir: Path = Path("/var/cache/apt/archives")
+        self.log_dir: Path = Path("/var/log")
         # Use tempfile.gettempdir() for platform-independent temp directory
-        self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]
+        self.temp_dirs: List[Path] = [Path(tempfile.gettempdir()), Path.home() / ".cache"]

90-92: Remove or correct misleading comment.

The comment mentions strict=False but the actual code uses validate=True, and there's no strict parameter in the run_command call. This inconsistency may confuse future maintainers.

Apply this diff to remove the misleading comment:

         # Note: This requires apt-get to be installed
         cmd = "apt-get autoremove --simulate"
-        # We use strict=False because apt-get might output to stderr which run_command captures
         result = run_command(cmd, validate=True)

230-230: Consider matching rotated and compressed log files.

The current pattern *.log only matches active log files. Consider including rotated logs (e.g., *.log.*, *.log.1, *.log.gz) to provide a more comprehensive scan.

Apply this diff to expand the pattern:

         if self.log_dir.exists():
-            for filepath in self.log_dir.rglob("*.log"):
+            for filepath in self.log_dir.rglob("*.log*"):
                 if filepath.is_file():
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf32829 and dd9c944.

📒 Files selected for processing (2)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cortex/cleanup/cleaner.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/scanner.py
🔇 Additional comments (5)
cortex/cleanup/scanner.py (5)

16-30: LGTM!

The dataclass is well-structured with proper type hints, docstring, and correct use of default_factory to avoid mutable default arguments.


42-54: LGTM!

The method correctly orchestrates all scan operations and returns a consolidated list of results.


56-80: LGTM!

The method properly scans the apt cache directory with appropriate error handling for file access issues.


121-144: LGTM!

The parsing logic correctly identifies and extracts package names from the apt-get autoremove output format.


146-173: LGTM!

The size extraction uses string operations instead of regex to avoid ReDoS vulnerabilities, with proper error handling and safe fallback to 0.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
cortex/cli.py (3)

579-613: The --safe flag issue flagged in past reviews remains unfixed.

Line 583 derives safety from args.force instead of using args.safe. Combined with the flag definition at line 830 (store_true with default=True), the --safe flag cannot be disabled and is effectively ignored.

This matches the past review comment that suggested making --safe and --force mutually exclusive and fixing the logic to use args.safe directly.


646-693: Schedule enable still hardcodes safe_mode=True (duplicate of past review).

Line 677 hardcodes safe_mode=True when enabling scheduled cleanup, ignoring any user preference. This was previously flagged in the past review comments.

The past review suggested adding a --no-safe or --force flag to the schedule subcommand to give users control over safe mode behavior in scheduled cleanups.


821-843: CLI argument definition for --safe flag is broken (duplicate of past review).

Line 830 defines --safe with action='store_true' and default=True, which means:

  • The flag is always True (can't be disabled)
  • There's no way for users to turn off safe mode
  • The flag is subsequently ignored in favor of not args.force (line 583)

This was flagged in past reviews with a suggested fix to make --safe and --force mutually exclusive and properly respect the user's choice.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd9c944 and b89e73c.

📒 Files selected for processing (2)
  • cortex/cli.py (3 hunks)
  • cortex/utils/commands.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/utils/commands.py
  • cortex/cli.py
🧬 Code graph analysis (1)
cortex/cli.py (5)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (32-245)
  • scan_all (42-54)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (26-264)
  • run_cleanup (219-242)
cortex/cleanup/manager.py (3)
  • CleanupManager (28-195)
  • list_items (149-160)
  • restore_item (115-147)
cortex/first_run_wizard.py (1)
  • _print_error (746-748)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (64-457)
  • ScheduleInterval (21-25)
  • get_status (191-207)
  • disable_schedule (169-189)
  • enable_schedule (121-167)
🔇 Additional comments (4)
cortex/cli.py (4)

540-553: Good: Missing subcommand now handled correctly.

The check for missing cleanup_action properly addresses the past review feedback. Users will now receive clear guidance when running cortex cleanup without a subcommand.


554-577: LGTM: Scan implementation is clear and functional.

The scan subcommand correctly orchestrates the scanner, formats results in a table, and provides a clear summary of reclaimable space.


615-644: LGTM: Undo subcommand logic is correct.

The implementation properly handles both listing quarantined items and restoring specific items. The table output provides clear information for users.


697-702: LGTM: Size formatting helper is correct.

The _format_size method correctly converts bytes to human-readable format using binary units (1024-based). The implementation is clear and follows common patterns.

Comment on lines 190 to 197
# Skip sudo and any flags (starting with -)
actual_command_index = 1
while actual_command_index < len(parts) and parts[actual_command_index].startswith('-'):
actual_command_index += 1
if actual_command_index < len(parts):
first_word = parts[actual_command_index]
else:
return False, "No command found after sudo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find and examine the code in cortex/utils/commands.py
find . -name "commands.py" -type f | head -20

Repository: cortexlinux/cortex

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Read the validate_command function and surrounding context
sed -n '130,210p' ./cortex/utils/commands.py

Repository: cortexlinux/cortex

Length of output: 3030


🏁 Script executed:

#!/bin/bash
# Check ALLOWED_COMMAND_PREFIXES and surrounding context
grep -n "ALLOWED_COMMAND_PREFIXES" ./cortex/utils/commands.py -A 10 -B 2

Repository: cortexlinux/cortex

Length of output: 722


🏁 Script executed:

#!/bin/bash
# Extract lines 190-197 specifically
sed -n '190,197p' ./cortex/utils/commands.py

Repository: cortexlinux/cortex

Length of output: 482


🏁 Script executed:

#!/bin/bash
# Run the test script from the review comment to verify the bug
cd . && python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from cortex.utils.commands import validate_command

test_cases = [
    ("sudo apt update", True, "Basic sudo"),
    ("sudo -E apt update", True, "Sudo with -E flag"),
    ("sudo -u root apt update", True, "Sudo with -u flag (takes arg)"),
    ("sudo -u alice apt-get install vim", True, "Sudo with -u alice"),
    ("sudo -g wheel yum update", True, "Sudo with -g flag (takes arg)"),
    ("sudo -H -E apt update", True, "Multiple flags"),
]

print("Testing sudo command validation:")
print("=" * 70)
for cmd, expected_valid, description in test_cases:
    try:
        is_valid, error = validate_command(cmd, strict=True)
        status = "✓ PASS" if is_valid == expected_valid else "✗ FAIL"
        print(f"{status}: {description}")
        print(f"  Command: {cmd}")
        print(f"  Result: valid={is_valid}, error={error}")
        if is_valid != expected_valid:
            print(f"  EXPECTED: valid={expected_valid}")
    except Exception as e:
        print(f"✗ ERROR: {description}")
        print(f"  Command: {cmd}")
        print(f"  Exception: {e}")
    print()
EOF

Repository: cortexlinux/cortex

Length of output: 148


🏁 Script executed:

#!/bin/bash
# Simpler direct test of the parsing logic
python3 << 'EOF'
# Test the parsing logic directly without importing the module
def test_parsing_logic():
    test_cases = [
        ("sudo apt update", ['sudo', 'apt', 'update']),
        ("sudo -E apt update", ['sudo', '-E', 'apt', 'update']),
        ("sudo -u root apt update", ['sudo', '-u', 'root', 'apt', 'update']),
        ("sudo -u alice apt-get install vim", ['sudo', '-u', 'alice', 'apt-get', 'install', 'vim']),
        ("sudo -g wheel yum update", ['sudo', '-g', 'wheel', 'yum', 'update']),
    ]
    
    print("Testing sudo flag parsing logic:")
    print("=" * 70)
    
    for cmd, parts in test_cases:
        # Replicate the exact logic from lines 190-197
        actual_command_index = 1
        while actual_command_index < len(parts) and parts[actual_command_index].startswith('-'):
            actual_command_index += 1
        
        if actual_command_index < len(parts):
            first_word = parts[actual_command_index]
        else:
            first_word = "ERROR: No command found"
        
        print(f"Command: {cmd}")
        print(f"  Parts: {parts}")
        print(f"  Detected first_word: '{first_word}'")
        
        # Check if it would be in allowlist
        ALLOWED = ["apt", "apt-get", "apt-cache", "dpkg", "yum", "dnf"]
        is_allowed = first_word in ALLOWED
        print(f"  In allowlist: {is_allowed}")
        print()

test_parsing_logic()
EOF

Repository: cortexlinux/cortex

Length of output: 828


🏁 Script executed:

#!/bin/bash
# Verify docstring and type hints are present
head -n 210 ./cortex/utils/commands.py | grep -A 3 "def validate_command"

Repository: cortexlinux/cortex

Length of output: 191


Critical bug: sudo flag parsing fails with flags that take arguments.

The code at lines 190-197 skips tokens starting with - but doesn't account for flags with arguments (e.g., -u user, -g group). For sudo -u alice apt update, the code incorrectly treats alice as the command instead of apt, causing validation to fail.

This affects common sudo usage patterns. Fix by skipping flag arguments as well:

# Skip sudo and its flags (some take arguments)
sudo_flags_with_args = {'-u', '-g', '-C', '-R', '-D', '-P', '-U', '--user', '--group', '--close-from', '--role'}
actual_command_index = 1
while actual_command_index < len(parts):
    token = parts[actual_command_index]
    if not token.startswith('-'):
        break
    actual_command_index += 1
    # Skip the argument if this flag takes one
    if token in sudo_flags_with_args or token.split('=')[0] in sudo_flags_with_args:
        actual_command_index += 1

Alternatively, use shlex for more robust parsing or test against real sudo invocations.

🤖 Prompt for AI Agents
In cortex/utils/commands.py around lines 190-197, the sudo parsing loop only
skips tokens that start with '-' and therefore treats flag arguments (e.g. the
"alice" in "sudo -u alice apt update") as the command; update the loop to also
skip flag arguments for known sudo flags that take values (e.g. -u, -g, -C, -R,
-D, -P, -U and their long forms) by incrementing the index an extra step when
the current token is one of those flags (or is a flag with an "=" assignment),
so the loop advances past both the flag and its argument before selecting the
first real command token; alternatively replace this parsing with shlex-based
parsing and/or validate against actual sudo invocations for robustness.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
cortex/cleanup/scanner.py (2)

173-205: Validate days_old parameter to prevent unintended behavior.

Negative days_old values will set cutoff in the future (line 186), causing all files to be considered "old" and included in the scan. This could lead to unintended data loss if cleanup is run with invalid parameters.

Apply this diff to validate the parameter:

     def scan_temp_files(self, days_old: int = 7) -> ScanResult:
         """
         Scan for temporary files older than X days.
         
         Args:
             days_old (int): Minimum age of files in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old temp files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()

207-235: Validate parameters to prevent unintended scan behavior.

Similar to scan_temp_files, negative days_old or min_size_mb values can cause unexpected behavior:

  • Negative days_old sets cutoff in the future (line 221), matching all files.
  • Negative min_size_mb sets min_size to a negative value (line 222), matching all files.

Apply this diff to validate parameters:

     def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
         """
         Scan for large, old log files.
         
         Args:
             min_size_mb (int): Minimum size in MB to include.
             days_old (int): Minimum age in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old log files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        if min_size_mb < 0:
+            raise ValueError("min_size_mb must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
cortex/cleanup/manager.py (2)

119-151: Path validation and overwrite handling missing.

The past review identified two security/correctness issues that remain unaddressed:

  1. No path validation: Metadata could be tampered to contain paths outside the quarantine directory, allowing arbitrary file moves if metadata has weak permissions.
  2. Undefined overwrite behavior: If original_path already exists, shutil.move() behavior is OS-dependent and could silently overwrite or fail.

Consider applying the validation suggested in the past review comment:

     def restore_item(self, item_id: str) -> bool:
         ...
         original_path = Path(item_data["original_path"])
         quarantine_path = Path(item_data["quarantine_path"])
+
+        try:
+            qdir = self.quarantine_dir.resolve()
+            qpath = quarantine_path.resolve()
+            if not qpath.is_relative_to(qdir):
+                return False
+        except Exception:
+            return False
+
+        # Refuse to overwrite existing files by default
+        if original_path.exists():
+            return False

153-164: Add defensive handling for malformed metadata entries.

Line 163 directly constructs QuarantineItem(**v) without error handling. If metadata entries have missing or renamed keys (e.g., if id field is missing or if future schema changes occur), this will raise TypeError and break the entire list_items() method.

Apply this diff to handle malformed entries gracefully:

     def list_items(self) -> list[QuarantineItem]:
         """
         List all items in quarantine.
         
         Returns:
             List[QuarantineItem]: List of quarantined items sorted by date.
         """
         metadata = self._load_metadata()
-        items = []
-        for k, v in metadata.items():
-            items.append(QuarantineItem(**v))
+        items: list[QuarantineItem] = []
+        for _k, v in metadata.items():
+            try:
+                items.append(QuarantineItem(**v))
+            except TypeError:
+                continue
         return sorted(items, key=lambda x: x.timestamp, reverse=True)
cortex/cleanup/scheduler.py (1)

105-121: Set restrictive permissions on configuration file.

Similar to manager.py (lines 68-71), the schedule configuration file should have restrictive permissions (0o600) since it contains potentially sensitive settings. Currently, permissions depend on umask.

Apply this diff:

     def save_config(self, config: ScheduleConfig) -> bool:
         """
         Save schedule configuration to file.
         
         Args:
             config: Configuration to save.
             
         Returns:
             bool: True if saved successfully.
         """
         try:
             with open(self.config_file, 'w', encoding='utf-8') as f:
                 json.dump(config.to_dict(), f, indent=2)
+            try:
+                self.config_file.chmod(0o600)
+            except OSError:
+                pass
             return True
         except OSError as e:
             logger.error(f"Failed to save schedule config: {e}")
             return False
🧹 Nitpick comments (2)
cortex/cleanup/cleaner.py (1)

1-6: Remove unused import.

The re module is imported on line 3 but is not used anywhere in this file.

Apply this diff:

 import gzip
 import logging
-import re
 import shutil
 from pathlib import Path
 from typing import Optional
cortex/cleanup/scanner.py (1)

1-6: Remove unused import.

The re module is imported on line 1 but is not used anywhere in this file. The code uses string operations instead of regex (as noted in comments at line 156).

Apply this diff:

-import re
 import tempfile
 import time
 from dataclasses import dataclass, field
 from pathlib import Path
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b89e73c and feb5cca.

📒 Files selected for processing (9)
  • cortex/cleanup/__init__.py (1 hunks)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/manager.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
  • cortex/cleanup/scheduler.py (1 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_manager.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
  • tests/test_cleanup_scheduler.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/test_cleanup_cleaner.py
  • tests/test_cleanup_scheduler.py
  • tests/test_cleanup_manager.py
  • tests/test_cleanup_scanner.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/scanner.py
  • cortex/cleanup/manager.py
  • cortex/cleanup/__init__.py
  • cortex/cleanup/cleaner.py
  • cortex/cleanup/scheduler.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • cortex/cleanup/__init__.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation

Applied to files:

  • cortex/cleanup/cleaner.py
🧬 Code graph analysis (3)
cortex/cleanup/__init__.py (3)
cortex/cleanup/cleaner.py (1)
  • DiskCleaner (28-267)
cortex/cleanup/manager.py (2)
  • CleanupManager (31-199)
  • QuarantineItem (12-28)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (35-235)
  • ScanResult (18-32)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/manager.py (1)
  • quarantine_file (73-117)
cortex/cleanup/scanner.py (3)
  • CleanupScanner (35-235)
  • ScanResult (18-32)
  • scan_package_cache (60-81)
cortex/cleanup/scheduler.py (1)
cortex/logging_system.py (2)
  • warning (202-204)
  • error (206-208)
🔇 Additional comments (21)
cortex/cleanup/cleaner.py (6)

45-67: LGTM!

The method correctly uses sudo -n to prevent password prompts in non-interactive environments, estimates size before cleaning, and handles errors appropriately. Type hints and docstring are present.


69-95: LGTM!

The method correctly uses sudo -n to prevent password prompts, handles dry-run mode appropriately, and parses freed space from command output. Type hints and docstring are present.


97-141: LGTM!

The parsing helpers correctly extract size information from command output using string operations (avoiding ReDoS concerns). Error handling is appropriate with try-except for ValueError. Type hints and docstrings are present.


143-177: LGTM!

The method correctly stats files before quarantining them, tracks freed bytes in both dry-run and real modes, and handles missing files gracefully. Type hints and docstring are present.


179-220: LGTM!

The method correctly compresses log files using gzip, verifies the compressed output, and calculates freed space. The broad exception handling is acceptable for batch operations where individual failures shouldn't stop the entire process. Type hints and docstring are present.


222-267: LGTM! The safe parameter is correctly implemented.

The safe parameter is properly honored in _process_category (line 262): orphaned packages are only removed when safe is False. Other categories use inherently safe operations (quarantine for temp files, compression for logs, standard cache cleaning for packages). The implementation aligns with the acceptance criteria.

cortex/cleanup/scanner.py (3)

17-32: LGTM!

The ScanResult dataclass is properly defined with appropriate type hints and a docstring. Using field(default_factory=list) for the mutable default is the correct pattern.


60-81: LGTM!

The method correctly scans the package cache, handles missing files gracefully, and returns a properly structured ScanResult. Type hints and docstring are present.


83-171: LGTM!

The orphaned package scanning logic correctly parses apt-get output using string operations (avoiding ReDoS concerns). Error handling is appropriate, and all methods have proper type hints and docstrings.

cortex/cleanup/manager.py (4)

11-28: LGTM!

The QuarantineItem dataclass is properly defined with appropriate type hints and a comprehensive docstring.


36-50: LGTM!

The initialization and directory setup correctly enforce secure permissions (0o700) even for pre-existing directories. Type hints and docstrings are present as required by coding guidelines.


52-71: LGTM!

Metadata handling is robust with atomic writes (temp file + os.replace) and restrictive permissions (0o600). Error handling covers both JSON and OS errors. Type hints and docstrings are present.


166-199: LGTM!

The cleanup_old_items method correctly validates parameters, handles malformed metadata gracefully, and only updates metadata if items were actually removed. Type hints and docstring are present.

cortex/cleanup/__init__.py (1)

1-21: LGTM!

The package initialization correctly documents the module's purpose and exposes the public API. All imports are properly listed in __all__, ensuring a clean package interface.

cortex/cleanup/scheduler.py (7)

21-26: LGTM!

The ScheduleInterval enum is cleanly defined with appropriate values for scheduling.


29-63: LGTM!

The ScheduleConfig dataclass is properly defined with type hints, docstring, and serialization methods. The use of float | None for last_run is appropriate.


76-85: LGTM!

The initialization correctly sets up the config directory with secure permissions (0o700). Type hints are present as required.


123-167: LGTM!

The enable_schedule method correctly propagates the safe_mode parameter to both systemd and cron setup methods, addressing the past review concern. Error handling and fallback logic are appropriate.


169-207: LGTM!

Both methods are properly implemented with appropriate type hints and docstrings. The status reporting includes all relevant information for debugging.


209-241: LGTM!

The schedule generation methods correctly produce systemd OnCalendar and cron expressions for each interval. Type hints and docstrings are present.


243-455: LGTM!

All scheduling methods correctly implement the safe_mode parameter (lines 268, 277 for systemd; lines 375, 377 for cron), addressing the past review concern. Timeout handling and error recovery are appropriate. Type hints and docstrings are present throughout.

Comment on lines +34 to +43
def __init__(self, dry_run: bool = False):
"""
Initialize the DiskCleaner.
Args:
dry_run (bool): If True, simulate actions without modifying the filesystem.
"""
self.dry_run = dry_run
self.scanner = CleanupScanner()
self.manager = CleanupManager()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add missing return type hint.

The __init__ method is missing a return type hint (-> None). As per coding guidelines, type hints are required for Python code.

Apply this diff:

-    def __init__(self, dry_run: bool = False):
+    def __init__(self, dry_run: bool = False) -> None:
         """
         Initialize the DiskCleaner.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, dry_run: bool = False):
"""
Initialize the DiskCleaner.
Args:
dry_run (bool): If True, simulate actions without modifying the filesystem.
"""
self.dry_run = dry_run
self.scanner = CleanupScanner()
self.manager = CleanupManager()
def __init__(self, dry_run: bool = False) -> None:
"""
Initialize the DiskCleaner.
Args:
dry_run (bool): If True, simulate actions without modifying the filesystem.
"""
self.dry_run = dry_run
self.scanner = CleanupScanner()
self.manager = CleanupManager()
🤖 Prompt for AI Agents
In cortex/cleanup/cleaner.py around lines 34 to 43, the __init__ method is
missing the return type hint; update the method signature to include "-> None"
(i.e., def __init__(self, dry_run: bool = False) -> None:) and keep the existing
docstring and body unchanged so the constructor conforms to the project's typing
guidelines.

Comment on lines +73 to +117
def quarantine_file(self, filepath_str: str) -> str | None:
"""
Move a file to quarantine and return its ID.
Args:
filepath_str (str): Path to the file to quarantine.
Returns:
Optional[str]: ID of the quarantined item, or None if failed.
"""
filepath = Path(filepath_str)
if not filepath.exists():
return None

item_id = str(uuid.uuid4())[:8]
filename = filepath.name
quarantine_path = self.quarantine_dir / f"{item_id}_{filename}"

try:
# Get file stats before moving
size = filepath.stat().st_size

# Check if we have write access to the file
if not os.access(filepath, os.W_OK):
return None

shutil.move(str(filepath), str(quarantine_path))

item = QuarantineItem(
id=item_id,
original_path=str(filepath),
quarantine_path=str(quarantine_path),
timestamp=time.time(),
size_bytes=size,
)

metadata = self._load_metadata()
metadata[item_id] = asdict(item)
self._save_metadata(metadata)

return item_id

except Exception:
# Log error?
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Several defensive checks from past review remain unaddressed.

While some improvements were made, the following concerns from the past review are still present:

  1. Line 87: Using [:8] truncation of UUID increases collision risk. Past review suggested using full uuid.uuid4().hex.
  2. Lines 95-97: os.access() is a TOCTOU check that should be removed. The subsequent shutil.move() will fail if permissions are insufficient.
  3. Line 84-85: Missing checks for filepath.is_file() and filepath.is_symlink() means directories and symlinks could be quarantined.
  4. Lines 109-111: No rollback attempt if _save_metadata() fails after the file has been moved, which could orphan files in quarantine.

Consider applying the changes suggested in the original past review comment to address these issues.

🤖 Prompt for AI Agents
In cortex/cleanup/manager.py around lines 73 to 117, the quarantine_file
implementation needs four fixes: replace the truncated id generation ([:8]) with
uuid.uuid4().hex to avoid collision risk; before proceeding ensure
filepath.is_file() is True and filepath.is_symlink() is False to reject
directories and symlinks; remove the TOCTOU os.access() check and let
shutil.move raise on permission errors; and add rollback/cleanup if metadata
save fails — after moving the file attempt to persist metadata and if
_save_metadata raises, move the file back to its original location (or remove
the quarantine file if move-back fails), log the error, and return None so no
orphaned quarantined files remain. Ensure exceptions are logged with details.

Comment on lines +40 to +44
def __init__(self):
self.apt_cache_dir = Path("/var/cache/apt/archives")
self.log_dir = Path("/var/log")
# Use tempfile.gettempdir() for platform-independent temp directory
self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add missing return type hint.

The __init__ method is missing a return type hint (-> None). As per coding guidelines, type hints are required for Python code.

Apply this diff:

-    def __init__(self):
+    def __init__(self) -> None:
         self.apt_cache_dir = Path("/var/cache/apt/archives")
         self.log_dir = Path("/var/log")
         # Use tempfile.gettempdir() for platform-independent temp directory
         self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self):
self.apt_cache_dir = Path("/var/cache/apt/archives")
self.log_dir = Path("/var/log")
# Use tempfile.gettempdir() for platform-independent temp directory
self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]
def __init__(self) -> None:
self.apt_cache_dir = Path("/var/cache/apt/archives")
self.log_dir = Path("/var/log")
# Use tempfile.gettempdir() for platform-independent temp directory
self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]
🤖 Prompt for AI Agents
In cortex/cleanup/scanner.py around lines 40 to 44, the __init__ method is
missing the return type hint; update the method signature to include "-> None"
(i.e., def __init__(self) -> None:) to satisfy the project's type hinting
guidelines and static type checkers, leaving the body unchanged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
cortex/cli.py (3)

812-812: Schedule enable still hardcodes safe_mode=True.

This is the same issue flagged in the previous unresolved review comment. The scheduled cleanup always uses safe_mode=True, ignoring any user preference. If this is intentional for safety reasons, it should be documented. Otherwise, consider adding a flag (e.g., --no-safe or --force) to the schedule subcommand.

This duplicates the existing unresolved review comment about hardcoded safe_mode=True.


970-973: Fix --safe flag configuration (always True, can't be disabled).

The --safe flag is misconfigured with action='store_true' and default=True, making it always True whether the user specifies it or not. This makes the flag meaningless and prevents users from disabling safe mode.

The past review suggested using a mutually exclusive group, which would properly enforce that only one of --safe or --force can be specified at a time.

🔎 Recommended fix using mutually exclusive group
     # Run
     run_parser = cleanup_subs.add_parser("run", help="Execute cleanup")
-    run_parser.add_argument(
-        "--safe", action="store_true", default=True, help="Safe cleanup (default)"
-    )
-    run_parser.add_argument("--force", "--all", action="store_true", help="Clean all found items")
+    safety_group = run_parser.add_mutually_exclusive_group()
+    safety_group.add_argument(
+        "--safe", action="store_true", default=True, help="Safe cleanup with quarantine (default)"
+    )
+    safety_group.add_argument("--force", "--all", action="store_true", help="Aggressive cleanup (may be destructive)")
     run_parser.add_argument("-y", "--yes", action="store_true", help="Skip confirmation")

Then update line 711 to properly use the flag:

-            is_safe = not args.force
+            is_safe = args.safe and not args.force

711-711: The --safe flag remains ineffective (past review issue not fully resolved).

Line 711 still derives safety from args.force rather than respecting args.safe. Additionally, the --safe flag at line 971 is defined with default=True and store_true, making it always True and impossible to disable. Past review comments claimed this was addressed in commit 09a7a21, but the implementation still doesn't work correctly.

Current behavior:

  • --safe is always True (can't be toggled)
  • is_safe is determined by not args.force, ignoring args.safe
🔎 Recommended fix

At line 711:

-            # Check for force/safe flags
-            is_safe = not args.force
+            # Safe mode is the default; --force overrides it
+            is_safe = args.safe and not args.force

And update line 971 to make --safe properly toggleable or use a mutually exclusive group (see also the argparse configuration review comment).

🧹 Nitpick comments (1)
cortex/cli.py (1)

678-680: Simplify redundant hasattr check.

The hasattr(args, "cleanup_action") check is redundant because argparse sets cleanup_action to None when no subcommand is provided. You can simplify this to if args.cleanup_action is None:.

🔎 Proposed simplification
-        if not hasattr(args, "cleanup_action") or args.cleanup_action is None:
+        if args.cleanup_action is None:
             self._print_error("Please specify a subcommand (scan/run/undo/schedule)")
             return 1
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feb5cca and c463787.

📒 Files selected for processing (2)
  • cortex/cli.py (5 hunks)
  • cortex/utils/commands.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cortex/utils/commands.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cli.py
🧬 Code graph analysis (1)
cortex/cli.py (4)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (28-267)
  • run_cleanup (222-245)
cortex/cleanup/manager.py (3)
  • CleanupManager (31-199)
  • list_items (153-164)
  • restore_item (119-151)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (35-235)
  • scan_all (46-58)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (66-455)
  • ScheduleInterval (21-26)
  • get_status (191-207)
  • disable_schedule (169-189)
  • enable_schedule (123-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build Package
🔇 Additional comments (3)
cortex/cli.py (3)

376-376: LGTM: Improved step descriptions.

The change to use f-strings with 1-indexed step numbers improves clarity and consistency with the progress callback output.


505-505: LGTM: Consistent formatting.

The f-string formatting for additional package count is clear and correct.


832-837: LGTM: Clean size formatting helper.

The _format_size method is well-implemented with proper type hints and standard byte-to-human-readable conversion logic.

return 0

# --- Cleanup Command ---
def cleanup(self, args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add missing type hints to public method.

The cleanup method lacks type hints, violating the coding guidelines requirement that "Type hints required in Python code."

🔎 Proposed fix
-    def cleanup(self, args):
+    def cleanup(self, args: argparse.Namespace) -> int:
         """Handle cleanup commands"""

Based on coding guidelines, type hints are required for Python code.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def cleanup(self, args):
def cleanup(self, args: argparse.Namespace) -> int:
"""Handle cleanup commands"""
🤖 Prompt for AI Agents
In cortex/cli.py around line 667, the public method cleanup lacks type hints;
add a parameter and return type annotation (e.g., def cleanup(self, args:
argparse.Namespace) -> None:) or if the concrete type is unknown use args: Any
and add the corresponding import (from typing import Any). Update the signature
accordingly and import argparse or typing as needed so the method and its
parameter are fully type-annotated.

@mikejmorgan-ai
Copy link
Member

⚠️ This PR has merge conflicts. Please rebase on main to resolve conflicts before it can be reviewed and merged.

- Add CleanupScanner for identifying cleanup opportunities
- Add DiskCleaner for executing cleanup operations
- Add CleanupManager for quarantine/undo functionality
- Add CleanupScheduler for automatic scheduling (systemd/cron)
- Add schedule subcommand to cortex cleanup CLI
- Add comprehensive unit tests (88% coverage)
- Add CLEANUP_GUIDE.md documentation

Closes cortexlinux#125
- manager.py: Harden directory permissions, atomic writes, robust metadata
- cleaner.py: Add sudo -n flag, fix freed bytes calculation, use safe param
- scheduler.py: Pass safe_mode to systemd/cron setup
- cli.py: Add error message when no subcommand specified
- test_cleanup_scanner.py: Reduce test file size from 150MB to 2MB
- test_cleanup_cleaner.py: Fix test for safe mode behavior
- CLEANUP_GUIDE.md: Fix markdownlint issues
Skip sudo options like -n when validating the actual command.
This allows sudo -n apt-get clean to be properly validated.
- cleaner.py: Define category constants, simplify _parse_freed_space
- scanner.py: Extract _extract_packages/_extract_size, add constants
- scheduler.py: Define CRON_TAG constant, fix redundant exception
- manager.py: Remove redundant ValueError from exception handler
- cli.py: Fix unnecessary f-strings
- tests: Fix unused variables and float comparisons
- cleaner.py: Document regex is safe from ReDoS
- scanner.py: Document temp directory scanning is intentional
- Use bounded regex quantifier {0,20} instead of * to prevent ReDoS
- Use tempfile.gettempdir() instead of hardcoded /tmp
- Add guard clause before regex to check for KB/MB/GB units
- Use tempfile.gettempdir() in test files instead of hardcoded /tmp
- Prevents regex execution on lines without valid units
- Remove re.search() usage in size extraction functions
- Use str.find() and split() for parsing apt-get output
- Fully eliminates ReDoS vulnerability warnings from SonarQube
- Remove deprecated typing.List/Dict/Tuple imports
- Fix import sorting and whitespace issues
- Apply ruff format to cleanup modules and test files
- Fix import sorting (I001)
- Remove whitespace from blank lines (W293)
@hyaku0121 hyaku0121 force-pushed the feature/disk-cleanup-optimization branch from c463787 to 467c5c5 Compare December 20, 2025 04:42
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/intent/clarifier.py (1)

14-37: Missing docstring for public method.

The needs_clarification method is a public API but lacks a docstring. As per coding guidelines, docstrings are required for all public APIs.

🔎 Suggested docstring
     def needs_clarification(self, intents: list[Intent], text: str) -> str | None:
+        """
+        Determine if the detected intents require clarification from the user.
+        
+        Args:
+            intents: List of detected Intent objects from the user's input.
+            text: The original user input text.
+            
+        Returns:
+            A clarifying question string if clarification is needed, None otherwise.
+        """
         text = text.lower()

Based on coding guidelines requiring docstrings for all public APIs.

♻️ Duplicate comments (9)
cortex/cli.py (2)

667-667: Add missing type hints to public method.

The cleanup method lacks type hints, violating the coding guidelines requirement that "Type hints required in Python code."

🔎 Proposed fix
-    def cleanup(self, args):
+    def cleanup(self, args: argparse.Namespace) -> int:
         """Handle cleanup commands"""

Based on coding guidelines, type hints are required for Python code.


808-812: Schedule enable hardcodes safe_mode=True, ignoring user preference.

The CLI has no option to disable safe mode for scheduled cleanup. If this is intentional for safety, consider documenting it. Otherwise, add a --force option to the schedule subcommand to allow users to opt out of safe mode.

🔎 Proposed fix
             if args.enable:
                 interval = (
                     ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
                 )
-                result = scheduler.enable_schedule(interval=interval, safe_mode=True)
+                # Use safe mode by default for scheduled cleanup unless --force is specified
+                safe = not getattr(args, 'force', False)
+                result = scheduler.enable_schedule(interval=interval, safe_mode=safe)

And add to the schedule parser around line 998:

     schedule_parser.add_argument("--enable", action="store_true", help="Enable scheduled cleanup")
     schedule_parser.add_argument("--disable", action="store_true", help="Disable scheduled cleanup")
+    schedule_parser.add_argument("--force", action="store_true", help="Use aggressive cleanup (not safe mode)")
     schedule_parser.add_argument(
         "--interval", choices=["daily", "weekly", "monthly"], help="Cleanup interval"
     )
cortex/cleanup/manager.py (2)

73-117: Address remaining defensive checks from previous review.

Multiple defensive improvements from the past review remain unaddressed:

  1. Line 87: Using [:8] truncation of UUID increases collision risk. Should use full uuid.uuid4().hex.
  2. Lines 95-97: os.access() is a TOCTOU check that should be removed. The subsequent shutil.move() will fail if permissions are insufficient.
  3. Lines 84-85: Missing checks for filepath.is_file() and filepath.is_symlink() means directories and symlinks could be quarantined.
  4. Lines 109-111: No rollback attempt if _save_metadata() fails after the file has been moved, which could orphan files in quarantine.
🔎 Proposed fix
     def quarantine_file(self, filepath_str: str) -> str | None:
         """
         Move a file to quarantine and return its ID.
         
         Args:
             filepath_str (str): Path to the file to quarantine.
             
         Returns:
             Optional[str]: ID of the quarantined item, or None if failed.
         """
         filepath = Path(filepath_str)
         if not filepath.exists():
             return None
+        if not filepath.is_file():
+            return None
+        if filepath.is_symlink():
+            return None
         
-        item_id = str(uuid.uuid4())[:8]
+        item_id = uuid.uuid4().hex
         filename = filepath.name
         quarantine_path = self.quarantine_dir / f"{item_id}_{filename}"
         
         try:
             # Get file stats before moving
             size = filepath.stat().st_size
             
-            # Check if we have write access to the file
-            if not os.access(filepath, os.W_OK):
-                return None
-            
             shutil.move(str(filepath), str(quarantine_path))
             
             item = QuarantineItem(
                 id=item_id,
                 original_path=str(filepath),
                 quarantine_path=str(quarantine_path),
                 timestamp=time.time(),
                 size_bytes=size,
             )
             
             metadata = self._load_metadata()
             metadata[item_id] = asdict(item)
-            self._save_metadata(metadata)
+            try:
+                self._save_metadata(metadata)
+            except OSError:
+                # Rollback: try to move file back
+                try:
+                    shutil.move(str(quarantine_path), str(filepath))
+                except Exception:
+                    pass
+                return None
             
             return item_id
             
         except Exception:
-            # Log error?
             return None

153-164: Add defensive parsing for malformed metadata entries.

QuarantineItem(**v) will raise if keys are missing or renamed, breaking list_items entirely. This was flagged in a previous review but remains unaddressed.

🔎 Proposed fix
     def list_items(self) -> list[QuarantineItem]:
         """
         List all items in quarantine.
         
         Returns:
             List[QuarantineItem]: List of quarantined items sorted by date.
         """
         metadata = self._load_metadata()
         items = []
         for k, v in metadata.items():
-            items.append(QuarantineItem(**v))
+            try:
+                items.append(QuarantineItem(**v))
+            except TypeError:
+                # Skip malformed entries
+                continue
         return sorted(items, key=lambda x: x.timestamp, reverse=True)
cortex/cleanup/cleaner.py (1)

34-43: Add missing return type hint.

The __init__ method is missing a return type hint (-> None). As per coding guidelines, type hints are required for Python code.

🔎 Proposed fix
-    def __init__(self, dry_run: bool = False):
+    def __init__(self, dry_run: bool = False) -> None:
         """
         Initialize the DiskCleaner.

Based on coding guidelines, type hints are required for Python code.

cortex/cleanup/scheduler.py (1)

105-121: Set restrictive permissions on configuration file.

The schedule configuration file should have restrictive permissions (0o600) since it contains scheduling settings. Currently, permissions depend on umask. This was flagged in a previous review but appears unaddressed.

🔎 Proposed fix
     def save_config(self, config: ScheduleConfig) -> bool:
         """
         Save schedule configuration to file.
         
         Args:
             config: Configuration to save.
             
         Returns:
             bool: True if saved successfully.
         """
         try:
             with open(self.config_file, "w", encoding="utf-8") as f:
                 json.dump(config.to_dict(), f, indent=2)
+            try:
+                self.config_file.chmod(0o600)
+            except OSError:
+                pass
             return True
         except OSError as e:
             logger.error(f"Failed to save schedule config: {e}")
             return False

Based on learnings, similar security hardening was applied to manager.py for the metadata file.

cortex/cleanup/scanner.py (3)

40-44: Add missing return type hint.

The __init__ method is missing a return type hint (-> None). As per coding guidelines, type hints are required for Python code.

🔎 Proposed fix
-    def __init__(self):
+    def __init__(self) -> None:
         self.apt_cache_dir = Path("/var/cache/apt/archives")
         self.log_dir = Path("/var/log")
         # Use tempfile.gettempdir() for platform-independent temp directory
         self.temp_dirs = [Path(tempfile.gettempdir()), Path.home() / ".cache"]

Based on coding guidelines, type hints are required for Python code.


173-205: Validate days_old parameter to prevent unintended behavior.

Negative days_old values will set cutoff in the future (Line 186), causing all files to be considered "old" and included in the scan. This could lead to unintended data loss if cleanup is run with invalid parameters.

🔎 Proposed fix
     def scan_temp_files(self, days_old: int = 7) -> ScanResult:
         """
         Scan for temporary files older than X days.
         
         Args:
             days_old (int): Minimum age of files in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old temp files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()

207-235: Validate parameters to prevent unintended scan behavior.

Similar to scan_temp_files, negative days_old or min_size_mb values can cause unexpected behavior:

  • Negative days_old sets cutoff in the future, matching all files (Line 221).
  • Negative min_size_mb sets min_size negative, matching all files (Line 222).
🔎 Proposed fix
     def scan_logs(self, min_size_mb: int = 100, days_old: int = 7) -> ScanResult:
         """
         Scan for large, old log files.
         
         Args:
             min_size_mb (int): Minimum size in MB to include.
             days_old (int): Minimum age in days to include.
             
         Returns:
             ScanResult: Result containing size and count of old log files.
         """
+        if days_old < 0:
+            raise ValueError("days_old must be >= 0")
+        if min_size_mb < 0:
+            raise ValueError("min_size_mb must be >= 0")
+        
         total_size = 0
         files = []
         now = time.time()
🧹 Nitpick comments (2)
cortex/cleanup/scanner.py (2)

1-1: Remove unused import.

The re module is imported but never used. Line 156's comment indicates regex was intentionally replaced with string operations.

🔎 Proposed fix
-import re
 import tempfile
 import time
 from dataclasses import dataclass, field

92-93: Update or remove outdated comment.

Line 92 mentions using strict=False, but no such parameter is passed to run_command. This comment appears outdated and may confuse future maintainers.

🔎 Proposed fix
-        # We use strict=False because apt-get might output to stderr which run_command captures
         result = run_command(cmd, validate=True)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c463787 and 09aecb2.

📒 Files selected for processing (14)
  • cortex/cleanup/__init__.py (1 hunks)
  • cortex/cleanup/cleaner.py (1 hunks)
  • cortex/cleanup/manager.py (1 hunks)
  • cortex/cleanup/scanner.py (1 hunks)
  • cortex/cleanup/scheduler.py (1 hunks)
  • cortex/cli.py (5 hunks)
  • cortex/utils/commands.py (1 hunks)
  • docs/CLEANUP_GUIDE.md (1 hunks)
  • src/intent/clarifier.py (1 hunks)
  • src/intent/llm_agent.py (4 hunks)
  • tests/test_cleanup_cleaner.py (1 hunks)
  • tests/test_cleanup_manager.py (1 hunks)
  • tests/test_cleanup_scanner.py (1 hunks)
  • tests/test_cleanup_scheduler.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/test_cleanup_scanner.py
  • cortex/utils/commands.py
  • tests/test_cleanup_cleaner.py
  • tests/test_cleanup_manager.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cleanup/__init__.py
  • cortex/cleanup/manager.py
  • cortex/cleanup/cleaner.py
  • tests/test_cleanup_scheduler.py
  • cortex/cli.py
  • src/intent/llm_agent.py
  • src/intent/clarifier.py
  • cortex/cleanup/scanner.py
  • cortex/cleanup/scheduler.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • cortex/cleanup/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cleanup_scheduler.py
🧠 Learnings (2)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation

Applied to files:

  • cortex/cleanup/cleaner.py
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • cortex/cli.py
🧬 Code graph analysis (5)
cortex/cleanup/__init__.py (3)
cortex/cleanup/cleaner.py (1)
  • DiskCleaner (28-267)
cortex/cleanup/manager.py (2)
  • CleanupManager (31-199)
  • QuarantineItem (12-28)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (35-235)
  • ScanResult (18-32)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/manager.py (2)
  • CleanupManager (31-199)
  • quarantine_file (73-117)
cortex/cleanup/scanner.py (3)
  • CleanupScanner (35-235)
  • ScanResult (18-32)
  • scan_package_cache (60-81)
tests/test_cleanup_scheduler.py (1)
cortex/cleanup/scheduler.py (15)
  • CleanupScheduler (66-455)
  • ScheduleConfig (30-63)
  • ScheduleInterval (21-26)
  • to_dict (46-53)
  • from_dict (56-63)
  • load_config (87-103)
  • save_config (105-121)
  • get_status (191-207)
  • _get_interval_calendar (209-224)
  • _get_cron_schedule (226-241)
  • enable_schedule (123-167)
  • disable_schedule (169-189)
  • _check_systemd_timer (349-360)
  • _check_cron (444-455)
  • _setup_cron (362-414)
cortex/cli.py (4)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (28-267)
  • run_cleanup (222-245)
cortex/cleanup/manager.py (3)
  • CleanupManager (31-199)
  • list_items (153-164)
  • restore_item (119-151)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (35-235)
  • scan_all (46-58)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (66-455)
  • ScheduleInterval (21-26)
  • get_status (191-207)
  • disable_schedule (169-189)
  • enable_schedule (123-167)
src/intent/llm_agent.py (2)
src/intent/detector.py (1)
  • Intent (8-11)
cortex/kernel_features/llm_device.py (1)
  • getattr (81-105)
🔇 Additional comments (10)
src/intent/clarifier.py (1)

26-26: LGTM: Formatting improvement.

The single-line return statement is clear and improves readability.

src/intent/llm_agent.py (1)

27-27: LGTM! Formatting changes are clean.

The multi-line to single-line reformatting maintains readability while keeping all type hints and logic intact.

Also applies to: 93-93, 115-115, 158-158, 162-162

cortex/cleanup/scheduler.py (1)

143-167: LGTM! Safe mode is now correctly propagated.

The scheduling logic now correctly passes safe_mode to both systemd and cron setup methods, and the generated commands use the appropriate flags (--safe or --force).

Also applies to: 243-322, 362-414

docs/CLEANUP_GUIDE.md (1)

1-223: LGTM! Documentation is comprehensive and well-structured.

The guide clearly documents all cleanup features, safety mechanisms, and troubleshooting steps. Markdown formatting issues from previous review have been addressed.

tests/test_cleanup_scheduler.py (1)

1-277: LGTM! Comprehensive test coverage.

Tests thoroughly validate scheduler functionality including config persistence, systemd/cron orchestration, interval generation, and error handling paths.

cortex/cleanup/__init__.py (1)

1-21: LGTM! Clean public API definition.

The package initialization properly exposes the core cleanup components. Note that CleanupScheduler is intentionally not exported here, which is appropriate since it's primarily used by the CLI layer.

cortex/cli.py (1)

666-830: LGTM! Cleanup command integration is well-structured.

The cleanup command properly handles all subcommands (scan, run, undo, schedule) with appropriate user feedback and error handling. The _format_size helper correctly includes type hints.

Also applies to: 832-837

cortex/cleanup/manager.py (1)

36-50: LGTM! Core security improvements are in place.

The quarantine directory permissions are properly hardened, metadata operations are atomic with restrictive file permissions, and the cleanup_old_items method correctly validates the days parameter.

Also applies to: 52-71, 166-199

cortex/cleanup/cleaner.py (1)

45-67: LGTM! Cleanup operations are correctly implemented.

All cleanup methods properly:

  • Use sudo -n to prevent interactive prompts
  • Calculate and return freed bytes accurately
  • Honor the safe parameter to skip aggressive operations in safe mode
  • Handle errors gracefully with logging

Also applies to: 69-95, 143-177, 179-220, 222-267

cortex/cleanup/scanner.py (1)

17-32: LGTM: Well-structured dataclass.

The dataclass is properly typed with appropriate defaults. Minor style note: dataclass fields are typically documented under "Attributes" rather than "Args".

@hyaku0121
Copy link
Contributor Author

@mikejmorgan-ai
The conflict has been resolved! Please review!

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
cortex/cli.py (3)

756-756: Add missing type hints to public method.

The cleanup method lacks type hints, violating the coding guidelines requirement that "Type hints required in Python code."

🔎 Proposed fix
-    def cleanup(self, args):
+    def cleanup(self, args: argparse.Namespace) -> int:
         """Handle cleanup commands"""

Based on coding guidelines, type hints are required for Python code.


796-830: Fix --safe flag behavior (currently ineffective).

The --safe flag is defined as store_true with default=True (line 1088), which means it's always True and cannot be turned off by users. Additionally, line 800 ignores args.safe entirely and only checks args.force:

is_safe = not args.force  # args.safe is never consulted

This makes the --safe flag misleading and ineffective.

🔎 Proposed fix: Make flags mutually exclusive and use args.safe

Option 1: Mutually exclusive group (recommended)

Update CLI parsing (lines 1086-1091):

-    run_parser.add_argument(
-        "--safe", action="store_true", default=True, help="Safe cleanup (default)"
-    )
-    run_parser.add_argument("--force", "--all", action="store_true", help="Clean all found items")
+    cleanup_mode = run_parser.add_mutually_exclusive_group()
+    cleanup_mode.add_argument(
+        "--safe", action="store_true", default=True, help="Safe cleanup (default)"
+    )
+    cleanup_mode.add_argument("--force", "--all", action="store_true", help="Aggressive cleanup (may be destructive)")

Update logic (line 800):

-            is_safe = not args.force
+            is_safe = args.safe and not args.force

Option 2: Remove default=True and derive from flags

-    run_parser.add_argument(
-        "--safe", action="store_true", default=True, help="Safe cleanup (default)"
-    )
+    run_parser.add_argument(
+        "--safe", action="store_true", help="Safe cleanup (default)"
+    )
-            is_safe = not args.force
+            is_safe = (args.safe or not args.force)

897-906: Schedule enable hardcodes safe_mode=True, ignoring user preference.

Line 901 hardcodes safe_mode=True when enabling scheduled cleanup, giving users no option to configure this behavior. If this is intentional for safety reasons, consider documenting it. Otherwise, add a --no-safe or --force option to the schedule subcommand.

🔎 Proposed fix: Add user control for safe_mode

Add a flag to the schedule parser (after line 1104):

schedule_parser.add_argument("--force", action="store_true", help="Disable safe mode for scheduled cleanup")

Update the logic:

             if args.enable:
                 interval = (
                     ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
                 )
-                result = scheduler.enable_schedule(interval=interval, safe_mode=True)
+                # Use safe mode by default for scheduled cleanup unless --force is specified
+                safe = not getattr(args, 'force', False)
+                result = scheduler.enable_schedule(interval=interval, safe_mode=safe)
🧹 Nitpick comments (1)
cortex/cli.py (1)

465-465: Good stylistic improvement.

Adding spaces around the + operator improves readability and follows PEP 8 style guidelines.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09aecb2 and 5300c88.

📒 Files selected for processing (1)
  • cortex/cli.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/cli.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*.py : Type hints required in Python code

Applied to files:

  • cortex/cli.py
🧬 Code graph analysis (1)
cortex/cli.py (4)
cortex/cleanup/cleaner.py (2)
  • DiskCleaner (28-267)
  • run_cleanup (222-245)
cortex/cleanup/manager.py (3)
  • CleanupManager (31-199)
  • list_items (153-164)
  • restore_item (119-151)
cortex/cleanup/scanner.py (2)
  • CleanupScanner (35-235)
  • scan_all (46-58)
cortex/cleanup/scheduler.py (5)
  • CleanupScheduler (66-455)
  • ScheduleInterval (21-26)
  • get_status (191-207)
  • disable_schedule (169-189)
  • enable_schedule (123-167)
🪛 GitHub Actions: CI
cortex/cli.py

[error] 747-747: Ruff: F811 - Redefinition of unused demo from line 174.

🪛 GitHub Check: Lint
cortex/cli.py

[failure] 747-747: Ruff (F811)
cortex/cli.py:747:9: F811 Redefinition of unused demo from line 174

🔇 Additional comments (1)
cortex/cli.py (1)

1144-1145: LGTM!

The cleanup command dispatcher integration follows the established pattern and correctly routes to the cleanup method.

Comment on lines +747 to +753
def demo(self):
"""Run a demo showing Cortex capabilities without API key"""
show_banner()
console.print()
cx_print("Running Demo...", "info")
# (Keep existing demo logic)
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Remove duplicate demo method definition.

The demo method is already defined at line 174, and this redefinition at line 747 shadows the earlier definition. This causes the static analysis failure (F811) and pipeline failure.

🔎 Proposed fix: Remove the duplicate definition
-    def demo(self):
-        """Run a demo showing Cortex capabilities without API key"""
-        show_banner()
-        console.print()
-        cx_print("Running Demo...", "info")
-        # (Keep existing demo logic)
-        return 0
-

The original definition at line 174 should be retained as it properly calls run_demo() from cortex.demo.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def demo(self):
"""Run a demo showing Cortex capabilities without API key"""
show_banner()
console.print()
cx_print("Running Demo...", "info")
# (Keep existing demo logic)
return 0
🧰 Tools
🪛 GitHub Actions: CI

[error] 747-747: Ruff: F811 - Redefinition of unused demo from line 174.

🪛 GitHub Check: Lint

[failure] 747-747: Ruff (F811)
cortex/cli.py:747:9: F811 Redefinition of unused demo from line 174

🤖 Prompt for AI Agents
In cortex/cli.py around lines 747 to 753 there is a duplicate demo() method that
shadows the original defined at line 174 and triggers a static analysis F811
error; remove this duplicate definition (lines 747–753) so the original demo()
at line 174 remains, ensuring the retained implementation continues to call
run_demo() from cortex.demo.

@hyaku0121
Copy link
Contributor Author

Even if I resolve the conflicts, new ones keep appearing. Since I'm not very familiar with GitHub, it seems unlikely I can achieve a complete fix. Therefore, I'll have to close this PR for now. I'm sorry.

@hyaku0121 hyaku0121 closed this Dec 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Smart Cleanup and Disk Space Optimizer

2 participants