Skip to content

More secure file extraction#12394

Merged
StpMax merged 2 commits into
releases/26.1.0from
fix-12296
Apr 17, 2026
Merged

More secure file extraction#12394
StpMax merged 2 commits into
releases/26.1.0from
fix-12296

Conversation

@StpMax
Copy link
Copy Markdown
Collaborator

@StpMax StpMax commented Apr 17, 2026

Description

More secure file extraction

Fixes #12296

Type of change

(Please delete options that are not relevant)

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ⚡ New feature (non-breaking change which adds functionality)
  • 📢 Breaking change (fix or feature that would cause existing functionality not to work as expected)
  • 📄 This change requires a documentation update

Verification Process

To ensure the changes are working as expected:

  • Test Location: Specify the URL or path for testing.
  • Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.

Additional Media:

  • I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

  • My code follows the style guidelines(PEP 8) of MindsDB.
  • I have appropriately commented on my code, especially in complex areas.
  • Necessary documentation updates are either made or tracked in issues.
  • Relevant unit and integration tests are updated or added.

@entelligence-ai-pr-reviews
Copy link
Copy Markdown
Contributor

entelligence-ai-pr-reviews Bot commented Apr 17, 2026

EntelligenceAI PR Summary

Hardens safe_extract in mindsdb/utilities/fs.py against path traversal and link-based attacks during tar extraction.

  • Replaced os.path.abspath with os.path.realpath in __is_within_directory to resolve symlinks before directory boundary comparison
  • Switched from os.path.relpath string-prefix check to os.path.commonpath for more robust path containment validation
  • Added __get_tar_members helper to resolve mixed str/TarInfo member inputs into a uniform list
  • Replaced bulk extractall call with a per-member extract loop that explicitly skips symlinks, hardlinks, and non-file/non-directory entry types

Confidence Score: 5/5 - Safe to Merge

Safe to merge — this PR meaningfully hardens safe_extract in mindsdb/utilities/fs.py against path traversal and symlink-based attacks with well-chosen, targeted fixes. The switch from os.path.abspath to os.path.realpath in __is_within_directory correctly resolves symlinks before boundary checks, and replacing the fragile os.path.relpath string-prefix approach with os.path.commonpath eliminates a classic path containment bypass. The new __get_tar_members helper cleanly normalizes mixed str/TarInfo inputs, and member-by-member extraction replaces the unsafe bulk extractall, which is the correct pattern for untrusted archives. No review comments were raised, coverage was complete on the changed file, and no logic, correctness, or security regressions were identified.

Key Findings:

  • os.path.realpath in __is_within_directory correctly resolves symlinks before comparing paths, closing the symlink escape vector that os.path.abspath left open.
  • os.path.commonpath replaces the brittle relpath string-prefix check, which could be fooled by directory names that share a prefix (e.g., /safe/dir vs /safe/dir-evil), making the containment check semantically correct.
  • Per-member extraction loop instead of bulk extractall ensures every archive entry is validated before being written to disk, which is the standard mitigation for CVE-class tar path traversal issues.
  • No issues — functional, security, or stylistic — were identified by the automated review, and the changes are narrowly scoped to the security utility with no unintended side effects.
Files requiring special attention
  • mindsdb/utilities/fs.py

@entelligence-ai-pr-reviews
Copy link
Copy Markdown
Contributor

Walkthrough

This PR hardens the safe_extract function in mindsdb/utilities/fs.py against path traversal, symlink, and hardlink attacks. It replaces os.path.abspath with os.path.realpath for symlink-aware path resolution, adopts os.path.commonpath for boundary validation, introduces a helper to normalize tar member lists, and replaces bulk extractall with a per-member extraction loop that enforces strict entry-type filtering.

Changes

File(s) Summary
mindsdb/utilities/fs.py Replaced os.path.abspath with os.path.realpath in __is_within_directory; switched path boundary check from os.path.relpath string-prefix to os.path.commonpath; added __get_tar_members helper to normalize mixed str/TarInfo member lists; replaced extractall with a per-member extract loop that rejects symlinks, hardlinks, and unsupported entry types.

Sequence Diagram

This diagram shows the interactions between components:

sequenceDiagram
    participant Caller
    participant safe_extract
    participant __get_tar_members
    participant __is_within_directory
    participant archivefile

    Caller->>safe_extract: safe_extract(archivefile, path, members, numeric_owner)

    alt archivefile is ZipFile
        loop each member in namelist()
            safe_extract->>__is_within_directory: __is_within_directory(path, member_path)
            __is_within_directory-->>safe_extract: bool (uses realpath + commonpath)
        end
        safe_extract->>archivefile: extractall(path)
    else archivefile is TarFile (py < 3.12)
        safe_extract->>__get_tar_members: __get_tar_members(archivefile, members)
        Note over __get_tar_members: Returns all members if members=None,<br/>otherwise resolves strings to TarInfo objects
        __get_tar_members-->>safe_extract: resolved_members[]

        loop each member in resolved_members
            safe_extract->>__is_within_directory: __is_within_directory(path, member_path)
            __is_within_directory-->>safe_extract: bool

            opt member passes all checks
                safe_extract->>archivefile: extract(member, path, numeric_owner)
                archivefile-->>safe_extract: done
            end
        end
    end

    safe_extract-->>Caller: extraction complete
Loading

🔗 Cross-Repository Impact Analysis

Enable automatic detection of breaking changes across your dependent repositories. → Set up now

Learn more about Cross-Repository Analysis

What It Does

  • Automatically identifies repositories that depend on this code
  • Analyzes potential breaking changes across your entire codebase
  • Provides risk assessment before merging to prevent cross-repo issues

How to Enable

  1. Visit Settings → Code Management
  2. Configure repository dependencies
  3. Future PRs will automatically include cross-repo impact analysis!

Benefits

  • 🛡️ Prevent breaking changes across repositories
  • 🔍 Catch integration issues before they reach production
  • 📊 Better visibility into your multi-repo architecture

@StpMax StpMax changed the title safer safe_extract More secure file extraction Apr 17, 2026
@StpMax StpMax requested a review from ea-rus April 17, 2026 09:45
@StpMax StpMax merged commit f77e2ab into releases/26.1.0 Apr 17, 2026
30 of 34 checks passed
@StpMax StpMax deleted the fix-12296 branch April 17, 2026 11:28
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants