Skip to content

Add safe path construction utility to fix path traversal and satisfy static analysis #53

@SeaCelo

Description

@SeaCelo

This follows on from #34 and #45, which document the path traversal vulnerability across case and download routes. Both issues agree on what needs to happen — validate that any user-supplied path stays inside DATA_STORAGE before using it. This issue proposes a concrete implementation that fixes the vulnerability and is also recognised by static analysis tools like CodeQL.

The problem with validation-only approaches

Adding a validation check like validate_path() before a Path() call protects the app at runtime, but static analysis tools (CodeQL, Bandit) still flag the code. They track user-supplied data from source to sink — if request.args.get('case') reaches a Path() or send_file() call, it's flagged, regardless of what runs between them. The tool can't know your custom function cleaned the input.

The fix: case_path() using werkzeug.utils.safe_join

werkzeug is already a Flask dependency. CodeQL explicitly models werkzeug.utils.safe_join as a path sanitizer — its return value is considered clean. Combined with a containment check for symlink safety, this gives both runtime protection and static analysis compliance.

Add to Config.py:

from werkzeug.utils import safe_join

def case_path(*components):
    """Build a validated path inside DATA_STORAGE.

    Uses werkzeug.utils.safe_join (recognised by CodeQL as a sanitizer)
    plus a realpath containment check to catch symlinks. Raises
    PermissionError if any component escapes DATA_STORAGE.
    """
    safe = safe_join(str(DATA_STORAGE), *[str(c) or '' for c in components])
    if safe is None:
        raise PermissionError("Path Traversal Attempt Detected")
    validate_path(DATA_STORAGE, safe)  # catches symlinks safe_join misses
    return Path(safe)

Then replace path constructions throughout the codebase:

# Before
case = request.json['casename']
casePath = Path(Config.DATA_STORAGE, case)

# After
casePath = Config.case_path(case)
# Before
dataFile = Path(Config.DATA_STORAGE, case, 'res', 'csv', file)

# After
dataFile = Config.case_path(case, 'res', 'csv', file)

What this achieves

  • Path traversal is blocked at construction time — you can't get a Path object without passing validation
  • CodeQL stops flagging these locations because safe_join's return value is a recognised clean source
  • The pattern is impossible to misuse — there's no separate validate + build step to forget
  • None components are handled safely (converted to empty strings)

Scope

Main files to update: DataFileClass.py, DataFileRoute.py, CaseRoute.py, UploadRoute.py, OsemosysClass.py. The function itself is a small addition to Config.py.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions