Merged
60 changes: 30 additions & 30 deletions docs/src/architecture.md
@@ -28,15 +28,15 @@ flowchart LR
TF --> FB --> E
E --> R --> F --> O

style MF fill:#e1f5fe
style TF fill:#e1f5fe
style P fill:#fff3e0
style AST fill:#fff3e0
style FB fill:#fff3e0
style E fill:#fff3e0
style R fill:#e8f5e9
style F fill:#e8f5e9
style O fill:#e8f5e9
style MF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
style TF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style AST fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style FB fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style R fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style O fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```

## Core Components
@@ -188,8 +188,8 @@ flowchart LR
C --> D[Validation]
D --> E[Cached Rules]

style A fill:#e3f2fd
style E fill:#c8e6c9
style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
style E fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```

1. **Parsing**: Convert text DSL to structured AST
@@ -207,8 +207,8 @@ flowchart LR
D --> E[Results]
E --> F[Formatting]

style A fill:#e3f2fd
style F fill:#c8e6c9
style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```

1. **File Access**: Create memory-mapped buffer
@@ -238,17 +238,17 @@ Magic rules form a tree structure where:

```mermaid
flowchart TD
R[Root Rule<br/>e.g., "0 string PK"]
R -->|match| C1[Child Rule 1<br/>e.g., ">4 ubyte 0x14"]
R -->|match| C2[Child Rule 2<br/>e.g., ">4 ubyte 0x06"]
C1 -->|match| G1[Grandchild<br/>ZIP archive v2.0]
C2 -->|match| G2[Grandchild<br/>ZIP archive v1.0]

style R fill:#e3f2fd
style C1 fill:#fff3e0
style C2 fill:#fff3e0
style G1 fill:#c8e6c9
style G2 fill:#c8e6c9
R["Root Rule<br/>e.g., 0 string PK"]
R -->|match| C1["Child Rule 1<br/>e.g., #gt;4 ubyte 0x14"]
R -->|match| C2["Child Rule 2<br/>e.g., #gt;4 ubyte 0x06"]
Comment on lines +242 to +243
Copilot AI Mar 2, 2026

#gt; is not a valid HTML escape for > and will render literally in the node labels. If the goal is to avoid Mermaid/Markdown parsing issues with >, use &gt; (or remove the > from the label text) so the diagram renders correctly and matches the PR’s intent of fixing the >-related syntax issue.

Suggested change
R -->|match| C1["Child Rule 1<br/>e.g., #gt;4 ubyte 0x14"]
R -->|match| C2["Child Rule 2<br/>e.g., #gt;4 ubyte 0x06"]
R -->|match| C1["Child Rule 1<br/>e.g., &gt;4 ubyte 0x14"]
R -->|match| C2["Child Rule 2<br/>e.g., &gt;4 ubyte 0x06"]

C1 -->|match| G1["Grandchild<br/>ZIP archive v2.0"]
C2 -->|match| G2["Grandchild<br/>ZIP archive v1.0"]

style R fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
style C1 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style C2 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style G1 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style G2 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
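The tree in the diagram above can be read as "evaluate children only after the parent matches". The following is a minimal Rust sketch of that idea, not the project's actual AST or evaluator: the `Rule` struct, its fields, `evaluate`, and `zip_rules` are hypothetical names, offsets are treated as absolute, and the byte-to-version mapping simply copies the diagram labels.

```rust
/// Hypothetical, simplified rule: match literal bytes at an absolute offset.
/// The real AST carries richer types and operators; this is only a sketch.
struct Rule {
    offset: usize,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
}

/// Depth-first evaluation: children run only when their parent matched.
fn evaluate(rule: &Rule, buf: &[u8], out: &mut Vec<&'static str>) {
    // Checked arithmetic so a huge offset cannot overflow.
    let end = match rule.offset.checked_add(rule.expected.len()) {
        Some(end) => end,
        None => return,
    };
    // `get` returns None instead of panicking when the range is out of bounds.
    if buf.get(rule.offset..end) == Some(rule.expected) {
        out.push(rule.message);
        for child in &rule.children {
            evaluate(child, buf, out);
        }
    }
}

/// Rules mirroring the diagram: root "0 string PK" with two ">4 ubyte" children.
fn zip_rules() -> Rule {
    Rule {
        offset: 0,
        expected: b"PK",
        message: "ZIP archive",
        children: vec![
            Rule { offset: 4, expected: &[0x14], message: "v2.0", children: vec![] },
            Rule { offset: 4, expected: &[0x06], message: "v1.0", children: vec![] },
        ],
    }
}
```

Calling `evaluate(&zip_rules(), buf, &mut out)` on a buffer that starts with `PK` and has `0x14` at offset 4 collects both the root and the matching child message; a buffer that fails the root check collects nothing, because the children are never visited.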

**Operator Support:**
@@ -359,12 +359,12 @@ flowchart TD
E --> ER
O --> ER

style L fill:#e8eaf6
style P fill:#fff8e1
style E fill:#fff8e1
style O fill:#fff8e1
style I fill:#e8f5e9
style ER fill:#ffebee
style L fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style O fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style I fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style ER fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0
```

**Dependency Rules:**
46 changes: 30 additions & 16 deletions docs/src/output.md
@@ -14,7 +14,7 @@ The output module is organized across three files:

### `output::MatchResult`

Represents a single magic rule match in the output layer. Created by converting from an evaluator-level `MatchResult`, with additional fields for structured output.
Represents a single magic rule match in the output layer. Created by converting from an evaluator-level `RuleMatch`, with additional fields for structured output.

```rust
pub struct MatchResult {
@@ -32,7 +32,7 @@ Key constructors:

- `MatchResult::new(message, offset, value)` -- Creates a match with default confidence of 50.
- `MatchResult::with_metadata(...)` -- Creates a fully specified match. Confidence is clamped to 100.
- `MatchResult::from_evaluator_match(m, mime_type)` -- Converts from the evaluator's `MatchResult`. Scales confidence from 0.0--1.0 to 0--100 and extracts rule path tags using the shared `TagExtractor`.
- `MatchResult::from_evaluator_match(m, mime_type)` -- Converts from the evaluator's `RuleMatch`. Scales confidence from 0.0--1.0 to 0--100 and extracts rule path tags using the shared `TagExtractor`.
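
The confidence handling described in the constructors above can be sketched in Rust as follows. This is an illustration only: `scale_confidence` and `clamp_confidence` are hypothetical helper names, and the rounding and NaN handling are assumptions, since the documentation states only the 0.0--1.0 to 0--100 scaling and the clamp to 100.

```rust
/// Hypothetical helper: scale an evaluator confidence in 0.0..=1.0 to 0..=100.
/// Rounding to nearest and mapping NaN to 0 are assumptions of this sketch.
fn scale_confidence(raw: f64) -> u8 {
    if raw.is_nan() {
        return 0;
    }
    // Clamp first so out-of-range inputs cannot exceed 100 after scaling.
    (raw.clamp(0.0, 1.0) * 100.0).round() as u8
}

/// Hypothetical helper: clamp an integer confidence to the documented maximum.
fn clamp_confidence(value: u8) -> u8 {
    value.min(100)
}
```

Clamping before the multiplication keeps the cast to `u8` inside the documented range, whatever value the evaluator supplies.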

### `output::EvaluationResult`

@@ -96,21 +96,25 @@ The text module (`src/output/text.rs`) produces output compatible with the GNU `
### Examples

Single file, single match:

```text
photo.png: PNG image data
```

Single file, multiple matches:

```text
ls: ELF 64-bit LSB executable, x86-64, dynamically linked
```

No matches:

```text
unknown.bin: data
```

Error case:

```text
missing.txt: ERROR: File not found
```
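The four cases above can be reproduced with a small Rust sketch. `format_text_line` is a hypothetical function, not the module's API. The `data` fallback and the `ERROR:` prefix come from the examples, while joining multiple match messages with `", "` is an assumption inferred from the multi-match line.

```rust
/// Hypothetical formatter for one line of `file`-style text output.
/// `outcome` holds either the match messages or an error description.
fn format_text_line(filename: &str, outcome: Result<Vec<&str>, &str>) -> String {
    match outcome {
        // Error case: report the reason after an "ERROR:" prefix.
        Err(reason) => format!("{filename}: ERROR: {reason}"),
        // No matches: fall back to the generic "data" description.
        Ok(messages) if messages.is_empty() => format!("{filename}: data"),
        // One or more matches: join the messages with ", " (assumed separator).
        Ok(messages) => format!("{filename}: {}", messages.join(", ")),
    }
}
```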
@@ -195,24 +199,34 @@ All three return `Result<String, serde_json::Error>`.

The full conversion pipeline from evaluation to output:

```text
evaluator::MatchResult ──from_evaluator_match──> output::MatchResult
┌─────────────┼─────────────┐
v v v
format_text format_json format_json_line
output output output
```mermaid
flowchart TD
EM["evaluator::RuleMatch"]
EM -- "from_evaluator_match" --> OM["output::MatchResult"]
OM --> FT["format_text_output"]
OM --> FJ["format_json_output"]
OM --> FL["format_json_line_output"]

style EM fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style OM fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
style FT fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style FJ fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style FL fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```

When converting from the library's top-level `EvaluationResult`:

```text
lib::EvaluationResult ──from_library_result──> output::EvaluationResult
┌─────────────────┬──────┘
v v
format_evaluation JsonOutput::from_evaluation_result
_result (text) (JSON)
```mermaid
flowchart TD
LE["lib::EvaluationResult"]
LE -- "from_library_result" --> OE["output::EvaluationResult"]
OE --> FER["format_evaluation_result<br/>(text)"]
OE --> JER["JsonOutput::from_evaluation_result<br/>(JSON)"]

style LE fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
style OE fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
style FER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
style JER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```

## Serialization
63 changes: 35 additions & 28 deletions docs/src/security-assurance.md
@@ -29,6 +29,7 @@ libmagic-rs is a file type detection library and CLI tool. Its security requirem
| Malicious file author | Exploit the detection tool to gain code execution or cause DoS | Can craft arbitrary file contents |
| Malicious magic file author | Inject rules that cause crashes, resource exhaustion, or incorrect results | Can craft arbitrary magic rule syntax |
| Supply chain attacker | Compromise a dependency to inject malicious code | Can publish malicious crate versions |

### 2.3 Attack Vectors

| ID | Vector | Target SR |
@@ -40,36 +41,38 @@ libmagic-rs is a file type detection library and CLI tool. Its security requirem
| AV-5 | Malformed magic file causes parser crash | SR-2 |
| AV-6 | CLI argument with path traversal reads unintended files | SR-4 |
| AV-7 | Compromised dependency introduces unsafe code | SR-5 |

## 3. Trust Boundaries

```text
+------------------------------------------------------------------+
| Untrusted |
| +------------------+ +-------------------+ |
| | Input Files | | Magic Files | |
| | (any content) | | (user or system) | |
| +--------+---------+ +--------+----------+ |
| | | |
+-----------+-----------------------+-------------------------------+
| |
=========|=======================|============ Trust Boundary ====
| |
+-----------v-----------------------v-------------------------------+
| libmagic-rs |
| |
| +----------------+ +----------------+ +--------------+ |
| | Parser | | Evaluator | | Output | |
| | - validates | | - bounds-check | | - formats | |
| | magic syntax | | all access | | results | |
| +----------------+ +----------------+ +--------------+ |
| |
| +----------------+ +----------------+ |
| | I/O Layer | | CLI | |
| | - mmap files | | - clap args | |
| | - size limits | | - validates | |
| +----------------+ +----------------+ |
+------------------------------------------------------------------+
```mermaid
flowchart TD
subgraph Untrusted["Untrusted Zone"]
direction LR
IF["Input Files<br/>(any content)"]
MF["Magic Files<br/>(user or system)"]
CA["CLI Arguments<br/>(user paths)"]
end

subgraph libmagic-rs["libmagic-rs (Trusted Zone)"]
IO["I/O Layer<br/>mmap files, size limits"]
CLI["CLI<br/>clap args, validates paths"]
P["Parser<br/>validates magic syntax"]
E["Evaluator<br/>bounds-checks all access"]
O["Output<br/>formats results"]
end

IF -- "file bytes" --> IO
MF -- "magic syntax" --> P
CA -- "user paths" --> CLI
IO -- "mapped buffer" --> E
CLI -- "validated paths" --> IO
P -- "validated AST" --> E
E -- "match results" --> O

style Untrusted fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0,stroke-width:2px
style libmagic-rs fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0,stroke-width:2px
Comment on lines +56 to +73
Copilot AI Mar 2, 2026

In Mermaid flowcharts, subgraph identifiers should be simple IDs (typically alphanumeric/underscore). Using libmagic-rs as the subgraph ID risks being parsed incorrectly because of the -, and it’s also referenced in the style directive. Rename the subgraph ID to something like libmagic_rs (and update the corresponding style line) while keeping the displayed label as "libmagic-rs (Trusted Zone)".

```

All data crossing the trust boundary (file contents, magic file syntax, CLI arguments) is treated as untrusted and validated before use.
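
As one illustration of validating untrusted file bytes before use, here is a minimal Rust sketch of a bounds-checked read. `read_u32_le` is a hypothetical helper, not the evaluator's actual API; it only shows the general pattern of checked offset arithmetic combined with a non-panicking slice access.

```rust
/// Hypothetical helper: read a little-endian u32 at `offset`, or return None
/// when the offset overflows or the buffer is too short.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    // checked_add guards against integer overflow on attacker-chosen offsets.
    let end = offset.checked_add(4)?;
    // get() returns None rather than panicking on an out-of-range slice.
    let b = buf.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Returning `Option` leaves the caller to decide how a short or malformed input is reported, so a crafted offset produces a non-match or an error instead of a crash.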

## 4. Secure Design Principles (Saltzer and Schroeder)
@@ -84,6 +87,7 @@ All data crossing the trust boundary (file contents, magic file syntax, CLI argu
| **Least privilege** | The tool only reads files; it never writes, executes, or modifies them. No network access. No elevated permissions required. |
| **Least common mechanism** | No shared mutable state between file evaluations. Each evaluation operates on its own data. No global caches that could leak information. |
| **Psychological acceptability** | CLI follows GNU `file` conventions. Error messages are descriptive and actionable. Default behavior is safe (built-in rules, no network). |

## 5. Common Weakness Countermeasures

### 5.1 CWE/SANS Top 25
@@ -104,6 +108,7 @@ All data crossing the trust boundary (file contents, magic file syntax, CLI argu
| CWE-190 | Integer overflow | Rust panics on integer overflow in debug builds. Offset calculations use checked arithmetic. | Mitigated |
| CWE-502 | Deserialization of untrusted data | Magic files are parsed with a strict grammar, not deserialized from arbitrary formats. | Mitigated |
| CWE-400 | Resource exhaustion | Evaluation timeouts prevent unbounded CPU use. Memory-mapped I/O avoids loading entire files into memory. | Mitigated |

### 5.2 OWASP Top 10 (where applicable)

Most OWASP Top 10 categories target web applications and are not applicable to a file detection library. The applicable items are:
@@ -114,6 +119,7 @@ Most OWASP Top 10 categories target web applications and are not applicable to a
| A04: Insecure Design | Applicable | Secure design principles applied throughout (see Section 4) |
| A06: Vulnerable Components | Applicable | `cargo audit` daily, `cargo deny`, Dependabot, `cargo-auditable` |
| A09: Security Logging | Partial | Evaluation errors logged; security events reported via GitHub Advisories |

## 6. Supply Chain Security

| Measure | Implementation |
@@ -127,6 +133,7 @@ Most OWASP Top 10 categories target web applications and are not applicable to a
| Binary auditing | `cargo-auditable` embeds dependency metadata in binaries |
| CI integrity | All GitHub Actions pinned to SHA hashes |
| Code review | Required on all PRs; automated by CodeRabbit with security-focused checks |

## 7. Ongoing Assurance

This assurance case is maintained as a living document. It is updated when:
@@ -136,4 +143,4 @@ This assurance case is maintained as a living document. It is updated when:
* Dependencies change significantly
* Security incidents occur

The project maintains continuous assurance through automated CI checks (clippy, CodeQL, cargo audit, cargo deny) that run on every commit.
The project maintains continuous assurance through automated CI checks (clippy, CodeQL, cargo audit, cargo deny) that run on every commit.