diff --git a/docs/src/architecture.md b/docs/src/architecture.md
index 8896d909..e9f744a5 100644
--- a/docs/src/architecture.md
+++ b/docs/src/architecture.md
@@ -28,15 +28,15 @@ flowchart LR
TF --> FB --> E
E --> R --> F --> O
- style MF fill:#e1f5fe
- style TF fill:#e1f5fe
- style P fill:#fff3e0
- style AST fill:#fff3e0
- style FB fill:#fff3e0
- style E fill:#fff3e0
- style R fill:#e8f5e9
- style F fill:#e8f5e9
- style O fill:#e8f5e9
+ style MF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
+ style TF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
+ style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style AST fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style FB fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style R fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style O fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
## Core Components
@@ -188,8 +188,8 @@ flowchart LR
C --> D[Validation]
D --> E[Cached Rules]
- style A fill:#e3f2fd
- style E fill:#c8e6c9
+ style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
+ style E fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
1. **Parsing**: Convert text DSL to structured AST
@@ -207,8 +207,8 @@ flowchart LR
D --> E[Results]
E --> F[Formatting]
- style A fill:#e3f2fd
- style F fill:#c8e6c9
+ style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
+ style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
1. **File Access**: Create memory-mapped buffer
@@ -238,17 +238,17 @@ Magic rules form a tree structure where:
```mermaid
flowchart TD
- R[Root Rule<br/>e.g., "0 string PK"]
- R -->|match| C1[Child Rule 1<br/>e.g., ">4 ubyte 0x14"]
- R -->|match| C2[Child Rule 2<br/>e.g., ">4 ubyte 0x06"]
- C1 -->|match| G1[Grandchild<br/>ZIP archive v2.0]
- C2 -->|match| G2[Grandchild<br/>ZIP archive v1.0]
-
- style R fill:#e3f2fd
- style C1 fill:#fff3e0
- style C2 fill:#fff3e0
- style G1 fill:#c8e6c9
- style G2 fill:#c8e6c9
+ R["Root Rule<br/>e.g., 0 string PK"]
+ R -->|match| C1["Child Rule 1<br/>e.g., #gt;4 ubyte 0x14"]
+ R -->|match| C2["Child Rule 2<br/>e.g., #gt;4 ubyte 0x06"]
+ C1 -->|match| G1["Grandchild<br/>ZIP archive v2.0"]
+ C2 -->|match| G2["Grandchild<br/>ZIP archive v1.0"]
+
+ style R fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
+ style C1 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style C2 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style G1 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style G2 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
**Operator Support:**
@@ -359,12 +359,12 @@ flowchart TD
E --> ER
O --> ER
- style L fill:#e8eaf6
- style P fill:#fff8e1
- style E fill:#fff8e1
- style O fill:#fff8e1
- style I fill:#e8f5e9
- style ER fill:#ffebee
+ style L fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
+ style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style O fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style I fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style ER fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0
```
**Dependency Rules:**
diff --git a/docs/src/output.md b/docs/src/output.md
index 6e91610f..6b1fbf74 100644
--- a/docs/src/output.md
+++ b/docs/src/output.md
@@ -14,7 +14,7 @@ The output module is organized across three files:
### `output::MatchResult`
-Represents a single magic rule match in the output layer. Created by converting from an evaluator-level `MatchResult`, with additional fields for structured output.
+Represents a single magic rule match in the output layer. Created by converting from an evaluator-level `RuleMatch`, with additional fields for structured output.
```rust
pub struct MatchResult {
@@ -32,7 +32,7 @@ Key constructors:
- `MatchResult::new(message, offset, value)` -- Creates a match with default confidence of 50.
- `MatchResult::with_metadata(...)` -- Creates a fully specified match. Confidence is clamped to 100.
-- `MatchResult::from_evaluator_match(m, mime_type)` -- Converts from the evaluator's `MatchResult`. Scales confidence from 0.0--1.0 to 0--100 and extracts rule path tags using the shared `TagExtractor`.
+- `MatchResult::from_evaluator_match(m, mime_type)` -- Converts from the evaluator's `RuleMatch`. Scales confidence from 0.0--1.0 to 0--100 and extracts rule path tags using the shared `TagExtractor`.
### `output::EvaluationResult`
@@ -96,21 +96,25 @@ The text module (`src/output/text.rs`) produces output compatible with the GNU `
### Examples
Single file, single match:
+
```text
photo.png: PNG image data
```
Single file, multiple matches:
+
```text
ls: ELF 64-bit LSB executable, x86-64, dynamically linked
```
No matches:
+
```text
unknown.bin: data
```
Error case:
+
```text
missing.txt: ERROR: File not found
```
@@ -195,24 +199,34 @@ All three return `Result`.
The full conversion pipeline from evaluation to output:
-```text
-evaluator::MatchResult ──from_evaluator_match──> output::MatchResult
- │
- ┌─────────────┼─────────────┐
- v v v
- format_text format_json format_json_line
- output output output
+```mermaid
+flowchart TD
+ EM["evaluator::RuleMatch"]
+ EM -- "from_evaluator_match" --> OM["output::MatchResult"]
+ OM --> FT["format_text_output"]
+ OM --> FJ["format_json_output"]
+ OM --> FL["format_json_line_output"]
+
+ style EM fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style OM fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
+ style FT fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style FJ fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style FL fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
When converting from the library's top-level `EvaluationResult`:
-```text
-lib::EvaluationResult ──from_library_result──> output::EvaluationResult
- │
- ┌─────────────────┬──────┘
- v v
- format_evaluation JsonOutput::from_evaluation_result
- _result (text) (JSON)
+```mermaid
+flowchart TD
+ LE["lib::EvaluationResult"]
+ LE -- "from_library_result" --> OE["output::EvaluationResult"]
+ OE --> FER["format_evaluation_result<br/>(text)"]
+ OE --> JER["JsonOutput::from_evaluation_result<br/>(JSON)"]
+
+ style LE fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
+ style OE fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
+ style FER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
+ style JER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
```
## Serialization
diff --git a/docs/src/security-assurance.md b/docs/src/security-assurance.md
index aa32d7f5..daa74085 100644
--- a/docs/src/security-assurance.md
+++ b/docs/src/security-assurance.md
@@ -29,6 +29,7 @@ libmagic-rs is a file type detection library and CLI tool. Its security requirem
| Malicious file author | Exploit the detection tool to gain code execution or cause DoS | Can craft arbitrary file contents |
| Malicious magic file author | Inject rules that cause crashes, resource exhaustion, or incorrect results | Can craft arbitrary magic rule syntax |
| Supply chain attacker | Compromise a dependency to inject malicious code | Can publish malicious crate versions |
+
### 2.3 Attack Vectors
| ID | Vector | Target SR |
@@ -40,36 +41,38 @@ libmagic-rs is a file type detection library and CLI tool. Its security requirem
| AV-5 | Malformed magic file causes parser crash | SR-2 |
| AV-6 | CLI argument with path traversal reads unintended files | SR-4 |
| AV-7 | Compromised dependency introduces unsafe code | SR-5 |
+
## 3. Trust Boundaries
-```text
-+------------------------------------------------------------------+
-| Untrusted |
-| +------------------+ +-------------------+ |
-| | Input Files | | Magic Files | |
-| | (any content) | | (user or system) | |
-| +--------+---------+ +--------+----------+ |
-| | | |
-+-----------+-----------------------+-------------------------------+
- | |
- =========|=======================|============ Trust Boundary ====
- | |
-+-----------v-----------------------v-------------------------------+
-| libmagic-rs |
-| |
-| +----------------+ +----------------+ +--------------+ |
-| | Parser | | Evaluator | | Output | |
-| | - validates | | - bounds-check | | - formats | |
-| | magic syntax | | all access | | results | |
-| +----------------+ +----------------+ +--------------+ |
-| |
-| +----------------+ +----------------+ |
-| | I/O Layer | | CLI | |
-| | - mmap files | | - clap args | |
-| | - size limits | | - validates | |
-| +----------------+ +----------------+ |
-+------------------------------------------------------------------+
+```mermaid
+flowchart TD
+ subgraph Untrusted["Untrusted Zone"]
+ direction LR
+ IF["Input Files<br/>(any content)"]
+ MF["Magic Files<br/>(user or system)"]
+ CA["CLI Arguments<br/>(user paths)"]
+ end
+
+ subgraph libmagic-rs["libmagic-rs (Trusted Zone)"]
+ IO["I/O Layer<br/>mmap files, size limits"]
+ CLI["CLI<br/>clap args, validates paths"]
+ P["Parser<br/>validates magic syntax"]
+ E["Evaluator<br/>bounds-checks all access"]
+ O["Output<br/>formats results"]
+ end
+
+ IF -- "file bytes" --> IO
+ MF -- "magic syntax" --> P
+ CA -- "user paths" --> CLI
+ IO -- "mapped buffer" --> E
+ CLI -- "validated paths" --> IO
+ P -- "validated AST" --> E
+ E -- "match results" --> O
+
+ style Untrusted fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0,stroke-width:2px
+ style libmagic-rs fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0,stroke-width:2px
```
+
All data crossing the trust boundary (file contents, magic file syntax, CLI arguments) is treated as untrusted and validated before use.
## 4. Secure Design Principles (Saltzer and Schroeder)
@@ -84,6 +87,7 @@ All data crossing the trust boundary (file contents, magic file syntax, CLI argu
| **Least privilege** | The tool only reads files; it never writes, executes, or modifies them. No network access. No elevated permissions required. |
| **Least common mechanism** | No shared mutable state between file evaluations. Each evaluation operates on its own data. No global caches that could leak information. |
| **Psychological acceptability** | CLI follows GNU `file` conventions. Error messages are descriptive and actionable. Default behavior is safe (built-in rules, no network). |
+
## 5. Common Weakness Countermeasures
### 5.1 CWE/SANS Top 25
@@ -104,6 +108,7 @@ All data crossing the trust boundary (file contents, magic file syntax, CLI argu
| CWE-190 | Integer overflow | Rust panics on integer overflow in debug builds. Offset calculations use checked arithmetic. | Mitigated |
| CWE-502 | Deserialization of untrusted data | Magic files are parsed with a strict grammar, not deserialized from arbitrary formats. | Mitigated |
| CWE-400 | Resource exhaustion | Evaluation timeouts prevent unbounded CPU use. Memory-mapped I/O avoids loading entire files into memory. | Mitigated |
+
### 5.2 OWASP Top 10 (where applicable)
Most OWASP Top 10 categories target web applications and are not applicable to a file detection library. The applicable items are:
@@ -114,6 +119,7 @@ Most OWASP Top 10 categories target web applications and are not applicable to a
| A04: Insecure Design | Applicable | Secure design principles applied throughout (see Section 4) |
| A06: Vulnerable Components | Applicable | `cargo audit` daily, `cargo deny`, Dependabot, `cargo-auditable` |
| A09: Security Logging | Partial | Evaluation errors logged; security events reported via GitHub Advisories |
+
## 6. Supply Chain Security
| Measure | Implementation |
@@ -127,6 +133,7 @@ Most OWASP Top 10 categories target web applications and are not applicable to a
| Binary auditing | `cargo-auditable` embeds dependency metadata in binaries |
| CI integrity | All GitHub Actions pinned to SHA hashes |
| Code review | Required on all PRs; automated by CodeRabbit with security-focused checks |
+
## 7. Ongoing Assurance
This assurance case is maintained as a living document. It is updated when:
@@ -136,4 +143,4 @@ This assurance case is maintained as a living document. It is updated when:
* Dependencies change significantly
* Security incidents occur
-The project maintains continuous assurance through automated CI checks (clippy, CodeQL, cargo audit, cargo deny) that run on every commit.
\ No newline at end of file
+The project maintains continuous assurance through automated CI checks (clippy, CodeQL, cargo audit, cargo deny) that run on every commit.
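
Review note on the `docs/src/output.md` hunk: the text says `from_evaluator_match` scales confidence from 0.0--1.0 to 0--100 and that confidence is clamped to 100. A minimal sketch of that conversion is below. The helper name `scale_confidence`, the `f64`/`u8` types, and the round-to-nearest behavior are assumptions for illustration only; they are not taken from the crate's source.

```rust
/// Hypothetical helper (not the crate's API): converts an evaluator-level
/// confidence in the range 0.0..=1.0 to an output-level score in 0..=100.
/// Out-of-range inputs are clamped rather than rejected.
fn scale_confidence(confidence: f64) -> u8 {
    // Clamp first so that values above 1.0 cannot exceed 100 after scaling.
    let clamped = confidence.clamp(0.0, 1.0);
    // Rounding to nearest is an assumption; truncation would also be plausible.
    (clamped * 100.0).round() as u8
}

fn main() {
    for input in [0.0, 0.254, 0.5, 1.0, 1.7, -0.2] {
        println!("{input} -> {}", scale_confidence(input));
    }
}
```

If the real implementation truncates instead of rounding, the docs may be worth a one-line clarification, since the two choices differ for inputs such as 0.999.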