Implement comprehensive text-based magic file parser

## Overview

Implement a complete text-based magic file parser that reads entire magic files and converts them into a hierarchical tree of `MagicRule` structures. This is a critical component for Phase 1 MVP completion, as it bridges the gap between the existing parser components (offsets, types, operators, values) and the evaluator engine.

## Background

The project has completed core parsing components in `src/parser/grammar.rs`:
- ✅ `parse_number` - Parses decimal, hex, and octal numbers
- ✅ `parse_offset` - Parses offset specifications (absolute, indirect, relative)
- ✅ `parse_operator` - Parses comparison operators (=, !=, <, >, &)
- ✅ `parse_value` - Parses values (strings, numbers, byte sequences)

The AST structures in `src/parser/ast.rs` are also complete with full serialization support.

**What's Missing**: A higher-level parser that orchestrates these components to parse complete magic files line-by-line, handling:
- File-level structure and organization
- Line continuation and comments
- Hierarchical rule nesting based on indentation
- Error reporting with line numbers
- Special directives (`!:mime`, `!:strength`, etc.)

## Magic File Format Reference

Magic files follow this structure:

```text
# Comment lines start with #
offset  type  operator  value  message

# Example: ELF file detection
0       string    \x7fELF         ELF
>4      byte      1               32-bit
>4      byte      2               64-bit
>>16    leshort   >0              executable

# Continuation lines end with a backslash (\)
0       string    PK\003\004     ZIP archive data, \
        at least v2.0 to extract
```

**Key Features**:
- **Level 0 rules**: Start with offset (0, 16, 0x20)
- **Child rules**: Prefixed with `>` characters (>, >>, >>>)
- **Comments**: Lines starting with `#`
- **Empty lines**: Should be ignored
- **Continuation**: Lines ending with `\` continue on next line
- **Special directives**: `!:mime`, `!:strength`, `!:ext`

See `docs/src/magic-format.md` for complete format specification.
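
The features listed above suggest a first pass that classifies each line before any field parsing happens. The following is a minimal sketch of that idea, not the project's API: `LineKind` and `classify_line` are hypothetical names, and it assumes continuation lines have already been joined.

```rust
/// Hypothetical classification of one logical line of a magic file.
#[derive(Debug, PartialEq, Eq)]
enum LineKind {
    /// Empty or whitespace-only line.
    Blank,
    /// Line whose first non-blank character is `#`.
    Comment,
    /// Special directive such as `!:mime`, `!:strength`, `!:ext`.
    Directive,
    /// Rule line; `level` is the number of leading `>` characters.
    Rule { level: u32 },
}

fn classify_line(line: &str) -> LineKind {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        LineKind::Blank
    } else if trimmed.starts_with('#') {
        LineKind::Comment
    } else if trimmed.starts_with("!:") {
        LineKind::Directive
    } else {
        // Only the leading run of `>` counts; a later `>` is an operator.
        let level = trimmed.chars().take_while(|&c| c == '>').count() as u32;
        LineKind::Rule { level }
    }
}
```

Counting only the leading run of `>` keeps a line such as `>>16 leshort >0 executable` at level 2 even though a `>` operator appears later in the line.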

## Technical Requirements

### Core Function Signature

```rust
/// Parse a complete text-based magic file
///
/// # Arguments
/// * `input` - String content of the magic file
///
/// # Returns
/// * `Result<Vec<MagicRule>, ParseError>` - Top-level rules with nested children
///
/// # Errors
/// Returns ParseError with line number and description for:
/// - Invalid syntax
/// - Unrecognized types or operators
/// - Malformed offset specifications
/// - Orphaned child rules (> without parent)
pub fn parse_text_magic_file(input: &str) -> Result<Vec<MagicRule>, ParseError> {
    // Implementation needed
}
```
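
To make the error cases in the doc comment concrete, here is a rough sketch of an error value that carries a line number. The real `ParseError` lives in `src/error.rs` and may look quite different; `LineError` and `ParseErrorKind` are stand-in names invented for this illustration.

```rust
use std::fmt;

/// Hypothetical error categories mirroring the `# Errors` list above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ParseErrorKind {
    InvalidSyntax,
    UnrecognizedType(String),
    UnrecognizedOperator(String),
    MalformedOffset(String),
    OrphanedChild,
}

/// Stand-in for the project's `ParseError` (assumption, not the real type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct LineError {
    line_number: usize,
    kind: ParseErrorKind,
}

impl fmt::Display for LineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Write the description first, then append the line number.
        match &self.kind {
            ParseErrorKind::InvalidSyntax => write!(f, "Invalid syntax")?,
            ParseErrorKind::UnrecognizedType(t) => write!(f, "Unrecognized type `{t}`")?,
            ParseErrorKind::UnrecognizedOperator(op) => {
                write!(f, "Unrecognized operator `{op}`")?
            }
            ParseErrorKind::MalformedOffset(o) => {
                write!(f, "Malformed offset specification `{o}`")?
            }
            ParseErrorKind::OrphanedChild => {
                write!(f, "Orphaned child rule (`>` without parent)")?
            }
        }
        write!(f, " at line {}", self.line_number)
    }
}

impl std::error::Error for LineError {}
```

Keeping the line number as a separate field, rather than baking it into the message string, would let callers sort or group collected errors by location.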

### Implementation Components

1. **Line Processing Pipeline**
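   - Strip comment lines (lines starting with `#`, per the format reference above); confirm against `docs/src/magic-format.md` before treating a `#` later in a line as a comment, since values and messages may contain it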
   - Skip empty lines
   - Handle continuation lines (join lines ending with `\`)
   - Track original line numbers for error reporting

2. **Rule Level Detection**
   - Count leading `>` characters to determine hierarchy level
   - Level 0: No `>` prefix
   - Level 1: `>` prefix
   - Level 2: `>>` prefix, etc.

3. **Rule Parsing**
   - Extract offset, type, operator, value, and message from each line
   - Use existing `parse_offset`, `parse_value`, etc. from `grammar.rs`
   - Handle optional operator (default to `Operator::Equal`)
   - Parse message text (may contain escape sequences)

4. **Hierarchy Building**
   - Maintain a stack of parent rules at each level
   - Attach child rules to the appropriate parent based on level
   - Validate that child rules have valid parents
   - Return an error if the level increases by more than 1 relative to the previous rule (e.g. `>>>` directly after a level-0 rule)

5. **Special Directive Handling** (optional for v1)
   - `!:mime` - MIME type metadata
   - `!:strength` - Match strength/priority
   - `!:ext` - File extension hints
   - Store as metadata on the last parsed rule

6. **Error Handling**
   - Include line number in all error messages
   - Provide descriptive error messages (e.g., "Invalid offset specification at line 42")
   - Continue parsing after non-fatal errors (optional: collect all errors)
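
For item 5 (special directives), a possible shape for the parsed metadata is sketched below. `Directive` and `parse_directive` are hypothetical names; the sketch keeps the `!:strength` argument as raw text and assumes `!:ext` lists extensions separated by `/`, which should be checked against `docs/src/magic-format.md`.

```rust
/// Hypothetical representation of the three directives listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Directive {
    Mime(String),
    /// Raw argument text; the strength syntax is not interpreted here.
    Strength(String),
    /// Extensions, assuming a `/`-separated list.
    Ext(Vec<String>),
}

/// Returns `None` when the line is not one of the recognised directives.
fn parse_directive(line: &str) -> Option<Directive> {
    let rest = line.trim().strip_prefix("!:")?;
    // Split the directive name from its argument at the first whitespace.
    let (name, value) = match rest.split_once(char::is_whitespace) {
        Some((name, value)) => (name, value.trim()),
        None => (rest, ""),
    };
    match name {
        "mime" => Some(Directive::Mime(value.to_string())),
        "strength" => Some(Directive::Strength(value.to_string())),
        "ext" => Some(Directive::Ext(
            value
                .split('/')
                .filter(|ext| !ext.is_empty())
                .map(str::to_string)
                .collect(),
        )),
        _ => None,
    }
}
```

Attaching the result to the last parsed rule is left out here, since it depends on how `MagicRule` stores metadata.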

## Proposed Solution

### Phase 1: Basic Line Processing
```rust
// In src/parser/mod.rs

struct LineInfo {
    content: String,
    line_number: usize,
    level: u32,
}

fn preprocess_lines(input: &str) -> Result<Vec<LineInfo>, ParseError> {
    // 1. Handle continuation lines
    // 2. Strip comments
    // 3. Detect hierarchy level (count >)
    // 4. Track line numbers
}
```
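
One way the preprocessing steps could fit together is sketched below using only the standard library. It is an illustration under stated assumptions rather than the intended implementation: the function is named `preprocess_lines_sketch` to keep it distinct from the real one, it returns a plain `Vec` instead of a `Result`, it reports the line number where a continued line starts, and it treats a backslash as a continuation only when it is the very last character of a line.

```rust
#[derive(Debug, PartialEq, Eq)]
struct LineInfo {
    content: String,
    line_number: usize,
    level: u32,
}

/// Record one logical line unless it is blank or a comment.
fn push_logical(out: &mut Vec<LineInfo>, line_number: usize, text: &str) {
    let content = text.trim();
    if content.is_empty() || content.starts_with('#') {
        return;
    }
    // Hierarchy level is the number of leading `>` characters.
    let level = content.chars().take_while(|&c| c == '>').count() as u32;
    out.push(LineInfo { content: content.to_string(), line_number, level });
}

fn preprocess_lines_sketch(input: &str) -> Vec<LineInfo> {
    let mut out = Vec::new();
    // (line number where the logical line started, text joined so far)
    let mut pending: Option<(usize, String)> = None;

    for (idx, raw) in input.lines().enumerate() {
        let line_number = idx + 1;
        // A comment never starts a continuation, even if it ends with `\`.
        if pending.is_none() && raw.trim_start().starts_with('#') {
            continue;
        }
        let (text, continues) = match raw.strip_suffix('\\') {
            Some(head) => (head, true),
            None => (raw, false),
        };
        let (start, mut joined) =
            pending.take().unwrap_or_else(|| (line_number, String::new()));
        if start == line_number {
            joined.push_str(text);
        } else {
            // Drop the indentation of continuation lines.
            joined.push_str(text.trim_start());
        }
        if continues {
            pending = Some((start, joined));
        } else {
            push_logical(&mut out, start, &joined);
        }
    }
    // A trailing backslash on the last line is simply flushed here.
    if let Some((start, joined)) = pending {
        push_logical(&mut out, start, &joined);
    }
    out
}
```

The sketch does not distinguish a continuation backslash from a value that legitimately ends in an escaped backslash; the real parser would need a rule for that case.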

### Phase 2: Rule Parsing
```rust
fn parse_magic_rule_line(line: &LineInfo) -> Result<MagicRule, ParseError> {
    // Use nom combinators with existing grammar.rs functions
    // Pattern: offset  type  [operator]  value  message
}
```
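
Before handing fields to `parse_offset`, `parse_value` and friends, the line has to be cut into its columns. The sketch below shows one plain-`std` way to do that, without nom, purely to illustrate the `offset  type  [operator]  value  message` pattern. `RawRule`, `next_field` and `split_rule_line` are hypothetical names, and the sketch assumes fields are separated by unescaped whitespace and that an operator, when present, is attached to the front of the value (as in `>0`).

```rust
/// Raw, still-unparsed fields of one rule line (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
struct RawRule<'a> {
    offset: &'a str,
    type_name: &'a str,
    operator: &'a str,
    value: &'a str,
    message: &'a str,
}

/// Take the next whitespace-delimited field; return it and the remainder.
fn next_field(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    let end = s.find(char::is_whitespace).unwrap_or(s.len());
    Some((&s[..end], &s[end..]))
}

fn split_rule_line(line: &str) -> Option<RawRule<'_>> {
    // Level markers (`>`) are assumed to be counted elsewhere.
    let body = line.trim_start().trim_start_matches('>');
    let (offset, rest) = next_field(body)?;
    let (type_name, rest) = next_field(rest)?;
    let (test, rest) = next_field(rest)?;
    // Try the two-character operator first; default to `=` when none matches.
    let (operator, value) = ["!=", "=", "<", ">", "&"]
        .iter()
        .find_map(|op| test.strip_prefix(*op).map(|v| (*op, v)))
        .filter(|(_, v)| !v.is_empty())
        .unwrap_or(("=", test));
    Some(RawRule { offset, type_name, operator, value, message: rest.trim() })
}
```

String values containing escaped spaces would defeat the whitespace split, which is one reason the issue proposes nom combinators for the real implementation.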

### Phase 3: Hierarchy Construction
```rust
fn build_rule_hierarchy(lines: Vec<LineInfo>) -> Result<Vec<MagicRule>, ParseError> {
    // Stack-based approach to build parent-child relationships
    // Validate level transitions
}
```
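
The stack-based approach can be sketched as follows. To stay self-contained, the example uses a hypothetical `Node` type in place of `MagicRule`, takes `(level, line_number, message)` tuples as input, and reports errors as plain strings; all of these are assumptions made for illustration.

```rust
/// Simplified stand-in for `MagicRule` (assumption for this sketch).
#[derive(Debug, PartialEq, Eq)]
struct Node {
    message: String,
    children: Vec<Node>,
}

/// Pop the innermost open rule and attach it to its parent, or to the
/// list of top-level rules when it has no parent.
fn close_top(stack: &mut Vec<Node>, roots: &mut Vec<Node>) {
    if let Some(done) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(done),
            None => roots.push(done),
        }
    }
}

/// `items` holds (level, line_number, message) in file order.
fn build_tree(items: Vec<(u32, usize, String)>) -> Result<Vec<Node>, String> {
    let mut roots = Vec::new();
    // stack[i] is the currently open rule at level i.
    let mut stack: Vec<Node> = Vec::new();

    for (level, line_number, message) in items {
        let level = level as usize;
        if level > stack.len() {
            return Err(if stack.is_empty() {
                format!("Orphaned child rule at line {line_number}")
            } else {
                format!("Level increases by more than 1 at line {line_number}")
            });
        }
        // Close every open rule at this level or deeper.
        while stack.len() > level {
            close_top(&mut stack, &mut roots);
        }
        stack.push(Node { message, children: Vec::new() });
    }
    // Close whatever is still open at end of input.
    while !stack.is_empty() {
        close_top(&mut stack, &mut roots);
    }
    Ok(roots)
}
```

Because a rule is attached to its parent only when it is closed, children end up in file order and no parent has to be mutated through a long-lived reference.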

### Phase 4: Integration
```rust
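pub fn parse_text_magic_file(input: &str) -> Result<Vec<MagicRule>, ParseError> {
    let lines = preprocess_lines(input)?;
    // Pass the `LineInfo` records through unchanged: `build_rule_hierarchy`
    // needs each line's `level`, which would be lost if the lines were mapped
    // to `MagicRule` first. It calls `parse_magic_rule_line` on each line
    // while building the tree.
    build_rule_hierarchy(lines)
}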
```

## Testing Requirements

### Unit Tests (Required)

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_simple_rule() {
        let input = "0    string    PK\\x03\\x04    ZIP archive";
        let rules = parse_text_magic_file(input).unwrap();
        assert_eq!(rules.len(), 1);
        assert_eq!(rules[0].message, "ZIP archive");
    }

    #[test]
    fn test_parse_hierarchical_rules() {
        let input = r#"
0       string    \x7fELF         ELF
>4      byte      1               32-bit
>4      byte      2               64-bit
        "#;
        let rules = parse_text_magic_file(input).unwrap();
        assert_eq!(rules.len(), 1);
        assert_eq!(rules[0].children.len(), 2);
    }

    #[test]
    fn test_parse_comments_and_empty_lines() {
        let input = r#"
# This is a comment

0       string    test    Test file
        "#;
        let rules = parse_text_magic_file(input).unwrap();
        assert_eq!(rules.len(), 1);
    }

    #[test]
    fn test_parse_continuation_lines() {
        let input = "0    string    test    Long message \\\n        continued here";
        let rules = parse_text_magic_file(input).unwrap();
        assert!(rules[0].message.contains("continued"));
    }

    #[test]
    fn test_error_orphaned_child() {
        let input = ">4    byte    1    orphaned";
        assert!(parse_text_magic_file(input).is_err());
    }

    #[test]
    fn test_error_invalid_level_jump() {
        let input = r#"
0       string    test    Parent
>>>4    byte      1       Invalid jump
        "#;
        assert!(parse_text_magic_file(input).is_err());
    }
}
```

### Integration Tests (Recommended)

- Parse actual magic files from `third_party/tests/*.magic`
- Validate against known-good outputs
- Performance testing with large magic databases

## Acceptance Criteria

- [ ] `parse_text_magic_file` function implemented in `src/parser/mod.rs`
- [ ] Line preprocessing handles comments, empty lines, continuation lines
- [ ] Hierarchy detection based on `>` prefix works correctly
- [ ] Rule parsing integrates existing `grammar.rs` functions
- [ ] Parent-child relationships built correctly
- [ ] Error messages include line numbers
- [ ] At least 10 unit tests covering various scenarios
- [ ] All existing tests continue to pass
- [ ] Documentation updated with examples
- [ ] Code passes `cargo clippy -- -D warnings`

## Dependencies

- Existing parser components in `src/parser/grammar.rs`
- AST structures in `src/parser/ast.rs`
- Error types in `src/error.rs`

## Related Work

- Phase 1 MVP completion depends on this parser
- Unblocks evaluator implementation (next major milestone)
- Enables integration testing with real magic files

## References

- Magic file format specification: `docs/src/magic-format.md`
- Example magic files: `third_party/tests/*.magic`
- Original libmagic: https://www.darwinsys.com/file/
- File project test suite: https://github.com/file/file/tree/master/tests
