Introduction

Welcome to the libmagic-rs developer guide! This documentation provides comprehensive information about the pure-Rust implementation of libmagic, the library that powers the file command for identifying file types.

What is libmagic-rs?

libmagic-rs is a clean-room implementation of the libmagic library, written entirely in Rust. It provides:

  • Memory Safety: Pure Rust with no unsafe code (except vetted dependencies)
  • Performance: Memory-mapped I/O for efficient file processing
  • Compatibility: Support for standard magic file syntax and formats
  • Modern Design: Extensible architecture for contemporary file formats
  • Multiple Outputs: Both human-readable text and structured JSON formats

Project Status

🚀 Active Development - Core components are complete with ongoing feature additions.

What's Complete

  • Core AST Structures: Complete data model for magic rules with full serialization
  • Magic File Parser: Full text magic file parsing with hierarchical structure, comments, continuations, and parse_text_magic_file() API
  • Format Detection: Automatic detection of text files, directories (Magdir), and binary .mgc files with helpful error messages
  • Rule Evaluation Engine: Complete hierarchical evaluation with offset resolution, type interpretation, comparison operators, cross-type integer coercion, and graceful error recovery
  • Memory-Mapped I/O: FileBuffer implementation with memmap2 and comprehensive safety
  • CLI Tool (rmagic): Command-line interface with clap, text/JSON output, stdin support, magic file discovery, strict mode, timeouts, and built-in rules
  • Built-in Rules: Pre-compiled detection for common file types (ELF, PE/DOS, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, PDF) compiled at build time
  • MIME Type Mapping: Opt-in MIME type detection via enable_mime_types configuration
  • Strength Calculation: Rule priority scoring with !:strength directive support (add, subtract, multiply, divide, set)
  • Output Formatters: Text and JSON output with tag enrichment and JSON Lines for batch processing
  • Confidence Scoring: Match confidence based on rule hierarchy depth
  • Tag Extraction: Semantic tag extraction from match descriptions (e.g., "executable", "elf", "archive")
  • Timeout Protection: Configurable per-file evaluation timeouts to prevent DoS
  • Configuration Presets: performance(), comprehensive(), and default() presets with security validation
  • Project Infrastructure: Build system, strict linting, pre-commit hooks, and CI/CD
  • Extensive Test Coverage: 940+ comprehensive tests covering all modules
  • Memory Safety: Zero unsafe code with comprehensive bounds checking
  • Error Handling: Structured error types (ParseError, EvaluationError, ConfigError, FileError, Timeout) with graceful degradation
  • Code Quality: Strict clippy pedantic linting with zero-warnings policy
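The !:strength arithmetic mentioned above can be pictured with a small sketch. The five operations follow the semantics listed (add, subtract, multiply, divide, set); the enum and function here are illustrative, not the library's actual implementation:

```rust
// Sketch of the five !:strength operations applied to a rule's base score.
// Illustrative only; the library's real scoring machinery is more involved.
enum StrengthOp {
    Add(i64),
    Subtract(i64),
    Multiply(i64),
    Divide(i64),
    Set(i64),
}

fn apply_strength(base: i64, op: &StrengthOp) -> i64 {
    match op {
        StrengthOp::Add(n) => base + n,
        StrengthOp::Subtract(n) => base - n,
        StrengthOp::Multiply(n) => base * n,
        // Guard against a zero divisor rather than panicking.
        StrengthOp::Divide(n) => {
            if *n == 0 { base } else { base / n }
        }
        StrengthOp::Set(n) => *n,
    }
}
```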

Next Milestones

  • Indirect offset support (complex pointer dereferencing patterns)
  • Binary .mgc support (compiled magic database format)
  • Rule caching (pre-compiled magic database)
  • Parallel evaluation (multi-file processing)
  • Extended type support (regex, date, etc.)

Why Rust?

The choice of Rust for this implementation provides several key advantages:

  1. Memory Safety: Eliminates entire classes of security vulnerabilities
  2. Performance: Zero-cost abstractions and efficient compiled code
  3. Concurrency: Safe parallelism for processing multiple files
  4. Ecosystem: Rich crate ecosystem for parsing, I/O, and serialization
  5. Maintainability: Strong type system and excellent tooling

Architecture Overview

The library follows a clean parser-evaluator architecture:

flowchart LR
    MF[Magic File] --> P[Parser]
    P --> AST[AST]
    AST --> E[Evaluator]
    TF[Target File] --> FB[File Buffer]
    FB --> E
    E --> R[Results]
    R --> F[Formatter]

    style MF fill:#e3f2fd
    style TF fill:#e3f2fd
    style F fill:#c8e6c9

This separation allows for:

  • Independent testing of each component
  • Flexible output formatting
  • Efficient rule caching and optimization
  • Clear error handling and debugging

How to Use This Guide

This documentation is organized into five main parts:

  • Part I: User Guide - Getting started, CLI usage, and basic library integration
  • Part II: Architecture & Implementation - Deep dive into the codebase structure and components
  • Part III: Advanced Topics - Magic file formats, testing, and performance optimization
  • Part IV: Integration & Migration - Moving from libmagic and troubleshooting
  • Part V: Development & Contributing - Contributing guidelines and development setup

The appendices provide quick reference materials for commands, examples, and compatibility information.

Getting Help

  • Documentation: This comprehensive guide covers all aspects of the library
  • API Reference: Generated rustdoc for detailed API information (Appendix A)
  • Command Reference: Complete CLI documentation (Appendix B)
  • Examples: Magic file examples and patterns (Appendix C)
  • Issues: GitHub Issues for bugs and feature requests
  • Discussions: GitHub Discussions for questions and ideas

Contributing

We welcome contributions! See the CONTRIBUTING.md file in the repository root and the Development Setup guide for information on how to get started.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

This project is inspired by and respects the original libmagic implementation by Ian Darwin and the current maintainers led by Christos Zoulas. We aim to provide a modern, safe alternative while maintaining compatibility with the established magic file format.

Getting Started

This guide will help you get up and running with libmagic-rs, whether you want to use it as a CLI tool or integrate it into your Rust applications.

Installation

Prerequisites

  • Rust 1.89+ (2024 edition)
  • Git for cloning the repository
  • Cargo (comes with Rust)

From Source

Currently, libmagic-rs is only available from source as it's in early development:

# Clone the repository
git clone https://github.com/EvilBit-Labs/libmagic-rs.git
cd libmagic-rs

# Build the project
cargo build --release

# Run tests to verify installation
cargo test

The compiled binary will be available at target/release/rmagic.

Development Build

For development or testing the latest features:

# Clone and build in debug mode
git clone https://github.com/EvilBit-Labs/libmagic-rs.git
cd libmagic-rs
cargo build

# The debug binary is at target/debug/rmagic

Quick Start

CLI Usage

# Identify files using built-in rules (no external magic file needed)
./target/release/rmagic --use-builtin example.bin

# JSON output format
./target/release/rmagic --use-builtin --json example.bin

# Use a custom magic file
./target/release/rmagic --magic-file /usr/share/misc/magic example.bin

# Multiple files
./target/release/rmagic --use-builtin file1.bin file2.pdf file3.zip

# Read from stdin
echo -ne '\x7fELF' | ./target/release/rmagic --use-builtin -

# Help and options
./target/release/rmagic --help

Library Usage

Add libmagic-rs to your Cargo.toml:

[dependencies]
libmagic-rs = { git = "https://github.com/EvilBit-Labs/libmagic-rs.git" }

Basic usage with built-in rules (no external files needed):

use libmagic_rs::{LibmagicError, MagicDatabase};

fn main() -> Result<(), LibmagicError> {
    // Use built-in rules compiled into the binary
    let db = MagicDatabase::with_builtin_rules()?;

    // Evaluate a file
    let result = db.evaluate_file("example.bin")?;
    println!("File type: {}", result.description);
    println!("Confidence: {}", result.confidence);

    // Evaluate an in-memory buffer
    let buffer = b"\x7fELF\x02\x01\x01\x00";
    let result = db.evaluate_buffer(buffer)?;
    println!("Buffer type: {}", result.description);

    Ok(())
}

Project Structure

Understanding the project layout will help you navigate the codebase:

libmagic-rs/
├── Cargo.toml              # Project configuration
├── src/
│   ├── lib.rs              # Library API (MagicDatabase, EvaluationConfig, etc.)
│   ├── main.rs             # CLI implementation (rmagic binary)
│   ├── error.rs            # Error types (LibmagicError, ParseError, EvaluationError)
│   ├── parser/
│   │   ├── mod.rs          # Magic file parser entry point
│   │   ├── ast.rs          # AST data structures
│   │   ├── grammar.rs      # nom-based parsing combinators
│   │   ├── loader.rs       # File/directory loading with format detection
│   │   └── format.rs       # Magic file format detection
│   ├── evaluator/
│   │   ├── mod.rs          # Evaluation engine
│   │   ├── offset.rs       # Offset resolution
│   │   ├── operators.rs    # Comparison operators with cross-type coercion
│   │   └── types.rs        # Type interpretation with endianness
│   ├── output/
│   │   ├── mod.rs          # Output types and conversion
│   │   └── json.rs         # JSON/JSON Lines formatting
│   ├── io/
│   │   └── mod.rs          # Memory-mapped I/O (FileBuffer)
│   ├── mime.rs             # MIME type mapping
│   ├── tags.rs             # Semantic tag extraction
│   └── builtin_rules.rs    # Pre-compiled magic rules
├── tests/                  # Integration tests
├── third_party/            # Canonical libmagic tests and magic files
└── docs/                   # This documentation

Development Setup

If you want to contribute or modify the library:

1. Clone and Setup

git clone https://github.com/EvilBit-Labs/libmagic-rs.git
cd libmagic-rs

# Install development dependencies
cargo install cargo-nextest  # Faster test runner
cargo install cargo-watch    # Auto-rebuild on changes

2. Development Workflow

# Check code without building
cargo check

# Run tests (fast)
cargo nextest run

# Run tests with coverage
cargo test

# Format code
cargo fmt

# Lint code (strict mode)
cargo clippy -- -D warnings

# Build documentation
cargo doc --open

3. Continuous Development

# Auto-rebuild and test on file changes
cargo watch -x check -x test

# Auto-run specific tests
cargo watch -x "test ast_structures"

Current Capabilities

  • AST Data Structures: Complete implementation with full serialization
  • Magic File Parser: nom-based parser for magic file DSL with hierarchical rules
  • Rule Evaluator: Engine for executing rules against files with graceful error handling
  • Memory-Mapped I/O: Efficient file access with comprehensive bounds checking
  • CLI Tool (rmagic): Full-featured CLI with text/JSON output, stdin, timeouts, and built-in rules
  • Built-in Rules: Pre-compiled detection for common file types (ELF, ZIP, PDF, JPEG, PNG, etc.)
  • MIME Type Mapping: Opt-in MIME type detection
  • Output Formatters: Text and JSON output with tag enrichment
  • Strength Calculation: Rule priority scoring with !:strength directives
  • Confidence Scoring: Match confidence based on rule hierarchy depth
  • Timeout Protection: Configurable per-file evaluation timeouts
  • Build System: Cargo configuration with strict clippy pedantic linting
  • Testing: 940+ comprehensive tests across all modules
  • Documentation: This guide, API documentation, and architecture docs

Example Magic Rules

You can parse magic rules from text or work with AST structures directly:

Parsing Magic Files

#![allow(unused)]
fn main() {
use libmagic_rs::parser::parse_text_magic_file;

// Parse a simple magic file
let magic_content = r#"
# ELF file format
0 string \x7fELF ELF executable
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;

let rules = parse_text_magic_file(magic_content)?;
assert_eq!(rules.len(), 1);
assert_eq!(rules[0].children.len(), 2);
Ok::<(), Box<dyn std::error::Error>>(())
}

Working with AST Directly

#![allow(unused)]
fn main() {
use libmagic_rs::parser::ast::*;

// Create a simple ELF detection rule
let elf_rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::Byte,
    op: Operator::Equal,
    value: Value::Uint(0x7f), // First byte of ELF magic
    message: "ELF executable".to_string(),
    children: vec![],
    level: 0,
};

// Serialize to JSON for inspection
let json = serde_json::to_string_pretty(&elf_rule)?;
println!("{}", json);
Ok::<(), Box<dyn std::error::Error>>(())
}

Evaluating Rules

#![allow(unused)]
fn main() {
use libmagic_rs::evaluator::{evaluate_rules_with_config, EvaluationContext};
use libmagic_rs::parser::ast::*;
use libmagic_rs::EvaluationConfig;

// Create a rule to detect ELF files
let rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::Byte,
    op: Operator::Equal,
    value: Value::Uint(0x7f),
    message: "ELF magic".to_string(),
    children: vec![],
    level: 0,
};

// Evaluate against a buffer
let buffer = &[0x7f, 0x45, 0x4c, 0x46]; // ELF magic bytes
let config = EvaluationConfig::default();
let matches = evaluate_rules_with_config(&[rule], buffer, &config)?;

assert_eq!(matches.len(), 1);
assert_eq!(matches[0].message, "ELF magic");
Ok::<(), Box<dyn std::error::Error>>(())
}

Testing Your Setup

Verify everything is working correctly:

# Run all tests
cargo test

# Run specific AST tests
cargo test ast_structures

# Check code quality
cargo clippy -- -D warnings

# Verify documentation builds
cargo doc

# Test CLI
cargo run -- README.md

Next Steps

  1. Explore the AST: Check out AST Data Structures to understand the core types
  2. Read the Architecture: See Architecture Overview for the big picture
  3. Follow Development: Watch the GitHub repository for updates
  4. Contribute: See Development Setup for contribution guidelines

Getting Help

The project is in active development, so check back regularly for new features and capabilities!

CLI Usage

The rmagic command-line tool identifies file types using magic rules, serving as a pure-Rust alternative to the GNU file command.

Basic Usage

# Identify a single file
rmagic document.pdf

# Identify multiple files
rmagic file1.bin file2.exe file3.pdf

# Read from stdin
cat unknown.bin | rmagic -

# Use built-in rules (no external magic file required)
rmagic --use-builtin archive.tar.gz

# Get help
rmagic --help

Arguments and Flags

Positional Arguments

| Argument | Description |
|----------|-------------|
| FILE... | One or more files to analyze (required). Use - to read from stdin. |

Output Format Flags

| Flag | Description |
|------|-------------|
| --text | Output results in text format. This is the default. |
| -j, --json | Output results in JSON format. Conflicts with --text. |

These two flags are mutually exclusive. Passing both --json and --text produces an error.

Magic File Flags

| Flag | Description |
|------|-------------|
| -m, --magic-file FILE | Use a custom magic file instead of the system default. |
| -b, --use-builtin | Use built-in magic rules compiled into the binary. Mutually exclusive with --magic-file. |

The built-in rules cover common file types: ELF, PE/DOS, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, and PDF. They are compiled at build time and require no external files.
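Under the hood, these are ordinary magic-byte checks against the start of the file. A minimal sketch with a few well-known signatures (the actual built-in rules check more structure than bare prefixes):

```rust
// A few well-known magic-byte signatures covered by the built-in set.
// Illustrative only; the real rules are hierarchical and far more detailed.
fn sniff(prefix: &[u8]) -> Option<&'static str> {
    match prefix {
        [0x7f, b'E', b'L', b'F', ..] => Some("ELF"),
        [0x89, b'P', b'N', b'G', ..] => Some("PNG image data"),
        [b'P', b'K', 0x03, 0x04, ..] => Some("Zip archive data"),
        [0x1f, 0x8b, ..] => Some("gzip compressed data"),
        [b'%', b'P', b'D', b'F', ..] => Some("PDF document"),
        _ => None,
    }
}
```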

Behavior Flags

| Flag | Description |
|------|-------------|
| -s, --strict | Exit with a non-zero code on processing failures (I/O, parse, or evaluation errors). A "data" result (unknown file type) is not considered an error. |
| -t, --timeout-ms MS | Per-file evaluation timeout in milliseconds. Valid range: 1–300000 (5 minutes). |

Output Formats

Text Output (Default)

Text output prints one line per file in the format filename: description:

$ rmagic image.png document.pdf
image.png: PNG image data
document.pdf: PDF document

When a file type cannot be determined, the description is data:

$ rmagic unknown.bin
unknown.bin: data

JSON Output

JSON output varies based on the number of files being analyzed.

Single file – pretty-printed JSON with a matches array:

rmagic --json image.png
{
  "matches": [
    {
      "description": "PNG image data",
      "offset": 0,
      "tags": [
        "image",
        "png"
      ],
      "mime_type": "image/png",
      "score": 90
    }
  ]
}
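The tags array comes from keyword extraction over the match description. A minimal sketch of the idea, with a hypothetical keyword list (the library's actual keyword set is larger):

```rust
// Hypothetical keyword-based tag extraction from a match description.
// The keyword list here is illustrative, not the library's real table.
fn extract_tags(description: &str) -> Vec<&'static str> {
    const KEYWORDS: &[&str] = &["elf", "executable", "archive", "image", "png", "pdf"];
    let lower = description.to_lowercase();
    KEYWORDS
        .iter()
        .copied()
        .filter(|k| lower.contains(*k))
        .collect()
}
```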

Multiple files – JSON Lines format with one compact JSON object per line:

$ rmagic --json file1.bin file2.txt
{"filename":"file1.bin","matches":[...]}
{"filename":"file2.txt","matches":[...]}

Each line is a self-contained JSON object, making it straightforward to parse with line-oriented tools such as jq.

Stdin Support

Use - as the filename to read input from stdin:

cat sample.bin | rmagic -

Stdin input is truncated to the configured max_string_length (8192 bytes by default). When truncation occurs, a warning is printed to stderr:

Warning: stdin input truncated to 8192 bytes

Output for stdin uses stdin as the filename:

$ echo "hello" | rmagic -
stdin: data

Stdin can be combined with regular file arguments:

rmagic --use-builtin file1.bin - file2.txt < input.dat

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error (evaluation failure, configuration error) |
| 2 | Invalid arguments (bad command-line usage) |
| 3 | File not found or access denied |
| 4 | Magic file not found or invalid |
| 5 | Evaluation timeout |

Strict Mode and Exit Codes

Without --strict, processing errors for individual files are printed to stderr but do not affect the exit code. The tool continues processing the remaining files and exits 0 even when some of them fail.

With --strict, the first processing error (I/O, parse, or evaluation) causes a non-zero exit code. The tool still processes all files and prints errors as they occur, but returns the exit code corresponding to the first error.

A "data" result (unknown file type) is never treated as an error, even in strict mode.
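The first-error policy can be sketched as a fold over per-file results; the function below is illustrative, with exit codes per the table above:

```rust
// Sketch of strict-mode exit-code selection: all files are still processed,
// but the first failure's code becomes the process exit code.
fn final_exit_code(per_file: &[Result<(), i32>], strict: bool) -> i32 {
    if !strict {
        return 0; // failures were reported on stderr but do not change the code
    }
    per_file
        .iter()
        .find_map(|r| match r {
            Err(code) => Some(*code),
            Ok(()) => None,
        })
        .unwrap_or(0)
}
```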

# Without strict: exits 0 even if some files fail
$ rmagic file1.bin nonexistent.bin file2.txt
file1.bin: data
Error processing nonexistent.bin: ...
file2.txt: data
$ echo $?
0

# With strict: exits with error code from first failure
$ rmagic --strict file1.bin nonexistent.bin file2.txt
file1.bin: data
Error processing nonexistent.bin: ...
file2.txt: data
$ echo $?
3

Magic File Discovery

When --use-builtin is not specified and no --magic-file is provided, rmagic searches for a magic file in standard system locations. The search follows an OpenBSD-inspired approach, preferring human-readable text files over compiled binary .mgc files.

Search Order (Unix)

Text directories and files are checked first. If a text-format file or directory is found, it is used immediately. If only binary .mgc files exist, the first one found is used as a fallback.

| Priority | Path | Format |
|----------|------|--------|
| 1 | /usr/share/file/magic/Magdir | Text directory |
| 2 | /usr/share/file/magic | Text directory/file |
| 3 | /usr/share/misc/magic | Text file |
| 4 | /usr/local/share/misc/magic | Text file |
| 5 | /etc/magic | Text file |
| 6 | /opt/local/share/file/magic | Text file |
| 7 | /usr/share/file/magic.mgc | Binary |
| 8 | /usr/local/share/misc/magic.mgc | Binary |
| 9 | /opt/local/share/file/magic.mgc | Binary |
| 10 | /etc/magic.mgc | Binary |
| 11 | /usr/share/misc/magic.mgc | Binary |

If none of these paths exist, rmagic falls back to /usr/share/file/magic.mgc.
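The search logic amounts to a text-first scan over the table above. A sketch with the filesystem check injected so the ordering is explicit (paths mirror the table; the function itself is illustrative, not the library's code):

```rust
use std::path::PathBuf;

// Text-first magic file discovery; `exists` is injected so the logic can be
// demonstrated without touching the real filesystem.
fn discover_magic_file(exists: impl Fn(&str) -> bool) -> PathBuf {
    const TEXT: &[&str] = &[
        "/usr/share/file/magic/Magdir",
        "/usr/share/file/magic",
        "/usr/share/misc/magic",
        "/usr/local/share/misc/magic",
        "/etc/magic",
        "/opt/local/share/file/magic",
    ];
    const BINARY: &[&str] = &[
        "/usr/share/file/magic.mgc",
        "/usr/local/share/misc/magic.mgc",
        "/opt/local/share/file/magic.mgc",
        "/etc/magic.mgc",
        "/usr/share/misc/magic.mgc",
    ];
    // Text candidates are always checked before any binary .mgc fallback.
    match TEXT.iter().chain(BINARY.iter()).copied().find(|&p| exists(p)) {
        Some(p) => PathBuf::from(p),
        None => PathBuf::from("/usr/share/file/magic.mgc"), // final fallback
    }
}
```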

Windows

On Windows, the tool checks %APPDATA%\Magic\magic first, then falls back to the bundled third_party/magic.mgc.

Timeout Configuration

The --timeout-ms flag sets a per-file timeout for magic rule evaluation. Each file gets its own independent timeout window. If evaluation exceeds the specified duration, the file is skipped with an error.

# Set a 500ms timeout per file
$ rmagic --timeout-ms 500 large_file.bin

# Combine with strict mode to fail on timeout
$ rmagic --strict --timeout-ms 1000 *.bin

Valid values range from 1 to 300000 (5 minutes).
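Conceptually, each file's evaluation runs against its own fresh deadline. A generic sketch of that pattern (not the library's internals):

```rust
use std::time::{Duration, Instant};

// Per-file deadline sketch: each call gets a fresh Instant-based deadline and
// the work loop aborts once it is exceeded. Structure is illustrative only.
fn run_with_deadline<T>(
    timeout: Duration,
    mut step: impl FnMut() -> Option<T>,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if Instant::now() >= deadline {
            return Err("evaluation timed out"); // maps to LibmagicError::Timeout
        }
        if let Some(value) = step() {
            return Ok(value);
        }
    }
}
```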

Multiple File Processing

When multiple files are provided, each file is processed sequentially with independent error handling. A failure in one file does not prevent processing of subsequent files.

$ rmagic --use-builtin image.png archive.zip README.md
image.png: PNG image data
archive.zip: Zip archive data
README.md: data

Errors for individual files are printed to stderr with the filename for context:

$ rmagic --use-builtin good.png /nonexistent bad_perms.bin
good.png: PNG image data
Error processing /nonexistent: ...
Error processing bad_perms.bin: ...

Examples

Identify files with built-in rules

$ rmagic --use-builtin photo.jpg
photo.jpg: JPEG image data, JFIF standard

JSON output for scripting

$ rmagic --use-builtin --json binary.elf | jq '.matches[0].mime_type'
"application/x-executable"

Process files from a directory listing

ls *.bin | xargs rmagic --use-builtin --strict

Custom magic file

$ rmagic --magic-file /path/to/custom.magic firmware.img
firmware.img: ARM firmware image

Pipeline with stdin

$ curl -sL https://example.com/file | rmagic --use-builtin -
stdin: Zip archive data

Strict mode in CI

#!/bin/bash
rmagic --use-builtin --strict --json artifacts/*.bin
if [ $? -ne 0 ]; then
    echo "File identification failed" >&2
    exit 1
fi

Library API

The libmagic_rs crate provides a safe, efficient Rust API for file type identification through magic rule evaluation.

Quick Start

The fastest way to get started is with built-in rules, which require no external files:

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;

let db = MagicDatabase::with_builtin_rules()?;
let result = db.evaluate_file("sample.bin")?;
println!("File type: {}", result.description);
println!("Confidence: {:.0}%", result.confidence * 100.0);
Ok::<(), Box<dyn std::error::Error>>(())
}

Built-in rules are compiled into the binary at build time and detect common file types including ELF, PE/DOS, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, and PDF.

MagicDatabase

MagicDatabase is the main entry point. It holds parsed rules, an evaluation configuration, and a cached MIME mapper. Four constructors are available:

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

// Built-in rules with default config
let db = MagicDatabase::with_builtin_rules()?;

// Built-in rules with custom config
let db = MagicDatabase::with_builtin_rules_and_config(EvaluationConfig::performance())?;

// Load from a file or directory (auto-detects format)
let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;

// Load from a file or directory with custom config
let config = EvaluationConfig::comprehensive();
let db = MagicDatabase::load_from_file_with_config("/usr/share/misc/magic.d", config)?;
Ok::<(), Box<dyn std::error::Error>>(())
}

When a directory path is given, all magic files within it are loaded (the Magdir pattern). Binary .mgc files are not supported; the library returns a descriptive error if one is encountered.

Evaluation

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;

let db = MagicDatabase::with_builtin_rules()?;

// Evaluate a file on disk (uses memory-mapped I/O internally)
let result = db.evaluate_file("document.pdf")?;
println!("{}", result.description);

// Evaluate an in-memory buffer (useful for stdin or pre-loaded data)
let result = db.evaluate_buffer(b"\x7fELF\x02\x01\x01\x00")?;
println!("{}", result.description);
Ok::<(), Box<dyn std::error::Error>>(())
}

When no rules match, the description defaults to "data" with confidence 0.0.

Accessors

  • config() -> &EvaluationConfig – returns the active configuration.
  • source_path() -> Option<&Path> – returns the path rules were loaded from, or None for built-in rules.

EvaluationConfig

Controls evaluation behavior with these fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| max_recursion_depth | u32 | 20 | Maximum depth for nested rule evaluation |
| max_string_length | usize | 8192 | Maximum bytes read for string types |
| stop_at_first_match | bool | true | Stop after the first matching rule |
| enable_mime_types | bool | false | Map descriptions to MIME types |
| timeout_ms | Option<u64> | None | Evaluation timeout in milliseconds |

Presets

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

// Default values (same as EvaluationConfig::default())
let default = EvaluationConfig::new();

// Speed: lower limits, 1s timeout, stop at first match
let fast = EvaluationConfig::performance();
assert_eq!(fast.max_recursion_depth, 10);
assert_eq!(fast.timeout_ms, Some(1000));

// Completeness: higher limits, MIME enabled, find all matches, 30s timeout
let full = EvaluationConfig::comprehensive();
assert!(!full.stop_at_first_match);
assert!(full.enable_mime_types);
}

Custom Configuration

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

let config = EvaluationConfig {
    max_recursion_depth: 30,
    max_string_length: 16384,
    stop_at_first_match: false,
    enable_mime_types: true,
    timeout_ms: Some(5000),
};
}

Security Validation

Call validate() to check that values are within safe bounds. All MagicDatabase constructors call validate() automatically.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

let config = EvaluationConfig::default();
assert!(config.validate().is_ok());

let bad = EvaluationConfig { max_recursion_depth: 0, ..EvaluationConfig::default() };
assert!(bad.validate().is_err());
}

Enforced limits:

  • max_recursion_depth: 1–1000 (prevents stack overflow)
  • max_string_length: 1–1,048,576 bytes (prevents memory exhaustion)
  • timeout_ms: if set, 1–300,000 ms (prevents denial of service)
  • High recursion (>100) combined with large strings (>65,536) is rejected
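Restated as a standalone check (illustrative; the library performs these checks inside validate()):

```rust
// Standalone restatement of the validation bounds above. Illustrative only;
// the real checks live in EvaluationConfig::validate().
fn validate_bounds(
    max_recursion_depth: u32,
    max_string_length: usize,
    timeout_ms: Option<u64>,
) -> Result<(), String> {
    if !(1..=1000).contains(&max_recursion_depth) {
        return Err("max_recursion_depth must be in 1..=1000".into());
    }
    if !(1..=1_048_576).contains(&max_string_length) {
        return Err("max_string_length must be in 1..=1_048_576".into());
    }
    if let Some(t) = timeout_ms {
        if !(1..=300_000).contains(&t) {
            return Err("timeout_ms must be in 1..=300_000".into());
        }
    }
    // Combination check: deep recursion plus large strings is rejected.
    if max_recursion_depth > 100 && max_string_length > 65_536 {
        return Err("high recursion combined with large strings is rejected".into());
    }
    Ok(())
}
```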

EvaluationResult

pub struct EvaluationResult {
    pub description: String,          // Human-readable file type (or "data" if unknown)
    pub mime_type: Option<String>,    // MIME type (only when enable_mime_types is true)
    pub confidence: f64,              // 0.0 to 1.0, based on match depth
    pub matches: Vec<RuleMatch>,      // Individual rule matches with offset/level/value
    pub metadata: EvaluationMetadata, // Diagnostics (timing, file size, etc.)
}
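The depth-based confidence can be pictured with a toy scaling function. The constants below are hypothetical, chosen only to illustrate that deeper corroborating child matches push confidence toward 1.0; the library's exact formula may differ:

```rust
// Toy depth-to-confidence scaling. Hypothetical constants; illustrates the
// monotone relationship only, not the library's actual formula.
fn confidence_for_depth(deepest_level: u32) -> f64 {
    (0.5 + 0.1 * f64::from(deepest_level)).min(1.0)
}
```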

Working with Results

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let config = EvaluationConfig {
    enable_mime_types: true,
    stop_at_first_match: false,
    ..EvaluationConfig::default()
};
let db = MagicDatabase::with_builtin_rules_and_config(config)?;
let result = db.evaluate_file("photo.jpg")?;

println!("Description: {}", result.description);
if let Some(ref mime) = result.mime_type {
    println!("MIME type: {}", mime);
}
println!("Confidence: {:.1}%", result.confidence * 100.0);
for m in &result.matches {
    println!("  offset={}, level={}, message={}", m.offset, m.level, m.message);
}
Ok::<(), Box<dyn std::error::Error>>(())
}

EvaluationMetadata

pub struct EvaluationMetadata {
    pub file_size: u64,              // Size of evaluated file/buffer in bytes
    pub evaluation_time_ms: f64,     // Wall-clock evaluation time
    pub rules_evaluated: usize,      // Number of top-level rules in the database
    pub magic_file: Option<PathBuf>, // Source path, or None for built-in rules
    pub timed_out: bool,             // Whether evaluation hit the timeout
}
#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let config = EvaluationConfig { timeout_ms: Some(2000), ..EvaluationConfig::default() };
let db = MagicDatabase::with_builtin_rules_and_config(config)?;
let result = db.evaluate_buffer(b"\x89PNG\r\n\x1a\n")?;

let meta = &result.metadata;
println!("Size: {} bytes, Time: {:.3} ms, Timed out: {}",
         meta.file_size, meta.evaluation_time_ms, meta.timed_out);
Ok::<(), Box<dyn std::error::Error>>(())
}

MIME Type Mapping

MIME type detection is opt-in via enable_mime_types. When enabled, descriptions are mapped to standard MIME types using an internal lookup table.

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let config = EvaluationConfig { enable_mime_types: true, ..EvaluationConfig::default() };
let db = MagicDatabase::with_builtin_rules_and_config(config)?;
let result = db.evaluate_buffer(b"%PDF-1.4")?;

if let Some(mime) = &result.mime_type {
    println!("MIME: {}", mime); // e.g., "application/pdf"
}
Ok::<(), Box<dyn std::error::Error>>(())
}

When enable_mime_types is false (the default), mime_type is always None.
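Conceptually the mapping is a description-to-MIME lookup. A tiny sketch with a few standard mappings (the library's internal table is much larger, and its matching logic is its own):

```rust
// Tiny description-to-MIME sketch; the MIME strings are standard IANA types,
// but the lookup itself is illustrative, not the library's real table.
fn mime_for(description: &str) -> Option<&'static str> {
    let d = description.to_lowercase();
    if d.contains("pdf") {
        Some("application/pdf")
    } else if d.contains("png") {
        Some("image/png")
    } else if d.contains("jpeg") {
        Some("image/jpeg")
    } else if d.contains("elf") {
        Some("application/x-executable")
    } else {
        None // mime_type stays None when no mapping applies
    }
}
```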

Error Handling

All fallible operations return Result<T, LibmagicError> (aliased as libmagic_rs::Result<T>).

LibmagicError Variants

| Variant | When it occurs |
|---------|----------------|
| ParseError(ParseError) | Invalid magic file syntax during loading |
| EvaluationError(EvaluationError) | Rule evaluation failure (buffer overrun, unsupported type) |
| IoError(std::io::Error) | File system errors (not found, permission denied) |
| Timeout { timeout_ms } | Evaluation exceeded the configured timeout |
| ConfigError { reason } | Invalid configuration values |
| FileError(String) | Structured file I/O error with path and operation context |

Matching on Errors

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, LibmagicError};

let db = MagicDatabase::with_builtin_rules()?;
match db.evaluate_file("missing.bin") {
    Ok(result) => println!("Type: {}", result.description),
    Err(LibmagicError::IoError(e)) => eprintln!("File error: {}", e),
    Err(LibmagicError::Timeout { timeout_ms }) => {
        eprintln!("Timed out after {} ms", timeout_ms);
    }
    Err(LibmagicError::EvaluationError(e)) => eprintln!("Evaluation failed: {}", e),
    Err(e) => eprintln!("Error: {}", e),
}
Ok::<(), Box<dyn std::error::Error>>(())
}

Validating Configuration Early

#![allow(unused)]
fn main() {
use libmagic_rs::{EvaluationConfig, LibmagicError};

let config = EvaluationConfig { max_recursion_depth: 5000, ..EvaluationConfig::default() };
match config.validate() {
    Ok(()) => println!("Config is valid"),
    Err(LibmagicError::ConfigError { reason }) => eprintln!("Bad config: {}", reason),
    Err(e) => eprintln!("Unexpected: {}", e),
}
}

Reading from Standard Input

Use evaluate_buffer to process data piped through stdin:

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;
use std::io::Read;

let db = MagicDatabase::with_builtin_rules()?;
let mut buffer = Vec::new();
std::io::stdin().read_to_end(&mut buffer)?;
println!("{}", db.evaluate_buffer(&buffer)?.description);
Ok::<(), Box<dyn std::error::Error>>(())
}

Evaluating Multiple Files

A single MagicDatabase can evaluate any number of files. Rules are parsed once and reused:

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;

let db = MagicDatabase::with_builtin_rules()?;
for path in &["image.png", "archive.tar.gz", "binary.elf"] {
    match db.evaluate_file(path) {
        Ok(result) => println!("{}: {}", path, result.description),
        Err(e) => eprintln!("{}: error: {}", path, e),
    }
}
Ok::<(), Box<dyn std::error::Error>>(())
}

Configuration

libmagic-rs provides a single configuration struct, EvaluationConfig, that controls how magic rules are evaluated. All fields have safe defaults, and the validate() method enforces security bounds before any evaluation begins.

EvaluationConfig

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

pub struct EvaluationConfig {
    pub max_recursion_depth: u32,       // Default: 20, max: 1000
    pub max_string_length: usize,       // Default: 8192, max: 1MB (1_048_576)
    pub stop_at_first_match: bool,      // Default: true
    pub enable_mime_types: bool,        // Default: false
    pub timeout_ms: Option<u64>,        // Default: None, max: 300_000 (5 min)
}
}

Field Reference

| Field | Type | Default | Bounds | Purpose |
|-------|------|---------|--------|---------|
| max_recursion_depth | u32 | 20 | 1–1000 | Limits nested rule traversal depth to prevent stack overflow |
| max_string_length | usize | 8192 | 1–1_048_576 | Caps bytes read for string types to prevent memory exhaustion |
| stop_at_first_match | bool | true | – | When true, evaluation stops after the first matching rule |
| enable_mime_types | bool | false | – | When true, maps file type descriptions to standard MIME types |
| timeout_ms | Option<u64> | None | 1–300_000 | Per-file evaluation timeout in milliseconds; None disables |

Constructor Presets

EvaluationConfig::new() / EvaluationConfig::default()

Returns balanced defaults suitable for most workloads:

#![allow(unused)]
fn main() {
let config = EvaluationConfig::new();

assert_eq!(config.max_recursion_depth, 20);
assert_eq!(config.max_string_length, 8192);
assert!(config.stop_at_first_match);
assert!(!config.enable_mime_types);
assert_eq!(config.timeout_ms, None);
}

EvaluationConfig::performance()

Optimized for high-throughput scenarios where speed matters more than completeness:

#![allow(unused)]
fn main() {
let config = EvaluationConfig::performance();

assert_eq!(config.max_recursion_depth, 10);
assert_eq!(config.max_string_length, 1024);
assert!(config.stop_at_first_match);
assert!(!config.enable_mime_types);
assert_eq!(config.timeout_ms, Some(1000)); // 1 second
}

EvaluationConfig::comprehensive()

Finds all matches with deep analysis, MIME type mapping, and a generous timeout:

#![allow(unused)]
fn main() {
let config = EvaluationConfig::comprehensive();

assert_eq!(config.max_recursion_depth, 50);
assert_eq!(config.max_string_length, 32768);
assert!(!config.stop_at_first_match);
assert!(config.enable_mime_types);
assert_eq!(config.timeout_ms, Some(30000)); // 30 seconds
}

Custom Configuration

Use struct update syntax to override individual fields from any preset:

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

let config = EvaluationConfig {
    max_recursion_depth: 30,
    enable_mime_types: true,
    timeout_ms: Some(5000),
    ..EvaluationConfig::default()
};
}

Validation

Call validate() to check that all values fall within safe bounds. The MagicDatabase constructors call validate() automatically, so you only need to call it explicitly when creating a config that will be stored for later use.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;

let config = EvaluationConfig::default();
assert!(config.validate().is_ok());

let bad = EvaluationConfig {
    max_recursion_depth: 0,
    ..EvaluationConfig::default()
};
assert!(bad.validate().is_err());
}

Validation Rules

The validate() method enforces four categories of security constraints:

Recursion depth – must be between 1 and 1000. A value of 0 is rejected because evaluation cannot proceed without at least one level. Values above 1000 risk stack overflow.

String length – must be between 1 and 1_048_576 (1 MB). A value of 0 is rejected because no string matching could occur. Values above 1 MB risk memory exhaustion.

Timeout – if Some, must be between 1 and 300_000 (5 minutes). A value of 0 is rejected as meaningless. Values above 5 minutes risk denial-of-service. None (no timeout) is always accepted.

Resource combination – a recursion depth above 100 combined with a string length above 65_536 is rejected. Deep recursion with large string reads at every level can compound into excessive resource consumption even when each value individually falls within safe bounds.
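The four checks above can be sketched as a standalone function. This is an illustrative re-implementation against a simplified config struct, not the library's actual code (the real method is `EvaluationConfig::validate()`):

```rust
// Hypothetical sketch of the documented validation bounds.
struct Config {
    max_recursion_depth: u32,
    max_string_length: usize,
    timeout_ms: Option<u64>,
}

fn validate(c: &Config) -> Result<(), String> {
    // Recursion depth: 1..=1000
    if c.max_recursion_depth == 0 || c.max_recursion_depth > 1000 {
        return Err("recursion depth out of bounds".into());
    }
    // String length: 1..=1_048_576 (1 MB)
    if c.max_string_length == 0 || c.max_string_length > 1_048_576 {
        return Err("string length out of bounds".into());
    }
    // Timeout: if set, 1..=300_000 ms; None is always accepted
    if let Some(t) = c.timeout_ms {
        if t == 0 || t > 300_000 {
            return Err("timeout out of bounds".into());
        }
    }
    // Resource combination: deep recursion plus large strings compounds
    if c.max_recursion_depth > 100 && c.max_string_length > 65_536 {
        return Err("recursion/string combination too expensive".into());
    }
    Ok(())
}

fn main() {
    let ok = Config { max_recursion_depth: 20, max_string_length: 8192, timeout_ms: None };
    assert!(validate(&ok).is_ok());

    let zero_depth = Config { max_recursion_depth: 0, max_string_length: 8192, timeout_ms: None };
    assert!(validate(&zero_depth).is_err());

    // Both values individually valid, but the combination is rejected.
    let combo = Config { max_recursion_depth: 200, max_string_length: 100_000, timeout_ms: None };
    assert!(validate(&combo).is_err());
}
```

Note that the combination check fires even though each field alone would pass, which is why validation runs on the whole struct rather than per field.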

Using Configuration with MagicDatabase

Built-in Rules with Default Config

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;

let db = MagicDatabase::with_builtin_rules()?;
let result = db.evaluate_buffer(b"\x7fELF")?;
}

Built-in Rules with Custom Config

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let config = EvaluationConfig {
    timeout_ms: Some(5000),
    ..EvaluationConfig::default()
};
let db = MagicDatabase::with_builtin_rules_and_config(config)?;
}

Magic File with Custom Config

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let config = EvaluationConfig::performance();
let db = MagicDatabase::load_from_file_with_config("custom.magic", config)?;
}

All three constructors call config.validate() internally and return an error if the configuration is invalid. There is no way to create a MagicDatabase with an invalid configuration.

CLI Usage

The command-line interface exposes the timeout_ms field via the --timeout-ms flag. All other configuration values use their defaults when running from the CLI.

# No timeout (default)
rmagic sample.bin

# 5-second timeout per file
rmagic --timeout-ms 5000 sample.bin

If evaluation exceeds the timeout, the file is skipped and an error message is printed to stderr with exit code 5.

Choosing a Preset

| Scenario | Preset | Why |
|----------|--------|-----|
| General file identification | default() | Balanced depth and limits |
| Batch processing many files | performance() | Low limits, 1s timeout, early exit |
| Forensic analysis | comprehensive() | Deep traversal, all matches, MIME types |
| Untrusted input | performance() | Tight bounds reduce attack surface |
| Custom requirements | Struct update syntax | Override specific fields from any preset |

Architecture Overview

The libmagic-rs library is designed around a clean separation of concerns, following a parser-evaluator architecture that promotes maintainability, testability, and performance.

High-Level Architecture

flowchart LR
    subgraph Input
        MF[Magic File]
        TF[Target File]
    end

    subgraph Processing
        P[Parser]
        AST[AST]
        FB[File Buffer]
        E[Evaluator]
    end

    subgraph Output
        R[Results]
        F[Formatter]
        O[Output]
    end

    MF --> P --> AST --> E
    TF --> FB --> E
    E --> R --> F --> O

    style MF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
    style TF fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
    style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style AST fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style FB fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style R fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style O fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0

Core Components

1. Parser Module (src/parser/)

The parser is responsible for converting magic files (text-based DSL) into an Abstract Syntax Tree (AST).

Key Files:

  • ast.rs: Core data structures representing magic rules (βœ… Complete)
  • grammar.rs: nom-based parsing components for magic file syntax (βœ… Complete)
  • mod.rs: Parser interface, format detection, and hierarchical rule building (βœ… Complete)

Responsibilities:

  • Parse magic file syntax into structured data (βœ… Complete)
  • Handle hierarchical rule relationships (βœ… Complete)
  • Validate syntax and report meaningful errors (βœ… Complete)
  • Detect file format (text, directory, binary) (βœ… Complete)
  • Support incremental parsing for large magic databases (πŸ“‹ Planned)

Current Implementation Status:

  • βœ… Number parsing: Decimal and hexadecimal with overflow protection
  • βœ… Offset parsing: Absolute offsets with comprehensive validation
  • βœ… Operator parsing: Equality (=, ==), inequality (!=, <>), comparison (<, >, <=, >=), bitwise (&, ^, ~), and any-value (x) operators
  • βœ… Value parsing: Strings, numbers, and hex byte sequences with escape sequences
  • βœ… Error handling: Comprehensive nom error handling with meaningful messages
  • βœ… Rule parsing: Complete rule parsing via parse_magic_rule()
  • βœ… File parsing: Complete magic file parsing with parse_text_magic_file()
  • βœ… Hierarchy building: Parent-child relationships via build_rule_hierarchy()
  • βœ… Format detection: Text, directory, and binary format detection
  • πŸ“‹ Indirect offsets: Pointer dereferencing patterns

2. AST Data Structures (src/parser/ast.rs)

The AST provides a complete representation of magic rules in memory.

Core Types:

#![allow(unused)]
fn main() {
pub struct MagicRule {
    pub offset: OffsetSpec,       // Where to read data
    pub typ: TypeKind,            // How to interpret bytes
    pub op: Operator,             // Comparison operation
    pub value: Value,             // Expected value
    pub message: String,          // Human-readable description
    pub children: Vec<MagicRule>, // Nested rules
    pub level: u32,               // Indentation level
}

pub enum TypeKind {
    Byte { signed: bool },        // Single byte with explicit signedness
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    String { max_length: Option<usize> },
    PString { max_length: Option<usize> }, // Pascal string (length-prefixed)
}

pub enum Operator {
    Equal,                        // = or ==
    NotEqual,                     // != or <>
    LessThan,                     // <
    GreaterThan,                  // >
    LessEqual,                    // <=
    GreaterEqual,                 // >=
    BitwiseAnd,                   // &
    BitwiseAndMask(u64),          // & with mask
    BitwiseXor,                   // ^
    BitwiseNot,                   // ~
    AnyValue,                     // x (always matches)
}
}

Design Principles:

  • Immutable by default: Rules don’t change after parsing
  • Serializable: Full serde support for caching
  • Self-contained: No external dependencies in AST nodes
  • Type-safe: Rust’s type system prevents invalid rule combinations
  • Explicit signedness: TypeKind::Byte and integer types (Short, Long, Quad) distinguish signed from unsigned interpretations

3. Evaluator Module (src/evaluator/)

The evaluator executes magic rules against file buffers to identify file types. (βœ… Complete)

Structure:

  • mod.rs: Public API surface (~720 lines) with EvaluationContext, RuleMatch types, and re-exports
  • engine/: Core evaluation engine submodule
    • mod.rs: evaluate_single_rule, evaluate_rules, and evaluate_rules_with_config functions
    • tests.rs: Engine unit tests
  • types/: Type interpretation submodule
    • mod.rs: Public API surface with read_typed_value, coerce_value_to_type, and type re-exports
    • numeric.rs: Numeric type handling (read_byte, read_short, read_long, read_quad) with endianness and signedness support
    • string.rs: String type handling (read_string) with null-termination and UTF-8 conversion
    • tests.rs: Module tests
  • offset/: Offset resolution submodule
    • mod.rs: Dispatcher (resolve_offset) and re-exports
    • absolute.rs: OffsetError, resolve_absolute_offset
    • indirect.rs: resolve_indirect_offset stub (issue #37)
    • relative.rs: resolve_relative_offset stub (issue #38)
  • operators/: Operator application submodule
    • mod.rs: Dispatcher (apply_operator) and re-exports
    • equality.rs: apply_equal, apply_not_equal
    • comparison.rs: compare_values, apply_less_than/greater_than/less_equal/greater_equal
    • bitwise.rs: apply_bitwise_and, apply_bitwise_and_mask, apply_bitwise_xor, apply_bitwise_not

Organization Note: The evaluator module has been refactored to split monolithic files into focused submodules. The initial refactoring split a 2,638-line mod.rs into engine/ submodules, and a subsequent refactoring reorganized the 1,836-line types.rs into types/ submodules for numeric and string handling. The public API surface remains in mod.rs with core logic distributed across focused submodules. This maintains the same public API through re-exports (no breaking changes) while improving code organization and staying within the 500-600 line module guideline.

Implemented Features:

  • βœ… Hierarchical Evaluation: Parent rules must match before children
  • βœ… Lazy Evaluation: Only process rules when necessary
  • βœ… Bounds Checking: Safe buffer access with overflow protection
  • βœ… Context Preservation: Maintain state across rule evaluations
  • βœ… Graceful Degradation: Skip problematic rules, continue evaluation
  • βœ… Timeout Protection: Configurable time limits
  • βœ… Recursion Limiting: Prevent stack overflow from deep nesting
  • βœ… Signedness Coercion: Automatic value coercion for signed type comparisons (e.g., 0xff β†’ -1 for signed byte)
  • βœ… Comparison Operators: Full support for <, >, <=, >= with numeric and lexicographic ordering
  • πŸ“‹ Indirect Offsets: Pointer dereferencing (planned)
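The signedness coercion listed above (0xff β†’ -1 for a signed byte) comes down to reinterpreting the same bit pattern. A minimal sketch, not the library's actual coercion code:

```rust
// Sketch: a rule value written as 0xff compares equal to -1 when the
// target type is a signed byte, because both share the bit pattern 0xFF.
fn coerce_to_signed_byte(raw: u64) -> Option<i8> {
    // Only values that fit in one byte can be reinterpreted.
    u8::try_from(raw).ok().map(|b| b as i8)
}

fn main() {
    assert_eq!(coerce_to_signed_byte(0xff), Some(-1));
    assert_eq!(coerce_to_signed_byte(0x7f), Some(127));
    assert_eq!(coerce_to_signed_byte(0x1ff), None); // does not fit in a byte
}
```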

4. I/O Module (src/io/)

Provides efficient file access through memory-mapped I/O. (βœ… Complete)

Implemented Features:

  • FileBuffer: Memory-mapped file buffers using memmap2
  • Safe buffer access: Comprehensive bounds checking with safe_read_bytes and safe_read_byte
  • Error handling: Structured IoError types for all failure scenarios
  • Resource management: RAII patterns with automatic cleanup
  • File validation: Size limits, empty file detection, and metadata validation
  • Overflow protection: Safe arithmetic in all buffer operations

Key Components:

#![allow(unused)]
fn main() {
pub struct FileBuffer {
    mmap: Mmap,
    path: PathBuf,
}

pub fn safe_read_bytes(buffer: &[u8], offset: usize, length: usize) -> Result<&[u8], IoError>
pub fn safe_read_byte(buffer: &[u8], offset: usize) -> Result<u8, IoError>
pub fn validate_buffer_access(buffer_size: usize, offset: usize, length: usize) -> Result<(), IoError>
}
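The overflow protection behind these signatures can be sketched with `checked_add` and slice indexing via `get` (error type simplified to a `&str` here; the real functions return structured `IoError` values):

```rust
// Sketch of overflow-safe bounds checking in the spirit of safe_read_bytes.
fn safe_read_bytes(buffer: &[u8], offset: usize, length: usize) -> Result<&[u8], &'static str> {
    // checked_add prevents offset + length from wrapping around usize::MAX.
    let end = offset.checked_add(length).ok_or("offset overflow")?;
    // slice::get returns None instead of panicking on out-of-range access.
    buffer.get(offset..end).ok_or("read past end of buffer")
}

fn main() {
    let buf = [1u8, 2, 3, 4];
    assert_eq!(safe_read_bytes(&buf, 1, 2), Ok(&buf[1..3]));
    assert!(safe_read_bytes(&buf, 3, 2).is_err());          // past the end
    assert!(safe_read_bytes(&buf, usize::MAX, 1).is_err()); // would overflow
}
```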

5. Output Module (src/output/)

Formats evaluation results into different output formats.

Planned Formatters:

  • text.rs: Human-readable output (GNU file compatible)
  • json.rs: Structured JSON output with metadata
  • mod.rs: Format selection and coordination

Data Flow

1. Magic File Loading

flowchart LR
    A[Magic File\ntext] --> B[Parser]
    B --> C[AST]
    C --> D[Validation]
    D --> E[Cached Rules]

    style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
    style E fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0

  1. Parsing: Convert text DSL to structured AST
  2. Validation: Check rule consistency and dependencies
  3. Optimization: Reorder rules for evaluation efficiency
  4. Caching: Serialize compiled rules for reuse

2. File Evaluation

flowchart LR
    A[Target File] --> B[Memory Map]
    B --> C[Buffer]
    C --> D[Rule Evaluation]
    D --> E[Results]
    E --> F[Formatting]

    style A fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
    style F fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0

  1. File Access: Create memory-mapped buffer
  2. Rule Matching: Execute rules hierarchically
  3. Result Collection: Gather matches and metadata
  4. Output Generation: Format results as text or JSON
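The four steps above can be sketched end to end with the standard library. `std::fs::read` stands in for the memory-mapped `FileBuffer`, and the single hard-coded rule stands in for the parsed AST; this is illustrative, not the library's pipeline:

```rust
use std::fs;

// "Rule matching": compare the first four bytes against the ELF magic.
fn identify(buffer: &[u8]) -> &'static str {
    if buffer.starts_with(b"\x7fELF") {
        "ELF"
    } else {
        "data"
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("libmagic_rs_demo.bin");
    fs::write(&path, b"\x7fELF\x02\x01\x01")?; // minimal fake ELF header
    let buffer = fs::read(&path)?;             // stand-in for memory mapping
    assert_eq!(identify(&buffer), "ELF");      // "Result collection"
    println!("{}: {}", path.display(), identify(&buffer)); // "Output generation"
    fs::remove_file(&path)?;
    Ok(())
}
```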

Design Patterns

Parser-Evaluator Separation

The clear separation between parsing and evaluation provides:

  • Independent Testing: Each component can be tested in isolation
  • Performance Optimization: Rules can be pre-compiled and cached
  • Flexible Input: Support for different magic file formats
  • Error Isolation: Parse errors vs. evaluation errors are distinct

Hierarchical Rule Processing

Magic rules form a tree structure where:

  • Parent rules define broad file type categories
  • Child rules provide specific details and variants
  • Evaluation stops when a definitive match is found
  • Context flows from parent to child evaluations
flowchart TD
    R["Root Rule<br/>e.g., 0 string PK"]
    R -->|match| C1["Child Rule 1<br/>e.g., #gt;4 ubyte 0x14"]
    R -->|match| C2["Child Rule 2<br/>e.g., #gt;4 ubyte 0x06"]
    C1 -->|match| G1["Grandchild<br/>ZIP archive v2.0"]
    C2 -->|match| G2["Grandchild<br/>ZIP archive v1.0"]

    style R fill:#1a3a5c,stroke:#4a9eff,color:#e0e0e0
    style C1 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style C2 fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style G1 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style G2 fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
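The tree traversal in the diagram can be sketched as a short recursive function: a child is visited only after its parent matched, and matched messages accumulate from parent to child. This mirrors the structure above but is not the library's real engine:

```rust
// Sketch of hierarchical rule evaluation over a simplified rule tree.
struct Rule {
    magic: &'static [u8],
    offset: usize,
    message: &'static str,
    children: Vec<Rule>,
}

fn evaluate(rule: &Rule, buf: &[u8], out: &mut Vec<&'static str>) -> bool {
    let end = rule.offset + rule.magic.len();
    if buf.get(rule.offset..end) != Some(rule.magic) {
        return false; // parent failed: children are never visited
    }
    out.push(rule.message);
    for child in &rule.children {
        if evaluate(child, buf, out) {
            break; // first matching child wins at each level
        }
    }
    true
}

fn main() {
    let zip = Rule {
        magic: b"PK", offset: 0, message: "ZIP",
        children: vec![
            Rule { magic: &[0x14], offset: 4, message: "v2.0", children: vec![] },
            Rule { magic: &[0x06], offset: 4, message: "v1.0", children: vec![] },
        ],
    };
    let mut out = Vec::new();
    evaluate(&zip, b"PK\x03\x04\x14\x00", &mut out);
    assert_eq!(out, ["ZIP", "v2.0"]);
}
```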

Operator Support:

The evaluator supports all comparison, bitwise, and special matching operators:

  • Equality: = or == (exact match)
  • Inequality: != or <> (not equal)
  • Less-than: < (numeric or lexicographic)
  • Greater-than: > (numeric or lexicographic)
  • Less-equal: <= (numeric or lexicographic)
  • Greater-equal: >= (numeric or lexicographic)
  • Bitwise AND: & (bit pattern matching)
  • Bitwise XOR: ^ (exclusive OR pattern matching)
  • Bitwise NOT: ~ (bitwise complement comparison)
  • Any-value: x (unconditional match, always succeeds)

Comparison operators support both numeric comparisons (with automatic type coercion between signed and unsigned integers via i128) and lexicographic comparisons for strings and byte sequences.
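The i128 widening mentioned above works because every u64 and every i64 fits losslessly in i128, so mixed-signedness comparisons cannot wrap. A minimal sketch (not the library's literal code):

```rust
use std::cmp::Ordering;

// Widen both operands to i128 before comparing; a naive `as u64` or
// `as i64` cast would wrap for values like u64::MAX or -1.
fn compare_mixed(unsigned: u64, signed: i64) -> Ordering {
    i128::from(unsigned).cmp(&i128::from(signed))
}

fn main() {
    use std::cmp::Ordering::*;
    assert_eq!(compare_mixed(u64::MAX, -1), Greater);
    assert_eq!(compare_mixed(0, -1), Greater);
    assert_eq!(compare_mixed(5, 5), Equal);
    assert_eq!(compare_mixed(3, 7), Less);
}
```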

Memory-Safe Buffer Access

All buffer operations use safe Rust patterns:

#![allow(unused)]
fn main() {
// Safe buffer access with bounds checking
fn read_bytes(buffer: &[u8], offset: usize, length: usize) -> Option<&[u8]> {
    buffer.get(offset..offset.saturating_add(length))
}
}

Error Handling Strategy

The library uses Result types with nested error enums throughout:

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, LibmagicError>;

#[derive(Debug, thiserror::Error)]
pub enum LibmagicError {
    #[error("Parse error: {0}")]
    ParseError(#[from] ParseError),

    #[error("Evaluation error: {0}")]
    EvaluationError(#[from] EvaluationError),

    #[error("I/O error: {0}")]
    IoError(#[from] std::io::Error),

    #[error("Evaluation timeout exceeded after {timeout_ms}ms")]
    Timeout { timeout_ms: u64 },
}

#[derive(Debug, thiserror::Error)]
pub enum ParseError {
    #[error("Invalid syntax at line {line}: {message}")]
    InvalidSyntax { line: usize, message: String },

    #[error("Unsupported format at line {line}: {format_type}")]
    UnsupportedFormat { line: usize, format_type: String, message: String },
    // ... additional variants
}

#[derive(Debug, thiserror::Error)]
pub enum EvaluationError {
    #[error("Buffer overrun at offset {offset}")]
    BufferOverrun { offset: usize },

    #[error("Recursion limit exceeded (depth: {depth})")]
    RecursionLimitExceeded { depth: u32 },
    // ... additional variants
}
}

Performance Considerations

Memory Efficiency

  • Zero-copy operations where possible
  • Memory-mapped I/O to avoid loading entire files
  • Lazy evaluation to skip unnecessary work
  • Rule caching to avoid re-parsing magic files

Computational Efficiency

  • Early termination when definitive matches are found
  • Optimized rule ordering based on match probability
  • Efficient string matching using algorithms like Aho-Corasick
  • Minimal allocations in hot paths

Scalability

  • Parallel evaluation for multiple files (future)
  • Streaming support for large files (future)
  • Incremental parsing for large magic databases
  • Resource limits to prevent runaway evaluations

Module Dependencies

flowchart TD
    L[lib.rs<br/>Public API and coordination]
    L --> P[parser/<br/>Magic file parsing]
    L --> E[evaluator/<br/>Rule evaluation engine]
    L --> O[output/<br/>Result formatting]
    L --> I[io/<br/>File I/O utilities]
    L --> ER[error.rs<br/>Error types]

    P --> ER
    E --> P
    E --> I
    E --> ER
    O --> ER

    style L fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
    style P fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style E fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style O fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style I fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style ER fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0

Dependency Rules:

  • No circular dependencies between modules
  • Clear interfaces with well-defined responsibilities
  • Minimal coupling between components
  • Testable boundaries for each module

This architecture ensures the library is maintainable, performant, and extensible while providing a clean API for both CLI and library usage.

AST Data Structures

The Abstract Syntax Tree (AST) is the core representation of magic rules in libmagic-rs. This chapter documents the fully implemented AST data structures (covered by 29 unit tests) and their common usage patterns.

Overview

The AST consists of several key types that work together to represent magic rules:

  • MagicRule: The main rule structure containing all components
  • OffsetSpec: Specifies where to read data in files
  • TypeKind: Defines how to interpret bytes
  • Operator: Comparison and bitwise operations
  • Value: Expected values for matching
  • Endianness: Byte order specifications

MagicRule Structure

The MagicRule struct is the primary AST node representing a complete magic rule:

#![allow(unused)]
fn main() {
pub struct MagicRule {
    pub offset: OffsetSpec,       // Where to read data
    pub typ: TypeKind,            // How to interpret bytes
    pub op: Operator,             // Comparison operation
    pub value: Value,             // Expected value
    pub message: String,          // Human-readable description
    pub children: Vec<MagicRule>, // Nested rules
    pub level: u32,               // Indentation level
}
}

Example Usage

#![allow(unused)]
fn main() {
use libmagic_rs::parser::ast::*;

// ELF magic number rule
let elf_rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::Long {
        endian: Endianness::Little,
        signed: false
    },
    op: Operator::Equal,
    value: Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46]), // "\x7fELF"
    message: "ELF executable".to_string(),
    children: vec![],
    level: 0,
};
}

Hierarchical Rules

Magic rules can contain child rules that are evaluated when the parent matches:

#![allow(unused)]
fn main() {
let parent_rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::Byte { signed: false },
    op: Operator::Equal,
    value: Value::Uint(0x7f),
    message: "ELF".to_string(),
    children: vec![
        MagicRule {
            offset: OffsetSpec::Absolute(4),
            typ: TypeKind::Byte { signed: false },
            op: Operator::Equal,
            value: Value::Uint(1),
            message: "32-bit".to_string(),
            children: vec![],
            level: 1,
        },
        MagicRule {
            offset: OffsetSpec::Absolute(4),
            typ: TypeKind::Byte { signed: false },
            op: Operator::Equal,
            value: Value::Uint(2),
            message: "64-bit".to_string(),
            children: vec![],
            level: 1,
        },
    ],
    level: 0,
};
}

OffsetSpec Variants

The OffsetSpec enum defines where to read data within a file:

Absolute Offsets

#![allow(unused)]
fn main() {
pub enum OffsetSpec {
    /// Absolute offset from file start
    Absolute(i64),
    // ... other variants
}
}

Examples:

#![allow(unused)]
fn main() {
// Read at byte 0 (file start)
let start = OffsetSpec::Absolute(0);

// Read at byte 16
let offset_16 = OffsetSpec::Absolute(16);

// Negative offset (used for relative positioning in some rules)
let relative_back = OffsetSpec::Absolute(-4);
}

Indirect Offsets

Indirect offsets read a pointer value and use it as the actual offset:

#![allow(unused)]
fn main() {
Indirect {
    base_offset: i64,        // Where to read the pointer
    pointer_type: TypeKind,  // How to interpret the pointer
    adjustment: i64,         // Value to add to pointer
    endian: Endianness,      // Byte order for pointer
}
}

Example:

#![allow(unused)]
fn main() {
// Read a 32-bit little-endian pointer at offset 0x20,
// then read data at (pointer_value + 4)
let indirect = OffsetSpec::Indirect {
    base_offset: 0x20,
    pointer_type: TypeKind::Long {
        endian: Endianness::Little,
        signed: false
    },
    adjustment: 4,
    endian: Endianness::Little,
};
}

Relative and FromEnd Offsets

#![allow(unused)]
fn main() {
// Relative to previous match position
Relative(i64),

// Relative to end of file
FromEnd(i64),
}

Examples:

#![allow(unused)]
fn main() {
// 8 bytes after previous match
let relative = OffsetSpec::Relative(8);

// 16 bytes before end of file
let from_end = OffsetSpec::FromEnd(-16);
}
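Resolving a FromEnd offset means adding the (usually negative) offset to the file length and bounds-checking the result. A sketch of that arithmetic (illustrative; the library's resolver also reports structured errors):

```rust
// FromEnd(-16) means 16 bytes before the end of the file.
fn resolve_from_end(file_len: usize, offset: i64) -> Option<usize> {
    let pos = file_len as i64 + offset;
    (0..file_len as i64).contains(&pos).then(|| pos as usize)
}

fn main() {
    assert_eq!(resolve_from_end(100, -16), Some(84));
    assert_eq!(resolve_from_end(100, -200), None); // before file start
    assert_eq!(resolve_from_end(100, 5), None);    // past end of file
}
```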

TypeKind Variants

The TypeKind enum specifies how to interpret bytes at the given offset:

Breaking Change in v0.2.0: The Byte variant changed from a unit variant (Byte) to a struct variant (Byte { signed: bool }). Code that pattern-matches exhaustively on TypeKind requires updates.

Numeric Types

#![allow(unused)]
fn main() {
pub enum TypeKind {
    /// Single byte (8-bit)
    Byte { signed: bool },

    /// 16-bit integer
    Short { endian: Endianness, signed: bool },

    /// 32-bit integer
    Long { endian: Endianness, signed: bool },

    /// 64-bit integer
    Quad { endian: Endianness, signed: bool },

    /// String data
    String { max_length: Option<usize> },

    /// Pascal string (length-prefixed)
    PString { max_length: Option<usize> },
}
}

Examples:

#![allow(unused)]
fn main() {
// Single unsigned byte
let byte_type = TypeKind::Byte { signed: false };

// Single signed byte
let signed_byte_type = TypeKind::Byte { signed: true };

// 16-bit little-endian unsigned integer
let short_le = TypeKind::Short {
    endian: Endianness::Little,
    signed: false
};

// 32-bit big-endian signed integer
let long_be = TypeKind::Long {
    endian: Endianness::Big,
    signed: true
};

// 64-bit little-endian unsigned integer
let quad_le = TypeKind::Quad {
    endian: Endianness::Little,
    signed: false
};

// 64-bit big-endian signed integer
let quad_be = TypeKind::Quad {
    endian: Endianness::Big,
    signed: true
};

// Null-terminated string, max 256 bytes
let string_type = TypeKind::String {
    max_length: Some(256)
};
}

PString (Pascal String)

Pascal-style length-prefixed strings where the first byte contains the string length.

Structure:

  • Length byte: 1 byte indicating string length (0-255)
  • String data: The number of bytes specified by the length byte

Example:

0    pstring    JPEG

Reads one byte as length, then reads that many bytes as a string.

Behavior:

  • Returns Value::String containing the string data (without the length prefix)
  • Performs bounds checking on both the length byte and the string data
  • Supports all string comparison operators
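The read described above, one length byte followed by that many bytes of data with bounds checks on both, can be sketched as (illustrative only; the real reader also applies max_length):

```rust
// Read a Pascal string: length byte at `offset`, then `len` bytes of data.
fn read_pstring(buf: &[u8], offset: usize) -> Option<String> {
    let len = *buf.get(offset)? as usize;              // bounds-checked length byte
    let data = buf.get(offset + 1..offset + 1 + len)?; // bounds-checked string data
    String::from_utf8(data.to_vec()).ok()
}

fn main() {
    let buf = b"\x04JPEGtrailing";
    assert_eq!(read_pstring(buf, 0).as_deref(), Some("JPEG"));
    assert_eq!(read_pstring(b"\x05ab", 0), None); // length exceeds buffer
}
```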

Usage:

#![allow(unused)]
fn main() {
// Pascal string with no length limit
let pstring_type = TypeKind::PString {
    max_length: None
};

// Pascal string with maximum 64-byte limit
let limited_pstring = TypeKind::PString {
    max_length: Some(64)
};
}

Endianness Options

#![allow(unused)]
fn main() {
pub enum Endianness {
    Little, // Little-endian (x86, ARM in little mode)
    Big,    // Big-endian (network byte order, PowerPC)
    Native, // Host system byte order
}
}
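In practice these variants map onto the standard `from_le_bytes` / `from_be_bytes` / `from_ne_bytes` constructors. A sketch of an endianness-aware 32-bit read (simplified to a bool flag; not the library's actual reader):

```rust
// Read a u32 at `offset`, honoring the requested byte order.
fn read_u32(buf: &[u8], offset: usize, big_endian: bool) -> Option<u32> {
    let bytes: [u8; 4] = buf.get(offset..offset.checked_add(4)?)?.try_into().ok()?;
    Some(if big_endian {
        u32::from_be_bytes(bytes)
    } else {
        u32::from_le_bytes(bytes) // Native would use u32::from_ne_bytes
    })
}

fn main() {
    let buf = [0x7f, 0x45, 0x4c, 0x46]; // "\x7fELF"
    assert_eq!(read_u32(&buf, 0, false), Some(0x464c_457f)); // little-endian
    assert_eq!(read_u32(&buf, 0, true), Some(0x7f45_4c46)); // big-endian
    assert_eq!(read_u32(&buf, 2, false), None);             // out of bounds
}
```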

Operator Types

The Operator enum defines comparison and bitwise operations:

#![allow(unused)]
fn main() {
pub enum Operator {
    Equal,          // == (equality comparison)
    NotEqual,       // != (inequality comparison)
    LessThan,       // < (less-than comparison)
    GreaterThan,    // > (greater-than comparison)
    LessEqual,      // <= (less-than-or-equal comparison)
    GreaterEqual,   // >= (greater-than-or-equal comparison)
    BitwiseAnd,     // & (bitwise AND for pattern matching)
    BitwiseAndMask(u64), // & (bitwise AND with mask value)
    BitwiseXor,     // ^ (bitwise XOR pattern matching)
    BitwiseNot,     // ~ (bitwise complement comparison)
    AnyValue,       // x (always matches)
}
}

Added in v0.2.0: The comparison operators LessThan, GreaterThan, LessEqual, and GreaterEqual were added. This is a breaking change for exhaustive matches on Operator.

Usage Examples:

#![allow(unused)]
fn main() {
// Exact match
let equal_op = Operator::Equal;

// Not equal
let not_equal_op = Operator::NotEqual;

// Less than comparison
let less_op = Operator::LessThan;

// Greater than comparison
let greater_op = Operator::GreaterThan;

// Less than or equal
let less_equal_op = Operator::LessEqual;

// Greater than or equal
let greater_equal_op = Operator::GreaterEqual;

// Bitwise AND (useful for flag checking)
let bitwise_op = Operator::BitwiseAnd;

// Bitwise AND with mask
let bitwise_mask_op = Operator::BitwiseAndMask(0xFF00);
}

Value Types

The Value enum represents expected values for comparison:

#![allow(unused)]
fn main() {
pub enum Value {
    Uint(u64),      // Unsigned integer
    Int(i64),       // Signed integer
    Bytes(Vec<u8>), // Byte sequence
    String(String), // String value
}
}

Examples:

#![allow(unused)]
fn main() {
// Unsigned integer value
let uint_val = Value::Uint(0x464c457f);

// Signed integer value
let int_val = Value::Int(-1);

// Byte sequence (magic numbers)
let bytes_val = Value::Bytes(vec![0x50, 0x4b, 0x03, 0x04]); // ZIP signature

// String value
let string_val = Value::String("#!/bin/sh".to_string());
}

Serialization Support

All AST types implement Serialize and Deserialize for caching and interchange with comprehensive test coverage:

#![allow(unused)]
fn main() {
use serde_json;

// Serialize a rule to JSON (fully tested)
let rule = MagicRule { /* ... */ };
let json = serde_json::to_string(&rule)?;

// Deserialize from JSON (fully tested)
let rule: MagicRule = serde_json::from_str(&json)?;

// All edge cases are tested including:
// - Empty collections (Vec::new(), String::new())
// - Extreme values (u64::MAX, i64::MIN, i64::MAX)
// - Complex nested structures with multiple levels
// - All enum variants and their serialization round-trips
}

Implementation Status:

  • βœ… Complete serialization for all AST types
  • βœ… Comprehensive testing with edge cases and boundary values
  • βœ… JSON compatibility for rule caching and interchange
  • βœ… Round-trip validation ensuring data integrity

Common Patterns

ELF File Detection

#![allow(unused)]
fn main() {
let elf_rules = vec![
    MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Long { endian: Endianness::Little, signed: false },
        op: Operator::Equal,
        value: Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46]),
        message: "ELF".to_string(),
        children: vec![
            MagicRule {
                offset: OffsetSpec::Absolute(4),
                typ: TypeKind::Byte { signed: false },
                op: Operator::Equal,
                value: Value::Uint(1),
                message: "32-bit".to_string(),
                children: vec![],
                level: 1,
            },
            MagicRule {
                offset: OffsetSpec::Absolute(4),
                typ: TypeKind::Byte { signed: false },
                op: Operator::Equal,
                value: Value::Uint(2),
                message: "64-bit".to_string(),
                children: vec![],
                level: 1,
            },
        ],
        level: 0,
    }
];
}

ZIP Archive Detection

#![allow(unused)]
fn main() {
let zip_rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::Long { endian: Endianness::Little, signed: false },
    op: Operator::Equal,
    value: Value::Bytes(vec![0x50, 0x4b, 0x03, 0x04]),
    message: "ZIP archive".to_string(),
    children: vec![],
    level: 0,
};
}

Script Detection with String Matching

#![allow(unused)]
fn main() {
let script_rule = MagicRule {
    offset: OffsetSpec::Absolute(0),
    typ: TypeKind::String { max_length: Some(32) },
    op: Operator::Equal,
    value: Value::String("#!/bin/bash".to_string()),
    message: "Bash script".to_string(),
    children: vec![],
    level: 0,
};
}

Best Practices

Rule Organization

  1. Start with broad patterns and use child rules for specifics
  2. Order rules by probability of matching (most common first)
  3. Use appropriate types for the data being checked
  4. Minimize indirection for performance

Type Selection

  1. Use Byte { signed } for single-byte values and flags, specifying signedness
  2. Use Short/Long/Quad with explicit endianness and signedness for multi-byte integers
  3. Use String with length limits for text patterns
  4. Use PString for Pascal-style length-prefixed strings
  5. Use Bytes for exact binary sequences

Performance Considerations

  1. Prefer absolute offsets over indirect when possible
  2. Use bitwise AND for flag checking instead of multiple equality rules
  3. Limit string lengths to prevent excessive reading
  4. Structure hierarchies to fail fast on non-matches

The AST provides a flexible, type-safe foundation for representing magic rules while maintaining compatibility with existing magic file formats.

Parser Implementation

The libmagic-rs parser is built using the nom parser combinator library, providing a robust and efficient way to parse magic file syntax into our AST representation.

Architecture Overview

The parser follows a modular design where individual components are implemented and tested separately, then composed into higher-level parsers:

Magic File Text β†’ Individual Parsers β†’ Combined Parsers β†’ Complete AST
                      ↓
              Numbers, Offsets, Operators, Values β†’ Rules β†’ Rule Hierarchies

Implemented Components

Number Parsing (parse_number)

Handles both decimal and hexadecimal number formats with comprehensive overflow protection:

#![allow(unused)]
fn main() {
// Decimal numbers
parse_number("123")    // Ok(("", 123))
parse_number("-456")   // Ok(("", -456))

// Hexadecimal numbers
parse_number("0x1a")   // Ok(("", 26))
parse_number("-0xFF")  // Ok(("", -255))
}

Features:

  • βœ… Decimal and hexadecimal format support
  • βœ… Signed and unsigned number handling
  • βœ… Overflow protection with proper error reporting
  • βœ… Comprehensive test coverage (15+ test cases)
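As an illustration of the formats and overflow handling described above, here is a minimal standalone stand-in; the real implementation is built from nom combinators, and `parse_number_sketch` is a hypothetical name, not the crate's API.

```rust
/// Minimal stand-in for decimal/hex number parsing with overflow
/// protection. Overflowing or malformed input yields None.
fn parse_number_sketch(input: &str) -> Option<i64> {
    let (negative, rest) = match input.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, input),
    };
    let value = if let Some(hex) = rest.strip_prefix("0x").or_else(|| rest.strip_prefix("0X")) {
        i64::from_str_radix(hex, 16).ok()? // hexadecimal form
    } else {
        rest.parse::<i64>().ok()? // decimal form; rejects overflow
    };
    if negative { value.checked_neg() } else { Some(value) }
}

fn main() {
    println!("{:?}", parse_number_sketch("0x1a")); // Some(26)
}
```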

Offset Parsing (parse_offset)

Converts numeric values into OffsetSpec::Absolute variants:

#![allow(unused)]
fn main() {
// Basic offsets
parse_offset("0")      // Ok(("", OffsetSpec::Absolute(0)))
parse_offset("0x10")   // Ok(("", OffsetSpec::Absolute(16)))
parse_offset("-4")     // Ok(("", OffsetSpec::Absolute(-4)))

// With whitespace handling
parse_offset(" 123 ")  // Ok(("", OffsetSpec::Absolute(123)))
}

Features:

  • βœ… Absolute offset parsing with full number format support
  • βœ… Whitespace handling (leading and trailing)
  • βœ… Negative offset support for relative positioning
  • πŸ“‹ Indirect offset parsing (planned)
  • πŸ“‹ Relative offset parsing (planned)

Operator Parsing (parse_operator)

Parses comparison and bitwise operators with multiple syntax variants:

#![allow(unused)]
fn main() {
// Equality operators
parse_operator("=")    // Ok(("", Operator::Equal))
parse_operator("==")   // Ok(("", Operator::Equal))

// Inequality operators
parse_operator("!=")   // Ok(("", Operator::NotEqual))
parse_operator("<>")   // Ok(("", Operator::NotEqual))

// Comparison operators (v0.2.0+)
parse_operator("<")    // Ok(("", Operator::LessThan))
parse_operator(">")    // Ok(("", Operator::GreaterThan))
parse_operator("<=")   // Ok(("", Operator::LessEqual))
parse_operator(">=")   // Ok(("", Operator::GreaterEqual))

// Bitwise operators
parse_operator("&")    // Ok(("", Operator::BitwiseAnd))
parse_operator("^")    // Ok(("", Operator::BitwiseXor))
parse_operator("~")    // Ok(("", Operator::BitwiseNot))

// Any-value operator (always matches)
parse_operator("x")    // Ok(("", Operator::AnyValue))
}

Features:

  • βœ… Multiple syntax variants for compatibility
  • βœ… Precedence handling (longer operators matched first)
  • βœ… Whitespace tolerance
  • βœ… Invalid operator rejection with clear errors
  • βœ… Ten operators supported in total: six comparison, three bitwise, plus the any-value operator (x)

Note: Comparison operators (<, >, <=, >=) were implemented in v0.2.0 via #104.

Value Parsing (parse_value)

Handles multiple value types with intelligent type detection:

#![allow(unused)]
fn main() {
// String literals with escape sequences
parse_value("\"Hello\"")           // Value::String("Hello".to_string())
parse_value("\"Line1\\nLine2\"")   // Value::String("Line1\nLine2".to_string())

// Floating-point literals
parse_value("3.14")                // Value::Float(3.14)
parse_value("-1.0")                // Value::Float(-1.0)
parse_value("2.5e10")              // Value::Float(2.5e10)

// Numeric values
parse_value("123")                 // Value::Uint(123)
parse_value("-456")                // Value::Int(-456)
parse_value("0x1a")                // Value::Uint(26)

// Hex byte sequences
parse_value("\\x7f\\x45")          // Value::Bytes(vec![0x7f, 0x45])
parse_value("7f454c46")            // Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46])
}

Features:

  • βœ… Quoted string parsing with escape sequence support
  • βœ… Floating-point literal parsing with scientific notation support
  • βœ… Numeric literal parsing (decimal and hexadecimal)
  • βœ… Hex byte sequence parsing (with and without \x prefix)
  • βœ… Intelligent type precedence to avoid parsing conflicts
  • βœ… Comprehensive escape sequence handling (\n, \t, \r, \\, \", \', \0)
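The bare hex-pair form can be sketched in a few lines of standalone Rust; `parse_hex_bytes` is an illustrative name here, and the real parse_value additionally handles the \x form, strings, floats, and integers with precedence rules.

```rust
/// Decodes a bare hex-pair sequence: "7f454c46" -> [0x7f, 0x45, 0x4c, 0x46].
/// Odd-length or non-hex input yields None.
fn parse_hex_bytes(input: &str) -> Option<Vec<u8>> {
    if input.is_empty() || input.len() % 2 != 0 {
        return None; // hex byte sequences come in whole pairs
    }
    input
        .as_bytes()
        .chunks(2)
        .map(|pair| u8::from_str_radix(std::str::from_utf8(pair).ok()?, 16).ok())
        .collect()
}

fn main() {
    println!("{:?}", parse_hex_bytes("7f454c46"));
}
```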

Float and Double Type Parsing (parse_float_value)

Parses floating-point type specifiers and literals for IEEE 754 single (32-bit) and double-precision (64-bit) values:

#![allow(unused)]
fn main() {
// Float literals
parse_float_value("3.14")          // Ok(("", 3.14))
parse_float_value("-0.5")          // Ok(("", -0.5))
parse_float_value("1.0e-10")       // Ok(("", 1.0e-10))
parse_float_value("2.5E+3")        // Ok(("", 2.5e+3))
}

Type Keywords:

Six floating-point type keywords are supported, each mapping to TypeKind::Float or TypeKind::Double with an Endianness field:

  • float - 32-bit IEEE 754, native endianness β†’ TypeKind::Float { endian: Endianness::Native }
  • befloat - 32-bit IEEE 754, big-endian β†’ TypeKind::Float { endian: Endianness::Big }
  • lefloat - 32-bit IEEE 754, little-endian β†’ TypeKind::Float { endian: Endianness::Little }
  • double - 64-bit IEEE 754, native endianness β†’ TypeKind::Double { endian: Endianness::Native }
  • bedouble - 64-bit IEEE 754, big-endian β†’ TypeKind::Double { endian: Endianness::Big }
  • ledouble - 64-bit IEEE 754, little-endian β†’ TypeKind::Double { endian: Endianness::Little }

Float Literal Grammar:

The parse_float_value function recognizes standard floating-point notation with a mandatory decimal point to distinguish floats from integers:

[-]digits.digits[{e|E}[{+|-}]digits]

Examples: 3.14, -0.5, 1.0e-10, 2.5E+3

Parsed literals are stored as Value::Float(f64) in the AST, regardless of whether the rule uses float or double (the type determines buffer read size, not literal representation).

Usage in Magic Rules:

// Native-endian float comparison
0 float x        // Match any float value
0 float =3.14    // Match if float equals 3.14

// Big-endian double comparison
0 bedouble >1.5  // Match if big-endian double > 1.5

Features:

  • βœ… Six type keywords for float and double with endianness variants
  • βœ… Float literal parsing with decimal point, negative values, scientific notation
  • βœ… Value::Float(f64) AST variant for floating-point literals
  • βœ… Type precedence ensures floats parsed before integers (decimal point disambiguates)
  • βœ… Comprehensive test coverage for all endianness variants and literal formats

Note: Float and double types do not have signed/unsigned variants. IEEE 754 handles sign internally via the sign bit, so all float types use a single TypeKind variant with only an endian field (no signed: bool field).

Pascal String (pstring) Type

The parser supports Pascal-style length-prefixed strings through the pstring keyword:

Type Keyword:

  • pstring - Length-prefixed string (1-byte length + string data) β†’ TypeKind::PString { max_length: None }

Format:

Pascal strings store the length as the first byte (0-255), followed by that many bytes of string data. Unlike C strings, they are not null-terminated.
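A minimal sketch of that layout, with hypothetical names (not the crate's evaluator API): read the length byte, optionally cap it, then read that many bytes with bounds checking and lossy UTF-8 conversion.

```rust
/// Reads a Pascal string: a 1-byte length prefix followed by that many
/// bytes of data. Bounds-checked at both steps; invalid UTF-8 becomes
/// replacement characters, mirroring the documented behaviour.
fn read_pstring(buffer: &[u8], offset: usize, max_length: Option<usize>) -> Option<String> {
    let len = *buffer.get(offset)? as usize;           // length prefix (0-255)
    let len = max_length.map_or(len, |cap| len.min(cap)); // optional cap
    let data = buffer.get(offset + 1..offset + 1 + len)?; // bounds-checked data
    Some(String::from_utf8_lossy(data).into_owned())
}

fn main() {
    let buf = [5, b'H', b'e', b'l', b'l', b'o', 0xFF];
    println!("{:?}", read_pstring(&buf, 0, None));
}
```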

Parser Implementation:

  • Recognized by parse_type_keyword() in src/parser/types.rs
  • Maps to TypeKind::PString in the AST
  • Evaluator reads length prefix byte then that many bytes as string data
  • Stored as Value::String for comparison with string operators
  • Supports optional max_length field to cap the length byte value

Usage in Magic Rules:

// Basic pstring matching
0 pstring =Hello     // Match if pstring equals "Hello"
0 pstring x          // Match any pstring value

// With max_length constraint (parsed separately)
0 pstring/64 x       // Limit string read to 64 bytes

Features:

  • βœ… Single type keyword pstring
  • βœ… Length-prefixed format (1 byte length, 0-255 bytes data)
  • βœ… Bounds checking for both length byte and string data
  • βœ… UTF-8 validation with replacement character for invalid sequences
  • βœ… Optional max_length parameter to limit string reads
  • βœ… String comparison operators work with pstring values

Date and Timestamp Types

The parser supports date and timestamp types for parsing Unix timestamps (signed seconds since epoch). There are 12 type keywords:

32-bit timestamps (Date):

  • date - Native endian, UTC
  • ldate - Native endian, local time
  • bedate - Big-endian, UTC
  • beldate - Big-endian, local time
  • ledate - Little-endian, UTC
  • leldate - Little-endian, local time

64-bit timestamps (QDate):

  • qdate - Native endian, UTC
  • qldate - Native endian, local time
  • beqdate - Big-endian, UTC
  • beqldate - Big-endian, local time
  • leqdate - Little-endian, UTC
  • leqldate - Little-endian, local time

The parser creates TypeKind::Date or TypeKind::QDate variants with appropriate endianness and UTC flags. During evaluation, timestamps are formatted as strings in the format "Www Mmm DD HH:MM:SS YYYY" to match GNU file output.

Parser Design Principles

Error Handling

All parsers use nom's IResult type for consistent error handling:

#![allow(unused)]
fn main() {
pub fn parse_number(input: &str) -> IResult<&str, i64> {
    // Implementation with proper error propagation
}
}

Error Categories:

  • Syntax Errors: Invalid characters or malformed input
  • Overflow Errors: Numbers too large for target type
  • Format Errors: Invalid hex digits, unterminated strings, etc.

Memory Safety

All parsing operations are memory-safe with no unsafe code:

  • Bounds Checking: All buffer access is bounds-checked
  • Overflow Protection: Numeric parsing includes overflow detection
  • Resource Management: No manual memory management required

Performance Optimization

The parser is designed for efficiency:

  • Zero-Copy: String slices used where possible to avoid allocations
  • Early Termination: Parsers fail fast on invalid input
  • Minimal Backtracking: Parser combinators designed to minimize backtracking

Testing Strategy

Each parser component has comprehensive test coverage:

Test Categories

  1. Basic Functionality: Core parsing behavior
  2. Edge Cases: Boundary values, empty input, etc.
  3. Error Conditions: Invalid input handling
  4. Whitespace Handling: Leading/trailing whitespace tolerance
  5. Remaining Input: Proper handling of unconsumed input

Example Test Structure

#![allow(unused)]
fn main() {
#[test]
fn test_parse_number_positive() {
    assert_eq!(parse_number("123"), Ok(("", 123)));
    assert_eq!(parse_number("0x1a"), Ok(("", 26)));
}

#[test]
fn test_parse_number_with_remaining_input() {
    assert_eq!(parse_number("123abc"), Ok(("abc", 123)));
    assert_eq!(parse_number("0xFF rest"), Ok((" rest", 255)));
}

#[test]
fn test_parse_number_edge_cases() {
    assert_eq!(parse_number("0"), Ok(("", 0)));
    assert_eq!(parse_number("-0"), Ok(("", 0)));
    assert!(parse_number("").is_err());
    assert!(parse_number("abc").is_err());
}
}

Complete Magic File Parsing

The parser provides complete magic file parsing through the parse_text_magic_file() function:

#![allow(unused)]
fn main() {
use libmagic_rs::parser::parse_text_magic_file;

let magic_content = r#"
# ELF file format
0 string \x7fELF ELF executable
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;

let rules = parse_text_magic_file(magic_content)?;
assert_eq!(rules.len(), 1);           // One root rule
assert_eq!(rules[0].children.len(), 2); // Two child rules
}

The parser distinguishes between signed and unsigned type variants (e.g., byte vs ubyte, leshort vs uleshort), mapping them to the signed field in TypeKind::Byte { signed: bool } and similar type variants. Unprefixed types default to signed in accordance with libmagic conventions. Float and double types do not have signed/unsigned variants; IEEE 754 handles sign internally.

Format Detection

The parser automatically detects magic file formats:

#![allow(unused)]
fn main() {
use libmagic_rs::parser::{detect_format, MagicFileFormat};

match detect_format(path)? {
    MagicFileFormat::Text => { /* parse as text magic file */ }
    MagicFileFormat::Directory => { /* load all files from Magdir */ }
    MagicFileFormat::Binary => { /* show helpful error (not yet supported) */ }
}
}

Current Limitations

Not Yet Implemented

  • Indirect Offsets: Pointer dereferencing patterns (e.g., (0x3c.l))
  • Regex Support: Regular expression matching in rules
  • Binary .mgc Format: Compiled magic database format
  • Strength Modifiers: !:strength parsing for rule priority

Planned Enhancements

  • Better Error Messages: More descriptive error reporting with source locations
  • Performance Optimization: Specialized parsers for common patterns
  • Streaming Support: Incremental parsing for large magic files

Integration Points

The parser provides a complete pipeline from text to AST:

#![allow(unused)]
fn main() {
use libmagic_rs::parser::{parse_text_magic_file, detect_format, MagicFileFormat};

// Detect format and parse accordingly
let rules = match detect_format(path)? {
    MagicFileFormat::Text => {
        let content = std::fs::read_to_string(path)?;
        parse_text_magic_file(&content)?
    }
    MagicFileFormat::Directory => {
        // Load and merge all files in directory
        load_magic_directory(path)?
    }
    MagicFileFormat::Binary => {
        return Err(ParseError::UnsupportedFormat { ... });
    }
};
}

The hierarchical structure is automatically built from indentation levels (> prefixes), enabling parent-child rule relationships for detailed file type identification.
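The level computation can be sketched as counting leading > characters (`split_level` is an illustrative name, not the parser's API):

```rust
/// Splits a magic line into its continuation level (number of leading
/// '>' characters) and the remaining rule text.
fn split_level(line: &str) -> (u32, &str) {
    let level = line.chars().take_while(|&c| c == '>').count();
    // '>' is a single byte, so byte-indexing by `level` is safe.
    (level as u32, &line[level..])
}

fn main() {
    println!("{:?}", split_level(">>4 byte 1 nested"));
}
```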

Evaluator Engine

The evaluator engine executes magic rules against file buffers to identify file types. It provides safe, efficient rule evaluation with hierarchical processing, graceful error recovery, and configurable resource limits.

Overview

The evaluator processes magic rules hierarchically:

  1. Load file into memory-mapped buffer
  2. Resolve offsets (absolute, relative, from-end)
  3. Read typed values from buffer with bounds checking
  4. Apply operators for comparison
  5. Process children if parent rule matches
  6. Collect results with match metadata

Architecture

File Buffer → Offset Resolution → Type Reading → Operator Application → Results
     ↑              ↑                  ↑              ↑                    ↑
Memory Map    Context State      Endian Handling   Match Logic      Hierarchical

Module Organization

The evaluator module separates public interface from implementation:

  • evaluator/mod.rs - Public API surface: defines EvaluationContext and RuleMatch types, re-exports core evaluation functions from the engine submodule
  • evaluator/engine/mod.rs - Core evaluation implementation: evaluate_single_rule, evaluate_rules, evaluate_rules_with_config
  • evaluator/offset/mod.rs - Offset resolution
  • evaluator/operators/mod.rs - Operator application
  • evaluator/types/ - Type reading and coercion (organized as submodules as of v0.4.2)
    • types/mod.rs - Public API surface: read_typed_value, coerce_value_to_type, re-exports type functions
    • types/numeric.rs - Numeric type handling: read_byte, read_short, read_long, read_quad with endianness and signedness support
    • types/float.rs - Floating-point type handling: read_float (32-bit IEEE 754), read_double (64-bit IEEE 754) with endianness support
    • types/date.rs - Date and timestamp type handling: read_date (32-bit Unix timestamps), read_qdate (64-bit Unix timestamps) with endianness and UTC/local time support
    • types/string.rs - String type handling: read_string with null-termination and UTF-8 conversion
    • types/tests.rs - Module tests
  • evaluator/strength.rs - Rule strength calculation

The refactoring improves organization by separating concerns: mod.rs handles the public API surface and data types, while engine/ contains the core evaluation logic. The types module was refactored in v0.4.2 from a single 1,836-line file into focused submodules for numeric, floating-point, date/timestamp, and string handling, improving maintainability without changing the public API. From a library user's perspective nothing moves: all types and functions are imported from the evaluator module as before, and the internal organization is transparent.

Core Components

EvaluationContext

Maintains state during rule processing:

#![allow(unused)]
fn main() {
pub struct EvaluationContext {
    /// Current offset position for relative calculations
    current_offset: usize,
    /// Current recursion depth for safety limits
    recursion_depth: u32,
    /// Configuration for evaluation behavior
    config: EvaluationConfig,
}
}

Note: Fields are private; use accessor methods like current_offset(), recursion_depth(), and config().

Key Methods:

  • new() - Create context with default configuration
  • current_offset() / set_current_offset() - Track current buffer position
  • recursion_depth() - Query current recursion depth
  • increment_recursion_depth() / decrement_recursion_depth() - Track recursion safely
  • timeout_ms() - Query configured timeout
  • reset() - Reset context state for reuse

RuleMatch

Represents a successful rule match:

#![allow(unused)]
fn main() {
pub struct RuleMatch {
    /// Human-readable description from the matched rule
    pub message: String,
    /// Offset where the match occurred
    pub offset: usize,
    /// Depth in the rule hierarchy (0 = root rule)
    pub level: u32,
    /// The matched value (parsed according to rule type)
    pub value: Value,
    /// Confidence score (0.0 to 1.0) based on rule hierarchy depth
    pub confidence: f64,
}
}

The Value type is from parser::ast::Value and represents the actual matched content according to the rule's type specification. Note that Value implements only PartialEq (not Eq) due to floating-point NaN semantics.

Offset Resolution (evaluator/offset/mod.rs)

Handles all offset types safely:

  • Absolute offsets: Direct file positions (0, 0x100)
  • Relative offsets: Based on previous match positions (&+4)
  • From-end offsets: Calculated from file size (-4 from end)
  • Bounds checking: All offset calculations are validated

#![allow(unused)]
fn main() {
pub fn resolve_offset(
    spec: &OffsetSpec,
    buffer: &[u8],
) -> Result<usize, LibmagicError>
}
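A simplified standalone sketch of the absolute and from-end cases, with bounds checking (`resolve_absolute` is a hypothetical name; the real function takes an OffsetSpec):

```rust
/// Resolves an absolute offset: non-negative values index from the
/// start of the buffer, negative values from the end (-4 means "4 bytes
/// before end of file"). Every result is bounds-checked.
fn resolve_absolute(offset: i64, buffer: &[u8]) -> Option<usize> {
    let resolved = if offset >= 0 {
        usize::try_from(offset).ok()?
    } else {
        let back = usize::try_from(offset.checked_neg()?).ok()?;
        buffer.len().checked_sub(back)? // from-end, no underflow
    };
    (resolved < buffer.len()).then_some(resolved)
}

fn main() {
    let buf = [0u8; 16];
    println!("{:?}", resolve_absolute(-4, &buf)); // Some(12)
}
```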

Type Reading (evaluator/types/)

Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, date/timestamp, and string type handling (refactored from a single file in v0.4.2):

  • Byte: Single byte values (signed or unsigned)
  • Short: 16-bit integers with endianness
  • Long: 32-bit integers with endianness
  • Quad: 64-bit integers with endianness
  • Float: 32-bit IEEE 754 floating-point with endianness (native, big-endian befloat, little-endian lefloat)
  • Double: 64-bit IEEE 754 floating-point with endianness (native, big-endian bedouble, little-endian ledouble)
  • Date: 32-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
  • QDate: 64-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
  • String: Byte sequences with length limits
  • Bounds checking: Prevents buffer overruns

#![allow(unused)]
fn main() {
pub fn read_typed_value(
    buffer: &[u8],
    offset: usize,
    type_kind: &TypeKind,
) -> Result<Value, TypeReadError>
}

The read_byte function signature changed in v0.2.0 to accept three parameters (buffer, offset, and signed) instead of two, allowing explicit control over signed vs unsigned byte interpretation.
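The distinction is easy to demonstrate in isolation; `read_byte_sketch` below is a hypothetical stand-in, not the crate's function:

```rust
/// Reads one byte with explicit signedness: the same raw byte 0xFF is
/// 255 unsigned but -1 signed (two's complement). Access is bounds-checked.
fn read_byte_sketch(buffer: &[u8], offset: usize, signed: bool) -> Option<i64> {
    let raw = *buffer.get(offset)?;
    Some(if signed { i64::from(raw as i8) } else { i64::from(raw) })
}

fn main() {
    let buf = [0xFFu8];
    println!("signed: {:?}, unsigned: {:?}",
        read_byte_sketch(&buf, 0, true), read_byte_sketch(&buf, 0, false));
}
```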

Floating-Point Type Reading (evaluator/types/float.rs):

#![allow(unused)]
fn main() {
pub fn read_float(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError>

pub fn read_double(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError>
}

  • read_float() reads 4 bytes and interprets as f32, converting to f64 and returning Value::Float(f64)
  • read_double() reads 8 bytes and interprets as f64, returning Value::Float(f64)
  • Both respect endianness specified in TypeKind::Float or TypeKind::Double
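A self-contained sketch of the same behaviour using only std (the Endian enum and `read_f32` are illustrative names, not the crate's types):

```rust
/// Endianness selector for this sketch only.
#[derive(Clone, Copy)]
enum Endian { Little, Big }

/// Reads 4 bytes as an IEEE 754 f32 and widens to f64, matching the
/// documented behaviour of returning Value::Float(f64).
fn read_f32(buffer: &[u8], offset: usize, endian: Endian) -> Option<f64> {
    let bytes: [u8; 4] = buffer.get(offset..offset + 4)?.try_into().ok()?;
    let value = match endian {
        Endian::Little => f32::from_le_bytes(bytes),
        Endian::Big => f32::from_be_bytes(bytes),
    };
    Some(f64::from(value))
}

fn main() {
    // Little-endian IEEE 754 encoding of 3.14159f32
    let buf = [0xd0, 0x0f, 0x49, 0x40];
    println!("{:?}", read_f32(&buf, 0, Endian::Little));
}
```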

Date and QDate Type Reading (evaluator/types/date.rs):

#![allow(unused)]
fn main() {
pub fn read_date(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    utc: bool,
) -> Result<Value, TypeReadError>

pub fn read_qdate(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
    utc: bool,
) -> Result<Value, TypeReadError>
}

  • read_date() reads 4 bytes as a 32-bit Unix timestamp (seconds since epoch) and returns Value::String formatted as "Www Mmm DD HH:MM:SS YYYY" to match GNU file output
  • read_qdate() reads 8 bytes as a 64-bit Unix timestamp (seconds since epoch) and returns Value::String formatted as "Www Mmm DD HH:MM:SS YYYY" to match GNU file output
  • Both support endianness (little-endian, big-endian, native)
  • Both support UTC or local time formatting
  • The evaluator reads raw integer timestamps from the buffer and converts them to formatted date strings for comparison
  • Example: A 32-bit value 1234567890 at offset 0 with type ldate would be evaluated as "Fri Feb 13 23:31:30 2009"
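The formatting itself can be reproduced in pure std Rust with the classic civil-from-days calendar algorithm; this is an illustrative UTC-only sketch, not the crate's implementation (which may rely on a date/time dependency for local-time handling).

```rust
/// Formats a Unix timestamp as "Www Mmm DD HH:MM:SS YYYY" in UTC,
/// matching the documented GNU file-style output.
fn format_unix_utc(ts: i64) -> String {
    let days = ts.div_euclid(86_400);
    let secs = ts.rem_euclid(86_400);
    let (h, m, s) = (secs / 3600, (secs / 60) % 60, secs % 60);
    // Civil-from-days (Howard Hinnant's algorithm), valid across the epoch.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + i64::from(month <= 2);
    let wday = (days + 4).rem_euclid(7) as usize; // 1970-01-01 was a Thursday
    let wdays = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    let months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
    format!("{} {} {:2} {:02}:{:02}:{:02} {}",
        wdays[wday], months[(month - 1) as usize], day, h, m, s, year)
}

fn main() {
    // The document's own example timestamp.
    println!("{}", format_unix_utc(1_234_567_890)); // Fri Feb 13 23:31:30 2009
}
```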

Operator Application (evaluator/operators/mod.rs)

Applies comparison operations:

  • Equal (=, ==): Exact value matching
  • NotEqual (!=, <>): Non-matching values
  • LessThan (<): Less-than comparison (numeric or lexicographic) (added in v0.2.0)
  • GreaterThan (>): Greater-than comparison (numeric or lexicographic) (added in v0.2.0)
  • LessEqual (<=): Less-than-or-equal comparison (numeric or lexicographic) (added in v0.2.0)
  • GreaterEqual (>=): Greater-than-or-equal comparison (numeric or lexicographic) (added in v0.2.0)
  • BitwiseAnd (&): Pattern matching for flags
  • BitwiseAndMask: AND with mask then compare

Comparison operators support numeric comparisons across different integer types using i128 coercion for cross-type compatibility.
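The widening trick can be shown in isolation (IntValue, widen, and less_than are hypothetical names): every i64 and u64 fits losslessly in i128, so mixed signed/unsigned comparisons become straightforward.

```rust
/// Minimal stand-in for the evaluator's integer value variants.
enum IntValue { Int(i64), Uint(u64) }

/// Widens either variant losslessly into i128.
fn widen(v: &IntValue) -> i128 {
    match v {
        IntValue::Int(i) => i128::from(*i),
        IntValue::Uint(u) => i128::from(*u),
    }
}

/// Cross-type less-than: Int(-1) < Uint(0) works without sign pitfalls.
fn less_than(a: &IntValue, b: &IntValue) -> bool {
    widen(a) < widen(b)
}

fn main() {
    println!("{}", less_than(&IntValue::Int(-1), &IntValue::Uint(0))); // true
}
```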

Floating-Point Operator Semantics:

Float values (Value::Float) work with comparison and equality operators but have special handling:

  • Equality operators (==, !=): Use epsilon-aware comparison with f64::EPSILON tolerance
    • Two floats are considered equal when |a - b| <= f64::EPSILON
    • Implementation is in floats_equal() helper function (evaluator/operators/equality.rs)
  • Ordering operators (<, >, <=, >=): Use IEEE 754 partial_cmp semantics
    • Standard floating-point ordering: -∞ < finite values < +∞
    • Implementation is in compare_values() function (evaluator/operators/comparison.rs)
  • NaN handling:
    • NaN != NaN returns true (NaN is never equal to anything, including itself)
    • All comparison operations with NaN return false (NaN is not comparable)
  • Infinity handling:
    • Positive and negative infinity are only equal to the same sign of infinity
    • Infinities are ordered correctly: NEG_INFINITY < finite < INFINITY
  • Type mismatch: Float values cannot be compared with Int or Uint (returns false or None)

#![allow(unused)]
fn main() {
pub fn apply_operator(
    operator: &Operator,
    left: &Value,
    right: &Value,
) -> bool
}

Example with comparison operators:

#![allow(unused)]
fn main() {
use libmagic_rs::parser::ast::{Operator, Value};
use libmagic_rs::evaluator::operators::apply_operator;

// Less-than comparison (v0.2.0+)
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Uint(5),
    &Value::Uint(10)
));

// Greater-than-or-equal comparison (v0.2.0+)
assert!(apply_operator(
    &Operator::GreaterEqual,
    &Value::Uint(10),
    &Value::Uint(10)
));

// Cross-type integer comparison (v0.2.0+)
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Int(-1),
    &Value::Uint(0)
));
}

Example with floating-point operators:

#![allow(unused)]
fn main() {
use libmagic_rs::parser::ast::{Operator, Value};
use libmagic_rs::evaluator::operators::apply_operator;

// Epsilon-aware equality
assert!(apply_operator(
    &Operator::Equal,
    &Value::Float(1.0),
    &Value::Float(1.0 + f64::EPSILON)
));

// Float ordering
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Float(1.5),
    &Value::Float(2.0)
));

// NaN inequality
assert!(apply_operator(
    &Operator::NotEqual,
    &Value::Float(f64::NAN),
    &Value::Float(f64::NAN)
));

// Infinity comparison
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Float(f64::NEG_INFINITY),
    &Value::Float(0.0)
));
}

Evaluation Algorithm

The evaluator uses a depth-first hierarchical algorithm:

#![allow(unused)]
fn main() {
pub fn evaluate_rules(
    rules: &[MagicRule],
    buffer: &[u8],
) -> Result<Vec<RuleMatch>, EvaluationError>
}

Algorithm:

  1. For each root rule:

    • Resolve offset from buffer
    • Read value at offset according to type
    • Apply operator to compare actual vs expected
    • If match: add to results, recursively evaluate children
    • If no match: skip children, continue to next rule
  2. Child rules inherit context from parent match

  3. Results accumulate hierarchically (parent message + child details)
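The algorithm can be sketched with a toy rule type in which single-byte equality stands in for the full offset/type/operator pipeline (all names here are illustrative, not the crate's API):

```rust
/// Toy rule: match one byte at an offset, with nested children.
struct ToyRule {
    offset: usize,
    expect: u8,
    message: &'static str,
    children: Vec<ToyRule>,
}

/// Depth-first evaluation: children are visited only when the parent
/// matched; a non-match skips the subtree and continues with siblings.
fn evaluate(rules: &[ToyRule], buffer: &[u8], matches: &mut Vec<String>) {
    for rule in rules {
        if buffer.get(rule.offset) == Some(&rule.expect) {
            matches.push(rule.message.to_string());
            evaluate(&rule.children, buffer, matches); // descend on match
        }
    }
}

fn main() {
    let rules = vec![ToyRule {
        offset: 0,
        expect: 0x7f,
        message: "ELF",
        children: vec![
            ToyRule { offset: 4, expect: 1, message: "32-bit", children: vec![] },
            ToyRule { offset: 4, expect: 2, message: "64-bit", children: vec![] },
        ],
    }];
    let buffer = [0x7f, b'E', b'L', b'F', 2];
    let mut matches = Vec::new();
    evaluate(&rules, &buffer, &mut matches);
    println!("{:?}", matches); // ["ELF", "64-bit"]
}
```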

Hierarchical Processing

Root Rule (e.g., 0 string \x7fELF)
    ├── match ──→ Child Rule 1 (e.g., >4 byte 1) ──→ Result: ELF 32-bit
    └── match ──→ Child Rule 2 (e.g., >4 byte 2) ──→ Result: ELF 64-bit

Configuration

Evaluation behavior is controlled via EvaluationConfig:

#![allow(unused)]
fn main() {
pub struct EvaluationConfig {
    /// Maximum recursion depth for nested rules (default: 20)
    pub max_recursion_depth: u32,
    /// Maximum string length to read (default: 8192)
    pub max_string_length: usize,
    /// Stop at first match or continue for all matches (default: true)
    pub stop_at_first_match: bool,
    /// Enable MIME type mapping in results (default: false)
    pub enable_mime_types: bool,
    /// Timeout for evaluation in milliseconds (default: None)
    pub timeout_ms: Option<u64>,
}
}

Preset Configurations:

#![allow(unused)]
fn main() {
// Default balanced configuration
let config = EvaluationConfig::default();

// Optimized for speed
let config = EvaluationConfig::performance();

// Find all matches with full details
let config = EvaluationConfig::comprehensive();
}

Safety Features

Memory Safety

  • Bounds checking: All buffer access is validated before reading
  • Integer overflow protection: Safe arithmetic using checked_* and saturating_*
  • Resource limits: Configurable limits prevent resource exhaustion

Error Handling

The evaluator uses graceful degradation:

  • Invalid offsets: Skip rule, continue with others
  • Type mismatches: Skip rule, continue with others
  • Timeout exceeded: Return error (partial results are not preserved)
  • Recursion limit: Stop descent, continue siblings
#![allow(unused)]
fn main() {
pub enum EvaluationError {
    BufferOverrun { offset: usize },
    InvalidOffset { offset: i64 },
    UnsupportedType { type_name: String },
    RecursionLimitExceeded { depth: u32 },
    StringLengthExceeded { length: usize, max_length: usize },
    InvalidStringEncoding { offset: usize },
    Timeout { timeout_ms: u64 },
    TypeReadError(TypeReadError),
}
}

Timeout Protection

#![allow(unused)]
fn main() {
// With 5 second timeout
let config = EvaluationConfig {
    timeout_ms: Some(5000),
    ..Default::default()
};

let result = evaluate_rules_with_config(&rules, buffer, &config)?;
}

API Reference

Primary Functions

#![allow(unused)]
fn main() {
/// Evaluate rules with context for recursion tracking
pub fn evaluate_rules(
    rules: &[MagicRule],
    buffer: &[u8],
    context: &mut EvaluationContext,
) -> Result<Vec<RuleMatch>, LibmagicError>;

/// Evaluate rules with custom configuration (creates context internally)
pub fn evaluate_rules_with_config(
    rules: &[MagicRule],
    buffer: &[u8],
    config: &EvaluationConfig,
) -> Result<Vec<RuleMatch>, LibmagicError>;

/// Evaluate a single rule (used internally and for testing)
pub fn evaluate_single_rule(
    rule: &MagicRule,
    buffer: &[u8],
) -> Result<Option<(usize, Value)>, LibmagicError>;
}

Usage Example

#![allow(unused)]
fn main() {
use libmagic_rs::{evaluate_rules, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;

// Parse magic rules
let magic_content = r#"
0 string \x7fELF ELF executable
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;
let rules = parse_text_magic_file(magic_content)?;

// Read target file
let buffer = std::fs::read("sample.bin")?;

// Evaluate with default config
let matches = evaluate_rules(&rules, &buffer)?;

for m in matches {
    println!("Match at offset {}: {}", m.offset, m.message);
}
}

Example with comparison operators (v0.2.0+):

#![allow(unused)]
fn main() {
use libmagic_rs::{evaluate_rules, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;

// Parse magic rule with comparison operator
let magic_content = r#"
0 leshort <100 Small value detected
0 leshort >=1000 Large value detected
"#;
let rules = parse_text_magic_file(magic_content)?;

let buffer = vec![0x0A, 0x00]; // Little-endian 10
let matches = evaluate_rules(&rules, &buffer)?;

// Matches first rule (<100)
assert_eq!(matches[0].message, "Small value detected");
}

Example with floating-point types:

#![allow(unused)]
fn main() {
use libmagic_rs::{evaluate_rules, EvaluationConfig};
use libmagic_rs::parser::parse_text_magic_file;

// Parse magic rule with float type
let magic_content = r#"
0 lefloat 3.14159 Pi constant detected
0 bedouble >100.0 Large double value
"#;
let rules = parse_text_magic_file(magic_content)?;

// IEEE 754 little-endian representation of 3.14159f32
let buffer = vec![0xd0, 0x0f, 0x49, 0x40];
let matches = evaluate_rules(&rules, &buffer)?;

assert_eq!(matches[0].message, "Pi constant detected");
}

Implementation Status

  • βœ… Basic evaluation engine structure
  • βœ… Offset resolution (absolute, relative, from-end)
  • βœ… Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String)
  • βœ… Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
  • βœ… Hierarchical rule processing with child evaluation
  • βœ… Error handling with graceful degradation
  • βœ… Timeout protection
  • βœ… Recursion depth limiting
  • βœ… Comprehensive test coverage (150+ tests)
  • πŸ“‹ Indirect offset support (pointer dereferencing)
  • πŸ“‹ Regex type support
  • πŸ“‹ Performance optimizations (rule ordering, caching)

Performance Considerations

Lazy Evaluation

  • Parent-first: Only evaluate children if parent matches
  • Early termination: Stop on first match when configured
  • Skip on error: Continue evaluation after non-fatal errors

Memory Efficiency

  • Memory mapping: Files accessed via mmap, not loaded entirely
  • Zero-copy reads: Slice references where possible
  • Bounded strings: String reads limited to prevent memory exhaustion

Chapter 9: Output Formatters

The output module converts raw evaluation results into structured, consumable formats. It supports human-readable text output compatible with the GNU file command, pretty-printed JSON for single-file analysis, and compact JSON Lines for multi-file batch processing.

Module Structure

The output module is organized across three files:

  • src/output/mod.rs - Core data structures (EvaluationResult, MatchResult, EvaluationMetadata) and the conversion layer from evaluator types to output types, including tag enrichment via a shared LazyLock<TagExtractor>.
  • src/output/json.rs - JSON-specific types (JsonMatchResult, JsonOutput, JsonLineOutput) and formatting functions for pretty-printed and compact JSON output.
  • src/output/text.rs - Text formatting functions that produce GNU file-compatible output.

Core Data Types

output::MatchResult

Represents a single magic rule match in the output layer. Created by converting from an evaluator-level RuleMatch, with additional fields for structured output.

pub struct MatchResult {
    pub message: String,           // Human-readable description
    pub offset: usize,             // Byte offset where match occurred
    pub length: usize,             // Number of bytes examined
    pub value: Value,              // Matched value (Bytes, String, Uint, Int)
    pub rule_path: Vec<String>,    // Hierarchical tags/rule names
    pub confidence: u8,            // Score 0-100 (clamped)
    pub mime_type: Option<String>, // Optional MIME type
}

Key constructors:

  • MatchResult::new(message, offset, value) – Creates a match with default confidence of 50.
  • MatchResult::with_metadata(...) – Creates a fully specified match. Confidence is clamped to 100.
  • MatchResult::from_evaluator_match(m, mime_type) – Converts from the evaluator’s RuleMatch. Scales confidence from 0.0–1.0 to 0–100 and extracts rule path tags using the shared TagExtractor.

output::EvaluationResult

The complete result of evaluating a file against magic rules.

pub struct EvaluationResult {
    pub filename: PathBuf,
    pub matches: Vec<MatchResult>,
    pub metadata: EvaluationMetadata,
    pub error: Option<String>,
}

Key methods:

  • EvaluationResult::from_library_result(result, filename) – Converts a library-level EvaluationResult to the output format. Enriches the first match’s rule_path with tags extracted from the overall description when the rule path is empty.
  • primary_match() – Returns the match with the highest confidence score.
  • is_success() – Returns true when no error is present.

output::EvaluationMetadata

Diagnostic information about the evaluation process.

pub struct EvaluationMetadata {
    pub file_size: u64,
    pub evaluation_time_ms: f64,
    pub rules_evaluated: u32,
    pub rules_matched: u32,
}

The match_rate() method returns the percentage of evaluated rules that matched (0.0 when no rules were evaluated).

Tag Enrichment

The output module uses a static LazyLock<TagExtractor> (defined as DEFAULT_TAG_EXTRACTOR) to avoid allocating the keyword set on every conversion call. The TagExtractor (from src/tags.rs) maintains a HashSet of 16 keywords:

executable, archive, image, video, audio, document, compressed, encrypted, text, binary, data, script, font, database, spreadsheet, presentation

Tag extraction happens at two points during conversion:

  1. Per-match: MatchResult::from_evaluator_match calls extract_rule_path to normalize match messages into hyphenated, lowercase tag identifiers.
  2. Overall enrichment: EvaluationResult::from_library_result calls extract_tags on the overall description to populate the first match’s rule_path when it is empty after per-match extraction.

The extract_tags method performs case-insensitive substring matching and returns a sorted, deduplicated vector. The extract_rule_path method normalizes messages by lowercasing, replacing spaces with hyphens, and stripping non-alphanumeric characters.
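
As an illustration of the normalization rules just described, here is a minimal sketch (the real implementation lives in src/tags.rs and may differ in detail):

```rust
// Illustrative sketch of the message-normalization step: lowercase,
// spaces to hyphens, strip other non-alphanumeric characters.
fn normalize_message(message: &str) -> String {
    message
        .to_lowercase()
        .chars()
        .map(|c| if c == ' ' { '-' } else { c })
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-')
        .collect()
}

fn main() {
    assert_eq!(
        normalize_message("ELF 64-bit LSB executable"),
        "elf-64-bit-lsb-executable"
    );
    assert_eq!(normalize_message("PNG image data!"), "png-image-data");
}
```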

Text Output

The text module (src/output/text.rs) produces output compatible with the GNU file command.

Functions

format_text_result(result) -> String – Returns the match message as-is.

format_text_output(results) -> String – Joins all match messages with ", ". Returns "data" for empty results (the standard fallback for unknown files).

format_evaluation_result(evaluation) -> String – Formats as filename: description. Extracts the filename component from the path. Falls back to "unknown" for empty or root-only paths. Shows "ERROR: <message>" when the evaluation has an error.
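
The filename fallback described above can be sketched like this (`display_name` is a hypothetical helper, not the library's actual function):

```rust
// A sketch of the filename-extraction fallback: take the final path
// component, or "unknown" for empty and root-only paths.
use std::path::Path;

fn display_name(path: &Path) -> String {
    path.file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_else(|| "unknown".to_string()) // empty or root-only paths
}

fn main() {
    assert_eq!(display_name(Path::new("dir/photo.png")), "photo.png");
    assert_eq!(display_name(Path::new("/")), "unknown");
    assert_eq!(display_name(Path::new("")), "unknown");
}
```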

Examples

Single file, single match:

photo.png: PNG image data

Single file, multiple matches:

ls: ELF 64-bit LSB executable, x86-64, dynamically linked

No matches:

unknown.bin: data

Error case:

missing.txt: ERROR: File not found

JSON Output

The JSON module (src/output/json.rs) provides structured output for programmatic consumption.

JsonMatchResult

The JSON representation of a single match, following the libmagic specification:

pub struct JsonMatchResult {
    pub text: String,        // Match description
    pub offset: usize,       // Byte offset
    pub value: String,       // Hex-encoded matched bytes
    pub tags: Vec<String>,   // Classification tags (from rule_path)
    pub score: u8,           // Confidence score 0-100
}

Created via JsonMatchResult::from_match_result(match_result), which converts the Value field to a lowercase hex string using format_value_as_hex.

Hex Value Encoding

The format_value_as_hex function converts Value variants to hex strings:

Value Type   Encoding
Bytes(vec)   Direct hex encoding of each byte
String(s)    Hex encoding of UTF-8 bytes
Uint(n)      Little-endian u64 bytes (16 hex chars)
Int(n)       Little-endian i64 bytes (16 hex chars)

Examples: Bytes([0x7f, 0x45, 0x4c, 0x46]) becomes "7f454c46", String("PNG") becomes "504e47".
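
A minimal sketch of these encoding rules, with an abbreviated stand-in for the library's Value enum:

```rust
// Stand-in for the library's Value type, reduced to the four variants
// covered by the encoding table above.
enum Value {
    Bytes(Vec<u8>),
    String(String),
    Uint(u64),
    Int(i64),
}

fn value_to_hex(v: &Value) -> String {
    let bytes: Vec<u8> = match v {
        Value::Bytes(b) => b.clone(),
        Value::String(s) => s.as_bytes().to_vec(),
        Value::Uint(n) => n.to_le_bytes().to_vec(), // 8 bytes -> 16 hex chars
        Value::Int(n) => n.to_le_bytes().to_vec(),
    };
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    assert_eq!(value_to_hex(&Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46])), "7f454c46");
    assert_eq!(value_to_hex(&Value::String("PNG".to_string())), "504e47");
    assert_eq!(value_to_hex(&Value::Uint(1)), "0100000000000000");
    assert_eq!(value_to_hex(&Value::Int(-1)), "ffffffffffffffff");
}
```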

JsonOutput (Single File)

Wraps an array of JsonMatchResult values. Produced by format_json_output, which emits pretty-printed JSON:

{
  "matches": [
    {
      "text": "ELF 64-bit LSB executable",
      "offset": 0,
      "value": "7f454c46",
      "tags": [
        "executable",
        "elf"
      ],
      "score": 90
    }
  ]
}

A compact variant is available via format_json_output_compact, which omits whitespace and newlines.

JsonLineOutput (Multiple Files)

For batch processing, format_json_line_output produces compact, single-line JSON with a filename field:

{
  "filename": "file1.bin",
  "matches": [
    {
      "text": "ELF executable",
      "offset": 0,
      "value": "7f454c46",
      "tags": [
        "executable"
      ],
      "score": 90
    }
  ]
}

Each file produces exactly one line, making the output suitable for streaming and line-oriented processing tools.

Formatting Functions Summary

Function                                 Format               Use Case
format_json_output(matches)              Pretty-printed JSON  Single file, human-readable
format_json_output_compact(matches)      Compact JSON         Single file, machine processing
format_json_line_output(path, matches)   JSON Lines           Multiple files, streaming

All three return Result<String, serde_json::Error>.

Conversion Pipeline

The full conversion pipeline from evaluation to output:

flowchart TD
    EM["evaluator::RuleMatch"]
    EM -- "from_evaluator_match" --> OM["output::MatchResult"]
    OM --> FT["format_text_output"]
    OM --> FJ["format_json_output"]
    OM --> FL["format_json_line_output"]

    style EM fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style OM fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
    style FT fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style FJ fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style FL fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0

When converting from the library’s top-level EvaluationResult:

flowchart TD
    LE["lib::EvaluationResult"]
    LE -- "from_library_result" --> OE["output::EvaluationResult"]
    OE --> FER["format_evaluation_result<br/>(text)"]
    OE --> JER["JsonOutput::from_evaluation_result<br/>(JSON)"]

    style LE fill:#4a3000,stroke:#ffb74d,color:#e0e0e0
    style OE fill:#2a1a4a,stroke:#b39ddb,color:#e0e0e0
    style FER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0
    style JER fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0

Serialization

All output types derive Serialize and Deserialize (via serde), enabling direct use with any serde-compatible format beyond JSON. The MatchResult, EvaluationResult, and EvaluationMetadata types in the output module are all fully serializable and round-trip through JSON without data loss.

I/O and Performance

The I/O module provides efficient file access through memory-mapped I/O with comprehensive safety guarantees and performance optimizations.

Memory-Mapped I/O Architecture

libmagic-rs uses memory-mapped I/O through the memmap2 crate to provide efficient file access without loading entire files into memory. This approach offers several advantages:

  • Zero-copy access: File data is accessed directly from the OS page cache
  • Lazy loading: Only accessed portions of files are loaded into memory
  • Efficient for large files: Memory overhead does not grow with file size
  • OS-optimized: Leverages operating system virtual memory management

FileBuffer Implementation

The FileBuffer struct provides the core abstraction for memory-mapped file access:

pub struct FileBuffer {
    mmap: Mmap,
    path: PathBuf,
}

impl FileBuffer {
    pub fn new(path: &Path) -> Result<Self, IoError>
    pub fn as_slice(&self) -> &[u8]
    pub fn len(&self) -> usize
    pub fn path(&self) -> &Path
    pub fn is_empty(&self) -> bool
}

File Validation and Safety

Before creating a memory mapping, FileBuffer::new() performs comprehensive validation:

  1. File existence: Verifies the file can be opened for reading
  2. Empty file detection: Rejects empty files that cannot be meaningfully processed
  3. Size limits: Enforces maximum file size (1GB) to prevent resource exhaustion
  4. Metadata validation: Ensures file metadata is accessible

// Example validation flow
let file = File::open(path)?;
let metadata = file.metadata()?;

if metadata.len() == 0 {
    return Err(IoError::EmptyFile { path });
}

if metadata.len() > MAX_FILE_SIZE {
    return Err(IoError::FileTooLarge { path, size, max_size });
}

Safe Buffer Access

All buffer operations use bounds-checked access patterns to prevent buffer overruns and memory safety violations.

Core Safety Functions

safe_read_bytes()

Provides safe access to byte ranges with comprehensive validation:

pub fn safe_read_bytes(
    buffer: &[u8],
    offset: usize,
    length: usize
) -> Result<&[u8], IoError>

Safety Guarantees:

  • Validates offset is within buffer bounds
  • Checks for integer overflow in offset + length calculation
  • Ensures requested range doesn’t exceed buffer size
  • Rejects zero-length reads as invalid
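
These guarantees can be sketched in simplified form (the error type here is a stand-in for the library's IoError):

```rust
// Simplified stand-in for the library's IoError, covering only the two
// failure modes exercised below.
#[derive(Debug, PartialEq)]
enum AccessError {
    InvalidAccess,
    BufferOverrun,
}

fn safe_read_bytes(buffer: &[u8], offset: usize, length: usize) -> Result<&[u8], AccessError> {
    if length == 0 {
        return Err(AccessError::InvalidAccess); // zero-length reads rejected
    }
    // checked_add prevents integer overflow in offset + length.
    let end = offset.checked_add(length).ok_or(AccessError::InvalidAccess)?;
    if end > buffer.len() {
        return Err(AccessError::BufferOverrun); // range must fit in the buffer
    }
    Ok(&buffer[offset..end])
}

fn main() {
    let buf = [0x7f, b'E', b'L', b'F'];
    assert_eq!(safe_read_bytes(&buf, 0, 4), Ok(&buf[..]));
    assert_eq!(safe_read_bytes(&buf, 2, 4), Err(AccessError::BufferOverrun));
    assert_eq!(safe_read_bytes(&buf, usize::MAX, 2), Err(AccessError::InvalidAccess));
}
```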

safe_read_byte()

Convenience function for single-byte access:

pub fn safe_read_byte(buffer: &[u8], offset: usize) -> Result<u8, IoError>

validate_buffer_access()

Pre-validates access parameters without performing reads:

pub fn validate_buffer_access(
    buffer_size: usize,
    offset: usize,
    length: usize
) -> Result<(), IoError>

Error Handling

The I/O module defines comprehensive error types for all failure scenarios:

#[derive(Debug, Error)]
pub enum IoError {
    #[error("Failed to open file '{path}': {source}")]
    FileOpenError {
        path: PathBuf,
        source: std::io::Error,
    },

    #[error("Failed to memory-map file '{path}': {source}")]
    MmapError {
        path: PathBuf,
        source: std::io::Error,
    },

    #[error("File '{path}' is empty")]
    EmptyFile { path: PathBuf },

    #[error("File '{path}' is too large ({size} bytes, maximum {max_size} bytes)")]
    FileTooLarge {
        path: PathBuf,
        size: u64,
        max_size: u64,
    },

    #[error(
        "Buffer access out of bounds: offset {offset} + length {length} > buffer size {buffer_size}"
    )]
    BufferOverrun {
        offset: usize,
        length: usize,
        buffer_size: usize,
    },

    #[error("Invalid buffer access parameters: offset {offset}, length {length}")]
    InvalidAccess { offset: usize, length: usize },
}

Performance Characteristics

Memory Usage

  • Constant memory overhead: FileBuffer uses minimal heap memory regardless of file size
  • OS page cache utilization: Leverages system-wide file caching
  • No data copying: Direct access to mapped memory regions
  • Automatic cleanup: RAII patterns ensure proper resource deallocation

Access Patterns

The memory-mapped approach is optimized for typical magic rule evaluation patterns:

  • Sequential access: Reading file headers and structured data
  • Random access: Jumping to specific offsets based on rule specifications
  • Small reads: Most magic rules read small amounts of data (1-64 bytes)
  • Repeated access: Same file regions may be accessed by multiple rules

Performance Benchmarks

Current performance characteristics (measured on typical hardware):

  • File opening: ~10-50µs for files up to 1GB
  • Buffer creation: ~1-5µs overhead per FileBuffer
  • Byte access: ~10-50ns per safe_read_byte() call
  • Range access: ~50-200ns per safe_read_bytes() call

Optimization Strategies

Memory Mapping Benefits

  1. Large file handling: No memory pressure from file size
  2. Shared mappings: Multiple processes can share the same file mapping
  3. OS optimization: Kernel handles prefetching and caching
  4. Lazy loading: Only accessed pages are loaded into physical memory

Bounds Checking Optimization

The safety functions are designed for minimal overhead:

  • Single validation: Bounds checking performed once per access
  • Overflow protection: Uses checked_add() to prevent integer overflow
  • Early returns: Fast path for common valid access patterns
  • Zero-cost abstractions: Compiler optimizations eliminate overhead in release builds

Resource Management

RAII Patterns

FileBuffer uses Rust’s RAII (Resource Acquisition Is Initialization) patterns:

impl Drop for FileBuffer {
    fn drop(&mut self) {
        // Mmap handles cleanup automatically through its Drop implementation
        // Memory mapping is safely unmapped and file handles are closed
    }
}

File Handle Management

  • Automatic cleanup: File handles closed when FileBuffer is dropped
  • Exception safety: Cleanup occurs even if operations panic
  • No resource leaks: Guaranteed cleanup through Rust’s ownership system

Memory Mapping Lifecycle

  1. Creation: File opened and validated, memory mapping established
  2. Usage: Safe access through bounds-checked functions
  3. Cleanup: Automatic unmapping and file handle closure on drop

Implementation Status

  • Memory-mapped file buffers (io/mod.rs) - Complete with FileBuffer
  • Safe buffer access utilities - safe_read_bytes, safe_read_byte, validate_buffer_access
  • Error handling for I/O operations - Comprehensive IoError types with context
  • Resource management - RAII patterns with automatic cleanup
  • File validation - Size limits, empty file detection, metadata validation
  • Comprehensive testing - Unit tests covering all functionality and error cases
  • Performance benchmarks - Planned for future releases

Integration with Evaluation Engine

The I/O layer is designed to integrate seamlessly with the rule evaluation engine:

Offset Resolution

// Example integration pattern
let buffer = FileBuffer::new(file_path)?;
let data = buffer.as_slice();

// Safe offset-based access for rule evaluation
let bytes = safe_read_bytes(data, rule.offset, rule.type_size)?;
let value = interpret_bytes(bytes, rule.type_kind)?;

Error Propagation

I/O errors are properly propagated through the evaluation chain:

pub type Result<T> = std::result::Result<T, LibmagicError>;

impl From<IoError> for LibmagicError {
    fn from(err: IoError) -> Self {
        LibmagicError::IoError(err)
    }
}

This architecture ensures that file I/O operations are both safe and performant, providing a solid foundation for the magic rule evaluation engine.

Magic File Format

Magic files define rules for identifying file types through byte-level patterns. This chapter documents the magic file format supported by libmagic-rs.

Overview

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

  1. Offset - Where to look in the file
  2. Type - How to interpret the bytes
  3. Value - What to match against
  4. Message - Description to display on match

Basic Format

offset  type  value  message

Example:

0       string  PK    ZIP archive data

This rule matches files starting with "PK" and labels them as "ZIP archive data".

Basic Syntax

Rule Structure

[level>]offset    type    [operator]value    message

Component  Required  Description
level>     No        Indentation level for nested rules
offset     Yes       Where to read data
type       Yes       Data type to read
operator   No        Comparison operator (default: =)
value      Yes       Expected value
message    Yes       Description text

Comments

Lines starting with # are comments:

# This is a comment
0  string  PK  ZIP archive

Whitespace

  • Fields are separated by whitespace (spaces or tabs)
  • Leading whitespace indicates rule nesting level
  • Trailing whitespace is ignored

Offset Specifications

Absolute Offset

Direct byte position from file start:

0       string  \x7fELF   ELF executable
16      short   2         (executable)

Hexadecimal Offset

Use 0x prefix for hex offsets:

0x0     string  MZ        DOS executable
0x3c    long    >0        (PE offset present)

Negative Offset (From End)

Read from end of file:

-4      string  .ZIP      ZIP file (end marker)

Indirect Offset

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l)   string  PE\0\0  PE executable

Indirect offset syntax:

  • (base.type) - Read pointer at base, interpret as type
  • (base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

  • .b - byte (1 byte)
  • .s - short (2 bytes)
  • .l - long (4 bytes)
  • .q - quad (8 bytes)
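
A sketch of resolving such an indirect offset, e.g. (0x3c.l) from the PE example above; the helper name is illustrative, not the library's API:

```rust
// Resolve an indirect offset of the form (base.l): read a 4-byte
// little-endian pointer at `base`, then use its value as the real offset.
fn resolve_indirect_long_le(buf: &[u8], base: usize) -> Option<usize> {
    let end = base.checked_add(4)?;          // overflow-safe range end
    let raw = buf.get(base..end)?;           // bounds-checked read
    Some(u32::from_le_bytes(raw.try_into().ok()?) as usize)
}

fn main() {
    // A fake DOS header: e_lfanew at 0x3c points to offset 0x40.
    let mut file = vec![0u8; 0x48];
    file[0x3c..0x40].copy_from_slice(&0x40u32.to_le_bytes());
    file[0x40..0x44].copy_from_slice(b"PE\0\0");

    let pe_offset = resolve_indirect_long_le(&file, 0x3c).unwrap();
    assert_eq!(pe_offset, 0x40);
    assert_eq!(&file[pe_offset..pe_offset + 4], b"PE\0\0");
}
```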

Relative Offset

Offset relative to previous match:

0       string  PK\x03\x04   ZIP archive
&2      short   >0           (with data)

The & prefix indicates relative offset.

Type Specifications

Integer Types

Type     Size     Endianness
byte     1 byte   N/A
short    2 bytes  native
leshort  2 bytes  little-endian
beshort  2 bytes  big-endian
long     4 bytes  native
lelong   4 bytes  little-endian
belong   4 bytes  big-endian
quad     8 bytes  native
lequad   8 bytes  little-endian
bequad   8 bytes  big-endian

All integer types have unsigned variants prefixed with u:

  • ubyte, ushort, uleshort, ubeshort
  • ulong, ulelong, ubelong
  • uquad, ulequad, ubequad

Examples:

0       byte      0x7f      (byte match)
0       leshort   0x5a4d    DOS MZ signature
0       belong    0xcafebabe Java class file
0       lequad    0x1234567890abcdef  (64-bit little-endian)
8       uquad     >0x8000000000000000 (unsigned 64-bit check)

Floating-Point Types

Type      Size     Endianness     IEEE 754
float     4 bytes  native         32-bit
befloat   4 bytes  big-endian     32-bit
lefloat   4 bytes  little-endian  32-bit
double    8 bytes  native         64-bit
bedouble  8 bytes  big-endian     64-bit
ledouble  8 bytes  little-endian  64-bit

Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).

Examples:

0       lefloat   =3.14159   File with float value pi
0       bedouble  >1.0       Double value greater than 1.0

Float comparison behavior:

  • Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
  • Ordering: Uses IEEE 754 semantics via partial_cmp
  • NaN: NaN != NaN, comparisons with NaN always return false
  • Infinity: Positive and negative infinity are properly ordered
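
A minimal sketch of these semantics, assuming f64::EPSILON as the equality tolerance:

```rust
use std::cmp::Ordering;

// Epsilon-aware equality: NaN operands make the comparison false
// automatically, because (NaN - x).abs() is NaN and NaN <= eps is false.
fn float_equal(a: f64, b: f64) -> bool {
    (a - b).abs() <= f64::EPSILON
}

// Ordering via partial_cmp: any comparison involving NaN yields None,
// which we treat as "no match".
fn float_greater(a: f64, b: f64) -> bool {
    matches!(a.partial_cmp(&b), Some(Ordering::Greater))
}

fn main() {
    assert!(float_equal(0.1 + 0.2, 0.3)); // equal only within epsilon
    assert!(!float_equal(f64::NAN, f64::NAN));
    assert!(float_greater(f64::INFINITY, 1.0));
    assert!(!float_greater(f64::NAN, 1.0)); // comparisons with NaN are false
}
```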

Date/Timestamp Types

Type      Size     Endianness     UTC/Local  Description
date      4 bytes  native         UTC        32-bit Unix timestamp (signed seconds since epoch), formatted as UTC
ldate     4 bytes  native         Local      32-bit Unix timestamp, formatted as local time
bedate    4 bytes  big-endian     UTC        32-bit Unix timestamp, big-endian byte order, UTC
beldate   4 bytes  big-endian     Local      32-bit Unix timestamp, big-endian byte order, local time
ledate    4 bytes  little-endian  UTC        32-bit Unix timestamp, little-endian byte order, UTC
leldate   4 bytes  little-endian  Local      32-bit Unix timestamp, little-endian byte order, local time
qdate     8 bytes  native         UTC        64-bit Unix timestamp (signed seconds since epoch), formatted as UTC
qldate    8 bytes  native         Local      64-bit Unix timestamp, formatted as local time
beqdate   8 bytes  big-endian     UTC        64-bit Unix timestamp, big-endian byte order, UTC
beqldate  8 bytes  big-endian     Local      64-bit Unix timestamp, big-endian byte order, local time
leqdate   8 bytes  little-endian  UTC        64-bit Unix timestamp, little-endian byte order, UTC
leqldate  8 bytes  little-endian  Local      64-bit Unix timestamp, little-endian byte order, local time

Timestamp values are formatted as strings matching GNU file output format: "Www Mmm DD HH:MM:SS YYYY"

Examples:

# Match file modified at Unix epoch
0       date      =0        File created at epoch

# Check timestamp in file header (big-endian)
8       bedate    >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16      leqldate  x         \b, timestamp %s

String Types

Match literal string data:

0       string    %PDF      PDF document
0       string    GIF89a    GIF image data

String escape sequences:

  • \x00 - hex byte
  • \n - newline
  • \t - tab
  • \\ - backslash
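
A simplified decoder for these escape sequences (illustrative only; the actual parser handles more cases):

```rust
// ASCII-only sketch: turn a magic-file value like `PK\x03\x04` into the
// byte sequence it denotes.
fn decode_escapes(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c as u8);
            continue;
        }
        match chars.next() {
            Some('n') => out.push(b'\n'),
            Some('t') => out.push(b'\t'),
            Some('\\') => out.push(b'\\'),
            Some('x') => {
                // Exactly two hex digits follow, e.g. \x7f.
                let hi = chars.next().unwrap().to_digit(16).unwrap();
                let lo = chars.next().unwrap().to_digit(16).unwrap();
                out.push((hi * 16 + lo) as u8);
            }
            other => panic!("unsupported escape: {other:?}"),
        }
    }
    out
}

fn main() {
    assert_eq!(decode_escapes(r"PK\x03\x04"), vec![b'P', b'K', 0x03, 0x04]);
    assert_eq!(decode_escapes(r"\x7fELF"), vec![0x7f, b'E', b'L', b'F']);
}
```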

Pascal String Type

Pascal string (pstring) is a length-prefixed string type. The first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C strings, Pascal strings are not null-terminated.

0       pstring   =JPEG     JPEG image (Pascal string)

The evaluator reads the length byte, then reads that many bytes as string data. The optional max_length parameter caps the length byte value:

0       pstring   x         \b, name: %s
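
The read can be sketched as follows, assuming the cap behaves as described (`read_pstring` is a hypothetical helper):

```rust
// Pascal string read: one length byte, then that many bytes of data,
// with an optional cap mirroring the max_length parameter.
fn read_pstring(buf: &[u8], offset: usize, max_length: Option<u8>) -> Option<String> {
    let len = *buf.get(offset)? as usize;
    let len = match max_length {
        Some(cap) => len.min(cap as usize), // cap the length byte value
        None => len,
    };
    let data = buf.get(offset + 1..offset + 1 + len)?; // bounds-checked
    Some(String::from_utf8_lossy(data).into_owned())
}

fn main() {
    let buf = [4, b'J', b'P', b'E', b'G', b'X'];
    assert_eq!(read_pstring(&buf, 0, None).as_deref(), Some("JPEG"));
    assert_eq!(read_pstring(&buf, 0, Some(2)).as_deref(), Some("JP"));
}
```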

String Flags (Not Yet Implemented)

Note: String flags are documented for libmagic compatibility reference but are not yet implemented in libmagic-rs.

Flag  Description
/c    Case-insensitive match
/w    Whitespace-insensitive
/b    Match at word boundary

Example:

0       string/c  <!doctype  HTML document

Operators

Comparison Operators

Operator  Description                        Example
=         Equal (default)                    0 long =0xcafebabe
!=        Not equal                          4 byte !=0
>         Greater than                       8 long >1000
<         Less than                          8 long <100
>=        Greater than or equal              8 long >=1000
<=        Less than or equal                 8 long <=100
&         Bitwise AND                        4 byte &0x80
^         Bitwise XOR (not yet implemented)  4 byte ^0xff

Bitwise AND with Mask

Test specific bits:

# Check if bit 7 is set
4       byte    &0x80     (compressed)

# Check if lower nibble is 0x0f
4       byte    &0x0f=0x0f (all bits set)
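
The two mask forms can be sketched as follows (assuming, as in GNU file, that a bare & requires all mask bits to be set):

```rust
// Form `&MASK`: every bit set in the mask must also be set in the value.
fn mask_bits_set(value: u8, mask: u8) -> bool {
    value & mask == mask
}

// Form `&MASK=EXPECTED`: mask the value first, then compare for equality.
fn mask_then_equal(value: u8, mask: u8, expected: u8) -> bool {
    value & mask == expected
}

fn main() {
    assert!(mask_bits_set(0x8f, 0x80));          // bit 7 set
    assert!(!mask_bits_set(0x0f, 0x80));         // bit 7 clear
    assert!(mask_then_equal(0xff, 0x0f, 0x0f));  // lower nibble all set
    assert!(!mask_then_equal(0xf7, 0x0f, 0x0f));
}
```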

Negation

Prefix operator with ! for negation:

# Match if NOT equal to zero
4       long    !0        (non-zero)

Values

Numeric Values

# Decimal
0       long    1234

# Hexadecimal
0       long    0x4d5a

# Octal
0       byte    0177

String Values

# Plain string
0       string  RIFF

# With escape sequences
0       string  PK\x03\x04

# Unicode (as bytes)
0       string  \xff\xfe

Special Values

Value  Description
x      Match any value (always true)

Example:

0       string  PK        ZIP archive
>4      short   x         version %d

The x value matches anything and %d formats the matched value.

Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels

Use > prefix for nested rules:

0       string  \x7fELF   ELF
>4      byte    1         32-bit
>4      byte    2         64-bit
>5      byte    1         LSB
>5      byte    2         MSB

Evaluation:

  1. Check offset 0 for ELF magic
  2. If matched, check offset 4 for bit size
  3. If matched, check offset 5 for endianness
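
This parent-first evaluation can be sketched with a toy rule type (a stand-in for the library's MagicRule):

```rust
// Toy rule: match a literal byte pattern at a fixed offset; children are
// visited only when the parent matches.
struct Rule {
    offset: usize,
    magic: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
}

fn evaluate(rules: &[Rule], buf: &[u8], out: &mut Vec<String>) {
    for rule in rules {
        let end = rule.offset + rule.magic.len();
        if buf.get(rule.offset..end) == Some(rule.magic) {
            out.push(rule.message.to_string());
            evaluate(&rule.children, buf, out); // descend only on match
        }
    }
}

fn main() {
    let rules = vec![Rule {
        offset: 0,
        magic: b"\x7fELF",
        message: "ELF",
        children: vec![
            Rule { offset: 4, magic: &[2], message: "64-bit", children: vec![] },
            Rule { offset: 5, magic: &[1], message: "LSB", children: vec![] },
        ],
    }];
    let file = [0x7f, b'E', b'L', b'F', 2, 1];
    let mut out = Vec::new();
    evaluate(&rules, &file, &mut out);
    assert_eq!(out, ["ELF", "64-bit", "LSB"]);
}
```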

Multiple Nesting Levels

0       string  \x7fELF       ELF
>4      byte    2             64-bit
>>5     byte    1             LSB
>>>16   short   2             (executable)
>>>16   short   3             (shared object)

Continuation Messages

Use \b (backspace) to suppress space before message:

0       string  GIF8      GIF image data
>4      string  7a        \b, version 87a
>4      string  9a        \b, version 89a

Output: GIF image data, version 89a

Examples

ELF Executable

# ELF (Executable and Linkable Format)
0       string  \x7fELF       ELF
>4      byte    1             32-bit
>4      byte    2             64-bit
>5      byte    1             LSB
>5      byte    2             MSB
>16     leshort 2             (executable)
>16     leshort 3             (shared object)

ZIP Archive

# ZIP archive
0       string  PK\x03\x04    ZIP archive data
>4      leshort x             \b, version %d.%d to extract
>6      leshort &0x0001       \b, encrypted
>6      leshort &0x0008       \b, with data descriptor

JPEG Image

# JPEG
0       string  \xff\xd8\xff  JPEG image data
>3      byte    0xe0          \b, JFIF standard
>3      byte    0xe1          \b, Exif format

PDF Document

# PDF
0       string  %PDF-         PDF document
>5      string  1.            \b, version 1.x
>5      string  2.            \b, version 2.x

PE Executable

# DOS MZ executable with PE header
0       string  MZ            DOS executable
>0x3c   lelong  >0            (PE offset)
>(0x3c.l) string PE\0\0       PE executable

GZIP Compressed

# GZIP
0       string  \x1f\x8b      gzip compressed data
>2      byte    8             \b, deflated
>3      byte    &0x01         \b, ASCII text
>3      byte    &0x02         \b, with header CRC
>3      byte    &0x04         \b, with extra field
>3      byte    &0x08         \b, with original name
>3      byte    &0x10         \b, with comment

PNG Image

# PNG
0       string  \x89PNG\r\n\x1a\n   PNG image data
>16     belong  x                   \b, %d x
>20     belong  x                   %d
>24     byte    0                   \b, grayscale
>24     byte    2                   \b, RGB
>24     byte    3                   \b, palette
>24     byte    4                   \b, grayscale+alpha
>24     byte    6                   \b, RGBA

Floating-Point Values

# Check for specific float value
0       lefloat   =3.14159   File with float value pi

# Float comparison
0       float     >1.0       Float value greater than 1.0

# Double precision
0       bedouble  =0.45455   PNG image with gamma 0.45455

Best Practices

1. Order Rules by Specificity

Put more specific rules first:

# Good: Specific before general
0       string  PK\x03\x04   ZIP archive
0       string  PK           (generic PK signature)

# Bad: General catches all
0       string  PK           (generic PK signature)
0       string  PK\x03\x04   ZIP archive  # Never reached

2. Use Nested Rules for Details

# Good: Hierarchical structure
0       string  \x7fELF   ELF
>4      byte    2         64-bit
>>5     byte    1         LSB

# Bad: Flat rules
0       string  \x7fELF           ELF
4       byte    2                 64-bit
5       byte    1                 LSB

3. Document Complex Rules

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0       string  \xff\xd8\xff    JPEG image data
>3      byte    0xe1            \b, Exif format

4. Test Edge Cases

Consider:

  • Empty files
  • Truncated files
  • Minimum valid file size
  • Maximum offset values

5. Use Appropriate Types

# Good: Match exact size needed
0       leshort 0x5a4d   DOS executable

# Bad: Over-reading
0       lelong  x        (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly

# Good: Explicit endianness
0       lelong  0xcafebabe   (little-endian)
0       belong  0xcafebabe   (big-endian)

# Risky: Native endianness
0       long    0xcafebabe   (platform-dependent)

Supported Features

Currently Supported

  • Absolute offsets
  • Relative offsets
  • Indirect offsets (basic)
  • Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
  • Float and double types (32-bit and 64-bit IEEE 754 floating-point)
  • Date and qdate types (32-bit and 64-bit Unix timestamps)
  • String and pstring types (null-terminated and length-prefixed strings)
  • Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
  • Bitwise AND operator
  • Nested rules
  • Comments

Not Yet Supported

  • Regex patterns
  • 128-bit integer types
  • Use/name directives
  • Default rules

Recently Added

  • Strength modifiers: The !:strength directive for adjusting rule priority
  • 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
  • Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting

Rule Not Matching

  1. Check offset is correct (0-indexed)
  2. Verify endianness matches file format
  3. Test with hexdump -C file | head
  4. Ensure no conflicting rules

Unexpected Results

  1. Check rule order (first match wins)
  2. Verify nested rule levels
  3. Test with simpler rules first

Performance Issues

  1. Avoid unnecessary string searches
  2. Use specific offsets over searches
  3. Order rules by likelihood of match

Testing and Quality Assurance

The libmagic-rs project maintains high quality standards through comprehensive testing, strict linting, and continuous integration. This chapter covers the testing strategy, current test coverage, and quality assurance practices.

Testing Philosophy

Comprehensive Coverage

The project aims for comprehensive test coverage across all components:

  • Unit Tests: Test individual functions and methods in isolation
  • Integration Tests: Test component interactions and workflows
  • Property Tests: Use property-based testing for edge cases
  • Compatibility Tests: Validate against GNU file command results
  • Performance Tests: Benchmark critical path performance

Quality Gates

All code must pass these quality gates:

  1. Zero Warnings: cargo clippy -- -D warnings must pass
  2. All Tests Pass: Complete test suite must pass
  3. Code Coverage: Target >85% coverage for new code
  4. Documentation: All public APIs must be documented
  5. Memory Safety: No unsafe code except in vetted dependencies

Current Test Coverage

Test Statistics

Unit Tests: Located in source files with #[cfg(test)] modules

Integration Tests: Located in tests/ directory:

  • tests/cli_integration.rs - CLI subprocess tests using assert_cmd
  • tests/integration_tests.rs - End-to-end evaluation tests
  • tests/evaluator_tests.rs - Evaluator component tests
  • tests/parser_integration_tests.rs - Parser integration tests
  • tests/json_integration_test.rs - JSON output format tests
  • tests/compatibility_tests.rs - GNU file compatibility tests
  • tests/directory_loading_tests.rs - Magic directory loading tests
  • tests/mime_tests.rs - MIME type detection tests
  • tests/tags_tests.rs - Tag extraction tests
  • tests/property_tests.rs - Property-based tests using proptest

# Run all tests (unit + integration)
cargo test

# Run only unit tests
cargo test --lib

# Run only integration tests
cargo test --test cli_integration
cargo test --test property_tests

Test Distribution

AST Structure Tests (29 tests)

OffsetSpec Tests:

  • test_offset_spec_absolute - Basic absolute offset creation
  • test_offset_spec_indirect - Complex indirect offset structures
  • test_offset_spec_relative - Relative offset handling
  • test_offset_spec_from_end - End-relative offset calculations
  • test_offset_spec_serialization - JSON serialization round-trips
  • test_all_offset_spec_variants - Comprehensive variant testing
  • test_endianness_variants - Endianness handling in all contexts

Value Tests:

  • test_value_uint - Unsigned integer values including extremes
  • test_value_int - Signed integer values including boundaries
  • test_value_bytes - Byte sequence handling and comparison
  • test_value_string - String values including Unicode
  • test_value_comparison - Cross-type comparison behavior
  • test_value_serialization - Complete serialization testing
  • test_value_serialization_edge_cases - Boundary and extreme values

TypeKind Tests:

  • test_type_kind_byte - Single byte type handling with signedness
  • test_type_kind_short - 16-bit integer types with endianness
  • test_type_kind_long - 32-bit integer types with endianness
  • test_type_kind_quad - 64-bit integer types with endianness
  • test_type_kind_string - String types with length limits
  • test_type_kind_serialization - All type serialization including signed/unsigned variants
  • test_serialize_type_kind_quad - Quad type serialization (build_helpers.rs)

Operator Tests:

  • test_operator_variants - All operator types (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
  • test_operator_serialization - Operator serialization including comparison operators

MagicRule Tests:

  • test_magic_rule_creation - Basic rule construction
  • test_magic_rule_with_children - Hierarchical rule structures
  • test_magic_rule_serialization - Complete rule serialization

Parser Component Tests (50 tests)

Number Parsing Tests:

  • test_parse_decimal_number - Basic decimal parsing
  • test_parse_hex_number - Hexadecimal parsing with 0x prefix
  • test_parse_number_positive - Positive number handling
  • test_parse_number_negative - Negative number handling
  • test_parse_number_edge_cases - Boundary values and error conditions
  • test_parse_number_with_remaining_input - Partial parsing behavior

Offset Parsing Tests:

  • test_parse_offset_absolute_positive - Positive absolute offsets
  • test_parse_offset_absolute_negative - Negative absolute offsets
  • test_parse_offset_with_whitespace - Whitespace tolerance
  • test_parse_offset_with_remaining_input - Partial parsing
  • test_parse_offset_edge_cases - Error conditions and boundaries
  • test_parse_offset_common_magic_file_values - Real-world patterns
  • test_parse_offset_boundary_values - Extreme values

Operator Parsing Tests:

  • test_parse_operator_equality - Equality operators (= and ==)
  • test_parse_operator_inequality - Inequality operators (!= and <>)
  • test_parse_operator_comparison - Comparison operators (<, >, <=, >=)
  • test_parse_operator_bitwise_and - Bitwise AND operator (&)
  • test_parse_operator_with_remaining_input - Partial parsing
  • test_parse_operator_precedence - Operator precedence handling
  • test_parse_operator_invalid_input - Error condition handling
  • test_parse_operator_edge_cases - Boundary conditions
  • test_parse_operator_common_magic_file_patterns - Real patterns

Value Parsing Tests:

  • test_parse_quoted_string_simple - Basic string parsing
  • test_parse_quoted_string_with_escapes - Escape sequence handling
  • test_parse_quoted_string_with_whitespace - Whitespace handling
  • test_parse_quoted_string_invalid - Error conditions
  • test_parse_hex_bytes_with_backslash_x - \x prefix hex bytes
  • test_parse_hex_bytes_without_prefix - Raw hex byte sequences
  • test_parse_hex_bytes_mixed_case - Case insensitive hex
  • test_parse_numeric_value_positive - Positive numeric values
  • test_parse_numeric_value_negative - Negative numeric values
  • test_parse_value_string_literals - String literal parsing
  • test_parse_value_numeric_literals - Numeric literal parsing
  • test_parse_value_hex_byte_sequences - Hex byte parsing
  • test_parse_value_type_precedence - Type detection precedence
  • test_parse_value_edge_cases - Boundary conditions
  • test_parse_value_invalid_input - Error handling

Evaluator Component Tests

Type Reading Tests:

  • test_read_byte - Single byte reading with signedness
  • test_read_short_endianness_and_signedness - 16-bit reading with all endian/sign combinations
  • test_read_short_extreme_values - 16-bit boundary values
  • test_read_long_endianness_and_signedness - 32-bit reading with all endian/sign combinations
  • test_read_long_buffer_overrun - 32-bit buffer boundary checking
  • test_read_quad_endianness_and_signedness - 64-bit reading with all endian/sign combinations
  • test_read_quad_buffer_overrun - 64-bit buffer boundary checking
  • test_read_quad_at_offset - 64-bit reading at non-zero offsets
  • test_read_string - Null-terminated string reading
  • test_read_typed_value - Dispatch to correct type reader

Value Coercion Tests:

  • test_coerce_value_to_type - Type conversion including quad overflow handling

Strength Calculation Tests:

  • test_strength_type_byte - Byte type strength
  • test_strength_type_short - 16-bit type strength
  • test_strength_type_long - 32-bit type strength
  • test_strength_type_quad - 64-bit type strength
  • test_strength_type_string - String type strength with/without max_length
  • test_strength_operator_equal - Operator strength calculations

Integration Tests

End-to-End Evaluation Tests:

  • test_quad_lequad_matches_little_endian_value - LE quad pattern matching
  • test_quad_bequad_matches_big_endian_value - BE quad pattern matching
  • test_quad_signed_negative_one - Signed 64-bit negative value matching
  • test_quad_nested_child_rule_with_offset - Quad types in hierarchical rules

Test Categories

Unit Tests

Located alongside source code using #[cfg(test)]:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_functionality() {
        let result = parse_number("123");
        assert_eq!(result, Ok(("", 123)));
    }

    #[test]
    fn test_error_conditions() {
        let result = parse_number("invalid");
        assert!(result.is_err());
    }

    #[test]
    fn test_edge_cases() {
        // Test boundary values
        assert_eq!(parse_number("0"), Ok(("", 0)));
        assert_eq!(parse_number("-0"), Ok(("", 0)));

        // Test extreme values
        let max_val = i64::MAX.to_string();
        assert_eq!(parse_number(&max_val), Ok(("", i64::MAX)));
    }
}
}

Integration Tests

CLI integration tests are located in tests/cli_integration.rs and use the assert_cmd crate for subprocess-based testing. This approach provides natural process isolation and eliminates the need for fragile fd manipulation.

Running CLI integration tests:

# Run all CLI integration tests
cargo test --test cli_integration

# Run specific test
cargo test --test cli_integration test_builtin_elf_detection

# Run with output
cargo test --test cli_integration -- --nocapture

Test organization in tests/cli_integration.rs:

  • Builtin Flag Tests: Test --use-builtin with various file formats (ELF, PNG, JPEG, PDF, ZIP, GIF)
  • Stdin Tests: Test stdin input handling, truncation warnings, and format detection
  • Multiple File Tests: Test sequential processing, partial failures, and strict mode behavior
  • Error Handling Tests: Test file not found, directory errors, magic file errors, and invalid arguments
  • Timeout Tests: Test --timeout-ms argument parsing and validation
  • Output Format Tests: Test text and JSON output formats
  • Shell Completion Tests: Test --generate-completion for bash, zsh, and fish
  • Custom Magic File Tests: Test custom magic file loading and fallback behavior
  • Edge Cases: Test file names with spaces, Unicode, empty files, and small files
  • CLI Argument Parsing: Test multiple files, strict mode, and flag combinations

Example CLI integration test:

#![allow(unused)]
fn main() {
use assert_cmd::Command;
use predicates::prelude::*;
use tempfile::TempDir;

/// Helper to create a Command for the rmagic binary
fn rmagic_cmd() -> Command {
    Command::new(assert_cmd::cargo::cargo_bin!("rmagic"))
}

#[test]
fn test_builtin_elf_detection() {
    let temp_dir = TempDir::new().expect("Failed to create temp dir");
    let test_file = temp_dir.path().join("test.elf");
    std::fs::write(&test_file, b"\x7fELF\x02\x01\x01\x00")
        .expect("Failed to create test file");

    rmagic_cmd()
        .args(["--use-builtin", test_file.to_str().expect("Invalid path")])
        .assert()
        .success()
        .stdout(predicate::str::contains("ELF"));
}
}

Parser integration tests are also located in the tests/ directory:

#![allow(unused)]
fn main() {
// tests/parser_integration.rs
use libmagic_rs::parser::*;

#[test]
fn test_complete_rule_parsing() {
    let magic_line = "0 string \\x7fELF ELF executable";
    let rule = parse_magic_rule(magic_line).unwrap();

    assert_eq!(rule.offset, OffsetSpec::Absolute(0));
    assert_eq!(rule.message, "ELF executable");
}

#[test]
fn test_hierarchical_rules() {
    let magic_content = r#"
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
"#;
    let rules = parse_magic_file_content(magic_content).unwrap();
    assert_eq!(rules.len(), 1);
    assert_eq!(rules[0].children.len(), 2);
}
}

Property Tests

Property-based testing using proptest is implemented in tests/property_tests.rs:

# Run property tests
cargo test --test property_tests

# Run with more test cases
PROPTEST_CASES=1000 cargo test --test property_tests

The property tests verify:

  • Serialization roundtrips: AST types serialize and deserialize correctly
  • Evaluation safety: Evaluation never panics on arbitrary input
  • Configuration validation: Invalid configurations are rejected
  • Known pattern detection: ELF, ZIP, PDF patterns are correctly detected

Example property test:

#![allow(unused)]
fn main() {
use proptest::prelude::*;

proptest! {
    #[test]
    fn prop_evaluation_never_panics(buffer in prop::collection::vec(any::<u8>(), 0..1024)) {
        let db = MagicDatabase::with_builtin_rules().expect("should load");
        // Should not panic regardless of buffer contents
        let result = db.evaluate_buffer(&buffer);
        prop_assert!(result.is_ok());
    }
}
}

Compatibility Tests

Compatibility tests validate results against the GNU file command using the canonical test suite from the file project. Test data is located in third_party/tests/.

# Run compatibility tests (requires test files)
cargo test test_compatibility_with_original_libmagic -- --ignored

# Or use the just recipe
just test-compatibility

The compatibility workflow runs automatically in CI on pushes to main/develop.

Test Utilities and Helpers

Common Test Patterns

Whitespace Testing Helper:

#![allow(unused)]
fn main() {
fn test_with_whitespace_variants<T, F>(input: &str, expected: &T, parser: F)
where
    T: Clone + PartialEq + std::fmt::Debug,
    F: Fn(&str) -> IResult<&str, T>,
{
    let variants = vec![
        format!(" {}", input),  // Leading space
        format!("  {}", input), // Leading spaces
        format!("\t{}", input), // Leading tab
        format!("{} ", input),  // Trailing space
        format!("{}  ", input), // Trailing spaces
        format!("{}\t", input), // Trailing tab
        format!(" {} ", input), // Both sides
    ];

    for variant in variants {
        assert_eq!(
            parser(&variant),
            Ok(("", expected.clone())),
            "Failed with whitespace: '{}'",
            variant
        );
    }
}
}

Error Testing Patterns:

#![allow(unused)]
fn main() {
#[test]
fn test_parser_error_conditions() {
    let error_cases = vec![
        ("", "empty input"),
        ("abc", "invalid characters"),
        ("0xGG", "invalid hex digits"),
        ("--123", "double negative"),
    ];

    for (input, description) in error_cases {
        assert!(
            parse_number(input).is_err(),
            "Should fail on {}: '{}'",
            description,
            input
        );
    }
}
}

Testing Signed vs Unsigned Byte Behavior:

#![allow(unused)]
fn main() {
#[test]
fn test_signed_unsigned_byte_handling() {
    // Test signed byte interpretation
    let signed_rule = MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Byte { signed: true },
        op: Operator::GreaterThan,
        value: Value::Int(0),
        message: "Positive signed byte".to_string(),
        children: vec![],
        level: 0,
    };

    // 0x7f = 127 as signed (positive)
    // 0x80 = -128 as signed (negative)

    // Test unsigned byte interpretation
    let unsigned_rule = MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Byte { signed: false },
        op: Operator::GreaterThan,
        value: Value::Uint(127),
        message: "Large unsigned byte".to_string(),
        children: vec![],
        level: 0,
    };

    // As unsigned, 0x80 = 128 matches (> 127) while 0x7f = 127 does not
}
}

Testing 64-bit Integer (Quad) Types:

#![allow(unused)]
fn main() {
#[test]
fn test_read_quad_endianness_and_signedness() {
    // Little-endian unsigned
    let buffer = &[0xef, 0xcd, 0xab, 0x90, 0x78, 0x56, 0x34, 0x12];
    let result = read_quad(buffer, 0, Endianness::Little, false).unwrap();
    assert_eq!(result, Value::Uint(0x1234_5678_90ab_cdef));

    // Big-endian signed negative
    let buffer = &[0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff];
    let result = read_quad(buffer, 0, Endianness::Big, true).unwrap();
    assert_eq!(result, Value::Int(-1));
}
}

Test Data Management

Test Fixtures:

#![allow(unused)]
fn main() {
// Common test data
const ELF_MAGIC: &[u8] = &[0x7f, 0x45, 0x4c, 0x46];
const ZIP_MAGIC: &[u8] = &[0x50, 0x4b, 0x03, 0x04];
const PDF_MAGIC: &str = "%PDF-";

fn create_test_rule() -> MagicRule {
    MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Byte { signed: true },
        op: Operator::Equal,
        value: Value::Uint(0x7f),
        message: "Test rule".to_string(),
        children: vec![],
        level: 0,
    }
}
}

Running Tests

Basic Test Execution

# Run all tests
cargo test

# Run specific test module
cargo test parser::grammar::tests

# Run specific test
cargo test test_parse_number_positive

# Run tests with output
cargo test -- --nocapture

# Run ignored tests (if any)
cargo test -- --ignored

Enhanced Test Running

# Use nextest for faster execution
cargo nextest run

# Run tests with coverage
cargo llvm-cov --html

# Run tests in release mode
cargo test --release

# Test documentation examples
cargo test --doc

Continuous Testing

# Auto-run tests on file changes
cargo watch -x test

# Auto-run specific tests
cargo watch -x "test parser"

# Run checks and tests together
cargo watch -x check -x test

Code Coverage

Coverage Tools

# Install coverage tool
cargo install cargo-llvm-cov

# Generate HTML coverage report
cargo llvm-cov --html

# Generate lcov format for CI
cargo llvm-cov --lcov --output-path coverage.lcov

# Show coverage summary
cargo llvm-cov --summary-only

Coverage Targets

  • Overall Coverage: Target >85% for the project
  • New Code: Require >90% coverage for new features
  • Critical Paths: Require 100% coverage for parser and evaluator
  • Public APIs: Require 100% coverage for all public functions

Coverage Exclusions

Some code is excluded from coverage requirements:

#![allow(unused)]
fn main() {
// Debug/development code
#[cfg(debug_assertions)]
fn debug_helper() { /* ... */ }

// Error handling that's hard to trigger
#[cfg_attr(coverage, coverage(off))]
fn handle_system_error() { /* ... */ }
}

Quality Assurance

Automated Checks

All code must pass these automated checks:

# Formatting check
cargo fmt -- --check

# Linting with strict rules
cargo clippy -- -D warnings

# Documentation generation
cargo doc --document-private-items

# Security audit
cargo audit

# Dependency check
cargo tree --duplicates

Manual Review Checklist

For code reviews:

  • Functionality: Does the code work as intended?
  • Tests: Are there comprehensive tests covering the changes?
  • Documentation: Are public APIs documented with examples?
  • Error Handling: Are errors handled gracefully?
  • Performance: Are there any performance implications?
  • Memory Safety: Is all buffer access bounds-checked?
  • Compatibility: Does this maintain API compatibility?

Performance Testing

# Run benchmarks
cargo bench

# Profile with flamegraph
cargo install flamegraph
cargo flamegraph --bench parser_bench

# Memory usage analysis
valgrind --tool=massif target/release/rmagic large_file.bin

CLI Testing

CLI Integration Tests

CLI functionality is tested using the assert_cmd crate in tests/cli_integration.rs. This subprocess-based approach provides:

  • Process isolation: Each test runs rmagic as a separate process
  • Realistic testing: Tests actual CLI behavior including exit codes and output
  • Reliable coverage: Works correctly under llvm-cov for coverage reporting
  • Cross-platform compatibility: No platform-specific fd manipulation required

Running CLI Tests

# Run all CLI integration tests
cargo test --test cli_integration

# Run specific CLI test
cargo test --test cli_integration test_builtin_elf_detection

# Run with verbose output
cargo test --test cli_integration -- --nocapture

Test Categories in cli_integration.rs

CategoryDescription
Builtin Flag TestsTest --use-builtin with ELF, PNG, JPEG, PDF, ZIP, GIF
Stdin TestsTest - input, truncation warnings, format detection
Multiple File TestsTest sequential processing, strict mode, partial failures
Error Handling TestsTest file not found, directory errors, invalid arguments
Timeout TestsTest --timeout-ms parsing and validation
Output Format TestsTest --json and --text output formats
Shell Completion TestsTest --generate-completion for various shells
Custom Magic File TestsTest --magic-file loading and fallback
Edge CasesTest Unicode filenames, empty files, small files

Best Practices

  1. Use assert_cmd: All CLI tests use the rmagic_cmd() helper (wrapping the cargo_bin!("rmagic") macro) for subprocess testing
  2. Use predicates: Check stdout/stderr with predicate matchers for readable assertions
  3. Use tempfile: Create temporary test files with TempDir for isolation
  4. Derive from config: Use EvaluationConfig::default() for thresholds instead of hardcoding

Benchmarks

Performance benchmarks are implemented using Criterion in the benches/ directory:

# Run all benchmarks
cargo bench

# Run specific benchmark group
cargo bench parser
cargo bench evaluation
cargo bench io

# Run benchmarks without generating plots (faster; HTML reports still land in target/criterion/)
cargo bench -- --noplot

Available Benchmarks

| Benchmark        | Description                               |
|------------------|-------------------------------------------|
| parser_bench     | Magic file parsing performance            |
| evaluation_bench | Rule evaluation against various file types |
| io_bench         | Memory-mapped I/O operations              |

Benchmark CI

Benchmarks run automatically:

  • Weekly: Scheduled runs on Sunday at 3 AM UTC
  • On PR: When performance-critical code changes (src/evaluator, src/parser, src/io, benches)
  • Manual: Via workflow_dispatch

The CI compares PR benchmarks against the main branch and reports regressions.

Future Testing Plans

Fuzzing Integration (Phase 2)

  • Parser Fuzzing: Use cargo-fuzz for parser robustness
  • Evaluator Fuzzing: Test evaluation engine with malformed files
  • Continuous Fuzzing: Integrate with OSS-Fuzz for ongoing testing

The comprehensive testing strategy ensures libmagic-rs maintains high quality, reliability, and compatibility while enabling confident refactoring and feature development.

Performance Optimization

libmagic-rs includes several performance optimizations across I/O, evaluation, and compilation. This chapter describes each optimization and how to take advantage of them.

Memory-Mapped I/O

The FileBuffer type in src/io/mod.rs uses the memmap2 crate to memory-map files instead of reading them entirely into memory. This provides:

  • Demand paging: The OS loads only the pages that are actually accessed during evaluation, avoiding unnecessary reads for large files.
  • Zero-copy access: FileBuffer::as_slice() returns a &[u8] directly backed by the memory mapping with no intermediate copy.
  • OS page cache reuse: Repeated analysis of the same file reuses cached pages without additional I/O.

Files up to 1 GB are supported. Empty files and non-regular files (devices, FIFOs, directories) are rejected at construction time with descriptive errors.
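Those preconditions can be sketched with the standard library alone; `check_mappable` and `MAX_FILE_SIZE` below are illustrative names, not the actual FileBuffer API:

```rust
use std::fs;
use std::io;
use std::path::Path;

const MAX_FILE_SIZE: u64 = 1 << 30; // the 1 GB cap described above

/// Validate a path the way a memory-mapping buffer must before mapping:
/// it must be a regular file, non-empty, and within the size cap.
fn check_mappable(path: &Path) -> io::Result<u64> {
    let meta = fs::metadata(path)?;
    if !meta.is_file() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"));
    }
    let len = meta.len();
    if len == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "empty file cannot be mapped"));
    }
    if len > MAX_FILE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "file exceeds 1 GB limit"));
    }
    Ok(len)
}
```

Performing these checks before mapping is what turns devices, FIFOs, and directories into descriptive errors rather than undefined mmap behaviour.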

SIMD-Accelerated Null Byte Scanning

String type reading in src/evaluator/types.rs uses the memchr crate to locate null terminators. The memchr crate automatically uses SIMD instructions (SSE2/AVX2 on x86-64, NEON on aarch64) when available, making null-terminated string extraction significantly faster than a byte-by-byte loop.

#![allow(unused)]
fn main() {
// From src/evaluator/types.rs - uses SIMD-accelerated memchr for null scan
let read_length = memchr::memchr(0, &remaining_buffer[..search_len])
    .unwrap_or(search_len);
}
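For reference, the scalar equivalent of that memchr call is a byte-by-byte scan; both return the index of the first null byte, or the search length when none is found:

```rust
/// Scalar fallback: index of the first NUL in `buf`, or `buf.len()`.
/// memchr::memchr(0, buf).unwrap_or(buf.len()) computes the same result,
/// but with SIMD acceleration on supported targets.
fn null_scan_scalar(buf: &[u8]) -> usize {
    buf.iter().position(|&b| b == 0).unwrap_or(buf.len())
}
```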

Evaluation Pipeline Optimizations

Unified Offset and Value Resolution

The evaluate_single_rule function resolves the offset, reads the typed value, and applies the operator in a single pass, returning Option<(usize, Value)>. Callers receive the resolved offset and matched value directly, avoiding redundant re-resolution when constructing match results.
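A simplified sketch of that single-pass shape (the types here are stand-ins; the real offset resolution and operator machinery are richer):

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Uint(u64),
}

/// Resolve the offset, read the value, and apply the predicate in one
/// pass, handing both the offset and the matched value back on success.
fn evaluate_single_rule(buffer: &[u8], offset: usize, expected: u64) -> Option<(usize, Value)> {
    let byte = *buffer.get(offset)?; // bounds-checked read; None = no match
    let value = u64::from(byte);
    (value == expected).then(|| (offset, Value::Uint(value)))
}
```

Because the caller already holds the resolved offset and value, building a match result needs no second resolution pass.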

Pre-Allocated Collections

Several hot paths pre-allocate collections with known or estimated capacities:

  • evaluate_rules pre-allocates the match results vector with Vec::with_capacity(8).
  • concatenate_messages computes the total capacity from match message lengths and allocates the output string once with String::with_capacity.
  • Hex encoding in the JSON output formatter pre-allocates based on byte count.
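The concatenation pattern can be sketched like this (a stand-in for concatenate_messages; the ", " separator is an assumption for illustration):

```rust
/// Join match messages with a single up-front allocation: sum the
/// message lengths plus separators, then push into a pre-sized String.
fn concat_messages(messages: &[&str]) -> String {
    let sep = ", ";
    let total: usize = messages.iter().map(|m| m.len()).sum::<usize>()
        + sep.len() * messages.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, m) in messages.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(m);
    }
    out
}
```

Sizing the buffer once avoids the repeated reallocation that naive `+`-style string building incurs in a hot path.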

Early Exit on First Match

When EvaluationConfig::stop_at_first_match is true (the default), the evaluator stops iterating rules as soon as the first successful match is found. This avoids evaluating the remaining rule set when only one result is needed.
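The control flow can be sketched as follows, with rules reduced to plain functions for illustration:

```rust
/// Evaluate rules in order, optionally stopping at the first hit,
/// mirroring the stop_at_first_match behaviour described above.
fn evaluate_rules(
    rules: &[fn(&[u8]) -> Option<&'static str>],
    buffer: &[u8],
    stop_at_first_match: bool,
) -> Vec<&'static str> {
    let mut matches = Vec::with_capacity(8); // pre-allocation, as above
    for rule in rules {
        if let Some(m) = rule(buffer) {
            matches.push(m);
            if stop_at_first_match {
                break; // skip the remaining rule set
            }
        }
    }
    matches
}
```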

Timeout Support

Each evaluation tracks elapsed time from a std::time::Instant created at the start. If timeout_ms is set in the configuration, the evaluator checks the elapsed time during rule iteration and returns a timeout error if the limit is exceeded. This prevents runaway evaluations on adversarial or unusually large inputs.
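A minimal sketch of that check, assuming a per-evaluation Instant and an optional millisecond budget (check_timeout is an illustrative name):

```rust
use std::time::{Duration, Instant};

/// Compare elapsed time against an optional budget between rules,
/// as the evaluator does when timeout_ms is configured.
fn check_timeout(start: Instant, timeout_ms: Option<u64>) -> Result<(), String> {
    match timeout_ms {
        Some(ms) if start.elapsed() > Duration::from_millis(ms) => {
            Err(format!("evaluation exceeded {ms} ms"))
        }
        _ => Ok(()), // no budget set, or still within it
    }
}
```

Checking between rules rather than inside every read keeps the overhead of the clock query negligible.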

Static Tag Extraction Patterns

The DEFAULT_TAG_EXTRACTOR in src/output/mod.rs is a static LazyLock<TagExtractor> initialized once on first use. This avoids constructing the keyword set on every call to from_evaluator_match or from_library_result.

#![allow(unused)]
fn main() {
static DEFAULT_TAG_EXTRACTOR: LazyLock<crate::tags::TagExtractor> =
    LazyLock::new(crate::tags::TagExtractor::new);
}

Release Profile Optimization

The Cargo.toml release profile enables link-time optimization and single-codegen-unit compilation:

[profile.release]
lto = "thin"
codegen-units = 1

  • Thin LTO allows cross-crate inlining while keeping link times reasonable.
  • Single codegen unit gives LLVM a complete view of the crate for better optimization at the cost of longer compile times.

Configuration Presets

EvaluationConfig provides named presets that trade off between speed and completeness:

| Preset          | Recursion Depth | String Length | Stop First | MIME Types | Timeout |
|-----------------|-----------------|---------------|------------|------------|---------|
| default()       | 20              | 8192          | yes        | no         | none    |
| performance()   | 10              | 1024          | yes        | no         | 1s      |
| comprehensive() | 50              | 32768         | no         | yes        | 30s     |

Use the performance preset when throughput matters more than detail:

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

let db = MagicDatabase::with_builtin_rules_and_config(
    EvaluationConfig::performance()
)?;
let result = db.identify_file("target.bin")?;
}

Benchmarking

The project uses Criterion for benchmarks. The benchmark suite in benches/evaluation_bench.rs covers three areas:

  1. File type detection – ELF, ZIP, PDF, and unknown data detection throughput.
  2. Buffer sizes – Evaluation time across buffer sizes from 64 bytes to 64 KB.
  3. Configuration comparison – Default vs. performance vs. comprehensive presets.

Run the full benchmark suite:

cargo bench

Run a specific benchmark group:

cargo bench -- file_type_detection
cargo bench -- buffer_sizes
cargo bench -- evaluation_configs

Criterion generates HTML reports in target/criterion/ with statistical analysis and comparison against previous runs.

Profiling

Generate a CPU flamegraph to identify hot spots:

cargo install flamegraph
cargo flamegraph --bench evaluation_bench -- --bench

The resulting flamegraph.svg shows where CPU time is spent during evaluation, making it straightforward to identify optimization targets.

Use cargo bench regularly during development to catch performance regressions. Criterion’s built-in comparison mode highlights statistically significant changes between runs.

Error Handling

libmagic-rs uses Rust’s Result type system for comprehensive, type-safe error handling.

Error Types

LibmagicError

The main error enum covers all library operations:

#![allow(unused)]
fn main() {
use thiserror::Error;

#[derive(Debug, Error)]
pub enum LibmagicError {
    #[error("Parse error at line {line}: {message}")]
    ParseError { line: usize, message: String },

    #[error("Evaluation error: {0}")]
    EvaluationError(String),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),

    #[error("Invalid magic file format: {0}")]
    InvalidFormat(String),
}
}

Result Type Alias

For convenience, the library provides a type alias:

#![allow(unused)]
fn main() {
pub type Result<T> = std::result::Result<T, LibmagicError>;
}

Error Handling Patterns

Basic Error Handling

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, LibmagicError};

match MagicDatabase::load_from_file("magic.db") {
    Ok(db) => {
        // Use the database
        println!("Loaded magic database successfully");
    }
    Err(e) => {
        eprintln!("Failed to load magic database: {}", e);
        return;
    }
}
}

Using the ? Operator

#![allow(unused)]
fn main() {
fn analyze_file(path: &str) -> Result<String> {
    let db = MagicDatabase::load_from_file("magic.db")?;
    let result = db.evaluate_file(path)?;
    Ok(result.description)
}
}

Matching Specific Errors

#![allow(unused)]
fn main() {
use libmagic_rs::LibmagicError;

match db.evaluate_file("example.bin") {
    Ok(result) => println!("File type: {}", result.description),
    Err(LibmagicError::IoError(e)) => {
        eprintln!("File access error: {}", e);
    }
    Err(LibmagicError::EvaluationError(msg)) => {
        eprintln!("Evaluation failed: {}", msg);
    }
    Err(e) => {
        eprintln!("Other error: {}", e);
    }
}
}

Error Context

Adding Context with map_err

#![allow(unused)]
fn main() {
use libmagic_rs::LibmagicError;

fn load_custom_magic(path: &str) -> Result<MagicDatabase> {
    MagicDatabase::load_from_file(path).map_err(|e| {
        LibmagicError::InvalidFormat(format!(
            "Failed to load custom magic file '{}': {}",
            path, e
        ))
    })
}
}

Using anyhow for Application Errors

use anyhow::{Context, Result};
use libmagic_rs::MagicDatabase;

fn main() -> Result<()> {
    let db = MagicDatabase::load_from_file("magic.db").context("Failed to load magic database")?;

    let result = db
        .evaluate_file("example.bin")
        .context("Failed to analyze file")?;

    println!("File type: {}", result.description);
    Ok(())
}

Error Recovery

Graceful Degradation

#![allow(unused)]
fn main() {
fn analyze_with_fallback(path: &str) -> String {
    match MagicDatabase::load_from_file("magic.db") {
        Ok(db) => match db.evaluate_file(path) {
            Ok(result) => result.description,
            Err(_) => "unknown file type".to_string(),
        },
        Err(_) => "magic database unavailable".to_string(),
    }
}
}

Retry Logic

#![allow(unused)]
fn main() {
use std::thread;
use std::time::Duration;

fn load_with_retry(path: &str, max_attempts: u32) -> Result<MagicDatabase> {
    let mut attempts = 0;

    loop {
        match MagicDatabase::load_from_file(path) {
            Ok(db) => return Ok(db),
            Err(e) if attempts < max_attempts => {
                attempts += 1;
                eprintln!("Attempt {} failed: {}", attempts, e);
                thread::sleep(Duration::from_millis(100));
            }
            Err(e) => return Err(e),
        }
    }
}
}

Best Practices

1. Use Specific Error Types

#![allow(unused)]
fn main() {
// Good: Specific error information
Err(LibmagicError::ParseError {
    line: 42,
    message: "Invalid offset specification".to_string()
})

// Avoid: Generic error messages
Err(LibmagicError::EvaluationError("something went wrong".to_string()))
}

2. Provide Context

#![allow(unused)]
fn main() {
// Good: Contextual error information
fn parse_magic_file(path: &Path) -> Result<Vec<MagicRule>> {
    std::fs::read_to_string(path)
        .map_err(|e| LibmagicError::IoError(e))
        .and_then(|content| parse_magic_string(&content))
}

// Better: Even more context
fn parse_magic_file(path: &Path) -> Result<Vec<MagicRule>> {
    let content = std::fs::read_to_string(path).map_err(|e| {
        LibmagicError::InvalidFormat(format!(
            "Cannot read magic file '{}': {}",
            path.display(),
            e
        ))
    })?;

    parse_magic_string(&content).map_err(|e| {
        LibmagicError::InvalidFormat(format!("Invalid magic file '{}': {}", path.display(), e))
    })
}
}

3. Handle Errors at the Right Level

// Library level: Return detailed errors
pub fn evaluate_file<P: AsRef<Path>>(&self, path: P) -> Result<EvaluationResult> {
    // Detailed error handling
}

// Application level: Handle user-facing concerns
fn main() {
    match analyze_file("example.bin") {
        Ok(description) => println!("{}", description),
        Err(e) => {
            eprintln!("Error: {}", e);
            std::process::exit(1);
        }
    }
}

4. Document Error Conditions

#![allow(unused)]
fn main() {
/// Evaluate magic rules against a file
///
/// # Errors
///
/// This function will return an error if:
/// - The file cannot be read (`IoError`)
/// - The file is too large for processing (`EvaluationError`)
/// - Rule evaluation encounters invalid data (`EvaluationError`)
///
/// # Examples
///
/// ```rust,no_run
/// use libmagic_rs::MagicDatabase;
///
/// let db = MagicDatabase::load_from_file("magic.db")?;
/// match db.evaluate_file("example.bin") {
///     Ok(result) => println!("Type: {}", result.description),
///     Err(e) => eprintln!("Error: {}", e),
/// }
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn evaluate_file<P: AsRef<Path>>(&self, path: P) -> Result<EvaluationResult> {
    // Implementation
}
}

Testing Error Conditions

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_missing_file_error() {
        let result = MagicDatabase::load_from_file("nonexistent.magic");
        assert!(result.is_err());

        match result {
            Err(LibmagicError::IoError(_)) => (), // Expected
            _ => panic!("Expected IoError for missing file"),
        }
    }

    #[test]
    fn test_invalid_magic_file() {
        let result = parse_magic_string("invalid syntax here");
        assert!(result.is_err());

        if let Err(LibmagicError::ParseError { line, message }) = result {
            assert_eq!(line, 1);
            assert!(message.contains("syntax"));
        } else {
            panic!("Expected ParseError for invalid syntax");
        }
    }
}
}

This comprehensive error handling approach ensures libmagic-rs provides clear, actionable error information while maintaining type safety and enabling robust error recovery strategies.

Migration from libmagic

This guide helps you migrate from the C-based libmagic library to libmagic-rs, covering API differences, compatibility considerations, and best practices.

API Comparison

C libmagic API

#include <magic.h>

magic_t magic = magic_open(MAGIC_MIME_TYPE);
magic_load(magic, NULL);
const char* result = magic_file(magic, "example.bin");
printf("MIME type: %s\n", result);
magic_close(magic);

libmagic-rs API

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;

// Using built-in rules (no external magic file needed)
let db = MagicDatabase::with_builtin_rules()?;
let result = db.evaluate_file("example.bin")?;
println!("File type: {}", result.description);

// Or load from a magic file / directory
let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;
let result = db.evaluate_file("example.bin")?;
println!("File type: {}", result.description);
Ok::<(), Box<dyn std::error::Error>>(())
}

API Mapping

| C libmagic                    | libmagic-rs                                        | Notes                       |
|-------------------------------|----------------------------------------------------|-----------------------------|
| magic_open(flags)             | MagicDatabase::with_builtin_rules()                | No manual flag management   |
| magic_load(magic, path)       | MagicDatabase::load_from_file(path)                | Auto-detects format         |
| magic_file(magic, path)       | db.evaluate_file(path)                             | Returns structured result   |
| magic_buffer(magic, buf, len) | db.evaluate_buffer(buf)                            | Safe slice, no length needed |
| magic_error(magic)            | Result<T, LibmagicError>                           | Typed errors, no global state |
| magic_close(magic)            | (automatic)                                        | RAII cleanup on drop        |
| MAGIC_MIME_TYPE               | EvaluationConfig { enable_mime_types: true, .. }   | Opt-in configuration        |

Key Differences

Memory Safety

  • C libmagic: Manual memory management, potential for leaks/corruption
  • libmagic-rs: Automatic memory management, compile-time safety guarantees

Error Handling

  • C libmagic: Error codes and global error state
  • libmagic-rs: Result types with structured errors (ParseError, EvaluationError, ConfigError, Timeout)

Thread Safety

  • C libmagic: Requires careful synchronization
  • libmagic-rs: MagicDatabase is safe to share across threads via Arc
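Sharing therefore follows the standard Arc pattern; the Db struct below is a stand-in for the real MagicDatabase:

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for an immutable, thread-safe database handle.
struct Db {
    rules: Vec<&'static str>,
}

fn classify(db: &Db, _buf: &[u8]) -> usize {
    db.rules.len() // read-only access needs no locking
}

fn run() -> Vec<usize> {
    let db = Arc::new(Db { rules: vec!["elf", "png", "zip"] });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let db = Arc::clone(&db); // cheap refcount bump per thread
            thread::spawn(move || classify(&db, b"\x7fELF"))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because the database is never mutated after load, no Mutex is needed; Arc alone is enough.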

Migration Strategies

Direct Replacement

For simple use cases, libmagic-rs can be a drop-in replacement:

#![allow(unused)]
fn main() {
// Before (C)
// const char* type = magic_file(magic, path);

// After (Rust)
let result = db.evaluate_file(path)?;
let type_str = &result.description;
}

Gradual Migration

For complex applications:

  1. Start with new code: Use libmagic-rs for new features
  2. Wrap existing code: Create Rust wrappers around C libmagic calls
  3. Replace incrementally: Migrate modules one at a time
  4. Remove C dependency: Complete the migration
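Steps 2 and 3 are easier if both backends sit behind one interface, so call sites never know which implementation they are using. A self-contained sketch with both backends stubbed out (the names FileTyper, RustBackend, and CBackend are illustrative, not crate APIs):

```rust
// Common interface both backends implement during a gradual migration.
trait FileTyper {
    fn identify(&self, path: &str) -> Result<String, String>;
}

// New pure-Rust backend (stubbed; real code would call libmagic_rs).
struct RustBackend;
impl FileTyper for RustBackend {
    fn identify(&self, path: &str) -> Result<String, String> {
        Ok(format!("{path}: identified by libmagic-rs"))
    }
}

// Legacy backend (stubbed; real code would wrap C libmagic via FFI).
struct CBackend;
impl FileTyper for CBackend {
    fn identify(&self, path: &str) -> Result<String, String> {
        Ok(format!("{path}: identified by C libmagic"))
    }
}

fn identify_with(backend: &dyn FileTyper, path: &str) -> String {
    backend.identify(path).unwrap_or_else(|e| format!("error: {e}"))
}

fn main() {
    // Modules migrate by swapping which backend they are handed.
    assert!(identify_with(&RustBackend, "a.bin").contains("libmagic-rs"));
    assert!(identify_with(&CBackend, "a.bin").contains("C libmagic"));
    println!("both backends answer through one interface");
}
```

Once every module takes the trait, removing the C dependency is just deleting one impl.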

Compatibility Notes

Magic File Format

  • Supported: Standard magic file syntax
  • Extensions: Additional features planned (regex, etc.)
  • Compatibility: Existing magic files should work

Output Format

  • Text mode: Compatible with GNU file command
  • JSON mode: New structured format for modern applications
  • MIME types: Similar to file --mime-type
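As an illustration only (field names inferred from this guide's EvaluationResult struct; the actual JSON schema may differ), a JSON-mode result might look like:

```json
{
  "description": "ELF 64-bit LSB executable",
  "mime_type": "application/x-executable",
  "confidence": 0.95
}
```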

Performance

  • Memory usage: Comparable to C libmagic
  • Speed: Target within 10% of C performance
  • Startup: Faster with compiled rule caching

Common Migration Issues

Error Handling Patterns

C libmagic:

if (magic_load(magic, NULL) != 0) {
    fprintf(stderr, "Error: %s\n", magic_error(magic));
    return -1;
}

libmagic-rs:

use libmagic_rs::{MagicDatabase, LibmagicError};

let db = match MagicDatabase::load_from_file("magic.db") {
    Ok(db) => db,
    Err(LibmagicError::IoError(e)) => {
        eprintln!("File error: {}", e);
        return Err(LibmagicError::IoError(e));
    }
    Err(LibmagicError::ParseError(e)) => {
        eprintln!("Parse error: {}", e);
        return Err(LibmagicError::ParseError(e));
    }
    Err(e) => return Err(e),
};

Resource Management

C libmagic:

magic_t magic = magic_open(flags);
// ... use magic ...
magic_close(magic);  // Manual cleanup required

libmagic-rs:

#![allow(unused)]
fn main() {
{
    let db = MagicDatabase::load_from_file("magic.db")?;
    // ... use db ...
}  // Automatic cleanup when db goes out of scope
}

Best Practices

Error Handling

  • Use ? operator for error propagation
  • Match on specific error types when needed
  • Provide context with error messages

Performance

  • Reuse MagicDatabase instances when possible
  • Consider caching for frequently accessed files
  • Use appropriate configuration for your use case
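One way to reuse a single instance across a whole program is a lazily initialized global; a sketch using std::sync::OnceLock with a stand-in Db struct in place of MagicDatabase:

```rust
use std::sync::OnceLock;

// Stand-in for MagicDatabase; the real one would be built once via
// load_from_file() or with_builtin_rules() and then reused everywhere.
pub struct Db {
    pub rule_count: usize,
}

pub fn db() -> &'static Db {
    static DB: OnceLock<Db> = OnceLock::new();
    // The closure runs exactly once, no matter how many call sites use db().
    DB.get_or_init(|| Db { rule_count: 42 })
}

fn main() {
    for _ in 0..3 {
        assert_eq!(db().rule_count, 42); // same instance every call
    }
    assert!(std::ptr::eq(db(), db()));
    println!("database built once, reused everywhere");
}
```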

Testing

  • Test with your existing magic files
  • Verify output compatibility with your applications
  • Benchmark performance for your workload

Future Compatibility

libmagic-rs aims to maintain compatibility with:

  • Standard magic file format: Core syntax will remain supported
  • GNU file output: Text output format compatibility
  • Common use cases: Drop-in replacement for most applications

Migrating from v0.1.x to v0.2.0

Version 0.2.0 introduces breaking changes to support comparison operators and improve type handling. Update your code as follows:

TypeKind::Byte Variant Change

The Byte variant changed from a unit variant to a struct variant with a signed field.

Before (v0.1.x):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte => {
        // Handle byte type
    }
    _ => {}
}

// Constructing
let byte_type = TypeKind::Byte;

After (v0.2.0):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte { signed } => {
        // Handle byte type, check signedness if needed
        if signed {
            // Handle signed byte
        } else {
            // Handle unsigned byte
        }
    }
    _ => {}
}

// Constructing
let signed_byte = TypeKind::Byte { signed: true };
let unsigned_byte = TypeKind::Byte { signed: false };

New Operator Variants

The Operator enum added four comparison operators. Exhaustive matches must handle these variants.

Before (v0.1.x):

use libmagic_rs::parser::ast::Operator;

match operator {
    Operator::Equal => { /* ... */ }
    Operator::NotEqual => { /* ... */ }
    // Other existing variants
}

After (v0.2.0):

use libmagic_rs::parser::ast::Operator;

match operator {
    Operator::Equal => { /* ... */ }
    Operator::NotEqual => { /* ... */ }
    Operator::LessThan => { /* ... */ }
    Operator::GreaterThan => { /* ... */ }
    Operator::LessEqual => { /* ... */ }
    Operator::GreaterEqual => { /* ... */ }
    // Other existing variants
}

read_byte Function Signature

The libmagic_rs::evaluator::types::read_byte function signature changed from 2 to 3 parameters.

Before (v0.1.x):

use libmagic_rs::evaluator::types::read_byte;

let value = read_byte(buffer, offset)?;

After (v0.2.0):

use libmagic_rs::evaluator::types::read_byte;

// The third parameter indicates signedness
let signed_value = read_byte(buffer, offset, true)?;
let unsigned_value = read_byte(buffer, offset, false)?;

Migrating from v0.2.x to v0.3.0

Version 0.3.0 adds support for 64-bit quad integers and renames a core evaluator type. Update your code as follows:

New TypeKind::Quad Variant

A new Quad variant was added to the TypeKind enum for 64-bit integer types. Exhaustive matches on TypeKind must handle the new variant.

Before (v0.2.x):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte { signed } => { /* ... */ }
    TypeKind::Short { endian, signed } => { /* ... */ }
    TypeKind::Long { endian, signed } => { /* ... */ }
    TypeKind::String { max_length } => { /* ... */ }
}

After (v0.3.0):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte { signed } => { /* ... */ }
    TypeKind::Short { endian, signed } => { /* ... */ }
    TypeKind::Long { endian, signed } => { /* ... */ }
    TypeKind::Quad { endian, signed } => {
        // Handle 64-bit quad integer type
    }
    TypeKind::String { max_length } => { /* ... */ }
}

TypeKind Variant Discriminant Changes

The addition of the Quad variant moved the String variant from declaration position 3 to 4. Because every TypeKind variant carries data, Rust forbids casting TypeKind values with `as`; the change only affects code that depends on variant declaration order, such as hand-rolled serialization or persisted discriminant comparisons.

Before (v0.2.x):

// Variant order: Byte = 0, Short = 1, Long = 2, String = 3

After (v0.3.0):

// Variant order: Byte = 0, Short = 1, Long = 2, Quad = 3, String = 4
Recommendation: Avoid relying on enum discriminant values. Use pattern matching or the std::mem::discriminant function instead.
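For example, std::mem::discriminant compares variants without depending on declaration order; a self-contained sketch with a local enum mirroring TypeKind's shape:

```rust
use std::mem::discriminant;

// Local enum mirroring the shape of TypeKind (data-carrying variants).
pub enum Kind {
    Byte { signed: bool },
    Quad { signed: bool },
    Str { max_length: Option<usize> },
}

pub fn same_variant(a: &Kind, b: &Kind) -> bool {
    // Compares which variant each value is, ignoring field values,
    // and keeps working if variants are reordered between versions.
    discriminant(a) == discriminant(b)
}

fn main() {
    let a = Kind::Byte { signed: true };
    let b = Kind::Byte { signed: false };
    let c = Kind::Str { max_length: None };
    let _ = Kind::Quad { signed: false };
    assert!(same_variant(&a, &b));
    assert!(!same_variant(&a, &c));
    println!("variant comparison is order-independent");
}
```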

MatchResult Renamed to RuleMatch

The MatchResult struct in libmagic_rs::evaluator was renamed to RuleMatch for clarity.

Before (v0.2.x):

use libmagic_rs::evaluator::MatchResult;
use libmagic_rs::parser::ast::Value;

let match_result = MatchResult {
    message: "ELF executable".to_string(),
    offset: 0,
    level: 0,
    value: Value::Uint(0x7f),
    confidence: MatchResult::calculate_confidence(0),
};

After (v0.3.0):

use libmagic_rs::evaluator::RuleMatch;
use libmagic_rs::parser::ast::Value;

let match_result = RuleMatch {
    message: "ELF executable".to_string(),
    offset: 0,
    level: 0,
    value: Value::Uint(0x7f),
    confidence: RuleMatch::calculate_confidence(0),
};

Update all references from MatchResult to RuleMatch in type annotations, function signatures, and construction sites.
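During a large migration, a transitional re-export can keep the old name compiling while call sites are updated one by one. A self-contained sketch of the pattern (a local module stands in for the crate so the example runs; with the real crate the shim would re-export `RuleMatch` from `libmagic_rs::evaluator`):

```rust
// A local module standing in for libmagic_rs::evaluator, so this
// sketch is runnable without the crate.
mod evaluator {
    pub struct RuleMatch {
        pub message: String,
    }
}

// Transitional shim: the old name becomes an alias for the new type.
pub use evaluator::RuleMatch as MatchResult;

fn main() {
    // Unmigrated code that still says `MatchResult` keeps compiling.
    let m = MatchResult {
        message: "ELF executable".to_string(),
    };
    assert_eq!(m.message, "ELF executable");
    println!("old name still compiles");
}
```

Delete the shim once the last `MatchResult` reference is gone.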

Migrating from v0.3.x to v0.4.0

Version 0.4.0 adds three new operator variants to the Operator enum for extended bitwise operations and pattern matching capabilities.

New Operator Enum Variants

Three variants were added to the Operator enum:

  • BitwiseXor (magic file symbol: ^)
  • BitwiseNot (magic file symbol: ~)
  • AnyValue (magic file symbol: x)

Impact: Since the Operator enum is exhaustive (not marked with #[non_exhaustive]), any code with exhaustive pattern matching on Operator must be updated to handle these variants.

Before (v0.3.x):

use libmagic_rs::parser::ast::Operator;

match operator {
    Operator::Equal => { /* ... */ }
    Operator::NotEqual => { /* ... */ }
    Operator::BitwiseAnd => { /* ... */ }
    Operator::BitwiseOr => { /* ... */ }
    // ... other existing variants
}

After (v0.4.0):

use libmagic_rs::parser::ast::Operator;

match operator {
    Operator::Equal => { /* ... */ }
    Operator::NotEqual => { /* ... */ }
    Operator::BitwiseAnd => { /* ... */ }
    Operator::BitwiseOr => { /* ... */ }
    Operator::BitwiseXor => { /* handle XOR */ }
    Operator::BitwiseNot => { /* handle NOT */ }
    Operator::AnyValue => { /* handle any value x */ }
    // ... other existing variants
}

Alternative: If your code does not need to handle all operators specifically, use a wildcard pattern:

match operator {
    Operator::Equal => { /* specific handling */ }
    _ => { /* generic handling for all other operators */ }
}

These operators enable fuller support for libmagic file format specifications and extend bitwise operation capabilities.
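To illustrate the semantics these symbols conventionally carry in magic(5) — the crate's evaluator may differ in details — a local sketch with an illustrative Op enum:

```rust
// Illustrative encoding of the magic(5) conventions for these symbols;
// this is not the crate's evaluator.
pub enum Op {
    Equal,      // = : read value equals the expected value
    BitwiseXor, // ^ : all bits set in expected must be CLEAR in the read value
    BitwiseNot, // ~ : read value equals the complement of the expected value
    AnyValue,   // x : always matches (used to print a field unconditionally)
}

pub fn op_matches(op: &Op, read: u8, expected: u8) -> bool {
    match op {
        Op::Equal => read == expected,
        Op::BitwiseXor => read & expected == 0,
        Op::BitwiseNot => read == !expected,
        Op::AnyValue => true,
    }
}

fn main() {
    assert!(op_matches(&Op::AnyValue, 0xAB, 0x00)); // 'x' matches anything
    assert!(op_matches(&Op::BitwiseXor, 0b0101, 0b1010)); // no overlapping bits
    assert!(!op_matches(&Op::BitwiseXor, 0b0101, 0b0001)); // bit 0 overlaps
    assert!(op_matches(&Op::BitwiseNot, 0xF0, 0x0F)); // !0x0F == 0xF0
    assert!(op_matches(&Op::Equal, 7, 7));
    println!("operator semantics sketch ok");
}
```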

Migrating from v0.4.x to v0.5.0

Version 0.5.0 adds support for floating-point types in magic file parsing. This enables detection of file formats that use IEEE 754 float and double values.

RuleMatch Struct Field Addition

The RuleMatch struct gained a new type_kind field that carries the source TypeKind used to read the matched value. Code constructing RuleMatch instances with struct literals must add this field.

Before (v0.4.x):

use libmagic_rs::evaluator::RuleMatch;
use libmagic_rs::parser::ast::Value;

let match_result = RuleMatch {
    message: "ELF executable".to_string(),
    offset: 0,
    level: 0,
    value: Value::Uint(0x7f),
    confidence: RuleMatch::calculate_confidence(0),
};

After (v0.5.0):

use libmagic_rs::evaluator::RuleMatch;
use libmagic_rs::parser::ast::{TypeKind, Value};

let match_result = RuleMatch {
    message: "ELF executable".to_string(),
    offset: 0,
    level: 0,
    value: Value::Uint(0x7f),
    type_kind: TypeKind::Byte { signed: false },
    confidence: RuleMatch::calculate_confidence(0),
};

The type_kind field allows output formatters and other consumers to determine the on-disk width of the matched value.
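For instance, a formatter can derive the width directly from the type; a sketch with a simplified local TypeKind (endianness and signedness fields omitted for brevity):

```rust
// Simplified local mirror of TypeKind, showing how a formatter might
// derive the on-disk width of a matched value from the type_kind field.
pub enum TypeKind {
    Byte,
    Short,
    Long,
    Quad,
    Float,
    Double,
}

pub fn width_bytes(kind: &TypeKind) -> usize {
    match kind {
        TypeKind::Byte => 1,
        TypeKind::Short => 2,
        TypeKind::Long | TypeKind::Float => 4, // magic's "long" is 32-bit
        TypeKind::Quad | TypeKind::Double => 8,
    }
}

fn main() {
    assert_eq!(width_bytes(&TypeKind::Byte), 1);
    assert_eq!(width_bytes(&TypeKind::Quad), 8);
    assert_eq!(width_bytes(&TypeKind::Double), 8);
    println!("widths derived from type kind");
}
```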

Value Enum Changes

The Value enum added a Float(f64) variant for floating-point values and no longer derives the Eq trait (it still implements PartialEq).

New Float Variant:

use libmagic_rs::parser::ast::Value;

// New floating-point variant
let float_value = Value::Float(3.14);
let double_value = Value::Float(2.71828);

Eq Trait Removal:

Code using Eq as a trait bound or relying on exact equality semantics must be updated:

// Before (v0.4.x) - Eq was available
fn compare_values<T: Eq>(a: T, b: T) -> bool {
    a == b
}

// After (v0.5.0) - Use PartialEq instead
fn compare_values<T: PartialEq>(a: T, b: T) -> bool {
    a == b
}

Exact equality comparisons on Value::Float follow IEEE 754 semantics (NaN != NaN).
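This is exactly why Eq had to be dropped: f64 implements only PartialEq, and an enum embedding it inherits that. A self-contained sketch with a local Value enum:

```rust
// Local enum demonstrating why embedding f64 forces PartialEq-only:
// f64 does not implement Eq, so neither can the containing enum.
#[derive(Debug, PartialEq)]
pub enum Value {
    Uint(u64),
    Float(f64),
}

fn main() {
    assert_eq!(Value::Uint(0x7f), Value::Uint(0x7f));
    assert_eq!(Value::Float(1.5), Value::Float(1.5));
    // IEEE 754: NaN is never equal to itself, so equality on floats
    // is not an equivalence relation -- the definition of "not Eq".
    assert_ne!(Value::Float(f64::NAN), Value::Float(f64::NAN));
    println!("PartialEq semantics demonstrated");
}
```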

TypeKind Enum Extensions

Two new variants were added to TypeKind for floating-point types. Exhaustive pattern matches must handle these variants.

New Variants:

  • Float { endian: Endianness } - 32-bit IEEE 754 floating-point
  • Double { endian: Endianness } - 64-bit IEEE 754 double-precision

Before (v0.4.x):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte { signed } => { /* ... */ }
    TypeKind::Short { endian, signed } => { /* ... */ }
    TypeKind::Long { endian, signed } => { /* ... */ }
    TypeKind::Quad { endian, signed } => { /* ... */ }
    TypeKind::String { max_length } => { /* ... */ }
}

After (v0.5.0):

use libmagic_rs::parser::ast::TypeKind;

match type_kind {
    TypeKind::Byte { signed } => { /* ... */ }
    TypeKind::Short { endian, signed } => { /* ... */ }
    TypeKind::Long { endian, signed } => { /* ... */ }
    TypeKind::Quad { endian, signed } => { /* ... */ }
    TypeKind::Float { endian } => {
        // Handle 32-bit float type
    }
    TypeKind::Double { endian } => {
        // Handle 64-bit double type
    }
    TypeKind::String { max_length } => { /* ... */ }
}

String Variant Discriminant Change:

The addition of Float and Double moved the String variant from declaration position 4 to 6. As with the v0.3.0 change, TypeKind values carry data and therefore cannot be cast with `as`; only code that depends on variant declaration order needs updating.

Before (v0.4.x):

// Variant order: Byte = 0, Short = 1, Long = 2, Quad = 3, String = 4

After (v0.5.0):

// Variant order: ..., Quad = 3, Float = 4, Double = 5, String = 6

Recommendation: Avoid relying on enum discriminant values. Use pattern matching or the std::mem::discriminant function instead.

Getting Help

If you encounter migration issues, consult the Troubleshooting chapter that follows or open an issue on the project's GitHub tracker.

Troubleshooting

Common issues and solutions when using libmagic-rs.

Installation Issues

Rust Version Compatibility

Problem: Build fails with older Rust versions

error: package `libmagic-rs v0.4.0` cannot be built because it requires rustc 1.89 or newer

Solution: Update Rust to version 1.89 or newer

rustup update stable
rustc --version  # Should show 1.89+

Dependency Conflicts

Problem: Cargo fails to resolve dependencies

error: failed to select a version for the requirement `serde = "^1.0"`

Solution: Clean and rebuild

cargo clean
rm Cargo.lock
cargo build

Runtime Issues

Magic File Loading Errors

Problem: Cannot load magic file

Error: Parse error at line 42: Invalid offset specification

Solutions:

  1. Check file path: Ensure the magic file exists and is readable
  2. Validate syntax: Check the magic file format at the specified line
  3. Use absolute paths: Relative paths may not resolve correctly
#![allow(unused)]
fn main() {
// Use absolute path
let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;

// Or check if file exists first
use std::path::Path;
let magic_path = "magic.db";
if !Path::new(magic_path).exists() {
    eprintln!("Magic file not found: {}", magic_path);
    return;
}
}

File Evaluation Errors

Problem: File analysis fails

Error: IO error: Permission denied (os error 13)

Solutions:

  1. Check permissions: Ensure the file is readable
  2. Handle missing files: Check if file exists before analysis
  3. Use proper error handling: Match specific error types
#![allow(unused)]
fn main() {
use libmagic_rs::LibmagicError;

match db.evaluate_file("example.bin") {
    Ok(result) => println!("Type: {}", result.description),
    Err(LibmagicError::IoError(e)) => {
        eprintln!("Cannot access file: {}", e);
    }
    Err(e) => eprintln!("Analysis failed: {}", e),
}
}

Performance Issues

Slow File Analysis

Problem: File analysis takes too long

Solutions:

  1. Optimize configuration: Reduce recursion depth and string length limits
  2. Use early termination: Stop at first match for faster results
  3. Check file size: Large files may need special handling
#![allow(unused)]
fn main() {
let fast_config = EvaluationConfig {
    max_recursion_depth: 5,
    max_string_length: 512,
    stop_at_first_match: true,
    ..EvaluationConfig::default()
};

let result = db.evaluate_file_with_config("large_file.bin", &fast_config)?;
}

Memory Usage Issues

Problem: High memory consumption

Solutions:

  1. Use memory mapping: Avoid loading entire files into memory
  2. Limit string lengths: Reduce max_string_length in configuration
  3. Process files individually: Don’t keep multiple databases in memory
#![allow(unused)]
fn main() {
// Process files one at a time
for file_path in file_list {
    let result = db.evaluate_file(&file_path)?;
    println!("{}: {}", file_path, result.description);
    // Result is dropped here, freeing memory
}
}

Development Issues

Compilation Errors

Problem: Clippy warnings treated as errors

error: this expression creates a reference which is immediately dereferenced

Solution: Fix clippy warnings or temporarily allow them for development

#![allow(unused)]
fn main() {
#[allow(clippy::needless_borrow)]
fn development_function() {
    // Temporary code
}
}

Better solution: Fix the underlying issue

#![allow(unused)]
fn main() {
// Instead of
let result = function(&value);

// Use
let result = function(value);
}

Test Failures

Problem: Tests fail on different platforms

Solutions:

  1. Check file paths: Use platform-independent path handling
  2. Handle endianness: Test both little and big-endian scenarios
  3. Use conditional compilation: Platform-specific test cases
#![allow(unused)]
fn main() {
#[cfg(target_endian = "little")]
#[test]
fn test_little_endian_parsing() {
    // Little-endian specific test
}

#[cfg(target_endian = "big")]
#[test]
fn test_big_endian_parsing() {
    // Big-endian specific test
}
}

Magic File Issues

Syntax Errors

Problem: Magic file parsing fails

Parse error at line 15: Expected operator, found 'invalid'

Solutions:

  1. Check syntax: Verify magic file format
  2. Use comments: Add comments to document complex rules
  3. Test incrementally: Add rules one at a time
# Good magic file syntax
0    string    \x7fELF    ELF executable
>4   byte      1          32-bit
>4   byte      2          64-bit

# Bad syntax (missing test value)
0    string

Encoding Issues

Problem: String matching fails with non-ASCII content

Solutions:

  1. Use byte sequences: For binary data, use hex escapes
  2. Specify encoding: Use appropriate string types
  3. Test with sample files: Verify rules work with real data
# Use hex escapes for binary data
0    string    \x7f\x45\x4c\x46    ELF

# Use quotes for text with spaces
0    string    "#!/bin/bash"        Bash script

Debugging Tips

Enable Logging

# Set log level for debugging
RUST_LOG=debug cargo run -- example.bin
RUST_LOG=libmagic_rs=trace cargo test

Use Debug Output

#![allow(unused)]
fn main() {
// Print debug information
println!("Evaluating rule: {:?}", rule);
// get() avoids a panic when the range is out of bounds
println!("Buffer slice: {:?}", buffer.get(offset..offset + length));
}

Minimal Reproduction

When reporting issues:

  1. Create minimal example: Simplest code that reproduces the problem
  2. Include sample files: Provide test files that trigger the issue
  3. Specify environment: OS, Rust version, dependency versions
// Minimal reproduction example
use libmagic_rs::MagicDatabase;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = MagicDatabase::load_from_file("simple.magic")?;
    let result = db.evaluate_file("test.bin")?;
    println!("Result: {}", result.description);
    Ok(())
}

Getting Help

Check Documentation

Review this guide and the generated API documentation first; many questions are answered there.

Search Existing Issues

Search the project's GitHub issue tracker; your problem may already have a fix or a workaround.

Report New Issues

When creating an issue, include:

  • Rust version: rustc --version
  • Library version: From Cargo.toml
  • Operating system: OS and version
  • Minimal reproduction: Smallest example that shows the problem
  • Expected behavior: What should happen
  • Actual behavior: What actually happens
  • Error messages: Complete error output

Community Support

  • Discussions: Ask questions and share ideas
  • Discord/IRC: Real-time community chat (if available)
  • Stack Overflow: Tag questions with libmagic-rs

This troubleshooting guide covers the most common issues. For specific problems not covered here, please check the existing issues or create a new one with detailed information.

Development Setup

This guide covers setting up a development environment for contributing to libmagic-rs, including tools, workflows, and best practices.

Current Implementation Status

Project Phase: Active Development with Solid Foundation

Completed Components βœ…

  • Core AST Structures: Complete with 29 comprehensive unit tests
  • Parser Components: Numbers, offsets, operators, values (50 unit tests)
  • Magic File Parser: Full text magic file parsing with hierarchical structure
  • Rule Evaluation Engine: Offset resolution, type interpretation, and comparison operators
  • Memory-Mapped I/O: FileBuffer implementation built on memmap2
  • CLI Framework: Command-line interface with clap
  • Code Quality: Zero-warnings policy with comprehensive linting
  • Serialization: Full serde support for all data structures
  • Memory Safety: Zero unsafe code with bounds checking

In Progress πŸ”„

  • Output Formatters: Expanded text and JSON result formatting
  • Extended Rule Features: Regex support and other planned magic file extensions

Test Coverage

Current test suite includes 79 passing unit tests:

# Run current test suite
cargo test
# Output: running 79 tests ... test result: ok. 79 passed; 0 failed

Test Categories:

  • AST structure tests (29 tests)
  • Parser component tests (50 tests)
  • Serialization round-trip tests
  • Edge case and boundary value tests
  • Error condition handling tests

Prerequisites

Required Tools

  • Rust 1.89+ (the crate's minimum supported version; check Cargo.toml for the current value)
  • Git for version control
  • Cargo (included with Rust)
Optional tooling that streamlines development:
# Enhanced test runner
cargo install cargo-nextest

# Auto-rebuild on file changes
cargo install cargo-watch

# Code coverage
cargo install cargo-llvm-cov

# Security auditing
cargo install cargo-audit

# Dependency analysis (`cargo tree` also ships with recent Cargo)
cargo install cargo-tree

# Documentation tools
cargo install mdbook  # For this documentation

Environment Setup

1. Clone the Repository

git clone https://github.com/EvilBit-Labs/libmagic-rs.git
cd libmagic-rs

2. Verify Setup

# Check Rust version
rustc --version  # Should match the MSRV (1.89+)

# Verify project builds
cargo check

# Run tests
cargo test

# Check linting passes
cargo clippy -- -D warnings

3. IDE Configuration

VS Code

Recommended extensions:

  • rust-analyzer: Rust language server
  • CodeLLDB: Debugging support
  • Even Better TOML: TOML syntax highlighting
  • Error Lens: Inline error display

Settings (.vscode/settings.json):

{
  "rust-analyzer.check.command": "clippy",
  "rust-analyzer.check.extraArgs": [
    "--",
    "-D",
    "warnings"
  ],
  "rust-analyzer.cargo.features": "all"
}

Other IDEs

  • IntelliJ IDEA: Use the Rust plugin
  • Vim/Neovim: Configure with rust-analyzer LSP
  • Emacs: Use rustic-mode with lsp-mode

Development Workflow

Daily Development

# Start development session
cargo watch -x check -x test

# In another terminal, make changes and see results automatically

Code Quality Checks

# Format code (required before commits)
cargo fmt

# Check for issues (must pass)
cargo clippy -- -D warnings

# Run all tests
cargo nextest run  # or cargo test

# Check documentation
cargo doc --document-private-items

Testing Strategy

# Run specific test modules
cargo test ast_structures
cargo test parser
cargo test evaluator

# Run tests with output
cargo test -- --nocapture

# Run ignored tests (if any)
cargo test -- --ignored

# Test documentation examples
cargo test --doc

Project Standards

Code Style

The project enforces strict code quality standards:

Linting Configuration

See Cargo.toml for the complete linting setup. Key rules:

  • No unsafe code: unsafe_code = "forbid"
  • Zero warnings: warnings = "deny"
  • Comprehensive clippy: Pedantic, nursery, and security lints enabled
  • No unwrap/panic: unwrap_used = "deny", panic = "deny"
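Expressed as a Cargo `[lints]` table, such a policy might look like the following sketch (a hypothetical illustration of these rules; the project's actual Cargo.toml is authoritative):

```toml
[lints.rust]
unsafe_code = "forbid"
warnings = "deny"

[lints.clippy]
pedantic = { level = "deny", priority = -1 }
nursery = { level = "deny", priority = -1 }
unwrap_used = "deny"
panic = "deny"
```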

Formatting

# Format all code (required)
cargo fmt

# Check formatting without changing files
cargo fmt -- --check

Documentation Standards

Code Documentation

All public APIs must have rustdoc comments:

#![allow(unused)]
fn main() {
/// Parses a magic file into an AST
///
/// This function reads a magic file from the given path and parses it into
/// a vector of `MagicRule` structures that can be used for file type detection.
///
/// # Arguments
///
/// * `path` - Path to the magic file to parse
///
/// # Returns
///
/// Returns `Ok(Vec<MagicRule>)` on success, or `Err(LibmagicError)` if parsing fails.
///
/// # Errors
///
/// This function will return an error if:
/// - The file cannot be read
/// - The magic file syntax is invalid
/// - Memory allocation fails
///
/// # Examples
///
/// ```rust,no_run
/// use libmagic_rs::parser::parse_magic_file;
///
/// let rules = parse_magic_file("magic.db")?;
/// println!("Loaded {} rules", rules.len());
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn parse_magic_file<P: AsRef<Path>>(path: P) -> Result<Vec<MagicRule>> {
    // Implementation
}
}

Module Documentation

Each module should have comprehensive documentation:

#![allow(unused)]
fn main() {
//! Magic file parser module
//!
//! This module handles parsing of magic files into an Abstract Syntax Tree (AST)
//! that can be evaluated against file buffers for type identification.
//!
//! # Magic File Format
//!
//! Magic files use a simple DSL to describe file type detection rules:
//!
//! ```text
//! # ELF files
//! 0    string    \x7fELF    ELF
//! >4   byte      1          32-bit
//! >4   byte      2          64-bit
//! ```
//!
//! # Examples
//!
//! ```rust,no_run
//! use libmagic_rs::parser::parse_magic_file;
//!
//! let rules = parse_magic_file("magic.db")?;
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
}

Testing Standards

Unit Tests

Every module should have comprehensive unit tests:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_functionality() {
        // Test basic case
        let result = function_under_test();
        assert_eq!(result, expected_value);
    }

    #[test]
    fn test_error_conditions() {
        // Test error handling
        let result = function_that_should_fail();
        assert!(result.is_err());
    }

    #[test]
    fn test_edge_cases() {
        // Test boundary conditions
        // Empty inputs, maximum values, etc.
    }
}
}

Integration Tests

Place integration tests in the tests/ directory:

#![allow(unused)]
fn main() {
// tests/integration_test.rs
use libmagic_rs::*;

#[test]
fn test_end_to_end_workflow() {
    // Test complete workflows
    let db = MagicDatabase::load_from_file("third_party/magic.mgc").unwrap();
    let result = db
        .evaluate_file("third_party/tests/elf64.testfile")
        .unwrap();
    assert_eq!(result.description, "ELF 64-bit LSB executable");
}
}

Error Handling

Use the project’s error types consistently:

#![allow(unused)]
fn main() {
use thiserror::Error;

#[derive(Debug, Error)]
pub enum ModuleError {
    #[error("Invalid input: {0}")]
    InvalidInput(String),

    #[error("Processing failed: {reason}")]
    ProcessingFailed { reason: String },

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, ModuleError>;
}

Contribution Workflow

1. Issue Creation

Before starting work:

  • Check existing issues and discussions
  • Create an issue describing the problem or feature
  • Wait for maintainer feedback on approach

2. Branch Creation

# Create feature branch
git checkout -b feature/descriptive-name

# Or for bug fixes
git checkout -b fix/issue-description

3. Development Process

# Make changes following the standards above
# Run checks frequently
cargo watch -x check -x test

# Before committing
cargo fmt
cargo clippy -- -D warnings
cargo test

4. Commit Guidelines

Use conventional commit format:

# Feature commits
git commit -m "feat(parser): add support for indirect offsets"

# Bug fixes
git commit -m "fix(evaluator): handle buffer overflow in string reading"

# Documentation
git commit -m "docs(api): add examples for MagicRule creation"

# Tests
git commit -m "test(ast): add comprehensive serialization tests"

5. Pull Request Process

  1. Push branch: git push origin feature/descriptive-name
  2. Create PR with:
    • Clear description of changes
    • Reference to related issues
    • Test coverage information
    • Breaking change notes (if any)
  3. Sign off commits with git commit -s (DCO required)
  4. Address feedback from code review
  5. Ensure CI passes all checks

For full details on code review criteria, DCO requirements, and project governance, see CONTRIBUTING.md.

Debugging

Logging

Use the log crate for debugging:

#![allow(unused)]
fn main() {
use log::{debug, error, info, warn};

pub fn parse_rule(input: &str) -> Result<MagicRule> {
    debug!("Parsing rule: {}", input);

    let result = do_parsing(input)?;

    info!("Successfully parsed rule: {}", result.message);
    Ok(result)
}
}

Run with logging:

RUST_LOG=debug cargo test
RUST_LOG=libmagic_rs=trace cargo run

Debugging Tests

# Run single test with output
cargo test test_name -- --nocapture

# Debug with lldb/gdb
cargo test --no-run
lldb target/debug/deps/libmagic_rs-<hash>

Performance Profiling

# Install profiling tools
cargo install cargo-flamegraph

# Profile specific benchmarks
cargo flamegraph --bench evaluation_bench

# Memory profiling with valgrind
cargo build
valgrind --tool=massif target/debug/rmagic large_file.bin

Continuous Integration

The project uses GitHub Actions for CI. Local checks should match CI:

# Run the same checks as CI
cargo fmt -- --check
cargo clippy -- -D warnings
cargo test
cargo doc --document-private-items

Release Process

For maintainers:

Version Bumping

# Update version in Cargo.toml
# Update CHANGELOG.md
# Commit changes
git commit -m "chore: bump version to 0.3.0"
git tag v0.3.0
git push origin main --tags

Documentation Updates

# Update documentation
mdbook build docs/
# Deploy to GitHub Pages (automated)

This development setup ensures high code quality, comprehensive testing, and smooth collaboration across the project.

Code Style

libmagic-rs follows strict code style guidelines to ensure consistency, readability, and maintainability across the codebase.

Formatting

Rustfmt Configuration

The project uses rustfmt with default settings. All code must be formatted before committing:

# Format all code
cargo fmt

# Check formatting without changing files
cargo fmt -- --check

Key Formatting Rules

  • Line length: 100 characters (rustfmt default)
  • Indentation: 4 spaces (no tabs)
  • Trailing commas: Required in multi-line constructs
  • Import organization: Automatic grouping and sorting
#![allow(unused)]
fn main() {
// Good: Proper formatting
use std::collections::HashMap;
use std::path::Path;

use serde::{Deserialize, Serialize};
use thiserror::Error;

use crate::parser::ast::MagicRule;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EvaluationResult {
    pub description: String,
    pub mime_type: Option<String>,
    pub confidence: f64,
}
}

Naming Conventions

Types and Structs

Use PascalCase for types, structs, enums, and traits:

#![allow(unused)]
fn main() {
// Good
pub struct MagicDatabase {}
pub enum OffsetSpec {}
pub trait BinaryRegex {}

// Bad
pub struct magic_database {}
pub enum offset_spec {}
}

Functions and Variables

Use snake_case for functions, methods, and variables:

#![allow(unused)]
fn main() {
// Good
pub fn parse_magic_file(path: &Path) -> Result<Vec<MagicRule>> { }
let magic_rules = vec![];
let file_buffer = FileBuffer::new(path)?;

// Bad
pub fn ParseMagicFile(path: &Path) -> Result<Vec<MagicRule>> { }
let magicRules = vec![];
}

Constants

Use SCREAMING_SNAKE_CASE for constants:

#![allow(unused)]
fn main() {
// Good
const DEFAULT_BUFFER_SIZE: usize = 8192;
const MAX_RECURSION_DEPTH: u32 = 50;

// Bad
const default_buffer_size: usize = 8192;
const maxRecursionDepth: u32 = 50;
}

Modules

Use snake_case for module names:

#![allow(unused)]
fn main() {
// Good
mod file_evaluator;
mod magic_parser;
mod output_formatter;

// Bad
mod MagicParser;
mod fileEvaluator;
}

Documentation Standards

Public API Documentation

All public items must have rustdoc comments with examples:

#![allow(unused)]
fn main() {
/// Parses a magic file into a vector of magic rules
///
/// This function reads a magic file from the specified path and parses it into
/// a collection of `MagicRule` structures that can be used for file type detection.
///
/// # Arguments
///
/// * `path` - Path to the magic file to parse
///
/// # Returns
///
/// Returns `Ok(Vec<MagicRule>)` on success, or `Err(LibmagicError)` if parsing fails.
///
/// # Errors
///
/// This function will return an error if:
/// - The file cannot be read due to permissions or missing file
/// - The magic file contains invalid syntax
/// - Memory allocation fails during parsing
///
/// # Examples
///
/// ```rust,no_run
/// use libmagic_rs::parser::parse_magic_file;
///
/// let rules = parse_magic_file("magic.db")?;
/// println!("Loaded {} magic rules", rules.len());
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn parse_magic_file<P: AsRef<Path>>(path: P) -> Result<Vec<MagicRule>> {
    // Implementation
}
}

Module Documentation

Each module should have comprehensive documentation:

#![allow(unused)]
fn main() {
//! Magic file parser module
//!
//! This module handles parsing of magic files into an Abstract Syntax Tree (AST)
//! that can be evaluated against file buffers for type identification.
//!
//! The parser uses nom combinators for robust, efficient parsing with good
//! error reporting. It supports the standard magic file format with extensions
//! for modern file types.
//!
//! # Examples
//!
//! ```rust,no_run
//! use libmagic_rs::parser::parse_magic_file;
//!
//! let rules = parse_magic_file("magic.db")?;
//! for rule in &rules {
//!     println!("Rule: {}", rule.message);
//! }
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
}

Inline Comments

Use inline comments sparingly, focusing on why rather than what:

#![allow(unused)]
fn main() {
// Good: Explains reasoning
// Use indirect offset to handle relocatable executables
let actual_offset = resolve_indirect_offset(base_offset, buffer)?;

// Bad: States the obvious
// Set the offset to the resolved value
let actual_offset = resolved_offset;
}

Error Handling Style

Use Result Types

Always use Result for fallible operations:

#![allow(unused)]
fn main() {
// Good
pub fn parse_offset(input: &str) -> Result<OffsetSpec> {
    // Implementation that can fail
}

// Bad: Using Option for errors
pub fn parse_offset(input: &str) -> Option<OffsetSpec> {
    // Loses error information
}

// Bad: Using panics
pub fn parse_offset(input: &str) -> OffsetSpec {
    // Implementation that panics on error
    input.parse().unwrap()
}
}

Descriptive Error Messages

Provide context in error messages:

#![allow(unused)]
fn main() {
// Good: Specific, actionable error
return Err(LibmagicError::ParseError {
    line: line_number,
    message: format!("Invalid offset '{}': expected number or hex value", input),
});

// Bad: Generic error
return Err(LibmagicError::ParseError {
    line: line_number,
    message: "parse error".to_string(),
});
}

Error Propagation

Use the ? operator for error propagation:

#![allow(unused)]
fn main() {
// Good
pub fn load_and_parse(path: &Path) -> Result<Vec<MagicRule>> {
    let content = std::fs::read_to_string(path)?;
    let rules = parse_magic_string(&content)?;
    Ok(rules)
}

// Avoid: Manual error handling when ? works
pub fn load_and_parse(path: &Path) -> Result<Vec<MagicRule>> {
    let content = match std::fs::read_to_string(path) {
        Ok(content) => content,
        Err(e) => return Err(LibmagicError::IoError(e)),
    };
    // ...
}
}

Code Organization

Import Organization

Group imports in this order:

  1. Standard library
  2. External crates
  3. Internal crates/modules
#![allow(unused)]
fn main() {
// Standard library
use std::collections::HashMap;
use std::path::Path;

// External crates
use nom::{IResult, bytes::complete::tag};
use serde::{Deserialize, Serialize};
use thiserror::Error;

// Internal modules
use crate::evaluator::EvaluationContext;
use crate::parser::ast::{MagicRule, OffsetSpec};
}

Function Organization

Organize functions logically within modules:

#![allow(unused)]
fn main() {
impl MagicRule {
    // Constructors first
    pub fn new(/* ... */) -> Self {}

    // Public methods
    pub fn evaluate(&self, buffer: &[u8]) -> Result<bool> {}
    pub fn message(&self) -> &str {}

    // Private helpers last
    fn validate_offset(&self) -> bool {}
}
}

File Organization

Keep files focused and reasonably sized (roughly 500-600 lines at most):

#![allow(unused)]
fn main() {
// Good: Focused module
// src/parser/offset.rs - Only offset parsing logic

// Bad: Everything in one file
// src/parser/mod.rs - All parsing logic (thousands of lines)
}
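For example, a focused parser layout might look like this (mod.rs, ast.rs, and offset.rs appear elsewhere in this guide; value.rs is a hypothetical further split):

```text
src/parser/
├── mod.rs       # Public parsing API and re-exports
├── ast.rs       # AST type definitions (MagicRule, OffsetSpec, ...)
├── offset.rs    # Offset parsing logic
└── value.rs     # Value and operator parsing
```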

Testing Style

Test Organization

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    // Group related tests
    mod offset_parsing {
        use super::*;

        #[test]
        fn test_absolute_offset() {
            // Test implementation
        }

        #[test]
        fn test_indirect_offset() {
            // Test implementation
        }
    }

    mod error_handling {
        use super::*;

        #[test]
        fn test_invalid_syntax_error() {
            // Test implementation
        }
    }
}
}

Test Naming

Use descriptive test names that explain the scenario:

#![allow(unused)]
fn main() {
// Good: Descriptive names
#[test]
fn test_parse_absolute_offset_with_hex_value() {}

#[test]
fn test_parse_offset_returns_error_for_invalid_syntax() {}

// Bad: Generic names
#[test]
fn test_parse_offset() {}

#[test]
fn test_error() {}
}

Assertion Style

Use specific assertions with helpful messages:

#![allow(unused)]
fn main() {
// Good: Specific assertion with context
assert_eq!(
    result.unwrap().message,
    "ELF executable",
    "Magic rule should identify ELF files correctly"
);

// Good: Pattern matching for complex types
match result {
    Ok(OffsetSpec::Absolute(offset)) => assert_eq!(offset, 42),
    _ => panic!("Expected absolute offset with value 42"),
}

// Avoid: Generic assertions
assert!(result.is_ok());
}

Performance Considerations

Prefer Borrowing

Use references instead of owned values when possible:

#![allow(unused)]
fn main() {
// Good: Borrowing
pub fn evaluate_rule(rule: &MagicRule, buffer: &[u8]) -> Result<bool> {}

// Avoid: Unnecessary ownership
pub fn evaluate_rule(rule: MagicRule, buffer: Vec<u8>) -> Result<bool> {}
}

Avoid Unnecessary Allocations

#![allow(unused)]
fn main() {
// Good: String slice
pub fn parse_message(input: &str) -> &str {
    input.trim()
}

// Avoid: Unnecessary allocation
pub fn parse_message(input: &str) -> String {
    input.trim().to_string()
}
}

Use Appropriate Data Structures

#![allow(unused)]
fn main() {
// Good: Vec for ordered data
let rules: Vec<MagicRule> = parse_rules(input)?;

// Good: HashMap for key-value lookups
let mime_types: HashMap<String, String> = load_mime_mappings()?;

// Consider: BTreeMap for sorted keys
let sorted_rules: BTreeMap<u32, MagicRule> = rules_by_priority();
}

This style guide ensures consistent, readable, and maintainable code across the libmagic-rs project. All contributors should follow these guidelines, and automated tools enforce many of these rules during CI.

Testing Guidelines

Comprehensive testing guidelines for libmagic-rs to ensure code quality, reliability, and maintainability.

Testing Philosophy

libmagic-rs follows a comprehensive testing strategy:

  • Unit tests: Test individual functions and methods in isolation
  • Integration tests: Test complete workflows and component interactions
  • Property tests: Use fuzzing to discover edge cases and ensure robustness
  • Compatibility tests: Verify compatibility with existing magic files and GNU file output
  • Performance tests: Ensure performance requirements are met

Test Organization

Directory Structure

libmagic-rs/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ lib.rs              # Unit tests in #[cfg(test)] modules
β”‚   β”œβ”€β”€ parser/
β”‚   β”‚   β”œβ”€β”€ mod.rs          # Parser unit tests
β”‚   β”‚   └── ast.rs          # AST unit tests
β”‚   └── evaluator/
β”‚       └── mod.rs          # Evaluator unit tests
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ cli_integration.rs  # CLI integration tests
β”‚   β”œβ”€β”€ integration/        # Integration tests
β”‚   β”œβ”€β”€ compatibility/      # GNU file compatibility tests
β”‚   └── fixtures/           # Test data and expected outputs
β”‚       β”œβ”€β”€ magic/          # Sample magic files
β”‚       β”œβ”€β”€ samples/        # Test binary files
β”‚       └── expected/       # Expected output files
└── benches/                # Performance benchmarks

Test Categories

Unit Tests

Located in #[cfg(test)] modules within source files:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_functionality() {
        // Arrange
        let input = create_test_input();

        // Act
        let result = function_under_test(input);

        // Assert
        assert_eq!(result, expected_output);
    }
}
}

Integration Tests

Located in tests/ directory:

#![allow(unused)]
fn main() {
// tests/integration/basic_workflow.rs
use libmagic_rs::{EvaluationConfig, MagicDatabase};

#[test]
fn test_complete_file_analysis_workflow() {
    let db = MagicDatabase::load_from_file("tests/fixtures/magic/basic.magic")
        .expect("Failed to load magic database");

    let result = db
        .evaluate_file("tests/fixtures/samples/elf64")
        .expect("Failed to evaluate file");

    assert_eq!(result.description, "ELF 64-bit LSB executable");
}
}

CLI Integration Tests

Located in tests/cli_integration.rs, these tests verify the rmagic binary through subprocess execution using assert_cmd rather than testing internal functions. This approach provides proper process isolation and eliminates fragile file descriptor manipulation.

Dependencies: assert_cmd, predicates, and tempfile (from dev-dependencies).

#![allow(unused)]
fn main() {
use assert_cmd::Command;
use predicates::prelude::*;
use tempfile::TempDir;

#[test]
fn test_builtin_elf_detection() {
    let temp_dir = TempDir::new().expect("Failed to create temp dir");
    let test_file = temp_dir.path().join("test.elf");
    std::fs::write(&test_file, b"\x7fELF\x02\x01\x01\x00").unwrap();

    Command::cargo_bin("rmagic")
        .unwrap()
        .args(["--use-builtin", test_file.to_str().unwrap()])
        .assert()
        .success()
        .stdout(predicate::str::contains("ELF"));
}
}

The test suite covers:

  • Builtin Flag Tests: Verify --use-builtin with various file formats (ELF, PNG, JPEG, PDF, ZIP, GIF)
  • Stdin Tests: Validate reading from stdin with -, including empty input and truncation warnings
  • Multiple File Tests: Sequential output, strict mode, JSON output, custom magic files
  • Error Handling Tests: Missing files, directories, invalid magic files, conflicting flags
  • Timeout Tests: Argument parsing, boundary conditions
  • Output Format Tests: Text and JSON formats for single and multiple files
  • Shell Completion Tests: Generate completion scripts for bash, zsh, fish
  • Custom Magic File Tests: User-provided magic file handling
  • Edge Cases: Files with spaces, Unicode names, empty files, small files
  • CLI Argument Parsing Tests: Multiple files, strict mode, format combinations

Use CLI integration tests for end-to-end verification of the rmagic binary's behavior. Use unit tests (in src/main.rs or library modules) to test individual functions and components in isolation.

Writing Effective Tests

Test Naming

Use descriptive names that explain the scenario being tested:

#![allow(unused)]
fn main() {
// Good: Descriptive test names
#[test]
fn test_parse_absolute_offset_with_positive_decimal_value() {}

#[test]
fn test_parse_absolute_offset_with_hexadecimal_value() {}

#[test]
fn test_parse_offset_returns_error_for_invalid_syntax() {}

// Bad: Generic test names
#[test]
fn test_parse_offset() {}

#[test]
fn test_error_case() {}
}

Test Structure

Follow the Arrange-Act-Assert pattern:

#![allow(unused)]
fn main() {
#[test]
fn test_magic_rule_evaluation_with_matching_bytes() {
    // Arrange
    let rule = MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Byte { signed: false },
        op: Operator::Equal,
        value: Value::Uint(0x7f),
        message: "ELF magic".to_string(),
        children: vec![],
        level: 0,
    };
    let buffer = vec![0x7f, 0x45, 0x4c, 0x46]; // ELF magic

    // Act
    let result = evaluate_rule(&rule, &buffer);

    // Assert
    assert!(result.is_ok());
    assert!(result.unwrap());
}
}

Assertion Best Practices

Use specific assertions with helpful error messages:

#![allow(unused)]
fn main() {
// Good: Specific assertions
assert_eq!(result.description, "ELF executable");
assert!(result.confidence > 0.8);

// Good: Custom error messages
assert_eq!(
    parsed_offset,
    OffsetSpec::Absolute(42),
    "Parser should correctly handle decimal offset values"
);

// Good: Pattern matching for complex types
match result {
    Ok(OffsetSpec::Indirect { base_offset, adjustment, .. }) => {
        assert_eq!(base_offset, 0x20);
        assert_eq!(adjustment, 4);
    }
    _ => panic!("Expected indirect offset specification"),
}

// Avoid: Generic assertions
assert!(result.is_ok());
assert_ne!(value, 0);
}

Error Testing

Test error conditions thoroughly:

#![allow(unused)]
fn main() {
#[test]
fn test_parse_magic_file_with_invalid_syntax() {
    let invalid_magic = "0 invalid_type value message";

    let result = parse_magic_string(invalid_magic);

    assert!(result.is_err());
    match result {
        Err(LibmagicError::ParseError { line, message }) => {
            assert_eq!(line, 1);
            assert!(message.contains("invalid_type"));
        }
        _ => panic!("Expected ParseError for invalid syntax"),
    }
}

#[test]
fn test_file_evaluation_with_missing_file() {
    let db = MagicDatabase::load_from_file("tests/fixtures/magic/basic.magic").unwrap();

    let result = db.evaluate_file("nonexistent_file.bin");

    assert!(result.is_err());
    match result {
        Err(LibmagicError::IoError(_)) => (), // Expected
        _ => panic!("Expected IoError for missing file"),
    }
}
}

Edge Case Testing

Test boundary conditions and edge cases:

#![allow(unused)]
fn main() {
#[test]
fn test_offset_parsing_edge_cases() {
    // Test zero offset
    let result = parse_offset("0");
    assert_eq!(result.unwrap(), OffsetSpec::Absolute(0));

    // Test maximum positive offset
    let result = parse_offset(&i64::MAX.to_string());
    assert_eq!(result.unwrap(), OffsetSpec::Absolute(i64::MAX));

    // Test negative offset
    let result = parse_offset("-1");
    assert_eq!(result.unwrap(), OffsetSpec::Absolute(-1));

    // Test empty input
    let result = parse_offset("");
    assert!(result.is_err());
}
}

Property-Based Testing

Use proptest for fuzzing and property-based testing:

#![allow(unused)]
fn main() {
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_magic_rule_serialization_roundtrip(rule in any::<MagicRule>()) {
        // Property: serialization should be reversible
        // serde_json errors don't convert to proptest's TestCaseError,
        // so unwrap them explicitly rather than using `?`
        let json = serde_json::to_string(&rule).expect("serialization failed");
        let deserialized: MagicRule =
            serde_json::from_str(&json).expect("deserialization failed");
        prop_assert_eq!(rule, deserialized);
    }

    #[test]
    fn test_offset_resolution_never_panics(
        offset in any::<OffsetSpec>(),
        buffer in prop::collection::vec(any::<u8>(), 0..1024)
    ) {
        // Property: offset resolution should never panic
        let _ = resolve_offset(&offset, &buffer, 0);
        // If we reach here without panicking, the test passes
    }
}
}

Test Data Management

Fixture Organization

Organize test data systematically:

tests/fixtures/
β”œβ”€β”€ magic/
β”‚   β”œβ”€β”€ basic.magic         # Simple rules for testing
β”‚   β”œβ”€β”€ complex.magic       # Complex hierarchical rules
β”‚   └── invalid.magic       # Invalid syntax for error testing
β”œβ”€β”€ samples/
β”‚   β”œβ”€β”€ elf32               # 32-bit ELF executable
β”‚   β”œβ”€β”€ elf64               # 64-bit ELF executable
β”‚   β”œβ”€β”€ zip_archive.zip     # ZIP file
β”‚   └── text_file.txt       # Plain text file
└── expected/
    β”œβ”€β”€ elf32.txt           # Expected output for elf32
    β”œβ”€β”€ elf64.json          # Expected JSON output for elf64
    └── compatibility.txt   # GNU file compatibility results

Creating Test Fixtures

#![allow(unused)]
fn main() {
// Helper function for creating test data
fn create_elf_magic_rule() -> MagicRule {
    MagicRule {
        offset: OffsetSpec::Absolute(0),
        typ: TypeKind::Long {
            endian: Endianness::Little,
            signed: false,
        },
        op: Operator::Equal,
        value: Value::Uint(0x464c_457f), // "\x7fELF" read as a little-endian u32
        message: "ELF executable".to_string(),
        children: vec![],
        level: 0,
    }
}

// Helper for creating test buffers
fn create_elf_buffer() -> Vec<u8> {
    let mut buffer = vec![0x7f, 0x45, 0x4c, 0x46]; // ELF magic
    buffer.extend_from_slice(&[0x02, 0x01, 0x01, 0x00]); // 64-bit, little-endian
    buffer.resize(64, 0); // Pad to minimum ELF header size
    buffer
}
}

Compatibility Testing

GNU File Comparison

Test compatibility with GNU file command:

#![allow(unused)]
fn main() {
#[test]
fn test_gnu_file_compatibility() {
    use std::process::Command;

    let sample_file = "tests/fixtures/samples/elf64";

    // Get GNU file output
    let gnu_output = Command::new("file")
        .arg("--brief")
        .arg(sample_file)
        .output()
        .expect("Failed to run GNU file command");

    // Bind the String first: `.trim()` borrows it, and the temporary would
    // otherwise be dropped at the end of the statement
    let gnu_stdout =
        String::from_utf8(gnu_output.stdout).expect("Invalid UTF-8 from GNU file");
    let gnu_result = gnu_stdout.trim();

    // Get libmagic-rs output
    let db = MagicDatabase::load_from_file("tests/fixtures/magic/standard.magic").unwrap();
    let result = db.evaluate_file(sample_file).unwrap();

    // Compare results (allowing for minor differences)
    assert!(
        results_are_compatible(&result.description, gnu_result),
        "libmagic-rs output '{}' not compatible with GNU file output '{}'",
        result.description,
        gnu_result
    );
}

fn results_are_compatible(rust_output: &str, gnu_output: &str) -> bool {
    // Implement compatibility checking logic
    // Allow for minor differences in formatting, version numbers, etc.
    rust_output.contains("ELF") && gnu_output.contains("ELF")
}
}

Performance Testing

Benchmark Tests

Use criterion for performance benchmarks:

#![allow(unused)]
fn main() {
// benches/evaluation_bench.rs
use criterion::{Criterion, black_box, criterion_group, criterion_main};
use libmagic_rs::{EvaluationConfig, MagicDatabase};

fn bench_file_evaluation(c: &mut Criterion) {
    let db = MagicDatabase::load_from_file("tests/fixtures/magic/standard.magic")
        .expect("Failed to load magic database");

    c.bench_function("evaluate_elf_file", |b| {
        b.iter(|| {
            db.evaluate_file(black_box("tests/fixtures/samples/elf64"))
                .expect("Evaluation failed")
        })
    });
}

criterion_group!(benches, bench_file_evaluation);
criterion_main!(benches);
}

Performance Regression Testing

#![allow(unused)]
fn main() {
#[test]
fn test_evaluation_performance() {
    use std::time::Instant;

    let db = MagicDatabase::load_from_file("tests/fixtures/magic/standard.magic").unwrap();

    let start = Instant::now();
    let _result = db
        .evaluate_file("tests/fixtures/samples/large_file.bin")
        .unwrap();
    let duration = start.elapsed();

    // Ensure evaluation completes within reasonable time
    assert!(
        duration.as_millis() < 100,
        "File evaluation took too long: {}ms",
        duration.as_millis()
    );
}
}

Test Execution

Running Tests

# Run all tests
cargo test

# Run with nextest (faster, better output)
cargo nextest run

# Run specific test modules
cargo test ast_structures
cargo test integration

# Run CLI integration tests
cargo test --test cli_integration

# Run specific CLI test
cargo test --test cli_integration test_builtin_elf_detection

# Run tests with output
cargo test -- --nocapture

# Run ignored tests
cargo test -- --ignored

# Run property tests with more cases
PROPTEST_CASES=10000 cargo test proptest

Coverage Analysis

# Install coverage tools
cargo install cargo-llvm-cov

# Generate coverage report
cargo llvm-cov --html --open

# Coverage for specific tests
cargo llvm-cov --html --tests integration

Continuous Integration

Ensure tests run in CI with multiple configurations:

# .github/workflows/test.yml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    rust: [stable, beta]

steps:
  - name: Run tests
    run: cargo nextest run --all-features

  - name: Run property tests
    run: cargo test proptest
    env:
      PROPTEST_CASES: 1000

  - name: Check compatibility
    run: cargo test compatibility
    if: matrix.os == 'ubuntu-latest'

Test Maintenance

Keeping Tests Updated

  • Update fixtures: When adding new file format support
  • Maintain compatibility: Update compatibility tests when GNU file changes
  • Performance baselines: Update performance expectations as optimizations are added
  • Documentation: Keep test documentation current with implementation

Test Debugging

#![allow(unused)]
fn main() {
// Use debug output for failing tests
#[test]
fn debug_failing_test() {
    let result = function_under_test();
    println!("Debug output: {:?}", result);
    assert_eq!(result, expected_value);
}

// Use conditional compilation for debug tests
#[cfg(test)]
#[cfg(feature = "debug-tests")]
mod debug_tests {
    #[test]
    fn verbose_test() {
        // Detailed debugging test
    }
}
}

This comprehensive testing approach ensures libmagic-rs maintains high quality, reliability, and compatibility throughout its development lifecycle.

Release Process

This document outlines the release process for libmagic-rs, including version management, testing procedures, and deployment steps.

Release Types

Semantic Versioning

libmagic-rs follows Semantic Versioning (SemVer):

  • Major version (X.0.0): Breaking API changes
  • Minor version (0.X.0): New features, backward compatible
  • Patch version (0.0.X): Bug fixes, backward compatible

Conventional Commits

Commit messages follow the Conventional Commits specification for automated semantic versioning and changelog generation. The required format is:

<type>(<scope>): <description>

[optional body]

[optional footer(s)]

Commit Types

  • feat: Triggers minor version bump (0.X.0)
  • fix: Triggers patch version bump (0.0.X)
  • docs, style, refactor, perf, test, build, ci, chore, revert: No version bump
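For instance, a hypothetical commit message (the scope and description here are illustrative) that would trigger a minor version bump:

```text
feat(parser): support indirect offsets in text magic files

Extend parse_text_magic_file() to accept the standard
indirect offset syntax.
```

A patch bump would use `fix(...)` in the same format.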

Breaking Changes

To trigger a major version bump (X.0.0), indicate breaking changes using a BREAKING CHANGE: footer in the commit body:

feat(api): redesign evaluation interface

Replace EvaluationConfig::new() with a builder pattern
for better ergonomics.

BREAKING CHANGE: EvaluationConfig::new() removed, use
EvaluationConfig::builder() instead.

The ! indicator after the type/scope (e.g., feat!: or feat(scope)!:) is not validated by the Mergify conventional commit enforcement. Use the BREAKING CHANGE: footer for reliable detection by both Mergify and release-plz.

Pre-release Versions

  • Alpha (0.1.0-alpha.1): Early development, unstable API
  • Beta (0.1.0-beta.1): Feature complete, API stabilizing
  • Release Candidate (0.1.0-rc.1): Final testing before release

Release Checklist

Pre-Release Preparation

1. Code Quality Verification

# Ensure all tests pass
cargo test --all-features

# Check code formatting
cargo fmt -- --check

# Run comprehensive linting
cargo clippy -- -D warnings

# Verify documentation builds
cargo doc --document-private-items

# Run security audit
cargo audit

# Check for outdated dependencies
cargo outdated

2. Performance Validation

# Run benchmarks and compare with baseline
cargo bench

# Profile memory usage
cargo build --release
valgrind --tool=massif target/release/rmagic large_file.bin

# Test with large files and magic databases
./performance_test.sh

3. Compatibility Testing

# Test against GNU file compatibility suite
cargo test compatibility

# Test with various magic file formats
./test_magic_compatibility.sh

# Cross-platform testing
cargo test --target x86_64-pc-windows-gnu
cargo test --target aarch64-apple-darwin

4. Documentation Updates

  • Update README.md with new features and changes
  • Update CHANGELOG.md with release notes
  • Review and update API documentation
  • Update migration guide if needed
  • Verify all examples work with new version

Version Bumping

1. Update Version Numbers

# Cargo.toml
[package]
name = "libmagic-rs"
version = "0.2.0"    # Update version

2. Update Documentation

#![allow(unused)]
fn main() {
// src/lib.rs - Update version in documentation
//! # libmagic-rs v0.2.0
//!
//! A pure-Rust implementation of libmagic...
}

3. Update Changelog

# Changelog

## [0.2.0] - 2026-03-01

### Features
- **parser**: Implement comparison operators

### Miscellaneous Tasks
- **Mergify**: Add outdated PR protection
- Add Mergify merge queue and simplify CI
- Mergify merge queue, dependabot integration, and CI simplification
- **release**: Add regex for version bumping based on commit types

### Breaking Changes
- `TypeKind::Byte` changed from a unit variant to a struct variant (`Byte { signed: bool }`)
- `Operator` enum: added `LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual` variants (exhaustive enum)
- `read_byte` function signature changed (parameter count modified)

Release Creation

1. Create Release Branch

# Create release branch
git checkout -b release/v0.2.0

# Commit version updates
git add Cargo.toml CHANGELOG.md README.md
git commit -m "chore: bump version to 0.2.0"

# Push release branch
git push origin release/v0.2.0

2. Final Testing

# Clean build and test
cargo clean
cargo build --release
cargo test --release

# Integration testing
./integration_test.sh

# Performance regression testing
./performance_regression_test.sh

3. Create Pull Request

  • Create PR from release branch to main
  • Ensure all CI checks pass
  • Get approval from maintainers
  • Merge to main branch

4. Tag Release

# Switch to main branch
git checkout main
git pull origin main

# Create and push tag
git tag -a v0.2.0 -m "Release version 0.2.0"
git push origin v0.2.0

GitHub Release

1. Create GitHub Release

  • Go to GitHub repository releases page
  • Click “Create a new release”
  • Select the version tag (v0.2.0)
  • Use version number as release title
  • Copy changelog content as release description

2. Release Assets

Include relevant assets:

  • Source code (automatically included)
  • Pre-compiled binaries (if applicable)
  • Documentation archive
  • Checksums file

Post-Release Tasks

1. Update Development Branch

# Create new development branch
git checkout -b develop
git push origin develop

# Update version to next development version
# Cargo.toml: version = "0.3.0-dev"
git add Cargo.toml
git commit -m "chore: bump version to 0.3.0-dev"
git push origin develop

2. Documentation Deployment

# Deploy documentation to GitHub Pages
mdbook build docs/
# Automated deployment via GitHub Actions

3. Announcement

  • Update project README with latest version
  • Post announcement in GitHub Discussions
  • Update any external documentation or websites
  • Notify users through appropriate channels

Hotfix Process

Critical Bug Fixes

For critical bugs that need immediate release:

1. Create Hotfix Branch

# Branch from latest release tag
git checkout v0.2.0
git checkout -b hotfix/v0.2.1

# Make minimal fix
# ... fix the critical bug ...

# Commit fix
git add .
git commit -m "fix: critical security vulnerability in offset parsing"

2. Test Hotfix

# Run focused tests
cargo test security
cargo test offset_parsing

# Run security audit
cargo audit

# Minimal integration testing
./critical_path_test.sh

3. Release Hotfix

# Update version to patch release
# Cargo.toml: version = "0.2.1"

# Update changelog
# Add entry for hotfix

# Commit and tag
git add Cargo.toml CHANGELOG.md
git commit -m "chore: bump version to 0.2.1"
git tag -a v0.2.1 -m "Hotfix release 0.2.1"

# Push hotfix
git push origin hotfix/v0.2.1
git push origin v0.2.1

4. Merge Back

# Merge hotfix to main
git checkout main
git merge hotfix/v0.2.1

# Merge hotfix to develop
git checkout develop
git merge hotfix/v0.2.1

# Clean up hotfix branch
git branch -d hotfix/v0.2.1
git push origin --delete hotfix/v0.2.1

Release Automation

Releases are automated by two complementary tools:

  • release-plz: Manages crates.io publishing, version bumping, changelog generation, and git tagging
  • cargo-dist: Builds cross-platform binaries, Homebrew formulas, SBOM, and GitHub Releases

How It Works

  1. Every push to main triggers release-plz, which opens (or updates) a release PR with:

    • Version bump in Cargo.toml
    • Updated CHANGELOG.md (generated via git-cliff)
    • Semantic versioning based on conventional commits
  2. When the release PR is merged, release-plz:

    • Publishes the crate to crates.io
    • Creates a git tag (e.g., v0.2.0)
  3. The git tag triggers cargo-dist, which:

    • Builds binaries for all target platforms
    • Generates SLSA attestations and SBOM
    • Publishes the Homebrew formula
    • Creates the GitHub Release with all artifacts

Release PR Merge Handling

Release-plz PRs (with branch names matching release-plz-*) are exempt from the standard “CI must pass” merge protection. This exemption exists because:

  • release-plz force-pushes when updating existing PRs
  • GITHUB_TOKEN-triggered pushes do not trigger workflow events, so CI never runs on the updated HEAD commit
  • Without this exemption, the merge protection would block the PR indefinitely
  • Release-plz PRs only bump versions and update changelogs – they contain no code changes
  • The code being released was already tested on main before the release PR was created
  • CI still runs in the merge queue as a final verification step before the release is completed

The “CI must pass” merge protection requires 7 checks: quality, test, test-cross-platform (ubuntu-latest, Linux), test-cross-platform (ubuntu-22.04, Linux), test-cross-platform (macos-latest, macOS), test-cross-platform (windows-latest, Windows), and coverage.

Separately, PRs must be within 10 commits of main before merging. The release-plz exemption allows the release workflow to proceed smoothly while maintaining the safety of the automated release process.

Configuration Files

File                   Purpose
release-plz.toml       release-plz configuration (crates.io, tags, changelog)
dist-workspace.toml    cargo-dist configuration (binaries, Homebrew, SBOM)
cliff.toml             git-cliff changelog template (shared by both tools)

Authentication

  • crates.io: Uses trusted publishing (OIDC) – no token secret needed. Requires configuring the trusted publisher on crates.io and id-token: write permission in the workflow. Note: the first publish of a new crate must be done manually with cargo publish.
  • Homebrew tap: Requires a HOMEBREW_TAP_TOKEN secret with write access to the tap repository.
  • GitHub Releases: Uses the automatic GITHUB_TOKEN.

Release Schedule

Regular Releases

  • Minor releases: Every 6-8 weeks
  • Patch releases: As needed for bug fixes
  • Major releases: When breaking changes accumulate

Release Windows

  • Feature freeze: 1 week before release
  • Code freeze: 3 days before release
  • Release day: Tuesday (for maximum testing time)

Communication

  • Release planning: Discussed in GitHub Issues/Discussions
  • Release announcements: GitHub Releases, project README
  • Breaking changes: Documented in migration guide

This release process ensures high-quality, reliable releases while maintaining clear communication with users and contributors.

Release Verification

All libmagic-rs release artifacts are cryptographically signed to ensure authenticity and integrity. This guide explains how to verify that a downloaded artifact is genuine.

How Releases Are Signed

libmagic-rs uses Sigstore keyless signing via GitHub Attestations. During the release build:

  1. cargo-dist builds release artifacts in GitHub Actions
  2. actions/attest-build-provenance generates a signed SLSA provenance attestation for each artifact
  3. The attestation is stored in GitHub’s attestation ledger and Sigstore’s transparency log

Keyless signing means there are no long-lived private keys to manage or compromise. Each build receives an ephemeral signing certificate tied to the GitHub Actions workflow identity.

Verifying with GitHub CLI

The simplest way to verify an artifact:

# Install GitHub CLI if you haven't already
# https://cli.github.com/

# Download a release artifact
gh release download v0.1.0 --repo EvilBit-Labs/libmagic-rs

# Verify the artifact
gh attestation verify rmagic-x86_64-unknown-linux-gnu.tar.xz \
  --repo EvilBit-Labs/libmagic-rs

A successful verification looks like:

Loaded digest sha256:abc123... for file rmagic-x86_64-unknown-linux-gnu.tar.xz
Loaded 1 attestation from GitHub API

The following attestation matched the digest:
  - Predicate type: https://slsa.dev/provenance/v1
  - Signer:         https://github.com/EvilBit-Labs/libmagic-rs/.github/workflows/release.yml
  - Build trigger:  push

What Verification Proves

A successful verification confirms:

  • Authenticity: The artifact was built by the official GitHub Actions workflow in the EvilBit-Labs/libmagic-rs repository
  • Integrity: The artifact has not been modified since it was built
  • Provenance: The build was triggered by a specific commit and tag

Additional Integrity Checks

SBOM (Software Bill of Materials)

Each release includes a CycloneDX SBOM generated by cargo-cyclonedx, listing all dependencies and their versions.

Embedded Dependency Metadata

Release binaries are built with cargo-auditable, which embeds dependency information directly into the binary. You can inspect it with:

cargo audit bin rmagic

This allows post-deployment vulnerability scanning against the RustSec Advisory Database.

Homebrew

Homebrew formula installations from the EvilBit-Labs/homebrew-tap tap are verified through Homebrew’s standard SHA256 checksum mechanism, which is populated from the GitHub Release artifacts.

API Reference

Complete API documentation for libmagic-rs library components.

Core Types

MagicDatabase

The main interface for loading magic rules and evaluating files.

#![allow(unused)]
fn main() {
use libmagic_rs::MagicDatabase;
}

Constructor Methods

Method | Description
with_builtin_rules() | Create database with built-in rules
with_builtin_rules_and_config(config) | Create with built-in rules and custom config
load_from_file(path) | Load rules from a file or directory
load_from_file_with_config(path, config) | Load from file with custom config

Evaluation Methods

Method | Description
evaluate_file(path) | Evaluate a file and return results
evaluate_buffer(buffer) | Evaluate an in-memory buffer

Accessor Methods

Method | Return Type | Description
config() | &EvaluationConfig | Get evaluation configuration
source_path() | Option<&Path> | Get path rules were loaded from

Example

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
use libmagic_rs::{MagicDatabase, EvaluationConfig};

// Using built-in rules
let db = MagicDatabase::with_builtin_rules()?;
let result = db.evaluate_file("sample.bin")?;
println!("Type: {}", result.description);

// With custom configuration
let config = EvaluationConfig {
    timeout_ms: Some(5000),
    enable_mime_types: true,
    ..Default::default()
};
let db = MagicDatabase::with_builtin_rules_and_config(config)?;

// From file
let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;
Ok(())
}

EvaluationResult

Result of magic rule evaluation.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationResult;
}

Fields

Field | Type | Description
description | String | Human-readable file type description
mime_type | Option<String> | MIME type (if enabled)
confidence | f64 | Confidence score (0.0-1.0)
matches | Vec<MatchResult> | Individual match results
metadata | EvaluationMetadata | Evaluation diagnostics

Example

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
use libmagic_rs::MagicDatabase;

let db = MagicDatabase::with_builtin_rules()?;
let result = db.evaluate_file("document.pdf")?;

println!("Description: {}", result.description);
println!("Confidence: {:.0}%", result.confidence * 100.0);

if let Some(mime) = &result.mime_type {
    println!("MIME Type: {}", mime);
}

println!("Evaluation time: {:.2}ms", result.metadata.evaluation_time_ms);
Ok(())
}

EvaluationConfig

Configuration for rule evaluation behavior.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationConfig;
}

Fields

Field | Type | Default | Description
max_recursion_depth | u32 | 20 | Maximum nesting depth for rules (1-1000)
max_string_length | usize | 8192 | Maximum string bytes to read (1-1MB)
stop_at_first_match | bool | true | Stop after first match
enable_mime_types | bool | false | Map results to MIME types
timeout_ms | Option<u64> | None | Evaluation timeout (1-300000ms)

Preset Configurations

#![allow(unused)]
fn main() {
// Default balanced settings
let config = EvaluationConfig::default();

// Optimized for speed
let config = EvaluationConfig::performance();
// - max_recursion_depth: 10
// - max_string_length: 1024
// - stop_at_first_match: true
// - timeout_ms: Some(1000)

// Optimized for completeness
let config = EvaluationConfig::comprehensive();
// - max_recursion_depth: 50
// - max_string_length: 32768
// - stop_at_first_match: false
// - enable_mime_types: true
// - timeout_ms: Some(30000)
}

Validation

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
use libmagic_rs::EvaluationConfig;

let config = EvaluationConfig {
    max_recursion_depth: 25,
    max_string_length: 16384,
    ..Default::default()
};

// Validate configuration before use
config.validate()?;
Ok(())
}

EvaluationMetadata

Diagnostic information about the evaluation process.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationMetadata;
}

Fields

Field | Type | Description
file_size | u64 | Size of analyzed file in bytes
evaluation_time_ms | f64 | Time taken in milliseconds
rules_evaluated | usize | Number of rules tested
magic_file | Option<PathBuf> | Source magic file path
timed_out | bool | Whether evaluation timed out

AST Types

MagicRule

Represents a parsed magic rule.

#![allow(unused)]
fn main() {
use libmagic_rs::MagicRule;
}
Field | Type | Description
offset | OffsetSpec | Where to read data
typ | TypeKind | Type of data to read
op | Operator | Comparison operator
value | Value | Expected value
message | String | Description message
children | Vec<MagicRule> | Nested rules
level | u32 | Indentation level
strength_modifier | Option<StrengthModifier> | Optional strength modifier from !:strength directive

StrengthModifier

Optional modifier for rule strength calculation.

#![allow(unused)]
fn main() {
use libmagic_rs::StrengthModifier;
}
Variant | Description
Add(i32) | Add to base strength
Subtract(i32) | Subtract from base strength
Multiply(i32) | Multiply base strength
Divide(i32) | Divide base strength
Set(i32) | Set strength to fixed value

OffsetSpec

Offset specification for locating data.

#![allow(unused)]
fn main() {
use libmagic_rs::OffsetSpec;
}
Variant | Description
Absolute(i64) | Absolute offset from file start
Indirect { base_offset, pointer_type, adjustment, endian } | Indirect through pointer
Relative(i64) | Relative to previous match
FromEnd(i64) | Offset from end of file

TypeKind

Data type specifications.

#![allow(unused)]
fn main() {
use libmagic_rs::TypeKind;
}
Variant | Description
Byte { signed } | Single byte with explicit signedness (struct variant since v0.2.0; previously unit variant)
Short { endian, signed } | 16-bit integer
Long { endian, signed } | 32-bit integer
Float { endian } | 32-bit IEEE 754 floating-point (added in v0.5.0)
Double { endian } | 64-bit IEEE 754 double-precision floating-point (added in v0.5.0)
String { max_length } | String data (discriminant changed from 4 to 6 in v0.5.0)
PString { max_length } | Pascal string: a length-prefix byte followed by string data (returns Value::String)

Operator

Comparison and bitwise operators for magic rule matching.

#![allow(unused)]
fn main() {
use libmagic_rs::Operator;
}
Variant | Description
Equal | Equality comparison (=)
NotEqual | Inequality comparison (!=)
LessThan | Less-than comparison (<) (added in v0.2.0)
GreaterThan | Greater-than comparison (>) (added in v0.2.0)
LessEqual | Less-than-or-equal comparison (<=) (added in v0.2.0)
GreaterEqual | Greater-than-or-equal comparison (>=) (added in v0.2.0)
BitwiseAnd | Bitwise AND (&): matches if any bits overlap
BitwiseAndMask(u64) | Bitwise AND with mask: masked comparison of file data
BitwiseXor | Bitwise XOR (^): matches if the XOR result is non-zero
BitwiseNot | Bitwise NOT (~): compares the complement of the file value with the expected value
AnyValue | Match any value (x): unconditional match

Examples

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, Operator};
use libmagic_rs::parser::grammar::parse_magic_rule;

// Equal operator (default)
let (_, rule) = parse_magic_rule("0 byte =0x7f").unwrap();
assert_eq!(rule.op, Operator::Equal);

// Bitwise AND - check if bit is set
let (_, rule) = parse_magic_rule("0 byte &0x80").unwrap();
assert_eq!(rule.op, Operator::BitwiseAnd);

// Bitwise XOR - check for difference
let (_, rule) = parse_magic_rule("0 byte ^0xFF").unwrap();
assert_eq!(rule.op, Operator::BitwiseXor);

// Bitwise NOT - check complement
let (_, rule) = parse_magic_rule("0 byte ~0xFF").unwrap();
assert_eq!(rule.op, Operator::BitwiseNot);

// Any value - always matches
let (_, rule) = parse_magic_rule("0 byte x").unwrap();
assert_eq!(rule.op, Operator::AnyValue);
}

Value

Value types for matching.

#![allow(unused)]
fn main() {
use libmagic_rs::Value;
}
Variant | Description
Uint(u64) | Unsigned integer
Int(i64) | Signed integer
Float(f64) | Floating-point value (added in v0.5.0)
Bytes(Vec<u8>) | Byte sequence
String(String) | String value

The Value enum derives PartialEq but no longer derives Eq (removed in v0.5.0 to support floating-point values).

Endianness

Byte order specification.

#![allow(unused)]
fn main() {
use libmagic_rs::Endianness;
}
Variant | Description
Little | Little-endian
Big | Big-endian
Native | System native

Error Types

LibmagicError

Main error type for all library operations.

#![allow(unused)]
fn main() {
use libmagic_rs::LibmagicError;
}

Variants

Variant | Description
ParseError(ParseError) | Magic file parsing error
EvaluationError(EvaluationError) | Rule evaluation error
IoError(std::io::Error) | File I/O error
Timeout { timeout_ms } | Evaluation timeout exceeded

ParseError

Errors during magic file parsing.

Variant | Description
InvalidSyntax { line, message } | Invalid syntax in magic file
UnsupportedFeature { line, feature } | Unsupported feature encountered
InvalidOffset { line, offset } | Invalid offset specification
InvalidType { line, type_spec } | Invalid type specification
InvalidOperator { line, operator } | Invalid operator
InvalidValue { line, value } | Invalid value
UnsupportedFormat { line, format_type, message } | Unsupported file format
IoError(std::io::Error) | I/O error during parsing

EvaluationError

Errors during rule evaluation.

Variant | Description
BufferOverrun { offset } | Read beyond buffer bounds
InvalidOffset { offset } | Invalid offset calculation
UnsupportedType { type_name } | Unsupported type during evaluation
RecursionLimitExceeded { depth } | Maximum recursion depth exceeded
StringLengthExceeded { length, max_length } | String too long
InvalidStringEncoding { offset } | Invalid string encoding
Timeout { timeout_ms } | Evaluation timeout
InternalError { message } | Internal error (bug)

Example

#![allow(unused)]
fn main() {
use libmagic_rs::{MagicDatabase, LibmagicError, ParseError};

match MagicDatabase::load_from_file("invalid.magic") {
    Ok(db) => println!("Loaded successfully"),
    Err(LibmagicError::ParseError(ParseError::InvalidSyntax { line, message })) => {
        eprintln!("Syntax error at line {}: {}", line, message);
    }
    Err(LibmagicError::IoError(e)) => {
        eprintln!("I/O error: {}", e);
    }
    Err(e) => eprintln!("Error: {}", e),
}
}

Evaluator Module

EvaluationContext

Maintains evaluation state during rule processing.

#![allow(unused)]
fn main() {
use libmagic_rs::EvaluationContext;
}

Methods

Method | Description
new(config) | Create a new context
current_offset() | Get current position
set_current_offset(offset) | Set current position
recursion_depth() | Get recursion depth
increment_recursion_depth() | Increment depth (with limit check)
decrement_recursion_depth() | Decrement depth
should_stop_at_first_match() | Check stop behavior
max_string_length() | Get maximum string length
enable_mime_types() | Check MIME type setting
timeout_ms() | Get timeout value
reset() | Reset to initial state

MatchResult (Evaluator)

Result from internal evaluation.

#![allow(unused)]
fn main() {
use libmagic_rs::evaluator::MatchResult;
}
Field | Type | Description
message | String | Match description
offset | usize | Match offset
level | u32 | Rule level
value | Value | Matched value
type_kind | TypeKind | Type used to read the value (added in v0.5.0)
confidence | f64 | Confidence score

Output Module

MatchResult (Output)

Structured match result for output formatting.

#![allow(unused)]
fn main() {
use libmagic_rs::output::MatchResult;
}

Fields

Field | Type | Description
message | String | File type description
offset | usize | Match offset
length | usize | Bytes examined
value | Value | Matched value
rule_path | Vec<String> | Rule hierarchy
confidence | u8 | Confidence (0-100)
mime_type | Option<String> | MIME type

Methods

#![allow(unused)]
fn main() {
use libmagic_rs::Value;
use libmagic_rs::output::MatchResult;

// Create basic result
let result = MatchResult::new(
    "PNG image".to_string(),
    0,
    Value::Bytes(vec![0x89, 0x50, 0x4e, 0x47])
);

// Create with full metadata (mutable so it can be modified below)
let mut result = MatchResult::with_metadata(
    "JPEG image".to_string(),
    0,
    2,
    Value::Bytes(vec![0xff, 0xd8]),
    vec!["image".to_string(), "jpeg".to_string()],
    85,
    Some("image/jpeg".to_string())
);

// Modify result
result.set_confidence(90);
result.add_rule_path("subtype".to_string());
result.set_mime_type(Some("image/jpeg".to_string()));
}

JSON Output

#![allow(unused)]
fn main() {
use libmagic_rs::output::json::{format_json_output, format_json_line_output};

// Pretty-printed JSON (single file)
let json = format_json_output(&matches)?;

// JSON Lines (multiple files)
let json_line = format_json_line_output(path, &matches)?;
}

Type Aliases

Alias | Definition | Description
Result<T> | std::result::Result<T, LibmagicError> | Library result type

Re-exports

The following types are re-exported from the root module for convenience:

#![allow(unused)]
fn main() {
// AST types
pub use parser::ast::{Endianness, MagicRule, OffsetSpec, Operator, StrengthModifier, TypeKind, Value};

// Evaluator types
pub use evaluator::{EvaluationContext, MatchResult};

// Error types
pub use error::{EvaluationError, LibmagicError, ParseError};
}

Thread Safety

  • MagicDatabase is not Send or Sync by default due to internal state
  • EvaluationConfig is Send + Sync (plain data)
  • For multi-threaded use, create separate MagicDatabase instances per thread or use appropriate synchronization

Version Compatibility

  • Minimum Rust Version: 1.85
  • Edition: 2024
  • License: Apache-2.0
  • Current Version: 0.5.0

Breaking Changes in v0.5.0

  • TypeKind enum: Added Float { endian } and Double { endian } variants for IEEE 754 floating-point support
  • TypeKind::String discriminant changed from 4 to 6 to accommodate new float types
  • Value enum: Added Float(f64) variant for floating-point values
  • Value enum: No longer derives Eq trait (only PartialEq is available due to floating-point values)
  • RuleMatch struct: Added type_kind: TypeKind field to indicate the type used for matching

Breaking Changes in v0.2.0

  • TypeKind::Byte changed from a unit variant to a struct variant Byte { signed } to support explicit signedness
  • Added comparison operators: LessThan, GreaterThan, LessEqual, GreaterEqual to the Operator enum (breaking change due to exhaustive enum)

For complete API documentation with examples, run:

cargo doc --open

Appendix B: Command Reference

Command-line interface documentation for the rmagic file identification tool.

Overview

rmagic is a pure-Rust implementation of the file command for file type identification using magic rules.

Installation

# From source
git clone https://github.com/EvilBit-Labs/libmagic-rs
cd libmagic-rs
cargo install --path .

Synopsis

rmagic [OPTIONS] <FILE>...
rmagic [OPTIONS] -

Description

rmagic analyzes files and determines their types based on magic rules. It examines file contents rather than relying on file extensions, providing accurate identification for binary files, archives, executables, images, and more.

Arguments

Argument | Description
<FILE>... | One or more files to analyze
- | Read from standard input

Options

Output Format

Option | Description
-j, --json | Output results in JSON format
--text | Output results in text format (default)

Note: --json and --text are mutually exclusive.

Magic File Selection

Option | Description
-m, --magic-file <FILE> | Use a custom magic file or directory
-b, --use-builtin | Use built-in magic rules

Note: --magic-file and --use-builtin are mutually exclusive.

Behavior

Option | Description
-s, --strict | Exit with a non-zero code on any error
-t, --timeout-ms <MS> | Set evaluation timeout (1-300000ms)

Help

Option | Description
-h, --help | Print help information
-V, --version | Print version information

Exit Codes

Code | Description
0 | Success
1 | General evaluation error
2 | Invalid arguments (misuse)
3 | File not found or access denied
4 | Magic file not found or invalid
5 | Evaluation timeout

Output Formats

Text Format (Default)

One line per file in the format:

filename: description

Examples:

document.pdf: PDF document
image.png: PNG image data
binary.exe: PE32 executable

JSON Format

Single file: Pretty-printed JSON with full details.

{
  "matches": [
    {
      "message": "ELF 64-bit LSB executable",
      "offset": 0,
      "length": 4,
      "value": "7f454c46",
      "rule_path": [
        "elf",
        "executable"
      ],
      "confidence": 90,
      "mime_type": "application/x-executable"
    }
  ]
}

Multiple files: JSON Lines format (compact, one JSON object per line).

{"filename":"file1.bin","matches":[...]}
{"filename":"file2.bin","matches":[...]}

Magic File Discovery

When no --magic-file is specified and --use-builtin is not used, rmagic searches for magic files in this order (OpenBSD-style, text-first):

Text Directories (Highest Priority)

  1. /usr/share/file/magic/Magdir
  2. /usr/share/file/magic

Text Files

  1. /usr/share/misc/magic
  2. /usr/local/share/misc/magic
  3. /etc/magic
  4. /opt/local/share/file/magic

Binary Files (Fallback)

  1. /usr/share/file/magic.mgc
  2. /usr/local/share/misc/magic.mgc
  3. /opt/local/share/file/magic.mgc
  4. /etc/magic.mgc
  5. /usr/share/misc/magic.mgc

Development Fallbacks

  1. missing.magic (current directory)
  2. third_party/magic.mgc

Note: Binary .mgc files are currently unsupported. Use --use-builtin or a text magic file.

Built-in Rules

The --use-builtin flag uses pre-compiled rules for common file types:

Category | Formats
Executables | ELF, PE/DOS (MZ)
Archives | ZIP, TAR, GZIP
Images | JPEG, PNG, GIF, BMP
Documents | PDF

Examples

Basic Usage

# Identify a single file
rmagic document.pdf

# Identify multiple files
rmagic *.bin

# Use built-in rules
rmagic --use-builtin image.png

# Read from stdin
cat unknown.bin | rmagic -

JSON Output

# Single file with pretty JSON
rmagic --json executable.elf

# Multiple files with JSON Lines
rmagic --json file1.bin file2.bin file3.bin

# Parse JSON output with jq
rmagic --json binary.exe | jq '.matches[0].message'

Custom Magic File

# Use specific magic file
rmagic --magic-file /path/to/custom.magic files/*

# Use magic directory (Magdir style)
rmagic --magic-file /usr/share/file/magic files/*

Error Handling

# Strict mode - fail on first error
rmagic --strict *.bin

# With timeout protection
rmagic --timeout-ms 5000 large-file.bin

# Combine options
rmagic --strict --timeout-ms 10000 --json *.bin

Pipeline Usage

# Find all ELF files
find . -type f -exec rmagic --use-builtin {} + | grep ELF

# Process files and output JSON
for f in *.bin; do
    rmagic --json "$f" >> results.jsonl
done

# Use with xargs
find . -name "*.dat" -print0 | xargs -0 rmagic --use-builtin

Scripting

#!/bin/bash
# Check if file is an image

if rmagic --use-builtin "$1" | grep -q "image"; then
    echo "File is an image"
    exit 0
else
    echo "File is not an image"
    exit 1
fi

Environment Variables

Variable | Description
CI | Enables CI mode (affects magic file fallback)
GITHUB_ACTIONS | Enables GitHub Actions mode

Platform-Specific Behavior

Unix (Linux, macOS, BSD)

  • Full magic file discovery
  • Memory-mapped file access
  • Standard Unix exit codes

Windows

  • Limited magic file locations
  • Falls back to %APPDATA%\Magic\magic
  • Uses third_party/magic.mgc in CI

Troubleshooting

Common Issues

“Magic file not found”

# Solution 1: Use built-in rules
rmagic --use-builtin file.bin

# Solution 2: Specify magic file path
rmagic --magic-file /path/to/magic file.bin

# Solution 3: Check available locations
ls -la /usr/share/misc/magic /usr/share/file/magic* 2>/dev/null

“Unsupported format: binary .mgc”

# Binary .mgc files are not supported
# Use --use-builtin or a text magic file

rmagic --use-builtin file.bin

“Evaluation timeout”

# Increase timeout
rmagic --timeout-ms 30000 large-file.bin

# Or use simpler rules
rmagic --use-builtin large-file.bin

“Permission denied”

# Check file permissions
ls -la file.bin

# Fix permissions if needed
chmod +r file.bin

Debug Tips

# Check the installed version
rmagic --version

# Test with built-in rules first
rmagic --use-builtin test-file.bin

# Verbose error with strict mode
rmagic --strict file.bin

Comparison with GNU file

Feature | rmagic | GNU file
Binary .mgc support | No | Yes
Text magic files | Yes | Yes
Built-in rules | Yes | No
Memory safety | Rust (safe) | C
JSON output | Native | Requires wrapper
Timeout support | Yes | No

Migration from file

# Before (GNU file)
file document.pdf

# After (rmagic)
rmagic document.pdf

# With options
file -i document.pdf      # MIME type
rmagic --json document.pdf | jq '.matches[0].mime_type'

See Also

Appendix C: Magic File Examples

This appendix provides comprehensive examples of magic file syntax and patterns, demonstrating how to create effective file type detection rules.

Basic Magic File Syntax

Simple Pattern Matching

# ELF executable files
0    string    \x7fELF    ELF

# PDF documents
0    string    %PDF-      PDF document

# PNG images
0    string    \x89PNG    PNG image data

# ZIP archives
0    string    PK\x03\x04    ZIP archive data

Numeric Value Matching

# JPEG images (using hex values)
0    beshort    0xffd8    JPEG image data

# Windows PE executables
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>60 string    PE\0\0    PE32 executable

# ELF with specific architecture
0    string    \x7fELF    ELF
>16  leshort   2         executable
>18  leshort   62        x86-64

# 64-bit file signature matching
0    lequad    0x1234567890abcdef    Custom format with 64-bit magic
0    bequad    0xfeedface00000000    Mach-O universal binary signature

# Large integer value checks
512  uquad     >0x8000000000000000    Large unsigned 64-bit value
1024 quad      0xffffffffffffffff    Signed 64-bit match

Hierarchical Rules

Parent-Child Relationships

# ELF files with detailed classification
0    string    \x7fELF    ELF
>4   byte      1         32-bit
>>16 leshort   2         executable
>>16 leshort   3         shared object
>>16 leshort   1         relocatable
>4   byte      2         64-bit
>>16 leshort   2         executable
>>16 leshort   3         shared object
>>16 leshort   1         relocatable

Multiple Levels of Nesting

# Detailed PE analysis
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>60 string    PE\0\0    PE32
>>>88 leshort  0x010b    PE32 executable
>>>>92 leshort 1         (native)
>>>>92 leshort 2         (GUI)
>>>>92 leshort 3         (console)
>>>88 leshort  0x020b    PE32+ executable
>>>>92 leshort 1         (native)
>>>>92 leshort 2         (GUI)
>>>>92 leshort 3         (console)

Data Types and Endianness

Integer Types

# Little-endian integers
0    leshort   0x5a4d    MS-DOS executable (little-endian short)
0    lelong    0x464c457f    ELF (little-endian long)
0    lequad    0x1234567890abcdef    64-bit little-endian value

# Big-endian integers
0    beshort   0x4d5a    MS-DOS executable (big-endian short)
0    belong    0x7f454c46    ELF (big-endian long)
0    bequad    0xfeedface00000000    64-bit big-endian value

# Native endian (system default)
0    short     0x5a4d    MS-DOS executable (native endian)
0    long      0x464c457f    ELF (native endian)
0    quad      0xffffffffffffffff    64-bit native endian

# Unsigned variants
0    ubyte     0xff      Unsigned byte
0    ushort    0xffff    Unsigned short
0    ulong     0xffffffff    Unsigned long
0    uquad     0xffffffffffffffff    Unsigned 64-bit value
0    ulequad   >0x8000000000000000    Unsigned little-endian quad
0    ubequad   >0x8000000000000000    Unsigned big-endian quad

String Matching

# Fixed-length strings
0    string    #!/bin/sh    shell script
0    string    #!/usr/bin/python    Python script

# Variable-length strings with limits
0    string/32    #!/    script text executable
16   string/256   This program    self-describing executable

# Case-insensitive matching (planned)
0    istring   html    HTML document
0    istring   <html   HTML document

Advanced Offset Specifications

Indirect Offsets

# PE section table access
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>60 string    PE\0\0    PE32
>>>(60.l+24)  leshort   >0    sections
>>>>(60.l+24) leshort   x     \b, %d sections

Relative Offsets

# ZIP file entries
0    string    PK\x03\x04    ZIP archive data
>26  leshort   x         \b, compressed size %d
>28  leshort   x         \b, uncompressed size %d
>30  leshort   >0
>>(30.s+46)   string    x    \b, first entry: "%.64s"

Search Patterns

# Search for patterns within a range
0      string    \x7fELF    ELF
>0     search/1024    .note.gnu.build-id    \b, with build-id
>0     search/1024    .debug_info    \b, with debug info

Bitwise Operations

Flag Testing

# ELF program header flags
0    string    \x7fELF    ELF
>16  leshort   2         executable
>36  lelong    &0x1      \b, executable
>36  lelong    &0x2      \b, writable
>36  lelong    &0x4      \b, readable

Mask Operations

# File permissions in Unix archives
0    string    070707    cpio archive
>6   long      &0170000
>>6  long      0100000   \b, regular file
>>6  long      0040000   \b, directory
>>6  long      0120000   \b, symbolic link
>>6  long      0060000   \b, block device
>>6  long      0020000   \b, character device

Complex File Format Examples

JPEG Image Analysis

# JPEG with EXIF data
0    beshort   0xffd8    JPEG image data
>2   beshort   0xffe1    \b, EXIF standard
>>10 string    Exif\0\0
>>>14 beshort  0x4d4d    \b, big-endian
>>>14 beshort  0x4949    \b, little-endian
>2   beshort   0xffe0    \b, JFIF standard
>>10 string    JFIF
>>>14 byte     x         \b, version %d
>>>15 byte     x         \b.%d

Archive Format Detection

# TAR archives
257  string    ustar\0   POSIX tar archive
257  string    ustar\040\040\0    GNU tar archive

# RAR archives
0    string    Rar!      RAR archive data
>4   byte      0x1a      \b, version 1.x
>4   byte      0x07      \b, version 5.x

# 7-Zip archives
0    string    7z\xbc\xaf\x27\x1c    7-zip archive data
>6   byte      x         \b, version %d
>7   byte      x         \b.%d

Executable Format Analysis

# Mach-O executables (macOS)
0    belong    0xfeedface    Mach-O executable (32-bit)
>4   belong    7            i386
>4   belong    18           x86_64
>12  belong    2            executable
>12  belong    6            shared library
>12  belong    8            bundle

0    belong    0xfeedfacf    Mach-O executable (64-bit)
>4   belong    0x01000007   x86_64
>4   belong    0x0100000c   arm64
>12  belong    2            executable
>12  belong    6            shared library

Script and Text File Detection

Shebang Detection

# Shell scripts
0    string    #!/bin/sh         POSIX shell script
0    string    #!/bin/bash       Bash shell script
0    string    #!/bin/csh        C shell script
0    string    #!/bin/tcsh       TC shell script
0    string    #!/bin/zsh        Z shell script

# Interpreted languages
0    string    #!/usr/bin/python    Python script
0    string    #!/usr/bin/perl      Perl script
0    string    #!/usr/bin/ruby      Ruby script
0    string    #!/usr/bin/node      Node.js script
0    string    #!/usr/bin/php       PHP script

Text Format Detection

# Configuration files
0    string    [Desktop\ Entry]    Desktop configuration
0    string    # Configuration      configuration text
0    regex     ^[a-zA-Z_][a-zA-Z0-9_]*\s*=    configuration text

# Source code detection
0    regex     ^#include\s*<       C source code
0    regex     ^package\s+         Java source code
0    regex     ^class\s+\w+:       Python source code
0    regex     ^function\s+        JavaScript source code

Database and Structured Data

Database Files

# SQLite databases
0    string    SQLite\ format\ 3    SQLite 3.x database
>13  byte      x                   \b, version %d

# MySQL databases
0    string    \xfe\x01\x00\x00    MySQL table data
0    string    \x00\x00\x00\x00    MySQL ISAM compressed data

# PostgreSQL
0    belong    0x00061561          PostgreSQL custom database dump
>4   belong    x                   \b, version %d

Structured Text Formats

# JSON files
0    regex     ^\s*[\{\[]          JSON data
>0   search/64 "version"          \b, with version info
>0   search/64 "name"             \b, with name field

# XML files
0    string    <?xml               XML document
>5   search/256 version
>>5  regex     version="([^"]*)"   \b, version \1
>5   search/256 encoding
>>5  regex     encoding="([^"]*)"  \b, encoding \1

# YAML files
0    regex     ^---\s*$            YAML document
0    regex     ^[a-zA-Z_][^:]*:    YAML configuration

Multimedia File Examples

Audio Formats

# MP3 files
0    string    ID3                 MP3 audio file with ID3
>3   byte      <0xff               version 2
>>3  byte      x                   \b.%d
0    beshort   0xfffb              MP3 audio file
0    beshort   0xfff3              MP3 audio file
0    beshort   0xffe3              MP3 audio file

# WAV files
0    string    RIFF                Microsoft RIFF
>8   string    WAVE                \b, WAVE audio
>>20 leshort   1                   \b, PCM
>>20 leshort   85                  \b, MPEG Layer 3
>>22 leshort   1                   \b, mono
>>22 leshort   2                   \b, stereo

Video Formats

# AVI files
0    string    RIFF                Microsoft RIFF
>8   string    AVI\040             \b, AVI video
>>12 string    LIST
>>>20 string   hdrlavih

# MP4/QuickTime
4    string    ftyp                ISO Media
>8   string    isom                \b, MP4 Base Media v1
>8   string    mp41                \b, MP4 v1
>8   string    mp42                \b, MP4 v2
>8   string    qt                  \b, QuickTime movie

Best Practices Examples

Efficient Rule Ordering

# Order by probability - most common formats first
0    string    \x7fELF             ELF
0    string    MZ                  MS-DOS executable
0    string    \x89PNG             PNG image data
0    string    \xff\xd8\xff        JPEG image data
0    string    PK\x03\x04          ZIP archive data
0    string    %PDF-               PDF document

# Less common formats later
0    string    \x00\x00\x01\x00    Windows icon
0    string    \x00\x00\x02\x00    Windows cursor

Error-Resistant Patterns

# Validate magic numbers with additional checks
0    string    \x7fELF             ELF
>4   byte      1                   32-bit
>4   byte      2                   64-bit
>4   byte      >2                  invalid class
>5   byte      1                   little-endian
>5   byte      2                   big-endian
>5   byte      >2                  invalid encoding

Performance Optimizations

# Use specific offsets instead of searches when possible
0    string    \x7fELF             ELF
>16  leshort   2                   executable
>18  leshort   62                  x86-64

# Prefer shorter patterns for initial matching
0    beshort   0xffd8              JPEG image data
>2   beshort   0xffe0              \b, JFIF standard
>2   beshort   0xffe1              \b, EXIF standard

Testing and Validation

Test File Creation

# Create test files for magic rules
printf '\x7fELF\x02\x01\x01\x00' > test_elf64.bin
printf 'PK\x03\x04\x14\x00' > test_zip.bin
printf '%s\n' '%PDF-1.4' > test_pdf.txt

Rule Validation

# Include validation comments
# Test: printf '\x7fELF\x02\x01\x01\x00' | rmagic -
# Expected: ELF 64-bit LSB executable
0    string    \x7fELF             ELF
>4   byte      2                   64-bit
>5   byte      1                   LSB
>6   byte      1                   current version

This comprehensive collection of magic file examples demonstrates the flexibility and power of the magic file format for accurate file type detection.

Appendix D: Compatibility Matrix

This appendix provides detailed compatibility information between libmagic-rs and other file identification tools, magic file formats, and system environments.

GNU file Compatibility

Command-Line Interface

| GNU file Option | rmagic Equivalent | Status | Notes |
|---|---|---|---|
| `file <file>` | `rmagic <file>` | ✅ Complete | Basic file identification |
| `file -i <file>` | `rmagic --mime-type <file>` | 📋 Planned | MIME type output |
| `file -b <file>` | `rmagic --brief <file>` | 📋 Planned | Brief output (no filename) |
| `file -m <magic>` | `rmagic --magic-file <magic>` | ✅ Complete | Custom magic file |
| `file -z <file>` | `rmagic --compress <file>` | 📋 Planned | Look inside compressed files |
| `file -L <file>` | `rmagic --follow-symlinks <file>` | 📋 Planned | Follow symbolic links |
| `file -h <file>` | `rmagic --no-follow-symlinks <file>` | 📋 Planned | Don't follow symlinks |
| `file -f <list>` | `rmagic --files-from <list>` | 📋 Planned | Read filenames from file |
| `file -F <sep>` | `rmagic --separator <sep>` | 📋 Planned | Custom field separator |
| `file -0` | `rmagic --print0` | 📋 Planned | NUL-separated output |
| `file --json` | `rmagic --json` | ✅ Complete | JSON output format |

Output Format Compatibility

Text Output

# GNU file
$ file example.elf
example.elf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked

# rmagic (current)
$ rmagic example.elf
example.elf: ELF 64-bit LSB executable

# rmagic (planned)
$ rmagic example.elf
example.elf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked

MIME Type Output

# GNU file
$ file -i example.pdf
example.pdf: application/pdf; charset=binary

# rmagic (planned)
$ rmagic --mime-type example.pdf
example.pdf: application/pdf; charset=binary

JSON Output

# GNU file (recent versions)
$ file --json example.elf
[{"filename":"example.elf","mime-type":"application/x-pie-executable","mime-encoding":"binary","description":"ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked"}]

# rmagic (current)
$ rmagic --json example.elf
{
  "filename": "example.elf",
  "description": "ELF 64-bit LSB executable",
  "mime_type": "application/x-executable",
  "confidence": 1.0
}

Magic File Format Compatibility

| Feature | GNU file | rmagic | Status | Notes |
|---|---|---|---|---|
| Basic patterns | ✅ | ✅ | Complete | String, numeric matching |
| Hierarchical rules | ✅ | 🔄 | In Progress | Parent-child relationships |
| Indirect offsets | ✅ | 📋 | Planned | Pointer dereferencing |
| Relative offsets | ✅ | 📋 | Planned | Position-relative addressing |
| Search patterns | ✅ | 📋 | Planned | Pattern searching in ranges |
| Bitwise operations | ✅ | ✅ | Complete | AND, OR operations |
| String operations | ✅ | 📋 | Planned | Case-insensitive, regex |
| Date/time formats | ✅ | 📋 | Planned | Unix timestamps, etc. |
| Floating point | ✅ | 📋 | Planned | Float, double types |
| Unicode support | ✅ | 📋 | Planned | UTF-8, UTF-16 strings |

libmagic C Library Compatibility

API Compatibility

| libmagic Function | rmagic Equivalent | Status | Notes |
|---|---|---|---|
| `magic_open()` | `MagicDatabase::new()` | ✅ | Database initialization |
| `magic_load()` | `MagicDatabase::load_from_file()` | 🔄 | Magic file loading |
| `magic_file()` | `MagicDatabase::evaluate_file()` | 🔄 | File evaluation |
| `magic_buffer()` | `MagicDatabase::evaluate_buffer()` | 📋 | Buffer evaluation |
| `magic_setflags()` | `EvaluationConfig` | ✅ | Configuration options |
| `magic_close()` | `Drop` trait | ✅ | Automatic cleanup |
| `magic_error()` | `Result<T, LibmagicError>` | ✅ | Error handling |

Flag Compatibility

| libmagic Flag | rmagic Equivalent | Status | Notes |
|---|---|---|---|
| `MAGIC_NONE` | Default behavior | ✅ | Standard file identification |
| `MAGIC_DEBUG` | Debug logging | 📋 | Planned |
| `MAGIC_SYMLINK` | `follow_symlinks: true` | 📋 | Planned |
| `MAGIC_COMPRESS` | `decompress: true` | 📋 | Planned |
| `MAGIC_DEVICES` | `check_devices: true` | 📋 | Planned |
| `MAGIC_MIME_TYPE` | `output_format: MimeType` | 📋 | Planned |
| `MAGIC_CONTINUE` | `stop_at_first_match: false` | ✅ | Multiple matches |
| `MAGIC_CHECK` | Validation mode | 📋 | Planned |
| `MAGIC_PRESERVE_ATIME` | `preserve_atime: true` | 📋 | Planned |
| `MAGIC_RAW` | `raw_output: true` | 📋 | Planned |

Platform Compatibility

Operating Systems

| Platform | Status | Notes |
|---|---|---|
| Linux | ✅ Complete | Primary development platform |
| macOS | ✅ Complete | Full support with native builds |
| Windows | ✅ Complete | MSVC and GNU toolchain support |
| FreeBSD | ✅ Complete | BSD compatibility |
| OpenBSD | ✅ Complete | BSD compatibility |
| NetBSD | ✅ Complete | BSD compatibility |
| Solaris | 📋 Planned | Should work with Rust support |
| AIX | 📋 Planned | Depends on Rust availability |

Architectures

| Architecture | Status | Notes |
|---|---|---|
| x86_64 | ✅ Complete | Primary target architecture |
| i686 | ✅ Complete | 32-bit x86 support |
| aarch64 | ✅ Complete | ARM 64-bit (Apple Silicon, etc.) |
| armv7 | ✅ Complete | ARM 32-bit |
| riscv64 | ✅ Complete | RISC-V 64-bit |
| powerpc64 | ✅ Complete | PowerPC 64-bit |
| s390x | ✅ Complete | IBM System z |
| mips64 | 📋 Planned | MIPS 64-bit |
| sparc64 | 📋 Planned | SPARC 64-bit |

Rust Version Compatibility

| Rust Version | Status | Notes |
|---|---|---|
| 1.89+ | ✅ Required | Minimum supported version |
| 1.88 | ❌ Not supported | Missing required features |
| 1.87 | ❌ Not supported | Missing required features |
| Stable | ✅ Supported | Always targets stable Rust |
| Beta | ✅ Supported | Should work with beta releases |
| Nightly | ⚠️ Best effort | May work but not guaranteed |

File Format Support

Executable Formats

| Format | GNU file | rmagic | Status | Notes |
|---|---|---|---|---|
| ELF | ✅ | ✅ | Complete | Linux/Unix executables |
| PE/COFF | ✅ | 📋 | Planned | Windows executables |
| Mach-O | ✅ | 📋 | Planned | macOS executables |
| a.out | ✅ | 📋 | Planned | Legacy Unix format |
| Java Class | ✅ | 📋 | Planned | JVM bytecode |
| WebAssembly | ✅ | 📋 | Planned | WASM modules |

Archive Formats

| Format | GNU file | rmagic | Status | Notes |
|---|---|---|---|---|
| ZIP | ✅ | 📋 | Planned | ZIP archives |
| TAR | ✅ | 📋 | Planned | Tape archives |
| RAR | ✅ | 📋 | Planned | RAR archives |
| 7-Zip | ✅ | 📋 | Planned | 7z archives |
| ar | ✅ | 📋 | Planned | Unix archives |
| CPIO | ✅ | 📋 | Planned | CPIO archives |

Image Formats

| Format | GNU file | rmagic | Status | Notes |
|---|---|---|---|---|
| JPEG | ✅ | 📋 | Planned | JPEG images |
| PNG | ✅ | 📋 | Planned | PNG images |
| GIF | ✅ | 📋 | Planned | GIF images |
| BMP | ✅ | 📋 | Planned | Windows bitmaps |
| TIFF | ✅ | 📋 | Planned | TIFF images |
| WebP | ✅ | 📋 | Planned | WebP images |
| SVG | ✅ | 📋 | Planned | SVG vector graphics |

Document Formats

| Format | GNU file | rmagic | Status | Notes |
|---|---|---|---|---|
| PDF | ✅ | 📋 | Planned | PDF documents |
| PostScript | ✅ | 📋 | Planned | PS/EPS files |
| RTF | ✅ | 📋 | Planned | Rich Text Format |
| MS Office | ✅ | 📋 | Planned | DOC, XLS, PPT |
| OpenDocument | ✅ | 📋 | Planned | ODF formats |
| HTML | ✅ | 📋 | Planned | HTML documents |
| XML | ✅ | 📋 | Planned | XML documents |

Performance Comparison

Benchmark Results (Preliminary)

| Test Case | GNU file | rmagic | Ratio | Notes |
|---|---|---|---|---|
| Single ELF file | 2.1ms | 1.8ms | 1.17x faster | Memory-mapped I/O advantage |
| 1000 small files | 180ms | 165ms | 1.09x faster | Reduced startup overhead |
| Large file (1GB) | 45ms | 42ms | 1.07x faster | Efficient memory mapping |
| Magic file loading | 12ms | 8ms | 1.5x faster | Optimized parsing |

Note: Benchmarks are preliminary and may vary by system and file types.

Memory Usage

| Scenario | GNU file | rmagic | Notes |
|---|---|---|---|
| Base memory | ~2MB | ~1.5MB | Smaller runtime footprint |
| Magic database | ~8MB | ~6MB | More efficient storage |
| Large file processing | ~16MB | ~2MB | Memory-mapped I/O |

Migration Guide

From GNU file

Command Line Migration

# Old GNU file commands
file document.pdf
file -i document.pdf
file -b document.pdf
file -m custom.magic document.pdf

# New rmagic commands
rmagic document.pdf
rmagic --mime-type document.pdf     # Planned
rmagic --brief document.pdf         # Planned
rmagic --magic-file custom.magic document.pdf

Script Migration

#!/bin/bash
# Old script using GNU file
for f in *.bin; do
    type=$(file -b "$f")
    echo "File $f is: $type"
done

# New script using rmagic
for f in *.bin; do
    type=$(rmagic --brief "$f")  # Planned
    echo "File $f is: $type"
done

From libmagic C Library

C Code Migration

// Old libmagic C code
#include <magic.h>

magic_t magic = magic_open(MAGIC_MIME_TYPE);
magic_load(magic, NULL);
const char* result = magic_file(magic, "file.bin");
printf("MIME type: %s\n", result);
magic_close(magic);
// New Rust code
use libmagic_rs::{EvaluationConfig, MagicDatabase, OutputFormat};

let mut config = EvaluationConfig::default();
config.output_format = OutputFormat::MimeType;  // Planned

let db = MagicDatabase::load_default()?;
let result = db.evaluate_file("file.bin")?;
println!("MIME type: {}", result.mime_type.unwrap_or_default());

Known Limitations

Current Limitations

  1. Incomplete Magic File Support: Not all GNU file magic syntax is implemented
  2. Limited File Format Coverage: Focus on common formats initially
  3. No Compression Support: Cannot look inside compressed files yet
  4. Basic MIME Type Support: Limited MIME type database
  5. No Plugin System: Cannot extend with custom detectors

Planned Improvements

  1. Complete Magic File Compatibility: Full GNU file magic syntax support
  2. Comprehensive Format Support: Support for all major file formats
  3. Advanced Features: Compression, encryption detection
  4. Performance Optimization: Parallel processing, caching
  5. Extended APIs: More flexible configuration options

Testing Compatibility

Test Suite Coverage

| Test Category | GNU file Tests | rmagic Tests | Coverage |
|---|---|---|---|
| Basic formats | 500+ | 79 | 15% |
| Magic file parsing | 200+ | 50 | 25% |
| Error handling | 100+ | 29 | 29% |
| Performance | 50+ | 0 | 0% |
| Compatibility | N/A | 0 | 0% |

Compatibility Test Plan

  1. Format Detection Tests: Validate against GNU file results
  2. Magic File Tests: Test with real-world magic databases
  3. Performance Tests: Compare speed and memory usage
  4. API Tests: Validate library interface compatibility
  5. Cross-platform Tests: Ensure consistent behavior across platforms

This compatibility matrix will be updated as development progresses and more features are implemented.

Security Assurance Case

This document provides a structured argument that libmagic-rs meets its security requirements. It follows the assurance case model described in NIST IR 7608.

1. Security Requirements

libmagic-rs is a file type detection library and CLI tool. Its security requirements are:

  1. SR-1: Must not crash, panic, or exhibit undefined behavior when processing any input file
  2. SR-2: Must not crash, panic, or exhibit undefined behavior when parsing any magic file
  3. SR-3: Must not read beyond allocated buffer boundaries
  4. SR-4: Must not allow path traversal via CLI arguments
  5. SR-5: Must not execute arbitrary code based on file contents or magic rule definitions
  6. SR-6: Must not consume unbounded resources (memory, CPU) during evaluation
  7. SR-7: Must not leak sensitive information from one file evaluation to another

2. Threat Model

2.1 Assets

  • Host system: The machine running libmagic-rs
  • File contents: Data being inspected (may be sensitive)
  • Magic rules: Definitions that drive file type detection

2.2 Threat Actors

| Actor | Motivation | Capability |
|---|---|---|
| Malicious file author | Exploit the detection tool to gain code execution or cause DoS | Can craft arbitrary file contents |
| Malicious magic file author | Inject rules that cause crashes, resource exhaustion, or incorrect results | Can craft arbitrary magic rule syntax |
| Supply chain attacker | Compromise a dependency to inject malicious code | Can publish malicious crate versions |

2.3 Attack Vectors

| ID | Vector | Target SR |
|---|---|---|
| AV-1 | Crafted file triggers buffer over-read | SR-1, SR-3 |
| AV-2 | Crafted file triggers integer overflow in offset calculation | SR-1, SR-3 |
| AV-3 | Deeply nested magic rules cause stack overflow | SR-1, SR-6 |
| AV-4 | Extremely large file causes memory exhaustion | SR-6 |
| AV-5 | Malformed magic file causes parser crash | SR-2 |
| AV-6 | CLI argument with path traversal reads unintended files | SR-4 |
| AV-7 | Compromised dependency introduces unsafe code | SR-5 |

3. Trust Boundaries

flowchart TD
    subgraph Untrusted["Untrusted Zone"]
        direction LR
        IF["Input Files<br/>(any content)"]
        MF["Magic Files<br/>(user or system)"]
        CA["CLI Arguments<br/>(user paths)"]
    end

    subgraph libmagic-rs["libmagic-rs (Trusted Zone)"]
        IO["I/O Layer<br/>mmap files, size limits"]
        CLI["CLI<br/>clap args, validates paths"]
        P["Parser<br/>validates magic syntax"]
        E["Evaluator<br/>bounds-checks all access"]
        O["Output<br/>formats results"]
    end

    IF -- "file bytes" --> IO
    MF -- "magic syntax" --> P
    CA -- "user paths" --> CLI
    IO -- "mapped buffer" --> E
    CLI -- "validated paths" --> IO
    P -- "validated AST" --> E
    E -- "match results" --> O

    style Untrusted fill:#4a1a1a,stroke:#ef5350,color:#e0e0e0,stroke-width:2px
    style libmagic-rs fill:#1b3d1b,stroke:#66bb6a,color:#e0e0e0,stroke-width:2px

All data crossing the trust boundary (file contents, magic file syntax, CLI arguments) is treated as untrusted and validated before use.

4. Secure Design Principles (Saltzer and Schroeder)

| Principle | How Applied |
|---|---|
| Economy of mechanism | Pure Rust with minimal dependencies. Simple parser-evaluator pipeline. No plugin system, no scripting, no network I/O. |
| Fail-safe defaults | Workspace lint `unsafe_code = "forbid"` enforced project-wide via `Cargo.toml`. Buffer access defaults to bounds-checked `.get()` returning `None` rather than panicking. Invalid magic rules are skipped, not executed. |
| Complete mediation | Every buffer access is bounds-checked. Every magic file is validated during parsing. Every CLI argument is validated by clap. |
| Open design | Fully open source (Apache-2.0). Security does not depend on obscurity. All security mechanisms are publicly documented. |
| Separation of privilege | Parser and evaluator are separate modules with distinct responsibilities. Parse errors cannot bypass evaluation safety checks. |
| Least privilege | The tool only reads files; it never writes, executes, or modifies them. No network access. No elevated permissions required. |
| Least common mechanism | No shared mutable state between file evaluations. Each evaluation operates on its own data. No global caches that could leak information. |
| Psychological acceptability | CLI follows GNU file conventions. Error messages are descriptive and actionable. Default behavior is safe (built-in rules, no network). |

5. Common Weakness Countermeasures

5.1 CWE/SANS Top 25

| CWE | Weakness | Countermeasure | Status |
|---|---|---|---|
| CWE-787 | Out-of-bounds write | Rust ownership prevents writes to unowned memory. Workspace-level lints in `Cargo.toml` forbid unsafe code and eliminate raw pointer writes. | Mitigated |
| CWE-79 | XSS | Not applicable (no web output). | N/A |
| CWE-89 | SQL injection | Not applicable (no database). | N/A |
| CWE-416 | Use after free | Rust ownership/borrowing system prevents use-after-free at compile time. | Mitigated |
| CWE-78 | OS command injection | No shell invocation or command execution. CLI arguments parsed by clap, not passed to a shell. | Mitigated |
| CWE-20 | Improper input validation | All inputs validated: magic syntax validated by parser, file buffers bounds-checked, CLI args validated by clap. | Mitigated |
| CWE-125 | Out-of-bounds read | All buffer access uses `.get()` with bounds checking. Memory-mapped files have known size limits. | Mitigated |
| CWE-22 | Path traversal | CLI accepts file paths as arguments but only performs read-only access. No path construction from file contents. | Mitigated |
| CWE-352 | CSRF | Not applicable (no web interface). | N/A |
| CWE-434 | Unrestricted upload | Not applicable (no file upload). | N/A |
| CWE-476 | NULL pointer dereference | Rust's `Option` type eliminates null pointer dereferences at compile time. | Mitigated |
| CWE-190 | Integer overflow | Rust panics on integer overflow in debug builds. Offset calculations use checked arithmetic. | Mitigated |
| CWE-502 | Deserialization of untrusted data | Magic files are parsed with a strict grammar, not deserialized from arbitrary formats. | Mitigated |
| CWE-400 | Resource exhaustion | Evaluation timeouts prevent unbounded CPU use. Memory-mapped I/O avoids loading entire files into memory. | Mitigated |
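
The CWE-125 and CWE-190 countermeasures reduce to two std idioms: slice access via `.get()` rather than indexing, and offset math via `checked_add`. A minimal sketch (the `read_at` helper is illustrative, not the actual evaluator code):

```rust
/// Resolve `base + displacement` and read `len` bytes, returning None on
/// overflow or out-of-range access instead of panicking.
fn read_at(buf: &[u8], base: usize, displacement: usize, len: usize) -> Option<&[u8]> {
    let start = base.checked_add(displacement)?; // CWE-190: no silent wraparound
    let end = start.checked_add(len)?;
    buf.get(start..end) // CWE-125: bounds-checked, never reads past the buffer
}

fn main() {
    let buf = [0u8; 8];
    assert_eq!(read_at(&buf, 2, 2, 4), Some(&buf[4..8]));
    assert_eq!(read_at(&buf, 6, 0, 4), None); // past the end
    assert_eq!(read_at(&buf, usize::MAX, 1, 1), None); // offset math would overflow
}
```

A crafted file can therefore steer *which* bytes are compared, but never force a read outside the mapped buffer or a wrapped offset.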

5.2 OWASP Top 10 (where applicable)

Most OWASP Top 10 categories target web applications and are not applicable to a file detection library. The applicable items are:

| Category | Applicability | Countermeasure |
|---|---|---|
| A03: Injection | Partial (magic file parsing) | Strict grammar-based parser rejects invalid syntax |
| A04: Insecure Design | Applicable | Secure design principles applied throughout (see Section 4) |
| A06: Vulnerable Components | Applicable | `cargo audit` daily, `cargo deny`, Dependabot, cargo-auditable |
| A09: Security Logging | Partial | Evaluation errors logged; security events reported via GitHub Advisories |

6. Supply Chain Security

| Measure | Implementation |
|---|---|
| Dependency auditing | `cargo audit` and `cargo deny` run daily in CI |
| Dependency updates | Dependabot configured for automated PRs |
| Pinned toolchain | Rust stable via `rust-toolchain.toml` |
| Reproducible builds | `Cargo.lock` and `mise.lock` committed |
| Build provenance | Sigstore attestations via `actions/attest-build-provenance` (wrapper around `actions/attest`) |
| SBOM generation | cargo-cyclonedx produces a CycloneDX SBOM per release |
| Binary auditing | cargo-auditable embeds dependency metadata in binaries |
| CI integrity | All GitHub Actions pinned to SHA hashes |
| Code review | Required on all PRs; automated by CodeRabbit with security-focused checks |

7. Ongoing Assurance

This assurance case is maintained as a living document. It is updated when:

  • New features introduce new attack surfaces
  • New threat vectors are identified
  • Dependencies change significantly
  • Security incidents occur

The project maintains continuous assurance through automated CI checks (clippy, CodeQL, cargo audit, cargo deny) that run on every commit.