AST Data Structures
The Abstract Syntax Tree (AST) is the core representation of magic rules in libmagic-rs. This chapter provides detailed documentation of the fully implemented AST data structures with comprehensive test coverage (29 unit tests) and their usage patterns.
Overview
The AST consists of several key types that work together to represent magic rules:
MagicRule: The main rule structure containing all componentsOffsetSpec: Specifies where to read data in filesTypeKind: Defines how to interpret bytesOperator: Comparison and bitwise operationsValue: Expected values for matchingEndianness: Byte order specifications
MagicRule Structure
The MagicRule struct is the primary AST node representing a complete magic rule:
#![allow(unused)]
fn main() {
pub struct MagicRule {
pub offset: OffsetSpec, // Where to read data
pub typ: TypeKind, // How to interpret bytes
pub op: Operator, // Comparison operation
pub value: Value, // Expected value
pub message: String, // Human-readable description
pub children: Vec<MagicRule>, // Nested rules
pub level: u32, // Indentation level
}
}
Example Usage
#![allow(unused)]
fn main() {
use libmagic_rs::parser::ast::*;
// ELF magic number rule
let elf_rule = MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Long {
endian: Endianness::Little,
signed: false
},
op: Operator::Equal,
value: Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46]), // "\x7fELF"
message: "ELF executable".to_string(),
children: vec![],
level: 0,
};
}
Hierarchical Rules
Magic rules can contain child rules that are evaluated when the parent matches:
#![allow(unused)]
fn main() {
let parent_rule = MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Byte,
op: Operator::Equal,
value: Value::Uint(0x7f),
message: "ELF".to_string(),
children: vec![
MagicRule {
offset: OffsetSpec::Absolute(4),
typ: TypeKind::Byte,
op: Operator::Equal,
value: Value::Uint(1),
message: "32-bit".to_string(),
children: vec![],
level: 1,
},
MagicRule {
offset: OffsetSpec::Absolute(4),
typ: TypeKind::Byte,
op: Operator::Equal,
value: Value::Uint(2),
message: "64-bit".to_string(),
children: vec![],
level: 1,
},
],
level: 0,
};
}
OffsetSpec Variants
The OffsetSpec enum defines where to read data within a file:
Absolute Offsets
#![allow(unused)]
fn main() {
pub enum OffsetSpec {
/// Absolute offset from file start
Absolute(i64),
// ... other variants
}
}
Examples:
#![allow(unused)]
fn main() {
// Read at byte 0 (file start)
let start = OffsetSpec::Absolute(0);
// Read at byte 16
let offset_16 = OffsetSpec::Absolute(16);
// Read 4 bytes before current position (negative offset)
let relative_back = OffsetSpec::Absolute(-4);
}
Indirect Offsets
Indirect offsets read a pointer value and use it as the actual offset:
#![allow(unused)]
fn main() {
Indirect {
base_offset: i64, // Where to read the pointer
pointer_type: TypeKind, // How to interpret the pointer
adjustment: i64, // Value to add to pointer
endian: Endianness, // Byte order for pointer
}
}
Example:
#![allow(unused)]
fn main() {
// Read a 32-bit little-endian pointer at offset 0x20,
// then read data at (pointer_value + 4)
let indirect = OffsetSpec::Indirect {
base_offset: 0x20,
pointer_type: TypeKind::Long {
endian: Endianness::Little,
signed: false
},
adjustment: 4,
endian: Endianness::Little,
};
}
Relative and FromEnd Offsets
#![allow(unused)]
fn main() {
// Relative to previous match position
Relative(i64),
// Relative to end of file
FromEnd(i64),
}
Examples:
#![allow(unused)]
fn main() {
// 8 bytes after previous match
let relative = OffsetSpec::Relative(8);
// 16 bytes before end of file
let from_end = OffsetSpec::FromEnd(-16);
}
TypeKind Variants
The TypeKind enum specifies how to interpret bytes at the given offset:
Numeric Types
#![allow(unused)]
fn main() {
pub enum TypeKind {
/// Single byte (8-bit)
Byte,
/// 16-bit integer
Short { endian: Endianness, signed: bool },
/// 32-bit integer
Long { endian: Endianness, signed: bool },
/// String data
String { max_length: Option<usize> },
}
}
Examples:
#![allow(unused)]
fn main() {
// Single byte
let byte_type = TypeKind::Byte;
// 16-bit little-endian unsigned integer
let short_le = TypeKind::Short {
endian: Endianness::Little,
signed: false
};
// 32-bit big-endian signed integer
let long_be = TypeKind::Long {
endian: Endianness::Big,
signed: true
};
// Null-terminated string, max 256 bytes
let string_type = TypeKind::String {
max_length: Some(256)
};
}
Endianness Options
#![allow(unused)]
fn main() {
pub enum Endianness {
Little, // Little-endian (x86, ARM in little mode)
Big, // Big-endian (network byte order, PowerPC)
Native, // Host system byte order
}
}
Operator Types
The Operator enum defines comparison operations:
#![allow(unused)]
fn main() {
pub enum Operator {
Equal, // ==
NotEqual, // !=
BitwiseAnd, // & (bitwise AND for pattern matching)
}
}
Usage Examples:
#![allow(unused)]
fn main() {
// Exact match
let equal_op = Operator::Equal;
// Not equal
let not_equal_op = Operator::NotEqual;
// Bitwise AND (useful for flag checking)
let bitwise_op = Operator::BitwiseAnd;
}
Value Types
The Value enum represents expected values for comparison:
#![allow(unused)]
fn main() {
pub enum Value {
Uint(u64), // Unsigned integer
Int(i64), // Signed integer
Bytes(Vec<u8>), // Byte sequence
String(String), // String value
}
}
Examples:
#![allow(unused)]
fn main() {
// Unsigned integer value
let uint_val = Value::Uint(0x464c457f);
// Signed integer value
let int_val = Value::Int(-1);
// Byte sequence (magic numbers)
let bytes_val = Value::Bytes(vec![0x50, 0x4b, 0x03, 0x04]); // ZIP signature
// String value
let string_val = Value::String("#!/bin/sh".to_string());
}
Serialization Support
All AST types implement Serialize and Deserialize for caching and interchange with comprehensive test coverage:
#![allow(unused)]
fn main() {
use serde_json;
// Serialize a rule to JSON (fully tested)
let rule = MagicRule { /* ... */ };
let json = serde_json::to_string(&rule)?;
// Deserialize from JSON (fully tested)
let rule: MagicRule = serde_json::from_str(&json)?;
// All edge cases are tested including:
// - Empty collections (Vec::new(), String::new())
// - Extreme values (u64::MAX, i64::MIN, i64::MAX)
// - Complex nested structures with multiple levels
// - All enum variants and their serialization round-trips
}
Implementation Status:
- ✅ Complete serialization for all AST types
- ✅ Comprehensive testing with edge cases and boundary values
- ✅ JSON compatibility for rule caching and interchange
- ✅ Round-trip validation ensuring data integrity
Common Patterns
ELF File Detection
#![allow(unused)]
fn main() {
let elf_rules = vec![
MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Long { endian: Endianness::Little, signed: false },
op: Operator::Equal,
value: Value::Bytes(vec![0x7f, 0x45, 0x4c, 0x46]),
message: "ELF".to_string(),
children: vec![
MagicRule {
offset: OffsetSpec::Absolute(4),
typ: TypeKind::Byte,
op: Operator::Equal,
value: Value::Uint(1),
message: "32-bit".to_string(),
children: vec![],
level: 1,
},
MagicRule {
offset: OffsetSpec::Absolute(4),
typ: TypeKind::Byte,
op: Operator::Equal,
value: Value::Uint(2),
message: "64-bit".to_string(),
children: vec![],
level: 1,
},
],
level: 0,
}
];
}
ZIP Archive Detection
#![allow(unused)]
fn main() {
let zip_rule = MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Long { endian: Endianness::Little, signed: false },
op: Operator::Equal,
value: Value::Bytes(vec![0x50, 0x4b, 0x03, 0x04]),
message: "ZIP archive".to_string(),
children: vec![],
level: 0,
};
}
Script Detection with String Matching
#![allow(unused)]
fn main() {
let script_rule = MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::String { max_length: Some(32) },
op: Operator::Equal,
value: Value::String("#!/bin/bash".to_string()),
message: "Bash script".to_string(),
children: vec![],
level: 0,
};
}
Best Practices
Rule Organization
- Start with broad patterns and use child rules for specifics
- Order rules by probability of matching (most common first)
- Use appropriate types for the data being checked
- Minimize indirection for performance
Type Selection
- Use
Bytefor single-byte values and flags - Use
Short/Longwith explicit endianness for multi-byte integers - Use
Stringwith length limits for text patterns - Use
Bytesfor exact binary sequences
Performance Considerations
- Prefer absolute offsets over indirect when possible
- Use bitwise AND for flag checking instead of multiple equality rules
- Limit string lengths to prevent excessive reading
- Structure hierarchies to fail fast on non-matches
The AST provides a flexible, type-safe foundation for representing magic rules while maintaining compatibility with existing magic file formats.