Appendix C: Magic File Examples

This appendix collects examples of magic file syntax and common patterns for writing file type detection rules.

Basic Magic File Syntax

Simple Pattern Matching

# ELF executable files
0    string    \x7fELF    ELF

# PDF documents
0    string    %PDF-      PDF document

# PNG images
0    string    \x89PNG    PNG image data

# ZIP archives
0    string    PK\x03\x04    ZIP archive data
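To make the matching step concrete, the following minimal Python sketch shows one way a rule such as `0 string %PDF-` can be evaluated: read the bytes at the rule's offset and compare them with the pattern. The `RULES` table and the `identify` function are hypothetical names used only for illustration; they are not part of any magic implementation.

```python
# Hypothetical illustration: each entry mirrors one rule above as
# (offset, expected bytes, description).
RULES = [
    (0, b"\x7fELF", "ELF"),
    (0, b"%PDF-", "PDF document"),
    (0, b"\x89PNG", "PNG image data"),
    (0, b"PK\x03\x04", "ZIP archive data"),
]


def identify(data: bytes) -> str:
    """Return the description of the first rule whose bytes match."""
    for offset, pattern, description in RULES:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return "data"  # fallback when no rule matches


print(identify(b"%PDF-1.4\n"))           # PDF document
print(identify(b"PK\x03\x04\x14\x00"))   # ZIP archive data
```

The first matching rule wins in this sketch, which is why the order of the rules matters.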

Numeric Value Matching

# JPEG images (using hex values)
0    beshort    0xffd8    JPEG image data

# Windows PE executables
# Offset 60 holds e_lfanew, the file offset of the PE signature
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>(60.l) string PE\0\0   PE32 executable

# ELF with specific architecture
0    string    \x7fELF    ELF
>16  leshort   2         executable
>18  leshort   62        x86-64

Hierarchical Rules

Parent-Child Relationships

# ELF files with detailed classification
0    string    \x7fELF    ELF
>4   byte      1         32-bit
>>16 leshort   2         executable
>>16 leshort   3         shared object
>>16 leshort   1         relocatable
>4   byte      2         64-bit
>>16 leshort   2         executable
>>16 leshort   3         shared object
>>16 leshort   1         relocatable
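The parent-child behaviour can be sketched in Python as a recursive walk: a child rule is only tested when its parent matched, and the messages of all matching rules are concatenated. This is a simplified model under stated assumptions; the tuple layout and the `evaluate` function are hypothetical and do not reflect how any particular magic implementation stores its rules.

```python
import struct

# Children shared by both ELF classes: e_type at offset 16, little-endian.
TYPE_RULES = [
    (16, "<H", 2, "executable", []),
    (16, "<H", 3, "shared object", []),
    (16, "<H", 1, "relocatable", []),
]

# Hypothetical rule tuple: (offset, struct format, expected value, message, children)
ELF_RULE = (0, "4s", b"\x7fELF", "ELF", [
    (4, "B", 1, "32-bit", TYPE_RULES),
    (4, "B", 2, "64-bit", TYPE_RULES),
])


def evaluate(rule, data):
    """Return the messages of this rule and of every matching descendant."""
    offset, fmt, expected, message, children = rule
    if offset + struct.calcsize(fmt) > len(data):
        return []  # not enough data to read the value
    (value,) = struct.unpack_from(fmt, data, offset)
    if value != expected:
        return []  # parent failed, so children are never tested
    messages = [message]
    for child in children:
        messages.extend(evaluate(child, data))
    return messages


# 64-bit little-endian header whose e_type is 3 (shared object).
header = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + struct.pack("<H", 3)
print(" ".join(evaluate(ELF_RULE, header)))  # ELF 64-bit shared object
```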

Multiple Levels of Nesting

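# Detailed PE analysis
# Offset 60 holds e_lfanew; PE header fields are read relative to it.
# The optional header magic is at e_lfanew+24, the subsystem at e_lfanew+92.
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>(60.l) string PE\0\0
>>>(60.l+24) leshort  0x010b    PE32 executable
>>>>(60.l+92) leshort 1         (native)
>>>>(60.l+92) leshort 2         (GUI)
>>>>(60.l+92) leshort 3         (console)
>>>(60.l+24) leshort  0x020b    PE32+ executable
>>>>(60.l+92) leshort 1         (native)
>>>>(60.l+92) leshort 2         (GUI)
>>>>(60.l+92) leshort 3         (console)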

Data Types and Endianness

Integer Types

# Little-endian integers
0    leshort   0x5a4d    MS-DOS executable (little-endian short)
0    lelong    0x464c457f    ELF (little-endian long)

# Big-endian integers
0    beshort   0x4d5a    MS-DOS executable (big-endian short)
0    belong    0x7f454c46    ELF (big-endian long)

# Native endian (host byte order; these values match only on little-endian hosts)
0    short     0x5a4d    MS-DOS executable (native endian)
0    long      0x464c457f    ELF (native endian)
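The relationship between the byte sequence in the file and the numeric value in the rule can be checked with Python's `struct` module. The sketch below decodes the same bytes as little-endian and big-endian integers; it only illustrates the arithmetic and is not tied to any magic implementation.

```python
import struct

mz = b"MZ"        # bytes 0x4d 0x5a
elf = b"\x7fELF"  # bytes 0x7f 0x45 0x4c 0x46

# "<" decodes little-endian, ">" decodes big-endian.
le_short = struct.unpack("<H", mz)[0]
be_short = struct.unpack(">H", mz)[0]
le_long = struct.unpack("<I", elf)[0]
be_long = struct.unpack(">I", elf)[0]

print(hex(le_short), hex(be_short))  # 0x5a4d 0x4d5a
print(hex(le_long), hex(be_long))    # 0x464c457f 0x7f454c46
```

The bytes in the file never change; only the interpretation does, which is why the little-endian and big-endian rules use byte-reversed constants.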

String Matching

# Fixed-length strings
0    string    #!/bin/sh    shell script
0    string    #!/usr/bin/python    Python script

# Variable-length strings with limits
0    string/32    #!/    script text executable
16   string/256   This program    self-describing executable

# Case-insensitive matching (planned)
0    istring   html    HTML document
0    istring   <html   HTML document

Advanced Offset Specifications

Indirect Offsets

# PE section count via an indirect offset
# NumberOfSections is a 16-bit field 6 bytes after the PE signature
0    string    MZ        MS-DOS executable
>60  lelong    >0
>>(60.l) string PE\0\0   PE32
>>>(60.l+6)  leshort   >0    \b, %d sections
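An indirect offset of the form `(60.l+N)` is resolved in two steps: read a little-endian 32-bit value at offset 60, then add `N` to it. The Python sketch below works through this on a synthetic header; `resolve_indirect` is a hypothetical helper and the header contents are made up for the example.

```python
import struct


def resolve_indirect(data, pointer_offset: int, adjust: int = 0) -> int:
    """Resolve an offset written as (pointer_offset.l+adjust)."""
    (pointer,) = struct.unpack_from("<I", data, pointer_offset)
    return pointer + adjust


# Synthetic header: "MZ" at 0, e_lfanew = 0x80 at offset 60, "PE\0\0" at 0x80.
data = bytearray(0x90)
data[0:2] = b"MZ"
struct.pack_into("<I", data, 60, 0x80)
data[0x80:0x84] = b"PE\0\0"

sig_offset = resolve_indirect(data, 60)
print(hex(sig_offset), bytes(data[sig_offset:sig_offset + 4]))  # 0x80 b'PE\x00\x00'
```

Because the pointer is read from the file itself, the same rule works regardless of how large the DOS stub in front of the PE header is.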

Relative Offsets

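# ZIP file entries (local file header fields)
# &2 is relative to the end of the parent match: it skips the 2-byte
# extra-field length and lands on the file name at offset 30
0    string    PK\x03\x04    ZIP archive data
>18  lelong    x         \b, compressed size %d
>22  lelong    x         \b, uncompressed size %d
>26  leshort   >0
>>&2 string    x         \b, first entry: "%.64s"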

Search Patterns

# Search for patterns within a range
0      string    \x7fELF    ELF
>0     search/1024    .note.gnu.build-id    \b, with build-id
>0     search/1024    .debug_info    \b, with debug info
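A bounded search can be modelled as a substring search restricted to a window after the starting offset. The Python sketch below assumes the whole pattern must lie inside the window; the exact boundary behaviour of `search/N` may differ between implementations, and the `search` helper is hypothetical.

```python
def search(data: bytes, start: int, limit: int, needle: bytes) -> int:
    """Return the offset of needle inside data[start:start+limit], or -1."""
    return data.find(needle, start, start + limit)


# Synthetic data: ELF magic, 100 padding bytes, then a section name.
blob = b"\x7fELF" + b"\x00" * 100 + b".note.gnu.build-id\x00"

print(search(blob, 0, 1024, b".note.gnu.build-id"))  # 104
print(search(blob, 0, 64, b".note.gnu.build-id"))    # -1
```

A smaller range makes the rule cheaper to evaluate but can miss patterns that sit further into the file.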

Bitwise Operations

Flag Testing

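# ELF program header flags (32-bit little-endian files)
# e_phoff is at offset 28; p_flags is 24 bytes into the first program header
0    string    \x7fELF    ELF
>4   byte      1          32-bit
>>16 leshort   2          executable
>>(28.l+24) lelong &0x1   \b, segment executable
>>(28.l+24) lelong &0x2   \b, segment writable
>>(28.l+24) lelong &0x4   \b, segment readable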

Mask Operations

# File type bits in binary cpio archives
# The mode is the 16-bit field at offset 6; mask it before comparing
0    short     070707    cpio archive
>6   short&0170000   0100000   \b, regular file
>6   short&0170000   0040000   \b, directory
>6   short&0170000   0120000   \b, symbolic link
>6   short&0170000   0060000   \b, block device
>6   short&0170000   0020000   \b, character device
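The two bitwise patterns differ in what they compare. A flag test checks whether individual bits are set, while a mask operation first isolates a group of bits and then compares the result with a constant. The Python sketch below illustrates both with the constants used above; the function names are hypothetical.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # ELF segment permission bits
S_IFMT = 0o170000                 # file-type bits of a Unix mode word

FILE_TYPES = {
    0o100000: "regular file",
    0o040000: "directory",
    0o120000: "symbolic link",
    0o060000: "block device",
    0o020000: "character device",
}


def describe_flags(flags: int) -> list:
    """Flag test: report every permission bit that is set."""
    names = []
    if flags & PF_X:
        names.append("executable")
    if flags & PF_W:
        names.append("writable")
    if flags & PF_R:
        names.append("readable")
    return names


def file_type(mode: int) -> str:
    """Mask operation: isolate the type bits, then compare."""
    return FILE_TYPES.get(mode & S_IFMT, "unknown")


print(describe_flags(0x5))  # ['executable', 'readable']
print(file_type(0o100644))  # regular file
print(file_type(0o040755))  # directory
```

Comparing the unmasked mode with `0100000` would fail for `0100644`, because the permission bits are still present in the value.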

Complex File Format Examples

JPEG Image Analysis

# JPEG with EXIF or JFIF data
# The segment identifier starts at offset 6, after the 2-byte marker and 2-byte length
0    beshort   0xffd8    JPEG image data
>2   beshort   0xffe1    \b, EXIF standard
>>6  string    Exif\0\0
>>>12 beshort  0x4d4d    \b, big-endian
>>>12 beshort  0x4949    \b, little-endian
>2   beshort   0xffe0    \b, JFIF standard
>>6  string    JFIF
>>>11 byte     x         \b, version %d
>>>12 byte     x         \b.%02d

Archive Format Detection

# TAR archives
257  string    ustar\0   POSIX tar archive
257  string    ustar\040\040\0    GNU tar archive

# RAR archives
0    string    Rar!\x1a\x07    RAR archive data
>6   byte      0x00      \b, version 1.5 to 4.x
>6   byte      0x01      \b, version 5.x

# 7-Zip archives
0    string    7z\xbc\xaf\x27\x1c    7-zip archive data
>6   byte      x         \b, version %d
>7   byte      x         \b.%d

Executable Format Analysis

# Mach-O executables (macOS)
# lelong matches the little-endian headers used on Intel and ARM;
# big-endian files (for example PowerPC, CPU type 18) need belong instead
0    lelong    0xfeedface    Mach-O executable (32-bit)
>4   lelong    7            i386
>4   lelong    12           arm
>12  lelong    2            executable
>12  lelong    6            shared library
>12  lelong    8            bundle

0    lelong    0xfeedfacf    Mach-O executable (64-bit)
>4   lelong    0x01000007   x86_64
>4   lelong    0x0100000c   arm64
>12  lelong    2            executable
>12  lelong    6            shared library

Script and Text File Detection

Shebang Detection

# Shell scripts
0    string    #!/bin/sh         POSIX shell script
0    string    #!/bin/bash       Bash shell script
0    string    #!/bin/csh        C shell script
0    string    #!/bin/tcsh       TC shell script
0    string    #!/bin/zsh        Z shell script

# Interpreted languages
0    string    #!/usr/bin/python    Python script
0    string    #!/usr/bin/perl      Perl script
0    string    #!/usr/bin/ruby      Ruby script
0    string    #!/usr/bin/node      Node.js script
0    string    #!/usr/bin/php       PHP script

Text Format Detection

# Configuration files
0    string    [Desktop\ Entry]    Desktop configuration
0    string    # Configuration      configuration text
0    regex     ^[a-zA-Z_][a-zA-Z0-9_]*\s*=    configuration text

# Source code detection
0    regex     ^#include\s*<       C source code
0    regex     ^package\s+         Java source code
0    regex     ^class\s+\w+:       Python source code
0    regex     ^function\s+        JavaScript source code

Database and Structured Data

Database Files

# SQLite databases
0    string    SQLite\ format\ 3    SQLite 3.x database
>96  belong    x                   \b, last written using SQLite version %d

# MySQL databases
0    beshort   0xfe01              MySQL table definition file
0    belong&0xffffff00 0xfefe0600  MySQL ISAM compressed data file

# PostgreSQL
0    string    PGDMP               PostgreSQL custom database dump
>5   byte      x                   \b, version %d
>6   byte      x                   \b.%d

Structured Text Formats

# JSON files
0    regex     ^\s*[\{\[]          JSON data
>0   search/64 "version"          \b, with version info
>0   search/64 "name"             \b, with name field

# XML files
0    string    <?xml               XML document
>5   search/256 version
>>5  regex     version="([^"]*)"   \b, version \1
>5   search/256 encoding
>>5  regex     encoding="([^"]*)"  \b, encoding \1

# YAML files
0    regex     ^---\s*$            YAML document
0    regex     ^[a-zA-Z_][^:]*:    YAML configuration

Multimedia File Examples

Audio Formats

# MP3 files
0    string    ID3                 MP3 audio file with ID3
>3   byte      <0xff               version 2
>>3  byte      x                   \b.%d
0    beshort   0xfffb              MP3 audio file
0    beshort   0xfff3              MP3 audio file
0    beshort   0xffe3              MP3 audio file

# WAV files
0    string    RIFF                Microsoft RIFF
>8   string    WAVE                \b, WAVE audio
>>20 leshort   1                   \b, PCM
>>20 leshort   85                  \b, MPEG Layer 3
>>22 leshort   1                   \b, mono
>>22 leshort   2                   \b, stereo

Video Formats

# AVI files
0    string    RIFF                Microsoft RIFF
>8   string    AVI\040             \b, AVI video
>>12 string    LIST
>>>20 string   hdrlavih

# MP4/QuickTime
4    string    ftyp                ISO Media
>8   string    isom                \b, MP4 Base Media v1
>8   string    mp41                \b, MP4 v1
>8   string    mp42                \b, MP4 v2
>8   string    qt\040\040          \b, QuickTime movie

Best Practices Examples

Efficient Rule Ordering

# Order by probability - most common formats first
0    string    \x7fELF             ELF
0    string    MZ                  MS-DOS executable
0    string    \x89PNG             PNG image data
0    string    \xff\xd8\xff        JPEG image data
0    string    PK\x03\x04          ZIP archive data
0    string    %PDF-               PDF document

# Less common formats later
0    string    \x00\x00\x01\x00    Windows icon
0    string    \x00\x00\x02\x00    Windows cursor

Error-Resistant Patterns

# Validate magic numbers with additional checks
0    string    \x7fELF             ELF
>4   byte      1                   32-bit
>4   byte      2                   64-bit
>4   byte      >2                  invalid class
>5   byte      1                   little-endian
>5   byte      2                   big-endian
>5   byte      >2                  invalid encoding

Performance Optimizations

# Use specific offsets instead of searches when possible
0    string    \x7fELF             ELF
>16  leshort   2                   executable
>18  leshort   62                  x86-64

# Prefer shorter patterns for initial matching
0    beshort   0xffd8              JPEG image data
>2   beshort   0xffe0              \b, JFIF standard
>2   beshort   0xffe1              \b, EXIF standard

Testing and Validation

Test File Creation

# Create test files for magic rules
# printf with octal escapes is portable and adds no trailing newline
printf '\177ELF\002\001\001\000' > test_elf64.bin
printf 'PK\003\004\024\000' > test_zip.bin
echo '%PDF-1.4' > test_pdf.txt
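When the fixtures need to be byte-exact on every platform, they can also be written from a script. The Python sketch below writes the same three files into a temporary directory; the `FIXTURES` table and the `write_fixtures` function are hypothetical names for illustration.

```python
import tempfile
from pathlib import Path

# File name -> exact byte content, mirroring the shell commands above.
FIXTURES = {
    "test_elf64.bin": b"\x7fELF\x02\x01\x01\x00",
    "test_zip.bin": b"PK\x03\x04\x14\x00",
    "test_pdf.txt": b"%PDF-1.4\n",
}


def write_fixtures(directory: Path) -> list:
    """Write every fixture into directory and return the created paths."""
    paths = []
    for name, content in FIXTURES.items():
        path = directory / name
        path.write_bytes(content)  # binary write: no newline translation
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for path in write_fixtures(Path(tmp)):
        print(path.name, path.stat().st_size)
```

Writing in binary mode avoids the trailing newline and escape-handling differences that shell built-ins can introduce.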

Rule Validation

# Include validation comments
# Test: printf '\177ELF\002\001\001\000' | rmagic -
# Expected: ELF 64-bit LSB current version
0    string    \x7fELF             ELF
>4   byte      2                   64-bit
>5   byte      1                   LSB
>6   byte      1                   current version

Together, these examples show how the magic file format combines offsets, typed comparisons, and nested rules to detect file types accurately.