Ranking Algorithm

Stringy’s ranking system prioritizes strings by relevance, helping analysts focus on the most important findings first. The algorithm combines multiple factors to produce a comprehensive relevance score.

Scoring Formula

Final Score = SectionWeight + SemanticBoost - NoisePenalty

Section weights use a 1.0-10.0 scale. Semantic boosts are added to the section weight and noise penalties are subtracted from it, giving an internal score. The pipeline’s normalizer then maps that internal score to a 0-100 display score using the band table shown in Display Score Mapping below.
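The formula above can be sketched in a few lines of Python. This is an illustration only: the function name `internal_score` and the sample magnitudes are assumptions, not Stringy's actual API or values.

```python
def internal_score(section_weight: float, semantic_boost: float, noise_penalty: float) -> float:
    """Combine the three components exactly as the scoring formula states."""
    return section_weight + semantic_boost - noise_penalty


# Hypothetical values: a string in a dedicated string section (weight 8.0)
# with an assumed boost of 120.0 and an assumed penalty of 10.0.
score = internal_score(8.0, 120.0, 10.0)
print(score)
```

The result is the internal score, which still has to pass through band mapping before it is shown to the analyst.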

Section Weight

Different sections have varying likelihood of containing meaningful strings. Container parsers assign weights (1.0-10.0) to each section based on its type and name.

Weight Ranges

| Section Type | Typical Weight | Examples |
|---|---|---|
| Dedicated string storage | 8.0-10.0 | .rodata, __TEXT,__cstring, .rsrc |
| Read-only data | 7.0 | .data.rel.ro, __DATA_CONST |
| General data | 5.0 | .data |
| Code sections | 1.0 | .text |

Format-specific adjustments are applied based on section names. For example, ELF .rodata.str1.1 (aligned strings) and PE .rsrc (rich resources) receive additional priority.
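A minimal sketch of how a container parser might turn a section name into a weight. The base weights follow the table above; the per-name bonuses, the fallback weight for unknown sections, and every identifier here are assumptions made for illustration, not Stringy's real values.

```python
# Section name -> section type, taken from the examples in the table.
SECTION_TYPES = {
    ".rodata": "string_storage",
    ".rodata.str1.1": "string_storage",
    "__TEXT,__cstring": "string_storage",
    ".rsrc": "string_storage",
    ".data.rel.ro": "read_only_data",
    "__DATA_CONST": "read_only_data",
    ".data": "general_data",
    ".text": "code",
}

# Base weight per type (lower end of the documented range for string storage).
BASE_WEIGHTS = {
    "string_storage": 8.0,
    "read_only_data": 7.0,
    "general_data": 5.0,
    "code": 1.0,
}

# Assumed format-specific bonuses; the real adjustments are not documented here.
NAME_BONUS = {".rodata.str1.1": 2.0, ".rsrc": 1.0}

# Assumed fallback for sections the table does not mention.
DEFAULT_WEIGHT = 1.0


def section_weight(name: str) -> float:
    """Return a weight in the 1.0-10.0 range for a section name."""
    section_type = SECTION_TYPES.get(name)
    if section_type is None:
        return DEFAULT_WEIGHT
    weight = BASE_WEIGHTS[section_type] + NAME_BONUS.get(name, 0.0)
    return min(10.0, weight)  # never exceed the top of the scale
```

For example, `section_weight(".rodata.str1.1")` lands at the top of the range while `section_weight(".text")` stays at the bottom.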

Semantic Boost

Strings with recognized semantic meaning receive score boosts based on their tags.

Boost Categories

| Tag Category | Boost Level | Examples |
|---|---|---|
| Network (URL, Domain, IP) | High | https://api.evil.com |
| Identifiers (GUID, Email) | High | {12345678-1234-...} |
| File System (Path, Registry) | Medium-High | C:\Windows\System32\evil.dll |
| User-Agent-like strings | Medium-High | Mozilla/5.0 ... |
| Version/Manifest | Medium | MyApp v1.2.3 |
| Code Artifacts (Format strings, Base64) | Medium | Error: %s at line %d |
| Symbols (Import, Export) | Low-Medium | CreateFileW, main |

Strings with multiple semantic tags receive additional (diminishing) bonuses for each extra tag.
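One way to model diminishing bonuses for multiple tags is to count the strongest tag in full and halve the contribution of each further tag. The sketch below does that; the numeric boost per level, the halving factor, and the tag names are all assumptions chosen for illustration, since the actual values are not given here.

```python
# Assumed numeric boost per level: High=40, Medium-High=30, Medium=20, Low-Medium=10.
TAG_BOOST = {
    "url": 40.0, "domain": 40.0, "ip": 40.0,
    "guid": 40.0, "email": 40.0,
    "path": 30.0, "registry": 30.0, "user_agent": 30.0,
    "version": 20.0, "manifest": 20.0,
    "format_string": 20.0, "base64": 20.0,
    "import": 10.0, "export": 10.0,
}


def semantic_boost(tags: list[str]) -> float:
    """Sum tag boosts, strongest first, halving each additional tag's share."""
    boosts = sorted((TAG_BOOST.get(tag, 0.0) for tag in set(tags)), reverse=True)
    return float(sum(boost * (0.5 ** i) for i, boost in enumerate(boosts)))


print(semantic_boost(["url"]))          # single tag: full boost
print(semantic_boost(["url", "path"]))  # second tag contributes half its boost
```

Sorting before discounting means the highest-value tag is never the one that gets reduced, so adding a weak tag can only raise the boost.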

Noise Penalty

Several factors indicate low-quality or noisy strings. Strings that exhibit them receive penalties:

Penalty Categories

  • High Entropy: Strings with high Shannon entropy (randomness) are likely binary data or encoded content and receive significant penalties.

  • Excessive Length: Very long strings are often noise (padding, embedded data). Longer strings receive progressively larger penalties.

  • Repeated Patterns: Strings with excessive character repetition (e.g., AAAAAAA...) are penalized based on the repetition ratio.

  • Common Noise Patterns: Known noise patterns receive penalties, including padding characters, hex dump patterns, and table-like data with excessive delimiters.
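The first three penalty categories can be sketched as follows. Shannon entropy and the repetition ratio are standard measures; the thresholds, penalty sizes, and function names are assumptions for illustration, and the known-noise-pattern checks are left out.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def repetition_ratio(s: str) -> float:
    """Share of the string taken up by its single most common character."""
    if not s:
        return 0.0
    return Counter(s).most_common(1)[0][1] / len(s)


def noise_penalty(s: str, entropy_threshold: float = 4.5, max_len: int = 256) -> float:
    """Add up assumed penalties for entropy, length, and repetition."""
    penalty = 0.0
    if shannon_entropy(s) > entropy_threshold:
        penalty += 30.0  # assumed flat penalty for random-looking content
    if len(s) > max_len:
        # Assumed progressive penalty: grows with excess length, capped at 30.
        penalty += min(30.0, (len(s) - max_len) / 32.0)
    ratio = repetition_ratio(s)
    if len(s) >= 8 and ratio > 0.5:
        penalty += 20.0 * ratio  # scaled by how dominant one character is
    return penalty
```

With these assumed settings, a run of sixteen `A` characters is penalized for repetition only, while a short URL such as `https://api.evil.com` receives no penalty.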

Display Score Mapping

The internal score is mapped to a display score (0-100) using bands:

| Internal Score | Display Score | Meaning |
|---|---|---|
| <= 0 | 0 | Low relevance |
| 1-79 | 1-49 | Low relevance |
| 80-119 | 50-69 | Moderate |
| 120-159 | 70-89 | Meaningful |
| 160-220 | 90-100 | High-value |
| > 220 | 100 (clamped) | High-value |
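The band table can be expressed as a small lookup. The band edges below come straight from the table; linear interpolation inside each band, and snapping fractional scores that fall between two bands to the start of the upper band, are assumptions, as is the name `display_score`.

```python
# (internal_low, internal_high, display_low, display_high) per band.
BANDS = [
    (1, 79, 1, 49),
    (80, 119, 50, 69),
    (120, 159, 70, 89),
    (160, 220, 90, 100),
]


def display_score(internal: float) -> int:
    """Map an internal score to a 0-100 display score using the bands."""
    if internal <= 0:
        return 0
    if internal > 220:
        return 100  # clamped
    for low, high, display_low, display_high in BANDS:
        if internal <= high:
            # Position inside the band, clamped so values just below `low`
            # (for example 79.5) start at the bottom of this band.
            fraction = max(0.0, (internal - low) / (high - low))
            return round(display_low + fraction * (display_high - display_low))
    return 100  # unreachable given the checks above, kept as a safe default


for value in (0, 79, 80, 159, 220, 300):
    print(value, display_score(value))
```

Both ends of every band map exactly onto the display values in the table, so the thresholds used for filtering line up with the band boundaries.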

Filtering Recommendations

  • Interactive analysis: Show display scores >= 50
  • Automated processing: Use display scores >= 70
  • YARA rules: Focus on display scores >= 80
  • High-confidence indicators: Display scores >= 90
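These recommendations amount to a threshold per use case. A minimal sketch, assuming findings are available as `(text, display_score)` pairs; the dictionary keys and the function name are hypothetical, not part of Stringy's interface.

```python
# Minimum display score per use case, from the recommendations above.
THRESHOLDS = {
    "interactive": 50,
    "automated": 70,
    "yara": 80,
    "high_confidence": 90,
}


def filter_findings(findings: list[tuple[str, int]], use_case: str) -> list[tuple[str, int]]:
    """Keep findings at or above the use case's threshold, best first."""
    minimum = THRESHOLDS[use_case]
    kept = [finding for finding in findings if finding[1] >= minimum]
    return sorted(kept, key=lambda finding: finding[1], reverse=True)


# Hypothetical findings with made-up display scores.
sample = [("CreateFileW", 55), ("https://api.evil.com", 95), ("MyApp v1.2.3", 72)]
print(filter_findings(sample, "automated"))
```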