Ranking Algorithm
Stringy’s ranking system prioritizes strings by relevance, helping analysts focus on the most important findings first. The algorithm combines three factors (section weight, semantic boost, and noise penalty) into a single relevance score.
Scoring Formula
Final Score = SectionWeight + SemanticBoost - NoisePenalty
Each component is described in its own section below. The formula yields an internal score, not the value shown to users.
Note: Section weights use a 1.0-10.0 scale, and semantic boosts add to the internal score. The pipeline’s normalizer then maps the combined internal score to a 0-100 display score using the band table shown in Display Score Mapping below.
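As a quick illustration of the formula, the following minimal Python sketch combines the three components. The function name and the example values are hypothetical and are not taken from Stringy's source.

```python
def internal_score(section_weight: float, semantic_boost: float, noise_penalty: float) -> float:
    """Combine the three components as in the scoring formula.

    All inputs are assumed to be already computed; this only shows the arithmetic.
    """
    return section_weight + semantic_boost - noise_penalty


# Hypothetical values: a string in a high-weight section, with a semantic
# boost and a moderate noise penalty.
score = internal_score(section_weight=9.0, semantic_boost=40.0, noise_penalty=20.0)
```

The result is the internal score; it still has to go through band mapping before it is displayed.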
Section Weight
Sections differ in how likely they are to contain meaningful strings. Container parsers assign a weight (1.0-10.0) to each section based on its type and name.
Weight Ranges
| Section Type | Typical Weight | Examples |
|---|---|---|
| Dedicated string storage | 8.0-10.0 | .rodata, __TEXT,__cstring, .rsrc |
| Read-only data | 7.0 | .data.rel.ro, __DATA_CONST |
| General data | 5.0 | .data |
| Code sections | 1.0 | .text |
Format-specific adjustments are applied based on section names. For example, ELF .rodata.str1.1 (aligned strings) and PE .rsrc (rich resources) receive additional priority.
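A name-based lookup along these lines could be sketched as follows. The table keys come from the examples above, but the exact values chosen inside the 8.0-10.0 range, the prefix-matching rule, and the default weight for unknown sections are all assumptions made for illustration.

```python
# Hypothetical weights. Values within the 8.0-10.0 range are picked arbitrarily.
SECTION_WEIGHTS = {
    ".rodata.str": 10.0,       # assumed extra priority for ELF string sections
    ".rsrc": 10.0,             # assumed extra priority for PE resources
    ".rodata": 9.0,
    "__TEXT,__cstring": 9.0,
    ".data.rel.ro": 7.0,
    "__DATA_CONST": 7.0,
    ".data": 5.0,
    ".text": 1.0,
}
DEFAULT_WEIGHT = 1.0  # assumption: unknown sections get the lowest weight


def section_weight(name: str) -> float:
    """Return the weight of the longest table key that prefixes the section name."""
    matches = [key for key in SECTION_WEIGHTS if name.startswith(key)]
    if not matches:
        return DEFAULT_WEIGHT
    return SECTION_WEIGHTS[max(matches, key=len)]
```

Choosing the longest matching prefix keeps `.data.rel.ro` from being treated as plain `.data`, and lets `.rodata.str1.1` pick up the more specific entry.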
Semantic Boost
Strings with recognized semantic meaning receive score boosts based on their tags.
Boost Categories
| Tag Category | Boost Level | Examples |
|---|---|---|
| Network (URL, Domain, IP) | High | https://api.evil.com |
| Identifiers (GUID, Email) | High | {12345678-1234-...} |
| File System (Path, Registry) | Medium-High | C:\Windows\System32\evil.dll |
| User-Agent-like strings | Medium-High | Mozilla/5.0 ... |
| Version/Manifest | Medium | MyApp v1.2.3 |
| Code Artifacts (Format strings, Base64) | Medium | Error: %s at line %d |
| Symbols (Import, Export) | Low-Medium | CreateFileW, main |
Strings with multiple semantic tags receive additional (diminishing) bonuses for each extra tag.
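One way to picture the diminishing multi-tag bonus is the sketch below. The table above gives only qualitative boost levels, so every number here (the per-tag boosts, the initial extra-tag bonus of 8.0, and the halving rule) is a hypothetical placeholder, as are the tag names.

```python
# Hypothetical numeric stand-ins for the qualitative boost levels.
TAG_BOOSTS = {
    "url": 40.0, "domain": 40.0, "ip": 40.0,      # High
    "guid": 40.0, "email": 40.0,                  # High
    "path": 30.0, "registry": 30.0,               # Medium-High
    "user_agent": 30.0,                           # Medium-High
    "version": 20.0, "manifest": 20.0,            # Medium
    "format_string": 20.0, "base64": 20.0,        # Medium
    "import": 10.0, "export": 10.0,               # Low-Medium
}


def semantic_boost(tags: list[str]) -> float:
    """Strongest tag counts in full; each extra tag adds a bonus that halves."""
    known = [TAG_BOOSTS[tag] for tag in tags if tag in TAG_BOOSTS]
    if not known:
        return 0.0
    total = max(known)
    bonus = 8.0  # assumed bonus for the first extra tag
    for _ in known[1:]:
        total += bonus
        bonus /= 2
    return total
```

With these placeholder numbers, a string tagged only `url` gets 40.0, while one tagged `url` and `domain` gets 48.0, and a third tag adds just 4.0 more.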
Noise Penalty
Several factors indicate low-quality or noisy strings. Strings that exhibit them receive penalties:
Penalty Categories
- High Entropy: Strings with high Shannon entropy (randomness) are likely binary data or encoded content and receive significant penalties.
- Excessive Length: Very long strings are often noise (padding, embedded data). Longer strings receive progressively larger penalties.
- Repeated Patterns: Strings with excessive character repetition (e.g., AAAAAAA...) are penalized based on the repetition ratio.
- Common Noise Patterns: Known noise patterns receive penalties, including padding characters, hex dump patterns, and table-like data with excessive delimiters.
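The first three penalty categories can be sketched in Python as shown here. Shannon entropy is computed per character in the standard way; the thresholds, penalty sizes, and the definition of repetition ratio (share of the most frequent character) are assumptions for illustration, and pattern-based noise detection is left out.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def repetition_ratio(s: str) -> float:
    """Assumed definition: fraction of the string taken by its most common character."""
    if not s:
        return 0.0
    return max(Counter(s).values()) / len(s)


def noise_penalty(s: str, entropy_threshold: float = 4.5,
                  max_length: int = 200, repetition_threshold: float = 0.7) -> float:
    """Sum hypothetical penalties for entropy, length, and repetition."""
    penalty = 0.0
    if shannon_entropy(s) > entropy_threshold:
        penalty += 30.0                            # flat penalty for random-looking data
    if len(s) > max_length:
        penalty += (len(s) - max_length) / 10      # grows with the excess length
    ratio = repetition_ratio(s)
    if ratio > repetition_threshold:
        penalty += 20.0 * ratio                    # scaled by the repetition ratio
    return penalty
```

Under these assumed thresholds, a short URL incurs no penalty, while a run of identical characters is penalized purely through its repetition ratio.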
Display Score Mapping
The internal score is mapped to a display score (0-100) using bands:
| Internal Score | Display Score | Meaning |
|---|---|---|
| <= 0 | 0 | Low relevance |
| 1-79 | 1-49 | Low relevance |
| 80-119 | 50-69 | Moderate |
| 120-159 | 70-89 | Meaningful |
| 160-220 | 90-100 | High-value |
| > 220 | 100 (clamped) | High-value |
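The band table can be expressed as a small lookup. The sketch below rounds the internal score to an integer and interpolates linearly within each band; both of those choices are assumptions, since the table specifies only the band endpoints.

```python
# (internal_low, internal_high, display_low, display_high), taken from the table.
BANDS = [
    (1, 79, 1, 49),
    (80, 119, 50, 69),
    (120, 159, 70, 89),
    (160, 220, 90, 100),
]


def display_score(internal: float) -> int:
    """Map an internal score to a 0-100 display score using the bands."""
    score = round(internal)  # assumption: bands are applied to integer scores
    if score <= 0:
        return 0
    for low, high, out_low, out_high in BANDS:
        if low <= score <= high:
            # Assumed linear interpolation between the band's endpoints.
            fraction = (score - low) / (high - low)
            return out_low + round(fraction * (out_high - out_low))
    return 100  # anything above 220 is clamped
```

Band endpoints map exactly to the table values, for example `display_score(80)` gives 50 and `display_score(220)` gives 100.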
Filtering Recommendations
- Interactive analysis: Show display scores >= 50
- Automated processing: Use display scores >= 70
- YARA rules: Focus on display scores >= 80
- High-confidence indicators: Display scores >= 90
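Applied to a result list, these thresholds amount to a simple filter. In this sketch the use-case keys and the `(text, display_score)` tuple shape are hypothetical and do not reflect Stringy's actual output format.

```python
# Minimum display scores from the recommendations above.
THRESHOLDS = {
    "interactive": 50,
    "automated": 70,
    "yara": 80,
    "high_confidence": 90,
}


def filter_findings(findings: list[tuple[str, int]], use_case: str) -> list[tuple[str, int]]:
    """Keep findings whose display score meets the threshold for the use case."""
    minimum = THRESHOLDS[use_case]
    return [finding for finding in findings if finding[1] >= minimum]


# Hypothetical findings as (text, display_score) pairs.
findings = [("https://api.evil.com", 95), ("MyApp v1.2.3", 72), ("CreateFileW", 55)]
yara_candidates = filter_findings(findings, "yara")
```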