Fuzzing Dictionaries

Dictionaries give AFL++ the syntax tokens and keywords of your target format, which can substantially improve fuzzing efficiency on structured inputs.

What Are Dictionaries?

Dictionaries are collections of interesting tokens, keywords, or byte sequences that are likely meaningful to your target. AFL++ uses these to:

Replace random bytes with known-good values
Insert format-specific keywords
Speed up discovery of paths requiring specific tokens
Bypass simple parsing checks

Dictionaries are most effective for formats with specific keywords, magic bytes, or structured syntax (XML, JSON, SQL, file formats, protocols).

Using Dictionaries

Pass a dictionary to afl-fuzz with the -x option:

afl-fuzz -i input -o output -x dictionaries/xml.dict -- ./target @@

Built-in Dictionaries

AFL++ includes dictionaries for common formats in the dictionaries/ directory:

xml.dict: XML tags and entities
json.dict: JSON syntax elements
png.dict: PNG chunk types and values
sql.dict: SQL keywords and operators
html.dict: HTML tags and attributes
jpeg.dict: JPEG markers and values

See the dictionaries/ directory in the AFL++ source tree for the complete list.

Dictionary Formats

AFL++ supports two dictionary formats:

File Format (Recommended)

A text file with one token per line:

name="value"

name: Optional alphanumeric identifier (for documentation)
value: Token in quotes with hex escaping for special characters

tag_open="<"
tag_close=">"
entity_amp="&amp;"
entity_lt="&lt;"
entity_gt="&gt;"
cdata_start="<![CDATA["
cdata_end="]]>"
xml_version="<?xml version=\"1.0\"?>"

Escape Sequences

Use these escape sequences in values:

\xNN: Hex byte (e.g., \x00 for null byte)
\\: Literal backslash
\": Literal quote
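Carriage return, newline, tab, and other non-printable bytes: write them as hex bytes (\x0d, \x0a, \x09); the parser rejects escapes other than \xNN, \\, and \"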
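
Hex-escaping every byte is always a valid encoding, so a raw token can be converted mechanically. The following is a minimal sketch, assuming a POSIX shell with od, tr, and sed available; hex_dict_entry is a hypothetical helper name, not part of AFL++:

```shell
# hex_dict_entry NAME: read raw token bytes from stdin and print a
# dictionary line of the form NAME="\xNN\xNN..." (hypothetical helper).
hex_dict_entry() {
  name=$1
  # od prints each input byte as two hex digits; strip the spacing,
  # then prefix every pair of digits with \x.
  hex=$(od -An -v -tx1 | tr -d ' \n' | sed 's/\(..\)/\\x\1/g')
  printf '%s="%s"\n' "$name" "$hex"
}
```

For example, printf 'PNG\r\n' | hex_dict_entry png_tail prints png_tail="\x50\x4e\x47\x0d\x0a". Printable characters are escaped too, which costs readability but avoids any quoting mistakes.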

Directory Format

Create a directory where each file contains one token:

mkdir my_dictionary/
# printf '%s' writes the token without a trailing newline in any POSIX shell
printf '%s' '<?xml' > my_dictionary/token1
printf '%s' '<element>' > my_dictionary/token2
printf '%s' '</element>' > my_dictionary/token3

No escaping needed - raw file contents are used as tokens.

Use with:

afl-fuzz -i input -o output -x my_dictionary/ -- ./target @@

Dictionary Levels

Control which tokens are loaded based on complexity levels:

basic_token="value"
advanced_token@1="value"
expert_token@2="value"

@0 (default): Always loaded
@1: Loaded if level ≥ 1
@2: Loaded if level ≥ 2

Specify level when running:

afl-fuzz -i input -o output -x dictionary.dct@2 -- ./target @@

Use levels to create graduated dictionaries: basic tokens at @0, rare/complex tokens at higher levels.
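
To preview which entries a given level selects, the rule above (an entry loads when its level is at most the requested level) can be sketched with awk. This is an illustration only: dict_at_level is a hypothetical helper, and the parsing assumes one name="value" entry per line as described in this section:

```shell
# dict_at_level LEVEL FILE: print the entries that would be loaded at LEVEL,
# assuming entries without an @N suffix are level 0 (hypothetical helper).
dict_at_level() {
  level=$1
  file=$2
  awk -v max="$level" '
    /^[[:space:]]*(#|$)/ { next }          # skip comments and blank lines
    {
      lvl = 0
      eq = index($0, "=")
      q = index($0, "\"")
      # A level suffix only counts inside the name, before the opening quote.
      if (eq > 0 && eq < q) {
        name = substr($0, 1, eq - 1)
        at = index(name, "@")
        if (at > 0) lvl = substr(name, at + 1) + 0
      }
      if (lvl <= max + 0) print
    }
  ' "$file"
}
```

Running dict_at_level 1 dictionary.dct on the three-line example above would print basic_token and advanced_token@1 but not expert_token@2.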

Creating Custom Dictionaries

Manual Creation

Identify important tokens

Analyze your target format for:

Magic bytes and headers
Keywords and commands
Common delimiters
Field separators
Control characters

Keep tokens small

Optimal token size: 2-16 bytes

# Good
keyword="SELECT"
delim=";"

# Too large (will slow fuzzing)
huge_structure="<?xml version=\"1.0\"?><root><element attr=\"value\">...</element></root>"

Create the dictionary file

# Magic bytes
magic="\x4d\x5a\x90\x00"

# Keywords
kw_start="START"
kw_end="END"

# Delimiters
delim_colon=":"
delim_semi=";"

Auto-generated Dictionaries

AFL++ can automatically generate dictionaries:

LTO Mode Auto-Dictionary

With afl-clang-lto, dictionaries are automatically generated from compile-time comparisons:

# Compile with LTO
CC=afl-clang-lto ./configure
make

# Dictionary is embedded - no -x flag needed!
afl-fuzz -i input -o output -- ./target @@

No further setup is required: the dictionary is compiled into the target and loaded by afl-fuzz at startup. A manual dictionary passed with -x is used in addition to it.

LLVM Mode Dictionary Generation

With afl-clang-fast, generate a dictionary file during compilation:

export AFL_LLVM_DICT2FILE=/path/to/output.dict
export AFL_LLVM_DICT2FILE_NO_MAIN=1  # Skip main() parsing
CC=afl-clang-fast ./configure
make

# Use generated dictionary
afl-fuzz -i input -o output -x /path/to/output.dict -- ./target @@

AFL_LLVM_DICT2FILE (path): Full path to the dictionary file to create during compilation.
AFL_LLVM_DICT2FILE_NO_MAIN (boolean): Skip parsing the main() function (often just command-line parsing).

Runtime Token Capture

Use libtokencap to capture tokens during execution:

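export AFL_TOKEN_FILE=/path/to/captured.dict
# Use LD_PRELOAD here: the target runs directly, not under afl-fuzz,
# so AFL_PRELOAD would have no effect
LD_PRELOAD=/path/to/libtokencap.so ./target < sample_input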

# Use captured tokens
afl-fuzz -i input -o output -x /path/to/captured.dict -- ./target @@

See utils/libtokencap/README.md for details.

Dictionary Best Practices

Token size matters

Keep tokens 2-16 bytes for best results:

# Optimal
keyword="if"
operator="=="
delimiter=";"

# Too small (1 byte - already covered by havoc)
single="a"

# Too large (slows fuzzing)
large="this is a very long token that is probably too large"
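
The size guidance can be checked mechanically before a fuzzing run. Below is a minimal sketch, assuming an awk-capable POSIX shell and ASCII dictionary files; check_token_sizes is a hypothetical helper, not an AFL++ tool:

```shell
# check_token_sizes FILE: print "LINE LENGTH ENTRY" for every token whose
# decoded length falls outside the 2-16 byte range (hypothetical helper).
check_token_sizes() {
  awk '
    # Count decoded bytes: \xNN, \\ and \" each stand for one byte.
    # Assumes ASCII input, so one character equals one byte.
    function token_len(v,    i, n) {
      n = 0
      for (i = 1; i <= length(v); i++) {
        if (substr(v, i, 1) == "\\") {
          if (substr(v, i + 1, 1) == "x") i += 3
          else i += 1
        }
        n++
      }
      return n
    }
    /^[[:space:]]*(#|$)/ { next }            # skip comments and blank lines
    {
      first = index($0, "\"")
      last = length($0)
      while (last > first && substr($0, last, 1) != "\"") last--
      if (first == 0 || last <= first) next  # not a name="value" line
      len = token_len(substr($0, first + 1, last - first - 1))
      if (len < 2 || len > 16) printf "%d %d %s\n", NR, len, $0
    }
  ' "$1"
}
```

An empty result means every token is within range. Flagged entries are candidates for review rather than automatic removal, since a longer magic sequence can still be worth keeping.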

Quality over quantity

A small set of high-quality tokens works better than a large set of low-value ones:

# Good: 20-50 meaningful tokens
keyword_select="SELECT"
keyword_from="FROM"
keyword_where="WHERE"

# Bad: 500 random strings from corpus
# (defeats the purpose)

Format-specific tokens

Include tokens specific to your format:

# PNG format
magic="\x89PNG\x0d\x0a\x1a\x0a"
chunk_ihdr="IHDR"
chunk_idat="IDAT"

# Not generic strings
random_word="hello"

Combine approaches

Use multiple dictionary sources:

# Auto-generated + manual (AFL_LLVM_DICT2FILE expects a full path)
export AFL_LLVM_DICT2FILE="$PWD/auto.dict"
CC=afl-clang-fast ./configure && make

# Merge with manual dictionary
cat auto.dict manual.dict > combined.dict
afl-fuzz -i input -o output -x combined.dict -- ./target @@
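
Plain concatenation can leave the same token in the combined file twice under different names. One way to merge while dropping repeats is sketched below, assuming one entry per line; merge_dicts is a hypothetical helper, not an AFL++ tool:

```shell
# merge_dicts FILE...: concatenate dictionary files, keeping only the first
# entry for each distinct quoted value (hypothetical helper).
merge_dicts() {
  awk '
    /^[[:space:]]*(#|$)/ { next }          # drop comments and blank lines
    {
      q = index($0, "\"")
      if (q == 0) next                     # not a name="value" line
      value = substr($0, q)                # the quoted token, name stripped
      if (!(value in seen)) { seen[value] = 1; print }
    }
  ' "$@"
}
```

Usage would be merge_dicts auto.dict manual.dict > combined.dict. Files listed first win, so put the dictionary whose entry names you want to keep at the front.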

Probabilistic Dictionary Mode

For large dictionaries, AFL++ uses probabilistic mode to avoid slowdowns:

AFL_MAX_DET_EXTRAS (integer, default: 200): Threshold for probabilistic mode. When dictionary + auto-dictionary entries exceed this, not all entries are used all the time.

export AFL_MAX_DET_EXTRAS=300
afl-fuzz -i input -o output -x large.dict -- ./target @@

With the default threshold of 200, a dictionary of 201 entries means there is a 1 in 201 chance that a given entry is not used directly in a given mutation.
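
The 1 in 201 figure suggests that, once the entry count exceeds the threshold, each entry is used with probability threshold/count. That generalization is an assumption drawn from the sentence above, not a documented formula. Under that assumption, a rough estimate for larger dictionaries can be computed as follows (extra_use_percent is a hypothetical helper):

```shell
# extra_use_percent THRESHOLD COUNT: estimated chance, in percent, that a
# given entry is used, ASSUMING entries are kept with probability
# THRESHOLD/COUNT once COUNT exceeds THRESHOLD.
extra_use_percent() {
  awk -v t="$1" -v n="$2" 'BEGIN {
    p = (n + 0 > t + 0) ? t / n : 1
    printf "%.1f\n", p * 100
  }'
}
```

Under this model, extra_use_percent 200 201 gives 99.5 and extra_use_percent 200 500 gives 40.0, which shows why trimming a very large dictionary may be preferable to relying on a raised threshold alone.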

Dictionary Recommendations by Format

XML/HTML

tag_open="<"
tag_close=">"
slash="/"
equal="="
quote="\""
entity_amp="&amp;"
entity_lt="&lt;"

JSON

lbrace="{"
rbrace="}"
lbracket="["
rbracket="]"
colon=":"
comma=","
true="true"
false="false"
null="null"

Binary Formats

# MZ header (comments must be on their own line, not after the closing quote)
magic="\x4d\x5a"
pe_sig="PE\x00\x00"
elf_magic="\x7fELF"
png_magic="\x89PNG"

Network Protocols

http_get="GET"
http_post="POST"
http_version="HTTP/1.1"
crlf="\x0d\x0a"
header_host="Host:"

Disabling Auto-Dictionaries

If you want to use only your manual dictionary:

export AFL_NO_AUTODICT=1
afl-fuzz -i input -o output -x manual.dict -- ./target @@

AFL_NO_AUTODICT (boolean): Disable loading of LTO-generated auto-dictionaries compiled into the target.

Examples

Example 1: SQL Fuzzer

# SQL Keywords
select="SELECT"
from="FROM"
where="WHERE"
insert="INSERT"
into="INTO"
values="VALUES"
update="UPDATE"
delete="DELETE"

# Operators
eq="="
lt="<"
gt=">"
and="AND"
or="OR"

# Syntax
semi=";"
comma=","
star="*"
lparen="("
rparen=")"
quote="'"

Example 2: Image Format

# PNG Magic
magic="\x89PNG\x0d\x0a\x1a\x0a"

# Chunk Types
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"
text="tEXt"
time="tIME"

# Common Sizes
width_800="\x00\x00\x03\x20"
height_600="\x00\x00\x02\x58"

Example 3: Protocol Fuzzer

# Methods
get="GET"
post="POST"
head="HEAD"
put="PUT"
delete="DELETE"

# Versions
http10="HTTP/1.0"
http11="HTTP/1.1"
http2="HTTP/2"

# Headers
host="Host:"
user_agent="User-Agent:"
content_type="Content-Type:"
content_length="Content-Length:"

# Delimiters
crlf="\x0d\x0a"
space=" "
colon=":"

Related Topics

Custom Mutators: Implement structure-aware mutations
CMPLOG: Automatic comparison discovery
LAF-Intel: Split comparisons for easier solving
LTO Mode: Automatic dictionary generation
