MD4: From Hero to Zero
Before MD5, there was MD4. Designed for speed above all else, MD4 became the foundation for an entire generation of hash functions—and a cautionary tale about prioritizing performance over security. Today, MD4 is so thoroughly broken that collisions can be computed by hand.
This post traces MD4's brief reign and spectacular fall, and explains why it still haunts us in legacy systems.
What is MD4?
MD4 (Message-Digest Algorithm 4) is a cryptographic hash function that produces a 128-bit (16-byte) hash value, rendered as a 32-character hexadecimal string:
MD4("Hello, World!") = 94e3cb0fa9aa7a5ee3db74b79e915989
Like all cryptographic hash functions, MD4 was designed to provide:
- Preimage resistance: Given a hash h, you can't feasibly find a message that produces it
- Second preimage resistance: Given a message m1, you can't feasibly find a different m2 with the same hash
- Collision resistance: You can't feasibly find ANY two different messages with the same hash
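MD4 has been broken on all three counts, though not equally: collisions are now trivial to find, while the best known preimage attack, at roughly 2^102 work, remains theoretical.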
The Birth of MD4
MD4 was designed by Ronald Rivest at MIT and first published in October 1990 as RFC 1186; a revised specification followed in April 1992 as RFC 1320.
Rivest had a specific goal: speed. The hash functions available in 1990 were computationally expensive. Rivest wanted something fast enough to be practical for software implementations on the hardware of that era.
The MD Family Tree
MD (1989) — Unpublished, theoretical
↓
MD2 (1989) — Designed for 8-bit processors, very slow
↓
MD3 (1990) — Never published, Rivest found weaknesses himself
↓
MD4 (1990) — Designed for speed, too fast and loose
↓
MD5 (1991) — Strengthened MD4, still eventually broken
What happened to MD3? Rivest designed MD3 as an intermediate step, but discovered weaknesses in it before publication. Rather than release a flawed algorithm, he abandoned MD3 and moved directly to MD4. The details of MD3 were never made public.
MD4 was explicitly designed with thin, "aggressive" security margins. Rivest knew he was pushing limits, and the RFC makes only a guarded claim:
"It is conjectured that it is computationally infeasible to produce two messages having the same message digest."
"Conjectured" was doing a lot of heavy lifting there.
How MD4 Works
MD4 processes messages in 512-bit blocks through three rounds of 16 operations each (48 total):
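- Padding: A single 1 bit is appended, then 0 bits until the length is 64 bits short of a multiple of 512, then the original message length in bits as a 64-bit little-endian integer
- Initialization: Four 32-bit state variables (A, B, C, D) are set to fixed constants (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
- Processing: Each block goes through 3 rounds of 16 operations, and the result is added to the state carried over from the previous block
- Output: The final state variables, written out little-endian, are concatenated into the 128-bit hash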
The three rounds use different nonlinear functions:
- Round 1: F(X,Y,Z) = (X AND Y) OR (NOT X AND Z)
- Round 2: G(X,Y,Z) = (X AND Y) OR (X AND Z) OR (Y AND Z)
- Round 3: H(X,Y,Z) = X XOR Y XOR Z
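The three round functions are easiest to understand one bit at a time. Below is a minimal Python sketch, assuming 32-bit words are modelled as Python integers masked to 32 bits (the `MASK` name is ours, not the RFC's). It checks each function's one-bit truth table against its intended meaning: selection, majority, and parity.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits


def F(x: int, y: int, z: int) -> int:
    """Round 1: bitwise selection; where a bit of x is 1 take y's bit, else z's."""
    return ((x & y) | (~x & z)) & MASK


def G(x: int, y: int, z: int) -> int:
    """Round 2: bitwise majority; each output bit matches at least two inputs."""
    return (x & y) | (x & z) | (y & z)


def H(x: int, y: int, z: int) -> int:
    """Round 3: bitwise parity (three-way XOR)."""
    return x ^ y ^ z


# Check every one-bit input combination against the intended behaviour.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            assert F(x, y, z) & 1 == (y if x else z)
            assert G(x, y, z) & 1 == (1 if x + y + z >= 2 else 0)
            assert H(x, y, z) & 1 == (x + y + z) % 2
```

Because these functions act on each bit position independently, a single flipped input bit can only affect the same bit position of the output, which is part of why MD4 leans so heavily on additions and rotations for diffusion.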
Each round also uses:
- Left rotation by varying amounts
- A different order for reading the 16 message words
- Addition of a per-round constant (0, 0x5A827999, and 0x6ED9EBA1 in rounds 1, 2, and 3)
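RFC 1320 notes that the round 2 and round 3 constants represent the square roots of 2 and 3. A short Python check, assuming the usual reading of that note as floor(sqrt(k) * 2^30), reproduces both values using integer arithmetic only (`round_constant` is a name chosen here for illustration):

```python
import math


def round_constant(k: int) -> int:
    """floor(sqrt(k) * 2**30), computed exactly as isqrt(k * 2**60)."""
    return math.isqrt(k << 60)


print(hex(round_constant(2)))  # 0x5a827999
print(hex(round_constant(3)))  # 0x6ed9eba1
```

Deriving constants from simple, public numbers like this leaves little room to suspect hidden structure in them.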
Key difference from MD5: MD4 has only 48 operations total; MD5 has 64. MD4 uses three rounds; MD5 uses four. These "efficiency" choices became MD4's downfall.
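Putting the pieces together, here is a compact pure-Python sketch of the full algorithm, written from the RFC 1320 description for study purposes only. The helper names (`rotl`, `R1_ORDER`, `R2_ORDER`, `R3_ORDER`, `SHIFTS`) are ours, not the RFC's. It shows the padding, the 48 steps across three rounds, and the feed-forward that chains one 512-bit block into the next.

```python
import struct

MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

# Per-round message-word order and rotation amounts, as listed in RFC 1320.
R1_ORDER = list(range(16))                                    # 0, 1, 2, ..., 15
R2_ORDER = [(i % 4) * 4 + i // 4 for i in range(16)]         # 0, 4, 8, 12, 1, 5, ...
R3_ORDER = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
SHIFTS = ((3, 7, 11, 19), (3, 5, 9, 13), (3, 9, 11, 15))


def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits (reducing x mod 2**32 first)."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK


def md4(message: bytes) -> bytes:
    """Pure-Python MD4 following RFC 1320. For study only: never rely on MD4 for security."""
    # Padding: 0x80 (a single 1 bit), zeros until the length is 56 mod 64,
    # then the original length in bits as a 64-bit little-endian integer.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data)) % 64)
    data += struct.pack("<Q", bit_len)

    # Initialization: the four fixed 32-bit state words A, B, C, D.
    a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    for off in range(0, len(data), 64):
        x = struct.unpack("<16I", data[off:off + 64])  # sixteen little-endian words
        aa, bb, cc, dd = a, b, c, d

        # Each step updates one state word; the tuple shuffle rotates the roles
        # of a, b, c, d so the next step updates the following word.
        for i in range(16):  # Round 1: F (selection), no added constant
            f = (b & c) | (~b & d)
            a, b, c, d = d, rotl(a + f + x[R1_ORDER[i]], SHIFTS[0][i % 4]), b, c
        for i in range(16):  # Round 2: G (majority), constant 0x5A827999
            g = (b & c) | (b & d) | (c & d)
            a, b, c, d = d, rotl(a + g + x[R2_ORDER[i]] + 0x5A827999, SHIFTS[1][i % 4]), b, c
        for i in range(16):  # Round 3: H (parity), constant 0x6ED9EBA1
            h = b ^ c ^ d
            a, b, c, d = d, rotl(a + h + x[R3_ORDER[i]] + 0x6ED9EBA1, SHIFTS[2][i % 4]), b, c

        # Feed-forward: add this block's result to the chaining value it started from.
        a, b, c, d = (a + aa) & MASK, (b + bb) & MASK, (c + cc) & MASK, (d + dd) & MASK

    # Output: the final state words, serialized little-endian, form the 128-bit digest.
    return struct.pack("<4I", a, b, c, d)


print(md4(b"abc").hex())  # a448017aaf21d8525fc10ae87aa6729d
```

The whole compression function is 48 cheap add, rotate, and Boolean steps, which is exactly what made MD4 fast. To experiment with the collision pair from Wang's paper further down, pack each message's sixteen words as little-endian 32-bit integers with `struct.pack("<16I", ...)` before hashing, since MD4 reads message words little-endian.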
The Glory Days (All 18 Months of Them)
MD4's reign was brief but influential:
- Password hashing: Early systems used MD4 for password storage
- Integrity checks: File verification in early 1990s software
- Foundation for successors: MD5, SHA-0, SHA-1, and RIPEMD all borrowed MD4's structure
But MD4's real legacy is its descendants:
| Algorithm | Relationship to MD4 | Status |
|---|---|---|
| MD5 | Direct successor | Broken |
| SHA-0 | MD4 structure | Broken |
| SHA-1 | MD4 structure | Broken |
| SHA-2 | MD4 structure | Still secure |
| RIPEMD | MD4 structure | Broken (collisions found in 2004) |
| RIPEMD-160 | MD4 structure | Still secure |
The Merkle-Damgård construction that MD4 was built on, and the fast 32-bit add, rotate, and Boolean operations it popularized, live on, even as MD4 itself is completely broken.
The Fall: A Timeline of Attacks
1991: First Warnings (Within a Year!)
Just one year after publication, Bert den Boer and Antoon Bosselaers found weaknesses in MD4. They showed that a reduced version consisting of only the last two of MD4's three rounds was not collision-resistant.
This was an ominous sign—no hash function should show weaknesses that quickly.
1995: Compression Function Broken
Hans Dobbertin found collisions in MD4's compression function. Unlike a full collision attack, this required control over the initial state variables, but it demonstrated fundamental weaknesses in the design.
1996: Full Collisions
Dobbertin extended his attack to find actual MD4 collisions: two different messages with identical MD4 hashes. The attack required about 2^20 operations, far below the roughly 2^64 that a generic birthday attack on a 128-bit hash would need.
MD4 was officially broken.
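The 2^64 figure comes from the birthday bound: for an n-bit hash, a generic attack that simply hashes many inputs and waits for a repeat needs on the order of 2^(n/2) tries. The sketch below demonstrates that scaling on a toy 32-bit hash, using truncated SHA-256 as a stand-in (an assumption for illustration only; this is neither MD4 nor Dobbertin's method), so a collision should appear after something on the order of 2^16 attempts. The function names are ours.

```python
import hashlib
import itertools


def truncated_sha256(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash: the first n_bits of SHA-256(data) as an integer."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def birthday_collision(n_bits: int):
    """Generic birthday attack: hash distinct inputs until two outputs repeat."""
    seen = {}
    for counter in itertools.count():
        msg = counter.to_bytes(8, "big")  # distinct 8-byte inputs
        h = truncated_sha256(msg, n_bits)
        if h in seen:
            return seen[h], msg, counter + 1  # colliding pair, hashes computed
        seen[h] = msg


m1, m2, tries = birthday_collision(32)
print(f"{m1.hex()} and {m2.hex()} collide after {tries} hashes (2**16 = {2**16})")
```

Scaled up to MD4's 128-bit output, the same generic approach would need on the order of 2^64 hashes, which is why an attack costing about 2^20 operations counted as a complete break.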
2004: Collisions in Seconds
Xiaoyun Wang (the same researcher who broke MD5 and SHA-1) demonstrated MD4 collisions could be found in less than a second on a standard PC.
Here's an example collision from Wang's paper: two 512-bit messages, each shown as sixteen 32-bit words, with identical MD4 hashes:
Message 1:
4d7a9c83 56cb927a b9d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dd8e31 97e31fe5 2794bf08 b9e8c3e9
Message 2:
4d7a9c83 d6cb927a 29d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dc8e31 97e31fe5 2794bf08 b9e8c3e9
The messages differ in only four bits, spread across three of the sixteen words, yet both produce:
MD4: 4d7e6a1d efa93d2d de05b45d 864c429b
2007: Collisions by Hand
In a remarkable demonstration, Vlastimil Klima showed that MD4 collisions could be found manually—by hand, with pencil and paper—in about one hour.
This is the ultimate indignity for a cryptographic hash function: being broken without even needing a computer.
Modern Day: Instant Collisions
Today, MD4 collisions can be generated in microseconds. The hashclash tool can produce millions of colliding pairs per second.
Why MD4 Broke So Quickly
MD4's aggressive design made it vulnerable:
1. Too Few Rounds
MD4 uses only 3 rounds with 16 operations each (48 total). MD5 added a fourth round (64 total) specifically because Rivest recognized MD4 was too lean.
Fewer rounds mean:
- Less diffusion (changes don't spread enough)
- More predictable intermediate states
- Easier differential paths for attackers
2. Simple Round Functions
MD4's round functions are elegant but too simple:
- Round 1: Basic conditional selection
- Round 2: Majority function
- Round 3: Simple XOR
More complex functions in later hashes provide better nonlinearity and resistance to differential attacks.
3. Weak Message Schedule
MD4 processes message words in a simple, predictable order. Later designs like SHA-2 use complex message expansion that makes differential attacks much harder.
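The contrast is easy to see in code. This sketch assumes SHA-256's message schedule, as specified in FIPS 180-4, as the example of "complex message expansion", and uses arbitrary placeholder message words. It flips one bit of message word 0 and compares how far the change reaches: MD4 simply rereads that word in three fixed steps, while SHA-256 mixes it into most of its 64 expanded words. The helper names are ours.

```python
MASK = 0xFFFFFFFF

# MD4 (RFC 1320): no expansion; the same 16 words are reread in three fixed orders.
MD4_ORDER = (list(range(16))
             + [(i % 4) * 4 + i // 4 for i in range(16)]
             + [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15])


def md4_steps_using(word: int) -> list:
    """Indices of the 48 MD4 steps that read the given message word."""
    return [step for step, k in enumerate(MD4_ORDER) if k == word]


def rotr(x: int, s: int) -> int:
    """Rotate a 32-bit word right by s bits."""
    return ((x >> s) | (x << (32 - s))) & MASK


def sha256_schedule(words: list) -> list:
    """Expand 16 message words into SHA-256's 64-word schedule (FIPS 180-4)."""
    w = list(words)
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w


# Flip one bit of message word 0 (placeholder words 0..15) and see how far it spreads.
msg = list(range(16))
flipped = msg.copy()
flipped[0] ^= 1
changed = sum(u != v for u, v in zip(sha256_schedule(msg), sha256_schedule(flipped)))

print("MD4 steps reading word 0:", md4_steps_using(0))  # [0, 16, 32]
print("SHA-256 schedule words that change:", changed, "of 64")
```

In MD4, an attacker who controls a word knows exactly which three steps it will touch; in SHA-256, the same change feeds back into later schedule words, which makes differential paths much harder to steer.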
4. Optimized for Speed, Not Security
Rivest explicitly traded security margin for speed. In 1990, this seemed reasonable—MD4 was "probably secure." By 1996, it was definitely not.
Where MD4 Still Lurks
Despite being completely broken, MD4 persists in several places:
NTLM (Windows Password Hashing)
The most significant MD4 legacy is NTLM (NT LAN Manager), used for Windows password authentication:
NTLM Hash = MD4(UTF-16-LE(password))
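The formula translates almost directly into Python. This sketch assumes `hashlib` exposes MD4, which depends on the OpenSSL build behind Python: OpenSSL 3 moves MD4 to its legacy provider, and `hashlib.new` then raises `ValueError`. The function name `nt_hash` is ours.

```python
import hashlib


def nt_hash(password: str) -> str:
    """MD4 over the UTF-16-LE encoding of the password, with no salt.

    Raises ValueError if this Python/OpenSSL build does not provide MD4.
    """
    return hashlib.new("md4", password.encode("utf-16-le")).hexdigest()


# UTF-16-LE turns each ASCII character into two bytes (the character, then 0x00).
print("password".encode("utf-16-le"))  # b'p\x00a\x00s\x00s\x00w\x00o\x00r\x00d\x00'

try:
    print(nt_hash("password"))  # 8846f7eaee8fb117ad06bdd830b7586c
except ValueError:
    print("MD4 is not available in this Python/OpenSSL build")
```

Because no salt is involved, every account whose password is "password" gets exactly the same hash, which is what makes precomputed tables so effective.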
Yes, Windows passwords are still stored as unsalted MD4 hashes in many environments. This is why:
- NTLM hashes are trivially crackable with rainbow tables
- Tools like hashcat crack NTLM at billions of hashes per second
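- Pass-the-hash attacks work so well against Windows networks: NTLM authentication uses the hash itself as the secret, so a stolen hash is as good as the password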
Microsoft has been trying to deprecate NTLM since Windows Vista (2007), but it remains widely used for backward compatibility.
ed2k (eDonkey2000) Links
The eDonkey peer-to-peer network used MD4 for file identification. While the network is largely defunct, ed2k links still appear and use MD4 hashes.
Legacy Systems
Some older systems still rely on MD4:
- Embedded systems with firmware from the 1990s
- Industrial control systems
- Legacy databases
MD4 vs MD5 vs SHA-1
| Property | MD4 | MD5 | SHA-1 |
|---|---|---|---|
| Output size | 128 bits | 128 bits | 160 bits |
| Block size | 512 bits | 512 bits | 512 bits |
| Rounds | 3 | 4 | 4 (80 operations) |
| Operations | 48 | 64 | 80 |
| First collision | 1996 | 2004 | 2017 |
| Time to collision | Microseconds | Seconds | ~6,500 CPU-years + 110 GPU-years (2017) |
| Status | Completely broken | Broken | Broken |
MD4 broke first because it had the fewest defenses. MD5 lasted longer because Rivest added a fourth round. SHA-1 lasted longest because of its larger output and more operations—but they all fell eventually.
Lessons from MD4
1. Speed vs Security is a Real Tradeoff
Rivest optimized MD4 for speed and explicitly reduced security margins. The result was a hash function broken within 6 years. There's no free lunch in cryptography.
2. "Probably Secure" Isn't Good Enough
RFC 1320 said MD4 was "conjectured" to be secure. Conjectures in cryptography have a way of being disproven.
3. Legacy Systems Are Security Liabilities
NTLM's continued use of MD4 shows how broken cryptography can persist for decades due to backward compatibility requirements. Security debt compounds over time.
4. Cryptanalysis Moves Fast
MD4 went from "published standard" to "broken by hand" in 17 years. In cryptographic terms, that's not much time. Any hash function deployed today should expect decades of concentrated attack, and needs a security margin that can survive them.
5. Design Conservatively
Modern hash functions like SHA-3 use many more rounds than strictly necessary. This safety margin is learned from MD4's failure—better to be slower and secure than fast and broken.
What Should You Use Instead?
Never use MD4 for any purpose. Alternatives:
| Purpose | Recommended |
|---|---|
| General hashing | SHA-256, SHA-3, BLAKE3 |
| Password hashing | Argon2, bcrypt, scrypt |
| File integrity | SHA-256, BLAKE3 |
| HMAC | HMAC-SHA256 |
| Legacy NTLM replacement | Kerberos, NTLMv2 (minimum), or modern auth |
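For the first rows of the table, Python's standard library already covers the basics. This is a minimal sketch, assuming scrypt is acceptable for your password policy (Argon2 and bcrypt require third-party packages) and that the cost parameters n=2**14, r=8, p=1 are illustrative rather than a tuned recommendation. The helper names `hash_password` and `verify_password` are ours.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# General hashing and file integrity: SHA-256 is in the standard library.
print(hashlib.sha256(b"abc").hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# HMAC-SHA256 (the key and message here are placeholders).
tag = hmac.new(b"placeholder-key", b"placeholder message", hashlib.sha256).hexdigest()
print(tag)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted scrypt hash; returns (salt, derived_key). Cost parameters are illustrative."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password, unlike the NT hash
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, key)
```

Unlike an NT hash, the random salt means two users with the same password end up with different stored keys, and scrypt's memory-hard cost makes each guess expensive.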
Conclusion
MD4 was designed to be fast, and it was—both to compute and to break. Its aggressive design philosophy produced a hash function that fell to cryptanalysis in record time.
Yet MD4's influence persists. Its structure lives on in SHA-2, and its hash lives on in NTLM. Every time a penetration tester cracks Windows passwords at billions per second, they're exploiting a decision made in 1990.
MD4 teaches us that in cryptography, shortcuts are dangerous. The extra rounds Rivest added to MD5 bought a few more years. The additional complexity in SHA-1 bought a decade more. But ultimately, all three fell to the same fundamental vulnerabilities inherited from MD4's lean design.
When choosing hash functions today, remember MD4: the fastest way to obsolescence is to optimize for speed over security.
References
- Rivest, R. (1992). RFC 1320: The MD4 Message-Digest Algorithm
- den Boer, B., Bosselaers, A. (1991). An Attack on the Last Two Rounds of MD4
- Dobbertin, H. (1996). Cryptanalysis of MD4
- Wang, X., et al. (2004). Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD
- Klima, V. (2006). Tunnels in Hash Functions: MD5 Collisions Within a Minute
- Microsoft. NTLM Overview