MD4: From Hero to Zero

Before MD5, there was MD4. Designed for speed above all else, MD4 became the foundation for an entire generation of hash functions—and a cautionary tale about prioritizing performance over security. Today, MD4 is so thoroughly broken that collisions can be computed by hand.

This post traces MD4's brief reign and spectacular fall, and explains why it still haunts us in legacy systems.

What is MD4?

MD4 (Message-Digest Algorithm 4) is a cryptographic hash function that produces a 128-bit (16-byte) hash value, rendered as a 32-character hexadecimal string:

MD4("Hello, World!") = 94e3cb0fa9aa7a5ee3db74b79e915989

Like all cryptographic hash functions, MD4 was designed to provide:

Preimage resistance: Given a hash h, you can't find a message that produces it
Second preimage resistance: Given a message m1, you can't find a different m2 with the same hash
Collision resistance: You can't find ANY two different messages with the same hash

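MD4 failed most visibly at the third: collisions are now trivial to find. Its preimage resistance has also been dented, though so far only by theoretical attacks (on the order of 2^100 operations) that remain far out of practical reach.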

The Birth of MD4

MD4 was designed by Ronald Rivest at MIT and published in October 1990 as RFC 1186. The revised specification, RFC 1320, followed in April 1992.

Rivest had a specific goal: speed. The hash functions available in 1990 were computationally expensive. Rivest wanted something fast enough to be practical for software implementations on the hardware of that era.

The MD Family Tree

MD (1989) — Unpublished, theoretical
    ↓
MD2 (1989) — Designed for 8-bit processors, very slow
    ↓
MD3 (1990) — Never published, Rivest found weaknesses himself
    ↓
MD4 (1990) — Designed for speed, too fast and loose
    ↓
MD5 (1991) — Strengthened MD4, still eventually broken

What happened to MD3? Rivest designed MD3 as an intermediate step, but discovered weaknesses in it before publication. Rather than release a flawed algorithm, he abandoned MD3 and moved directly to MD4. The details of MD3 were never made public.

MD4 was explicitly designed to be "aggressive" in its security margins. Rivest knew he was pushing limits, writing in the RFC:

"It is conjectured that it is computationally infeasible to produce two messages having the same message digest."

"Conjectured" was doing a lot of heavy lifting there.

How MD4 Works

MD4 processes messages in 512-bit blocks through three rounds of 16 operations each (48 total):

Padding: Message is padded to a multiple of 512 bits
Initialization: Four 32-bit state variables (A, B, C, D) set to fixed constants
Processing: Each block goes through 3 rounds of 16 operations
Output: Final state variables concatenated into 128-bit hash

The three rounds use different nonlinear functions:

Round 1: F(X,Y,Z) = (X AND Y) OR (NOT X AND Z)
Round 2: G(X,Y,Z) = (X AND Y) OR (X AND Z) OR (Y AND Z)
Round 3: H(X,Y,Z) = X XOR Y XOR Z
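
As a minimal sketch, the three functions can be written directly on 32-bit integers. The names F, G and H follow the text above; the MASK constant and the sample inputs are illustrative assumptions. Seen bit by bit, F selects between Y and Z, G takes a majority vote, and H is a parity check.

```python
from itertools import product

MASK = 0xFFFFFFFF  # MD4 operates on 32-bit words


def F(x, y, z):
    # Round 1: for each bit, "if x then y else z"
    return ((x & y) | (~x & z)) & MASK


def G(x, y, z):
    # Round 2: for each bit, the majority value of x, y, z
    return (x & y) | (x & z) | (y & z)


def H(x, y, z):
    # Round 3: for each bit, the parity (XOR) of x, y, z
    return x ^ y ^ z


if __name__ == "__main__":
    # Single-bit truth table for all three functions
    print("x y z | F G H")
    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, "|", F(x, y, z), G(x, y, z), H(x, y, z))

    # Word-level view: the high half of x is all ones, so F copies y there
    # and copies z in the low half.
    print(f"{F(0xFFFF0000, 0x12345678, 0x9ABCDEF0):08x}")
```

All three are cheap bitwise operations with no data-dependent branching, which is part of why MD4 was so fast in software.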

Each round also uses:

Left rotation by varying amounts
Addition of different constants (0, 0x5A827999, 0x6ED9EBA1)
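
Putting the pieces together, the following is a compact sketch of MD4 for study purposes only. The initial state, message-word orders and rotation amounts are taken from RFC 1320; the function and variable names (md4, ROUNDS, _rotl) are my own, and the code is neither optimized nor intended for production use.

```python
import struct

MASK = 0xFFFFFFFF


def _rotl(x, n):
    # Rotate a 32-bit word left by n bits
    return ((x << n) | (x >> (32 - n))) & MASK


def _F(x, y, z):
    return (x & y) | (~x & z)


def _G(x, y, z):
    return (x & y) | (x & z) | (y & z)


def _H(x, y, z):
    return x ^ y ^ z


# Per round: (function, additive constant, message-word order, rotation amounts)
ROUNDS = (
    (_F, 0x00000000, tuple(range(16)), (3, 7, 11, 19)),
    (_G, 0x5A827999, (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (_H, 0x6ED9EBA1, (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def md4(message: bytes) -> bytes:
    # Padding: a single 1 bit, zeros up to 56 bytes mod 64,
    # then the message length in bits as a 64-bit little-endian integer.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + struct.pack("<Q", bit_len))

    # Initialization: four 32-bit state words A, B, C, D
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]

    # Processing: 3 rounds x 16 steps per 512-bit block
    for offset in range(0, len(padded), 64):
        x = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = state
        for func, const, order, shifts in ROUNDS:
            for i, k in enumerate(order):
                t = (a + func(b, c, d) + x[k] + const) & MASK
                # The updated word moves into the "b" slot; the others shift along
                a, b, c, d = d, _rotl(t, shifts[i % 4]), b, c
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]

    # Output: the state words concatenated in little-endian byte order
    return struct.pack("<4I", *state)


if __name__ == "__main__":
    # RFC 1320 lists a448017aaf21d8525fc10ae87aa6729d for "abc"
    print(md4(b"abc").hex())
```

The whole compression step fits in a dozen lines, which makes the thin security margin discussed later easier to appreciate: every message word enters the state exactly three times.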

Key difference from MD5: MD4 has only 48 operations total; MD5 has 64. MD4 uses three rounds; MD5 uses four. These "efficiency" choices became MD4's downfall.

The Glory Days (All 18 Months of Them)

MD4's reign was brief but influential:

Password hashing: Early systems used MD4 for password storage
Integrity checks: File verification in early 1990s software
Foundation for successors: MD5, SHA-0, SHA-1, and RIPEMD all borrowed MD4's structure

But MD4's real legacy is its descendants:

Algorithm	Based on MD4?	Status
MD5	Direct successor	Broken
SHA-0	MD4 structure	Broken
SHA-1	MD4 structure	Broken
SHA-2	MD4 structure	Still secure
RIPEMD	MD4 structure	Mostly broken
RIPEMD-160	MD4 structure	Still secure

The Merkle-Damgård construction that MD4 adopted, and the basic operations it pioneered, live on even as MD4 itself is completely broken.

The Fall: A Timeline of Attacks

1991: First Warnings (Within a Year!)

Just one year after publication, Bert den Boer and Antoon Bosselaers found weaknesses in MD4. They discovered that the last two rounds of MD4 were not collision-resistant when analyzed separately.

This was an ominous sign—no hash function should show weaknesses that quickly.

1995: Compression Function Broken

Hans Dobbertin found collisions in MD4's compression function. Unlike a full collision attack, this required control over the initial state variables, but it demonstrated fundamental weaknesses in the design.

1996: Full Collisions

Dobbertin extended his attack to find actual MD4 collisions: two different messages with identical MD4 hashes. The attack required about 2^20 operations, far below the roughly 2^64 that a generic birthday search needs against a 128-bit hash.
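
To put those two exponents side by side, here is a small arithmetic sketch. The helper name birthday_bound_log2 is an assumption for illustration, and the 2^20 figure is simply the cost quoted above.

```python
def birthday_bound_log2(output_bits: int) -> float:
    # A generic collision search on an n-bit hash needs about 2^(n/2) evaluations
    return output_bits / 2


generic_log2 = birthday_bound_log2(128)   # brute-force collision cost for MD4
attack_log2 = 20                          # cost quoted for Dobbertin's attack
speedup_log2 = generic_log2 - attack_log2

print(f"generic search : 2^{generic_log2:.0f}")
print(f"Dobbertin 1996 : 2^{attack_log2}")
print(f"speedup        : 2^{speedup_log2:.0f} (about {2 ** speedup_log2:.1e}x)")
```

A gap of 2^44 is the difference between a computation nobody could afford in 1996 and one that finished on a desktop PC.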

MD4 was officially broken.

2004: Collisions in Seconds

Xiaoyun Wang (the same researcher who broke MD5 and SHA-1) demonstrated MD4 collisions could be found in less than a second on a standard PC.

Here's an example collision from Wang's paper: two 512-bit messages, each written as sixteen 32-bit words, with identical MD4 hashes:

Message 1:
4d7a9c83 56cb927a b9d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dd8e31 97e31fe5 2794bf08 b9e8c3e9

Message 2:
4d7a9c83 d6cb927a 29d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dc8e31 97e31fe5 2794bf08 b9e8c3e9

The differences are minimal (four bits spread across three words). With each word serialized in little-endian byte order, both messages produce:

MD4: 4d7e6a1d efa93d2d de05b45d 864c429b
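
The sketch below XORs the two messages word by word to show exactly where they differ. The message words are copied from the listing above; the helper names (word_diffs, to_bytes) are assumptions for illustration. The to_bytes helper serializes the words in little-endian order, which is the form any MD4 implementation would need to hash them.

```python
import struct

M1 = """
4d7a9c83 56cb927a b9d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dd8e31 97e31fe5 2794bf08 b9e8c3e9
""".split()

M2 = """
4d7a9c83 d6cb927a 29d5a578 57a7a5ee de748a3c dcc366b3 b683a020 3b2a5d9f
c69d71b3 f9e99198 d79f805e a63bb2e8 45dc8e31 97e31fe5 2794bf08 b9e8c3e9
""".split()


def word_diffs(words_a, words_b):
    """Return (word index, XOR difference, number of differing bits) per changed word."""
    diffs = []
    for i, (wa, wb) in enumerate(zip(words_a, words_b)):
        x = int(wa, 16) ^ int(wb, 16)
        if x:
            diffs.append((i, x, bin(x).count("1")))
    return diffs


def to_bytes(words):
    """Serialize sixteen 32-bit words as a 64-byte little-endian block."""
    return struct.pack("<16I", *(int(w, 16) for w in words))


if __name__ == "__main__":
    for index, xor, bits in word_diffs(M1, M2):
        print(f"word {index:2d}: xor = {xor:08x} ({bits} bit(s) differ)")
    total = sum(bits for _, _, bits in word_diffs(M1, M2))
    print(f"{total} of 512 bits differ")
```

Only words 1, 2 and 12 change, and only in their high bits. That tightly controlled difference pattern is the hallmark of a differential attack.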

2007: Collisions by Hand

In a remarkable demonstration, Vlastimil Klima showed that MD4 collisions could be found manually—by hand, with pencil and paper—in about one hour.

This is the ultimate indignity for a cryptographic hash function: being broken without even needing a computer.

Modern Day: Instant Collisions

Today, MD4 collisions can be generated in microseconds. The best published attacks need only a handful of MD4 computations per collision, so commodity hardware can produce colliding pairs essentially on demand.

Why MD4 Broke So Quickly

MD4's aggressive design made it vulnerable:

1. Too Few Rounds

MD4 uses only 3 rounds with 16 operations each (48 total). MD5 added a fourth round (64 total) specifically because Rivest recognized MD4 was too lean.

Fewer rounds mean:

Less diffusion (changes don't spread enough)
More predictable intermediate states
Easier differential paths for attackers

2. Simple Round Functions

MD4's round functions are elegant but too simple:

Round 1: Basic conditional selection
Round 2: Majority function
Round 3: Simple XOR

More complex functions in later hashes provide better nonlinearity and resistance to differential attacks.

3. Weak Message Schedule

MD4 processes message words in a simple, predictable order. Later designs like SHA-2 use complex message expansion that makes differential attacks much harder.
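
To make "simple, predictable order" concrete, the sketch below lists the order in which each round reads the 16 message words (the orders are from RFC 1320; the name ROUND_ORDERS is my own) and confirms that each round is just a permutation of the same words, with no expansion or mixing.

```python
from collections import Counter

# Index of the message word consumed at each of the 16 steps of a round
ROUND_ORDERS = {
    1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    2: [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15],
    3: [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15],
}

# Every round uses each raw message word exactly once
for round_no, order in ROUND_ORDERS.items():
    assert sorted(order) == list(range(16)), round_no

# Across the whole compression function, each word is used only three times
uses = Counter(k for order in ROUND_ORDERS.values() for k in order)
print(dict(uses))
```

Because each word influences the state in only three places, an attacker who flips a bit in one word has just three steps in which to cancel the resulting difference. In SHA-2, by contrast, each expanded word depends on several earlier ones, so a single flipped bit spreads through many later steps.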

4. Optimized for Speed, Not Security

Rivest explicitly traded security margin for speed. In 1990, this seemed reasonable—MD4 was "probably secure." By 1996, it was definitely not.

Where MD4 Still Lurks

Despite being completely broken, MD4 persists in several places:

NTLM (Windows Password Hashing)

The most significant MD4 legacy is NTLM (NT LAN Manager), used for Windows password authentication:

NTLM Hash = MD4(UTF-16-LE(password))
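
A minimal sketch of that formula using Python's hashlib follows. One assumption to flag: whether hashlib exposes "md4" depends on the underlying OpenSSL build (many OpenSSL 3.x installations disable it), so the hypothetical nt_hash helper returns None when it is unavailable. The candidate passwords are made up for illustration.

```python
import hashlib
from typing import Optional


def nt_hash(password: str) -> Optional[str]:
    """Return the NT hash as hex, or None if this Python build lacks MD4."""
    data = password.encode("utf-16-le")  # no salt, no iteration count
    try:
        return hashlib.new("md4", data).hexdigest()
    except ValueError:
        # MD4 is often only in OpenSSL's legacy provider, which may be disabled
        return None


def build_lookup(candidates):
    """Precompute hash -> password once; the table works against every account."""
    return {nt_hash(p): p for p in candidates}


if __name__ == "__main__":
    table = build_lookup(["letmein", "password", "Summer2024!"])
    stolen = nt_hash("password")
    if stolen is None:
        print("MD4 is not available in this hashlib build")
    else:
        print(stolen, "->", table[stolen])
```

Because there is no salt, the same password always maps to the same hash, so a single precomputed table can be reused against every user in every domain.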

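Yes, Windows passwords are still stored as unsalted MD4 hashes in many environments. The danger here comes less from MD4's collisions than from a fast, unsalted hash that the protocol accepts as a credential. This is why: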

NTLM hashes are trivially crackable with rainbow tables
Tools like hashcat crack NTLM at billions of hashes per second
Pass-the-hash attacks work so well against Windows networks

Microsoft has been trying to deprecate NTLM since Windows Vista (2007), but it remains widely used for backward compatibility.

ed2k (eDonkey2000) Links

The eDonkey peer-to-peer network used MD4 for file identification. While the network is largely defunct, ed2k links still appear and use MD4 hashes.

Legacy Systems

MD4 also survives in some older deployments:

Embedded systems with firmware from the 1990s
Industrial control systems
Legacy databases

MD4 vs MD5 vs SHA-1

Property	MD4	MD5	SHA-1
Output size	128 bits	128 bits	160 bits
Block size	512 bits	512 bits	512 bits
Rounds	3	4	4
Operations	48	64	80
First collision	1996	2004	2017
Time to collision	Microseconds	Seconds	GPU-years (about $110k of compute in 2017)
Status	Completely broken	Broken	Broken

MD4 broke first because it had the fewest defenses. MD5 lasted longer because Rivest added a fourth round. SHA-1 lasted longest because of its larger output and more operations—but they all fell eventually.

Lessons from MD4

1. Speed vs Security is a Real Tradeoff

Rivest optimized MD4 for speed and explicitly reduced security margins. The result was a hash function broken within 6 years. There's no free lunch in cryptography.

2. "Probably Secure" Isn't Good Enough

RFC 1320 said MD4 was "conjectured" to be secure. Conjectures in cryptography have a way of being disproven.

3. Legacy Systems Are Security Liabilities

NTLM's continued use of MD4 shows how broken cryptography can persist for decades due to backward compatibility requirements. Security debt compounds over time.

4. Cryptanalysis Moves Fast

MD4 went from "published standard" to "broken by hand" in 17 years. In cryptographic terms, that's not much time. Any hash function deployed today should expect to face decades of equally concentrated attack.

5. Design Conservatively

Modern hash functions like SHA-3 use many more rounds than the best known attacks can reach. That safety margin is a lesson learned from failures like MD4's: better to be slower and secure than fast and broken.

What Should You Use Instead?

Never use MD4 for any purpose. Alternatives:

Purpose	Recommended
General hashing	SHA-256, SHA-3, BLAKE3
Password hashing	Argon2, bcrypt, scrypt
File integrity	SHA-256, BLAKE3
HMAC	HMAC-SHA256
Legacy NTLM replacement	Kerberos, NTLMv2 (minimum), or modern auth
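
As a rough sketch of what those replacements look like in practice, the following uses only Python's standard library. The function names and the scrypt parameters (n=2**14, r=8, p=1) are illustrative assumptions, not tuned recommendations. Argon2 and bcrypt need third-party packages, so scrypt stands in for password hashing here.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def file_digest(data: bytes) -> str:
    # General hashing / file integrity: SHA-256
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, message: bytes) -> bytes:
    # Message authentication: HMAC-SHA256
    return hmac.new(key, message, hashlib.sha256).digest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # Password hashing: salted, deliberately slow and memory-hard (scrypt)
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)  # constant-time comparison


if __name__ == "__main__":
    print(file_digest(b"abc"))
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("wrong guess", salt, stored))
```

Unlike the NTLM scheme described earlier, the password hash here is salted and expensive to compute, so precomputed tables and billions-per-second guessing no longer apply.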

Conclusion

MD4 was designed to be fast, and it was—both to compute and to break. Its aggressive design philosophy produced a hash function that fell to cryptanalysis in record time.

Yet MD4's influence persists. Its structure lives on in SHA-2, and its hash lives on in NTLM. Every time a penetration tester cracks Windows passwords at billions per second, they're exploiting a decision made in 1990.

MD4 teaches us that in cryptography, shortcuts are dangerous. The extra rounds Rivest added to MD5 bought a few more years. The additional complexity in SHA-1 bought a decade more. But ultimately, all three fell to the same fundamental vulnerabilities inherited from MD4's lean design.

When choosing hash functions today, remember MD4: the fastest way to obsolescence is to optimize for speed over security.

References

Rivest, R. (1992). RFC 1320: The MD4 Message-Digest Algorithm (revises RFC 1186, 1990)
den Boer, B., Bosselaers, A. (1991). An Attack on the Last Two Rounds of MD4
Dobbertin, H. (1996). Cryptanalysis of MD4
Wang, X., et al. (2004). Collisions for Hash Functions MD4, MD5, RIPEMD and HAVAL
Klima, V. (2006). Tunnels in Hash Functions: MD5 Collisions Within a Minute
Microsoft. NTLM Overview