Goal

Understand cryptographic hash functions: what they do, how they differ from encryption, why some are broken (MD5), and how to use SHA-256 for file integrity verification.

Prerequisites: Week 1 (Cypherpunk Ideals & Threat Modeling)

This is Part 1 of 3 - Covers hashing fundamentals and integrity verification.


Introduction: Why Cryptography Matters

Before we dive into algorithms, let’s understand what we’re building toward:

Cryptography is mathematics applied to privacy. It transforms the abstract right to privacy into concrete protection through proofs, not promises.

  • Hashing proves integrity: “This file hasn’t been tampered with”
  • Encryption provides confidentiality: “Only intended recipients can read this”
  • Key derivation strengthens secrets: “Your password becomes far more expensive to guess”
  • Randomness enables security: “Attackers can’t predict your keys”

By the end of this week, you’ll understand the how and why behind every encrypted file, signed message, and verified download you encounter.


1. What Is a Hash Function?

A cryptographic hash function takes arbitrary data and produces a fixed-size “fingerprint” called a digest.

Key properties:

  1. Deterministic - Same input always produces same output
  2. Fast - Quick to compute
  3. One-way - Computationally infeasible to recover an input from its digest
  4. Avalanche effect - Tiny change in input completely changes output
  5. Collision resistant - Computationally infeasible to find two inputs with same hash
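
The first property (determinism) and the fixed-size output are easy to check from a terminal. The sketch below is only an illustration, assuming GNU coreutils (`sha256sum`, `head`, `cut`) is installed; the helper name `sha256_of_stdin` is made up for this example.

```shell
#!/bin/sh
# Illustrative helper (not a standard command): print only the hex digest of stdin.
sha256_of_stdin() {
  sha256sum | cut -d' ' -f1
}

# Determinism: hashing the same bytes twice gives the same digest.
first=$(printf '%s' 'Cypherpunks write code' | sha256_of_stdin)
second=$(printf '%s' 'Cypherpunks write code' | sha256_of_stdin)
if [ "$first" = "$second" ]; then
  echo "deterministic: yes"
fi

# Fixed size: 0 bytes, 3 bytes, and 1 MiB all produce 64 hex characters (256 bits).
empty=$(printf '' | sha256_of_stdin)
short=$(printf '%s' 'abc' | sha256_of_stdin)
large=$(head -c 1048576 /dev/zero | sha256_of_stdin)
for digest in "$empty" "$short" "$large"; do
  echo "${#digest} hex chars: $digest"
done
```

Each digest line should report 64 hex characters regardless of how much data went in.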

Why Hash Functions Are Not Encryption

Critical distinction:

HASHING:
"Cypherpunks write code" → SHA256 → 09cf84d64a47650f... (CANNOT BE REVERSED)

ENCRYPTION:
"Cypherpunks write code" + KEY → AES → f8a3b2c1d4e5f6... (CAN BE DECRYPTED WITH KEY)

Hashes are for:

  • File integrity verification (checksums)
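  • Password storage (never store plaintext passwords, and use a deliberately slow, salted password-hashing function rather than a bare SHA-256; see key derivation in Week 2b)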
  • Digital signatures (covered in Week 3)
  • Proof-of-work (Bitcoin mining)

Encryption is for:

  • Confidentiality (hiding message contents)
  • Two-way transformation (need to decrypt later)

2. Common Hash Algorithms

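| Algorithm | Output Size | Status     | Use Case                       |
|-----------|-------------|------------|--------------------------------|
| MD5       | 128 bits    | BROKEN     | Never use (collisions found)   |
| SHA-1     | 160 bits    | DEPRECATED | Legacy systems only            |
| SHA-256   | 256 bits    | SECURE     | General purpose, Bitcoin       |
| SHA-512   | 512 bits    | SECURE     | Extra security margin          |
| BLAKE2b   | 512 bits    | SECURE     | Faster than SHA-2, modern      |
| SHA-3     | Variable    | SECURE     | Different design, future-proof |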
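
The output sizes in the table can be checked by hashing the same input with each tool and counting hex characters (4 bits each). This is a rough sketch assuming the GNU coreutils commands `md5sum`, `sha1sum`, `sha256sum`, `sha512sum`, and `b2sum` are available; SHA-3 is left out because coreutils does not ship a tool for it, and `digest_bits` is a made-up helper name.

```shell
#!/bin/sh
# digest_bits TOOL
# Hash a fixed input with the given checksum command and print the digest
# length in bits. Each hex character encodes 4 bits.
digest_bits() {
  hex=$(printf '%s' 'hello' | "$1" | cut -d' ' -f1)
  echo $(( ${#hex} * 4 ))
}

for tool in md5sum sha1sum sha256sum sha512sum b2sum; do
  printf '%-10s %s bits\n' "$tool" "$(digest_bits "$tool")"
done
```

The numbers printed should line up with the Output Size column; `b2sum` defaults to the 512-bit variant of BLAKE2b.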

Why MD5 Is Broken: Understanding Collisions

A collision is when two different inputs produce the same hash.

Example of an MD5 collision (inputs truncated):

# Two different 128-byte inputs, shown in hex, with the SAME MD5 hash (real example from 2004):
File A: d131dd02c5e6eec4 693d9a0698aff95c 2fcab587...  →  MD5: 79054025255fb1a26e4bc422aef54eb4
File B: d131dd02c5e6eec4 693d9a0698aff95c 2fcab507...  →  MD5: 79054025255fb1a26e4bc422aef54eb4  (SAME!)

Why this matters:

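  • Attacker crafts two files with the same hash: one harmless, one malicious
  • The harmless file gets reviewed, published, or signed; the attacker then swaps in the malicious one
  • You verify the hash, think it’s safe, and actually execute malware
  • A digital signature over the harmless file also validates the malicious one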

SHA-256 is still secure - No collisions have been published, despite more than two decades of cryptanalysis and the financial stakes of systems built on it (such as Bitcoin).
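
Collisions must exist for every hash function, because unlimited inputs map onto a finite set of digests. Security rests on collisions being infeasible to find. The sketch below makes that concrete by deliberately weakening SHA-256: it keeps only the first 3 hex characters (12 bits) of each digest and searches 600 made-up inputs (`msg-1`, `msg-2`, ...) for two that share that prefix. It is a brute-force toy, not the structural attack that broke MD5, and the function name `find_prefix_collision` is invented for this example.

```shell
#!/bin/sh
# Toy collision search on a deliberately truncated SHA-256 digest.
# With only 16^3 = 4096 possible prefixes, a few hundred inputs are
# overwhelmingly likely to contain a repeat (the birthday effect).
find_prefix_collision() {
  # $1 = number of hex characters to keep, $2 = number of inputs to try
  i=1
  while [ "$i" -le "$2" ]; do
    digest=$(printf '%s' "msg-$i" | sha256sum | cut -d' ' -f1)
    printf '%s msg-%s\n' "$digest" "$i"
    i=$((i + 1))
  done | awk -v n="$1" '
    {
      prefix = substr($1, 1, n)
      # Remember the first input seen for each prefix; record the first repeat.
      if (!found && (prefix in seen)) { found = 1; a = seen[prefix]; b = $2; p = prefix }
      if (!(prefix in seen)) seen[prefix] = $2
    }
    END { if (found) print a, b, p }
  '
}

result=$(find_prefix_collision 3 600)
echo "colliding inputs and shared prefix: $result"
```

Every extra hex character kept multiplies the search space by 16, which is why a full 256-bit digest is out of reach for this kind of brute force.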


3. Hands-On: Hashing with SHA-256

# Hash a simple message
echo "Cypherpunks write code" | sha256sum
# Output: <hash>  -

# Hash a file
echo "Privacy is a human right" > manifesto.txt
sha256sum manifesto.txt
# Output: <hash>  manifesto.txt

# Even tiny changes completely alter the hash (avalanche effect)
echo "Privacy is a human right." > manifesto-period.txt  # Added period
sha256sum manifesto-period.txt
# Completely different hash despite one character change!
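
One detail in the commands above often trips people up: `echo` appends a newline, so the digest covers the text plus that newline. To hash exactly the bytes of a string, `printf '%s'` is a safer choice. The sketch below uses the standard published SHA-256 test vector for `abc` to show the difference; it assumes GNU `sha256sum`.

```shell
#!/bin/sh
# SHA-256 of the three bytes "abc" (a standard published test vector).
exact=$(printf '%s' 'abc' | sha256sum | cut -d' ' -f1)
echo "printf: $exact"
# → printf: ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# echo adds a trailing newline, so four bytes ("abc" plus newline) are hashed.
with_newline=$(echo 'abc' | sha256sum | cut -d' ' -f1)
echo "echo:   $with_newline"

if [ "$exact" != "$with_newline" ]; then
  echo "The digests differ: the newline is part of the input."
fi
```

If a hash you compute does not match a published value for a string, a stray newline is a likely cause.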

Verifying File Integrity

Real-world scenario: You download a Linux ISO. How do you know it wasn’t modified?

# The distributor publishes the SHA-256 hash on its website
# Expected: a1b2c3d4e5f6...

# You compute hash of downloaded file
sha256sum ubuntu-24.04-desktop-amd64.iso

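# Compare hashes - a match means the file is identical to the one the publisher hashed
# (paste the full 64-character hash, then two spaces, then the filename)
sha256sum -c <<EOF
a1b2c3d4e5f6...  ubuntu-24.04-desktop-amd64.iso
EOF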

Why this works:

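  • Attacker would need to craft a modified ISO with the same hash as the original (computationally infeasible for SHA-256)
  • Even one flipped bit changes entire hash
  • The check is only as trustworthy as the published hash: fetch it over HTTPS or verify its signature (Week 3)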
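
The manual comparison above can be wrapped in a small script so the result is an exit status rather than an eyeball check. This is a minimal sketch assuming GNU `sha256sum`; the function name `verify_sha256` and the throwaway demo file are illustrative, and the expected digest must come from a source you trust.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 if the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
  actual=$(sha256sum < "$1" | cut -d' ' -f1)
  # Published digests are sometimes uppercase; normalize before comparing.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1" >&2
    echo "  expected $expected" >&2
    echo "  actual   $actual" >&2
    return 1
  fi
}

# Demo on a throwaway file whose digest is known in advance
# (the three bytes "abc", a standard SHA-256 test vector).
demo=$(mktemp)
printf '%s' 'abc' > "$demo"
verify_sha256 "$demo" ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

For a real download, call it as `verify_sha256 ubuntu-24.04-desktop-amd64.iso <published hash>` and act on the exit status, for example by refusing to continue an install script on failure.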

4. Lab: Explore the Avalanche Effect

# Create test files with minimal differences
echo "The quick brown fox jumps over the lazy dog" > fox1.txt
echo "The quick brown fox jumps over the lazy dog." > fox2.txt  # Added period
echo "the quick brown fox jumps over the lazy dog" > fox3.txt  # Lowercase 't'

# Hash all three
sha256sum fox*.txt

# Notice: Completely different hashes despite tiny input changes
# This is the avalanche effect in action

Deliverable: Screenshot or notes showing three completely different hashes from nearly identical inputs.
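
To put a number on the avalanche effect, you can count how many of the 256 output bits differ between two digests; for a good hash function the count tends to land near half. The sketch below is one way to do it in plain POSIX shell, assuming GNU `sha256sum`. The function name `count_differing_bits` is made up for this lab, and it expects two hex strings of equal length.

```shell
#!/bin/sh
# count_differing_bits HEX_A HEX_B
# XOR the two digests one hex digit (4 bits) at a time and count the 1 bits.
# Assumes both arguments are hex strings of the same length.
count_differing_bits() {
  a=$1
  b=$2
  total=0
  while [ -n "$a" ]; do
    # Peel off the first hex digit of each string (POSIX parameter expansion).
    da=${a%"${a#?}"}; a=${a#?}
    db=${b%"${b#?}"}; b=${b#?}
    x=$(( 0x$da ^ 0x$db ))
    # Count the set bits in this 4-bit XOR result.
    while [ "$x" -gt 0 ]; do
      total=$(( total + (x & 1) ))
      x=$(( x >> 1 ))
    done
  done
  echo "$total"
}

# Same contents as fox1.txt and fox2.txt from the lab (echo adds the newline).
h1=$(echo "The quick brown fox jumps over the lazy dog"  | sha256sum | cut -d' ' -f1)
h2=$(echo "The quick brown fox jumps over the lazy dog." | sha256sum | cut -d' ' -f1)
echo "$(count_differing_bits "$h1" "$h2") of 256 bits differ"
```

Try it on the fox1.txt and fox3.txt digests as well and add the counts to your notes.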


Up Next

Week 2b covers symmetric encryption with AES, encryption modes (why ECB is broken), and key derivation functions.


Key Takeaways

  • Hash functions create fixed-size fingerprints from arbitrary data
  • One-way function - Cannot reverse a hash to get original input
  • MD5 is broken - Collisions found, never use for security
  • SHA-256 is secure - No known collisions despite years of cryptanalysis and the financial stakes of Bitcoin
  • Avalanche effect - Tiny input change produces completely different hash
  • Integrity verification - Compare hashes to detect file tampering