

Choose t points C = {x_1, ..., x_t} in {0,1}^n at random. This is the code.

Decoding: given a received word y, find all x_i with d_H(y, x_i) <= d. If there is none, or more than one, report an error; otherwise report the unique such x_i.
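
The decoding rule can be sketched in a few lines of Python. This is only an illustration: the names `hamming_distance` and `decode` are made up here, words are represented as tuples of bits, the returned index is 0-based, and `None` stands for "error".

```python
from typing import Optional, Sequence

Word = Sequence[int]  # a binary word, e.g. (0, 1, 1, 0)


def hamming_distance(a: Word, b: Word) -> int:
    """Number of coordinates in which a and b differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(ai != bi for ai, bi in zip(a, b))


def decode(y: Word, code: Sequence[Word], d: int) -> Optional[int]:
    """Return the (0-based) index of the unique codeword within
    Hamming distance d of y, or None if there is no such codeword
    or more than one."""
    candidates = [i for i, x in enumerate(code) if hamming_distance(y, x) <= d]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
    print(decode((0, 1, 0, 0, 0), code, d=1))  # one bit flipped from codeword 0
    print(decode((0, 1, 1, 0, 0), code, d=1))  # no codeword within distance 1
```

The decoder scans the whole code, which is fine for the analysis but takes time proportional to t per received word.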


Without loss of generality the message is 1, so the transmitted codeword is x_1. Let z be the noise; the corrupted word is y = x_1 + z (addition mod 2).


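Pr_{z} [decode(y) != x_1] = Pr_{z} [ d_H(x_1, y) > d or some x_i with i != 1 has d_H(x_i, y) <= d ]

<= (union bound)

Pr_{z} [more than d bit errors] + sum_{i != 1} Pr_{z} [B_d(x_i) contains y]

The first term is very small once d is somewhat larger than the expected number of bit errors. Now let's analyze E_{code} [ the second term ], which is a sum over the wrong codewords.

Claim: E_{code} sum_{i != 1} Pr_{z} [B_d(x_i) contains y] is small.

Stronger claim: for every fixed x_1 and z (hence every fixed y = x_1 + z), the expression

E_{code} sum_{i != 1} [B_d(x_i) contains y] is small, where [.] is the indicator of the event and the expectation is over the choice of the other codewords.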
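
To get a feel for the size of this expression, here is a small Python sketch. It assumes (the notes do not say so explicitly) that the wrong codewords are uniform over {0,1}^n, in which case each one lands within distance d of a fixed word y with probability |B_d| / 2^n. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def ball_volume(n: int, d: int) -> int:
    """Number of words in {0,1}^n within Hamming distance d of a fixed word."""
    return sum(comb(n, k) for k in range(0, min(d, n) + 1))


def expected_bad_count(n: int, t: int, d: int) -> Fraction:
    """Expected number of wrong codewords whose radius-d ball contains a
    fixed word y, ASSUMING the t - 1 wrong codewords are uniform."""
    return Fraction((t - 1) * ball_volume(n, d), 2 ** n)


def brute_force_ball_volume(y, d: int) -> int:
    """Count the words x with d_H(x, y) <= d by enumerating {0,1}^n.
    Only feasible for tiny n; used as a sanity check on ball_volume."""
    n = len(y)
    return sum(
        1
        for x in product((0, 1), repeat=n)
        if sum(a != b for a, b in zip(x, y)) <= d
    )


if __name__ == "__main__":
    y = (1, 0, 1, 1, 0)
    print(ball_volume(5, 1), brute_force_ball_volume(y, 1))
    print(expected_bad_count(n=5, t=3, d=1))
```

The brute-force count does not depend on which y is fixed, which is exactly why the bound holds for every y.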

Proof: 
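
A sketch of the argument, under the assumption (not stated explicitly in the notes) that the codewords are chosen independently and uniformly from {0,1}^n. Here y = x_1 + z is the fixed received word, the sum runs over the wrong codewords i != 1, and V(n,d) is shorthand introduced here for the number of points in a Hamming ball of radius d.

```latex
\begin{align*}
\mathbb{E}_{\text{code}} \sum_{i \ne 1} \mathbf{1}\bigl[\, y \in B_d(x_i) \,\bigr]
  &= \sum_{i \ne 1} \Pr_{x_i}\bigl[\, d_H(x_i, y) \le d \,\bigr]
     && \text{(linearity of expectation)} \\
  &= (t - 1)\,\frac{V(n,d)}{2^n},
     && V(n,d) = \sum_{k=0}^{d} \binom{n}{k}.
\end{align*}
```

The second step uses that each x_i with i != 1 is uniform and independent of y, so it falls in the ball of radius d around y with probability V(n,d) / 2^n. The expression is therefore small whenever t is much smaller than 2^n / V(n,d).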

