Recognising a speaker’s identity by the sound of their voice is important for successful interaction. The skill depends on our ability to discriminate minute variations in the acoustics of the vocal signal. Performance on voice identity assessments varies widely across the population. The neural underpinnings of this ability and its individual differences, however, remain poorly understood. Here we provide critical tests of a theoretical framework for the neural processing stages of voice identity and address how individual differences in identity discrimination mediate activation in this neural network. We scanned 40 individuals on an fMRI adaptation task involving voices drawn from morphed continua between two personally familiar identities. Analyses dissociated neuronal effects induced by repetition of acoustically similar morphs from those induced by a switch in perceived identity. Activation in temporal voice-sensitive areas decreased with acoustic similarity between consecutive stimuli. This repetition suppression effect was mediated by performance on an independent voice assessment, a result that highlights an important functional role of adaptive coding in voice expertise. Bilateral anterior insulae and medial frontal gyri responded to a switch in perceived voice identity compared to an acoustically equidistant switch within identity. Our results support a multistep model of voice identity perception.