Successful social interaction hinges on accurate perception of emotional signals. These signals are typically conveyed multimodally, by both the face and the voice. Previous research has demonstrated unimodal contrastive aftereffects for emotionally expressive faces or voices. Here we asked whether these aftereffects transfer across modalities, as theoretical models predict. We show that adaptation to facial expressions elicits significant auditory aftereffects. After adaptation to angry facial expressions, ambiguous vocal stimuli drawn from an anger-fear morphed continuum were perceived as less angry and more fearful than after adaptation to fearful faces. In a second experiment, we demonstrate that these aftereffects do not depend on learned face-voice congruence; that is, adaptation to one facial identity transferred to a voice belonging to a different identity. Taken together, our findings support a supra-modal representation of emotion and further suggest that identity and emotion may be processed independently of one another, at least at the supra-modal level of the processing hierarchy.