The InterTVA dataset was acquired with two main objectives. First, from a neuroscientific perspective, it aims to study inter-individual differences in people's ability to perform voice perception and voice identification tasks. Second, from a methodological perspective, it is intended to allow benchmarking of multi-view machine learning methods. To this end, it includes several MRI modalities: anatomical MRI, diffusion MRI, and several sessions of functional MRI, comprising one resting-state run, one event-related voice localizer run (passive listening to vocal and non-vocal sounds), and four runs during which the subject performed a voice identification task.