The Glissando Corpus

Glissando is an annotated speech corpus designed specifically for the analysis of Spanish and Catalan prosody from different perspectives (Phonetics, Phonology, Discourse Analysis, Speech Technology, comparative studies). Its main features are:

Bilingual: it comprises two parallel corpora, Glissando_sp (Spanish) and Glissando_ca (Catalan), designed according to the same criteria and structure.
High-quality recordings: the entire corpus was recorded in professional studios.
28 different speakers per language, both professional and non-professional.
Two different speaking styles: news reading and dialogues.
Orthographically and phonetically transcribed.
Annotated with different levels of prosodic information.

More than 20 hours of speech are available per language, which makes Glissando a useful tool for experimental, corpus-based and technological applications.

Contents

Both Glissando_sp and Glissando_ca include three datasets:

a 'News' subcorpus: studio recordings of news readings.
an 'Informal dialogue' subcorpus: studio recordings of informal conversations between two speakers.
a 'Task-oriented dialogue' subcorpus: studio recordings of goal-oriented spoken interactions between two speakers.

Availability

This corpus is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Spain (CC BY-NC-SA 3.0 ES) license.

How to cite

GARRIDO, J. M. - ESCUDERO, D. - AGUILAR, L. - CARDEÑOSO, V. - RODERO, E. - DE-LA-MOTA, C. - GONZÁLEZ, C. - RUSTULLET, S. - LARREA, O. - LAPLAZA, Y. - VIZCAÍNO, F. - CABRERA, M. - BONAFONTE, A. (2013). "Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan", Language Resources and Evaluation, DOI 10.1007/s10579-012-9213-0.