DoReCo brings together spoken language corpora of 51 small and endangered languages from various language documentation initiatives. The resource is intended for cross-linguistic research on phonetics, morphology, and other topics related to spoken language(s). On the DoReCo website, you can explore the 51 datasets, download most annotation and audio files as well as metadata for free without registration, and find basic guidance on how to use the DoReCo data. If you have further questions or comments, you can write an email to or use our GitHub issue tracker.

AIRAL is an ongoing project (2022-2025) with two main goals. The first goal is to shed light on the acoustic properties of roots and affixes, using data from DoReCo. The second goal is to expand the DoReCo universe by adding new datasets to the corpus and by refining existing annotations.

The AIRAL team consists of:

Ludger Paschen (Principal Investigator, Leibniz-ZAS)

Michelle Throssell (Research Assistant, U Potsdam)

Aleksandr Schamberger (Research Assistant, HU Berlin)
Frank Seifart (HU Berlin & Leibniz-ZAS)

Susanne Fuchs (Leibniz-ZAS)

Christoph Draxler (LMU München)

Matt Stave (DDL Lyon)

Rachid Ridouane (Université Paris III - Sorbonne Nouvelle)

Peter M. Arkadiev (Russian Academy of Sciences)