Syntax Archive

The Syntax Archive is a research archive specialising in the development and maintenance of Finnish language corpora. It houses digitised text, video and sound corpora that are available to researchers and students. Most of the data is organised into digitised, annotated corpora.

We have developed five digital corpora that have been morphologically and syntactically annotated. These corpora can be accessed through the Language Bank of Finland. The data extends from the earliest written Finnish to modern everyday conversations, and covers Finnish produced by native speakers as well as by non-native learners. In addition to these corpora, the Syntax Archive houses audio, video and text data available to researchers and students. The Syntax Archive participates in the Digilang project, which aims to develop the digital language resources of the School of Languages and Translation Studies.

Grammatically annotated corpora

Non-annotated corpora

The non-annotated corpora include recordings and transcriptions from the Satakunta region (Sapu), comprising 255 transcriptions of more than 200 hours of recordings, and the data from the Prosovar project, which studies regional variation in Finnish prosody.

Sound and video recordings

The Syntax Archive houses the sound and video recordings of the Finnish language recording archive at the University of Turku. All the data is available in digital format. In total, the data comprises more than 5,600 hours of recordings.

Contact information

Visiting address
Hämeenkatu 1, Turku

Postal address
Syntax Archive
Department of Finnish and Finno-Ugric Studies
FI-20014 University of Turku, Finland