The WOWA corpus grew out of the project Post-predicate elements in Iranian and neighbouring languages: Inheritance, contact, and information structure. It contains data that were collected and annotated by the researchers involved in that project, as well as others contributed by associated researchers.

The principal aim of WOWA is to provide an accessible and transparent source of data for corpus-based approaches to word order typology, focussing on the languages spoken in the region designated here as Western Asia.

The data sets are successively being made available, with 41 online as of July 2024.

Getting started with WOWA

- corpus overview
- —/—
- (TBA)
- archive
- all coded values
- 3.6 MB
- —
- 24/07/13
- archive
- 7.4 MB
- 24/07/13
- archive
- all metadata
- 2.1 MB
- 24/07/13
- archive
- all source texts
- —/—
- (TBA)
- archive

Citing WOWA

Haig, Geoffrey & Stilo, Donald & Doğan, Mahîr C. & Schiborr, Nils N. (eds.). 2022. WOWA — Word Order in Western Asia: A spoken-language-based corpus for investigating areal effects in word order variation. Bamberg: University of Bamberg. (multicast.aspra.uni-bamberg.de/resources/wowa/) (date accessed)

Research Background

The focus on Western Asia is motivated by an overarching research interest in the areal diffusion of word order regularities. Specifically, we investigate the respective impacts of inheritance (the genetic affiliation of the languages concerned, e.g. Turkic or Semitic) and of neighbouring languages, related or not, in shaping word order in usage. In addition, we address the question of which aspects of word order are stable within a particular doculect, and which display corpus-internal variability.

More generally, this is connected to the issue of integrating variation into typology. Finally, WOWA is the only cross-linguistic database of its type that includes exclusively spoken language, and thus provides an important corrective to much ongoing work in corpus-based typology, which is still largely based on written language.

Corpus design

Each data set in WOWA is based on a corpus of transcribed spoken language, usually compiled in a fieldwork setting. The sources are extremely varied: some are taken from published dialect surveys such as those undertaken by the Turkish Language Society (Türk Dil Kurumu), or from published work by experts on particular language groups (e.g. Khan 2008, on the Neo-Aramaic (Christian) dialect of Barwar, northern Iraq). Others were gathered in the course of PhD projects and other initiatives in language documentation.

All data in the WOWA corpus, including supplementary materials, are published under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). The text of the licence can be found online here.

The texts in WOWA contain at least 500 analysable tokens; the current mean is 650 tokens. They are digitized, if not already in digital form, segmented into syntactic segments of up to three clauses (the size of segmented units varies and is immaterial for the analysis), and imported into a spreadsheet template.

The tokens to be analysed are referential nominal expressions in non-subject positions (i.e. subjects are not included). They are coded for a range of features, including animacy, weight, role, and flagging. The dependent variable is position relative to the governing predicate, for which two values are available: (A) before the governing predicate, or (B) after the governing predicate. The details are outlined in the coding guidelines. Once fully coded, the spreadsheets are exported as TSV files, which can then be imported into R for statistical analysis.
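As a rough illustration of how the exported TSV files might be explored before any statistical modelling, the following Python sketch computes the share of post-predicate tokens (value B) per group. The column names `role`, `animacy`, and `position`, the role labels, and the sample rows are hypothetical stand-ins, not the actual WOWA column layout; the real names and values are defined in the coding guidelines.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for an exported WOWA spreadsheet. The column names
# ("role", "animacy", "position") and all values are illustrative assumptions.
SAMPLE_TSV = (
    "role\tanimacy\tposition\n"
    "goal\tinanimate\tB\n"
    "direct_object\tinanimate\tA\n"
    "goal\thuman\tB\n"
    "direct_object\thuman\tA\n"
    "addressee\thuman\tA\n"
)


def post_predicate_rates(tsv_file, group_by="role"):
    """Return {group value: share of tokens coded B (after the predicate)}."""
    totals = Counter()
    post = Counter()
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        totals[row[group_by]] += 1
        if row["position"] == "B":
            post[row[group_by]] += 1
    return {group: post[group] / totals[group] for group in totals}


rates = post_predicate_rates(io.StringIO(SAMPLE_TSV))
for role, rate in sorted(rates.items()):
    print(f"{role}: {rate:.2f}")
```

To apply the same function to a downloaded file, an open file handle can be passed in place of the `io.StringIO` object, with `group_by` and the `"position"` key adjusted to the column names actually used in the data set.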

For each data set, we minimally make available (i) metadata on the doculect and source texts, (ii) the complete coded data, in XLS and TSV formats, and, where available, (iii) the original sources including sound files.

Documentation

- Coding guidelines
- 258 KB
- v1.0
- 20/01/06
- archive
- Spreadsheet template
- 33 KB
- v1.1
- 21/01/15
- archive

The doculects

Please note that a number of data sets are still being compiled.

Missing components are marked with "—/—" in the lists below; they will be added in the near future.

Turkic

Oghuz (Ankara)

Kateryna Iefremenko
download citation
- all files
- 0.7 MB
- source texts
- 0.3 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 275 MB
- 0.2 MB
- (24/04/24)

Oghuz (Erzurum)

Mahîr Can Doğan
download citation
- all files
- 1.2 MB
- source texts
- 1.1 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/04/24)

Oghuz (Gagauz)

Mahîr Can Doğan
download citation
- all files
- 0.3 MB
- source texts
- 0.2 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/04/24)

Oghuz (Qashqai)

Laurentia Schreiber
download citation
- all files
- 3.0 MB
- source texts
- 2.8 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.1 MB
- (24/04/24)

Oghuz (Tabriz)

Donald Stilo
download citation
- all files
- 6.8 MB
- source texts
- 6.6 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/07/10)

Iranian

Balochi (Coastal)

Maryam Nourzaei
download citation
- all files
- 1.9 MB
- source texts
- 1.7 MB
- coded values
- 0.5 MB
- archive
- metadata
- 0.1 MB
- 247 MB
- 0.2 MB
- (24/04/24)

Balochi (Koroshi)

Maryam Nourzaei
download citation
- all files
- 0.7 MB
- source texts
- 0.5 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.1 MB
- (24/04/24)

Balochi (Turkmen)

Geoffrey Haig
download citation
- all files
- 2.0 MB
- source texts
- 1.9 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 175 MB
- 0.2 MB
- (24/04/24)

Bashkardi (Northern)

Agnes Korn (transcription/analysis),
Ilya Gershevitch (recordings)
download citation
- all files
- 2.0 MB
- source texts
- 1.8 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 184 MB
- 0.1 MB
- (24/04/24)
- orig. coding
- 0.1 MB

Bashkardi (Southern)

Agnes Korn (transcription/analysis),
Ilya Gershevitch (recordings)
download citation
- all files
- 1.5 MB
- source texts
- 1.9 MB
- coded values
- 0.1 MB
- archive
- metadata
- 0.1 MB
- 67 MB
- 0.1 MB
- (24/04/24)
- orig. coding
- 0.1 MB

Gorani (Gawraǰū)

Masoud Mohammadirad
download citation
- all files
- 3.1 MB
- source texts
- 2.8 MB
- coded values
- 0.6 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.3 MB
- (24/04/24)

Kumzari (Musandam)

Geoffrey Haig
download citation
- all files
- 0.7 MB
- source texts
- 0.5 MB
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/04/24)

Kurdish (Central, Sanandaj)

Masoud Mohammadirad
download citation
- all files
- 0.7 MB
- source texts
- 0.4 MB
- coded values
- 0.5 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.3 MB
- (24/04/24)

Kurdish (Northern, Ankara)

Kateryna Iefremenko
download citation
- all files
- 0.7 MB
- source texts
- 0.6 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 294 MB
- 0.1 MB
- (24/04/24)

Kurdish (Northern, Lachin)

Donald Stilo
download citation
- all files
- 19 MB
- source texts
- 18 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/04/24)

Kurdish (Northern, Muş)

Geoffrey Haig
download citation
- all files
- 0.5 MB
- source texts
- 0.4 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 142 MB
- 0.1 MB
- (24/04/24)

Kurdish (Southern, Bijar)

Masoud Mohammadirad
download citation
- all files
- 0.5 MB
- source texts
- 0.3 MB
- coded values
- 0.5 MB
- archive
- metadata
- 0.1 MB
- 541 MB
- 0.1 MB
- (24/04/24)

Mazandarani (Kordxeyl)

Donald Stilo, Geoffrey Haig
download citation
- all files
- 18 MB
- source texts
- 18 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.1 MB
- (24/04/24)

Persian (New)

Elham Izadi
download citation
- all files
- 1.7 MB
- source texts
- 1.3 MB
- coded values
- 0.9 MB
- archive
- metadata
- 0.1 MB
- 590 MB
- 0.4 MB
- (24/04/24)

Persian (New, Early Classical)

Mehdi Parizadeh
download citation
- all files
- 0.2 MB
- source texts
- —/—
- coded values
- 0.5 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/04/24)

Talyshi (Lerik)

Donald Stilo
download citation
- all files
- 2.6 MB
- source texts
- 2.5 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 0.1 MB
- (24/04/24)

Tāti (Hazārrudi)

Raheleh Izadifar
download citation
- all files
- 0.5 MB
- source texts
- 0.3 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 388 MB
- 0.2 MB
- (24/04/24)

Vafsi (Gurchani)

Mahîr Can Doğan
download citation
- all files
- 0.6 MB
- source texts
- 0.4 MB
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- 320 MB
- 0.3 MB
- (24/04/24)

Zazakî (Çewlîg)

Netîce Demir, Mahîr Can Doğan
download citation
- all files
- 0.2 MB
- source texts
- 0.1 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 189 MB
- 0.1 MB
- (24/04/24)

Zazakî (Siwêreg)

Netîce Demir, Mahîr Can Doğan
download citation
- all files
- 0.2 MB
- source texts
- 0.1 MB
- coded values
- 0.1 MB
- archive
- metadata
- 0.1 MB
- 125 MB
- 0.1 MB
- (24/04/24)

Kartvelian

Laz (Arhavi)

Donald Stilo, René Lacroix
download citation
- all files
- 0.7 MB
- source texts
- 0.6 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 0.1 MB
- (24/04/24)

Semitic

Arabic (Jewish, Baghdad)

Assaf Bar-Moshe, Alexandru Craevschi
download citation
- all files
- 1.5 MB
- source texts
- 1.3 MB
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- 306 MB
- 0.2 MB
- (24/04/24)

Arabic (Kabiye)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- —/—
- archive
- metadata
- —/—
- —/—
- —/—
- (24/07/13)

Arabic (Khuzestan)

Bettina Leitner
download citation
- all files
- 0.5 MB
- source texts
- 0.3 MB
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- 639 MB
- 0.3 MB
- (24/04/24)

C Neo-Aramaic (Mlahso)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.2 MB
- (24/07/13)

C Neo-Aramaic (Turoyo, Midyat)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- 0.5 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.3 MB
- (24/07/13)

NE Neo-Aramaic (Christian, Barwar)

Donald Stilo
download citation
- all files
- 2.7 MB
- source texts
- 2.5 MB
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- 0.2 MB
- (24/07/10)

NE Neo-Aramaic (Christian, Shaqlawa)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.1 MB
- (24/07/13)

NE Neo-Aramaic (Christian, Urmi)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.2 MB
- (24/07/13)

NE Neo-Aramaic (Jewish, Dohok)

Dorota Molin
download citation
- all files
- 0.5 MB
- source texts
- 0.4 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.1 MB
- (24/04/24)

NE Neo-Aramaic (Jewish, Sanandaj)

Paul Noorlander
download citation
- all files
- 1.1 MB
- source texts
- 0.7 MB
- coded values
- 0.8 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.4 MB
- (24/04/24)

NE Neo-Aramaic (Jewish, Urmi)

Paul Noorlander
download citation
- all files
- —/—
- source texts
- —/—
- coded values
- 0.4 MB
- archive
- metadata
- 0.1 MB
- —/—
- 0.2 MB
- (24/07/13)

Armenian

Armenian (Eastern, Agulis)

Katherine Hodgson
download citation
- all files
- 0.8 MB
- source texts
- 0.5 MB
- coded values
- 0.6 MB
- archive
- metadata
- 0.2 MB
- 620 MB
- 0.3 MB
- (24/04/24)

Hellenic

Pontic Greek (Madan)

Katherine Hodgson
download citation
- all files
- 0.3 MB
- source texts
- 0.2 MB
- coded values
- 0.3 MB
- archive
- metadata
- 0.1 MB
- 315 MB
- 0.1 MB
- (24/07/10)

Pontic Greek (Romeyka)

Laurentia Schreiber
download citation
- all files
- 0.1 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 0.1 MB
- (24/04/24)

Indo-Aryan

Kholosi (Kholos)

Maryam Nourzaei
download citation
- all files
- 1.3 MB
- source texts
- 1.1 MB
- coded values
- 0.2 MB
- archive
- metadata
- 0.1 MB
- 234 MB
- 0.1 MB
- (24/04/24)

Publications

Published papers

(NEW!) Craevschi, Alexandru. 2022. Historical contingency and typological tendencies in languages of Western Asia: A quantitative study of word order of non-subject constituents. Unpublished MA thesis, University of Bamberg.

Haig, Geoffrey & Rasekh-Mahand, Mohammad. 2019. Post-predicate elements in Iranian and neighbouring languages: Inheritance, contact, and information structure. Position paper for the project Post-predicate constituents in Iranian and neighbouring languages.

Conference talks

(NEW!) Leitner, Bettina. 2022. Word order in Khuzestani Arabic (with some notes on Bushehr and Hormozgan Arabic). Paper presented at the workshop on Post-predicate elements across the languages of Western Asia: Theoretical and empirical approaches, Bamberg, Germany, 22–23 September 2022.

(NEW!) Rasekh-Mahand, Mohammad. 2022. Forty years after Frommer: Post-predicate elements in Persian. Paper presented at the workshop on Post-predicate elements across the languages of Western Asia: Theoretical and empirical approaches, Bamberg, Germany, 22–23 September 2022.

(NEW!) Schreiber, Laurentia & Janse, Mark. 2022. Word order & post-predicate elements in Romeyka. Paper presented at the workshop on Post-predicate elements across the languages of Western Asia: Theoretical and empirical approaches, Bamberg, Germany, 22–23 September 2022.

Haig, Geoffrey. 2021. Doing corpus-based syntactic typology with spoken language corpora. Workshop held as part of the LILEC Summer School 2021: Catching Language Data, Bologna, Italy, 23–24 April 2021.

Haig, Geoffrey. 2020. Stability and adaptivity of word order in the Western Asian Transition Zone: Evidence from West Iranian. Paper presented at the Workshop on Tracing Contact in Closely Related Languages, Zürich, Switzerland, 19–20 November 2020.

References

Faghiri, Pegah & Samvelian, Pollet & Hemforth, Barbara. 2018. Is there a canonical order in Persian ditransitive constructions? In Korn, Agnes & Malchukov, Andrey (eds.), Ditransitive constructions in a cross-linguistic perspective, 165–186. Wiesbaden: Reichert.

Frommer, Paul. 1981. Post-verbal phenomena in colloquial Persian syntax. PhD dissertation, University of Southern California.

Khan, Geoffrey. 2008. The Neo-Aramaic dialect of Barwar. Leiden: Brill.

Menz, Astrid. 2013. Gagauz. Tehlikedeki Diller Dergisi [Journal of Endangered Languages] 2(2), 55–69.

Contact

For inquiries, please contact Geoffrey Haig. Please direct questions concerning this website to Nils Schiborr.

The resources presented here as well as this page are hosted on the servers of the computing centre of the University of Bamberg. Relevant legal information can be found here.
