Student projects

The following Master's and PhD students are currently working on a project in the field of Digital Linguistics that is (co-)supervised by Andreas Baumann: an MA thesis, a PhD thesis, a Data Analysis Project (Uni Wien), or an Interdisciplinary Project in Data Science (TU Wien).

Project announcements

Data analysis project (SS2025)

Digital language death 2.0

There are approximately 7,000 languages spoken worldwide, many of which face varying degrees of endangerment. In his seminal paper “Digital Language Death”, Kornai (2013, PLOS One, 8(10), e77056) explored the relationship between language stability and the extent to which languages are represented in digital spaces. One of the key findings was that indicators of a language's vitality (the number of speakers, the volume of Wikipedia articles available in that language, and the level of institutional support it receives) are positively correlated. Since the publication of this study over a decade ago, the digital landscape has undergone significant transformation, raising new questions about the evolving relationship between language vitality and digital representation.

In this project, students will revisit and partially replicate Kornai’s (2013) study using updated data from Wikipedia (to be collected during the project) and Ethnologue, a comprehensive language database that provides information on language endangerment levels and speaker populations. Students will employ a combination of web-scraping, information extraction, and statistical modeling techniques to analyze the data. By doing so, they will assess whether the patterns observed by Kornai a decade ago still hold in today’s rapidly changing digital environment and explore potential shifts in the dynamics of language endangerment and digital representation. This project will not only provide insights into the current state of linguistic diversity but also equip students with practical skills in data collection, analysis, and interpretation.

Supervisors: Hannes Fellner, Andreas Baumann

Number of students: 1-3 (in particular DH and DS, but BA students are also welcome)

Prerequisites: Python or R; ideally some experience with text processing, web-scraping (Wikipedia API), and/or regression modeling

Data analysis project (SS2025)

Probing epidemiological survey techniques for linguistic research

To learn how diseases spread through populations, epidemiologists require knowledge of population structure and of the rates of contact events through which diseases can be transmitted. In their pioneering research, Mossong et al. (2008, PLoS Medicine, 5(3), e74) conducted a large-scale diary survey to collect data about contact structure. In this survey, participants were asked to provide information about whom they came into physical contact with, and about the demographics (age, gender) of their contacts. Evidently, knowledge of contact structure is also relevant to the study of language and how it is used by speaker populations. The goal of this data analysis project is to design and run a small-scale survey to collect data about face-to-face and digital linguistic contact events between speakers. The survey will be distributed through the crowd-sourcing platform Prolific Academic. Akin to Mossong et al. (2008), students will compare and analyze the linguistic contact structures in the two domains (face-to-face vs. digital interactions).

Supervisors: Yllka Velaj, Andreas Baumann

Number of students: 1-2 (DH, BA, DS)

Prerequisites: ideally experience with survey design and tools (e.g., SoSciSurvey, Prolific Academic), basic knowledge of statistical data analysis, HTML/XML, and R/Python

Current projects

Jona Hassenbach (MA-thesis project, MA Digital Humanities)

Reception Through Time

In literary history, there are few figures who have been received as frequently as classical characters. However, evaluating a character’s reception history often depends on the person doing the interpretation and can thus be limited by their individual understanding of language. While comparing different interpretations is one way to address this problem, I want to try a different approach: a diachronic emotion analysis using word embeddings from different time periods along with the VAD (Valence-Arousal-Dominance) emotional model. In this way, the resulting VAD scores should better reflect how a text judged a character using the language of its own time. By comparing works from different periods that center on the same group of classical women, I hope to gain new insights into their reception history.

Markus Pluschkovits (PhD project, co-supervised with Alexandra Lenz)

Realizations of the Progressive Aspect in German: Form, Function and Variation

This dissertation project is concerned with the different realizations of progressive aspectuality in contemporary German. Taking a cognitive and sociolinguistic approach, the aim of the project is to use quantitative methodology to investigate the steering factors behind the choice of specific constructions to encode actions being in progress.

Sarah Bloos (MA-thesis project, MA Digital Humanities)

Do you speak “Grant”?

The Viennese Grant is perhaps the most popular sociolinguistic stereotype about the city – but little is known about how it’s really perceived. Collecting data from participants from Austria, Germany and Switzerland, I’m attempting to capture Grant using a dimensional emotion model (VAD). I’m also interested in possible correlations between sociolinguistic variables such as age or gender and the respective perception of Grant, with the eventual aim of identifying culturally or demographically varying clusters of understanding.

Laura Kristen (MA-thesis project, MA Lehramt German & History)

Gender-inclusive language in the Austrian Parliament

My master's thesis investigates the use of gender-inclusive language in speeches held in the Austrian Parliament within a defined period of time. The main aim of this work is to determine the proportion of speeches potentially affected by gender-inclusive language and, furthermore, to analyze the extent to which parliamentary debates reflect gender inequalities.

Claudia Mattes (PhD project, co-supervised with Alexandra Lenz)

The gehören-passive. A corpus linguistic approach to the analytic construction gehören + participle II

The non-canonical passive form, composed of gehören and the past participle of a verb, hasn’t been extensively researched so far. By applying digital methods to different corpora, this thesis aims to better understand the construction in its different aspects, namely its development, its current grammaticalization and its semantic-pragmatic usage.

Past projects

Hannes Essfors (Data analysis project, WS2024, MA Digital Humanities)

Sociophonetic variation in Afrikaans vowel production

This project examined a dataset consisting of acoustic features of vowels (first and second formant) produced by white and colored speakers of Afrikaans, a Germanic language that is spoken (mainly) in South Africa. The data were recorded using two different methods (word lists vs. speech in context) by Daan Wissing (North West University, Potchefstroom, South Africa). Acoustic features had already been extracted for all configurations. The goal of the project was to compare the different configurations to assess (a) whether the examined sociolinguistic variants of Afrikaans differ from each other and (b) to what extent results based on different methodologies match.

Marina Sommer (Interdisciplinary Project in Data Science, MSc Data Science, TU Wien)

An analysis of the development of the German touch verbs ‘anfassen’, ‘angreifen’, ‘anlangen’ with text data from Common Crawl

The aim of my project was to find out if the usage of the German touch verbs "anfassen", "angreifen" and "anlangen" has changed over the last decade. The main focus was on the exploration of the unique data repository and data format of the platform Common Crawl.

Lale Tüver & Katharina Zeh (Data analysis project, SS2024, MA Digital Humanities)

Linguistic Diversity in the Digital Age: Exploring the Effect of Digital Literacy on Minority Languages

Global linguistic diversity has declined, but the mechanisms behind this trend remain unclear. While past research focused on socioeconomic factors, this study examines the role of digital literacy. Limited digital proficiency is hypothesized to marginalize minority languages online. To test this, we compiled global language data and used Shannon entropy to quantify linguistic diversity, analyzing links with digital and demographic indicators. Our findings show that internet access supports linguistic diversity, while education levels negatively impact it. However, digital skills had no significant effect. The study contributes to discussions on language endangerment and suggests directions for future research.
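To illustrate the diversity measure mentioned above, here is a minimal Python sketch of Shannon entropy computed from per-language speaker counts. It is not the project's own code: the function name and the use of raw speaker counts as input are assumptions made for illustration only.

```python
import math


def shannon_entropy(speaker_counts):
    """Shannon entropy (in bits) of a distribution given as raw counts.

    `speaker_counts` is assumed to hold one non-negative number per
    language (e.g. speakers of each language within one country).
    """
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for count in speaker_counts:
        if count > 0:  # languages with zero speakers contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Under this measure, a population in which everyone speaks the same language has entropy 0, and entropy grows as speakers are spread more evenly over more languages.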

Martin Miesbauer (Interdisciplinary Project in Data Science, MSc Data Science, TU Wien)

The role of linguistically encoded emotional characteristics for cooperativeness in the Zurich Tangram Corpus

Research suggests that emotions correlate positively with cooperation in collaborative tasks. This study explores whether emotions can predict cooperativeness using a dataset of cooperative interactions. Emotional states are defined by three dimensions: valence (negative-positive), arousal (calm-agitated), and dominance (submissive-dominant). The study examines the importance of these factors in predicting cooperativeness and analyzes the impact of different measures. Specifically, it focuses on predicting task completion time, which is inversely related to cooperativeness. The findings aim to enhance understanding of how emotional states influence teamwork and task efficiency.
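As a rough illustration of how the three dimensions could be derived from text, the following Python sketch averages word-level VAD scores over an utterance. It is only one possible approach and not necessarily the one used in the study: the function name is hypothetical, and the toy lexicon entries are invented placeholder values, not taken from any real VAD lexicon.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> (valence, arousal, dominance).
# The numbers are invented for illustration only.
TOY_VAD_LEXICON = {
    "great": (0.9, 0.6, 0.7),
    "wrong": (0.2, 0.5, 0.3),
    "triangle": (0.5, 0.3, 0.5),
}


def mean_vad(tokens, lexicon):
    """Average the VAD scores of all tokens found in the lexicon.

    Returns a (valence, arousal, dominance) tuple, or None if no
    token is covered by the lexicon.
    """
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        return None
    # zip(*scores) regroups the triples into one sequence per dimension
    return tuple(mean(dimension) for dimension in zip(*scores))
```

Utterance-level scores of this kind could then serve as predictors in a regression model of task completion time.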