CCLS2024 Keynote


Maciej Eder: Text Analysis Made Simple (Kind of), or Ten Years of Stylo

Vienna | June 13, 2024

Abstract

The talk will revolve around software designed and developed specifically to perform text analysis tasks such as classification, clustering, and visualization. Special attention will be paid to the R library Stylo, which was designed as a relatively simple, open-source tool for conducting experiments in authorship attribution, but has evolved over the years into a fully fledged set of functions tailored for different applications, including supervised and unsupervised classification, large-scale analyses following the ‘distant reading’ paradigm, sequential analysis of successive chunks of a given text, and so forth. Beyond its original realm of authorship attribution, the software can be used to address more general research questions, e.g. to trace genre, gender, chronology, intertextuality, and other stylometric ‘signals’. The talk will look back at the development of Stylo over the past ten years, including lessons learned about tool development, community engagement, and the teaching of text analysis methods, as well as about adapting the tool as new methodological inspirations and empirical investigations emerged in the fast-developing field of stylometry.

Maciej Eder (maciejeder.org) is the director of the Institute of Polish Language (Polish Academy of Sciences), chair of the Committee of Linguistics at the Polish Academy of Sciences, principal investigator of the project Computational Literary Studies Infrastructure, co-founder of the Computational Stylistics Group, and the main developer of the R package ‘Stylo’ for performing stylometric analyses. He is interested in European literature of the Renaissance and the Baroque, classical heritage in early modern literature, and quantitative approaches to style variation. These include measuring style using statistical methods, authorship attribution based on quantitative measures, as well as ‘distant reading’ methods to analyze dozens (or hundreds) of literary works at a time.
