
CCLS2025 Keynote


Maciej Eder: Text Analysis Made Simple (Kind of), or Ten Years of Stylo

Krakow | July 3, 2025

Abstract

The talk will revolve around software designed and developed specifically for text analysis tasks such as classification, clustering, and visualization. Special attention will be paid to the R library Stylo. It was designed as a relatively simple, open-source tool for experiments in authorship attribution, but over the years it has evolved into a fully-fledged set of functions tailored for different applications. These include supervised and unsupervised classification, large-scale analyses following the ‘distant reading’ paradigm, and sequential analysis of consecutive chunks of a given text. Beyond its original domain of authorship attribution, the software can be used to address more general research questions, e.g. to trace genre, gender, chronology, intertextuality, and other stylometric ‘signals’. The talk will look back at the development of Stylo over the past ten years. It will cover lessons learned about tool development, community engagement, and the teaching of text analysis methods. It will also discuss how the tool was adapted as new methodological inspirations and empirical investigations emerged in the fast-developing field of stylometry.

About the speaker

Maciej Eder (maciejeder.org) is a professor of linguistics at the Institute of Polish Language (Polish Academy of Sciences) and a visiting professor of digital humanities at the University of Tartu. His relationship with computational literary studies is long and intense. Among other things, Eder is the principal investigator of the project Computational Literary Studies Infrastructure (CLS INFRA), a co-founder of the Computational Stylistics Group, and the main developer of the R package ‘Stylo’ for performing stylometric analyses. His main research area lies somewhere between literary studies and linguistics and revolves around quantitative approaches to style variation. These approaches include measuring style with statistical methods, authorship attribution based on quantitative measures, and ‘distant reading’ methods for analyzing dozens (or hundreds) of literary works at a time.
