<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2940-1348</journal-id>
<journal-title-group>
<journal-title>Journal of Computational Literary Studies</journal-title>
</journal-title-group>
<issn pub-type="epub">2940-1348</issn>
<publisher>
<publisher-name>Technische Universit&#228;t Darmstadt</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.48694/jcls.4209</article-id>
<article-categories>
<subj-group>
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Exploring Measures of Distinctiveness</article-title>
<subtitle>An Evaluation Using Synthetic Texts</subtitle>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-5545-9562</contrib-id>
<name>
<surname>Havrylash</surname>
<given-names>Julia</given-names>
</name>
<email>julia.havrylash@uni-trier.de</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-4557-2753</contrib-id>
<name>
<surname>Sch&#246;ch</surname>
<given-names>Christof</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Trier Center for Digital Humanities, Trier University <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ror.org/02778hg05">ROR</ext-link>, Trier, Germany</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-11-21">
<day>21</day>
<month>11</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>16</lpage>
<history>
<date date-type="received" iso-8601-date="2025-02-06">
<day>06</day>
<month>02</month>
<year>2025</year>
</date>
<date date-type="accepted" iso-8601-date="2025-10-26">
<day>26</day>
<month>10</month>
<year>2025</year>
</date>
<date date-type="published" iso-8601-date="2025-11-21">
<day>21</day>
<month>11</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2025 The Author(s)</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>The text of this work is released under the Creative Commons license CC BY 4.0 International. You can find the contract text of the license at <uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>. The illustrations are excluded from this license, here the copyright lies with the respective rights holder.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://jcls.io/articles/10.48694/jcls.4209/"/>
<abstract>
<p>Measures of distinctiveness (aka keyness) are important tools for comparing groups of texts to identify each group&#8217;s characteristic features. Evaluating these measures is essential to ensure their reliability and predictability. In our research, we developed and applied a new method for evaluating measures of distinctiveness. Our method uses a synthetically generated, homogeneous text corpus into which we insert an artificial word whose frequency and dispersion are precisely manipulated. This approach allows us to determine each measure&#8217;s sensitivity to variations in frequency and dispersion. Through our evaluation, we have uncovered previously unknown characteristics of these measures. Specifically, we discovered that the TF-IDF-based measure we used is more sensitive to dispersion variations than other dispersion-based measures. Moreover, we found that Eta fails to detect a word with a clear dispersion contrast when it has the same frequency in both the target and comparison groups. As a next step, we aim to explore practical applications of these new insights into measures of distinctiveness.</p>
</abstract>
<kwd-group>
<kwd>evaluation</kwd>
<kwd>measures of distinctiveness</kwd>
<kwd>keyness</kwd>
<kwd>synthetic texts</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="S1">
<title>1. Introduction</title>
<p>Comparing groups of texts to identify what is distinctive about each is a fundamental approach in many research contexts. In computational literary studies, such comparisons are particularly valuable for exploring literary style, genre conventions, authorial voice, or historical shifts in discourse. Contrastive analyses of literary corpora have, for instance, been used to study the speech of different characters in Shakespeare&#8217;s plays (<xref ref-type="bibr" rid="B3">Culpeper 2009</xref>), to investigate gender markers in English-language novels (<xref ref-type="bibr" rid="B24">Weidman and O&#8217;Sullivan 2018</xref>), to determine the place of tragicomedy with respect to comedy and tragedy (<xref ref-type="bibr" rid="B19">Sch&#246;ch 2018</xref>), and to identify phrasal expressions characteristic of subgenres of the contemporary French novel (<xref ref-type="bibr" rid="B10">Gonon et al. 2018</xref>).</p>
<p>A key challenge in this task, alongside selecting appropriate comparison corpora, is finding the most suitable measure and parameters for a specific research question and corpus composition. A wide range of measures is available, and the lists of most distinctive features they identify can vary considerably (as shown, e.g., by <xref ref-type="bibr" rid="B4">Du et al. 2021a</xref> for the Zeta and Eta measures). While, in principle, virtually any countable feature of texts may be submitted to a contrastive statistical analysis in order to identify distinctive features, in this research we focus exclusively on lexical features, specifically on word unigrams. In this paper, we explore and evaluate various measures of distinctiveness, also known as keyness measures, which support such research from a quantitative perspective. Although we do not prescribe a particular measure, our paper offers insights into the characteristics of these measures, helping researchers understand their behavior and anticipate the outcomes of applying different distinctiveness measures in their studies.</p>
<p>In the research reported here, we evaluate measures of distinctiveness through an analysis based on synthetic texts.<xref ref-type="fn" rid="n1">1</xref> We propose a new evaluation method that uses synthetically created text collections whose word frequencies reflect those of a regular corpus built from the same original texts. Studies based on naturally occurring language must work around the fact that the frequency and dispersion of any word vary and are correlated to some extent. Our approach, in contrast, allows precise and independent manipulation of word frequency and dispersion: we insert an artificial word into the synthetic dataset and control both of its properties exactly. By running keyness analyses on these datasets, we aim to systematically uncover the characteristics of different measures and, in particular, to determine how sensitive each measure is to variations in frequency and dispersion. Our method thus reveals new advantages and limitations of distinctiveness measures and allows us to compare their sensitivity to frequency and dispersion variations under consistent conditions.</p>
<p>The structure of our paper is as follows: We begin with an overview of previous work in the evaluation of measures of distinctiveness (<xref ref-type="sec" rid="S2">section 2</xref>). Next, we describe our dataset (<xref ref-type="sec" rid="S3">section 3</xref>) and provide a detailed explanation of our methodology (<xref ref-type="sec" rid="S4">section 4</xref>). We then outline our hypotheses (<xref ref-type="sec" rid="S5">section 5</xref>) and present the results of our evaluation (<xref ref-type="sec" rid="S6">section 6</xref>). Finally, we conclude by summarizing our key findings and discussing potential directions for future research (<xref ref-type="sec" rid="S7">section 7</xref>).</p>
</sec>
<sec id="S2">
<title>2. Previous Work: Evaluation of Keyness Measures</title>
<p>Evaluating measures of distinctiveness is challenging because it is very difficult to create a gold-standard annotation on which performance measures such as precision and recall could be based. Distinctiveness is not an inherent property of a word, nor does it depend only on local context; rather, it can only be detected by considering the entire target corpus in comparison with another corpus. Alternative methods for comparing and evaluating measures of distinctiveness are therefore required, and several studies have attempted to evaluate these measures in different ways.</p>
<p>Kilgarriff (<xref ref-type="bibr" rid="B14">2001</xref>) examined corpus similarity by reviewing the mathematical characteristics of various distinctiveness measures and argued that the chi-squared test is the most suitable for finding the most characteristic words of a corpus. Paquot and Bestgen (<xref ref-type="bibr" rid="B17">2009</xref>) compared three measures in their ability to identify frequent and well-distributed keywords of academic prose as opposed to fictional prose and found that the t-test led to the best results for their task. Lijffijt et al. (<xref ref-type="bibr" rid="B15">2014</xref>) explored a broad array of measures, focusing on their statistical characteristics in order to identify their sensitivity to differences in word frequencies and distributions. The authors randomly split a text corpus into two parts, so that differences between the parts would be minimal, and then tested whether the resulting p-values were uniformly distributed. Egbert and Biber (<xref ref-type="bibr" rid="B8">2019</xref>) introduced a dispersion-based distinctiveness measure that combines a straightforward dispersion metric with a log-likelihood ratio test. They compared the effectiveness of this approach with corpus-frequency methods for identifying distinctive words in online travel blogs and showed that the dispersion-based measure outperformed the other types of measures. S&#246;nning (<xref ref-type="bibr" rid="B22">2023</xref>) evaluated 32 metrics, categorized into four dimensions of keyness. Like the researchers mentioned above, he distinguished between two primary perspectives on keyness: frequency-based and dispersion-based measures. His study assessed how effectively these metrics identify predefined key verbs in academic writing. The results revealed significant differences among the metrics, with the Wilcoxon rank-sum test and dispersion-based measures emerging as the most effective.</p>
<p>The research we report on here also builds on fundamental work on measures of distinctiveness by our <italic>Zeta and Company</italic> project group. We conducted an in-depth analysis of the qualitative characteristics of these measures (<xref ref-type="bibr" rid="B21">Schr&#246;ter et al. 2021</xref>). To enhance accessibility and usability, we implemented nine measures of distinctiveness in the Python package <italic>pydistinto</italic> (<xref ref-type="bibr" rid="B5">Du et al. 2021b</xref>). In Du et al. (<xref ref-type="bibr" rid="B4">2021a</xref>), we then introduced a new dispersion-based measure called Eta and compared it with the existing Zeta measure to highlight the advantages and disadvantages of each. Our group also performed a quantitative evaluation of nine measures on naturally occurring texts, including several dispersion-based measures, using a downstream classification task (<xref ref-type="bibr" rid="B6">Du et al. 2022</xref>). We first used each measure to identify a given number of distinctive words for novels of a specific genre, in comparison to other literary genres. These distinctive words were then used to classify the novels by genre, and the resulting classification accuracy served as a measure of each word list&#8217;s distinctiveness (in the qualitative sense of discriminatory power). We concluded that dispersion-based measures are more effective than frequency-based measures in identifying characteristic words of a target corpus.</p>
<p>Overall, while previous studies have provided valuable insights into distinctiveness measures, their reliance on abstract statistical analyses, intuitive evaluations, or a narrow selection of measures underscores the need for further research. Our study addresses these limitations by introducing a controlled, synthetic approach with precise manipulation of word frequency and dispersion, and by including a wide range of measures, which enables a more systematic and nuanced assessment of their sensitivity. We have already conducted several analyses using naturally occurring texts. With our approach based on synthetic texts, we now aim to test theoretical insights about the measures under strictly controlled conditions, allowing for a clearer understanding of how each measure&#8217;s distinctiveness score behaves.</p>
<p>We believe that a wide variety of evaluation strategies is most likely to yield robust findings, as past experience has shown that even theoretically sound and convincing arguments may not withstand empirical scrutiny, whether quantitative or qualitative (as a case in point, consider investigations of distance-based stylometric authorship attribution; <xref ref-type="bibr" rid="B1">Argamon 2007</xref>; <xref ref-type="bibr" rid="B9">Evert et al. 2017</xref>).</p>
</sec>
<sec id="S3">
<title>3. Data</title>
<p>Our research is conducted on a synthetic text collection generated by random word-level sampling from a corpus of contemporary French novels. The foundation for this corpus is a balanced subset of our larger collection of contemporary French popular novels and consists of 320 novels from the 1980s and 1990s. This custom-built corpus contains equal numbers of novels per decade and per subgenre, the four subgenres being literary fiction, sentimental novels, crime fiction, and science fiction.</p>
<p>The original text corpus comprises approximately 19 million words. We loaded the entire corpus as a single dataset and randomly sampled synthetic &#8216;novels&#8217; from it at the word level, each with a fixed length of 40,000 words. The resulting corpus contains 320 synthetic &#8216;novels&#8217;, matching the number of novels in the original corpus. This approach serves two main objectives. First, it ensures that the generated corpus reflects the word occurrences and frequencies observed in the original corpus. Second, it yields a homogeneous corpus: because each text is sampled from the entire corpus, subgenre differences are deliberately eliminated.</p>
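<p>The word-level sampling described above can be illustrated with a short Python sketch. It is not taken from <italic>pydistinto</italic> or from the code used for this study: the function name <monospace>sample_synthetic_novels</monospace> is hypothetical, and drawing tokens without replacement from the pooled token list is an assumption of the sketch, since this detail is not specified above.</p>
<preformat preformat-type="code">
```python
import random


def sample_synthetic_novels(pooled_lemmas, n_novels, novel_length, seed=0):
    """Draw n_novels pseudo-novels of novel_length tokens from one pooled token list.

    Assumption of this sketch: tokens are drawn without replacement, so every
    synthetic novel mixes words from all source novels and subgenres.
    """
    needed = n_novels * novel_length
    if needed > len(pooled_lemmas):
        raise ValueError("pooled corpus is too small for the requested sample")
    rng = random.Random(seed)
    drawn = rng.sample(pooled_lemmas, needed)  # word-level random sample
    # Cut the sample into consecutive, equally long synthetic 'novels'.
    return [drawn[i * novel_length:(i + 1) * novel_length] for i in range(n_novels)]


# Toy usage: a 400-token pool instead of the ~19 million tokens of the real corpus.
pool = ["le", "la", "chat", "nuit", "police", "amour", "vaisseau", "roman"] * 50
novels = sample_synthetic_novels(pool, n_novels=4, novel_length=80)
print(len(novels), [len(n) for n in novels])  # 4 [80, 80, 80, 80]
```
</preformat>
<p>With the parameters given above (320 &#8216;novels&#8217; of 40,000 words each), such a function would draw 12.8 million tokens from the roughly 19 million tokens of the original corpus.</p>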
</sec>
<sec id="S4">
<title>4. Methods</title>
<p>The objective of our analysis is to uncover hidden properties and limitations of measures of distinctiveness in identifying distinctive words. To this end, we apply each measure to a homogeneous synthetic corpus to which an artificial word with controlled frequency and dispersion has been added. By systematically varying the frequency and dispersion of this word and observing how its keyness rank changes in response, we can determine to what degree a given keyness measure is sensitive to differences in frequency and/or dispersion. We compare ranks rather than the measures&#8217; raw scores because the scores of different measures lie on different scales and are therefore not directly comparable.</p>
<p>We analyzed all nine measures of distinctiveness implemented in our Python package <italic>pydistinto</italic>: Burrows Zeta (<xref ref-type="bibr" rid="B2">Burrows 2007</xref>), logarithmic Zeta (<xref ref-type="bibr" rid="B20">Sch&#246;ch et al. 2018</xref>), Eta (<xref ref-type="bibr" rid="B4">Du et al. 2021a</xref>), TF-IDF (<xref ref-type="bibr" rid="B23">Sp&#228;rck Jones 1972</xref>), the Wilcoxon rank-sum test (<xref ref-type="bibr" rid="B26">Wilcoxon 1945</xref>, <xref ref-type="bibr" rid="B16">Mann and Whitney 1947</xref>), Welch&#8217;s t-test (<xref ref-type="bibr" rid="B25">Welch 1947</xref>), the Ratio of relative frequencies (RRF, <xref ref-type="bibr" rid="B11">Gries 2010</xref>), the Chi-squared test (<xref ref-type="bibr" rid="B18">Plackett 1983</xref>), and the Log-likelihood ratio test (LLR, <xref ref-type="bibr" rid="B7">Dunning 1993</xref>).<xref ref-type="fn" rid="n2">2</xref> Based on how they identify distinctive keywords when comparing a target and a comparison corpus, the implemented measures can be divided into three groups:</p>
<list list-type="order">
<list-item><p>Frequency-based measures: These measures primarily focus on the frequency of the target word in the corpus, treating the corpus as a &#8216;bag of words&#8217; and disregarding how the target word is distributed within the corpus. Examples of measures falling under this classification include the RRF, the Chi-squared test, and the LLR.</p></list-item>
<list-item><p>Distribution-based measures: Rather than considering only corpus-wide mean word frequencies, these measures are based on the distribution of a word in the corpus (described, e.g., via its central tendency and variability). Unlike simpler frequency-based measures, these metrics thus also take variability indicators such as the standard deviation into account. Some of them are also quite flexible in that they do not require a normal distribution, which allows for a more nuanced comparison across different distributions. Welch&#8217;s t-test falls into this category.</p></list-item>
<list-item><p>Dispersion-based measures: These measures evaluate the extent to which the target word is evenly distributed, or dispersed, across a corpus. This category includes Burrows Zeta, logarithmic Zeta, Eta, TF-IDF (our implementation of a TF-IDF-based keyness measure), and the Wilcoxon rank-sum test (with certain restrictions).<xref ref-type="fn" rid="n3">3</xref></p></list-item>
</list>
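<p>To make the three families more concrete, the following Python sketch computes one simple representative of each on toy segment counts: RRF on pooled frequencies, Welch&#8217;s t statistic on per-segment relative frequencies, and Burrows Zeta as the difference between the proportions of segments that contain a word. These are textbook formulations chosen for illustration; the implementations in <italic>pydistinto</italic> may differ (for example through smoothing or log transformation), and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
from statistics import mean, variance

SEGMENT_LENGTH = 5000  # tokens per segment, as in the segmentation used here


def rrf(target_counts, comparison_counts):
    """Frequency-based: ratio of relative frequencies, segments pooled into one bag of words.

    No smoothing in this sketch, so a word absent from the comparison corpus would fail.
    """
    target_rel = sum(target_counts) / (SEGMENT_LENGTH * len(target_counts))
    comparison_rel = sum(comparison_counts) / (SEGMENT_LENGTH * len(comparison_counts))
    return target_rel / comparison_rel


def welch_t(target_counts, comparison_counts):
    """Distribution-based: Welch's t statistic on per-segment relative frequencies."""
    t_rel = [c / SEGMENT_LENGTH for c in target_counts]
    c_rel = [c / SEGMENT_LENGTH for c in comparison_counts]
    std_err = math.sqrt(variance(t_rel) / len(t_rel) + variance(c_rel) / len(c_rel))
    return (mean(t_rel) - mean(c_rel)) / std_err


def burrows_zeta(target_counts, comparison_counts):
    """Dispersion-based: share of target segments containing the word minus the comparison share."""
    share_target = sum(1 for c in target_counts if c > 0) / len(target_counts)
    share_comparison = sum(1 for c in comparison_counts if c > 0) / len(comparison_counts)
    return share_target - share_comparison


# Same total frequency (100) in both corpora: evenly spread vs. concentrated in one segment.
spread = [10] * 10
concentrated = [100] + [0] * 9
print("RRF:", round(rrf(spread, concentrated), 3))
print("Welch t:", round(welch_t(spread, concentrated), 3))
print("Zeta:", round(burrows_zeta(spread, concentrated), 3))
```
</preformat>
<p>For a word with the same total frequency in both corpora, spread evenly in one and concentrated in a single segment in the other, RRF is 1 and the Welch statistic is essentially 0, whereas Zeta is 0.9. The toy case illustrates how a dispersion-based measure can register a contrast that the two other families do not.</p>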
<p>Our procedure was as follows. Because <italic>pydistinto</italic> requires input data in a specific format (CSV files with the columns token, lemma, and POS), the original French corpus was annotated with spaCy before randomization.<xref ref-type="fn" rid="n4">4</xref> For the analysis with <italic>pydistinto</italic>, we used lemmas as the feature type. First, the synthetic corpus was divided into segments of equal length, each containing 5,000 words, resulting in 8 segments per novel and a total of 2,560 segments. This segmentation is essential for calculating certain measures, such as Zeta and Eta, which are computed from word occurrences in individual segments.</p>
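<p>The segmentation step amounts to cutting each synthetic novel into consecutive chunks of 5,000 tokens. The sketch below, with a hypothetical <monospace>segment</monospace> function, reproduces the arithmetic given above. Discarding an incomplete final chunk is an assumption of the sketch; it does not matter here because 40,000 is a multiple of 5,000.</p>
<preformat preformat-type="code">
```python
def segment(tokens, segment_length=5000):
    """Split one novel into consecutive segments of equal length.

    Assumption of this sketch: a trailing remainder shorter than segment_length
    is discarded (40,000-token novels leave no remainder).
    """
    n_full = len(tokens) // segment_length
    return [tokens[i * segment_length:(i + 1) * segment_length] for i in range(n_full)]


novel = ["mot"] * 40000        # placeholder for one synthetic 'novel'
segments = segment(novel)
print(len(segments))           # 8 segments per novel
print(320 * len(segments))     # 2560 segments in the whole corpus
```
</preformat>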
<p>Subsequently, for each run of <italic>pydistinto</italic>, the entire corpus was randomly divided into two sub-corpora of equal size: a target and a comparison corpus. An artificial word was then added to both parts with a specified frequency and dispersion.<xref ref-type="fn" rid="n5">5</xref> To keep the total word count constant, each inserted instance of the artificial word replaces one instance of an existing word in the corpus.</p>
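<p>The random split and the insertion by replacement can be sketched as follows. The helpers <monospace>split_novels</monospace> and <monospace>insert_word</monospace> and the placeholder token <monospace>zzartificiel</monospace> are hypothetical, and choosing segments and positions uniformly at random is an assumption of this sketch. It only illustrates how overwriting existing tokens keeps segment lengths, and hence corpus size, constant.</p>
<preformat preformat-type="code">
```python
import random

ARTIFICIAL_WORD = "zzartificiel"  # hypothetical placeholder token


def split_novels(novels, seed=0):
    """Randomly split the synthetic novels into equal-sized target and comparison halves."""
    rng = random.Random(seed)
    order = list(range(len(novels)))
    rng.shuffle(order)
    half = len(novels) // 2
    target = [novels[i] for i in order[:half]]
    comparison = [novels[i] for i in order[half:]]
    return target, comparison


def insert_word(segments, n_segments, per_segment, word=ARTIFICIAL_WORD, seed=0):
    """Put per_segment copies of word into each of n_segments randomly chosen segments.

    Every copy overwrites an existing token, so segment lengths stay constant.
    """
    rng = random.Random(seed)
    result = [list(seg) for seg in segments]  # work on copies, leave the input unchanged
    for seg_idx in rng.sample(range(len(result)), n_segments):
        # Distinct positions, so exactly per_segment tokens are replaced.
        for pos in rng.sample(range(len(result[seg_idx])), per_segment):
            result[seg_idx][pos] = word
    return result


# Toy example: 20 novels of 2 segments each (50 tokens per segment instead of 5,000).
novels = [[["mot"] * 50, ["mot"] * 50] for _ in range(20)]
target, comparison = split_novels(novels, seed=1)
target_segments = [seg for novel in target for seg in novel]
modified = insert_word(target_segments, n_segments=4, per_segment=5)
print(sum(seg.count(ARTIFICIAL_WORD) for seg in modified))  # 20
print(all(len(seg) == 50 for seg in modified))              # True
```
</preformat>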
<p>Our experiment was conducted in two primary settings to investigate the impact of two criteria &#8211; the frequency and dispersion of the artificial word within a corpus &#8211; on its distinctiveness score, calculated by different measures.</p>
<p>In the first setting, we added the artificial word to only one segment each of the target and the comparison corpus, with varying frequency. This setting allows us to analyze the influence of a single parameter, namely frequency. The frequency of the artificial word was fixed at 10 in the comparison corpus, while it varied from 10 to 2,000 occurrences in the target corpus. We used 12 different frequency settings for the target corpus (10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1,000, and 2,000). For each parameter setting, <italic>pydistinto</italic> was run 100 times. Repeating the runs mitigates the effect of random variation introduced by the sampling procedure, which can give some existing frequent words high scores and thereby affect the resulting rank of the artificial word. For each run, the corpus was randomly divided into target and comparison parts at the level of the &#8216;novels&#8217;. Because the texts were built by randomly sampling words from the entire corpus, and the two subcorpora were built by randomly sampling &#8216;novels&#8217; from among all &#8216;novels&#8217;, any difference between the target and comparison corpora, apart from the artificial word, can only be due to random variation.</p>
<p>In the second setting, we varied the dispersion of the artificial word. Here, its frequency was kept constant at 1,000 occurrences in both the target and the comparison corpus, while its dispersion varied in the target corpus and remained constant in the comparison corpus. The idea was again to isolate one parameter, in this case dispersion, and analyze its influence on the behavior of the different measures. In the comparison corpus, all 1,000 instances of the artificial word were added to a single segment.<xref ref-type="fn" rid="n6">6</xref> In the target corpus, the same total frequency was distributed with varying degrees of dispersion according to the following schema, where the first number refers to the number of segments that receive the artificial word and the second to the number of times the artificial word is inserted into each of these segments: 1/1000, 2/500, 5/200, 10/100, 20/50, 50/20, 100/10, 200/5, 500/2, 1000/1. The product of the two values, and therefore the total frequency, remains constant at 1,000 (identical to the frequency of the word in the comparison corpus), while the number of segments across which these occurrences are spread varies systematically. This resulted in a total of 10 parameter settings for the dispersion experiments. Again, <italic>pydistinto</italic> was run 100 times for each parameter setting.</p>
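<p>Both experimental designs can be summarized as a small parameter grid. In this illustrative sketch, each setting is encoded as a pair (number of segments, occurrences per segment) for the target and the comparison corpus. The encoding and the names are hypothetical, but the values are those listed above; the assertions check that the total frequency stays at 1,000 and that 1,000 segments fit into the 1,280 segments of one half of the corpus.</p>
<preformat preformat-type="code">
```python
FREQUENCY_SETTINGS = [10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 2000]
DISPERSION_SETTINGS = [(1, 1000), (2, 500), (5, 200), (10, 100), (20, 50),
                       (50, 20), (100, 10), (200, 5), (500, 2), (1000, 1)]
RUNS_PER_SETTING = 100
TARGET_SEGMENTS = 2560 // 2  # segments available in each half of the corpus


def build_settings():
    """List every setting as (segments, occurrences per segment) for target and comparison."""
    settings = []
    for freq in FREQUENCY_SETTINGS:
        # First setting: one segment on each side; only the target frequency varies.
        settings.append({"experiment": "frequency", "target": (1, freq), "comparison": (1, 10)})
    for n_segments, per_segment in DISPERSION_SETTINGS:
        # Second setting: 1,000 occurrences on each side; only the target dispersion varies.
        settings.append({"experiment": "dispersion", "target": (n_segments, per_segment),
                         "comparison": (1, 1000)})
    return settings


settings = build_settings()
assert all(n * k == 1000 for n, k in DISPERSION_SETTINGS)        # total frequency constant
assert TARGET_SEGMENTS >= max(n for n, _ in DISPERSION_SETTINGS)  # 1,000 segments fit into 1,280
print(len(settings), len(settings) * RUNS_PER_SETTING)            # 22 2200
```
</preformat>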
<p>Following this step, the results for each parameter setting were combined into a single dataframe. For each measure, all words in the corpus were then sorted by their distinctiveness scores, and the rank of the artificial word was recorded. The performance of each measure was evaluated based on this rank (a rank of 1 indicates the highest distinctiveness score).</p>
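<p>The rank extraction can be illustrated with a minimal sketch on made-up scores. The function names and the placeholder token <monospace>zzartificiel</monospace> are hypothetical, and ties are simply broken by sort order, an assumption since tie handling is not described above. Taking the median over runs corresponds to the central line of the boxplots used to report the results.</p>
<preformat preformat-type="code">
```python
from statistics import median


def rank_of(word, scores):
    """Rank of word when all words are sorted by descending score (rank 1 = most distinctive).

    Ties keep their original order here, which is a simplification.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(word) + 1


def median_rank(word, runs):
    """Median rank of word across repeated runs for one measure and one parameter setting."""
    return median(rank_of(word, scores) for scores in runs)


# Made-up scores from three runs of one measure.
runs = [
    {"zzartificiel": 3.0, "nuit": 5.0, "chat": 1.0},
    {"zzartificiel": 6.0, "nuit": 5.0, "chat": 1.0},
    {"zzartificiel": 4.0, "nuit": 2.0, "chat": 4.5},
]
print([rank_of("zzartificiel", s) for s in runs])  # [2, 1, 2]
print(median_rank("zzartificiel", runs))           # 2
```
</preformat>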
</sec>
<sec id="S5">
<title>5. Hypotheses</title>
<p>For this evaluation experiment, we developed the following hypotheses:</p>
<disp-quote>
<p><bold>Hypothesis 1.</bold> For dispersion-based measures (Eta, Zeta, logarithmic Zeta, and the Wilcoxon rank-sum test), we hypothesize that the scores should not vary when frequency changes while dispersion remains constant.</p>
<p><bold>Hypothesis 2.</bold> However, dispersion-based measures should be sensitive even to minimal variations in dispersion when frequency remains constant, as the number of segments containing the target word is crucial for their calculation.</p>
<p><bold>Hypothesis 3.</bold> We hypothesize that frequency-based measures (RRF, LLR, and the chi-squared test) will show large variations in distinctiveness scores even when the frequency difference of the artificial word between the target and comparison corpus is relatively small. This assumption stems from the statistical nature of these measures, which treat a corpus as a bag of words and do not account for word dispersion.</p>
<p><bold>Hypothesis 4.</bold> When the frequency of the artificial word is the same in both the target and the comparison corpus while its dispersion changes, the scores of frequency-based measures should remain unchanged.</p>
<p><bold>Hypothesis 5.</bold> We expect our TF-IDF-based measure to exhibit moderate sensitivity to both frequency and dispersion manipulations, because TF-IDF is based on term frequency, but the number of segments containing the target word also significantly influences its calculation.</p>
<p><bold>Hypothesis 6.</bold> For Welch&#8217;s t-test, we hypothesize that frequency manipulation will cause only minimal variations in the score. This assumption is based on the fact that Welch&#8217;s t-test relies on the mean and standard deviation of the word&#8217;s frequency distribution across segments rather than on its raw total frequency.</p>
</disp-quote>
</sec>
<sec id="S6">
<title>6. Results</title>
<p>Because our corpus is based on naturally occurring word frequencies, we conducted an additional analysis of the synthetic texts without the artificial word in order to identify potential artifacts caused by random sampling. This analysis aimed to determine how much the frequencies of existing words differ between target and comparison corpora across multiple runs.</p>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> illustrates the relationship between rank and Ratio of Relative Frequencies (RRF) scores, based on 100 runs of randomly sampled synthetic corpora. As shown, the first rank is typically achieved with RRF scores between 10 and 18. This suggests that, owing to natural variation in the frequencies of existing words, an artificial word with an RRF score below 10 is unlikely to reach the first rank.</p>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> The correlation between the RRF score of the words and their ranks in the synthetic corpus.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-4-1-4209-g1.png"/>
</fig>
<p>As discussed in <xref ref-type="sec" rid="S4">section 4</xref>, we conducted our evaluation in two main settings: frequency variation and dispersion variation of an artificial word. We first discuss the results for frequency variation and then turn to the results for dispersion variation.</p>
<sec id="S6.1">
<title>6.1. Evaluation Based on Frequency Variations</title>
<p>To assess the impact of frequency variation on the performance of the measures, we used the 12 frequency settings described in <xref ref-type="sec" rid="S4">section 4</xref>. <xref ref-type="fig" rid="F2">Figure 2</xref> shows how the rank of the artificial word, as calculated by Zeta, Eta, the rank-sum test, and Welch&#8217;s t-test, varies with its frequency in the target corpus.<xref ref-type="fn" rid="n7">7</xref> The x-axis represents the frequency of the artificial word in the target corpus (from 20 to 2,000 occurrences in one segment of the target corpus). The y-axis shows the rank of the artificial word; each boxplot shows the median and range of the 100 ranks recorded for the respective parameter setting. To enhance readability, the y-axis uses a logarithmic scale.</p>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> The relation between the frequency of the artificial word in the target corpus and its rank in the results, for Zeta, Eta, rank-sum test and Welch&#8217;s test.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-4-1-4209-g2.png"/>
</fig>
<p>The dispersion-based measures (Zeta, logarithmic Zeta, Eta, and the Wilcoxon rank-sum test) and Welch&#8217;s t-test, which we classify as distribution-based, yield very similar results. For these measures, variation in the frequency of the artificial word in the target corpus plays no important role. The rank of the artificial word consistently exceeds 10,000 for frequencies between 20 and 2,000 in the target corpus, indicating a very low distinctiveness score according to these measures. The scores for Eta, Zeta, logarithmic Zeta, and the Wilcoxon rank-sum test remain stable across all settings, supporting Hypothesis 1 and validating our method. The scores from Welch&#8217;s t-test show only minimal variation, as predicted by Hypothesis 6.</p>
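<p>Why Zeta stays constant and Welch&#8217;s t-test varies so little in this setting can be made plausible with a short derivation. It rests on simplifying assumptions that are not taken from <italic>pydistinto</italic>: Burrows Zeta is treated as the difference between the proportions of target and comparison segments containing the word, and Welch&#8217;s t statistic is computed from per-segment relative frequencies. Let n = 1,280 be the number of segments per subcorpus, and let x = f/5,000 and c = 10/5,000 be the relative frequencies of the artificial word in the single target and comparison segment that contain it (zero in all other segments).</p>
<preformat preformat-type="math">
```latex
% First setting: n = 1280 segments per subcorpus; the word occurs in exactly one segment
% on each side, with relative frequency x (target) and c (comparison), 0 elsewhere.
Z = \frac{1}{n} - \frac{1}{n} = 0

\bar{x}_T = \frac{x}{n}, \qquad \bar{x}_C = \frac{c}{n}, \qquad
s_T^2 = \frac{x^2 - n\,\bar{x}_T^{2}}{n - 1} = \frac{x^2}{n}, \qquad
s_C^2 = \frac{c^2}{n}

t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{s_T^2 / n + s_C^2 / n}}
  = \frac{(x - c)/n}{\sqrt{(x^2 + c^2)/n^2}}
  = \frac{x - c}{\sqrt{x^2 + c^2}}
```
</preformat>
<p>Under these assumptions, Zeta is zero for every target frequency, while the Welch statistic rises from 0 (f = 10) to about 0.45 (f = 20) and about 0.995 (f = 2,000) but never reaches 1, because x - c is always smaller than the square root of x<sup>2</sup> + c<sup>2</sup>. Since t is scale-invariant and all segments have the same length, raw per-segment counts give the same values. This is consistent with the low ranks observed for both measures, although the actual <italic>pydistinto</italic> implementations may differ in detail.</p>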
<p>Frequency-based measures such as the chi-squared test, LLR, and RRF are highly sensitive to frequency variation, as expected, supporting Hypothesis 3. Some of these results, however, deserve closer attention. For RRF, the artificial word moves up in rank as its frequency increases from 20 to 100 (<xref ref-type="fig" rid="F3">Figure 3</xref>). From 200 occurrences in the target corpus onward, its RRF-based rank is always 1, meaning that the artificial word receives the highest score among all words in the corpus. LLR and the chi-squared test are even more sensitive to frequency variation than RRF: from a frequency of just 40 onward, the artificial word consistently achieves the top rank.</p>
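<p>A back-of-the-envelope calculation helps to relate these thresholds to <xref ref-type="fig" rid="F1">Figure 1</xref>. The sketch below uses plain textbook formulas (RRF without smoothing and Dunning&#8217;s G<sup>2</sup> on a 2&#215;2 contingency table), which may differ from the <italic>pydistinto</italic> implementations, and assumes that each subcorpus holds 1,280 segments of 5,000 tokens, i.e., 6.4 million tokens. The comparison frequency is fixed at 10, as in the first setting; the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math

# Assumption: each half of the corpus holds 1,280 segments of 5,000 tokens.
N_TARGET = N_COMPARISON = 1280 * 5000


def rrf(f_target, f_comparison, n_target=N_TARGET, n_comparison=N_COMPARISON):
    """Ratio of relative frequencies (no smoothing; a zero comparison count would fail)."""
    return (f_target / n_target) / (f_comparison / n_comparison)


def llr(f_target, f_comparison, n_target=N_TARGET, n_comparison=N_COMPARISON):
    """Dunning's log-likelihood ratio G2 for the 2x2 table (word vs. other tokens, target vs. comparison)."""
    total = n_target + n_comparison
    word_total = f_target + f_comparison
    observed = [f_target, f_comparison, n_target - f_target, n_comparison - f_comparison]
    expected = [
        n_target * word_total / total,
        n_comparison * word_total / total,
        n_target * (total - word_total) / total,
        n_comparison * (total - word_total) / total,
    ]
    # Cells with zero observations contribute nothing to G2.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


for f in [10, 20, 40, 100, 200, 2000]:
    print(f"f={f:5d}  RRF={rrf(f, 10):7.1f}  LLR={llr(f, 10):9.2f}")
```
</preformat>
<p>Because both subcorpora have the same size, RRF reduces to the target frequency divided by 10: it is 10 at 100 occurrences and 20 at 200, which lies above the range of 10 to 18 within which existing words typically reach the first rank. This is consistent with the artificial word always ranking first from 200 occurrences onward. The LLR value, in contrast, grows with the absolute frequency difference and already reaches about 19 at a frequency of 40.</p>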
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> The relation between the frequency of the artificial word in the target corpus and its rank in the results, for RRF, chi-squared test, LLR and TF-IDF.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-4-1-4209-g3.png"/>
</fig>
<p>TF-IDF is more sensitive to frequency variation than the dispersion-based measures but considerably less so than the frequency-based measures, in line with Hypothesis 5. As the frequency of the artificial word in the target corpus increases, its rank moves up: <xref ref-type="fig" rid="F3">Figure 3</xref> shows a moderate but continuous rise in the rank of the artificial word.</p>
</sec>
<sec id="S6.2">
<title>6.2. Evaluation Based on Dispersion Variations</title>
<p>As previously described, the dispersion analysis was conducted with 1,000 instances of the artificial word in one segment of the comparison corpus. <xref ref-type="fig" rid="F4">Figure 4</xref> illustrates the variation in the rank of the artificial word as calculated by the chi-squared test, LLR, RRF, and TF-IDF. The x-axis shows the dispersion of the artificial word in the target corpus, from 1/1000 to 1000/1, where the first number represents the number of segments containing the artificial word and the second number the number of occurrences in each of these segments. The dispersion of the artificial word in the comparison corpus remains constant at 1/1000, i.e., 1,000 occurrences in a single segment.</p>
<fig id="F4">
<caption>
<p><bold>Figure 4:</bold> The relation between the dispersion of the artificial word in the target corpus and its rank in the results, for RRF, chi-squared test, LLR and TF-IDF. Dispersion in the comparison corpus is fixed at 1/1000.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-4-1-4209-g4.png"/>
</fig>
<p>In these settings, the frequency-based measures produce the results predicted by Hypothesis 4: when the dispersion changes while the frequency remains constant, the rank of the artificial word does not change substantially and remains between 10,000 and 100,000.</p>
<p>The exception is TF-IDF, which produced notable results. We had anticipated that, as the dispersion becomes more even, the artificial word would receive a higher score, but with only a moderate rank improvement compared to the other dispersion-based measures. TF-IDF scores do indeed increase as the number of segments containing the artificial word rises. However, the improvement is not moderate; rather, TF-IDF appears to be highly sensitive to variations in dispersion, which partially rejects Hypothesis 5. The artificial word already achieves the top rank at a dispersion of 10/100, that is, 100 instances in each of 10 segments (<xref ref-type="fig" rid="F4">Figure 4</xref>, bottom right). This oversensitivity implies that TF-IDF fails to distinguish between a dispersion of 100 instances in each of 10 segments and one instance in each of 1,000 segments.</p>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> shows the results of the dispersion variations for Zeta_log, Eta, the rank-sum test and Welch&#8217;s test. Eta produced an unexpected result. As it is a dispersion-based measure, we expected Eta to identify the artificial word as distinctive, especially when the word is evenly spread across a large number of segments. However, as the number of target-corpus segments containing the artificial word increases, its scores remain consistently low compared to those of the randomly assigned words. Only in the most extreme setting, with one occurrence in each of 1,000 segments of the target corpus (1000/1), does the artificial word receive the top rank (<xref ref-type="fig" rid="F5">Figure 5</xref>).</p>
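<p>One way to see how a dispersion statistic can change abruptly rather than gradually is to compute Gries&#8217;s deviation of proportions (DP), a standard dispersion measure, for the settings used here. The Python sketch below assumes, purely for illustration, that Eta&#8217;s dispersion component behaves like DP; it is not the <italic>pydistinto</italic> implementation of Eta. It further assumes a hypothetical target corpus of 1,000 equally long segments, and the helper <monospace>setting_counts</monospace> is likewise hypothetical.</p>
<preformat preformat-type="code">
```python
def deviation_of_proportions(counts):
    """Gries's DP for one word, given its count in each segment of a corpus.

    Assumes all segments have the same length, so each segment's expected
    share of the word's occurrences is 1 / number_of_segments.
    0 means perfectly even dispersion; values near 1 mean extreme clumping.
    """
    total = sum(counts)
    expected_share = 1 / len(counts)
    return 0.5 * sum(abs(count / total - expected_share) for count in counts)


def setting_counts(n_segments_total, k, per_segment):
    """Per-segment counts for the setting k/per_segment (hypothetical helper)."""
    return [per_segment] * k + [0] * (n_segments_total - k)


# Hypothetical target corpus of 1,000 equally long segments.
N_SEGMENTS = 1000
dp_values = {k: deviation_of_proportions(setting_counts(N_SEGMENTS, k, 1000 // k))
             for k in (1, 10, 100, 1000)}
print(dp_values)
```
</preformat>
<p>Under these assumptions DP works out to 1 minus k/1,000, i.e. about 0.999, 0.99, 0.9 and 0 for the four settings: it changes little until the word is spread over almost all segments. Whether this accounts for the behaviour of Eta depends on the actual definition of Eta given in Du et al. (<xref ref-type="bibr" rid="B4">2021a</xref>).</p>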
<fig id="F5">
<caption>
<p><bold>Figure 5:</bold> The relation between the dispersion of the artificial word in the target corpus and its rank in the results, for Zeta_log, Eta, the rank-sum test and Welch&#8217;s test. Dispersion in the comparison corpus is constant at 1/1000.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-4-1-4209-g5.png"/>
</fig>
<p>As for Welch&#8217;s test, when the frequency of the artificial word is identical in the target and comparison corpora, the score consistently remains zero, resulting in a rank beyond 10,000. This indicates that, like the frequency-based measures, Welch&#8217;s test is not sensitive to variations in dispersion in our settings. Taken together with the results of the frequency analysis, this means that Welch&#8217;s test is sensitive neither to frequency variations alone nor to dispersion variations alone.</p>
<p>The remaining dispersion-based measures, namely both variants of Zeta and the rank-sum test, produce the expected results: as the number of target-corpus segments containing the artificial word increases, the artificial word&#8217;s rank moves up. From the setting 100/10 onwards (10 instances in each of 100 segments), the artificial word consistently receives the top rank with all three measures (<xref ref-type="fig" rid="F5">Figure 5</xref>). Hypothesis 2 is therefore supported only for these three measures.</p>
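<p>The reason for this behaviour of Welch&#8217;s test can be illustrated with a small Python sketch that computes Welch&#8217;s t statistic on per-segment counts of the artificial word. It assumes, hypothetically, that both corpora consist of 1,000 segments of equal length, so that counts are proportional to relative frequencies; the helper <monospace>setting_counts</monospace> is also hypothetical. A library routine such as SciPy&#8217;s <monospace>ttest_ind</monospace> with <monospace>equal_var=False</monospace> computes the same statistic. Because the numerator of t is the difference between the two mean per-segment frequencies, t is zero whenever the word&#8217;s total frequency is the same in both corpora, however differently it is dispersed.</p>
<preformat preformat-type="code">
```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances
    (sample variances, i.e. n - 1 in the denominator).
    Raises ZeroDivisionError if both samples have zero variance."""
    standard_error = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def setting_counts(n_segments_total, k, per_segment):
    """Per-segment counts for the setting k/per_segment (hypothetical helper)."""
    return [per_segment] * k + [0] * (n_segments_total - k)


# Hypothetical: both corpora have 1,000 equally long segments; the comparison
# corpus always has the setting 1/1000.
comparison = setting_counts(1000, 1, 1000)
t_values = {k: welch_t(setting_counts(1000, k, 1000 // k), comparison)
            for k in (1, 10, 100, 1000)}
print(t_values)  # t is 0 in every setting: the means are equal

# Doubling the target frequency (2 occurrences in each of 1,000 segments)
# changes the numerator, so t is no longer zero.
t_doubled = welch_t(setting_counts(1000, 1000, 2), comparison)
```
</preformat>
<p>Dispersion enters the statistic only through the variances in the denominator, which rescale a non-zero difference but cannot create one.</p>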
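<p>In its basic form, Zeta is the difference between the proportions of target and comparison segments that contain a word; the log-based variant Zeta_log is not reproduced here. The Python sketch below computes basic Zeta for the dispersion settings, assuming a hypothetical 1,000 segments per corpus and a hypothetical helper <monospace>setting_counts</monospace>. Because Zeta only registers whether a segment contains the word, not how often, its value rises steadily with the number of target segments containing the artificial word.</p>
<preformat preformat-type="code">
```python
def document_proportion(counts):
    """Share of segments in which the word occurs at least once."""
    return sum(1 for count in counts if count > 0) / len(counts)


def zeta(target_counts, comparison_counts):
    """Basic Zeta: difference of document proportions, ranging from -1 to 1."""
    return document_proportion(target_counts) - document_proportion(comparison_counts)


def setting_counts(n_segments_total, k, per_segment):
    """Per-segment counts for the setting k/per_segment (hypothetical helper)."""
    return [per_segment] * k + [0] * (n_segments_total - k)


# Hypothetical: 1,000 segments per corpus; comparison corpus fixed at 1/1000.
comparison = setting_counts(1000, 1, 1000)
zeta_values = {k: zeta(setting_counts(1000, k, 1000 // k), comparison)
               for k in (1, 10, 100, 1000)}
print(zeta_values)
```
</preformat>
<p>Under these assumptions the values are approximately 0, 0.009, 0.099 and 0.999 for the settings 1/1000, 10/100, 100/10 and 1000/1.</p>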
</sec>
</sec>
<sec id="S7">
<title>7. Conclusion</title>
<p>By analyzing measures of distinctiveness on synthetic texts, we created controlled conditions for uncovering otherwise hidden properties of a range of such measures. Our experiments tested the sensitivity of these measures to variations in the frequency and dispersion of a specific word. In many cases, our hypotheses regarding the performance of the measures were confirmed: frequency-based measures are not sensitive to variations in dispersion, while dispersion-based measures are not affected by frequency variations. These observations are not surprising, of course, but they do validate our method.</p>
<p>However, some hypotheses were partly rejected, and we also uncovered some previously unknown (or at least undocumented) properties of measures of distinctiveness. In particular, we found that the LLR and chi-squared tests are even more sensitive to frequency variation than RRF. For this reason, we generally do not recommend the LLR and chi-squared tests for keyness analyses aimed at identifying important content words. Both Zeta variants and the rank-sum test produced similar scores and were able to detect distinctive words, including cases in which the differences concern only the dispersion of words. Moreover, we found that TF-IDF is highly sensitive to differences in the dispersion of the target word compared to the other dispersion-based measures. Finally, we found that Eta fails to detect a word with a clear contrast in dispersion when its frequency is the same in the target and comparison corpora, except in the most extreme setting. In our evaluation, words moved up in rank steadily with Zeta and the rank-sum test, while TF-IDF and Eta showed more abrupt increases. We suggest that a gradual, continuous rank improvement is a desirable characteristic of a distinctiveness measure, as it indicates better sensitivity to slight variations in dispersion and is likely to produce more predictable results. Consequently, a researcher who wants to identify words whose dispersion contrasts between two subcorpora, irrespective of their frequency, would be best served by Zeta or the rank-sum test.</p>
<p>These analyses also leave considerable room for future work. One key step is to extend our framework by implementing additional measures of distinctiveness. Another is to expand our analysis with parameter settings that combine variations in the frequency and dispersion of the artificial word. Isolating dispersion or frequency often results in constant scores, but combining the two parameters promises to uncover further properties of these measures. A final, crucial step is to explore practical applications of these findings. Understanding the contexts in which each measure can be used most effectively should open up new possibilities and enable researchers to analyze and compare textual corpora more accurately.</p>
</sec>
</body>
<back>
<sec id="S8">
<title>8. Data Availability</title>
<p>Data can be found here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://github.com/Zeta-and-Company/synthetic_texts_evaluation">https://github.com/Zeta-and-Company/synthetic_texts_evaluation</ext-link>. It has been archived and is persistently available at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.15525428">https://doi.org/10.5281/zenodo.15525428</ext-link>.</p>
</sec>
<sec id="S9">
<title>9. Software Availability</title>
<p>Software can be found here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://github.com/Zeta-and-Company/synthetic_texts_evaluation">https://github.com/Zeta-and-Company/synthetic_texts_evaluation</ext-link>. It has been archived and is persistently available at: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.15525428">https://doi.org/10.5281/zenodo.15525428</ext-link>.</p>
</sec>
<sec id="S11">
<title>10. Author Contributions</title>
<p><bold>Julia Havrylash:</bold> Conceptualization, Data Curation, Methodology, Formal Analysis, Software, Visualisation, Writing &#8211; original draft, Writing &#8211; review &amp; editing</p>
<p><bold>Christof Sch&#246;ch:</bold> Funding Acquisition, Supervision, Visualisation, Writing &#8211; review &amp; editing</p>
</sec>
<fn-group>
<fn id="n1"><p>We use the term &#8216;synthetic texts&#8217; to describe texts that have been generated from documents written by humans through a specific word-level sampling procedure. These texts are therefore different both from &#8216;naturally-occurring&#8217; text and from text generated using generative LLMs.</p></fn>
<fn id="n2"><p>More information about our rationale for implementing this set of measures in <italic>pydistinto</italic>, as well as detailed descriptions of each measure, can be found in <xref ref-type="bibr" rid="B6">Du et al. 2022</xref>.</p></fn>
<fn id="n3"><p>Note that these latter measures are based on measures of dispersion that are not entirely uncorrelated with frequency (see e.g. <xref ref-type="bibr" rid="B12">Gries 2022</xref>). Detailed information about these measures can be found in Du et al. (<xref ref-type="bibr" rid="B6">2022</xref>).</p></fn>
<fn id="n4"><p>See: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://spacy.io/">https://spacy.io/</ext-link> and Honnibal et al. (<xref ref-type="bibr" rid="B13">2020</xref>).</p></fn>
<fn id="n5"><p>An artificial word is a specially created combination of letters and numbers that cannot occur in any natural language. An example of an artificial word used in this study is untuniutntrng55886.</p></fn>
<fn id="n6"><p>In the dispersion analysis, we also tested another scenario, in which we randomly selected 1,000 segments of the comparison corpus and added one instance of the artificial word to each of them, ensuring even dispersion. However, this scenario did not provide any significant additional insights, so we do not report its results here.</p></fn>
<fn id="n7"><p>Zeta and Eta_log are not depicted in the figure, because their results are very similar to the Zeta_log results.</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><mixed-citation publication-type="journal"><string-name><surname>Argamon</surname>, <given-names>Shlomo</given-names></string-name> (<year>2007</year>). <article-title>&#8220;Interpreting Burrows&#8217;s Delta: Geometric and Probabilistic Foundations&#8221;</article-title>. In: <source>Literary and Linguistic Computing</source> <volume>23</volume> (<issue>2</issue>), <fpage>131</fpage>&#8211;<lpage>147</lpage>. <pub-id pub-id-type="doi">10.1093/llc/fqn003</pub-id>.</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><string-name><surname>Burrows</surname>, <given-names>John</given-names></string-name> (<year>2007</year>). <article-title>&#8220;All the Way Through: Testing for Authorship in Different Frequency Strata&#8221;</article-title>. In: <source>Literary and Linguistic Computing</source> <volume>22</volume> (<issue>1</issue>), <fpage>27</fpage>&#8211;<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1093/llc/fqi067</pub-id>.</mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><string-name><surname>Culpeper</surname>, <given-names>Jonathan</given-names></string-name> (<year>2009</year>). <article-title>&#8220;Keyness: Words, Parts-of-Speech and Semantic Categories in the Character-Talk of Shakespeare&#8217;s <italic>Romeo and Juliet</italic>&#8221;</article-title>. In: <source>International Journal of Corpus Linguistics</source> <volume>14</volume> (<issue>1</issue>), <fpage>29</fpage>&#8211;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1075/ijcl.14.1.03cul</pub-id>.</mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="webpage"><string-name><surname>Du</surname>, <given-names>Keli</given-names></string-name>, <string-name><given-names>Julia</given-names> <surname>Dudar</surname></string-name>, <string-name><given-names>Cora</given-names> <surname>Rok</surname></string-name>, and <string-name><given-names>Christof</given-names> <surname>Sch&#246;ch</surname></string-name> (<year>2021a</year>). <article-title>&#8220;Zeta &amp; Eta: An Exploration and Evaluation of Two Dispersion-based Measures of Distinctiveness&#8221;</article-title>. In: <source>Proceedings of Computational Humanities Research 2021</source>. Ed. by <string-name><given-names>Maud</given-names> <surname>Ehrmann</surname></string-name>, <string-name><given-names>Folgert</given-names> <surname>Karsdorp</surname></string-name>, <string-name><given-names>Melvin</given-names> <surname>Wevers</surname></string-name>, <string-name><given-names>Tara Lee</given-names> <surname>Andrews</surname></string-name>, <string-name><given-names>Manuel</given-names> <surname>Burghardt</surname></string-name>, <string-name><given-names>Mike</given-names> <surname>Kestemont</surname></string-name>, <string-name><given-names>Enrique</given-names> <surname>Manjavacas</surname></string-name>, <string-name><given-names>Michael</given-names> <surname>Piotrowski</surname></string-name>, and <string-name><given-names>Joris</given-names> <surname>van Zundert</surname></string-name>, <fpage>181</fpage>&#8211;<lpage>194</lpage>. <uri>http://ceur-ws.org/Vol-2989/short_paper11.pdf</uri> (visited on 10/14/2025).</mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="book"><string-name><surname>Du</surname>, <given-names>Keli</given-names></string-name>, <string-name><given-names>Julia</given-names> <surname>Dudar</surname></string-name>, and <string-name><given-names>Christof</given-names> <surname>Sch&#246;ch</surname></string-name> (<year>2021b</year>). <source>Pydistinto - a Python Implementation of Different Measures of Distinctiveness for Contrastive Text Analysis</source>. Version v0.1.1. <publisher-name>Zenodo</publisher-name>. <pub-id pub-id-type="doi">10.5281/zenodo.5245096</pub-id>.</mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><string-name><surname>Du</surname>, <given-names>Keli</given-names></string-name>, <string-name><given-names>Julia</given-names> <surname>Dudar</surname></string-name>, and <string-name><given-names>Christof</given-names> <surname>Sch&#246;ch</surname></string-name> (<year>2022</year>). <article-title>&#8220;Evaluation of Measures of Distinctiveness: Classification of Literary Texts on the Basis of Distinctive Words&#8221;</article-title>. In: <source>Journal of Computational Literary Studies</source> <volume>1</volume> (<issue>1</issue>). <pub-id pub-id-type="doi">10.48694/jcls.102</pub-id>.</mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="webpage"><string-name><surname>Dunning</surname>, <given-names>Ted</given-names></string-name> (<year>1993</year>). <article-title>&#8220;Accurate Methods for the Statistics of Surprise and Coincidence&#8221;</article-title>. In: <source>Computational Linguistics</source> <volume>19</volume> (<issue>1</issue>), <fpage>61</fpage>&#8211;<lpage>74</lpage>. <uri>http://aclweb.org/anthology/J93-1003</uri> (visited on 10/16/2025).</mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><string-name><surname>Egbert</surname>, <given-names>Jesse</given-names></string-name> and <string-name><given-names>Doug</given-names> <surname>Biber</surname></string-name> (<year>2019</year>). <article-title>&#8220;Incorporating Text Dispersion into Keyword Analyses&#8221;</article-title>. In: <source>Corpora</source> <volume>14</volume> (<issue>1</issue>), <fpage>77</fpage>&#8211;<lpage>104</lpage>. <pub-id pub-id-type="doi">10.3366/cor.2019.0162</pub-id>.</mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><string-name><surname>Evert</surname>, <given-names>Stefan</given-names></string-name>, <string-name><given-names>Fotis</given-names> <surname>Jannidis</surname></string-name>, <string-name><given-names>Thomas</given-names> <surname>Proisl</surname></string-name>, <string-name><given-names>Steffen</given-names> <surname>Pielstr&#246;m</surname></string-name>, <string-name><given-names>Thorsten</given-names> <surname>Vitt</surname></string-name>, <string-name><given-names>Christof</given-names> <surname>Sch&#246;ch</surname></string-name>, and <string-name><given-names>Isabella</given-names> <surname>Reger</surname></string-name> (<year>2017</year>). <article-title>&#8220;Understanding and Explaining Distance Measures for Authorship Attribution&#8221;</article-title>. In: <source>Digital Scholarship in the Humanities</source> <volume>32</volume> (<issue>suppl_2</issue>). <pub-id pub-id-type="doi">10.1093/llc/fqx023</pub-id>.</mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="book"><string-name><surname>Gonon</surname>, <given-names>Laetitia</given-names></string-name>, <string-name><given-names>Vannina</given-names> <surname>Goossens</surname></string-name>, <string-name><given-names>Olivier</given-names> <surname>Kraif</surname></string-name>, <string-name><given-names>Iva</given-names> <surname>Novakova</surname></string-name>, and <string-name><given-names>Julie</given-names> <surname>Sorba</surname></string-name> (<year>2018</year>). <chapter-title>&#8220;Motifs Textuels Sp&#233;cifiques Au Genre Policier et &#224; La Litt&#233;rature Blanche&#8221;</chapter-title>. In: <source>6<sup>e</sup> Congr&#232;s Mondial de Linguistique Fran&#231;aise, SHS Web of Conferences</source> <volume>46</volume>. Ed. by <string-name><given-names>Franck</given-names> <surname>Neveu</surname></string-name>, <string-name><given-names>Bernard</given-names> <surname>Harmegnies</surname></string-name>, <string-name><given-names>Linda</given-names> <surname>Hriba</surname></string-name>, and <string-name><given-names>Sophie</given-names> <surname>Pr&#233;vost</surname></string-name>. <pub-id pub-id-type="doi">10.1051/shsconf/20184606007</pub-id>.</mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="book"><string-name><surname>Gries</surname>, <given-names>Stefan Th.</given-names></string-name> (<year>2010</year>). <chapter-title>&#8220;Useful Statistics for Corpus Linguistics&#8221;</chapter-title>. In: <source>A Mosaic of Corpus Linguistics: Selected Approaches</source>. Ed. by <string-name><given-names>Aquilino</given-names> <surname>S&#225;nchez</surname></string-name> and <string-name><given-names>Mois&#233;s</given-names> <surname>Almela</surname></string-name>. <publisher-name>Peter Lang</publisher-name>, <fpage>269</fpage>&#8211;<lpage>291</lpage>.</mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><string-name><surname>Gries</surname>, <given-names>Stefan Th.</given-names></string-name> (<year>2022</year>). <article-title>&#8220;What Do (Most of) Our Dispersion Measures Measure (Most)? Dispersion?&#8221;</article-title> In: <source>Journal of Second Language Studies</source> <volume>5</volume> (<issue>2</issue>), <fpage>171</fpage>&#8211;<lpage>205</lpage>. <pub-id pub-id-type="doi">10.1075/jsls.21029.gri</pub-id>.</mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="book"><string-name><surname>Honnibal</surname>, <given-names>Matthew</given-names></string-name>, <string-name><given-names>Ines</given-names> <surname>Montani</surname></string-name>, <string-name><given-names>Sofie</given-names> <surname>Van Landeghem</surname></string-name>, and <string-name><given-names>Adriane</given-names> <surname>Boyd</surname></string-name> (<year>2020</year>). <article-title>&#8220;spaCy: Industrial-strength Natural Language Processing in Python&#8221;</article-title>. In: <source>Zenodo</source>. <pub-id pub-id-type="doi">10.5281/zenodo.1212303</pub-id>.</mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><string-name><surname>Kilgarriff</surname>, <given-names>Adam</given-names></string-name> (<year>2001</year>). <article-title>&#8220;Comparing Corpora&#8221;</article-title>. In: <source>International Journal of Corpus Linguistics</source> <volume>6</volume> (<issue>1</issue>), <fpage>97</fpage>&#8211;<lpage>133</lpage>. <pub-id pub-id-type="doi">10.1075/ijcl.6.1.05kil</pub-id>.</mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="journal"><string-name><surname>Lijffijt</surname>, <given-names>Jefrey</given-names></string-name>, <string-name><given-names>Terttu</given-names> <surname>Nevalainen</surname></string-name>, <string-name><given-names>Tanja</given-names> <surname>S&#228;ily</surname></string-name>, <string-name><given-names>Panagiotis</given-names> <surname>Papapetrou</surname></string-name>, <string-name><given-names>Kai</given-names> <surname>Puolam&#228;ki</surname></string-name>, and <string-name><given-names>Heikki</given-names> <surname>Mannila</surname></string-name> (<year>2014</year>). <article-title>&#8220;Significance Testing of Word Frequencies in Corpora&#8221;</article-title>. In: <source>Digital Scholarship in the Humanities</source> <volume>31</volume> (<issue>2</issue>), <fpage>374</fpage>&#8211;<lpage>397</lpage>. <pub-id pub-id-type="doi">10.1093/llc/fqu064</pub-id>.</mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="book"><string-name><surname>Mann</surname>, <given-names>H. B.</given-names></string-name> and <string-name><given-names>D. R.</given-names> <surname>Whitney</surname></string-name> (<year>1947</year>). <article-title>&#8220;On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other&#8221;</article-title>. In: <source>The Annals of Mathematical Statistics</source> <volume>18</volume> (<issue>1</issue>), <fpage>50</fpage>&#8211;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177730491</pub-id>.</mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="book"><string-name><surname>Paquot</surname>, <given-names>Magali</given-names></string-name> and <string-name><given-names>Yves</given-names> <surname>Bestgen</surname></string-name> (<year>2009</year>). <chapter-title>&#8220;Distinctive Words in Academic Writing: A Comparison of Three Statistical Tests for Keyword Extraction&#8221;</chapter-title>. In: <source>Corpora: Pragmatics and Discourse</source>. Ed. by <string-name><given-names>Andreas H.</given-names> <surname>Jucker</surname></string-name>, <string-name><given-names>Daniel</given-names> <surname>Schreier</surname></string-name>, and <string-name><given-names>Marianne</given-names> <surname>Hundt</surname></string-name>. <publisher-name>Brill | Rodopi</publisher-name>, <fpage>247</fpage>&#8211;<lpage>269</lpage>. <pub-id pub-id-type="doi">10.1163/9789042029101_014</pub-id>.</mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="book"><string-name><surname>Plackett</surname>, <given-names>Robin L.</given-names></string-name> (<year>1983</year>). <article-title>&#8220;Karl Pearson and the Chi-Squared Test&#8221;</article-title>. In: <source>International Statistical Review / Revue Internationale de Statistique</source> <volume>51</volume> (<issue>1</issue>). <pub-id pub-id-type="doi">10.2307/1402731</pub-id>.</mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="book"><string-name><surname>Sch&#246;ch</surname>, <given-names>Christof</given-names></string-name> (<year>2018</year>). <chapter-title>&#8220;Zeta f&#252;r die kontrastive Analyse literarischer Texte. Theorie, Implementierung, Fallstudie&#8221;</chapter-title>. In: <source>Quantitative Ans&#228;tze in den Literatur- und Geisteswissenschaften. Systematische und historische Perspektiven</source>. Ed. by <string-name><given-names>Toni</given-names> <surname>Bernhart</surname></string-name>, <string-name><given-names>Sandra</given-names> <surname>Richter</surname></string-name>, <string-name><given-names>Marcus</given-names> <surname>Lepper</surname></string-name>, <string-name><given-names>Marcus</given-names> <surname>Willand</surname></string-name>, and <string-name><given-names>Andrea</given-names> <surname>Albrecht</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, <fpage>77</fpage>&#8211;<lpage>94</lpage>. <pub-id pub-id-type="doi">10.1515/9783110523300-004</pub-id>.</mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="webpage"><string-name><surname>Sch&#246;ch</surname>, <given-names>Christof</given-names></string-name>, <string-name><given-names>Daniel</given-names> <surname>Schl&#246;r</surname></string-name>, <string-name><given-names>Albin</given-names> <surname>Zehe</surname></string-name>, <string-name><given-names>Henning</given-names> <surname>Gebhard</surname></string-name>, <string-name><given-names>Martin</given-names> <surname>Becker</surname></string-name>, and <string-name><given-names>Andreas</given-names> <surname>Hotho</surname></string-name> (<year>2018</year>). <chapter-title>&#8220;Burrows&#8217; Zeta: Exploring and Evaluating Variants and Parameters&#8221;</chapter-title>. In: <source>Book of Abstracts of the Digital Humanities Conference 2018</source>. Ed. by <string-name><given-names>Jonathan Gir&#243;n</given-names> <surname>Palau</surname></string-name> and <string-name><given-names>Isabel Galina</given-names> <surname>Russell</surname></string-name>. <publisher-name>ADHO</publisher-name>. <uri>https://dh2018.adho.org/en/burrows-zeta-exploring-and-evaluating-variants-and-parameters/</uri> (visited on 10/16/2025).</mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><string-name><surname>Schr&#246;ter</surname>, <given-names>Julian</given-names></string-name>, <string-name><given-names>Keli</given-names> <surname>Du</surname></string-name>, <string-name><given-names>Julia</given-names> <surname>Dudar</surname></string-name>, <string-name><given-names>Cora</given-names> <surname>Rok</surname></string-name>, and <string-name><given-names>Christof</given-names> <surname>Sch&#246;ch</surname></string-name> (<year>2021</year>). <article-title>&#8220;From Keyness to Distinctiveness &#8211; Triangulation and Evaluation in Computational Literary Studies&#8221;</article-title>. In: <source>Journal of Literary Theory</source> <volume>15</volume> (<issue>1-2</issue>), <fpage>81</fpage>&#8211;<lpage>108</lpage>. <pub-id pub-id-type="doi">10.1515/jlt-2021-2011</pub-id>.</mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="journal"><string-name><surname>S&#246;nning</surname>, <given-names>Lukas</given-names></string-name> (<year>2023</year>). <article-title>&#8220;Evaluation of Keyness Metrics: Performance and Reliability&#8221;</article-title>. In: <source>Corpus Linguistics and Linguistic Theory</source> <volume>20</volume> (<issue>2</issue>), <fpage>263</fpage>&#8211;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1515/cllt-2022-0116</pub-id>.</mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="webpage"><string-name><surname>Sp&#228;rck Jones</surname>, <given-names>Karen</given-names></string-name> (<year>1972</year>). <article-title>&#8220;A Statistical Interpretation of Term Specificity and Its Application in Retrieval&#8221;</article-title>. In: <source>Journal of Documentation</source> <volume>28</volume> (<issue>1</issue>), <fpage>11</fpage>&#8211;<lpage>21</lpage>. <uri>https://dl.acm.org/doi/10.5555/106765.106782</uri> (visited on 10/14/2025).</mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Weidman</surname>, <given-names>Sean G.</given-names></string-name> and <string-name><given-names>James</given-names> <surname>O&#8217;Sullivan</surname></string-name> (<year>2018</year>). <article-title>&#8220;The Limits of Distinctive Words: Re-evaluating Literature&#8217;s Gender Marker Debate&#8221;</article-title>. In: <source>Digital Scholarship in the Humanities</source> <volume>33</volume> (<issue>2</issue>), <fpage>374</fpage>&#8211;<lpage>390</lpage>. <pub-id pub-id-type="doi">10.1093/llc/fqx017</pub-id>.</mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Welch</surname>, <given-names>Bernard Lewis</given-names></string-name> (<year>1947</year>). <article-title>&#8220;The Generalization of Student&#8217;s Problem When Several Different Population Variances Are Involved&#8221;</article-title>. In: <source>Biometrika</source> <volume>34</volume> (<issue>1-2</issue>), <fpage>28</fpage>&#8211;<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1093/biomet/34.1-2.28</pub-id>.</mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="journal"><string-name><surname>Wilcoxon</surname>, <given-names>Frank</given-names></string-name> (<year>1945</year>). <article-title>&#8220;Individual Comparisons by Ranking Methods&#8221;</article-title>. In: <source>Biometrics Bulletin</source> <volume>1</volume> (<issue>6</issue>). <pub-id pub-id-type="doi">10.2307/3001968</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>