<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2940-1348</journal-id>
<journal-title-group>
<journal-title>Journal of Computational Literary Studies</journal-title>
</journal-title-group>
<issn pub-type="epub">2940-1348</issn>
<publisher>
<publisher-name>Universit&#228;ts- und Landesbibliothek Darmstadt</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.48694/jcls.116</article-id>
<article-categories>
<subj-group>
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Modeling and Measuring Short Text Similarities</article-title>
<subtitle>On the Multi-Dimensional Differences between German Poetry of Realism and Modernism</subtitle>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6677-586X</contrib-id>
<name>
<surname>Ehrmanntraut</surname>
<given-names>Anton</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3731-6397</contrib-id>
<name>
<surname>Hagen</surname>
<given-names>Thora</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6944-6113</contrib-id>
<name>
<surname>Jannidis</surname>
<given-names>Fotis</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-5833-0414</contrib-id>
<name>
<surname>Konle</surname>
<given-names>Leonard</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-2717-0598</contrib-id>
<name>
<surname>Kr&#246;ncke</surname>
<given-names>Merten</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1006-7925</contrib-id>
<name>
<surname>Winko</surname>
<given-names>Simone</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Institut f&#252;r Deutsche Philologie, Julius-Maximilians-Universit&#228;t W&#252;rzburg <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ror.org/00fbnyb24">ROR</ext-link>, W&#252;rzburg, Germany.</aff>
<aff id="aff-2"><label>2</label>Seminar f&#252;r Deutsche Philologie, Georg-August-Universit&#228;t G&#246;ttingen <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ror.org/01y9bpm73">ROR</ext-link>, G&#246;ttingen, Germany.</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2022-11-22">
<day>22</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>30</lpage>
<history>
<date date-type="received" iso-8601-date="2021-12-23">
<day>23</day>
<month>12</month>
<year>2021</year>
</date>
<date date-type="accepted" iso-8601-date="2022-03-30">
<day>30</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2022 The Author(s)</copyright-statement>
<copyright-year>2022</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>The text of this work is released under the Creative Commons license CC BY 4.0 International. You can find the contract text of the license at <uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>. The illustrations are excluded from this license; their copyright lies with the respective rights holders.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://jcls.io/article/10.48694/jcls.116/"/>
<abstract>
<p>This study contributes to the ongoing discussion on how to operationalize text similarity for the purposes of computational literary studies by defining, theoretically justifying, and employing a multi-dimensional text model. Additionally, based on annotations of emotions, genre, and similarity, we evaluate a set of strategies to implement this model for very short texts such as poems, using a range of methods from weighted sparse vectors to very recent neural sentence embeddings. Finally, we show the relevance of such a complex text model by applying the best method to a research question about the development of early modernism in German poetry. While we can confirm some important hypotheses from literary studies, we are also able to refine or qualify others. In particular, our findings do not support the widely held thesis that the change from realism to modernism was a revolutionary &#8216;rupture&#8217;.</p>
</abstract>
<kwd-group>
<kwd>short text</kwd>
<kwd>similarity</kwd>
<kwd>poetry</kwd>
<kwd>modernism</kwd>
<kwd>realism</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="S1">
<title>1. Introduction</title>
<p>This paper pursues two equally important goals: First, to find a suitable state-of-the-art method to model and analyze text similarity for poetry, and second, to contribute to literary studies by examining the transition from realist to modernist poetry through the concept of similarity. The perception of similarity between texts underlies the construction of many literary categories such as genre, author, or, as in our case, period. Grouping texts according to these categories usually presupposes that the texts have something in common and that these commonalities distinguish the group from other texts. Though the concept of similarity is ubiquitous in the practice of literary studies, it has seldom been analyzed explicitly. While several studies focus on the political aspects of comparison and (dis)similarity analysis (e.g. <xref ref-type="bibr" rid="B23">Felski and Friedman 2013</xref>), these contributions generally have little direct connection to our methodological goals. More applicable are studies by scholars of Comparative Studies who have reflected on similarity as part of their discipline-defining practice (e.g. <xref ref-type="bibr" rid="B19">Corbineau-Hoffmann 2013</xref>). Comparable attempts to model text similarity beyond the aspect of content have also been undertaken in computational linguistics (e.g. <xref ref-type="bibr" rid="B7">B&#228;r et al. 2011</xref>). One major contribution of this paper is thus our attempt to bring these threads of discussion together. But while the dimensions of similarity can be discussed on a very abstract level, they cannot be evaluated on that level. When we talk about the structural aspects of a text, the elements we examine differ greatly by genre: speakers, stage directions, dramatis personae, etc. in drama, or stanzas, verses, rhyme, etc. in poetry.
Therefore, in order not only to discuss the phenomenon theoretically but also to apply it in practice &#8211; which means, above all, including an evaluation method &#8211; it is more productive to limit the task to one genre, in our case poetry.</p>
<p>The second goal of our research is to provide a broad foundation for a literary history of the beginnings of modernism. In recent years, we have assembled a corpus of German poetry consisting of poems from realist and modernist anthologies. We analyze this corpus with a view to contributing to the discussion about the transition from realism to (early) modernism. As is customary in literary studies today, we use these period terms as useful constructions. That means: On the one hand, it is understood that real breaks and disruptions are very rare and that history is better understood as an evolutionary, gradual process with many small changes at each step. On the other hand, we assume that this process does not proceed at the same speed all the time and that many of the changes within one time segment share some commonalities. Specifically, we use the concept of similarity to describe the changes between the texts of the different corpora.</p>
<p>We structure our paper as follows: In a theoretical section, we first develop a four-dimensional model of textual similarity for poetry (<xref ref-type="sec" rid="S2">section 2</xref>). We then describe our corpora, mainly the digitized anthologies of realist and early modernist poetry mentioned above (<xref ref-type="sec" rid="S3">section 3</xref>). A selection of these poems had previously been annotated manually with a hierarchical system of emotion labels. For the present work, a subset of this selection was additionally annotated for the dimensions of similarity described in the theoretical section. The following section discusses how each of these four dimensions can be measured in poetry (<xref ref-type="sec" rid="S4">section 4</xref>). Poetry poses specific computational challenges even for semantics, a relatively traditional dimension of similarity. The main issue is the shortness of the texts: semantic similarity, which is usually modeled by locating a document in a vector space of weighted terms, does not work reliably on short texts. Additionally, working with poetry requires adapting to its specific language, which contains a high proportion of figurative speech, making the analysis of semantic similarity especially difficult, as well as many archaic words and expressions. In a first step, we discuss and evaluate, for each of the four dimensions of similarity, different methods to measure it on our poetry corpus. These methods include traditional sparse document vectors, short dense feature vectors, and dense document embeddings, created either from token vectors or with the recently proposed approach for sentence embeddings. In a second step, we adapt the best-performing models to each of the four dimensions (<xref ref-type="sec" rid="S4.4">subsection 4.4</xref>).
In the last section, we employ the final models from this two-step approach to assess the degrees of similarity and difference between realist and modernist poetry (<xref ref-type="sec" rid="S5">section 5</xref>). In particular, we take up three specific research questions from literary studies and discuss our results with respect to the predominant hypotheses in the field. These questions are:</p>
<list list-type="order">
<list-item><p>How does naturalist poetry relate to realism and modernism?</p></list-item>
<list-item><p>How homogeneous are realist and modernist poems?</p></list-item>
<list-item><p>How revolutionary is early modernism?</p></list-item>
</list>
<p>In summary, this study contributes to the ongoing discussion on how to operationalize text similarity for computational literary studies by defining, theoretically justifying, and employing a multi-dimensional model of similarity. Additionally, we evaluate a set of strategies to implement this model for poetry using a range of methods, from weighted sparse vectors to recent neural sentence embeddings, based on extensive annotations of emotions, genre, and similarity. Finally, we show the relevance of such a complex text-based model by using the best method to provide new input for continued research on the development of early modernism in German poetry.</p>
</sec>
<sec id="S2">
<title>2. Theoretical Considerations</title>
<p>As far as we can see, in literary studies, text similarity has been discussed mainly in Comparative Studies, where the concept of &#8216;comparison&#8217; has been closely linked to &#8216;similarity&#8217; (e.g. <xref ref-type="bibr" rid="B69">Zelle 2005</xref>). There seems to be a consensus that comparison is only possible on the basis of similarity in some specific aspects. Though in principle many different aspects can be and have been used to compare literature, some have proven especially useful for the study of literature. Corbineau-Hoffmann (<xref ref-type="bibr" rid="B19">2013</xref>), for example, groups them under three headings:</p>
<list list-type="simple">
<list-item><p>I.&#160;&#160;&#160;&#160;Content (1. theme, 2. motifs, 3. settings, 4. characters, 5. concepts)</p></list-item>
<list-item><p>II.&#160;&#160;&#160;Text-organization (1. narrative/description, 2. poetry/prose, 3. style levels, 4. instances of speech, 5. discourse)</p></list-item>
<list-item><p>III.&#160;&#160;History (1. influences, 2. epochs, 3. other arts, 4. sciences, 5. genre).</p></list-item>
</list>
<p>While the first two groups are aspects of a text, the last group refers to typical contexts, which are often themselves established by analyzing groups of texts. To avoid the circularity hidden here, we focus on the first two aspects, &#8216;content&#8217; and &#8216;text-organization&#8217;. It is important to note that these are open lists: There are other interesting aspects, but the ones mentioned are often used when people compare literature. The terms grouped under &#8216;content&#8217; can be seen as parts of text semantics in general. A text has a theme, or there are specific motifs in a text, but the meaning of a text is usually more than any one of these; it encompasses all of them. The terms grouped under &#8216;text-organization&#8217;, on the other hand, cover quite heterogeneous aspects, even if one substitutes the more common term &#8216;form&#8217;. In our experience, the term &#8216;style&#8217; in particular is hard to subsume under the same dimension as other text-organizational aspects.</p>
<p>Semiotics and linguistics support this position, as they also distinguish between form and style (<xref ref-type="bibr" rid="B48">N&#246;th 2008</xref>; <xref ref-type="bibr" rid="B56">Sandig 2006</xref>), and the three aspects &#8211; content, structure, and style &#8211; are also distinguished in one of the very few attempts in computational linguistics to model text similarity (<xref ref-type="bibr" rid="B8">B&#228;r et al. 2015</xref>). We propose to add one dimension which can be subsumed only with difficulty under any of the three headings and which is usually highly important for literature, and especially for poetry, which has been defined as the prototypical medium for expressing subjective feelings: emotion.<xref ref-type="fn" rid="n1">1</xref></p>
<p>Content, Form, Style, and Emotion are the four dimensions of similarity which we will use to describe the relations between texts. From the perspective of this study, it is more useful to explicate the dimensions via operationalizations and examples rather than &#8216;exact&#8217; definitions. To this end, the annotation guidelines (see <xref ref-type="sec" rid="S3">section 3</xref>) list specific components that make up the dimensions. Content consists of components such as theme, character, or setting; form is operationalized primarily through stanza structure, meter, and rhyme; style, in contrast, refers to components such as register or metaphor; and for emotion, we consider, among other things, the extent to which emotions are represented and their polarity. In further studies, these components could be analyzed individually and be integrated into an even more complex model of text similarity.</p>
<p>The heterogeneity of the four dimensions will have a direct influence on the inter-annotator agreement and the performance of any machine learning model trained to detect these aspects automatically. From a theoretical perspective, it is unclear how the dimensions relate to each other, or in the language of statistics, how much they correlate. Winko (<xref ref-type="bibr" rid="B68">2003</xref>), for example, assigns the aspect &#8216;linguistic shaping of emotions&#8217; via the aspect &#8216;presentation of emotions&#8217; to what is called &#8216;style&#8217; in our model, while she assigns it to content via the aspect &#8216;thematization of emotions&#8217;. From this perspective, a relatively high correlation of emotion with content and style is to be expected.</p>
</sec>
<sec id="S3">
<title>3. Corpus and Annotation</title>
<p>The corpus is a collection of anthologies of contemporary poetry from the two epochs &#8216;realism&#8217; and &#8216;modernism&#8217;.<xref ref-type="fn" rid="n2">2</xref> The collections contain poems that the anthologists, i.e. contemporary experts in poetry, considered particularly typical, outstanding, or representative, among other qualities. From the large number of poetry anthologies in both epochs, the corpus was compiled<xref ref-type="fn" rid="n3">3</xref> according to the following <bold>criteria</bold>: The collections contain <bold>contemporary</bold> poetry, have <bold>no thematic restrictions</bold>, and are all aimed at a <bold>general audience</bold> rather than a particular target group. These criteria minimize the risk that thematic constraints or an orientation towards specific addressees systematically influence the selection of poems. It is important to note that this corpus construction leads us to model &#8216;realism&#8217; and &#8216;modernism&#8217; from the perspective of the period under investigation. Ultimately, it is the anthologists who determine which texts are &#8216;realist&#8217; and which are &#8216;modernist&#8217;, and their views need not coincide with the perspective of today&#8217;s research. The corpus contains texts by both canonical and non-canonical authors. We call authors &#8216;canonical&#8217; if they are frequently mentioned in recent literary histories. For early modernism, this applies to Stefan George, Hugo von Hofmannsthal, Arno Holz, Else Lasker-Sch&#252;ler, and Rainer Maria Rilke.</p>
<p><bold>(1) Sub-corpus &#8216;Realism&#8217;:</bold> The first sub-corpus consists of 7 anthologies with German poems from the realist epoch: <xref ref-type="bibr" rid="B6">Avenarius 1882</xref>; <xref ref-type="bibr" rid="B11">Bern 1877</xref>; <xref ref-type="bibr" rid="B39">Kneschke 1865</xref>; <xref ref-type="bibr" rid="B47">Moltke 1882</xref>; <xref ref-type="bibr" rid="B49">Polko 1860</xref>; <xref ref-type="bibr" rid="B51">Prutz 1859</xref>; <xref ref-type="bibr" rid="B67">Willatzen 1875</xref>. The poems included in the anthologies cover the period under study, 1850 to 1880. Some of the anthologies, especially Elise Polko&#8217;s widely distributed collection, also contain poems written before the period of study; these have been excluded. This sub-corpus consists of 3,039 poems by a total of 484 different authors.</p>
<p><bold>(2) Sub-corpus &#8216;Modernism&#8217;:</bold> Of the 941 anthologies of German-language poetry first published between 1885 and 1912 (<xref ref-type="bibr" rid="B32">H&#228;ntzschel 1991, pp. 587&#8211;589</xref>), twelve anthologies meet the selection criteria: <xref ref-type="bibr" rid="B3">Arent 1885</xref>; <xref ref-type="bibr" rid="B10">Benzmann 1904</xref>; <xref ref-type="bibr" rid="B12">Bethge 1905</xref>; <xref ref-type="bibr" rid="B13">Bierbaum 1893</xref>, <xref ref-type="bibr" rid="B14">1894</xref>; <xref ref-type="bibr" rid="B15">Bonsels et al. 1905</xref>; <xref ref-type="bibr" rid="B22">Federmann 1908</xref>; <xref ref-type="bibr" rid="B26">P. Friedrich 1911</xref>; <xref ref-type="bibr" rid="B27">Gemmel 1898</xref>; <xref ref-type="bibr" rid="B34">Huch 1911</xref>; <xref ref-type="bibr" rid="B35">Jacobowski 1899</xref>; <xref ref-type="bibr" rid="B54">Renner 1899</xref>; <xref ref-type="bibr" rid="B63">Tille 1896</xref>. They all claim to contain &#8216;modern poetry&#8217;. This sub-corpus consists of 2,882 poems by a total of 361 authors. We annotated 1,278 poems from both sub-corpora for emotion and thematic genre. Thematic genres such as love poetry or nature poetry provide information about the content of the poems.<xref ref-type="fn" rid="n4">4</xref> The annotated emotions are not the readers&#8217; emotions but the emotions expressed in the text itself. The annotators used a list of 40 discrete emotions, which we categorized into 6 groups inspired by the emotion hierarchy in <xref ref-type="bibr" rid="B59">Shaver et al. 1987</xref>: love, joy, agitation/surprise, anger, sadness, and fear. Emotions and genres were first annotated independently by two annotators, who then manually merged their annotations into a consensus annotation. Their agreement before the consensus annotation was created, measured with &#947; (<xref ref-type="bibr" rid="B45">Mathet et al. 2015</xref>), was 0.6445 for individual emotions and 0.7491 for the emotion groups; for the thematic genres, Krippendorff&#8217;s alpha (<xref ref-type="bibr" rid="B40">Krippendorff 2011</xref>) was 0.69.</p>
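<p>Since several agreement figures in this article rely on Krippendorff&#8217;s alpha, the following minimal Python sketch shows how the coefficient can be computed for nominal labels (such as thematic genres) via the coincidence matrix, with <monospace>None</monospace> marking a missing judgment. It is an illustration under these assumptions, not the implementation used for the reported values; the input layout (one list of labels per annotated unit) is hypothetical, and the &#947; coefficient is not covered.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of annotated units (e.g. poems); each unit is a list
    with one label per annotator, using None for a missing judgment.
    """
    # coincidence matrix o[(c, k)], built from all pairable values
    coincidences = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m >= 2:  # units with fewer than two judgments are not pairable
            # each ordered pair of judgments by different annotators adds 1/(m - 1)
            for c, k in permutations(values, 2):
                coincidences[(c, k)] += 1.0 / (m - 1)

    # marginal totals n_c and the total number of pairable values n
    marginals = Counter()
    for (c, _), weight in coincidences.items():
        marginals[c] += weight
    n = sum(marginals.values())

    # nominal metric: distance 0 for identical labels, 1 otherwise
    observed = sum(w for (c, k), w in coincidences.items() if c != k) / n
    expected = sum(
        marginals[c] * marginals[k] for c in marginals for k in marginals if c != k
    ) / (n * (n - 1))
    return 1.0 - observed / expected
```
</preformat>
<p>For two annotators labelling three units as [a, a], [a, b], and [b, b], the sketch yields alpha = 4/9 &#8776; 0.44. If all judgments fall into a single category, the expected disagreement is zero, alpha is undefined, and the function raises a division-by-zero error.</p>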
<p>Additionally, we annotated the similarity of the poems.<xref ref-type="fn" rid="n5">5</xref> The task was not to annotate absolute similarities (&#8220;These two poems are not at all/a little/very similar&#8221;) but relative similarities (&#8220;Poem A is more similar to poem B than to poem C&#8221;), which are much easier to judge. For each triple of poems, the annotators had to judge for each similarity dimension (content, form, style, and emotion) and for a comprehensive &#8216;overall&#8217; category whether the focus poem was more similar to the one on the left, to the one on the right, or equally (dis)similar to both. The annotation guidelines specify for each dimension which components should be taken into consideration, e.g. stanza structure, rhyme, meter, and text length in the case of the formal dimension, and which of these components are typically most important. Nevertheless, the annotators ultimately had to weigh the components on a case-by-case basis, which required considerable literary expertise.</p>
<p>We annotated 470 triples, consisting of a total of 866 poems. One constraint on the selection of the triples was that the poems had to be fairly short for technical reasons.<xref ref-type="fn" rid="n6">6</xref> In addition, we selected triples for which we expected the middle text to be strongly similar to either the left or the right text, based on formal features such as text length or on previous annotations of thematic genres and emotions. This second constraint ensured that the annotators dealt with reasonably clear cases. Our annotated poems yielded 400 triples satisfying both constraints. A further 70 triples were annotated without similarity expectations. Each triple was annotated by at least two people.</p>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>Pearson correlation between the annotated dimensions (majority vote).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g1.png"/>
</fig>
<p>The agreement, measured with Krippendorff&#8217;s alpha, was 0.53 for content, 0.68 for form, 0.44 for style, 0.32 for emotion, and 0.48 for overall. Possible reasons for the differences in agreement are that the dimensions with lower agreement depend more on interpretation or that the weighting of their components is more ambiguous. An experiment showed, however, that three annotators who created a consensus annotation after annotating 60 triples were able to increase their agreement when annotating another 30 triples: from 0.49 to 0.63 on content, from 0.48 to 0.69 on emotion, from 0.32 to 0.41 on style, and from 0.45 to 0.68 on overall (only the agreement on form decreased, from 0.77 to 0.71, but it remained high). Since creating consensus annotations appears to improve annotation quality considerably, we plan to create consensus annotations for all triples in the future. Until then, for triples without consensus annotations, we only use labels on which the majority of annotators agree. In the evaluation in the following section, we also omit all triples for which the majority of annotators judged that &#8216;the middle text is equally (dis)similar to both the left and the right text&#8217;.<xref ref-type="fn" rid="n7">7</xref> This leaves us with 346 usable annotations for content, 388 for form, 331 for style, 359 for emotion, and 381 for overall, each stating that the middle text is more similar to either the left text or the right text.</p>
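<p>The majority-vote filtering described above can be sketched in Python as follows, assuming that each triple carries one label per annotator, &#8216;left&#8217;, &#8216;right&#8217;, or &#8216;equal&#8217;, for a given dimension. The label names and the dictionary layout are hypothetical, and consensus annotations, which would take precedence over individual votes, are not modelled. With two annotators, a strict majority means that both agree.</p>
<preformat preformat-type="code">
```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of `votes`, or None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # a strict majority needs more than half of all votes
    return label if 2 * count > len(votes) else None


def usable_annotations(triples):
    """Keep only triples whose majority vote is 'left' or 'right'.

    `triples` maps a (hypothetical) triple id to the list of labels
    ('left', 'right' or 'equal') given by the annotators for one dimension.
    """
    usable = {}
    for triple_id, votes in triples.items():
        label = majority_label(votes)
        # ties and 'equally (dis)similar' majorities are dropped
        if label in ("left", "right"):
            usable[triple_id] = label
    return usable
```
</preformat>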
<p>According to the annotations, some of the similarity dimensions correlate strongly with each other (<xref ref-type="fig" rid="F1">Figure 1</xref>). The &#8216;overall&#8217; dimension shows the strongest correlations, especially with content and style. This is understandable, since the annotation of the &#8216;overall&#8217; dimension is usually based on the annotations of the other similarity dimensions. Another relevant correlation exists between content and style. The most independent dimension is form, whose correlations with the other dimensions are the weakest.</p>
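<p>Correlations of the kind shown in <xref ref-type="fig" rid="F1">Figure 1</xref> can be computed with the following sketch. It assumes, as an illustrative choice rather than a documented detail of this study, that each majority vote is coded as 0 (&#8216;left&#8217;) or 1 (&#8216;right&#8217;) and that two dimensions are compared over the triples labelled in both; for such binary codes, Pearson&#8217;s r equals the phi coefficient.</p>
<preformat preformat-type="code">
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dimension_correlation(labels_a, labels_b):
    """Correlate two dimensions over the triples labelled in both.

    `labels_a` and `labels_b` map triple ids to 'left' or 'right'
    (e.g. majority votes); 'right' is coded as 1 and 'left' as 0.
    """
    shared = sorted(set(labels_a).intersection(labels_b))
    xs = [1 if labels_a[t] == "right" else 0 for t in shared]
    ys = [1 if labels_b[t] == "right" else 0 for t in shared]
    return pearson(xs, ys)
```
</preformat>
<p>If one dimension has the same vote on all shared triples, its variance is zero, the coefficient is undefined, and the function raises <monospace>ZeroDivisionError</monospace>.</p>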
</sec>
<sec id="S4">
<title>4. Dimensions of the Similarity of Poems</title>
<p>Measuring the similarity of poems along the dimensions discussed above poses two challenges: First, the shortness of the texts makes it difficult to apply well-established approaches reliably. Research in natural language processing has proposed a set of methods for measuring short text similarity (<xref ref-type="bibr" rid="B50">Prakoso et al. 2021</xref>), which usually complement the texts with other sources that compensate for the lack of information in the texts themselves. However, research on text similarity in general focuses on the &#8216;content&#8217; aspect, so the second challenge lies in finding methods that can model the other dimensions.</p>
<p>Overviews of research on short text similarity classify the methods into four groups: string-based, corpus-based, knowledge-based, and hybrid-based (<xref ref-type="bibr" rid="B29">Gomaa and Fahmy 2013</xref>; <xref ref-type="bibr" rid="B50">Prakoso et al. 2021, p. 1</xref>). 1) String-based methods use only the word or character tokens to create a representation of the text; we use <italic>tfidf</italic> and <italic>mfw</italic>. 2) Knowledge-based methods use an external knowledge base such as WordNet; our two models <italic>features-formal</italic> and <italic>features-emotional</italic> can be seen as variants of this approach. 3) Corpus-based methods use an external corpus to create information-rich representations, nowadays usually word embeddings: <italic>FastText, GloVe, GBERT</italic>. Additionally, we experiment with document embeddings produced by different sentence embedding methods: <italic>XLM-R, mpnet, MiniLM, cross-en-de-roberta</italic>. This approach is reported to yield the best representations for short texts, but its drawback is that the input is limited to 127 tokens. 4) Hybrid approaches combine some of the strategies outlined above.<xref ref-type="fn" rid="n8">8</xref></p>
<p>The small number of poems annotated for similarity made the typical fine-tuning approach inadvisable. Instead, we opted to test broadly how well different text representations mirror our annotations, to select the best representations, and then to adjust the vector spaces with similarity learning based on our dimension annotations. In <xref ref-type="sec" rid="S4.1">sections 4.1</xref> to <xref ref-type="sec" rid="S4.3">4.3</xref>, we therefore introduce the models we were able to use and evaluate them against our similarity annotations. In <xref ref-type="sec" rid="S4.4">subsection 4.4</xref>, we apply similarity learning to the best-performing models.</p>
<sec id="S4.1">
<title>4.1 Models</title>
<p>We evaluate the following embeddings, which can be roughly divided into simpler baselines on the one hand and more complex embeddings derived from sophisticated deep-learning language models on the other. The baseline embeddings are defined as follows:</p>
<list list-type="simple">
<list-item><p><bold>TFIDF-{1000,10000,20000}:</bold> Poems are represented by a vector whose dimensions correspond to the 1000 (resp. 10000, 20000) most frequent terms in our corpus. Each vector component is the relative frequency of that term in the poem, weighted by its inverse document frequency.</p></list-item>
<list-item><p><bold>MFW-{100,200,500,1000}:</bold> Defined like the embedding above, but the term frequencies are z-standardized for each term over all poems.</p></list-item>
<list-item><p><bold>Features-Formal:</bold> Poems are represented by a vector of the following four formal features: stanza count, verse count, word count, average stanza length in verses &#8211; each z-standardized over all poems.</p></list-item>
<list-item><p><bold>Features-Emotional:</bold> Each poem is embedded as a vector of the verse-level relative frequencies of the Shaver emotions (see <xref ref-type="sec" rid="S3">section 3</xref>). These emotions are taken either from the annotations or, where no annotations are available, from the predictions of a machine learning model.<xref ref-type="fn" rid="n9">9</xref></p></list-item>
</list>
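<p>The three count-based baselines above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in this study: poems are given as lists of already tokenized, lower-cased terms; the inverse document frequency is taken as log(N/df), since the exact IDF variant is not specified above; z-scores use the population standard deviation; and, for the formal features, stanzas are assumed to be separated by blank lines.</p>
<preformat preformat-type="code">
```python
import math
from collections import Counter


def vocabulary(poems, k):
    """The k most frequent terms in the corpus; each poem is a list of tokens."""
    counts = Counter(token for poem in poems for token in poem)
    return [term for term, _ in counts.most_common(k)]


def relative_frequencies(poems, vocab):
    """Relative frequency of every vocabulary term in every poem."""
    rows = []
    for poem in poems:
        counts = Counter(poem)
        rows.append([counts[term] / len(poem) for term in vocab])
    return rows


def tfidf(poems, k):
    """TFIDF-k: relative term frequencies weighted by log(N / df)."""
    vocab = vocabulary(poems, k)
    doc_sets = [set(poem) for poem in poems]
    n_docs = len(poems)
    # document frequency is at least 1, since every term occurs in the corpus
    idf = [math.log(n_docs / sum(1 for s in doc_sets if term in s)) for term in vocab]
    return [[tf * w for tf, w in zip(row, idf)]
            for row in relative_frequencies(poems, vocab)]


def zscore_columns(rows):
    """Z-standardize every column over all rows (population standard deviation)."""
    stats = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        stats.append((mean, std or 1.0))  # constant columns end up as all zeros
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def mfw(poems, k):
    """MFW-k: relative frequencies of the k most frequent words, z-scored per term."""
    return zscore_columns(relative_frequencies(poems, vocabulary(poems, k)))


def formal_features(text):
    """Stanza count, verse count, word count, mean stanza length in verses."""
    stanzas = [s for s in text.strip().split("\n\n") if s.strip()]
    verses = [v for s in stanzas for v in s.splitlines() if v.strip()]
    return [len(stanzas), len(verses), len(text.split()), len(verses) / len(stanzas)]
```
</preformat>
<p>Under these assumptions, Features-Formal corresponds to applying <monospace>zscore_columns</monospace> to the <monospace>formal_features</monospace> vectors of all poems.</p>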
<p>The following deep-learning embeddings are derived from pre-trained static type-based embeddings:</p>
<list list-type="simple">
<list-item><p><bold>{FastText,GloVe}-{mean,median,meannorm,sif}:</bold> For each term in the poem (excluding stopwords), we obtain the embedding vector of that term from FastText (trained on the German OSCAR corpus with <italic>d</italic> = 1536, as proposed by <xref ref-type="bibr" rid="B20">Ehrmanntraut et al. 2021</xref>), resp. from a GloVe model with <italic>d</italic> = 300 provided by Deepset.<xref ref-type="fn" rid="n10">10</xref> Finally, on that set of vectors, we compute the arithmetic mean (resp. median, resp. meannorm (<xref ref-type="bibr" rid="B20">Ehrmanntraut et al. 2021</xref>), resp. arithmetic mean weighted by smooth inverse frequency (<xref ref-type="bibr" rid="B4">Arora et al. 2017</xref>)) to obtain a single vector for a particular poem.</p></list-item>
</list>
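<p>The four aggregation strategies can be illustrated with NumPy as below. Two points are assumptions on our part: &#8216;meannorm&#8217; is read here as the mean of the L2-normalized word vectors (see <xref ref-type="bibr" rid="B20">Ehrmanntraut et al. 2021</xref> for the authoritative definition), and the SIF variant applies only the weighting a/(a + p(w)) of <xref ref-type="bibr" rid="B4">Arora et al. 2017</xref>, with the common choice a = 10<sup>&#8722;3</sup>, omitting the common-component removal of the full SIF method. Word vectors and unigram probabilities are supplied by the caller.</p>
<preformat preformat-type="code">
```python
import numpy as np


def pool(vectors, method="mean", word_probs=None, a=1e-3):
    """Aggregate the word vectors of one poem into a single poem vector.

    vectors: array-like of shape (n_words, d), one row per non-stopword token.
    word_probs: unigram probabilities p(w) of these tokens, needed for 'sif'.
    """
    vectors = np.asarray(vectors, dtype=float)
    if method == "mean":
        return vectors.mean(axis=0)
    if method == "median":
        # component-wise median over all word vectors
        return np.median(vectors, axis=0)
    if method == "meannorm":
        # assumption: average of the L2-normalized word vectors
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        return (vectors / norms).mean(axis=0)
    if method == "sif":
        # smooth inverse frequency weights a / (a + p(w)), averaged over words;
        # the principal-component removal step of full SIF is left out
        weights = a / (a + np.asarray(word_probs, dtype=float))
        return (weights[:, None] * vectors).sum(axis=0) / len(vectors)
    raise ValueError(f"unknown pooling method: {method}")
```
</preformat>
<p>A zero word vector would make the &#8216;meannorm&#8217; branch divide by zero, so such tokens would have to be skipped in practice.</p>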
<p>Similarly, the following embeddings are derived from the output of BERT, which generates vectors for each token, but also takes into consideration the textual context of the entire input sequence.</p>
<list list-type="simple">
<list-item><p><bold>GBERT-lastlayer-{mean,median,meannorm}:</bold> For a particular poem, we feed the tokenized poem into GBERT<sub>Base</sub> (<xref ref-type="bibr" rid="B17">Chan et al. 2020</xref>), at the time of writing the best-performing German BERT model. BERT then computes a contextualized output vector (i.e., the output of the last layer) for each token. We then aggregate all vectors by taking the arithmetic mean (resp. median, resp. meannorm), just like above. This results in a vector with 768 dimensions.</p></list-item>
<list-item><p><bold>GBERT-alllayers-{mean,median,meannorm}:</bold> Defined just like above, except that we consider not only the final output vector but the outputs of all layers. That is, for each token, we concatenate the input embedding with the 12 Transformer outputs to derive a vector with <italic>d</italic> = 13 &#215; 768. Then, as above, we aggregate this sequence of token vectors into a single vector for a particular poem.</p></list-item>
</list>
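<p>The difference between the last-layer and the all-layers variants is mainly one of shape, which the following NumPy sketch makes explicit. Random arrays stand in for the 13 per-token outputs of GBERT<sub>Base</sub> (input embeddings plus 12 Transformer layers), so the example runs without downloading a model; with the Hugging Face <monospace>transformers</monospace> library, such outputs could be obtained by calling the model with <monospace>output_hidden_states=True</monospace>. Only mean pooling is shown, and the handling of special tokens and padding, which is not specified above, is left out.</p>
<preformat preformat-type="code">
```python
import numpy as np

N_LAYERS, HIDDEN = 13, 768  # input embeddings + 12 Transformer layers (base model)


def poem_vector_last_layer(hidden_states):
    """Last-layer variant: mean over tokens of the final layer's outputs."""
    return np.asarray(hidden_states[-1]).mean(axis=0)


def poem_vector_all_layers(hidden_states):
    """All-layers variant: concatenate all layers per token, then average."""
    # 13 arrays of shape (n_tokens, 768) -> one array of shape (n_tokens, 13 * 768)
    token_vectors = np.concatenate(hidden_states, axis=-1)
    return token_vectors.mean(axis=0)


# Random stand-ins for the per-layer outputs of one 20-token poem.
rng = np.random.default_rng(0)
fake_states = [rng.normal(size=(20, HIDDEN)) for _ in range(N_LAYERS)]
print(poem_vector_last_layer(fake_states).shape)  # (768,)
print(poem_vector_all_layers(fake_states).shape)  # (9984,)
```
</preformat>
<p>Because the last layer is concatenated last, the final 768 components of the all-layers vector coincide with the last-layer vector.</p>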
<p>In contrast, the following embeddings result from pre-trained language models following a Sentence-BERT architecture, as proposed by <xref ref-type="bibr" rid="B52">Reimers and Gurevych 2019</xref>, <xref ref-type="bibr" rid="B53">2020</xref>.</p>
<list list-type="simple">
<list-item><p><bold>paraphrase-XLM-R:</bold> This is a multilingual XLM-RoBERTa model that was fine-tuned, via knowledge distillation, to imitate the embeddings of a Sentence-BERT paraphrase model, as described by <xref ref-type="bibr" rid="B53">Reimers and Gurevych 2020</xref>. We feed each poem into paraphrase-XLM-R as if it were a single sentence, which yields a vector representation with <italic>d</italic> = 768 per poem. Note that for paraphrase-XLM-R and all following Sentence-BERT models, fine-tuning was only performed on input sequences of at most 126 SentencePiece tokens. We therefore restrict our evaluation of these models to poems no longer than 126 SentencePiece tokens (29% of all poems in our corpus).</p></list-item>
<list-item><p><bold>paraphrase-mpnet, paraphrase-MiniLM, cross-en-de-roberta:</bold> These are likewise pre-trained Sentence-BERT models, trained on a wide variety of sentence-pair datasets and parallel multilingual data. Specifically, we use the publicly available variants paraphrase-multilingual-mpnet-base-v2 and paraphrase-multilingual-MiniLM-L12-v2 provided by <xref ref-type="bibr" rid="B52">Reimers and Gurevych 2019</xref>, and cross-en-de-roberta-sentence-transformer provided by T-Systems-onsite. Again, we apply them only to poems of at most 126 SentencePiece tokens; they yield vector representations with <italic>d</italic> = 768 (<italic>d</italic> = 384 in the case of paraphrase-MiniLM).</p></list-item>
</list>
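<p>The length restriction can be expressed as a simple filter. In this sketch, <monospace>tokenize</monospace> is a placeholder for any callable that returns the subword pieces of a text without special tokens, and the helper names are hypothetical. With the <monospace>sentence-transformers</monospace> library, one could, for instance, pass the <monospace>model.tokenizer.tokenize</monospace> method of a loaded model.</p>
<preformat preformat-type="code">
```python
MAX_PIECES = 126  # limit stated in the text; special tokens are not counted


def within_limit(poem, tokenize, max_pieces=MAX_PIECES):
    """True if the poem has at most max_pieces subword pieces."""
    return max_pieces >= len(tokenize(poem))


def split_by_length(poems, tokenize, max_pieces=MAX_PIECES):
    """Return the poems within the limit and their share of all poems."""
    kept = [p for p in poems if within_limit(p, tokenize, max_pieces)]
    share = len(kept) / len(poems) if poems else 0.0
    return kept, share


# Toy usage with whitespace splitting as a stand-in tokenizer.
kept, share = split_by_length(["a b c", "a b c d e", "a"], str.split, max_pieces=3)
print(kept, round(share, 2))  # ['a b c', 'a'] 0.67
```
</preformat>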
</sec>
<sec id="S4.2">
<title>4.2 Evaluation Setup</title>
<p>Evaluating the embeddings described above requires a task that probes each embedding space for its ability to represent certain dimensions of (dis)similarity between poems through their distances in that space, measured against the human annotations. Since the embeddings do not prescribe a particular distance function, we evaluate every embedding with each of the following three distance functions: Euclidean (L2), Manhattan (L1), and cosine distance.</p>
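<p>The three distance functions named above can be written out as follows. This is a plain-Python illustration only; in practice one would use a vectorized implementation such as SciPy&#8217;s <monospace>scipy.spatial.distance.euclidean</monospace>, <monospace>cityblock</monospace>, and <monospace>cosine</monospace>.</p>
<preformat preformat-type="code">
```python
import math


def euclidean(u, v):
    """L2 distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """L1 distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 minus the cosine similarity of u and v (ranges from 0 to 2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


DISTANCES = {"L2": euclidean, "L1": manhattan, "cosine": cosine_distance}
```
</preformat>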
<p>We chose to mirror the prompts given to the annotators by formulating a binary classification problem for each dimension of similarity and checking whether the model can reproduce the majority vote. Note that these votes take either the value &#8216;annotated left&#8217; or &#8216;annotated right&#8217;. Fix an embedding space and a distance function <italic>d</italic>. For an annotated triple, let <italic>left</italic>, <italic>anchor</italic>, and <italic>right</italic> denote the corresponding vectors in that embedding space. We then make the following prediction:</p>
<list list-type="bullet">
<list-item><p>Predict &#8216;annotated left&#8217; if <italic>d</italic>(<italic>anchor</italic>, <italic>left</italic>) &lt; <italic>d</italic>(<italic>anchor</italic>, <italic>right</italic>), i.e., <italic>left</italic> is closer to <italic>anchor</italic> than <italic>right</italic>.</p></list-item>
<list-item><p>Otherwise, predict &#8216;annotated right&#8217;.</p></list-item>
</list>
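<p>The prediction rule translates directly into code. The sketch below is illustrative: <monospace>dist</monospace> can be any of the three distance functions, and ties are assigned to &#8216;annotated right&#8217;, exactly as the &#8216;otherwise&#8217; branch prescribes.</p>
<preformat preformat-type="code">
```python
import math


def predict_side(anchor, left, right, dist):
    """'annotated left' if left is strictly closer to the anchor, else 'annotated right'."""
    if dist(anchor, right) > dist(anchor, left):
        return "annotated left"
    return "annotated right"


# Toy triple: left lies at distance 1 from the anchor, right at distance 3.
print(predict_side([0.0, 0.0], [1.0, 0.0], [0.0, 3.0], math.dist))  # annotated left
```
</preformat>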
<p>To compare the true majority annotations with the predicted ones, we use <italic>balanced accuracy</italic> (the arithmetic mean of the recalls of both classes, cf. <xref ref-type="bibr" rid="B30">Grandini et al. 2020</xref>) as our metric. Note that a random (&#8216;no skill&#8217;) classifier has an expected balanced accuracy of 0.5.</p>
<p>We remark that variations of the above operationalization are possible as well, in particular if we do not omit the cases where the majority of annotators chose &#8216;The middle text is equally (dis)similar to both the left and the right text&#8217;. However, in our experiments, including this third class &#8216;same&#8217; in the operationalization considerably lowered the balanced accuracy. (To make balanced accuracies with different numbers of classes <italic>k</italic> comparable, we rescaled them to the range from 1/(1 &#8722; <italic>k</italic>) to 1, so that random performance always scores 0.) We suspect that this difference in performance is caused by the complexity of the triples labeled &#8216;annotated same&#8217;: human annotators agree on the features that make them classify a text as &#8216;more similar to the focus text&#8217;, but the label &#8216;same&#8217; is given when neither of the two comparison texts shows obvious similarities to the focus text. This does not imply that the comparison texts have any features in common; they can differ from the focus text in very diverse ways.</p>
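<p>Balanced accuracy and the chance-corrected rescaling mentioned above can be sketched as follows; the function names are hypothetical. For <italic>k</italic> classes with chance level 1/<italic>k</italic>, the rescaled score is (BA &#8722; 1/<italic>k</italic>)/(1 &#8722; 1/<italic>k</italic>), whose minimum is 1/(1 &#8722; <italic>k</italic>). scikit-learn&#8217;s <monospace>balanced_accuracy_score(y_true, y_pred, adjusted=True)</monospace> applies the same correction.</p>
<preformat preformat-type="code">
```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)


def adjusted_balanced_accuracy(y_true, y_pred):
    """Rescale so that chance level maps to 0 and a perfect score stays 1.
    With k classes, the worst possible value becomes 1 / (1 - k)."""
    chance = 1.0 / len(set(y_true))
    return (balanced_accuracy(y_true, y_pred) - chance) / (1.0 - chance)
```
</preformat>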
<p>In particular, we experimented with the following two variations of the original operationalization:</p>
<list list-type="simple">
<list-item><p>(a) Probe whether the embedding can predict &#8216;annotated equally (dis)similar&#8217; vs. &#8216;annotated left or right&#8217; by testing whether <italic>|d(anchor, left) &#8722; d(anchor, right)| &lt; &#1013;</italic> for an optimally chosen decision boundary <italic>&#1013;</italic>.</p></list-item>
<list-item><p>(b) In a 3-class classification setup, probe whether the embedding space admits a classification using an optimal symmetric decision boundary <italic>&#1013;</italic>. That is, predict &#8216;annotated left&#8217; when <italic>d(anchor, right) &#8722; d(anchor, left) &gt; &#1013;</italic> (<italic>left</italic> is closer to <italic>anchor</italic> than <italic>right</italic> by more than <italic>&#1013;</italic>). Symmetrically, predict &#8216;annotated right&#8217; when <italic>d(anchor, left) &#8722; d(anchor, right) &gt; &#1013;</italic>. Otherwise, when <italic>|d(anchor, left) &#8722; d(anchor, right)| &#8804; &#1013;</italic>, predict &#8216;annotated equally (dis)similar&#8217;.</p></list-item>
</list>
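<p>Both variants can be sketched as follows. We read &#8216;optimal decision boundary&#8217; as the threshold <italic>&#1013;</italic> that maximizes a given score on the annotated triples, and we take candidate thresholds halfway between consecutive observed distance gaps. This reading, the helper names, and the labels &#8216;left&#8217;, &#8216;right&#8217;, &#8216;same&#8217;, and &#8216;side&#8217; are our assumptions.</p>
<preformat preformat-type="code">
```python
def predict_same_vs_side(d_left, d_right, eps):
    """Variant (a): 'same' if the distances differ by less than eps, else 'side'."""
    return "same" if eps > abs(d_left - d_right) else "side"


def predict_three_way(d_left, d_right, eps):
    """Variant (b): symmetric decision boundary eps around a zero gap."""
    gap = d_right - d_left  # positive when left is closer to the anchor
    if gap > eps:
        return "left"
    if -gap > eps:
        return "right"
    return "same"  # the absolute gap does not exceed eps


def candidate_thresholds(pairs):
    """0, the midpoints between consecutive distinct absolute gaps, and a value
    above the largest gap, so that every split of the sorted gaps is reachable."""
    gaps = sorted({abs(d_right - d_left) for d_left, d_right in pairs})
    midpoints = [(lo + hi) / 2 for lo, hi in zip(gaps, gaps[1:])]
    return [0.0, *midpoints, gaps[-1] + 1.0]


def best_threshold(pairs, labels, predict, score):
    """Return (eps, score) maximising score(labels, predictions) over the candidates.

    pairs holds (d(anchor, left), d(anchor, right)) for each annotated triple.
    """
    best_eps, best_score = None, float("-inf")
    for eps in candidate_thresholds(pairs):
        predictions = [predict(d_left, d_right, eps) for d_left, d_right in pairs]
        current = score(labels, predictions)
        if current > best_score:
            best_eps, best_score = eps, current
    return best_eps, best_score
```
</preformat>
<p>In the setting described above, <monospace>score</monospace> would be the rescaled balanced accuracy and <monospace>predict</monospace> one of the two variant functions.</p>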
<p>As noted above, for all embeddings, variant (a) yields a lower balanced accuracy than the original operationalization, and variant (b) an even lower one.</p>
</sec>
<sec id="S4.3">
<title>4.3 Results</title>
<p>For all dimensions except &#8216;form&#8217;, the results (<xref ref-type="fig" rid="F2">Figure 2</xref>) show a clear improvement with increasing complexity of the text representation: word embeddings outperform sparse representations, with dynamic embeddings based on BERT performing better than static embeddings, and sentence embeddings outperform word embeddings. The best sentence embedding shows acceptable performance, especially when cosine distance is used. Since almost all of the text representation strategies applied here were developed with a main focus on semantics, it is not too surprising that the same model performs best in all these dimensions. The one big exception is form: here, a very small set of features is enough to match the annotations. Discussions with the annotators revealed that they usually based their decisions on a very small set of observations.</p>
<p>The best model is <italic>paraphrase-mpnet</italic>. To evaluate all German sentence embedding models available at the time of writing,<xref ref-type="fn" rid="n11">11</xref> we use the roughly 9,000 sentences of the SICK dataset (<xref ref-type="bibr" rid="B44">Marelli et al. 2014</xref>), which we had translated into German with DeepL. On this benchmark, <italic>paraphrase-XLM-R</italic> (correlation with human annotations: 0.82) is slightly ahead of <italic>paraphrase-mpnet</italic> (0.8165), which is why we include these two models, together with the best model based on static word embeddings (<italic>FastText-mean</italic>), in the next step.</p>
</sec>
<sec id="S4.4">
<title>4.4 Similarity Learning</title>
<p>To adapt the text representations to the specific textual dimensions (content, form, style, and emotion), we additionally apply similarity learning. The goal of this step is to learn a transformation of the vectors presented in the previous section that reproduces the annotations more closely. For this purpose, we use a siamese neural network (<xref ref-type="bibr" rid="B16">Bromley et al. 1993</xref>), which we modeled on the &#8216;maaten&#8217; network structure of <xref ref-type="bibr" rid="B62">Szubert et al. 2019</xref>. The base model consists of three dense layers (500, 500, and 2,000 neurons), each followed by a self-normalizing activation function (SELU; see <xref ref-type="bibr" rid="B37">Klambauer et al. 2017</xref>) and dropout. The input to the network consists of our annotated poem triples.</p>
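<p>The base network described above can be sketched as a plain-Python forward pass. The LeCun-normal initialization, the linear 128-dimensional output layer, and all names are our assumptions, and dropout is left out because it is inactive at inference time. In the siamese setup, one and the same encoder (shared weights) is applied to the anchor, the positive, and the negative poem.</p>
<preformat preformat-type="code">
```python
import math
import random

SELU_ALPHA = 1.6732632423543772  # constants from Klambauer et al. (2017)
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def init_dense(n_in, n_out, rng):
    """LeCun-normal weights (std = 1 / sqrt(n_in)) and zero biases (assumption)."""
    std = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Fully connected layer: activation(W x + b)."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def build_encoder(n_in, hidden=(500, 500, 2000), n_out=128, seed=0):
    """Three SELU layers followed by a linear projection to n_out dimensions."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden]
    hidden_layers = [init_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    projection = init_dense(sizes[-1], n_out, rng)
    return hidden_layers, projection


def encode(model, x):
    """Map one poem vector into the learned space (dropout omitted at inference)."""
    hidden_layers, projection = model
    for layer in hidden_layers:
        x = dense(layer, x, selu)
    return dense(projection, x)
```
</preformat>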
<p>Regardless of the original size of its vector representation, each poem is transformed into a space with 128 dimensions. The loss, and hence the optimization objective of the network, is to maximize the distance between the focus text and the negative example while minimizing the distance between the focus text and the positive example, i.e., the text that has been annotated as being more similar to the focus text. In short, in terms of Euclidean distances, the network maximizes dist(anchor, negative) &#8722; dist(anchor, positive).<xref ref-type="fn" rid="n12">12</xref> The learning rate is decreased by a <italic>reduce on plateau</italic> mechanism, which leads to strong performance gains compared to more common choices such as constant or time-based decay. The network&#8217;s performance is measured as the share of triples in which the positive example is correctly identified (accuracy, see <xref ref-type="table" rid="T1">Table 1</xref>).</p>
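<p>The objective, the accuracy measure, and the learning-rate schedule can be sketched in plain Python as follows. Taken literally, the difference dist(anchor, negative) &#8722; dist(anchor, positive) is unbounded, so we also show the common margin-based triplet loss as a bounded alternative; the exact form used is not spelled out in the text above. The margin, the scheduler parameters (<monospace>factor</monospace>, <monospace>patience</monospace>, <monospace>min_lr</monospace>), and all names are hypothetical; deep-learning frameworks ship equivalent schedulers, e.g. the Keras <monospace>ReduceLROnPlateau</monospace> callback.</p>
<preformat preformat-type="code">
```python
import math


def difference_objective(anchor, positive, negative):
    """Value to minimise: d(a, p) - d(a, n), i.e. maximise d(a, n) - d(a, p)."""
    return math.dist(anchor, positive) - math.dist(anchor, negative)


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Common bounded alternative: max(0, margin + d(a, p) - d(a, n))."""
    return max(0.0, margin + math.dist(anchor, positive) - math.dist(anchor, negative))


def triplet_accuracy(triples):
    """Share of (anchor, positive, negative) triples whose positive is closer."""
    hits = [math.dist(a, n) > math.dist(a, p) for a, p, n in triples]
    return sum(hits) / len(hits)


class ReduceOnPlateau:
    """Multiply the learning rate by `factor` once the monitored loss has not
    improved for `patience` consecutive epochs, never going below `min_lr`."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, loss):
        """Register one epoch's loss and return the learning rate to use next."""
        if self.best > loss:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```
</preformat>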
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>Balanced accuracy score for each model and dimension. Numbers on the x-axis indicate class support. For information on results with other distance metrics see <xref ref-type="fig" rid="F12">Figure 12</xref> in <xref ref-type="app" rid="app1">Appendix A</xref>.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g2.png"/>
</fig>
<fig id="F3">
<label>Figure 3</label>
<caption>
<p>Architecture of the siamese neural network used for similarity learning.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g3.png"/>
</fig>
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>Similarity learning results (accuracy in 10-fold cross-validation). Format: best performance before similarity learning (see <xref ref-type="fig" rid="F2">Figure 2</xref>) &#8594; performance afterwards.</p>
</caption>
<table>
<thead>
<tr>
<td colspan="6"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Model</td>
<td align="center" valign="top">Content</td>
<td align="center" valign="top">Form</td>
<td align="center" valign="top">Style</td>
<td align="center" valign="top">Emotion</td>
<td align="center" valign="top">Overall</td>
</tr>
<tr>
<td colspan="6"><hr/></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">paraphrase-XLM-R</td>
<td align="center" valign="top">.69&#8594;<bold>.81</bold></td>
<td align="center" valign="top">.58&#8594;.76</td>
<td align="center" valign="top">.66&#8594;<bold>.79</bold></td>
<td align="center" valign="top">.66&#8594;<bold>.76</bold></td>
<td align="center" valign="top">.69&#8594;<bold>.79</bold></td>
</tr>
<tr>
<td align="left" valign="top">paraphrase-mpnet</td>
<td align="center" valign="top">.71&#8594;.75</td>
<td align="center" valign="top">.64&#8594;.68</td>
<td align="center" valign="top">.71&#8594;.71</td>
<td align="center" valign="top">.70&#8594;.74</td>
<td align="center" valign="top">.73&#8594;.74</td>
</tr>
<tr>
<td align="left" valign="top">FastText-mean</td>
<td align="center" valign="top">.66&#8594;.77</td>
<td align="center" valign="top">.59&#8594;.67</td>
<td align="center" valign="top">.65&#8594;.72</td>
<td align="center" valign="top">.66&#8594;.74</td>
<td align="center" valign="top">.66&#8594;.72</td>
</tr>
<tr>
<td align="left" valign="top">Formal-Features</td>
<td align="center" valign="top">-</td>
<td align="center" valign="top"><bold>.81</bold>&#8594;<bold>.81</bold></td>
<td align="center" valign="top">-</td>
<td align="center" valign="top">-</td>
<td align="center" valign="top">-</td>
</tr>
<tr>
<td colspan="6"><hr/></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="S4.5">
<title>4.5 Discussion</title>
<p>With our two-step approach, we achieve good results on a complex task. It is open to discussion whether the small gain in performance justifies the restriction to 126 input tokens. Future work should either increase the admissible input length or find a reliable way to compute representations for longer texts. Since the first step uses one representation for three of the four aspects, we asked whether the representations actually differ after the second step. The correlations of distances (<xref ref-type="fig" rid="F4">Figure 4</xref>) between content and style, content and emotion, and style and emotion are positive but not perfect. In other words, similarity learning attuned the vector space to each specific dimension. The close relationship between &#8216;content&#8217; and &#8216;overall&#8217; had already been noticed by the annotators.</p>
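<p>Distance correlations such as those in <xref ref-type="fig" rid="F4">Figure 4</xref> can be computed by taking, for every pair of poems, its Euclidean distance in two dimension-specific spaces and correlating the two resulting lists. The sketch below assumes that all poem pairs enter the computation and that both spaces list the poems in the same order; the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import itertools
import math


def pairwise_distances(vectors):
    """Euclidean distances of all unordered pairs, in a fixed pair order."""
    return [math.dist(u, v) for u, v in itertools.combinations(vectors, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlate_spaces(space_a, space_b):
    """Correlate pairwise distances; space_a[i] and space_b[i] are the same poem."""
    return pearson(pairwise_distances(space_a), pairwise_distances(space_b))
```
</preformat>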
<p>It is unclear to us why the embeddings show such different improvements in the second step (mean gain over the five columns of <xref ref-type="table" rid="T1">Table 1</xref>): 0.126 for <italic>paraphrase-XLM-R</italic>, 0.026 for <italic>paraphrase-mpnet</italic>, and 0.08 for <italic>FastText-mean</italic>. On which factors does this capacity for improvement depend? Which training data and training regime for sentence embeddings make a text representation adaptable to text dimensions beyond content?</p>
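<p>The mean gains quoted above follow from <xref ref-type="table" rid="T1">Table 1</xref> as the average, over its five columns, of the accuracy after similarity learning minus the best accuracy before it. The snippet below merely redoes this arithmetic with the values copied from the table.</p>
<preformat preformat-type="code">
```python
# (best accuracy before, accuracy after similarity learning), copied from Table 1
# in the column order content, form, style, emotion, overall
TABLE_1 = {
    "paraphrase-XLM-R": [(.69, .81), (.58, .76), (.66, .79), (.66, .76), (.69, .79)],
    "paraphrase-mpnet": [(.71, .75), (.64, .68), (.71, .71), (.70, .74), (.73, .74)],
    "FastText-mean": [(.66, .77), (.59, .67), (.65, .72), (.66, .74), (.66, .72)],
}


def mean_gain(cells):
    """Average of (after - before) over the columns."""
    return sum(after - before for before, after in cells) / len(cells)


for model, cells in TABLE_1.items():
    print(f"{model}: {mean_gain(cells):.3f}")
# paraphrase-XLM-R: 0.126
# paraphrase-mpnet: 0.026
# FastText-mean: 0.080
```
</preformat>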
<p>The results in <xref ref-type="fig" rid="F2">Figure 2</xref><xref ref-type="fn" rid="n13">13</xref> show that the best results are obtained with transformer-based language models, and that results improve further if these models have previously been fine-tuned for sentence similarity. With the additional adaptation through similarity learning, we now apply a third tuning step to representations created in this way. A next step would be to include these networks in the learning process instead of using their frozen output vectors, i.e., to model similarity learning as a fine-tuning step. Likewise, we should add another stage of pretraining before similarity learning, namely a domain adaptation (<xref ref-type="bibr" rid="B31">Gururangan et al. 2020</xref>) to our corpus.</p>
<fig id="F4">
<label>Figure 4</label>
<caption>
<p>Pearson correlation of distances in vector space after similarity learning.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g4.png"/>
</fig>
</sec>
<sec id="S5">
<title>5. (Dis)similarity between the Poetry of Realism and the Poetry of Modernism</title>
<sec id="S5.1">
<title>5.1 Hypotheses from Literary Studies</title>
<p>In the following, we continue a discussion in German literary studies about the relationship of poems of realism to those of early modernism and the special position of naturalistic poetry in this development. We hope to contribute to this discussion by a mix of explorative methods and hypothesis testing. To enable the latter, we will condense positions in the debate into three hypotheses related to this transformation.</p>
<p><bold>Hypothesis 1: The poetry of naturalism, as represented in the anthology <italic>Moderne Dichter-Charaktere</italic>, is predominantly traditional rather than modernist.</bold> The question of where exactly naturalism can be located between realism and modernism has been debated many times. In this context, the anthology <italic>Moderne Dichter-Charaktere</italic>, which is part of our corpus, is considered central to naturalist poetry. The anthology&#8217;s introductions emphatically assert the novelty and revolutionary character of the texts (especially <xref ref-type="bibr" rid="B18">Conradi 1885, pp. I&#8211;III</xref>). Most researchers, on the other hand, consider these statements exaggerated and regard the poetry of the anthology as, on the whole, traditional (<xref ref-type="bibr" rid="B65">Vietta 1992, p. 194</xref>; <xref ref-type="bibr" rid="B21">F&#228;hnders 1998, pp. 36&#8211;37</xref>; <xref ref-type="bibr" rid="B60">Sprengel 1998, p. 621</xref>; <xref ref-type="bibr" rid="B5">Austerm&#252;hl 2000, pp. 350&#8211;51</xref>; <xref ref-type="bibr" rid="B41">Lamping 2000, pp. 145&#8211;146</xref>; <xref ref-type="bibr" rid="B1">Andreotti 2014, p. 17</xref>). However, some scholars, even if they consider the anthology as a whole to be traditional, argue that it was at least innovative in terms of <italic>content</italic> since new themes such as &#8216;big cities&#8217; or &#8216;social issues&#8217; were addressed (e.g. <xref ref-type="bibr" rid="B21">F&#228;hnders 1998, pp. 36&#8211;37</xref>).</p>
<p><bold>Hypothesis 2: Modernist poetry is heterogeneous, that is, more heterogeneous than realist poetry.</bold> While the poetry of realism, or at least the mass-produced poetry of this period, is considered by researchers to be relatively homogeneous (e.g. <xref ref-type="bibr" rid="B61">Stockinger 2010, p. 88</xref>), modernist poetry is highly diverse, according to many scholars, given the simultaneity of a wide variety of literary movements (<xref ref-type="bibr" rid="B2">Anz 2007, pp. 330&#8211;331</xref>; <xref ref-type="bibr" rid="B9">Becker and Kiesel 2007, p. 30</xref>; <xref ref-type="bibr" rid="B21">F&#228;hnders 1998, pp. IX, 4</xref>). But the hypothesis of modernist heterogeneity has its limitations. For example, some researchers support the view that modernism is homogeneous at least insofar as it responds to the same sociocultural problems (<xref ref-type="bibr" rid="B65">Vietta 1992, pp. 30&#8211;31</xref>; <xref ref-type="bibr" rid="B21">F&#228;hnders 1998, pp. 9&#8211;10</xref>; <xref ref-type="bibr" rid="B9">Becker and Kiesel 2007, p. 30</xref>; for further statements on the homogeneity of modernist poetry see <xref ref-type="bibr" rid="B25">H. Friedrich 1992, pp. 140&#8211;142</xref>; <xref ref-type="bibr" rid="B42">Lamping 2008, p. 13</xref>). One researcher, therefore, argues that the period around 1900 was characterized by a &#8220;homogeneity of the heterogeneous&#8221; (<xref ref-type="bibr" rid="B21">F&#228;hnders 1998, p. 11</xref>). Despite these limitations, most scholars would probably agree that modernist poetry is at least more heterogeneous than the poetry of realism.</p>
<p><bold>Hypothesis 3: There is a fundamental &#8216;rupture&#8217; between modernist poetry and earlier, more traditional poetry.</bold> This view was already held by contemporary authors, critics, and anthologists, who spoke of a &#8216;revolution&#8217; in poetry (as an example from the corpus anthologies see <xref ref-type="bibr" rid="B12">Bethge 1905, pp. 13&#8211;14</xref>; cf. on contemporary statements <xref ref-type="bibr" rid="B25">H. Friedrich 1992, p. 141</xref>; <xref ref-type="bibr" rid="B2">Anz 2007, p. 333</xref>; <xref ref-type="bibr" rid="B43">Lamping 2012</xref>; <xref ref-type="bibr" rid="B66">Wieland 2019, p. 17</xref>). Many researchers also emphasize major differences between modernism and previous literary periods, often using the metaphor of &#8216;rupture&#8217; (<xref ref-type="bibr" rid="B25">H. Friedrich 1992, p. 20</xref>; <xref ref-type="bibr" rid="B36">Kiesel 2004, pp. 141&#8211;142</xref>; <xref ref-type="bibr" rid="B24">Frick 2007, pp. 97&#8211;98</xref>; <xref ref-type="bibr" rid="B28">Goltschnigg 2007, p. 169</xref>; <xref ref-type="bibr" rid="B43">Lamping 2012</xref>; <xref ref-type="bibr" rid="B1">Andreotti 2014, p. 5</xref>; without this metaphor: <xref ref-type="bibr" rid="B38">Klinger 2002, p. 160</xref>; <xref ref-type="bibr" rid="B41">Lamping 2000, p. 140</xref>; <xref ref-type="bibr" rid="B42">Lamping 2008, pp. 11, 13</xref>). But the &#8216;rupture&#8217; thesis is also partly qualified. For example, it is emphasized that modernism still refers to traditions (even though it uses them in new ways) (<xref ref-type="bibr" rid="B36">Kiesel 2004, pp. 142&#8211;143</xref>; <xref ref-type="bibr" rid="B24">Frick 2007, pp. 98&#8211;99</xref>; <xref ref-type="bibr" rid="B28">Goltschnigg 2007, p. 169</xref>). 
Others argue that many relevant authors were located somewhere <italic>between</italic> realism and modernism or that they combined traditional as well as new elements, which implies a smoother transition between periods (see for C. F. Meyer <xref ref-type="bibr" rid="B57">Selbmann 1999, pp. 149, 152</xref>; for Fontane (!) <xref ref-type="bibr" rid="B58">Selbmann 2007, p. 201</xref>; for Baudelaire, Rilke, Hofmannsthal, and Kafka <xref ref-type="bibr" rid="B43">Lamping 2012</xref>). Still others play down the novelty of modernism in general (<xref ref-type="bibr" rid="B33">Hiebel 2005, p. 27</xref>; <xref ref-type="bibr" rid="B2">Anz 2007, p. 333</xref>). Thus, hypothesis 3 is partly contested in research.</p>
<p>It is possible to combine the aforementioned hypotheses in a visual model. The purpose of this model is threefold: it visually summarizes the research hypotheses, it relates the hypotheses to one another, and it shows that, taken together, the hypotheses about similarity and dissimilarity offer a fairly comprehensive interpretation of the transformation from realism to modernism, again underscoring the relevance of similarity as a category of analysis.</p>
<p>In the model, each point represents a poem. The greater the distances between the points, the more dissimilar the texts. The distances are not based on calculations but on a hermeneutic understanding of the research and are meant as rough approximations of general ideas. One can see that the distances within realism are smaller than the distances within modernism. It is also visible that there is a strong division between realism and modernism and that the naturalist poems tend to gravitate more towards realism than modernism.<xref ref-type="fn" rid="n14">14</xref></p>
<p>Admittedly, this model is not explicitly advocated in research. Only rarely does a single scholar state all the hypotheses that the model synthesizes. Like any model, it represents only a section of reality and neglects other aspects, such as the differentiation of individual dimensions of similarity, or synchronic and diachronic period-internal differentiations of, for example, individual authors, groups of authors, or literary movements. Some aspects of the model are, as explained, controversial in research, but it is all the more interesting to examine whether our results fit the model and the underlying hypotheses.</p>
<fig id="F5">
<label>Figure 5</label>
<caption>
<p>Model of the distances between poems of realism, naturalism and modernism according to research.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g5.png"/>
</fig>
</sec>
<sec id="S5.2">
<title>5.2 Results</title>
<p>For a first exploration of the (dis)similarities between realism and modernism, we project the poems into a two-dimensional space (<xref ref-type="fig" rid="F6">Figure 6</xref>). Some similarities with the model derived from research (<xref ref-type="fig" rid="F5">Figure 5</xref>) become apparent. In particular, a distinction between realism and modernism is evident, even though the separation is far from perfect since there are numerous overlaps between the two periods. Furthermore, it is consistent with the research model that the naturalist poems tend to stay within the realist spectrum and hardly enter &#8216;decidedly modernist&#8217; areas. However, it is necessary to test the hypotheses from literary studies more precisely than just by explorative means.</p>
<p><bold>Hypothesis 1: The poetry of naturalism, as represented in the anthology <italic>Moderne Dichter-Charaktere</italic>, is predominantly traditional rather than modernist.</bold> To test the first hypothesis, we examined the similarity of the programmatically naturalistic anthology <italic>Moderne Dichter-Charaktere</italic> to the realism and modernism corpora. In addition, we measured the distances between the poems within the latter two corpora as a baseline against which these comparisons can be assessed.</p>
<p>The boxplots (<xref ref-type="fig" rid="F7">Figure 7</xref>) show that, overall and for each individual dimension (&#8216;content&#8217;, &#8216;form&#8217;, &#8216;style&#8217;, and &#8216;emotion&#8217;), the distances between naturalism and realism are smaller than the distances<xref ref-type="fn" rid="n15">15</xref> between naturalism and modernism. At the same time, the distances between the naturalism and realism corpora are larger than the distances between poems within the realism corpus. Surprisingly, naturalism does not come closer to the modernism corpus even in the dimension &#8216;content&#8217;.</p>
<fig id="F6">
<label>Figure 6</label>
<caption>
<p>Poems embedded with both vanilla <italic>GBERT-alllayers-meannorm</italic> (see <xref ref-type="fig" rid="F2">Figure 2</xref>) and <italic>FastText-meannorm</italic> transformed to reflect the aspect &#8216;content&#8217; (see <xref ref-type="table" rid="T1">Table 1</xref>), projected into two-dimensional space using UMAP (<xref ref-type="bibr" rid="B46">McInnes et al. 2018</xref>).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g6.png"/>
</fig>
<p>Based on the literary-historical theses mentioned above, a stronger similarity between naturalist and modernist poems would have been expected, at least in the dimension &#8216;content&#8217;. As expected, however, the analyses support the thesis that the naturalism corpus is more similar to the realism corpus than to the modernism corpus. A more detailed look shows that differences can be found in the individual dimensions. This could indicate that the naturalistic poems do not use the same means as realist poems. What exactly these differences are should be investigated in a further study. In any case, equating naturalist with realist poetry falls short, since the internal distance in the realism corpus is smaller than the distance between naturalism and realism. It should be emphasized that we have studied the effect only for the anthology <italic>Moderne Dichter-Charaktere</italic> and only using its short poems, as stated above. A further study would have to take into account that the modernism corpus also contains some naturalistic poems.</p>
<fig id="F7">
<label>Figure 7</label>
<caption>
<p>Distances between poems from Realism/Naturalism and Modernism/Naturalism and among poems within Realism and within Modernism. Distances in &#8216;content&#8217;, &#8216;style&#8217;, &#8216;emotion&#8217; and &#8216;overall&#8217; are measured in the space of <italic>paraphrase-XLM-R</italic> embeddings transformed via similarity learning (see <xref ref-type="sec" rid="S4.4">section 4.4</xref>). Distances in &#8216;form&#8217; are measured in the Feature-Form embedding space (see <xref ref-type="sec" rid="S4.1">section 4.1</xref>). Each boxplot represents pairwise Euclidean distances of 2,000 samples with a size of 20 poems.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g7.png"/>
</fig>
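<p>The captions of Figures 7 to 9 summarize the sampling procedure only briefly. One plausible reading, sketched below under explicit assumptions, is that each of the 2,000 values in a boxplot is the mean Euclidean distance among 20 poems sampled without replacement (or between 20 poems drawn from each of two corpora). The aggregation by the mean, sampling without replacement, and all function names are our assumptions; for within-corpus boxes, <monospace>corpus_b</monospace> is simply omitted.</p>
<preformat preformat-type="code">
```python
import itertools
import math
import random
import statistics


def mean_pairwise_distance(sample_a, sample_b=None):
    """Mean Euclidean distance over all pairs within sample_a, or over all
    cross pairs between sample_a and sample_b."""
    if sample_b is None:
        pairs = itertools.combinations(sample_a, 2)
    else:
        pairs = itertools.product(sample_a, sample_b)
    return statistics.fmean(math.dist(u, v) for u, v in pairs)


def sampled_distances(corpus_a, corpus_b=None, n_samples=2000, size=20, seed=0):
    """One value per sample: draw `size` poems (from each corpus) without
    replacement and average their pairwise distances."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        sample_a = rng.sample(corpus_a, size)
        sample_b = rng.sample(corpus_b, size) if corpus_b is not None else None
        values.append(mean_pairwise_distance(sample_a, sample_b))
    return values
```
</preformat>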
<fig id="F8">
<label>Figure 8</label>
<caption>
<p>Distances among poems within Realism, Modernism and canonic Modernism. Distances in &#8216;content&#8217;, &#8216;style&#8217;, &#8216;emotion&#8217; and &#8216;overall&#8217; are measured in the space of <italic>paraphrase-XLM-R</italic> embeddings transformed via similarity learning (see <xref ref-type="sec" rid="S4.4">section 4.4</xref>). Distances in &#8216;form&#8217; are measured in the Feature-Form embedding space (see <xref ref-type="sec" rid="S4.1">section 4.1</xref>). Each boxplot represents pairwise Euclidean distances of 2,000 samples with a size of 20 poems.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g8.png"/>
</fig>
<p><bold>Hypothesis 2: Modernist poetry is heterogeneous, that is, more heterogeneous than realist poetry.</bold> From now on, when we compare realism with modernism, we no longer include the naturalist poems in our calculations and visualizations, since we have seen that naturalism is located somewhere between realism and modernism. Instead, within modernism, we now distinguish between canonical and non-canonical authors in order to point out some peculiarities of the canonical poems.<xref ref-type="fn" rid="n16">16</xref></p>
<p>To test hypothesis 2, we compare the distances within realism with those within modernism (<xref ref-type="fig" rid="F8">Figure 8</xref>). In all dimensions, the distances within modernism are greater than those within realism, most clearly in the dimension &#8216;form&#8217;. Thus, our data support the hypothesis that modernist poetry is more heterogeneous than realist poetry. However, the differences in heterogeneity are mostly small and should not be overemphasized. Modernist poems by canonical authors are slightly more heterogeneous than non-canonical poems with regard to &#8216;form&#8217;. Otherwise, the canonical poems are not characterized by greater distances among themselves than the non-canonical modernist poems. On the contrary, the distances in the dimensions of style and especially emotion are much smaller within the canonical texts than within the non-canonical modernist poems. All in all, the canonical texts are no more heterogeneous than the non-canonical ones. This is surprising, since one might have expected a particularly high degree of individuality, and thus heterogeneity, in the canon. In any case, it must be kept in mind that the subcorpus of canonical modernist poems is very small (58 poems, 5 authors), which limits the validity of the results. Further research is needed here.</p>
<p><bold>Hypothesis 3: There is a fundamental &#8216;rupture&#8217; between modernist poetry and earlier, more traditional poetry.</bold> Some difficulties arise in testing hypothesis 3. It is not clear what exactly is meant by &#8216;rupture&#8217; and how we should measure it. One possibility is to assume that the term &#8216;rupture&#8217; in hypothesis 3 denotes a certain kind of literary change, namely a change that (a) is particularly large compared to other changes between periods (e.g. between romanticism and realism) and that (b) occurs abruptly, i.e. in a very short period of time. However, we do not have data on other changes between literary periods and we cannot analyse whether the shift between realism and modernism was a continuous, decades-long process or a matter of a few years.</p>
<p>So while we are not able to test hypothesis 3 directly, we can at least share some observations that are likely to be related. In particular, we will compare the distances between realism and modernism with distances within realism. If the distances between realism and modernism are greater than within realism, it can at least be said that modernism is different from realism.</p>
<p>In all dimensions, the distances between realism and modernism are larger than the distances within realism. However, the differences are not enormous. Moreover, the two-dimensional plot above (<xref ref-type="fig" rid="F6">Figure 6</xref>) shows that modernist poems appear not only outside realism, but often within the realist spectrum as well. While these observations cannot falsify hypothesis 3 directly, they certainly do not confirm the hypothesis either. If anything, our results suggest that the notion of a fundamental &#8216;rupture&#8217; between realism and modernism might be exaggerated.</p>
<fig id="F9">
<label>Figure 9</label>
<caption>
<p>Distances among poems within Realism and between Realism/non-canon Modernism, Realism/canonic Modernism and non-canon Modernism/canonic Modernism. Distances in &#8216;content&#8217;, &#8216;style&#8217;, &#8216;emotion&#8217; and &#8216;overall&#8217; are measured in the space of <italic>paraphrase-XLM-R</italic> embeddings transformed via similarity learning (see <xref ref-type="sec" rid="S4.4">section 4.4</xref>). Distances in &#8216;form&#8217; are measured in the Feature-Form embedding space (see <xref ref-type="sec" rid="S4.1">section 4.1</xref>). Each boxplot represents pairwise Euclidean distances of 2,000 samples with a size of 20 poems.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g9.png"/>
</fig>
<fig id="F10">
<label>Figure 10</label>
<caption>
<p>Graph timelines for the &#8216;content&#8217; and &#8216;form&#8217; dimensions, based on the mean pairwise similarities of 30 poems sampled for each 5-year time span and computed from the similarity-adapted vectors of <italic>paraphrase-XLM-R</italic> (content) and the formal feature vectors (form). See <xref ref-type="fig" rid="F13">Figure 13</xref> in <xref ref-type="app" rid="app1">Appendix A</xref> for a larger version of this figure.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g10.png"/>
</fig>
<p>One could assume that researchers use the metaphor of &#8216;rupture&#8217; because they focus on a different set of texts, namely canonical ones. The distance between realism and canonical modernism is indeed larger than the distance between realism and non-canonical modernism with respect to form, and at least slightly larger for the dimensions &#8216;content&#8217; and &#8216;overall&#8217;. In terms of style, however, canonical modernism is no further from realism than non-canonical modernism, and with respect to emotion, the distance between realism and canonical modernism is even smaller than that between realism and non-canonical modernism. Thus, our results do not show that canonical modernism is systematically further from realism than non-canonical modernism is. The idea that the canonical texts set a trend that the non-canonical texts follow, just less decisively, cannot be confirmed.</p>
<p>One might expect the canonical modernist poems to be at least closer to the non-canonical modernist texts than to the realist poems, but according to our data this is not the case either: the distances from canonical modernist poems to realist texts on the one hand and to non-canonical modernist texts on the other do not differ significantly. For the dimension &#8216;form&#8217;, the canonical modernist poems are even closer to the realist ones than to the non-canonical modernist ones.</p>
<p>The results for the canon are counter-intuitive and call for further research. Again, our observations may partly reflect the very small size of our subcorpus of canonical texts and our restriction to short poems.</p>
<p>To further explore the differences between modernist and realist poetry in our vector space, we constructed a timeline from a graph network. The network was built from all pairwise distances (more precisely, similarities) between the document vectors. For all dimensions except &#8216;form&#8217;, the distances are based on the <italic>paraphrase-XLM-R</italic> vectors after adaptation with similarity learning. For &#8216;form&#8217;, only the similarities of the formal feature vectors were used. Because different metrics were used to compute the vector distances, all distances were rescaled per dimension to lie between 0 and 1.</p>
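<p>The per-dimension rescaling can be illustrated with a minimal sketch. The paragraph does not state which metric each dimension uses or how distances are turned into similarities, so the choice of cosine distance for the content vectors, Euclidean distance for the form vectors, min-max rescaling, and the conversion similarity = 1 &#8722; rescaled distance are all assumptions made for illustration. Function names and input vectors are hypothetical; numpy is assumed.</p>
<preformat preformat-type="code">
```python
import numpy as np


def euclidean_matrix(x):
    """All pairwise Euclidean distances between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def cosine_matrix(x):
    """All pairwise cosine distances (1 - cosine similarity) between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def to_unit_similarity(dist):
    """Min-max rescale one dimension's distances to [0, 1], then flip them to similarities."""
    scaled = (dist - dist.min()) / (dist.max() - dist.min())
    return 1.0 - scaled


# Hypothetical inputs: random vectors in place of the real embeddings.
rng = np.random.default_rng(0)
content_vectors = rng.normal(size=(50, 768))
form_vectors = rng.normal(size=(50, 12))

# Assumed metric per dimension; each dimension is rescaled independently.
similarities = {
    "content": to_unit_similarity(cosine_matrix(content_vectors)),
    "form": to_unit_similarity(euclidean_matrix(form_vectors)),
}
for name, sim in similarities.items():
    print(name, sim.min(), sim.max())
```
</preformat>
<p>Rescaling each dimension separately makes similarities from different metrics comparable on a common 0 to 1 scale, which is what allows the same edge-opacity encoding to be used across all timelines.</p>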
<p>Each node in the graphs represents a span of 5 years (e.g. 1865 for the span 1863&#8211;1867). The edge between two time slices represents the mean distance between samples of 30 poems from each slice; if fewer than 30 poems were available, poems were drawn multiple times. The opacity (alpha) of an edge visualizes the degree of similarity between the two time slices, based on the sampled poems. We only used poems whose publication years we had manually checked and, if necessary, corrected: 321 poems published between 1845 and 1911.</p>
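<p>The construction of nodes and edges can be sketched as follows, under stated assumptions: the spans are aligned so that 1863&#8211;1867 is one of them (we assume a first span starting in 1843), 30 poems are sampled per span (with replacement only when a span has fewer than 30), and an edge weight is the mean similarity between the two spans&#8217; samples. The function names, the alignment parameter, and the placeholder data are hypothetical, not taken from the authors&#8217; repository; numpy is assumed.</p>
<preformat preformat-type="code">
```python
import itertools

import numpy as np


def span_center(year, first_start=1843, width=5):
    """Label a year by the middle year of its 5-year span (1863-1867 -> 1865).

    first_start is an assumption chosen so that 1863-1867 is one of the spans.
    """
    offset = (year - first_start) // width * width
    return first_start + offset + width // 2


def timeline_edges(years, sim, k=30, seed=0):
    """Mean similarity between samples of k poems for every pair of time spans."""
    rng = np.random.default_rng(seed)
    centers = np.array([span_center(y) for y in years])
    samples = {}
    for c in np.unique(centers):
        idx = np.flatnonzero(centers == c)
        # Draw with replacement only when a span holds fewer than k poems.
        samples[int(c)] = rng.choice(idx, size=k, replace=k > len(idx))
    return {
        (a, b): float(sim[np.ix_(samples[a], samples[b])].mean())
        for a, b in itertools.combinations(sorted(samples), 2)
    }


# Placeholder data: 321 random years and vectors, not the annotated corpus.
rng = np.random.default_rng(1)
years = rng.integers(1845, 1912, size=321)
vectors = rng.normal(size=(321, 8))
dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
sim = 1.0 - (dist - dist.min()) / (dist.max() - dist.min())

edges = timeline_edges(years, sim)
print(len(edges), "edges")
```
</preformat>
<p>Because the similarities are already scaled to the range 0 to 1, each edge weight could serve directly as the edge&#8217;s alpha value when drawing the graph; whether the original figures map weights to opacity in exactly this way is not stated.</p>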
<fig id="F11">
<label>Figure 11</label>
<caption>
<p>Graph timelines for the &#8216;emotion&#8217; and &#8216;style&#8217; dimensions, based on the mean pairwise similarities of 30 poems sampled for each 5-year time span and computed from the similarity-adapted vectors of <italic>paraphrase-XLM-R</italic>. See <xref ref-type="fig" rid="F14">Figure 14</xref> in <xref ref-type="app" rid="app1">Appendix A</xref> for a larger version of this figure.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g11.png"/>
</fig>
<p>From this visualization, which is based not on the editors&#8217; assignment of the poems to a period in the anthologies but on the publication dates of the poems, we can make some observations. In terms of form, the timeline suggests that realist poems are more similar to each other, and thus more homogeneous, than modernist poems are (coinciding with our findings for hypothesis 2). Additionally, the further the nodes are from realism, the weaker the similarity becomes, implying that later modernist poems become even more estranged from the form of realist poems. The networks for content and style look similar: both suggest a kind of split between the epochs, hinting at the possibility that modernist and realist poetry have a higher intra- than inter-epochal similarity (coinciding with our findings for hypothesis 3). Such timelines could potentially help identify not only whether a rupture between the epochs exists but also when exactly it occurs. While &#8216;style&#8217; shows its split around 1880, the split for &#8216;content&#8217; appears at around 1885, implying that the change from realism to modernism became apparent first in style and then in content. For &#8216;emotion&#8217;, we cannot discern any pattern in the timeline, suggesting that the emotions thematized or expressed in the poems might contribute to a continuity between the two epochs.</p>
<p>In summary, we were able to confirm some important hypotheses from literary studies, while differentiating or relativizing others. Our data supports the view that naturalist poetry is closer to realism than to modernism; however, simply equating naturalist and realist poetry would not be appropriate. We showed that modernist poetry is indeed more heterogeneous than realist poetry, even though the differences are limited. Finally, our findings suggest that the change from realism to modernism was an evolutionary transition rather than a revolutionary disruption. The results encourage increased attention in literary history to processes of gradual, limited change, rather than thinking only in terms of either stasis or rupture.</p>
<p>The assumptions made in this section rest only on exploratory visualizations and comparatively little data. Subsequent research could expand this subcorpus of year-annotated poems (most importantly, by including longer poems, as mentioned above) and examine these assumptions further, e.g. whether the rupture between the epochs occurred at slightly different times in different dimensions, or whether &#8216;form&#8217; really is the most suitable dimension for measuring homogeneity and heterogeneity within realism and modernism.</p>
<p>In a recent article, Underwood and So (<xref ref-type="bibr" rid="B64">2021</xref>) discuss whether mapping cultural artifacts onto a spatial representation &#8216;distorts&#8217; them, i.e. whether cultural relationships obey a spatial logic at all. Their experiments show that even though there are seemingly convincing arguments against this kind of mapping, empirical evidence is accumulating that it very often works astonishingly well. Our paper adds to this evidence: textual representations in high-dimensional space seem well suited to express even complex text models, though more empirical work may expose their shortcomings in the future. In the meantime, we hope our approach can be used to reevaluate our understanding of the fundamental concept of similarity, not only in Computational Literary Studies.</p>
</sec>
</sec>
<sec id="S6">
<title>6. Data Availability</title>
<p>Data can be found here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://github.com/cophi-wue/jcls2022-poem-similarity">https://github.com/cophi-wue/jcls2022-poem-similarity</ext-link></p>
</sec>
<sec id="S7">
<title>7. Software Availability</title>
<p>Software can be found here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://github.com/cophi-wue/jcls2022-poem-similarity">https://github.com/cophi-wue/jcls2022-poem-similarity</ext-link></p>
</sec>
</body>
<back>
<sec id="S8">
<title>8. Acknowledgements</title>
<p>This work was funded by the Deutsche Forschungsgemeinschaft as part of the SPP 2207 Computational Literary Studies in the project <italic>The beginnings of modern poetry &#8211; Modeling literary history with text similarities</italic>.</p>
</sec>
<sec id="S9">
<title>9. Author Contributions</title>
<p><bold>Anton Ehrmanntraut:</bold> Software, Writing &#8211; original draft</p>
<p><bold>Thora Hagen:</bold> Visualization, Writing &#8211; original draft</p>
<p><bold>Fotis Jannidis:</bold> Conceptualization, Supervision, Writing &#8211; original draft, Funding acquisition</p>
<p><bold>Leonard Konle:</bold> Formal analysis, Software, Writing &#8211; original draft</p>
<p><bold>Merten Kr&#246;ncke:</bold> Data curation, Methodology, Writing &#8211; original draft</p>
<p><bold>Simone Winko:</bold> Data curation, Conceptualization, Supervision, Writing &#8211; original draft, Funding acquisition</p>
</sec>
<app-group>
<app id="app1">
<title>A. Appendix</title>
<fig id="F12">
<label>Figure 12</label>
<caption>
<p>Balanced accuracy score for each model and dimension and three distance metrics: L1, Cosine, and L2. Numbers on the x-axis indicate class support.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g12.png"/>
</fig>
<fig id="F13">
<label>Figure 13</label>
<caption>
<p>Graph timelines for the &#8216;content&#8217; and &#8216;form&#8217; dimensions, based on the mean pairwise similarities of 30 poems sampled for each 5-year time span and computed from the similarity-adapted vectors of <italic>paraphrase-XLM-R</italic> (content) and the formal feature vectors (form).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g13.png"/>
</fig>
<fig id="F14">
<label>Figure 14</label>
<caption>
<p>Graph timelines for the &#8216;emotion&#8217; and &#8216;style&#8217; dimensions, based on the mean pairwise similarities of 30 poems sampled for each 5-year time span and computed from the similarity-adapted vectors of <italic>paraphrase-XLM-R</italic>.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-116_konle-g14.png"/>
</fig>
</app>
</app-group>
<fn-group>
<fn id="n1"><p>Why emotion is a dimension of its own for the analysis of text is discussed by Winko (<xref ref-type="bibr" rid="B68">2003</xref>).</p></fn>
<fn id="n2"><p>Since this epoch is characterized by a multitude of literary trends, the more neutral label &#8216;turn of the century around 1900&#8217; is preferred in literary studies. We choose the term &#8216;modernism&#8217; because the anthologies we include claim to present modern poetry. In the following, &#8216;modernism&#8217; always means &#8216;early modernism&#8217;, i.e. literature before expressionism.</p></fn>
<fn id="n3"><p>For our corpus selection we used G&#252;nter H&#228;ntzschel&#8217;s comprehensive bibliography (<xref ref-type="bibr" rid="B32">H&#228;ntzschel 1991</xref>).</p></fn>
<fn id="n4"><p>As the annotation is still ongoing to cover more poems, the entire corpus and a detailed report on the annotation guidelines for emotions and genre will be published at a later date.</p></fn>
<fn id="n5"><p>The annotation guidelines can be found here: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://github.com/cophi-wue/jcls2022-poem-similarity/blob/main/annotation_guidelines_text_similarity.pdf">https://github.com/cophi-wue/jcls2022-poem-similarity/blob/main/annotation_guidelines_text_similarity.pdf</ext-link>.</p></fn>
<fn id="n6"><p>Poem length is limited to a maximum of 124 SentencePiece tokens when used as input for paraphrase-xlm-r-multilingual-v1.</p></fn>
<fn id="n7"><p>More precisely, for each triple and similarity dimension, we calculate the mode of the annotation results. We use &#8216;The middle text is more similar to the left text&#8217; (from now on: &#8216;left&#8217;) as the final annotation if the mode is &#8216;left&#8217;, and also if &#8216;left&#8217; and &#8216;The middle text is equally (dis)similar to both the left and the right text&#8217; are tied as the most frequent answers. The same applies, mutatis mutandis, to &#8216;right&#8217;. All other annotations are discarded.</p></fn>
<fn id="n8"><p>B&#228;r et al. (<xref ref-type="bibr" rid="B8">2015</xref>) distinguish between compositional measures, which usually &#8220;compute pairwise word similarity between all words, and aggregate the resulting scores to an overall similarity score&#8221; (<xref ref-type="bibr" rid="B8">B&#228;r et al. 2015, p. 5</xref>), and non-compositional measures, which project the texts into a shared space like the vector space model (<xref ref-type="bibr" rid="B55">Salton and McGill 1983</xref>). We concentrate here on the latter.</p></fn>
<fn id="n9"><p>The model achieves an F1 score of 0.73.</p></fn>
<fn id="n10"><p>See: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.deepset.ai/german-word-embeddings">https://www.deepset.ai/german-word-embeddings</ext-link>.</p></fn>
<fn id="n11"><p>The multilingual models of the sentence-transformers library on Hugging Face; see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://huggingface.co/sentence-transformers">https://huggingface.co/sentence-transformers</ext-link>.</p></fn>
<fn id="n12"><p>Triplet margin loss.</p></fn>
<fn id="n13"><p>Motivated by feedback during the conference, we additionally tested the combination of paraphrase-XLM-R and Formal-Features. This variant leads to an accuracy of 0.78 for the form aspect. It improves the result of paraphrase-XLM-R slightly, but remains below the value achieved by Formal-Features alone.</p></fn>
<fn id="n14"><p>Distances within naturalism should not be given any further significance; no research hypotheses were considered in this regard.</p></fn>
<fn id="n15"><p>We tested for significance, and all differences are highly significant. To make sure this is not solely an effect of the large sample size, we repeated the test on 100 randomly selected texts; the differences remained significant. Recent guidelines usually recommend complementing p-values with an effect size. In our case this is not easy to apply, because the distance measure is not grounded in an intuitively comprehensible unit.</p></fn>
<fn id="n16"><p>In our study, in accordance with German literary history, Stefan George (6 poems), Hugo von Hofmannsthal (6 poems), Arno Holz (19 poems), Else Lasker-Sch&#252;ler (3 poems), and Rainer Maria Rilke (24 poems) represent canonical modernism.</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><label>1</label><mixed-citation publication-type="book"><string-name><surname>Andreotti</surname>, <given-names>Mario</given-names></string-name> (<year>2014</year>). <source>Die Struktur der modernen Literatur. Neue Formen und Techniken des Schreibens: Erz&#228;hlprosa und Lyrik</source>. <edition>5th</edition> ed. <publisher-name>Haupt Verlag</publisher-name>.</mixed-citation></ref>
<ref id="B2"><label>2</label><mixed-citation publication-type="book"><string-name><surname>Anz</surname>, <given-names>Thomas</given-names></string-name> (<year>2007</year>). <chapter-title>&#8220;Thesen zur expressionistischen Moderne&#8221;</chapter-title>. In: <source>Literarische Moderne. Begriff und Ph&#228;nomen</source>. Ed. by <string-name><given-names>Sabina</given-names> <surname>Becker</surname></string-name> and <string-name><given-names>Helmuth</given-names> <surname>Kiesel</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, pp. <fpage>329</fpage>&#8211;<lpage>346</lpage>.</mixed-citation></ref>
<ref id="B3"><label>3</label><mixed-citation publication-type="book"><string-name><surname>Arent</surname>, <given-names>Wilhelm</given-names></string-name>, ed. (<year>1885</year>). <source>Moderne Dichter-Charaktere</source>. <publisher-name>Kanzlah</publisher-name>.</mixed-citation></ref>
<ref id="B4"><label>4</label><mixed-citation publication-type="webpage"><string-name><surname>Arora</surname>, <given-names>Sanjeev</given-names></string-name>, <string-name><given-names>Yingyu</given-names> <surname>Liang</surname></string-name>, and <string-name><given-names>Tengyu</given-names> <surname>Ma</surname></string-name> (<year>2017</year>). <chapter-title>&#8220;A Simple but Tough-to-Beat Baseline for Sentence Embeddings&#8221;</chapter-title>. In: <source>5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings</source>. URL: <uri>https://openreview.net/pdf?id=SyK00v5xx</uri> (visited on 11/08/2022).</mixed-citation></ref>
<ref id="B5"><label>5</label><mixed-citation publication-type="book"><string-name><surname>Austerm&#252;hl</surname>, <given-names>Elke</given-names></string-name> (<year>2000</year>). <chapter-title>&#8220;Lyrik der Jahrhundertwende&#8221;</chapter-title>. In: <source>Naturalismus, Fin de si&#232;cle, Expressionismus, 1890&#8211;1918</source>. Ed. by <string-name><given-names>York-Gothart</given-names> <surname>Mix</surname></string-name>. Hansers Sozialgeschichte der deutschen Literatur vom 16. Jahrhundert bis zur Gegenwart 7. <publisher-name>Carl Hanser</publisher-name>, pp. <fpage>350</fpage>&#8211;<lpage>366</lpage>.</mixed-citation></ref>
<ref id="B6"><label>6</label><mixed-citation publication-type="book"><string-name><surname>Avenarius</surname>, <given-names>Ferdinand</given-names></string-name>, ed. (<year>1882</year>). <source>Deutsche Lyrik der Gegenwart seit 1850. Eine Anthologie mit biographischen und bibliographischen Notizen. Aus den Quellen</source>. <publisher-name>Ehlermann</publisher-name>.</mixed-citation></ref>
<ref id="B7"><label>7</label><mixed-citation publication-type="webpage"><string-name><surname>B&#228;r</surname>, <given-names>Daniel</given-names></string-name>, <string-name><given-names>Torsten</given-names> <surname>Zesch</surname></string-name>, and <string-name><given-names>Iryna</given-names> <surname>Gurevych</surname></string-name> (<year>2011</year>). <chapter-title>&#8220;A Reflective View on Text Similarity&#8221;</chapter-title>. In: <source>Proceedings of Recent Advances in Natural Language Processing</source>, pp. <fpage>515</fpage>&#8211;<lpage>520</lpage>. URL: <uri>https://aclanthology.org/R11-1071</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B8"><label>8</label><mixed-citation publication-type="webpage"><string-name><surname>B&#228;r</surname>, <given-names>Daniel</given-names></string-name>, <string-name><given-names>Torsten</given-names> <surname>Zesch</surname></string-name>, and <string-name><given-names>Iryna</given-names> <surname>Gurevych</surname></string-name> (<year>2015</year>). <source>Composing Measures for Computing Text Similarity</source>. URL: <uri>https://tuprints.ulb.tu-darmstadt.de/4342/</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B9"><label>9</label><mixed-citation publication-type="book"><string-name><surname>Becker</surname>, <given-names>Sabina</given-names></string-name> and <string-name><given-names>Helmuth</given-names> <surname>Kiesel</surname></string-name> (<year>2007</year>). <chapter-title>&#8220;Literarische Moderne. Begriff und Ph&#228;nomen&#8221;</chapter-title>. In: <source>Literarische Moderne. Begriff und Ph&#228;nomen</source>. Ed. by <string-name><given-names>Sabina</given-names> <surname>Becker</surname></string-name> and <string-name><given-names>Helmuth</given-names> <surname>Kiesel</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, pp. <fpage>9</fpage>&#8211;<lpage>36</lpage>.</mixed-citation></ref>
<ref id="B10"><label>10</label><mixed-citation publication-type="book"><string-name><surname>Benzmann</surname>, <given-names>Hans</given-names></string-name>, ed. (<year>1904</year>). <source>Moderne deutsche Lyrik</source>. <publisher-name>Reclam</publisher-name>.</mixed-citation></ref>
<ref id="B11"><label>11</label><mixed-citation publication-type="book"><string-name><surname>Bern</surname>, <given-names>Maximilian</given-names></string-name>, ed. (<year>1877</year>). <source>Deutsche Lyrik seit Goethes Tode</source>. <publisher-name>Reclam</publisher-name>.</mixed-citation></ref>
<ref id="B12"><label>12</label><mixed-citation publication-type="book"><string-name><surname>Bethge</surname>, <given-names>Hans</given-names></string-name>, ed. (<year>1905</year>). <source>Deutsche Lyrik seit Liliencron</source>. <publisher-name>Hesse</publisher-name>.</mixed-citation></ref>
<ref id="B13"><label>13</label><mixed-citation publication-type="book"><string-name><surname>Bierbaum</surname>, <given-names>Otto Julius</given-names></string-name>, ed. (<year>1893</year>). <source>Moderner Musenalmanach auf das Jahr 1893</source>. <publisher-name>Albert</publisher-name>.</mixed-citation></ref>
<ref id="B14"><label>14</label><mixed-citation publication-type="book"><string-name><surname>Bierbaum</surname>, <given-names>Otto Julius</given-names></string-name>, ed. (<year>1894</year>). <source>Moderner Musenalmanach auf das Jahr 1894</source>. <publisher-name>Albert</publisher-name>.</mixed-citation></ref>
<ref id="B15"><label>15</label><mixed-citation publication-type="book"><string-name><surname>Bonsels</surname>, <given-names>Waldemar</given-names></string-name>, <string-name><given-names>Hans</given-names> <surname>Brandenburg</surname></string-name>, <string-name><given-names>Bernd</given-names> <surname>Isemann</surname></string-name>, and <string-name><given-names>Will</given-names> <surname>Vesper</surname></string-name>, eds. (<year>1905</year>). <source>Die Erde</source>. <publisher-name>Bonsels</publisher-name>.</mixed-citation></ref>
<ref id="B16"><label>16</label><mixed-citation publication-type="journal"><string-name><surname>Bromley</surname>, <given-names>Jane</given-names></string-name>, <string-name><given-names>Isabelle</given-names> <surname>Guyon</surname></string-name>, <string-name><given-names>Yann</given-names> <surname>LeCun</surname></string-name>, <string-name><given-names>Eduard</given-names> <surname>S&#228;ckinger</surname></string-name>, and <string-name><given-names>Roopak</given-names> <surname>Shah</surname></string-name> (<year>1993</year>). <article-title>&#8220;Signature Verification Using a &#8216;Siamese&#8217; Time Delay Neural Network&#8221;</article-title>. In: <source>Proceedings of the 6th International Conference on Neural Information Processing Systems</source>, pp. <fpage>737</fpage>&#8211;<lpage>744</lpage>.</mixed-citation></ref>
<ref id="B17"><label>17</label><mixed-citation publication-type="journal"><string-name><surname>Chan</surname>, <given-names>Branden</given-names></string-name>, <string-name><given-names>Stefan</given-names> <surname>Schweter</surname></string-name>, and <string-name><given-names>Timo</given-names> <surname>M&#246;ller</surname></string-name> (<year>2020</year>). <article-title>&#8220;German&#8217;s Next Language Model&#8221;</article-title>. In: <source>arXiv preprint</source>. doi: <pub-id pub-id-type="doi">10.48550/arxiv.2010.10906</pub-id>.</mixed-citation></ref>
<ref id="B18"><label>18</label><mixed-citation publication-type="book"><string-name><surname>Conradi</surname>, <given-names>Hermann</given-names></string-name> (<year>1885</year>). <chapter-title>&#8220;Unser Credo&#8221;</chapter-title>. In: <source>Moderne Dichter-Charaktere</source>. Ed. by <string-name><given-names>Wilhelm</given-names> <surname>Arent</surname></string-name>. <publisher-name>Wilhelm Friedrich</publisher-name>, pp. <fpage>I</fpage>&#8211;<lpage>IV</lpage>.</mixed-citation></ref>
<ref id="B19"><label>19</label><mixed-citation publication-type="book"><string-name><surname>Corbineau-Hoffmann</surname>, <given-names>Angelika</given-names></string-name> (<year>2013</year>). <source>Einf&#252;hrung in die Komparatistik</source>. <edition>3rd</edition> ed. <publisher-name>Erich Schmidt</publisher-name>.</mixed-citation></ref>
<ref id="B20"><label>20</label><mixed-citation publication-type="webpage"><string-name><surname>Ehrmanntraut</surname>, <given-names>Anton</given-names></string-name>, <string-name><given-names>Thora</given-names> <surname>Hagen</surname></string-name>, <string-name><given-names>Leonard</given-names> <surname>Konle</surname></string-name>, and <string-name><given-names>Fotis</given-names> <surname>Jannidis</surname></string-name> (<year>2021</year>). <article-title>&#8220;Type- and Token-based Word Embeddings in the Digital Humanities&#8221;</article-title>. In: <source>Proceedings of the Conference on Computational Humanities Research, CHR2021</source>, pp. <fpage>16</fpage>&#8211;<lpage>38</lpage>. URL: <uri>http://ceur-ws.org/Vol-2989/long_paper35.pdf</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B21"><label>21</label><mixed-citation publication-type="book"><string-name><surname>F&#228;hnders</surname>, <given-names>Walter</given-names></string-name> (<year>1998</year>). <source>Avantgarde und Moderne 1890-1933</source>. Lehrbuch Germanistik. <publisher-name>J. B. Metzler</publisher-name>.</mixed-citation></ref>
<ref id="B22"><label>22</label><mixed-citation publication-type="book"><string-name><surname>Federmann</surname>, <given-names>Herta</given-names></string-name>, ed. (<year>1908</year>). <source>Der Schatzbehalter</source>. <publisher-name>Steinicke &amp; Lehmkuhl</publisher-name>.</mixed-citation></ref>
<ref id="B23"><label>23</label><mixed-citation publication-type="book"><string-name><surname>Felski</surname>, <given-names>Rita</given-names></string-name> and <string-name><given-names>Susan Stanford</given-names> <surname>Friedman</surname></string-name>, eds. (<year>2013</year>). <source>Comparison</source>. <publisher-loc>Baltimore</publisher-loc>: <publisher-name>Johns Hopkins University Press</publisher-name>.</mixed-citation></ref>
<ref id="B24"><label>24</label><mixed-citation publication-type="book"><string-name><surname>Frick</surname>, <given-names>Werner</given-names></string-name> (<year>2007</year>). <chapter-title>&#8220;Avantgarde und longue dur&#233;e. &#220;berlegungen zum Traditionsverbrauch der klassischen Moderne&#8221;</chapter-title>. In: <source>Literarische Moderne. Begriff und Ph&#228;nomen</source>. Ed. by <string-name><given-names>Sabina</given-names> <surname>Becker</surname></string-name> and <string-name><given-names>Helmuth</given-names> <surname>Kiesel</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, pp. <fpage>97</fpage>&#8211;<lpage>112</lpage>.</mixed-citation></ref>
<ref id="B25"><label>25</label><mixed-citation publication-type="book"><string-name><surname>Friedrich</surname>, <given-names>Hugo</given-names></string-name> (<year>1992</year>). <source>Die Struktur der modernen Lyrik. Von der Mitte des neunzehnten bis zur Mitte des zwanzigsten Jahrhunderts</source>. <publisher-name>Rowohlt</publisher-name>.</mixed-citation></ref>
<ref id="B26"><label>26</label><mixed-citation publication-type="book"><string-name><surname>Friedrich</surname>, <given-names>Paul</given-names></string-name>, ed. (<year>1911</year>). <source>Neuland. Ein Buch j&#252;ngstdeutscher Lyrik</source>. <publisher-name>Borngr&#228;ber</publisher-name>.</mixed-citation></ref>
<ref id="B27"><label>27</label><mixed-citation publication-type="book"><string-name><surname>Gemmel</surname>, <given-names>Ludwig</given-names></string-name>, ed. (<year>1898</year>). <source>Die Perlenschnur. Eine Anthologie moderner Lyrik</source>. <publisher-name>Schuster &amp; Loeffler</publisher-name>.</mixed-citation></ref>
<ref id="B28"><label>28</label><mixed-citation publication-type="book"><string-name><surname>Goltschnigg</surname>, <given-names>Dietmar</given-names></string-name> (<year>2007</year>). <chapter-title>&#8220;Traditionszusammenh&#228;nge der &#246;sterreichischen Moderne (am Beispiel der Heine- und B&#252;chner-Rezeption)&#8221;</chapter-title>. In: <source>Literarische Moderne. Begriff und Ph&#228;nomen</source>. Ed. by <string-name><given-names>Sabina</given-names> <surname>Becker</surname></string-name> and <string-name><given-names>Helmuth</given-names> <surname>Kiesel</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, pp. <fpage>169</fpage>&#8211;<lpage>180</lpage>.</mixed-citation></ref>
<ref id="B29"><label>29</label><mixed-citation publication-type="journal"><string-name><surname>Gomaa</surname>, <given-names>Wael</given-names></string-name> and <string-name><given-names>Aly</given-names> <surname>Fahmy</surname></string-name> (<year>2013</year>). <article-title>&#8220;A Survey of Text Similarity Approaches&#8221;</article-title>. In: <source>International Journal of Computer Applications</source> (<volume>68</volume>), pp. <fpage>13</fpage>&#8211;<lpage>18</lpage>. doi: <pub-id pub-id-type="doi">10.5120/11638-7118</pub-id>.</mixed-citation></ref>
<ref id="B30"><label>30</label><mixed-citation publication-type="journal"><string-name><surname>Grandini</surname>, <given-names>Margherita</given-names></string-name>, <string-name><given-names>Enrico</given-names> <surname>Bagli</surname></string-name>, and <string-name><given-names>Giorgio</given-names> <surname>Visani</surname></string-name> (<year>2020</year>). <article-title>&#8220;Metrics for Multi-Class Classification: an Overview&#8221;</article-title>. In: <source>arXiv preprint</source>. DOI: <pub-id pub-id-type="doi">10.48550/arxiv.2008.05756</pub-id>.</mixed-citation></ref>
<ref id="B31"><label>31</label><mixed-citation publication-type="journal"><string-name><surname>Gururangan</surname>, <given-names>Suchin</given-names></string-name>, <string-name><given-names>Ana</given-names> <surname>Marasovi&#263;</surname></string-name>, <string-name><given-names>Swabha</given-names> <surname>Swayamdipta</surname></string-name>, <string-name><given-names>Kyle</given-names> <surname>Lo</surname></string-name>, <string-name><given-names>Iz</given-names> <surname>Beltagy</surname></string-name>, <string-name><given-names>Doug</given-names> <surname>Downey</surname></string-name>, and <string-name><given-names>Noah A.</given-names> <surname>Smith</surname></string-name> (<year>2020</year>). <article-title>&#8220;Don&#8217;t Stop Pretraining: Adapt Language Models to Domains and Tasks&#8221;</article-title>. In: <source>arXiv preprint</source>. DOI: <pub-id pub-id-type="doi">10.48550/arXiv.2004.10964</pub-id>.</mixed-citation></ref>
<ref id="B32"><label>32</label><mixed-citation publication-type="book"><string-name><surname>H&#228;ntzschel</surname>, <given-names>G&#252;nter</given-names></string-name> (<year>1991</year>). <source>Bibliographie der deutschsprachigen Lyrikanthologien 1840-1914</source>. <publisher-name>K. G. Saur</publisher-name>.</mixed-citation></ref>
<ref id="B33"><label>33</label><mixed-citation publication-type="book"><string-name><surname>Hiebel</surname>, <given-names>Hans H.</given-names></string-name> (<year>2005</year>). <source>Das Spektrum der modernen Poesie. Teil 1. 1900&#8211;1945</source>. <publisher-name>K&#246;nigshausen &amp; Neumann</publisher-name>.</mixed-citation></ref>
<ref id="B34"><label>34</label><mixed-citation publication-type="book"><string-name><surname>Huch</surname>, <given-names>Margarethe</given-names></string-name>, ed. (<year>1911</year>). <source>Frauenlyrik der Gegenwart</source>. <publisher-name>Eckardt</publisher-name>.</mixed-citation></ref>
<ref id="B35"><label>35</label><mixed-citation publication-type="book"><string-name><surname>Jacobowski</surname>, <given-names>Ludwig</given-names></string-name>, ed. (<year>1899</year>). <source>Neue Lieder der besten neueren Dichter f&#252;r&#8217;s Volk</source>. <publisher-name>Liemann</publisher-name>.</mixed-citation></ref>
<ref id="B36"><label>36</label><mixed-citation publication-type="book"><string-name><surname>Kiesel</surname>, <given-names>Helmuth</given-names></string-name> (<year>2004</year>). <source>Geschichte der literarischen Moderne. Sprache &#8211; &#196;sthetik &#8211; Dichtung im zwanzigsten Jahrhundert</source>. <publisher-name>C.H. Beck</publisher-name>.</mixed-citation></ref>
<ref id="B37"><label>37</label><mixed-citation publication-type="journal"><string-name><surname>Klambauer</surname>, <given-names>G&#252;nter</given-names></string-name>, <string-name><given-names>Thomas</given-names> <surname>Unterthiner</surname></string-name>, <string-name><given-names>Andreas</given-names> <surname>Mayr</surname></string-name>, and <string-name><given-names>Sepp</given-names> <surname>Hochreiter</surname></string-name> (<year>2017</year>). <article-title>&#8220;Self-Normalizing Neural Networks&#8221;</article-title>. In: <source>arXiv preprint</source>. DOI: <pub-id pub-id-type="doi">10.48550/arXiv.1706.02515</pub-id>.</mixed-citation></ref>
<ref id="B38"><label>38</label><mixed-citation publication-type="book"><string-name><surname>Klinger</surname>, <given-names>Cornelia</given-names></string-name> (<year>2002</year>). <chapter-title>&#8220;Modern/Moderne/Modernismus&#8221;</chapter-title>. In: <source>&#196;sthetische Grundbegriffe</source>. Vol. <volume>4</volume>. Ed. by <string-name><given-names>Karlheinz</given-names> <surname>Barck</surname></string-name>, <string-name><given-names>Martin</given-names> <surname>Fontius</surname></string-name>, <string-name><given-names>Dieter</given-names> <surname>Schlenstedt</surname></string-name>, and <string-name><given-names>Friedrich</given-names> <surname>Wolfzettel</surname></string-name>. <publisher-name>J.B. Metzler</publisher-name>, pp. <fpage>121</fpage>&#8211;<lpage>167</lpage>.</mixed-citation></ref>
<ref id="B39"><label>39</label><mixed-citation publication-type="book"><string-name><surname>Kneschke</surname>, <given-names>Emil</given-names></string-name>, ed. (<year>1865</year>). <source>Anthologie deutscher Lyriker seit 1850</source>. <publisher-name>Lorck</publisher-name>.</mixed-citation></ref>
<ref id="B40"><label>40</label><mixed-citation publication-type="webpage"><string-name><surname>Krippendorff</surname>, <given-names>Klaus</given-names></string-name> (<year>2011</year>). <source>Computing Krippendorff&#8217;s Alpha-Reliability</source>. URL: <uri>https://repository.upenn.edu/asc_papers/43</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B41"><label>41</label><mixed-citation publication-type="book"><string-name><surname>Lamping</surname>, <given-names>Dieter</given-names></string-name> (<year>2000</year>). <source>Das lyrische Gedicht. Definitionen zu Theorie und Geschichte der Gattung</source>. <edition>3rd</edition> ed. <publisher-name>Vandenhoeck &amp; Ruprecht</publisher-name>.</mixed-citation></ref>
<ref id="B42"><label>42</label><mixed-citation publication-type="book"><string-name><surname>Lamping</surname>, <given-names>Dieter</given-names></string-name> (<year>2008</year>). <source>Moderne Lyrik</source>. <publisher-name>Vandenhoeck &amp; Ruprecht</publisher-name>.</mixed-citation></ref>
<ref id="B43"><label>43</label><mixed-citation publication-type="webpage"><string-name><surname>Lamping</surname>, <given-names>Dieter</given-names></string-name> (<year>2012</year>). <source>Klassiker der Moderne. &#220;ber die Kanonisierung moderner Literatur</source>. URL: <uri>https://literaturkritik.de/id/16853</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B44"><label>44</label><mixed-citation publication-type="book"><string-name><surname>Marelli</surname>, <given-names>Marco</given-names></string-name>, <string-name><given-names>Stefano</given-names> <surname>Menini</surname></string-name>, <string-name><given-names>Marco</given-names> <surname>Baroni</surname></string-name>, <string-name><given-names>Luisa</given-names> <surname>Bentivogli</surname></string-name>, <string-name><given-names>Raffaella</given-names> <surname>Bernardi</surname></string-name>, and <string-name><given-names>Roberto</given-names> <surname>Zamparelli</surname></string-name> (<year>2014</year>). <chapter-title>&#8220;A SICK cure for the evaluation of compositional distributional semantic models&#8221;</chapter-title>. In: <source>Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC&#8217;14)</source>. <publisher-name>European Language Resources Association (ELRA)</publisher-name>, pp. <fpage>216</fpage>&#8211;<lpage>223</lpage>. URL: <uri>http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf</uri> (visited on 04/22/2022).</mixed-citation></ref>
<ref id="B45"><label>45</label><mixed-citation publication-type="journal"><string-name><surname>Mathet</surname>, <given-names>Yann</given-names></string-name>, <string-name><given-names>Antoine</given-names> <surname>Widl&#246;cher</surname></string-name>, and <string-name><given-names>Jean-Philippe</given-names> <surname>M&#233;tivier</surname></string-name> (<year>2015</year>). <article-title>&#8220;The Unified and Holistic Method Gamma (&#947;) for Inter-Annotator Agreement Measure and Alignment&#8221;</article-title>. In: <source>Computational Linguistics</source> <volume>41</volume> (<issue>3</issue>), pp. <fpage>437</fpage>&#8211;<lpage>479</lpage>. DOI: <pub-id pub-id-type="doi">10.1162/COLI_a_00227</pub-id>.</mixed-citation></ref>
<ref id="B46"><label>46</label><mixed-citation publication-type="journal"><string-name><surname>McInnes</surname>, <given-names>Lelland</given-names></string-name>, <string-name><given-names>John</given-names> <surname>Healy</surname></string-name>, <string-name><given-names>Nathaniel</given-names> <surname>Saul</surname></string-name>, and <string-name><given-names>Lukas</given-names> <surname>Gro&#223;berger</surname></string-name> (<year>2018</year>). <article-title>&#8220;UMAP: Uniform Manifold Approximation and Projection&#8221;</article-title>. In: <source>The Journal of Open Source Software</source> <volume>3</volume> (<issue>29</issue>). DOI: <pub-id pub-id-type="doi">10.21105/joss.00861</pub-id>.</mixed-citation></ref>
<ref id="B47"><label>47</label><mixed-citation publication-type="book"><string-name><surname>Moltke</surname>, <given-names>Max</given-names></string-name>, ed. (<year>1882</year>). <source>Neuer deutscher Parna&#223;. Silberblicke aus der Lyrik unserer Tage</source>. <publisher-name>R&#252;hle</publisher-name>.</mixed-citation></ref>
<ref id="B48"><label>48</label><mixed-citation publication-type="book"><string-name><surname>N&#246;th</surname>, <given-names>Winfried</given-names></string-name> (<year>2008</year>). <chapter-title>&#8220;Stil als Zeichen&#8221;</chapter-title>. In: <source>Rhetoric and Stylistics. Handbooks of Linguistics and Communication Science</source>. Ed. by <string-name><given-names>Ulla</given-names> <surname>Fix</surname></string-name>, <string-name><given-names>Andreas</given-names> <surname>Gardt</surname></string-name>, and <string-name><given-names>Joachim</given-names> <surname>Knape</surname></string-name>. <publisher-name>De Gruyter</publisher-name>, pp. <fpage>1178</fpage>&#8211;<lpage>1196</lpage>.</mixed-citation></ref>
<ref id="B49"><label>49</label><mixed-citation publication-type="book"><string-name><surname>Polko</surname>, <given-names>Elise</given-names></string-name>, ed. (<year>1860</year>). <source>Dichtergr&#252;&#223;e. Neuere deutsche Lyrik</source>. <publisher-name>Amelang</publisher-name>.</mixed-citation></ref>
<ref id="B50"><label>50</label><mixed-citation publication-type="journal"><string-name><surname>Prakoso</surname>, <given-names>Dimas Wibisono</given-names></string-name>, <string-name><given-names>Asad</given-names> <surname>Abdi</surname></string-name>, and <string-name><given-names>Chintan</given-names> <surname>Amrit</surname></string-name> (<month>Mar.</month> <year>2021</year>). <article-title>&#8220;Short text similarity measurement methods: a review&#8221;</article-title>. In: <source>Soft Computing</source> <volume>25</volume> (<issue>6</issue>), pp. <fpage>4699</fpage>&#8211;<lpage>4723</lpage>. DOI: <pub-id pub-id-type="doi">10.1007/s00500-020-05479-2</pub-id>.</mixed-citation></ref>
<ref id="B51"><label>51</label><mixed-citation publication-type="book"><string-name><surname>Prutz</surname>, <given-names>Robert</given-names></string-name>, ed. (<year>1859</year>). <source>Deutsche Dichter der Gegenwart. Ein lyrisches Album</source>. <publisher-name>Kober &amp; Markgraf</publisher-name>.</mixed-citation></ref>
<ref id="B52"><label>52</label><mixed-citation publication-type="book"><string-name><surname>Reimers</surname>, <given-names>Nils</given-names></string-name> and <string-name><given-names>Iryna</given-names> <surname>Gurevych</surname></string-name> (<year>2019</year>). <chapter-title>&#8220;Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks&#8221;</chapter-title>. In: <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, pp. <fpage>3982</fpage>&#8211;<lpage>3992</lpage>. DOI: <pub-id pub-id-type="doi">10.18653/v1/D19-1410</pub-id>.</mixed-citation></ref>
<ref id="B53"><label>53</label><mixed-citation publication-type="book"><string-name><surname>Reimers</surname>, <given-names>Nils</given-names></string-name> and <string-name><given-names>Iryna</given-names> <surname>Gurevych</surname></string-name> (<year>2020</year>). <chapter-title>&#8220;Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation&#8221;</chapter-title>. In: <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, pp. <fpage>4512</fpage>&#8211;<lpage>4525</lpage>. DOI: <pub-id pub-id-type="doi">10.18653/v1/2020.emnlp-main.365</pub-id>.</mixed-citation></ref>
<ref id="B54"><label>54</label><mixed-citation publication-type="book"><string-name><surname>Renner</surname>, <given-names>August</given-names></string-name>, ed. (<year>1899</year>). <source>Das lyrische Wien. Eine moderne Lese</source>. <publisher-name>Georg Szelinski</publisher-name>.</mixed-citation></ref>
<ref id="B55"><label>55</label><mixed-citation publication-type="book"><string-name><surname>Salton</surname>, <given-names>Gerard</given-names></string-name> and <string-name><given-names>Michael J.</given-names> <surname>McGill</surname></string-name> (<year>1983</year>). <source>Introduction to Modern Information Retrieval</source>. <publisher-name>McGraw-Hill</publisher-name>.</mixed-citation></ref>
<ref id="B56"><label>56</label><mixed-citation publication-type="book"><string-name><surname>Sandig</surname>, <given-names>Barbara</given-names></string-name> (<year>2006</year>). <source>Textstilistik des Deutschen</source>. <publisher-name>De Gruyter</publisher-name>.</mixed-citation></ref>
<ref id="B57"><label>57</label><mixed-citation publication-type="book"><string-name><surname>Selbmann</surname>, <given-names>Rolf</given-names></string-name> (<year>1999</year>). <source>Die simulierte Wirklichkeit. Zur Lyrik des Realismus</source>. <publisher-name>Aisthesis</publisher-name>.</mixed-citation></ref>
<ref id="B58"><label>58</label><mixed-citation publication-type="book"><string-name><surname>Selbmann</surname>, <given-names>Rolf</given-names></string-name> (<year>2007</year>). <chapter-title>&#8220;Die Lyrik des Realismus&#8221;</chapter-title>. In: <source>Realismus. Epoche &#8211; Autoren &#8211; Werke</source>. Ed. by <string-name><given-names>Christian</given-names> <surname>Begemann</surname></string-name>. <publisher-name>Wissenschaftliche Buchgesellschaft</publisher-name>, pp. <fpage>189</fpage>&#8211;<lpage>206</lpage>.</mixed-citation></ref>
<ref id="B59"><label>59</label><mixed-citation publication-type="journal"><string-name><surname>Shaver</surname>, <given-names>Phillip</given-names></string-name>, <string-name><given-names>Judith</given-names> <surname>Schwartz</surname></string-name>, <string-name><given-names>Donald</given-names> <surname>Kirson</surname></string-name>, and <string-name><given-names>Cary</given-names> <surname>O&#8217;Connor</surname></string-name> (<year>1987</year>). <article-title>&#8220;Emotion Knowledge. Further Exploration of a Prototype Approach&#8221;</article-title>. In: <source>Journal of Personality and Social Psychology</source> <volume>52</volume> (<issue>6</issue>), pp. <fpage>1061</fpage>&#8211;<lpage>1086</lpage>. DOI: <pub-id pub-id-type="doi">10.1037/0022-3514.52.6.1061</pub-id>.</mixed-citation></ref>
<ref id="B60"><label>60</label><mixed-citation publication-type="book"><string-name><surname>Sprengel</surname>, <given-names>Peter</given-names></string-name> (<year>1998</year>). <source>Geschichte der deutschsprachigen Literatur 1870&#8211;1900. Von der Reichsgr&#252;ndung bis zur Jahrhundertwende</source>. <series>Geschichte der deutschen Literatur von den Anf&#228;ngen bis zur Gegenwart 9.1</series>. <publisher-name>C.H. Beck</publisher-name>.</mixed-citation></ref>
<ref id="B61"><label>61</label><mixed-citation publication-type="book"><string-name><surname>Stockinger</surname>, <given-names>Claudia</given-names></string-name> (<year>2010</year>). <source>Das 19. Jahrhundert. Zeitalter des Realismus</source>. <publisher-name>Akademie Verlag</publisher-name>.</mixed-citation></ref>
<ref id="B62"><label>62</label><mixed-citation publication-type="journal"><string-name><surname>Szubert</surname>, <given-names>Benjamin</given-names></string-name>, <string-name><given-names>Jennifer E.</given-names> <surname>Cole</surname></string-name>, <string-name><given-names>Claudia</given-names> <surname>Monaco</surname></string-name>, and <string-name><given-names>Ignat</given-names> <surname>Drozdov</surname></string-name> (<year>2019</year>). <article-title>&#8220;Structure-preserving visualisation of high dimensional single-cell datasets&#8221;</article-title>. In: <source>Scientific Reports</source> <volume>9</volume> (<issue>1</issue>). DOI: <pub-id pub-id-type="doi">10.1038/s41598-019-45301-0</pub-id>.</mixed-citation></ref>
<ref id="B63"><label>63</label><mixed-citation publication-type="book"><string-name><surname>Tille</surname>, <given-names>Alexander</given-names></string-name>, ed. (<year>1896</year>). <source>Deutsche Lyrik von Heute und Morgen</source>. <publisher-name>Neumann</publisher-name>.</mixed-citation></ref>
<ref id="B64"><label>64</label><mixed-citation publication-type="journal"><string-name><surname>Underwood</surname>, <given-names>Ted</given-names></string-name> and <string-name><given-names>Richard Jean</given-names> <surname>So</surname></string-name> (<year>2021</year>). <article-title>&#8220;Can We Map Culture?&#8221;</article-title> In: <source>Journal of Cultural Analytics</source> <volume>6</volume> (<issue>3</issue>), pp. <fpage>32</fpage>&#8211;<lpage>51</lpage>. DOI: <pub-id pub-id-type="doi">10.22148/001c.24911</pub-id>.</mixed-citation></ref>
<ref id="B65"><label>65</label><mixed-citation publication-type="book"><string-name><surname>Vietta</surname>, <given-names>Silvio</given-names></string-name> (<year>1992</year>). <source>Die literarische Moderne. Eine problemgeschichtliche Darstellung der deutschsprachigen Literatur von H&#246;lderlin bis Thomas Bernhard</source>. <publisher-name>J.B. Metzler</publisher-name>.</mixed-citation></ref>
<ref id="B66"><label>66</label><mixed-citation publication-type="journal"><string-name><surname>Wieland</surname>, <given-names>Klaus</given-names></string-name> (<year>2019</year>). <article-title>&#8220;Die deutschsprachige Lyrik der Fr&#252;hen Moderne (1890-1930)&#8221;</article-title>. In: <source>Recherches Germaniques</source> <volume>14</volume>, pp. <fpage>5</fpage>&#8211;<lpage>27</lpage>. DOI: <pub-id pub-id-type="doi">10.4000/rg.976</pub-id>.</mixed-citation></ref>
<ref id="B67"><label>67</label><mixed-citation publication-type="book"><string-name><surname>Willatzen</surname>, <given-names>Peter Johann</given-names></string-name>, ed. (<year>1875</year>). <source>Bl&#252;thenzweige deutscher Lyrik nach Goethe. Eine Anthologie</source>. <publisher-name>K&#252;htmann</publisher-name>.</mixed-citation></ref>
<ref id="B68"><label>68</label><mixed-citation publication-type="book"><string-name><surname>Winko</surname>, <given-names>Simone</given-names></string-name> (<year>2003</year>). <source>Kodierte Gef&#252;hle. Zu einer Poetik der Emotionen in lyrischen und poetologischen Texten um 1900</source>. <publisher-name>Erich Schmidt</publisher-name>.</mixed-citation></ref>
<ref id="B69"><label>69</label><mixed-citation publication-type="journal"><string-name><surname>Zelle</surname>, <given-names>Carsten</given-names></string-name> (<year>2005</year>). <article-title>&#8220;Komparatistik und &#8216;comparatio&#8217; &#8211; der Vergleich in der Vergleichenden Literaturwissenschaft: Skizze einer Bestandsaufnahme&#8221;</article-title>. In: <source>Komparatistik. Jahrbuch der Deutschen Gesellschaft f&#252;r Allgemeine und Vergleichende Literaturwissenschaft</source>, pp. <fpage>13</fpage>&#8211;<lpage>33</lpage>.</mixed-citation></ref>
</ref-list>
</back>
</article>
