
Detecting Rhyme and Meter in Hungarian Poetry: From Algorithms to Web Tools and Research


Abstract

This paper presents three algorithms designed to detect rhyme, as well as Hungarian quantitative and qualitative meters, developed for the ELTE Poetry Corpus, which includes the complete works of 53 canonical Hungarian authors. Besides presenting the principles and main steps of the algorithms, it provides data on the frequency and regularity of meter types in Hungarian poetry and an evaluation of the meter-detection algorithms. Based on the algorithm detecting rhyme patterns, a rhyming dictionary of Hungarian poetry was generated and an online query tool was developed for accessing the dictionary. Building on the algorithms for rhyme and meter detection, another online tool, the Poem Form Searcher for Hungarian Canonical Poetry, was created. To demonstrate the effectiveness of the rhyming dictionary and the Poem Form Searcher in research, the paper presents two case studies: one on inflectional rhymes and another on the attraction between trochaic and qualitative meters in Hungarian poetry.

Keywords: Hungarian poetry, automatic meter detection, automatic rhyme detection, rhyming dictionary, ELTE Poetry Corpus

How to Cite:

Horváth, P., (2026) “Detecting Rhyme and Meter in Hungarian Poetry: From Algorithms to Web Tools and Research”, Journal of Computational Literary Studies 5(1). doi: https://doi.org/10.48694/jcls.4315

Published on
2026-05-14

Peer Reviewed

1. Introduction

Any large-scale quantitative analysis of poetic features related to sound devices is based on a workflow consisting of numerous stages. These stages include the collection of poems, the development of algorithms annotating sound devices, the generation of secondary databases containing the occurrences or the frequencies of these sound devices, and the creation of web applications allowing researchers to retrieve quantitative data without programming. The general goal of this paper is to present such a workflow from its second stage, algorithm development, through to its last stage, in which quantitative conclusions are drawn.

The paper gives a detailed overview of the algorithms and web applications developed for analyzing Hungarian rhyming and meter as part of the ELTE Poetry Corpus project. The ELTE Poetry Corpus is a database containing all the poems of 53 Hungarian canonical authors from the 16th century until the first half of the 20th century (P. Horváth et al. 2022). The corpus contains 14,358 poems and 3.6 million tokens. Currently, this is the largest Hungarian poetry corpus with annotations of grammatical features. Besides the grammatical features, the corpus also contains annotations of poetic sound devices. The paper presents the main principles and steps of the rule-based algorithms detecting rhyme and meter in Hungarian poetry. While the older algorithm analyzing rhyming in the ELTE Poetry Corpus uses only one rule set, the newly implemented algorithm uses eight rule sets, which provides better quality results due to its greater flexibility. Since there are two meter systems in Hungarian poetry, quantitative and qualitative, two algorithms needed to be implemented. The algorithm detecting Hungarian quantitative meter is currently the only one that has a detailed description and whose implementation is downloadable. To the author’s knowledge, the algorithm detecting qualitative meter is the first attempt to automatically recognize this meter type in Hungarian poetry. In Hungarian poetry, these two meter systems can be used at the same time. The paper presents how this situation can be addressed when detecting meter and building web applications.

Automatic annotations of rhyme patterns were used to generate the Rhyming Dictionary of Hungarian Poetry, which is the first rhyming dictionary for canonical Hungarian poetry. An online and freely accessible query interface was also developed for the rhyming dictionary. Based on the automatic annotation of rhyme patterns and meter, another web application, the Poem Form Searcher for Hungarian Canonical Poetry, was also created. This tool enables users to query the poems of the ELTE Poetry Corpus on the basis of formal features.

Section 2 briefly presents some of the most important related works. Section 3 describes the main principles of the algorithm for detecting rhyme. Section 4 describes the principles of two algorithms designed to detect Hungarian quantitative and qualitative meters. This section also presents an evaluation of the algorithms. Since no gold-standard poetry corpus with manually annotated meter is available, the evaluation adopts an alternative approach, using the example poems from a book on Hungarian versification as a quasi-gold-standard corpus. Section 5 presents the formats, the data types, and the online query interface of the automatically generated Rhyming Dictionary of Hungarian Canonical Poetry. This section also includes a case study on the use of inflectional rhyme in Hungarian canonical poetry. Section 6 gives an overview of the Poem Form Searcher for Hungarian Canonical Poetry, and it presents a case study on the attraction between trochaic and qualitative meters. Finally, section 7 provides a summary of the key achievements discussed in this paper.

2. Related Works

There are several corpora, software tools, and research projects dealing with the automatic analysis of sound devices in languages other than Hungarian. An important source of inspiration for the design of the ELTE Poetry Corpus was the Czech Poetry Corpus, which contains 80,000 poems with automatically generated annotations of grammatical features, rhyming, and meter (Plecháč and Kolár 2015). The Corpus de Sonetos del Siglo de Oro (Corpus of Spanish Golden-Age Sonnets), which includes automatically generated annotations of rhythm, was also taken into consideration when the ELTE Poetry Corpus was built (Navarro-Colorado 2015; Navarro-Colorado et al. 2016).

Over the past 20 years, many rule-based programs have been developed to automatically detect the metrical features of English poetry (e.g. Agirrezabal et al. 2016b; Hartman 2005; Plamondon 2006). In addition to the programs developed for English, there are a growing number of programs that detect the sound devices in poetic traditions of other European languages (e.g. Bobenhausen 2011; de la Rosa et al. 2020; Ibrahim and Plecháč 2011; Navarro-Colorado 2018). De Sisto et al. (2024) provide a comprehensive overview of programs that detect the meter of various Indo-European languages. In recent years, besides the rule-based approach, a data-driven approach to the analysis of metrical characteristics has become increasingly popular (e.g. Agirrezabal et al. 2016a, 2017; de la Rosa et al. 2023; Haider 2021; Klesnilová et al. 2024; Tanasescu et al. 2016). These data-driven approaches based on machine learning usually achieve higher accuracy in detecting stress patterns than their rule-based counterparts. In Hungarian poetry, syllable length and stress follow some general rules that rule-based methods can effectively capture. Abstracting meter from syllable properties presents a different challenge. Due to the lack of training data, this research developed rule-based algorithms to identify Hungarian quantitative and qualitative meters. Nevertheless, creating training data and testing machine learning methods in future research could yield valuable insights.

There are some early works based on the automatic detection of sound devices in Hungarian poetry. In his paper, Voigt (1972) presented a program that analyzed syllable length in three poems and calculated the mean values of the lines and syllable positions. This was the first attempt to detect sound devices automatically in Hungarian poetry. Another early work is a book by Jékel and Papp (1974), which contains automatically generated phoneme statistics of all the poems written by Endre Ady, a Hungarian poet from the early 20th century. A few years later, Jékel and Szuromi (1980) published another book, which contains partly automatically generated values of different types of prominence for each syllable in 300 poems by Sándor Petőfi, a Hungarian Romantic poet from the mid-19th century. Neither the method of Jékel and Szuromi nor Voigt’s method was able to abstract meter from the automatically recognized rhythm patterns. To the author’s knowledge, besides the program presented here, Lesi (2006, 2008) developed the only program that could detect quantitative meter. Unfortunately, the program is not accessible, and from the two short papers describing the project, only a few hints and a screenshot of the program’s output indicate that the program was able to recognize meter. In addition to meter, Lesi’s program could also detect rhyme patterns and alliterations.

In recent years, there has been some research in which specific aspects of Hungarian poetic sound devices were analyzed automatically. Labádi (2018) analyzed various phonetic features in the poems of the 19th-century poet Dániel Berzsenyi. Maróthy et al. (2021) and Seláf and Plecháč (2023) investigated rhyming in 16th-century Hungarian historical songs; both studies detected rhyme patterns automatically. When building the ELTE Poetry Corpus, a program detecting alliterations, rhyme patterns, and rhythm (short and long syllables) was also developed (P. Horváth 2020; P. Horváth et al. 2022). The new algorithm analyzing rhyming, presented below, is an improved version of the earlier rhyme analyzer built into the previous version of the annotator tool.

3. Detecting Rhyme in Hungarian Poetry

The original annotation algorithm of the ELTE Poetry Corpus used only one rule set to detect the rhyme pattern of stanzas. However, a single rule set could not reflect the different approaches to rhyming taken by different authors: some poets follow strict rhyming conventions, while others rhyme more freely. To handle this variation, a new algorithm was implemented, which uses eight rule sets. These rule sets, which reflect different levels of regularity, can capture the different conceptions of rhyming in Hungarian poetry.

The algorithm aims to give each poem a consistent analysis of rhyme patterns, meaning that all stanzas of the poem receive the same rhyme pattern. The program first analyzes the poem using the first, strictest rule set. If this analysis is not consistent, the program analyzes the poem using the second, less strict rule set. If the program fails to achieve a consistent analysis using the second rule set, it proceeds to progressively weaker rule sets. Once a rule set produces a consistent analysis, the algorithm stops, and the poem is annotated based on that result. If no rule set produces a consistent analysis, the poem receives an inconsistent one. Table 1 shows the eight rule sets in the same order as they are applied in the algorithm.

Table 1: Rule sets of the algorithm detecting the rhyme pattern of stanzas.

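| Rule set | Second-to-last vowels are the same | Last vowels are the same | Second-to-last syllables have the same length | Both lines end in a consonant or both do not |
|---|---|---|---|---|
| 1 | yes | yes | yes | yes |
| 2 | no | yes | yes | yes |
| 3 | yes | yes | yes | no |
| 4 | yes | yes | no | yes |
| 5 | no | yes | yes | no |
| 6 | yes | yes | no | no |
| 7 | no | yes | no | yes |
| 8 | no | yes | no | no |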

As Table 1 illustrates, under the strictest rule set (first row), two lines rhyme only if their last and penultimate vowels are identical, the penultimate syllables have the same length (both long or both short), and both lines either end with a consonant or lack one. In contrast, the least strict rule set, which consists of only one rule, considers two lines to rhyme as long as their last vowels match, regardless of vowel length.1 If the program cannot provide a consistent analysis with any of the eight rule sets, it annotates the poem with the result given by the second rule set in Table 1. Similarly, the second rule set is used when the number of lines in the stanzas is different, or the poem has only one stanza, since in these cases it is inherently impossible to provide a consistent analysis.
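The cascade over the rule sets can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `LineEnd` record, the function names, the reading of "no" in Table 1 as "not required", the way letters are assigned to lines, and the requirement that a consistent pattern contain at least one rhyme are all choices made for this sketch. Extracting the vowels and syllable lengths from the text is assumed to have happened already.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineEnd:                  # hypothetical record describing the end of a line
    penult_vowel: str           # second-to-last vowel
    last_vowel: str             # last vowel
    penult_long: bool           # is the second-to-last syllable long?
    ends_in_consonant: bool

# Table 1 as (same penult vowel, same penult length, same closing consonant).
# The last vowels must match under every rule set.
RULE_SETS = [
    (True, True, True), (False, True, True), (True, True, False),
    (True, False, True), (False, True, False), (True, False, False),
    (False, False, True), (False, False, False),
]

def rhymes(a, b, rule):
    need_penult, need_length, need_coda = rule
    if a.last_vowel != b.last_vowel:
        return False
    if need_penult and a.penult_vowel != b.penult_vowel:
        return False
    if need_length and a.penult_long != b.penult_long:
        return False
    if need_coda and a.ends_in_consonant != b.ends_in_consonant:
        return False
    return True

def stanza_pattern(stanza, rule):
    """Give each line the letter of the first earlier line it rhymes with."""
    letters = []
    for i, line in enumerate(stanza):
        match = next((letters[j] for j in range(i)
                      if rhymes(stanza[j], line, rule)), None)
        if match is None:
            match = chr(ord("a") + len(set(letters)))  # next unused letter
        letters.append(match)
    return "".join(letters)

def analyze(poem):
    """poem: list of stanzas (lists of LineEnd). Returns (rule set number, patterns)."""
    comparable = len(poem) > 1 and len({len(s) for s in poem}) == 1
    if comparable:
        for number, rule in enumerate(RULE_SETS, start=1):
            patterns = [stanza_pattern(s, rule) for s in poem]
            # Assumption: "consistent" also requires at least one rhyme.
            if len(set(patterns)) == 1 and len(set(patterns[0])) < len(patterns[0]):
                return number, patterns
    # Fallback described in the text: annotate with rule set 2.
    return 2, [stanza_pattern(s, RULE_SETS[1]) for s in poem]
```

In this sketch, a poem whose stanzas agree only once the penultimate vowels are ignored is annotated with rule set 2, and a one-stanza poem goes straight to the fallback.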

The goal of applying this algorithm was to obtain as many poems with the same rhyme patterns of stanzas as possible. Table 2 shows the number of consistently annotated poems out of 13,362 poems in the ELTE Poetry Corpus, for different settings.2 In the first setting, only the second rule set has been applied. In the second setting, the second, the fifth, and the eighth rule sets have been used, while in the third setting, all eight rule sets have been applied. As the table shows, the more rule sets are used, the more poems are annotated consistently.

Table 2: Evaluation of the rule sets for rhyming (total = 13,362).

| Rule sets applied | Number of poems with consistent rhyme pattern |
|---|---|
| Rule set 2 | 5054 |
| Rule sets 2, 5, and 8 | 5357 |
| All rule sets | 5983 |

4. Detecting Meter in Hungarian Poetry

In Hungarian poetry, there are two meter systems. The first one is a quantitative meter system based on the opposition between short and long syllables. Although this meter system is rooted in Greek and Latin poetry, it was largely adopted from Western European poetic traditions, where the original opposition between long and short syllables is replaced by the opposition between stressed and unstressed syllables, reflecting the phonetic nature of those languages. In Hungarian, however, the stressed syllables are always the first syllables of words. This phonetic characteristic makes Hungarian unsuitable for using this meter system as the regular alternation between stressed and unstressed syllables. On the other hand, Hungarian is well-suited for employing the classical meter system in its original form, as the regular alternation between long and short syllables. Thus, when Hungarian poetry adopted the quantitative meter system from Western European poetic traditions, it retained the classical approach, distinguishing between long and short syllables instead of stressed and unstressed syllables.

Although in Hungarian, it is not possible to imitate classical meters as the alternation between stressed and unstressed syllables, it is possible to use the first syllables of the words, which are always stressed, to form bars. This is the second meter system of Hungarian poetry, which is based on the opposition between the first, stressed syllables of the words and the other unstressed syllables. In this meter system, the bars are formed by consecutive syllables in which the first one is always the first syllable of a word. The length of a bar can vary, but rarely exceeds six syllables. Within a bar, there can be additional stressed syllables beyond the first syllable. In other words, a bar can consist of more than one word. The only criterion is that the first syllable of a bar needs to be the first syllable of a word. This type of meter will be referred to here as qualitative meter (see I. Horváth 2009). The example below shows the first stanza of a poem by János Arany, in which each line contains two six-syllable bars. The boundaries between the bars are marked with vertical lines.

Mint egy alélt vándor, | midőn fele útján,

Csüggeteg szemmel néz | hátra, majd előre:

Mielőtt e rögös | pályát tovább futnám:

Hadd nézzek a multra, | nézzek a jövőre.

(János Arany: Mint egy alélt vándor…)

As the example demonstrates, in Hungarian qualitative meter, the metrical structure is determined only by those stressed syllables that recur in the same positions from line to line, marking the beginnings of the bars. Other stressed syllables within the lines have no metrical function and do not contribute to the formation of bars. This highlights an important difference between the Hungarian qualitative meter system and most European accentual systems, in which every stressed syllable contributes to the formation of metrical feet.3

A distinctive characteristic of Hungarian poetry is that a poem can have quantitative and qualitative meters at the same time. For instance, a poem can combine an iambic quantitative meter with a qualitative meter based on two four-syllable bars in each line. When quantitative and qualitative meters occur together, it is often called simultaneous meter (e.g. Szepes and Szerdahelyi 1981, 510–513). Since in Hungarian, there are two meter systems following different rules, two different algorithms had to be implemented for the automatic detection of meter.

4.1 Main Principles of Detecting Hungarian Quantitative Meter

The input for the program detecting Hungarian quantitative meter consists of the TEI XML files from the ELTE Poetry Corpus, which contain automatically annotated syllable lengths for each line, represented as a sequence of 0s and 1s (e.g., 01010101). Syllable lengths in Hungarian poetry can be determined by applying a set of general rules, which are the following (see Szepes and Szerdahelyi 1981).

  1. A syllable is short if it contains a short vowel that is followed either by no consonant or by only one short consonant.

  2. A syllable is long if it contains a long vowel, or if a short vowel is followed by a long consonant or two or more consonants.

  3. Word-initial consonant clusters (e.g. in krákog, strigula) do not lengthen the preceding syllable ending in a short vowel.
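The three rules can be sketched in Python as follows. This is a simplified illustration, not the annotator used for the corpus: the function names are hypothetical, doubled letters are taken to mark long consonants, each vowel letter is treated as one syllable nucleus, and special cases of Hungarian orthography and versification (for example ambiguous letter sequences in compounds) are ignored.

```python
import re

LONG_VOWELS = set("áéíóőúű")
VOWELS = LONG_VOWELS | set("aeioöuü")
DOUBLED_DIGRAPHS = {"ccs", "ddz", "ggy", "lly", "nny", "ssz", "tty", "zzs"}
DIGRAPHS = {"cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"}

def consonant_weight(cluster):
    """Count the consonants in a letter run; a long consonant counts as two."""
    weight, i = 0, 0
    while i < len(cluster):
        if cluster[i:i + 3] in DOUBLED_DIGRAPHS:        # e.g. "ssz" = long "sz"
            weight, i = weight + 2, i + 3
        elif cluster[i:i + 3] == "dzs":                  # one short consonant
            weight, i = weight + 1, i + 3
        elif cluster[i:i + 2] in DIGRAPHS:               # one short consonant
            weight, i = weight + 1, i + 2
        elif cluster[i:i + 2] == cluster[i] * 2:         # e.g. "tt" = long "t"
            weight, i = weight + 2, i + 2
        else:
            weight, i = weight + 1, i + 1
    return weight

def syllable_lengths(line):
    """Return a string of 0s (short) and 1s (long), one digit per syllable."""
    words = re.findall(r"[^\W\d_]+", line.lower())
    result = []
    for w, word in enumerate(words):
        for i, ch in enumerate(word):
            if ch not in VOWELS:
                continue
            if ch in LONG_VOWELS:                        # rule 2: long vowel
                result.append("1")
                continue
            j = i + 1
            while j < len(word) and word[j] not in VOWELS:
                j += 1
            cluster = word[i + 1:j]
            weight = consonant_weight(cluster)
            # A word-final consonant combines with the onset of the next word.
            # A word-final short vowel stays short whatever follows (rule 3).
            if j == len(word) and cluster and w + 1 < len(words):
                nxt, k = words[w + 1], 0
                while k < len(nxt) and nxt[k] not in VOWELS:
                    k += 1
                weight += consonant_weight(nxt[:k])
            result.append("1" if weight >= 2 else "0")   # rules 1 and 2
    return "".join(result)
```

Under these assumptions, `syllable_lengths("hosszú")` gives `"11"`, while the article in `"a strigula"` stays short despite the following consonant cluster.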

P. Horváth et al. (2022) presented an evaluation of syllable length detection. The percentage of incorrectly annotated lines in the sample was 2.33%. However, the error rate varies slightly with the period in which the poems were written. For poems written between 1505 and 1701, it was 3.5%, while for those from 1772 to 1854 and 1855 to 1909, it was 1.5% and 2.0%, respectively.

The algorithm can recognize four types of quantitative meter: dactylic, anapestic, iambic, and trochaic. In addition to the label of the meter type, the poems also get a regularity score between 0 and 1. The regularity score reflects how consistently the rhythm realizes the meter in the poem. The higher this score, the more regularly the poem’s meter is realized. By using regularity scores, it is possible to grasp meter as a scalar phenomenon instead of as a binary category (yes or no). Since poems can realize quantitative meter types in varying degrees, it seemed reasonable to design the algorithm to indicate this degree of regularity in the output. The regularity score allows for the use of a threshold to exclude poems with low regularity from the analysis.

The program tests the poems for all four meter types and gives the label of the meter category for which it has the highest regularity score. To achieve this goal, the algorithm first divides the sequences of 0s and 1s, which represent short and long syllables, into metrical feet. This segmentation is based on different principles for dactylic and anapestic and for iambic and trochaic meters. While in the case of dactylic and anapestic meters, the segmentation is based on the number of moras, for iambic and trochaic meters, the segmentation is based on the number of syllables. When testing dactylic and anapestic meters, the algorithm segments lines into four-mora feet, allowing the last foot to be incomplete with fewer than four moras. Mora is the abstract time unit in Greek, Latin, or Hungarian quantitative meter systems based on syllable length. Short syllables represent one mora, while long syllables represent two moras. In the case of dactyls, anapests, and spondees, the sum of the syllables’ moras in the feet is four. When testing iambic and trochaic meters, the program segments lines into two-syllable feet, permitting the last foot to be incomplete with only one syllable.
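The two segmentation principles can be illustrated with a short Python sketch. The function names are hypothetical, and the greedy strategy used for the mora-based case (closing a foot as soon as it holds at least four moras) is an assumption of this sketch rather than a documented detail of the program.

```python
MORAS = {"0": 1, "1": 2}  # short syllable = 1 mora, long syllable = 2 moras

def feet_by_syllables(line, size=2):
    """Iambic/trochaic test: two-syllable feet, the last one may be incomplete."""
    return [line[i:i + size] for i in range(0, len(line), size)]

def feet_by_moras(line, target=4):
    """Dactylic/anapestic test: close a foot once it holds at least `target` moras."""
    feet, current, moras = [], "", 0
    for syllable in line:
        current += syllable
        moras += MORAS[syllable]
        if moras >= target:
            feet.append(current)
            current, moras = "", 0
    if current:
        feet.append(current)  # incomplete last foot
    return feet
```

For instance, `feet_by_syllables("010101011")` yields four two-syllable feet plus a one-syllable remainder, and `feet_by_moras("100100111")` yields two dactyls, a spondee, and an incomplete last foot. With this greedy strategy a foot can overshoot to five moras (as in `"011"`), which a later validity check would have to catch.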

The regularity score of the poems is determined based on a scoring system. Each complete foot gets a score from 0 to 4, depending on the tested meter type. The base feet of the tested meter type receive 4 points, the primary substitute feet receive 2 points, and the secondary substitute feet receive 1 point. The opposite feet of the tested meter get 0 points. For instance, if the tested meter is iambic, iamb receives 4 points as the base foot of the meter type, spondee gets 2 points as the primary substitute foot, pyrrhus gets 1 point as the secondary substitute foot, and trochee does not get any points because it is the opposite foot of the tested meter. Table 3 shows the scoring system for the four meter types.

Table 3: Scoring system of quantitative meter.

| Meter | Base foot (4 pts) | Substitute foot 1 (2 pts) | Substitute foot 2 (1 pt) | Opposite foot (0 pts) |
|---|---|---|---|---|
| dactylic | dactyl | spondee | proceleusmatic | anapest |
| anapestic | anapest | spondee | proceleusmatic | dactyl |
| iambic | iamb | spondee | pyrrhus | trochee |
| trochaic | trochee | spondee | pyrrhus | iamb |

In calculating the regularity score of the poem, the algorithm first calculates the regularity score of each line for the tested meter type. The regularity score of the line is the mean score of the complete feet in the line. Based on the scoring system presented above, the formula for calculating the regularity score of a line is as follows (where b is the number of base feet, p is the number of primary substitute feet, s is the number of secondary substitute feet, and n is the total number of complete feet in the line):

$$\text{line score} = \frac{4b + 2p + s}{n}$$
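As a worked illustration (the foot sequence is invented for this example), consider an eight-syllable line tested as iambic whose four complete feet are an iamb, a spondee, a pyrrhus, and an iamb. Then b = 2, p = 1, s = 1, and n = 4, so that:

```latex
\text{line score} = \frac{4 \cdot 2 + 2 \cdot 1 + 1}{4} = \frac{11}{4} = 2.75
```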

However, for a line to receive a regularity score greater than 0 for a given meter type, it must meet certain minimum conditions. These conditions, based on the norms of the Hungarian versification system, are the following.

  1. Minimum conditions for anapestic and dactylic lines: (1) All the complete feet consist of four moras.4 (2) There is no amphibrach among the feet.

  2. Minimum conditions for iambic lines: (1) If the last foot of the line is complete, then it is an iamb or a pyrrhus. (2) If the last foot is incomplete, then the penultimate foot (the last complete foot) is an iamb.

  3. Minimum conditions for trochaic lines: (1) If the last foot of the line is complete, then it is a trochee or a spondee. (2) If the last foot is incomplete, then the penultimate foot (the last complete foot) is a trochee.

If the line does not satisfy the minimum conditions for the tested meter type, the regularity score of the line will be zero. In the second step, the algorithm calculates the regularity score of the whole poem for the tested meter by dividing the sum of the regularity scores of the lines by the number of lines. In the final step, the program compares the four regularity scores obtained for the four meter categories and labels the poem with the meter category having the highest regularity score.
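The scoring steps can be sketched in Python as follows, taking lines that are already segmented into feet (strings of 0s and 1s) as input. All names are hypothetical. The sketch keeps the point scale of the formula above, so its scores range from 0 to 4; how these values are mapped to the 0 to 1 range reported for the regularity score is not shown here, and dividing by 4 would be one assumed possibility.

```python
FOOT_NAMES = {"01": "iamb", "10": "trochee", "11": "spondee", "00": "pyrrhus",
              "100": "dactyl", "001": "anapest", "0000": "proceleusmatic",
              "010": "amphibrach"}
POINTS = {  # Table 3; feet that are not listed score 0
    "dactylic": {"dactyl": 4, "spondee": 2, "proceleusmatic": 1},
    "anapestic": {"anapest": 4, "spondee": 2, "proceleusmatic": 1},
    "iambic": {"iamb": 4, "spondee": 2, "pyrrhus": 1},
    "trochaic": {"trochee": 4, "spondee": 2, "pyrrhus": 1},
}
BASE = {"iambic": "iamb", "trochaic": "trochee"}
FINAL_FEET = {"iambic": ("iamb", "pyrrhus"), "trochaic": ("trochee", "spondee")}

def moras(foot):
    return sum(2 if s == "1" else 1 for s in foot)

def line_score(feet, meter):
    """Mean points of the complete feet, or 0 if the minimum conditions fail."""
    if meter in ("dactylic", "anapestic"):
        complete = [f for f in feet if moras(f) >= 4]
        # Every complete foot has exactly four moras and none is an amphibrach.
        ok = all(moras(f) == 4 and f != "010" for f in complete)
    else:
        complete = [f for f in feet if len(f) == 2]
        last = FOOT_NAMES.get(complete[-1]) if complete else None
        if feet and len(feet[-1]) == 2:      # the last foot is complete
            ok = last in FINAL_FEET[meter]
        else:                                # the last foot is incomplete
            ok = last == BASE[meter]
    if not complete or not ok:
        return 0.0
    total = sum(POINTS[meter].get(FOOT_NAMES.get(f, ""), 0) for f in complete)
    return total / len(complete)

def poem_score(lines, meter):
    """Mean of the line scores."""
    return sum(line_score(feet, meter) for feet in lines) / len(lines)

def best_meter(segmented):
    """segmented: {meter: list of lines, each a list of feet for that meter}."""
    scores = {m: poem_score(lines, m) for m, lines in segmented.items()}
    return max(scores, key=scores.get), scores
```

For example, `line_score(["01", "11", "00", "01"], "iambic")` returns 2.75, while the same call with a final trochee returns 0 because the minimum condition on the last foot fails.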

4.2 Main Principles of Detecting Hungarian Qualitative Meter

Since the Hungarian qualitative meter system is based on different principles than the quantitative one, a second algorithm had to be developed. The output of the algorithm is the bar structure of the poem. The algorithm can recognize two main types of qualitative meter. The first one is when all of the lines in the poem have the same bar structure. For instance, the output 4-4 means that each line consists of two four-syllable bars. Naturally, lines can have more than two bars. For example, when the output is 4-3-2, each line of the poem has three bars, the first one contains four syllables, the second one has three syllables, and the third bar has two syllables. The second main type of bar structure detected by the program is when the odd and even lines of the stanzas have different bar structures. This means that two bar structures alternate from line to line. For instance, the output 4-4_4-3-2 indicates that the odd lines of the stanzas contain two four-syllable bars, while the even lines consist of three bars: one with four syllables, one with three, and one with two. There are other, more complex types of Hungarian qualitative meter as well; however, these two main types cover the majority of Hungarian poems with qualitative meter.

A bar can consist of only one syllable. In the program’s analysis, the maximum number of syllables per bar is six. This means that if the poem’s lines have fewer than seven syllables, the lines themselves can form a bar without any inner segmentation. For instance, an output of 6 indicates that each line of the poem has six syllables, and these six-syllable units cannot be further divided into smaller bars.

The main principle behind the algorithm is the use of matrices to define the bar structure. The rows of the matrix represent the lines, and the columns represent the syllable positions of the lines. In the matrix, only two values are used: 1 and 0. The value 1 represents syllables that are the first syllables of words (stressed syllables), while the value 0 represents all other syllables (unstressed syllables). The algorithm determines the first syllables of the bars by calculating the mean values of the columns. If this value is at least 0.75, the syllable position is analyzed as the beginning of a new bar. The 0.75 threshold allows for lines that do not follow the general bar structure of the poem. In the algorithm, there is a restriction on the second syllable position: it cannot be the beginning of a new bar, regardless of its mean value. The reason for this restriction is that defining single-syllable bars at the beginning of lines would be inconsistent with Hungarian versification. Moreover, single-syllable words at the beginning of lines are usually unstressed function words.
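The matrix-based procedure can be sketched in Python as follows. The function names are hypothetical, the sketch assumes that all lines have the same number of syllables and that every vowel letter is one syllable nucleus, and it covers neither alternating bar structures nor the six-syllable limit on bars.

```python
import re

VOWELS = "aáeéiíoóöőuúüű"

def stress_row(line):
    """One matrix row: 1 for a word-initial syllable, 0 for every other syllable."""
    row = []
    for word in re.findall(r"[^\W\d_]+", line.lower()):
        syllables = sum(ch in VOWELS for ch in word)
        if syllables:
            row += [1] + [0] * (syllables - 1)
    return row

def bar_structure(matrix, threshold=0.75):
    """Return the bar structure (e.g. '4-4') for rows of equal length."""
    n_lines, n_syll = len(matrix), len(matrix[0])
    means = [sum(row[i] for row in matrix) / n_lines for i in range(n_syll)]
    # Position 0 opens the first bar; position 1 (the second syllable)
    # is never allowed to open a bar, whatever its mean value.
    starts = [0] + [i for i in range(2, n_syll) if means[i] >= threshold]
    bounds = starts + [n_syll]
    return "-".join(str(b - a) for a, b in zip(bounds, bounds[1:]))
```

A poem would be analyzed with `bar_structure([stress_row(line) for line in lines])`. A single stanza is usually too small a sample for stable column means, so the result is only meaningful over the whole poem.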

4.3 Frequency and Regularity of Meter Types in Canonical Hungarian Poetry

These two algorithms, implemented in a Python program called Hunpoem_meter_analyzer, have been used to annotate the quantitative and qualitative meters of the poems in the ELTE Poetry Corpus (see level3 and level4 XML files in the repository of the corpus). Table 4 shows the number of poems with different meters. Poems with quantitative meter are counted only if their regularity score is higher than 0.5. The third column indicates the number of poems that have qualitative and quantitative meters simultaneously. For example, 1370 poems follow qualitative and iambic meters at the same time. The total number of these simultaneous poems is 2554, using the threshold of 0.5. Table 5 shows the same, but with a threshold of 0.3.

Table 4: Number of poems in different meters (total = 14,358; threshold = 0.5).

| Meter | All | With qualitative |
|---|---|---|
| Qualitative | 3998 | |
| Quantitative | 8353 | 2554 |
| Iambic | 6086 | 1370 |
| Trochaic | 1917 | 1153 |
| Dactylic | 309 | 24 |
| Anapestic | 41 | 7 |

Table 5: Number of poems in different meters (total = 14,358; threshold = 0.3).

| Meter | All | With qualitative |
|---|---|---|
| Qualitative | 3998 | |
| Quantitative | 12704 | 3739 |
| Iambic | 8177 | 1686 |
| Trochaic | 3763 | 1989 |
| Dactylic | 643 | 44 |
| Anapestic | 113 | 18 |

The regularity scores of the poems’ rhythm allow for the analysis of changes in rhythmic regularity throughout the history of canonical Hungarian poetry. Figure 1 shows the median regularity scores of iambic poems for authors from the 19th century and the first half of the 20th century, arranged chronologically. Only those poets who have at least 100 poems have been included in the analysis. Figure 2 presents the median scores for poems written in trochaic meter. A threshold of 0.3 was applied in both cases. Using black and gray bar colors, the bar charts differentiate the median regularity scores of authors born before and after Endre Ady (1877–1919), considered the first poet of Hungarian classical modernism at the beginning of the 20th century. As the bar charts indicate, the rhythmic regularity of both iambic and trochaic poems declined in the first half of the 20th century.5 This decline appears to align with the rise of classical modernism in Hungarian poetry.

Figure 1: Median regularity scores of iambic poems over time (Threshold = 0.3).

Figure 2: Median regularity scores of trochaic poems over time (Threshold = 0.3).

4.4 Evaluation of the Algorithms Detecting Hungarian Quantitative and Qualitative Meters

Unfortunately, there is no gold-standard corpus of Hungarian poems with manually annotated meter. Therefore, to evaluate the efficiency of the algorithms detecting meter, an alternative method had to be developed. Instead of building a costly gold-standard corpus, a database was created that lists the example poems from the book by Szepes and Szerdahelyi (1981). This is one of the most comprehensive handbooks on Hungarian poetry meters, and the two algorithms developed for detecting Hungarian quantitative and qualitative meters are largely based on its approach. The book provides numerous examples for each meter type. I have recorded the titles of the poems, along with their meter types, for those that are also included in the ELTE Poetry Corpus. In other words, the book’s examples, which illustrate different meter types, serve as a quasi-gold-standard corpus. In this way, 78 poems with quantitative meter and 55 poems with qualitative meter were collected. Only poems with an unambiguous meter, as identified by the authors, were recorded. For example, poems in which different stanzas follow different meters were not included in the list.

Table 6 shows the number of matches and their proportion (accuracy) for the four quantitative meters. Accuracy was calculated at regularity score thresholds of 0.5, 0.4, 0.3, 0.2, 0.1, and 0.0. The lower the threshold value, the greater the number of poems with a correctly recognized meter. However, it is important to highlight that different meter types respond differently to threshold values. With a threshold of 0.5, all but one of the trochaic poems and 70% of the iambic poems are recognized, whereas only about half of the anapestic and dactylic poems are identified. This suggests that for anapestic and dactylic meters, a lower threshold should be used to capture a higher number of poems. Four of the 78 poems cannot be correctly recognized, even without a threshold (threshold = 0.0). In two of these cases, there is an internal caesura after three and a half feet, which the program is unable to handle.

Table 6: Evaluation of the algorithm detecting Hungarian quantitative meter (total = 78).

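| Threshold | Iambic (abs) | Iambic (acc) | Trochaic (abs) | Trochaic (acc) | Anapestic (abs) | Anapestic (acc) | Dactylic (abs) | Dactylic (acc) | All (abs) | All (acc) |
|---|---|---|---|---|---|---|---|---|---|---|
| all | 23 | | 15 | | 11 | | 29 | | 78 | |
| 0.5 | 16 | 0.7 | 14 | 0.93 | 6 | 0.55 | 14 | 0.48 | 50 | 0.64 |
| 0.4 | 20 | 0.87 | 15 | 1 | 8 | 0.73 | 22 | 0.76 | 65 | 0.83 |
| 0.3 | 21 | 0.91 | 15 | 1 | 10 | 0.91 | 25 | 0.86 | 71 | 0.91 |
| 0.2 | 21 | 0.91 | 15 | 1 | 10 | 0.91 | 28 | 0.97 | 74 | 0.95 |
| 0.1 | 21 | 0.91 | 15 | 1 | 10 | 0.91 | 28 | 0.97 | 74 | 0.95 |
| 0.0 | 21 | 0.91 | 15 | 1 | 10 | 0.91 | 28 | 0.97 | 74 | 0.95 |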

It is worth noting that, in this case, accuracy based on exact match is equivalent to recall. Precision is 1.0 for iambic, anapestic, and dactylic poems across all threshold values. For the trochaic meter, precision is 1.0 at the 0.5 threshold, but at the 0.4 threshold, it drops to 0.88 and further decreases to 0.79 at lower thresholds. This indicates that, within the sample used, only the trochaic label is applied more broadly than necessary. In other words, all four incorrectly annotated poems are annotated with the trochaic label. However, the low and unbalanced number of suitable example poems in the book prevents obtaining a reliable precision value. Consequently, reporting F-scores would also be misleading.
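The reported figures can be recomputed from the counts in Table 6 with a few lines of Python. The dictionary and function names are hypothetical, and only three thresholds are copied here. The count of four false positives for the trochaic label without a threshold follows from the statement that all four misrecognized poems received that label.

```python
TOTALS = {"iambic": 23, "trochaic": 15, "anapestic": 11, "dactylic": 29}
CORRECT = {  # correctly recognized poems per threshold, copied from Table 6
    0.5: {"iambic": 16, "trochaic": 14, "anapestic": 6, "dactylic": 14},
    0.3: {"iambic": 21, "trochaic": 15, "anapestic": 10, "dactylic": 25},
    0.0: {"iambic": 21, "trochaic": 15, "anapestic": 10, "dactylic": 28},
}

def recall(threshold, meter):
    """Share of the poems of one meter that receive the correct label."""
    return CORRECT[threshold][meter] / TOTALS[meter]

def overall_accuracy(threshold):
    """Share of all 78 poems that receive the correct label."""
    return sum(CORRECT[threshold].values()) / sum(TOTALS.values())

def precision(true_positives, false_positives):
    """Share of the poems given a label that really belong to that meter."""
    return true_positives / (true_positives + false_positives)

print(round(recall(0.5, "iambic"), 2))   # 0.7
print(round(overall_accuracy(0.5), 2))   # 0.64
print(round(precision(15, 4), 2))        # 0.79, trochaic label at threshold 0.0
```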

Table 7 shows the number and proportion of matches for the qualitative meter. The second column indicates the exact matches, which occur when the bar structure given by the program is the same as the one provided in the book. The third column also includes matches where the program provides a more general or more specific bar structure than the book (e.g., 6-6 instead of 4-2-4-2 or 5-3-2 instead of 3-2-3-2). In this latter case, 50 out of 55 poems (91%) have been analyzed in the same or a similar way as by the authors of the book.

Table 7: Evaluation of the algorithm detecting Hungarian qualitative meter (total = 55).

| | Exact matches | Matches |
|---|---|---|
| abs | 46 | 50 |
| rel | 0.84 | 0.91 |

5. The Rhyming Dictionary of Hungarian Poetry

While creating an explanatory dictionary is a time-consuming task, which requires the work of many lexicographers, the creation of a rhyming dictionary can be fully automated if a large poetry corpus is available. A draft of such an automatically generated rhyming dictionary of Hungarian Poetry was outlined by Mártonfi (2008) more than fifteen years ago. However, at that time, there were no Hungarian poetry corpora of the appropriate size and annotations. Thanks to the creation of the ELTE Poetry Corpus, the idea of an automatically generated rhyming dictionary has now become a reality. The automatically generated annotations of the corpus include the rhyme patterns of the stanzas in the traditional notation using the consecutive letters of the Latin alphabet. Besides the texts and the grammatical and phonological annotations of the words, these annotations of rhyme patterns provide the input for the script generating the rhyming dictionary. The output of this script is an XML file containing all the rhyming pairs of the corpus, as well as further data about them. TSV, SQLite, and PDF versions are also generated automatically from the XML version. The rhyming dictionary can be downloaded in all formats from the repository of the project.

5.1 Data Types in the Rhyming Dictionary

The number of rhyming pairs in the rhyming dictionary is 650,561. Besides the word forms that make up the rhyming pairs, the database contains three types of data. First, it includes the grammatical and phonological features of the words in the rhyming pair, such as lemma, part of speech, morphosyntactic features, number of syllables, vowel type, and phonological structure. These properties were retrieved from the annotations of the ELTE Poetry Corpus. The grammatical features were annotated using the e-magyar tool (Indig et al. 2019; Váradi et al. 2018), while the phonological features of the words were annotated with a program specifically developed for the ELTE Poetry Corpus (P. Horváth 2020).

Second, the rhyming dictionary provides information on the position of the rhyming words. For each rhyming pair, the dictionary indicates the number of lines between the rhyming words (a maximum of four lines), the order of the rhyming words, and the number of those lines between the rhyming words that also rhyme with them. Third, the dictionary includes the bibliographical data of the poem containing the rhyming pair, as well as a URL of the poem pointing to the text query interface of the ELTE Poetry Corpus.
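
The positional data described above can be derived from a stanza's rhyme pattern roughly as follows. This is a minimal sketch under stated assumptions: the function name, the dictionary keys, and the use of "x" for non-rhyming lines are hypothetical, and the sketch assumes that every pair within a rhyme class is listed as long as at most four lines separate its members.

```python
MAX_LINES_BETWEEN = 4  # upper limit stated in the text


def rhyming_pairs(end_words, pattern, no_rhyme="x"):
    """List rhyming pairs of a stanza with their positional data.

    end_words: line-final words; pattern: one letter per line, e.g. 'abab'.
    The marker for non-rhyming lines ('x') is an assumption of this sketch.
    """
    pairs = []
    for i in range(len(pattern)):
        if pattern[i] == no_rhyme:
            continue
        for j in range(i + 1, len(pattern)):
            between = j - i - 1
            if between > MAX_LINES_BETWEEN:
                break
            if pattern[j] != pattern[i]:
                continue
            # Lines strictly between i and j that belong to the same rhyme class.
            rhyming_between = sum(
                1 for k in range(i + 1, j) if pattern[k] == pattern[i]
            )
            pairs.append({
                "first": end_words[i],
                "second": end_words[j],
                "lines_between": between,
                "rhyming_lines_between": rhyming_between,
            })
    return pairs


abab_pairs = rhyming_pairs(["virág", "szél", "világ", "él"], "abab")
aaa_pairs = rhyming_pairs(["w1", "w2", "w3"], "aaa")
```

For the pattern abab, the sketch yields two pairs with one line between their members; for aaa, the pair formed by the first and third lines has one intervening line that also rhymes with them.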

The dictionary lists the rhyming pairs in alphabetical order. The alphabetic sorting of the rhyming pairs is based on the following principles.

  1. The rhyming pairs are sorted according to lemmas.

  2. In the case of rhyming pairs with the same lemmas but different parts of speech, the rhyming pairs having the same part of speech are listed consecutively.

  3. Rhyming pairs with the same lemmas and the same parts of speech are sorted according to word forms.

  4. From the occurrences realizing a rhyming pair, those items are listed first in which the distance between the members of the rhyming pair is smaller.
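
The four sorting principles can be expressed as a single composite sort key. The sketch below is illustrative only: the record type, the part-of-speech tags, and the example entries are hypothetical, and plain code-point order is used instead of Hungarian alphabetical order, which would require locale-aware collation.

```python
from typing import NamedTuple


class Occurrence(NamedTuple):
    lemmas: tuple        # (lemma of first word, lemma of second word)
    pos: tuple           # (part of speech of first word, of second word)
    forms: tuple         # (word form of first word, of second word)
    lines_between: int   # distance between the members of the pair


def sort_key(occ):
    # Principles 1-4 in order: lemmas, parts of speech, word forms, distance.
    return (occ.lemmas, occ.pos, occ.forms, occ.lines_between)


# Hypothetical entries, not taken from the dictionary.
entries = [
    Occurrence(("nap", "lap"), ("NOUN", "NOUN"), ("napot", "lapot"), 1),
    Occurrence(("dal", "hal"), ("NOUN", "NOUN"), ("dalok", "halok"), 2),
    Occurrence(("dal", "hal"), ("NOUN", "VERB"), ("dal", "hal"), 0),
    Occurrence(("dal", "hal"), ("NOUN", "NOUN"), ("dalok", "halok"), 0),
    Occurrence(("dal", "hal"), ("NOUN", "NOUN"), ("dal", "hal"), 1),
]

ordered = sorted(entries, key=sort_key)
```

Because tuples are compared element by element, occurrences of the same lemma pair stay together, pairs with the same parts of speech are listed consecutively, and among identical word forms the occurrence with the smaller distance comes first.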

5.2 The Rhyming Dictionary’s Online Query Tool

An online query tool has been developed for the rhyming dictionary (see Note 6). The backend of the tool was implemented in Python with the FastAPI framework, and it queries the SQLite version of the dictionary. The tool allows searches by lemma and word form, with results filterable by part of speech, author, and the position of the rhyming word within the rhyming pair. In addition to the word forms of the rhyming pairs, the results display the lemmas, the parts of speech, the distance between the members of the rhyming pairs, and the number of other rhyming words between the members. The results also include bibliographical information about the poem in which the rhyming pair occurs. Clicking the link shown in the output opens the poem in the text query interface of the ELTE Poetry Corpus (see Note 7). The search results can be downloaded in TSV format. The Statistics function displays, in tabular format, the number of lemmas or word forms rhyming with the specified word. The table also shows how many authors have at least one poem in which these lemmas or word forms rhyme with the specified word.
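
The kind of query that such a backend might run against the SQLite version can be sketched with Python's standard sqlite3 module. The table name, the column names, and the example rows below are assumptions made for illustration; they do not reproduce the actual schema of the dictionary, and the FastAPI layer is omitted.

```python
import sqlite3

# Hypothetical schema and rows; the real database may be organized differently.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE rhyme_pairs (
           lemma1 TEXT, form1 TEXT, pos1 TEXT,
           lemma2 TEXT, form2 TEXT, pos2 TEXT,
           lines_between INTEGER, author TEXT)"""
)
conn.executemany(
    "INSERT INTO rhyme_pairs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("dal", "dalok", "NOUN", "hal", "halok", "NOUN", 1, "Author A"),
        ("dal", "dal", "NOUN", "hal", "hal", "VERB", 0, "Author B"),
        ("fal", "fal", "NOUN", "dal", "dal", "NOUN", 1, "Author A"),
        ("nap", "nap", "NOUN", "lap", "lap", "NOUN", 0, "Author A"),
    ],
)


def search_by_lemma(conn, lemma, author=None):
    """Return rhyming pairs in which either member has the given lemma."""
    sql = ("SELECT form1, form2, lines_between, author FROM rhyme_pairs "
           "WHERE (lemma1 = ? OR lemma2 = ?)")
    params = [lemma, lemma]
    if author is not None:
        sql += " AND author = ?"
        params.append(author)
    sql += " ORDER BY form1, form2"
    return conn.execute(sql, params).fetchall()


def rhyme_partner_stats(conn, lemma):
    """For each lemma rhyming with `lemma`: number of pairs and of authors."""
    sql = """
        SELECT partner, COUNT(*), COUNT(DISTINCT author)
        FROM (SELECT lemma2 AS partner, author FROM rhyme_pairs WHERE lemma1 = ?
              UNION ALL
              SELECT lemma1 AS partner, author FROM rhyme_pairs WHERE lemma2 = ?)
        GROUP BY partner ORDER BY partner
    """
    return conn.execute(sql, (lemma, lemma)).fetchall()


hits = search_by_lemma(conn, "dal", author="Author A")
stats = rhyme_partner_stats(conn, "dal")
```

The second function corresponds loosely to the Statistics view: it counts, for each rhyme partner of the queried lemma, the number of pairs and the number of distinct authors using them.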

5.3 Changes in the Use of Inflectional Rhymes in Hungarian Poetry

Using the data of the rhyming dictionary, various aspects of rhyming in Hungarian canonical poetry can be explored. To demonstrate the dictionary’s potential for research, a small case study on inflectional rhymes is presented here. According to the norms of Hungarian poetry, inflectional rhymes should be avoided. However, as some observations suggest, this rule has not always been (strictly) followed. The acceptance of inflectional rhymes has varied across different periods. For example, Maróthy et al. (2021) and Seláf and Plecháč (2023) quantitatively demonstrated that inflectional rhymes were commonly used in 16th-century Hungarian historical songs. However, until now, it has been difficult to provide a precise overview of how their use has changed in Hungarian poetry in general. The rhyming dictionary makes it possible to identify and visualize this trend using quantitative data.

In Hungarian, inflections convey the morphosyntactic features of words. The same morphosyntactic features are signified by the same type of inflections. This allows for a quantitative assessment of changes in the use of inflectional rhymes by identifying rhyming pairs in which both words share the same part of speech and morphosyntactic features. In this investigation, those rhyming pairs were queried in which both rhyming words are nouns, adjectives, or verbs sharing the same morphosyntactic features. Cases where the rhyming words appear in their basic, uninflected forms were excluded. This means that the following were not considered: singular nouns in the nominative case, singular adjectives in the nominative case and positive degree, and singular third-person indicative indefinite verbs in the present tense.
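
The selection criteria can be summarized in a short filter. This is a sketch under stated assumptions: the feature names and values follow a Universal Dependencies-like style chosen for readability and do not necessarily match the tagset of the corpus annotations, and the function names are hypothetical.

```python
# Feature bundles treated as basic, uninflected forms (assumed feature names).
BASE_FORMS = {
    "NOUN": {"Number": "Sing", "Case": "Nom"},
    "ADJ": {"Number": "Sing", "Case": "Nom", "Degree": "Pos"},
    "VERB": {"Number": "Sing", "Person": "3", "Mood": "Ind",
             "Definite": "Ind", "Tense": "Pres"},
}


def is_inflectional_pair(word1, word2):
    """Each word is a (part of speech, feature dict) tuple."""
    pos1, feats1 = word1
    pos2, feats2 = word2
    # Both words must be nouns, adjectives, or verbs of the same category.
    if pos1 != pos2 or pos1 not in BASE_FORMS:
        return False
    # They must share all morphosyntactic features.
    if feats1 != feats2:
        return False
    # Pairs of basic, uninflected forms are excluded.
    return feats1 != BASE_FORMS[pos1]


plural_acc = ("NOUN", {"Number": "Plur", "Case": "Acc"})
singular_nom = ("NOUN", {"Number": "Sing", "Case": "Nom"})
singular_acc = ("NOUN", {"Number": "Sing", "Case": "Acc"})
base_verb = ("VERB", dict(BASE_FORMS["VERB"]))

both_plural_acc = is_inflectional_pair(plural_acc, plural_acc)
both_base = is_inflectional_pair(singular_nom, singular_nom)
```

Two plural accusative nouns count as an inflectional rhyming pair in this sketch, whereas two nominative singular nouns do not, since they appear in their basic forms.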

By plotting the relative frequencies of the queried inflectional rhyming pairs for each author over time, the bar chart shown in Figure 3 was obtained. Using gray and black bar colors, the relative frequencies of inflectional rhymes are differentiated for authors born before 1770, between 1770 and 1860, and between 1860 and 1910. The chart reveals a decline in the frequency of inflectional rhymes during the 19th century, and the frequency remains low throughout the 20th century. In other words, the chart indicates that inflectional rhymes were widely used in Hungarian poetry before the 19th century, but in the 19th century, they gradually became stigmatized and fell out of favor.

Figure 3: Changes in the relative frequencies of inflectional rhymes over time.

6. Poem Form Searcher for Hungarian Canonical Poetry

Besides the query tool of the Rhyming Dictionary of Hungarian Canonical Poetry, another online query tool, the Poem Form Searcher for Hungarian Canonical Poetry, was also developed (see Note 8). This tool allows for searching poems in the ELTE Poetry Corpus according to various formal properties related to sound devices. As with the query tool of the rhyming dictionary, the backend was implemented in Python with the FastAPI framework. The tool does not query the ELTE Poetry Corpus directly, but an SQLite relational database generated from the corpus. This database does not include the texts of the poems; it contains only their formal features, retrieved from the annotations of the corpus.

Users of the tool can specify the rhyme pattern and syllable pattern, along with the scope of these patterns (stanza vs. whole poem). Poems can also be queried by meter, with both quantitative and qualitative meter types available for search. Regarding quantitative meter, poems in iambic, trochaic, dactylic, and anapestic meters can be searched for. It is also possible to set the threshold of regularity, a value between 0 and 1, which indicates the degree to which the meter is realized in the poem. The higher this threshold is set, the more the query is narrowed down to poems that realize the quantitative meter on a higher level of regularity. The default value is 0.5, which gives reasonable results. In the case of qualitative meter, it is possible to specify a particular bar structure by entering the number of syllables per bar. Following the algorithm detecting qualitative meter, two types of qualitative meters can be identified: (1) each line of the poem contains the same number of bars, with the same number of syllables; (2) odd and even lines of the stanzas differ in the number of bars and/or the number of syllables in the bars. The search results can be filtered by author and maximum word count.
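
The effect of the regularity threshold and the additional filters can be illustrated with a small in-memory sketch. The record layout, the function name, and the example poems are hypothetical, and the sketch assumes that a poem matches when its regularity score is greater than or equal to the threshold.

```python
from dataclasses import dataclass


@dataclass
class PoemForm:
    author: str
    title: str
    regularity: dict   # quantitative meter name -> regularity score in [0, 1]
    word_count: int


def search(poems, meter, threshold=0.5, author=None, max_words=None):
    """Return titles of poems realizing `meter` at or above `threshold`."""
    hits = []
    for poem in poems:
        score = poem.regularity.get(meter)
        if score is None or score < threshold:
            continue
        if author is not None and poem.author != author:
            continue
        if max_words is not None and poem.word_count > max_words:
            continue
        hits.append(poem.title)
    return hits


# Hypothetical records, not taken from the corpus.
poems = [
    PoemForm("Author A", "Poem 1", {"iambic": 0.82, "trochaic": 0.10}, 120),
    PoemForm("Author A", "Poem 2", {"iambic": 0.45}, 60),
    PoemForm("Author B", "Poem 3", {"trochaic": 0.66}, 300),
]

default_hits = search(poems, "iambic")
relaxed_hits = search(poems, "iambic", threshold=0.4)
```

Lowering the threshold from the default 0.5 to 0.4 adds the less regular poem to the results, which is the broadening effect that the evaluation of the quantitative meter detection also reflects.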

The query returns the author, title, and formal properties of all poems in the ELTE Poetry Corpus that match the specified formal features. The results also include links pointing to the poems in the text query interface. Additionally, it is possible to download the search results in TSV format. A Statistics function is also available, displaying the number and proportion of poems that match the search criteria for each author.

6.1 Attraction between Trochaic and Qualitative Meters

To illustrate the usefulness of the Poem Form Searcher, this section presents a brief case study on the interaction between trochaic and qualitative meters. In Hungarian literary studies, there is a prevailing assumption that qualitative and trochaic meters are inherently drawn to each other (e.g. Vargyas 1966, 148–150; J. Horváth 1969, 128–133; Szepes and Szerdahelyi 1981, 267–269). In other words, it is believed that poems with trochaic meter are likely to also feature qualitative meter. This attraction between the two meters is explained by the phonological characteristics of the Hungarian language. However, to date, there has been no quantitative research testing the hypothesis that the two meters attract each other. By retrieving data using the Poem Form Searcher, this assumption can now be quantitatively examined.

To answer this question, mutual information scores were calculated for poems with iambic and qualitative meters and poems with trochaic and qualitative meters, respectively. In linguistics, association measures, such as mutual information, are usually used to extract collocations (Church and Hanks 1989). However, these measures can also be used for textual co-occurrence to measure whether two textual features co-occur in the same texts more often than would be expected by chance (see Evert 2009). Table 8 shows the mutual information scores for the pairings of iambic and qualitative meters as well as trochaic and qualitative meters. If the score is greater than 0, the two meters occur together more often than would be expected by chance, which means that they attract each other. If the value is less than 0, the two meters occur together less often than would be expected by chance. In other words, they repel each other. The MI scores were calculated for the entire corpus as well as for each author individually. In the latter case, the mean value was computed. The table also shows the number of poets with an MI score greater than 0 and less than 0. In the calculations by author, only those authors were included who had written at least one poem that is both iambic and qualitative, or trochaic and qualitative.
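
For textual co-occurrence, the MI score compares the observed number of poems realizing both meters with the number expected if the two meters were independent. The sketch below assumes the pointwise formulation with a base-2 logarithm, as in Church and Hanks (1989); the log base, the function name, and the example counts are assumptions for illustration and are not the corpus figures.

```python
import math


def mutual_information(n_both, n_a, n_b, n_total):
    """MI = log2(observed / expected) for two features over n_total poems.

    n_a and n_b are the numbers of poems with feature A and feature B;
    n_both is the number of poems with both features.
    """
    if n_both == 0:
        # The logarithm is undefined when the two features never co-occur.
        raise ValueError("MI is undefined without any co-occurrence")
    expected = n_a * n_b / n_total
    return math.log2(n_both / expected)


# Hypothetical counts: 1,000 poems, 200 with meter A, 100 with meter B.
attraction = mutual_information(40, 200, 100, 1000)   # observed > expected
repulsion = mutual_information(10, 200, 100, 1000)    # observed < expected
independence = mutual_information(20, 200, 100, 1000)
```

With these counts, 20 poems with both meters would be expected by chance, so 40 observed poems give a positive score and 10 give a negative one. The undefined case at zero co-occurrence is consistent with restricting the per-author calculation to authors who wrote at least one poem realizing both meters.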

Table 8: MI scores of qualitative-iambic and qualitative-trochaic meters.

                           qualitative + iambic   qualitative + trochaic
Whole corpus               -0.307                 1.111
Mean of the authors’ MI    -0.123                 0.994
Authors with positive MI   14                     41
Authors with negative MI   26                     2

As Table 8 indicates, while iambic and qualitative meters occur together less often than expected by chance (the values are below 0), trochaic and qualitative meters occur together more often than expected by chance (the values are above 0). The number of authors with positive and negative MI scores differs considerably between the two cases. In the case of qualitative-trochaic meter, most authors have a positive MI score, while for qualitative-iambic meter, the majority of authors have a negative MI score. This means that the observation of literary scholars that trochaic and qualitative meters attract each other can be quantitatively validated in the context of canonical Hungarian poetry. However, the corpus does not allow for statistical inferences about Hungarian poetry in general. The results also show that in canonical Hungarian poetry, iambic and qualitative meters tend to repel each other.

7. Conclusion

The paper outlined the different stages of a corpus linguistic workflow for the quantitative analysis of sound devices in Hungarian poetry. This workflow begins with the development of algorithms to annotate sound devices, progresses with the automatic generation of databases containing these features, involves the development of web tools, and concludes with research findings derived from the databases and web tools.

The paper described the main principles of the algorithms annotating rhyme and meter in the ELTE Poetry Corpus, which is currently the largest Hungarian poetry corpus. The algorithm annotating rhyme patterns gives better results than previous ones, thanks to the flexibility provided by its eight built-in rule sets. The algorithm detecting Hungarian quantitative meter recognizes iambic, trochaic, anapestic, and dactylic meters, and it also outputs a regularity score. The regularity scores allow for the analysis of changes in rhythmic regularity over time. Another rule-based algorithm detecting Hungarian qualitative meter was also designed and implemented. Using the Python program detecting quantitative and qualitative meters, the poems in the ELTE Poetry Corpus were annotated with both meter types. The paper also introduced an alternative method for evaluating tools detecting meter, using the example set from a book on Hungarian versification.

Based on the annotations of rhyme and meter, a rhyming dictionary and a poetic form search tool were created. The Rhyming Dictionary of Hungarian Canonical Poetry was generated automatically and contains all the rhyming pairs of the ELTE Poetry Corpus with various types of data. An online query tool was also developed for the database. The Poem Form Searcher for Hungarian Canonical Poetry enables querying the poems of the ELTE Poetry Corpus on the basis of rhyme pattern, syllable pattern, quantitative meter, and qualitative meter. To demonstrate the usefulness of the rhyming dictionary and the Poem Form Searcher in research, two case studies were presented: one on the decrease of inflectional rhymes and another on the attraction between trochaic and qualitative meters in canonical Hungarian poetry. These phenomena have been discussed in the literature on Hungarian poetry, but have not been quantitatively demonstrated before.

8. Data Availability

The ELTE Poetry Corpus, annotated with metrical features, can be found here: https://github.com/ELTE-DH/poetry-corpus. It has been archived and is persistently available at: https://doi.org/10.5281/zenodo.15521399.

The Rhyming Dictionary of Hungarian Canonical Poetry can be found in various formats here: https://github.com/ELTE-DH/rhyming-dictionary. It has been archived and is persistently available at: https://doi.org/10.5281/zenodo.15485000.

9. Software Availability

The program Hunpoem_meter_analyzer detecting Hungarian meter and the evaluation data for the program can be found here: https://github.com/ELTE-DH/hunpoem-meter-analyzer. They have been archived and are persistently available at: https://doi.org/10.5281/zenodo.15485259.

10. Acknowledgements

This research was supported by the National Digital Heritage Laboratory, funded by the National Research, Development and Innovation Office of Hungary. I owe thanks to Balázs Indig for uploading the query tool of the rhyming dictionary and the Poem Form Searcher, along with the necessary modifications, to the university server. I also thank Mihály Nagy for his suggestions regarding the front-end development of the query tools.

11. Author Contributions

Péter Horváth: Conceptualization, Data Curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

Notes

  1. The program takes into account that, unlike the Hungarian vowel pairs i–í, o–ó, ö–ő, u–ú, and ü–ű, the vowels a–á and e–é differ from each other in more than just length.
  2. The number of poems in the corpus at the time of evaluation was lower than the current number.
  3. It should be noted that some authors classify Hungarian qualitative meter as a syllable-counting type and assign only secondary importance to the stress on the first syllables of the bars (e.g. Lotz 1972, 101; I. Horváth 1991, 148). However, the difference between the two approaches has no significance for the algorithm developed in this research.
  4. In the case of pentameter, which is a specific subtype of dactylic meter, the four-mora-per-foot rule cannot be applied, as there is a half-foot in the middle of the line. Therefore, in the algorithm, all possible rhythmic patterns of the pentameter are listed in order to check whether the line matches one of them. This pattern matching is performed before the algorithm attempts to divide the line into four-mora feet.
  5. It should be noted that the low regularity scores for iambic and trochaic meters in the case of Dániel Berzsenyi are due to his use of more complex classical metrical forms, which the program cannot recognize.
  6. See: https://rimszotar.elte-dh.hu/.
  7. See: https://verskorpusz.elte-dh.hu/.
  8. See: https://versformakereso.elte-dh.hu/.

References

Agirrezabal, Manex, Iñaki Alegria, and Mans Hulden (2016a). “Machine Learning for Metrical Analysis of English Poetry”. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Ed. by Yuji Matsumoto and Rashmi Prasad. The COLING 2016 Organizing Committee, 772–781.

Agirrezabal, Manex, Iñaki Alegria, and Mans Hulden (2017). “A Comparison of Feature-based and Neural Scansion of Poetry”. In: Proceedings of Recent Advances in Natural Language Processing. Ed. by Ruslan Mitkov and Galia Angelova. INCOMA Ltd., 18–23.  http://doi.org/10.26615/978-954-452-049-6_003.

Agirrezabal, Manex, Bertol Arrieta, Aitzol Astigarraga, and Mans Hulden (2016b). “ZeuScansion: A Tool for Scansion of English Poetry”. In: Journal of Language Modelling 4 (1), 3–28.  http://doi.org/10.15398/jlm.v4i1.102.

Bobenhausen, Klemens (2011). “The Metricalizer – Automated Metrical Markup of German Poetry”. In: Current Trends in Metrical Analysis. Ed. by Christoph Küper, Wilfried Kürschner, and Volker Schulz. Peter Lang, 119–132.

Church, Kenneth W. and Patrick Hanks (1989). “Word Association Norms, Mutual Information, and Lexicography”. In: 27th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 76–83.  http://doi.org/10.3115/981623.981633.

de la Rosa, Javier, Álvaro Pérez, Laura Hernández, Aitor Díaz, Salvador Ros, and Elena González-Blanco (2020). “Rantanplan, Fast and Accurate Syllabification and Scansion of Spanish Poetry”. In: Procesamiento del Lenguaje Natural 65, 83–90.  http://doi.org/10.26342/2020-65-10.

de la Rosa, Javier, Álvaro Pérez, Mirella de Sisto, Laura Hernández, Aitor Díaz, Salvador Ros, and Elena González-Blanco (2023). “Transformers Analyzing Poetry: Multilingual Metrical Pattern Prediction with Transformer-based Language Models”. In: Neural Computing and Applications 35, 18171–18176.  http://doi.org/10.1007/s00521-021-06692-2.

De Sisto, Mirella, Laura Hernández-Lorenzo, Javier de la Rosa, Salvador Ros, and Elena González-Blanco (2024). “Understanding Poetry Using Natural Language Processing Tools: A Survey”. In: Digital Scholarship in the Humanities 39, 500–521.  http://doi.org/10.1093/llc/fqae001.

Evert, Stefan (2009). “Corpora and Collocations”. In: Corpus Linguistics: An International Handbook. Vol. 2. Ed. by Anke Lüdeling and Merja Kytö. Walter de Gruyter, 1212–1248.  http://doi.org/10.1515/9783110213881.2.1212.

Haider, Thomas (2021). “Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features”. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Ed. by Paola Merlo, Jörg Tiedemann, and Reut Tsarfaty. Association for Computational Linguistics, 3715–3725.  http://doi.org/10.18653/v1/2021.eacl-main.325.

Hartman, Charles O. (2005). The Scandroid. Version 1.1. [User guide]. https://web.archive.org/web/20260126070721/https://academic.hartman.digital.conncoll.edu/Assets/Programs/Scandroid%20Manual%201-1.pdf (visited on 01/25/2026).

Horváth, Iván (1991). A vers: Három megközelítés. Gondolat Kiadó.

Horváth, Iván (2009). “A Rule of Metrical Uniformity in Old Hungarian Poetry”. In: Towards a Typology of Poetic Forms: From Language to Metrics and beyond. Ed. by Jean-Louis Aroui and Andy Arleo. John Benjamins, 371–384.  http://doi.org/10.1075/lfab.2.19hor.

Horváth, János (1969). Rendszeres magyar verstan. Akadémiai Kiadó.

Horváth, Péter (2020). “A vershangzás jellemzőinek automatikus feltárása József Attila verseiben”. In: Digitális Bölcsészet 3, M:3–M:27.  http://doi.org/10.31400/dh-hun.2020.3.422.

Horváth, Péter, Péter Kundráth, Balázs Indig, Zsófia Fellegi, Eszter Szlávich, Tímea Borbála Bajzát, Zsófia Sárközi-Lindner, Bence Vida, Aslihan Karabulut, Mária Timári, and Gábor Palkó (2022). “ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry”. In: Proceedings of the 13th Conference on Language Resources and Evaluation. Ed. by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, and Stelios Piperidis. European Language Resources Association, 3471–3478.

Ibrahim, Robert and Petr Plecháč (2011). “Toward Automatic Analysis of Czech Verse”. In: Formal Methods in Poetics: A Collection of Scholarly Works Dedicated to the Memory of Professor M.A. Krasnoperova. Ed. by Barry P. Scherr, James Bailey, and Evgeny V. Kazartsev. RAM, 295–305.

Indig, Balázs, Bálint Sass, Eszter Simon, Iván Mittelholcz, Noémi Vadász, and Márton Makrai (2019). “One Format to Rule Them All – The emtsv Pipeline for Hungarian”. In: Proceedings of the 13th Linguistic Annotation Workshop. Ed. by Annemarie Friedrich, Deniz Zeyrek, and Jet Hoek. Association for Computational Linguistics, 155–165.  http://doi.org/10.18653/v1/W19-4018.

Jékel, Pál and Ferenc Papp (1974). Ady Endre összes költői műveinek fonémastatisztikája. Akadémiai Kiadó.

Jékel, Pál and Lajos Szuromi (1980). Petőfi metrumai. Kossuth Lajos Tudományegyetem.

Klesnilová, Kristýna, Karel Klouda, Magda Friedjungová, and Petr Plecháč (2024). “Automatic Poetic Metre Detection for Czech Verse”. In: Studia Metrica et Poetica 11.1, 44–61.  http://doi.org/10.12697/smp.2024.11.1.02.

Labádi, Gergely (2018). “Az olvasó gép: Berzsenyi Dániel versei távolról”. In: Digitális Bölcsészet 1, 17–34.  http://doi.org/10.31400/dh-hun.2018.1.126.

Lesi, Zoltán (2006). “Automatikus verselemzés tanuló algoritmusok alkalmazásával”. In: IV. Magyar Számítógépes Nyelvészeti Konferencia. Ed. by Zoltán Alexin and Dóra Csendes. Szegedi Tudományegyetem Informatikai Tanszékcsoport, 402–407.

Lesi, Zoltán (2008). “Automatikus formai verselemzés”. In: Alkalmazott Nyelvtudomány 8 (1-2), 197–208.

Lotz, John (1972). “Uralic”. In: Versification: Major Language Types. Sixteen Essays. Ed. by W. K. Wimsatt. New York University Press, 100–121.

Maróthy, Szilvia, Levente Seláf, and Petr Plecháč (2021). “Rhyme in 16th-Century Hungarian Historical Songs: A Pilot Study”. In: Tackling the Toolkit: Plotting Poetry through Computational Literary Studies. Ed. by Petr Plecháč, Robert Kolár, Anne-Sophie Bories, and Jakub Říha. Institute of Czech Literature of the Czech Academy of Sciences, 43–58.  http://doi.org/10.51305/ICL.CZ.9788076580336.04.

Mártonfi, Attila (2008). “Egy magyar rímszótár terve”. In: “Mielz valt mesure que ne fait estultie”: A hatvanéves Horváth Iván tiszteletére. Ed. by István Bartók, Béla Hegedüs, Levente Seláf, Mihály Szegedy-Maszák, Márton Szentpéteri, and András Veres. Krónika Nova, 198–204.

Navarro-Colorado, Borja (2015). “A Computational Linguistic Approach to Spanish Golden Age Sonnets: Metrical and Semantic Aspects”. In: Proceedings of the Fourth Workshop on Computational Linguistics for Literature. Ed. by Anna Feldman, Anna Kazantseva, Stan Szpakowicz, and Corina Koolen. Association for Computational Linguistics, 105–113.  http://doi.org/10.3115/v1/W15-0712.

Navarro-Colorado, Borja (2018). “A Metrical Scansion System for Fixed-metre Spanish Poetry”. In: Digital Scholarship in the Humanities 33, 112–127.  http://doi.org/10.1093/llc/fqx009.

Navarro-Colorado, Borja, María Ribes Lafoz, and Noelia Sánchez (2016). “Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation”. In: Proceedings of the Tenth Edition of the Language Resources and Evaluation Conference. Ed. by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis. European Language Resources Association, 4360–4364.

Plamondon, Marc R. (2006). “Virtual Verse Analysis: Analysing Patterns in Poetry”. In: Literary and Linguistic Computing 21 (1), 127–141.  http://doi.org/10.1093/llc/fql011.

Plecháč, Petr and Robert Kolár (2015). “The Corpus of Czech Verse”. In: Studia Metrica et Poetica 2 (1), 107–118.  http://doi.org/10.12697/smp.2015.2.1.05.

Seláf, Levente and Petr Plecháč (2023). “Számoljuk meg a valákat! A históriás énekek rímelése”. In: A históriás ének: poétikai és filológiai kérdések. Ed. by Levente Seláf. Gépeskönyv. https://f-book.com/book/2023/A-historias-enek-kerdesek/index.php?chapter=7 (visited on 01/26/2026).

Szepes, Erika and István Szerdahelyi (1981). Verstan. Gondolat Könyvkiadó.

Tanasescu, Chris, Bryan Paget, and Diana Inkpen (2016). “Automatic Classification of Poetry by Meter and Rhyme”. In: Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference. Ed. by Zdravko Markov and Ingrid Russell. Artificial Intelligence Research Society.

Váradi, Tamás, Eszter Simon, Balázs Sass, Iván Mittelholcz, Attila Novák, Balázs Indig, Richárd Farkas, and Veronika Vincze (2018). “E-magyar – A Digital Language Processing System”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Ed. by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga. European Language Resources Association, 1307–1312.

Vargyas, Lajos (1966). Magyar vers – magyar nyelv: Verstani tanulmány. Szépirodalmi Könyvkiadó.

Voigt, Vilmos (1972). “Számítógépes ritmuselemzési kísérlet”. In: Irodalomtörténeti Közlemények 76 (2), 203–211.