Article

Rhymefindr. An Historical Poetics Method for Identifying Rhymes in Nineteenth-Century English Poetry


Abstract

This paper describes a new approach to rhyme identification that is grounded in the critical tradition of historical poetics. Rhymefindr comprises a set of R scripts designed to identify rhymes in nineteenth-century English poetry by operationalizing the rules presented in an 1824 edition of John Walker’s A Rhyming Dictionary, one of the leading references on rhyme throughout the nineteenth century. By using an historical dictionary as a data source, Rhymefindr is sensitive to changes in pronunciation as well as changing theories about rhyme. As a corpus-independent method it can be used to identify rhymes in corpora of any size.

Keywords: computational poetics, historical poetics, rhyme, Rstats

How to Cite:

Houston, N. M., (2026) “Rhymefindr. An Historical Poetics Method for Identifying Rhymes in Nineteenth-Century English Poetry”, Journal of Computational Literary Studies 5(1). doi: https://doi.org/10.48694/jcls.4229

Published on
2026-02-16

Peer Reviewed

1. Introduction

Although poetic language is made up of words and sentences, and many text analysis methods can therefore be fruitfully applied to poetry, poetry also displays a number of distinctive formal features, including lineation, stanza patterns, meter, and rhyme, which can enrich text analysis and be the object of study themselves. Rhyme is of particular interest because it not only connects individual words through their shared sounds, but also connects poetic lines within stanzas. The patterns created by rhyme are thus integral to both the structure and the sound of poetry.

The predominant placement of rhyme words in modern English poetry is at the end of poetic lines. At the simplest level, rhyme can be defined as the relationship between “two syllables at line end… that have identical stressed vowels and subsequent phonemes but differ in initial consonant(s) if any are present – syllables that, in short, begin differently and end alike” (Greene et al. 2012, 1184). The availability of rhymes is determined in part by linguistic structures: Highly inflected languages, for example, produce many more possible rhymes than are available in English.

But the very same reference work also notes that “the definition of what counts as rhyme is conventional and cultural: it expands and contracts from one national poetry, age, verse tradition, and genre to another” (Greene et al. 2012, 1185). Both rhyme practice and rhyme theory change throughout history: Although perfect rhymes (cat, hat) have always been used, at different points in English poetic history, poets and critics have variously accepted or rejected other forms of rhyme. Some of these alternate forms include near rhymes, where the vowel sounds are close in sound, but not identical (soul, all); eye rhymes, where words are orthographically similar but pronounced differently (good, food); and the repetition of a given word. Additionally, historical changes in pronunciation mean that some rhyme words used in the sixteenth century, for example, are no longer pronounced similarly (love, prove).

Computational methods for the analysis of literary texts have flourished in recent decades, spurred by the increasing availability of digitized text corpora. The ability to analyze features of poetic language across large corpora supports research in distant reading. As Franco Moretti suggests, shifting the focus of analysis to “units that are much smaller or much larger than the text” brings forth new kinds of knowledge about the literary “system in its entirety” (Moretti 2000, 57). Rhyme is a fundamental component of English poetry and understanding the connections it draws among words and ideas can contribute to research in many areas of poetics.

Previous work on the identification of rhyme words within English poetry includes phonetic dictionary-based approaches, sometimes paired with text-to-speech generation, to identify words with matching final syllables (Heuser et al. 2018; McCurdy et al. 2015); an unsupervised expectation maximization algorithm to generate rhyme schemes (Reddy and Knight 2011); and a collocation-based method for identifying rhyme pairs in large corpora based on the frequency of their co-occurrence within individual poems (Plecháč 2018). However, these approaches do not directly address the variations in how rhyme has been defined and used throughout literary history, and particularly in the nineteenth century.

This paper describes a new approach to rhyme identification that is grounded in the critical tradition of historical poetics, which contextualizes the study of literary form in the theories and assumptions that poets and readers of past historical periods would have encountered and absorbed (Jarvis 2014, 98). By combining an historical poetics concept with computational criticism, this project makes it possible to model historical works of poetic theory and test them against collections of texts beyond the specific examples cited in those theories. This expands the work of historical poetics beyond the conception of its founders, a collaborative group of literary scholars focused on theoretical, not applied scholarship. Yopie Prins, for instance, says that “practical application is not the point of historical poetics” (Prins 2008, 233).

In contrast, this paper suggests that computational analysis provides a method for understanding nineteenth-century theories of rhyme through examining their relationship to actual historical poetic practice. By historicizing those rules as part of the analytic process, this project seeks to reconcile the multiple subjectivities of humanist knowledge with methods of quantitative analysis, responding to Johanna Drucker’s call for a “radical critique to return the humanistic tenets of constructed-ness and interpretation to the fore” of digital humanities scholarship (Drucker 2011, 1). This paper describes translating a specific historical theory about rhyme – one critic’s set of rules for understanding and evaluating rhyme – into code that can be processed by a machine. Although this iteration of the code only utilizes one historical dictionary, additional rhyme dictionaries will be added in the future for further comparison and analysis.

The structure of this paper is as follows: Section 2 discusses the historical context of English rhyme and rhyme dictionaries in the nineteenth century; section 3 discusses previous approaches to rhyme identification. Section 4 presents Rhymefindr, a set of R scripts designed to identify rhymes in nineteenth-century English poetry by operationalizing the rules presented in an 1824 edition of John Walker’s A Rhyming Dictionary, one of the leading references on rhyme throughout the nineteenth century (Walker [1775] 1824). By using Walker’s dictionary as the basis for rhyme matching, this method is grounded in the theories of rhyme that were contemporary with the nineteenth-century poetry being analyzed. This method provides the opportunity to compare historical rhyme theory with historical rhyme practice by assessing how Walker’s rules for rhyme compare to actual poetic usage. Section 5 presents an evaluation of this approach using gold standard data from the Chicago Rhyming Poetry Corpus (Reddy and Sonderegger 2011). Section 6 discusses the findings and section 7 notes future enhancements planned for this project.

2. Nineteenth-Century Rhyme and Rhyme Dictionaries

Readers today often come to the study of rhyme with assumptions drawn from the aesthetic values of the twenty-first century. In an era that elevates free verse, structured verse forms are often seen as old-fashioned and contemporary critics and poets often assume rhyme constrains poetic expression (Cohen-Vrignaud 2015, 995). But in the nineteenth century, as Peter McDonald suggests, “the legitimacy of rhyme as a mode of writing was not in serious question… rhyme was a shared idiom, without which the lyric was all but unthinkable. To that extent, a rhymed poem did not really represent, in any useful sense, a decision to use rhyme” (McDonald 2012, 6–7). Almost all nineteenth-century English lyric poems are rhymed, and some dramatic and narrative poems use rhyme as well.1 Rhyme was so prevalent in nineteenth-century verse that it would likely “feel to poets and readers as though it were something like a feature of the language itself” (Jarvis 2011, 36). This larger context of rhyme pairs that would have been familiar to many readers shaped a poet’s choice of specific rhyming words, whether they were typical or unusual.

The central nineteenth-century critical debate about rhyme focused on whether imperfect rhymes were acceptable in poetry. Imperfect rhymes are today frequently termed near rhymes: words that are not pronounced exactly the same, as in a perfect rhyme, but are closely similar in sound. A second, sometimes overlapping, category of imperfect rhyme is the eye rhyme: words whose endings are spelled the same, but pronounced differently. The history of English poetry from every century includes examples of words that do not sound the same, but are nonetheless interpreted as rhyme pairs because of the structural context in which they are placed. For example, because Alexander Pope’s 1711 poem An Essay on Criticism is written in heroic couplets, the reader understands that “take” and “track” are presented as rhyme words in this passage about the necessity of poetic license:

If, where the rules not far enough extend,

(Since rules were made but to promote their end)

Some lucky LICENCE answers to the full

Th’ intent propos’d, that licence is a rule.

Thus Pegasus, a nearer way to take,

May boldly deviate from the common track.

(Pope 1831, 8)

As Pope suggests here, poets do not always follow the rules. An historical poetics approach to rhyme seeks to understand this variability both in how rhyme was used and how it was theorized.

Rhyme dictionaries, which became quite prevalent from the eighteenth century onward, offer a valuable resource for understanding the changing idiom of nineteenth-century rhyme and the history of rhyme theories. These dictionaries reflected poetic practice, often quoting examples of specific rhymes in the works of major poets, and they also prescribed particular rules and values around rhyme. The two most popular rhyme dictionaries for the eighteenth and nineteenth centuries, Edward Bysshe’s 1714 The British Parnassus and John Walker’s 1775 A Rhyming Dictionary, both draw on examples from canonical English poets to justify their inclusion of imperfect rhymes (Bysshe 1714; Walker [1775] 1824). Walker, for example, claims that: “The delicate ears of a Pope or an Addison, would scarcely have acquiesced in the usage of imperfect rhymes, and sanctified them so often by their practice, if such rhymes had been really a blemish” (Walker [1775] 1824, 635). But later in the century, when many competing rhyme dictionaries were published, Tom Hood would instruct the reader of his 1869 The Rules of Rhyme that “he must use such rhymes only as are perfect to the ear, when correctly pronounced” (Hood 1869, xii). Hood’s emphasis on correct pronunciation reflects the association of pronunciation with social class in England. Like any reference work, rhyme dictionaries are not neutral: They often reveal how class and education shaped the aesthetic values associated with rhyme. Rhyme was frequently a touchpoint for larger cultural concerns during a time period in which increasing quantities of poetry were being published, not only in book form, but also in periodicals and newspapers.

3. Related Work

Brown et al. (2024) conduct a mapping review of 89 studies on rhyme identification algorithms, demonstrating increasing interest in this area of research since the 1960s. While the identification and analysis of rhymes has remained a continuous thread of research, many recent studies have focused on rhyme generation and have shifted from poetry to rap lyrics as the sample texts (Malmi et al. 2016; Popescu-Belis et al. 2023). This section highlights key topics in rhyme identification and analysis relevant to the historical poetics approach outlined in Rhymefindr.

3.1 Characteristics of Rhyme

As discussed previously, poetic rhyme is understood as the relationship between two or more words that terminate in syllables with similar sounds. Rhyme relationships exist within many kinds of natural language use, but within poetry and song lyrics, we find “foregrounded phonetic repetition” due to the placement of rhymes at the end of lines and within patterned stanza structures (Rickert 1978, 35). Similarly, Condit-Schultz suggests that rhyme should be understood as “a perceptual phenomenon which is evoked by phonemic parallellism” (Condit-Schultz 2016, 132). Poetic rhyme occurs within particular structures and patterns that encourage listeners or readers to perceive certain words as rhyme words. Conversely, two words that share the same rhyme sound but are widely separated (e.g., by 50 lines within a long poem) may not be perceived by the reader as a rhyme because of the temporal distance in perception. Thus studies of rhyme as a poetic phenomenon within specific texts may operationally define a window within which two lines may be considered to rhyme (Plecháč 2018, 86); studies of rhyme as a larger linguistic phenomenon may be interested in all words with shared endings, regardless of placement within the text.

In texts where a given rhyme sound is shared by more than two words, it is customary to understand those relationships as forming a rhyme chain (Joyce 1979, 129; Condit-Schultz 2016, 132). Although poetic lines are sequentially presented in a poem, and the proximal paired word would presumably have the most impact, the rhyme relationships accumulate, such that in a poem containing lines ending in “day”, “stay”, and “away”, three rhyme pairs would be counted for the syllable “ay”. Thus rhyme relationships can be considered as a graph structure. Joyce (1979) models the rhyme relationships within one long Middle English poem as a directed graph to maintain the sequential component of these chains. Sonderegger (2011) constructs an undirected rhyme graph for a large corpus of modern poetry and finds that its connected components reflect pronunciation, suggesting that rhymes could be used as supporting information for studies of historical pronunciation changes. Baley (2023) applies graph theory to the problem of evaluating inter-annotator agreement on rhymes in Chinese poetry.
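The pair counting and graph view described above can be illustrated with a minimal Python sketch. It is not drawn from any of the cited studies: the function names and the input format (a mapping from a rhyme sound to its endwords) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def rhyme_pairs(endwords_by_sound):
    """Return every unordered pair of endwords that share a rhyme sound."""
    pairs = []
    for words in endwords_by_sound.values():
        pairs.extend(combinations(words, 2))
    return pairs


def connected_components(pairs):
    """Group words into the connected components of an undirected rhyme graph."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk from the unvisited word
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


chain = {"ay": ["day", "stay", "away"]}
pairs = rhyme_pairs(chain)
print(pairs)  # [('day', 'stay'), ('day', 'away'), ('stay', 'away')]
print(len(connected_components(pairs)))  # 1
```

A chain of n words yields n(n-1)/2 pairs, which is why the three-word chain produces three pairs while still forming a single connected component.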

3.2 Rhyme as a Stylistic Feature of Poetry

Many text analysis approaches treat rhyme as a stylistic feature of poetry. Kaplan and Blei (2007) include four different types of rhyme among the 89 features of poetic style they modeled to compare the work of American poets. Mayer et al. (2008) use rhyme along with text statistics to classify music lyrics by genre. Hirjee and Brown (2010) train a probabilistic model to identify rhymes as part of a stylistic study of rap lyrics. Kao and Jurafsky (2012) use a logistic regression model over 16 features of contemporary poetry, including rhyme, to distinguish between the work of amateur and professional poets. Pérez Pozo et al. (2022) compare a rule-based system, decision trees, and neural network approaches to classifying 46 defined stanza types in Spanish poetry based on verse length, rhyme structure, and rhyme pattern.

3.3 Pronunciation

Because rhyme relationships are constituted by similar word sounds, rhyme has been used as the basis of studies of historical pronunciation (List et al. 2017; Sonderegger 2011) and references on pronunciation are used as support for rhyme identification (Plamondon 2006).

Many researchers, like Kaplan and Blei (2007), Kao and Jurafsky (2012), and McCurdy et al. (2015), rely on the open-source machine-readable Carnegie Mellon University Pronouncing Dictionary, which provides phonetic transcriptions for 134,000 words in North American English (The CMU Pronouncing Dictionary 2026). This dictionary is widely available but was not designed for literary analysis. Its vocabulary is also skewed towards contemporary English. McCurdy et al. note the limitations of the CMU dictionary’s vocabulary and extend it by use of letter-to-sound rules and syllable segmentation algorithms (McCurdy et al. 2015, 17). Popescu-Belis et al. (2023) use the CMU dictionary to construct synthetic rhyme data to fine-tune a GPT-2 model to generate rhymed verse. Other researchers have incorporated text-to-speech technologies into rhyme identification workflows (Heuser et al. 2018; Plecháč 2018).

3.4 Rhyme Identification

Because rhyme describes a relationship, the task of rhyme identification has been defined either as the discovery of stanzaic rhyme schemes (e.g., ABAB, ABBA) or as the discovery of rhyme pairs.

Noting the limitations of using phonetic transcription for historical texts, Reddy and Knight (2011) proposed identifying rhyme schemes through an unsupervised expectation maximization algorithm trained on a corpus of 93,014 lines of English poetry from 1450-1950 and 26,543 lines of French poetry from 1450-1650 with rhyme annotations. This approach starts with a predefined set of 462 possible stanza rhyme schemes drawn from the training corpus. The algorithm builds on the intuition that rhyming words within a given stanza are also likely to co-occur within a large corpus. Adding a measure to account for orthographic similarity improved the performance of their model, as did using a hidden Markov model to condition each stanza on the previous one in the poem. Other related approaches to rhyme scheme identification include Addanki and Wu (2013), who use a hidden Markov model with nine rhyme patterns for an unsupervised approach to detecting rhyme schemes in rap lyrics.

Building on the work of Reddy and Knight, but noting the limitations of their stanza-based approach, Plecháč (2018) focuses on the discovery of rhyme pairs in large poetic corpora. The model is first trained with the collocation of rhyme word pairs throughout the corpus. Then text-to-speech corpus transcription is used to obtain the phonetic elements of the rhyme words and learn the “rhyme probabilities between particular vowels (syllable peaks) and consonant clusters,” with an added probability for orthographic similarity (Plecháč 2018, 84). Plecháč shows that this collocation approach generally outperforms Reddy and Knight’s maximization approach on their corpus of English and French poetry and on a corpus of 2.5 million lines of Czech poetry (Plecháč and Kolár 2015). A recent supervised approach to the identification of rhyme pairs uses Siamese Recurrent Networks to identify rhyme pairs in German, English, and French poetry (Haider and Kuhn 2018).

One challenge in identifying rhymes in historical texts is that the definition and use of rhyme have changed over time. An historical poetics approach to rhyme does not assume that rhyme relationships are static. Using specific historical guides to rhyme as the basis for rhyme identification allows for the discovery of rhymes that may not be identified by phonetic matching with contemporary dictionaries, particularly given national differences in pronunciation and historical changes in pronunciation. Rhymefindr has been designed to support stylistic analysis by identifying features related to rhyme words and rhyme syllables. As a rule-based approach, Rhymefindr does not require a large training corpus, as do the expectation maximization and collocation approaches.

4. Rhymefindr

The Rhymefindr approach to rhyme identification presented here is grounded in rules of rhyme that were relevant for poets and readers in the nineteenth century. Specifically, this approach utilizes John Walker’s A Rhyming Dictionary, which was highly influential throughout the nineteenth century, particularly in its documentation of imperfect rhymes that were acceptable in English verse. Walker’s dictionary also offers a window onto historical British pronunciation of English words that is valuable for analyzing rhyme.

Many nineteenth-century poets deliberately experimented with rhyme and other formal structures in their poetry. Rhymefindr does not utilize knowledge of a particular literary corpus or of specific stanza rhyme patterns, so it is not limited to finding rhyme only in works that conform to the literary tradition, or in works written by familiar canonical poets. Although Walker’s dictionary includes quotations from poetry to support his views on near rhymes as compared with perfect ones, the rhyme data contained in the dictionary’s entries are completely distinct from any poetic tradition. In arguing for distant reading as an alternative to close reading, Franco Moretti argued that traditional literary scholarship “necessarily depends on an extremely small canon. . . . you invest so much in individual texts only if you think that very few of them really matter” (Moretti 2000, 58). As a corpus-independent method, Rhymefindr supports research in non-canonical poetics and can be used to identify rhymes in corpora of any size, thereby contributing to a wide range of research situations.

Rhymefindr currently comprises a key-value table created from an historical rhyme dictionary; an endword extraction script; and a rhyme identification script. The find_rhymes script performs a series of attempts to match the rhyming words within a poem based on the different kinds of rhyme expressed in that historical dictionary. Although the current iteration of the project utilizes only one dictionary, future versions will incorporate additional rhyme dictionaries to enable comparative analysis of rhyme theories as well as rhyme practice in the nineteenth century.

4.1 Dictionary Data

John Walker’s A Rhyming Dictionary; Answering, at the Same Time, the Purposes of Spelling and Pronouncing the English Language, on a Plan not Hitherto Attempted was selected as the data source for the dictionary component of this project because it was one of the most popular rhyme dictionaries throughout the nineteenth century. (Byron and Tennyson both owned copies, as did many other poets.) It was first published in 1775 and reprinted and expanded in both British and American editions throughout the nineteenth century. A Google-digitized file created from a Harvard University copy of the 1824 edition published in London by W. Baynes and Son was used to prepare the data for this project.

Walker’s dictionary is structured in two parts, both of which focus on the endings of English words. Walker argued that his work was more than a “mere rhyming dictionary” or “resource for poetasters”; rather, his “dictionary of terminations subservient to the art of spelling and pronouncing” would provide a new perspective on the structures of the English language: “In this arrangement of the language, we easily discover its idiomatic structure, and find its several parts fall into their proper classes, and almost every word as much distinguished by its termination as by its sense” (Walker [1775] 1824, v–vi). The first part of the volume, titled a “Syllabic Dictionary,” lists English words with brief definitions, as one might find in other dictionaries. However, Walker lists these words according to reverse-spelling order (“s” in these entries indicates nouns, or substantives):

Elf A fairy; a devil, s.

Delf A mine; quarry, earthen ware, s.

Shelf A board to lay things on; a sand bank in the sea; hard coat of earth under the mould, s.

(Walker [1775] 1824, 186)

Later editors changed the title of the dictionary to make this innovation clear: The rhyming dictionary of the English language: in which the whole language is arranged according to its terminations (Walker [1775] 1894). Walker argued that presenting its contents in reverse-spelling order would help teach the rules for English spelling, which he calls “an insuperable difficulty for foreigners” and an “eternal source of dispute and perplexity for ourselves” (Walker [1775] 1824, vi). This reverse-spelling presentation makes groups of rhyming words readily visible on the page.
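Reverse-spelling order can be reproduced by sorting words on their reversed spelling. The short Python sketch below is an illustration only (the helper name is hypothetical), applied to the three headwords quoted above.

```python
def reverse_spelling_key(word):
    """Sort key: the word's letters read from last to first, ignoring case."""
    return word.lower()[::-1]


headwords = ["Shelf", "Elf", "Delf"]
ordered = sorted(headwords, key=reverse_spelling_key)
print(ordered)  # ['Elf', 'Delf', 'Shelf']
```

Because the comparison starts from the final letter, words that share a termination end up adjacent, which is what makes groups of rhyming words visible on the page.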

But Walker also recognized that readers accustomed to other rhyme dictionaries would want an easier way of finding rhymes. So the second part of the volume consists of an “Index of Perfect and Allowable Rhymes” containing entries for the final syllables of English words, arranged alphabetically by their first letters (elf, elk, elm, elp) as the editors of previous rhyme dictionaries had done (Bysshe 1714; Poole 1657). What distinguished Walker’s index from those earlier dictionaries was his decision to document and include imperfect rhymes, which he renamed “allowable” rhymes, documented with “authorities for their usage from our best poets” (Walker [1775] 1824, 635). By renaming what earlier critics had called “imperfect” rhymes as “allowable,” Walker emphasizes the capacious quality of his approach to rhyme. Walker’s generous definition of allowable rhyme became the standard theory of rhyme for many nineteenth-century readers and poets, even after the resurgence of stricter definitions of perfect rhyme in competing rhyme dictionaries published in the 1860s. Walker’s “Index of Perfect and Allowable Rhymes” serves as the basis for the dictionary portion of this project.

Entries in the “Index of Perfect and Allowable Rhymes” begin with a rhyming syllable, followed by a list of words that include the key syllable, or that rhyme perfectly with it. Some of these lists are ostensibly comprehensive, but others end with an “etc” suggesting that the reader would be able to come up with additional rhyming words. After the perfect rhymes, Walker occasionally notes what he terms “nearly perfect” rhymes, and then lists the allowable rhymes:

EM

Gem, hem, stem, them, diadem, stratagem, &c. Perfect rhymes, condemn, contemn, &c. Allowable rhymes, lame, tame, &c. team, seam, theme, phlegm, &c.

(Walker [1775] 1824, 655)

Where the allowable rhymes are especially controversial, Walker provides references to specific rules in his Preface and quotations from the works of English poets who use the rhyme. Within the entries there are also a number of cross-references: Entries for some syllables consist entirely of a cross reference to a homophone, and cross references are also included within the lists of perfect or allowable rhyme syllables.

4.2 Creation of a Key-Value Dictionary

A key-value dictionary was created to represent Walker’s index of rhymes, with each rhyme syllable that heads an entry in the dictionary defined as a key and matched with the values listed in Walker for perfect rhyme syllables, perfect rhyme words, allowable rhyme syllables, and allowable rhyme words. The small number of words Walker labels “nearly perfect” were included with the perfect rhymes.
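As an illustration of this structure, the Python sketch below encodes the EM entry quoted in section 4.1 as one record with four value lists. The field names, the helper function, and the assignment of “emn”, “ame”, and “eam” as rhyme syllables are assumptions made for this example; the project's own table may be organized differently.

```python
# Hypothetical encoding of Walker's EM entry; field names are illustrative.
walker_index = {
    "em": {
        # Assumed reading of "condemn, contemn, &c." as a syllable class.
        "perfect_syllables": ["emn"],
        "perfect_words": ["gem", "hem", "stem", "them", "diadem",
                          "stratagem", "condemn", "contemn"],
        # Assumed reading of "lame, tame, &c. team, seam" as syllable classes.
        "allowable_syllables": ["ame", "eam"],
        "allowable_words": ["lame", "tame", "team", "seam", "theme", "phlegm"],
    }
}


def rhyme_type(key, word, index):
    """Return 'perfect', 'allowable', or None for a word under a given key."""
    entry = index.get(key)
    if entry is None:
        return None
    word = word.lower()
    for level in ("perfect", "allowable"):  # perfect is checked first
        in_words = word in entry[level + "_words"]
        in_syllables = any(word.endswith(s) for s in entry[level + "_syllables"])
        if in_words or in_syllables:
            return level
    return None


print(rhyme_type("em", "stratagem", walker_index))  # perfect
print(rhyme_type("em", "frame", walker_index))      # allowable
print(rhyme_type("em", "cat", walker_index))        # None
```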

Although the intention behind this project is to create an historically sensitive rule base for rhyme from Walker’s rhyme dictionary, that historical document contained some inconsistencies in its presentation of data, so in some instances strict fidelity to Walker’s text had to be modified in order to make the key-value dictionary fully operational. For example, many cross-referenced rhyme syllables are listed under both headings in Walker, but in some cases only one is cross-referenced: The entry for EIGHT says “see ATE” but the entry for ATE does not point to EIGHT. To standardize the data for this project, all cross-referenced rhyme syllables were duplicated for both key entries. The other modification to the historical data obtained from Walker’s dictionary was to add modern spellings for one-syllable past participles (adding missed where Walker lists miss’d) to make the key-value dictionary applicable to a wider range of nineteenth-century texts.
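Both modifications can be sketched in a few lines of Python. The function names and data layout are hypothetical and are not taken from the project repository; only the EIGHT/ATE and miss’d/missed examples come from the text above.

```python
import re


def symmetrize(cross_refs):
    """Make every 'see X' cross-reference point in both directions."""
    symmetric = {key: set(targets) for key, targets in cross_refs.items()}
    for key, targets in cross_refs.items():
        for target in targets:
            symmetric.setdefault(target, set()).add(key)
    return symmetric


def add_modern_participles(words):
    """Keep elided forms such as miss’d and append the modern spelling."""
    expanded = list(words)
    for word in words:
        # Accept either a straight or a typographic apostrophe before the d.
        modern = re.sub(r"['’]d$", "ed", word)
        if modern != word and modern not in expanded:
            expanded.append(modern)
    return expanded


refs = symmetrize({"eight": {"ate"}, "ate": set()})
print(refs["ate"])  # {'eight'}

print(add_modern_participles(["miss’d", "kiss’d", "list"]))
# ['miss’d', 'kiss’d', 'list', 'missed', 'kissed']
```

Keeping the elided form alongside the modern one means that texts using either spelling can be matched against the same entry.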

4.3 Endword Extraction Script

The get_endwords R script is included in the project repository to facilitate the extraction of endwords from a directory containing plain text files of poems. Because this script is designed for the analysis of rhyme, hyphens are removed and hyphenated words are put together. Thus the common nineteenth-century spelling “to-day” becomes “today” rather than “to day.” Although this decision produces some odd-looking word forms, like “garretroom,” overall it produces more accurate results in the rhyme analysis stage.

In addition to the vectors of endwords for each poem that are required for the rhyme discovery script, the get_endwords script also outputs several poetic features useful for exploratory text analysis, including the number of stanzas and lines in the poem.
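A minimal Python sketch of this extraction step follows. It is an illustration rather than the get_endwords script itself: it assumes that stanzas are separated by blank lines, and the sample lines are invented for the example.

```python
import re


def get_endwords(poem_text):
    """Return the endwords of a poem with simple stanza and line counts."""
    # Assumption: a blank line marks a stanza break.
    stanzas = [s for s in re.split(r"\n\s*\n", poem_text.strip()) if s.strip()]
    endwords = []
    for stanza in stanzas:
        for line in stanza.splitlines():
            # Join hyphenated words: "to-day" becomes "today".
            line = re.sub(r"(?<=\w)-(?=\w)", "", line)
            tokens = re.findall(r"[A-Za-z'’]+", line)
            if tokens:
                endwords.append(tokens[-1].lower().strip("'’"))
    return {"endwords": endwords, "stanzas": len(stanzas), "lines": len(endwords)}


sample = "We cannot leave to-day,\nSo here we stay.\n\nAbove us lies the garret-room."
result = get_endwords(sample)
print(result["endwords"])  # ['today', 'stay', 'garretroom']
print(result["stanzas"], result["lines"])  # 2 3
```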

4.4 Rhyme Identification Script

The find_rhymes R script is designed to work with an input CSV file containing a text id and a character vector of endwords for each poem. The final syllable of each endword is extracted with regular expressions based on the orthographic principles of English and is used as the basis for a series of lookups in the key-value table created from Walker’s dictionary. For each endword, the script looks first to match it with a perfect rhyme syllable or rhyme word in Walker; if one is not found, it checks the allowable rhyme syllables and words listed in Walker. As rhyme matches are found, a vector indicating the rhyme sequence is constructed. Capital letters are conventionally used for this purpose in the study of poetics, and are applied to all of the endwords in the poem, including any non-rhyming lines. A final lookup checks for orthographic matches among the rhyme syllables in the poem that have not been matched to rhymes in Walker’s dictionary; however, these matches are currently limited to identical matches, or perfect rhymes.
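The lookup order described above (perfect rhymes first, then allowable rhymes, then an orthographic fallback) can be sketched in Python as follows. This is a simplified illustration, not the find_rhymes script: the one-entry table flattens syllables and words into a single set per level, and the regular expression is a crude stand-in for the project's orthographic rules.

```python
import re
from string import ascii_uppercase

# Hypothetical one-entry excerpt, loosely based on Walker's EM entry.
TABLE = {
    "em": {
        "perfect": {"em", "condemn", "contemn"},
        "allowable": {"ame", "eam", "theme", "phlegm"},
    }
}


def final_syllable(word):
    """Crude guess: last vowel group plus consonants (and a silent e)."""
    match = re.search(r"[aeiouy]+[^aeiouy]+e$|[aeiouy]+[^aeiouy]*$", word)
    return match.group(0) if match else word


def classify(word):
    """Return (rhyme group, match level) for one endword."""
    syllable = final_syllable(word)
    for level in ("perfect", "allowable"):  # perfect lookups come first
        for key, entry in TABLE.items():
            if word in entry[level] or syllable in entry[level]:
                return ("walker", key), level
    return ("spelling", syllable), "orthographic"  # identical endings only


def label_for(position):
    """A, B, ... Z, then A1, B1, ... so that long poems never reuse a label."""
    suffix = str(position // 26) if position >= 26 else ""
    return ascii_uppercase[position % 26] + suffix


def find_rhymes(endwords):
    """Assign a scheme label and a match level to every endword."""
    groups, scheme, levels = {}, [], []
    for word in endwords:
        group, level = classify(word.lower())
        if group not in groups:
            groups[group] = label_for(len(groups))
        scheme.append(groups[group])
        levels.append(level)
    return scheme, levels


scheme, levels = find_rhymes(["gem", "day", "tame", "away"])
print(scheme)  # ['A', 'B', 'A', 'B']
print(levels)  # ['perfect', 'orthographic', 'allowable', 'orthographic']
```

In this sketch every endword receives a label, so a line that rhymes with nothing simply carries a label that appears once in the scheme.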

It should be noted that all of the entries in Walker’s rhyme dictionary are for single rhyme syllables. The majority of rhymes used in nineteenth-century English poetry (and indeed, English language poetry from any period) are monosyllabic rhymes, in large part because of the predominance of iambic meter in both natural English speech and especially in English poetry. An iambic metrical foot consists of an unstressed syllable followed by a stressed syllable; thus most lines of iambic poetry end with a stressed syllable, which is the focus of the rhyme. Although the find_rhymes script thus only identifies single syllable rhymes, many bisyllabic rhyme pairs can also be identified through this approach.

After all the rhymes have been identified, the ratio of unique rhymes to the total number of rhymes in the poem is calculated to assess the likelihood of whether the poem is rhymed or not, using the first 75 lines of longer poems and the entire text for poems with fewer than 75 lines. For nineteenth-century English poetry, an operationally successful range of ratios was defined. Ratios smaller than .70 indicate rhymed poems; ratios between .70 and .86 indicate possibly rhymed poems; and ratios greater than .86 indicate unrhymed poems. These ranges account for the likelihood that even ostensibly unrhymed poems, like long poems written in blank verse, will contain some rhymes across many hundreds of lines.
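The thresholds above can be expressed as a short Python function. The sketch reads “unique rhymes” as the number of distinct rhyme labels and “total number of rhymes” as the number of labelled lines in the window; that reading, the inclusive treatment of the two boundary values, and the function name are assumptions.

```python
def rhymed_likelihood(scheme, max_lines=75):
    """Classify a poem from the ratio of distinct rhyme labels to lines."""
    window = scheme[:max_lines]  # only the first 75 lines of longer poems
    if not window:
        raise ValueError("scheme must contain at least one line")
    ratio = len(set(window)) / len(window)
    if ratio < 0.70:
        label = "rhymed"
    elif ratio <= 0.86:
        label = "possibly rhymed"
    else:
        label = "unrhymed"
    return ratio, label


print(rhymed_likelihood(list("ABABCDCD")))  # (0.5, 'rhymed')
print(rhymed_likelihood(list("ABCDEFAB")))  # (0.75, 'possibly rhymed')
print(rhymed_likelihood(list("ABCDEFGH")))  # (1.0, 'unrhymed')
```

Under this reading a poem in couplets or alternating rhyme scores 0.5, while a poem in which no line finds a partner scores 1.0.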

For each poem, the script outputs the rhyme scheme, a categorical indicator of the likelihood of the poem being rhymed, and a vector indicating which of the rhymes are perfect rhymes according to Walker’s dictionary.

5. Evaluation

Rhymefindr was tested using the gold standard annotated data for English poetry in the Chicago Rhyming Poetry Corpus (Reddy and Sonderegger 2011), the same English corpus used by Reddy and Knight (2011) and Plecháč (2018). Because rhyme constitutes a relationship between two or more words, different approaches to evaluating rhyme discovery have been applied in previous work and are used here for comparison.

5.1 Gold Standard Data

The English language component of the Chicago Rhyming Poetry Corpus contains annotated rhyme data for 11,613 stanzas containing 93,014 lines of poetry by 32 poets (Reddy and Sonderegger 2011). The gold standard data files are separated into five 100-year spans from 1450-1950. These files contain an entry for each stanza in the corpus poems that consists of its end words and a numeric sequence indicating its rhymes.

Because Rhymefindr is based on a rhyme dictionary popular in the nineteenth century, it is relevant to consider the representation of nineteenth-century poets in the gold standard data subgroups for 1750-1850 and 1850-1950. Although no information is provided in the corpus repository about how poets or poems were selected for the rhyme corpus, all of the poets included in the English selections in the 1850-1950 chronological period overlap with the list provided in Sonderegger (2011), which describes compiling a rhyme corpus of “poetry written by English authors around 1900” (Sonderegger 2011, 657). As seen in Table 1, which arranges the list of poets by date of birth, the Chicago Rhyming Poetry Corpus includes poets who were mostly active during the Romantic and Edwardian eras, skipping over poets from the Victorian period (1837-1900).

Table 1: Chicago Rhyming Poetry Corpus. Poets included in the 1750-1850 and 1850-1950 sub-corpora in Reddy and Sonderegger (2011).

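| Sub-corpus | Poet | Lifespan |
| --- | --- | --- |
| 1750-1850 | Oliver Goldsmith | 1728-1774 |
| 1750-1850 | Charlotte Turner Smith | 1749-1806 |
| 1750-1850 | William Wordsworth | 1770-1850 |
| 1750-1850 | Samuel Taylor Coleridge | 1772-1834 |
| 1750-1850 | Lord Byron (George Gordon) | 1788-1824 |
| 1750-1850 | Percy Bysshe Shelley | 1792-1822 |
| 1850-1950 | A. E. Housman | 1859-1936 |
| 1850-1950 | Thomas Crosland | 1865-1924 |
| 1850-1950 | Rudyard Kipling | 1865-1936 |
| 1850-1950 | G. K. Chesterton | 1874-1936 |
| 1850-1950 | Edward Thomas | 1878-1917 |
| 1850-1950 | Rupert Brooke | 1887-1915 |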

In the process of working with the Chicago Rhyming Poetry Corpus, 102 entries in the published gold standard files were found to have incomplete data and were discarded from the evaluation; an obvious typographical error was corrected in one additional entry.2 This resulted in a total of 11,511 stanzas, distributed over the five chronological sub-groups as shown in Table 2.

Table 2: Gold standard data files. Number of stanzas and lines in the gold standard data files used in the evaluation.

| Sub-corpus | Stanzas | Lines |
| --- | --- | --- |
| 1415_pgold | 197 | 1,250 |
| 1516_pgold | 3,786 | 35,485 |
| 1617_pgold | 2,141 | 19,683 |
| 1718_pgold | 2,546 | 20,546 |
| 1819_pgold | 2,843 | 15,408 |
| totals | 11,513 | 92,372 |

5.2 Rhyme Scheme Evaluation Metrics

As described in section 3, the expectation maximization (EM) approach of Reddy and Knight (2011) identifies rhyme schemes in separate stanzas of poetic texts. They define accuracy at the scheme level: a discovered rhyme scheme either does or does not match the gold standard rhyme scheme exactly. Table 3 shows Rhymefindr’s accuracy in discovering rhyme schemes according to Reddy and Knight’s definition and compares it to the performance of two of their models: their EM approach for separate stanzas with an initialization for orthographic similarity, and their hidden Markov model (HMM) approach, which conditions for stanza dependencies (Reddy and Knight 2011, 81).
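Scheme-level accuracy as described here can be sketched in a few lines. The following Python sketch is illustrative only and is not the evaluation code from the rf_eval repository. It assumes that each stanza's scheme is a sequence of per-line labels (numeric in the gold standard, possibly letters in a tool's output) and that two schemes match when they group the lines identically regardless of which symbols are used. The helper names `canonical` and `scheme_accuracy` are hypothetical.

```python
def canonical(scheme):
    """Relabel a scheme by order of first appearance.

    "ABAB", "XYXY" and [1, 2, 1, 2] all become (0, 1, 0, 1), so schemes
    are compared by how they group lines, not by the symbols used.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(label, len(first_seen)) for label in scheme)


def scheme_accuracy(predicted, gold):
    """Percentage of stanzas whose predicted scheme matches the gold scheme exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same stanzas")
    if not gold:
        raise ValueError("at least one stanza is required")
    matches = sum(canonical(p) == canonical(g) for p, g in zip(predicted, gold))
    return 100.0 * matches / len(gold)


if __name__ == "__main__":
    predicted = ["ABAB", "AABB", "ABCB"]
    gold = [[1, 2, 1, 2], [1, 2, 1, 2], [1, 2, 3, 2]]
    print(round(scheme_accuracy(predicted, gold), 2))
```

Because a stanza scores only when every line is grouped correctly, a single misclassified rhyme word counts the whole stanza as wrong, which makes this a stricter measure than the precision and recall scores reported later in the section.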

Table 3: Rhyme scheme accuracy percentage for Rhymefindr compared with Reddy and Knight’s EM and HMM approaches (Reddy and Knight 2011, 81).

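| Period | RK EM with orthographic similarity | RK HMM | Rhymefindr |
| --- | --- | --- | --- |
| 1450-1550 | 69.04 | 74.31 | 61.93 |
| 1550-1650 | 71.98 | 79.17 | 53.20 |
| 1650-1750 | 89.54 | 91.23 | 51.24 |
| 1750-1850 | 33.62 | 49.11 | 57.66 |
| 1850-1950 | 54.05 | 58.95 | 70.56 |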

Rhymefindr performs better according to this measure of rhyme scheme accuracy than the EM or HMM approaches for the chronological periods 1750-1850 and 1850-1950, which are the time periods for which Walker’s dictionary (first published in 1775) would be expected to have the strongest relevance. Notably, Reddy and Knight’s EM and HMM approaches perform considerably worse on poetry after 1750 than on poetry from the earlier subgroups. This may be due to the greater variety of stanza structures in later poetry or to the makeup of the training set data.

Reddy and Knight (2011) also calculate precision and recall at the stanza level: precision as the number of rhyming words within each stanza that are correctly discovered by the algorithm divided by the number of rhyming words output for the stanza, and recall as the number of correctly discovered rhyming words within the stanza divided by the number of rhyming words in the gold standard for the stanza. Words without rhyme pairs in a stanza are ignored. They total the precision and recall scores for all stanzas before calculating the F score for each chronological sub-group. Table 4 compares Rhymefindr’s precision and recall for rhyme schemes calculated in this way with Reddy and Knight’s EM approach with orthographic similarity and their HMM approach (Reddy and Knight 2011, 81).
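The stanza-level calculation described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Reddy and Knight's code or the rf_eval scripts. It assumes that each line's "rhyming words" are the other lines in the stanza that share its label, that precision and recall are computed per stanza and then averaged over stanzas before the F score is taken, and that a stanza with an empty denominator is skipped for that measure. All function names are hypothetical.

```python
def partners(scheme):
    """For each line index, the set of other line indices sharing its label."""
    return [
        {j for j, other in enumerate(scheme) if j != i and other == label}
        for i, label in enumerate(scheme)
    ]


def stanza_precision_recall(predicted, gold):
    """Precision and recall for one stanza; None when the denominator is empty."""
    if len(predicted) != len(gold):
        raise ValueError("schemes must label the same number of lines")
    pred_sets, gold_sets = partners(predicted), partners(gold)
    correct = sum(len(p & g) for p, g in zip(pred_sets, gold_sets))
    n_output = sum(len(p) for p in pred_sets)  # lines without partners add 0
    n_gold = sum(len(g) for g in gold_sets)
    precision = correct / n_output if n_output else None
    recall = correct / n_gold if n_gold else None
    return precision, recall


def corpus_f_score(predicted, gold):
    """F score from per-stanza precision and recall averaged over stanzas."""
    precisions, recalls = [], []
    for p, g in zip(predicted, gold):
        prec, rec = stanza_precision_recall(p, g)
        if prec is not None:
            precisions.append(prec)
        if rec is not None:
            recalls.append(rec)
    if not precisions or not recalls:
        return 0.0
    mean_p = sum(precisions) / len(precisions)
    mean_r = sum(recalls) / len(recalls)
    if mean_p + mean_r == 0:
        return 0.0
    return 2 * mean_p * mean_r / (mean_p + mean_r)


if __name__ == "__main__":
    print(corpus_f_score(["ABAB", "AAAA"], ["ABAB", "AABB"]))
```

Unlike exact scheme matching, this measure gives partial credit: a stanza predicted as AAAA against a gold scheme of AABB still recovers every gold rhyme (recall of 1) while being penalized for the extra rhymes it asserts (precision of 1/3).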

Table 4: Rhyme scheme F scores for Rhymefindr compared with Reddy and Knight’s EM and HMM approaches (Reddy and Knight 2011, 81).

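| Period | RK EM with orthographic similarity | RK HMM | Rhymefindr |
| --- | --- | --- | --- |
| 1450-1550 | 0.82 | 0.86 | 0.88 |
| 1550-1650 | 0.88 | 0.90 | 0.87 |
| 1650-1750 | 0.96 | 0.97 | 0.84 |
| 1750-1850 | 0.70 | 0.82 | 0.88 |
| 1850-1950 | 0.84 | 0.90 | 0.87 |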

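On poetry after 1750, Rhymefindr’s F scores improve on Reddy and Knight’s EM approach in both periods (0.88 vs. 0.70 for 1750-1850 and 0.87 vs. 0.84 for 1850-1950). Compared with their HMM approach, Rhymefindr scores higher for 1750-1850 (0.88 vs. 0.82) and slightly lower for 1850-1950 (0.87 vs. 0.90).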

5.3 Rhyme Pair Evaluation Metrics

As discussed earlier, Plecháč (2018) defines the task as the discovery of rhyme pairs, rather than stanza rhyme schemes, and uses a collocation approach to train a model with the phonetic probabilities of rhyme. Plecháč does not provide an accuracy metric in the evaluation, focusing instead on precision and recall, calculated with the total numbers of rhyme pairs in the output and gold standard. Table 5 evaluates Rhymefindr’s performance using this approach to precision and recall and compares it to the results of Plecháč’s collocation approach (Plecháč 2018, 89).
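A pair-level evaluation of this kind can be sketched as below. The Python sketch is illustrative and is not Plecháč's implementation or the rf_eval code. It assumes that every two lines sharing a label within a stanza form one rhyme pair, and that precision and recall are computed once from pair totals pooled over all stanzas; other ways of deriving pairs from a scheme are possible. The names `rhyme_pairs` and `pair_scores` are hypothetical.

```python
from itertools import combinations


def rhyme_pairs(scheme):
    """All (i, j) line-index pairs with i < j whose labels are equal."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(scheme), 2)
        if a == b
    }


def pair_scores(predicted, gold):
    """Precision, recall and F score from pair counts pooled over all stanzas."""
    correct = n_output = n_gold = 0
    for pred_scheme, gold_scheme in zip(predicted, gold):
        pred_pairs = rhyme_pairs(pred_scheme)
        gold_pairs = rhyme_pairs(gold_scheme)
        correct += len(pred_pairs & gold_pairs)
        n_output += len(pred_pairs)
        n_gold += len(gold_pairs)
    precision = correct / n_output if n_output else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    print(pair_scores(["ABAB", "AAAA"], ["ABAB", "AABB"]))
```

Pooling the counts means that stanzas with many rhyme pairs weigh more heavily in the final score than short stanzas, which is one reason pair-level and stanza-level F scores for the same output can differ.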

Table 5: Rhyme pair F scores for Rhymefindr compared with Plecháč’s collocation approach (Plecháč 2018, 89).

| Period | Plecháč collocation | Rhymefindr |
| --- | --- | --- |
| 1450-1550 | 0.87 | 0.9 |
| 1550-1650 | 0.91 | 0.84 |
| 1650-1750 | 0.92 | 0.81 |
| 1750-1850 | 0.92 | 0.9 |
| 1850-1950 | 0.93 | 0.87 |

Rhymefindr’s performance according to this metric is notably better for poetry after 1750 (0.9 and 0.87) than for 1550-1750 (0.84 and 0.81). Although it does not match the performance of Plecháč’s collocation approach in the later periods (0.92 and 0.93), its F scores remain within 0.06 of that approach.

6. Discussion

As noted earlier, the definitions of acceptable poetic rhyme change over time and can be shaped by many factors, including changes in pronunciation and conventions of usage. Historical poetics emphasizes the importance of understanding that complexity. The question of whether a given pair of words rhyme may not always be possible to answer with a strict logical yes/no; sometimes the answer depends upon the historical period, expected national or regional pronunciation, and the literary context surrounding the words. Inspection of the rhyme vectors from the evaluation corpus with poor accuracy scores reveals three main causes for rhyme misclassification according to the gold standard data: plural nouns, historical pronunciation differences, and near rhymes.

Walker’s dictionary is inconsistent in its presentation of plural nouns, because he expected readers to be able to generalize from the singular noun to its plural. For example, the word “eyes” does not appear anywhere in Walker’s entries, but of course is very frequently used in nineteenth-century poetry. The Reddy and Sonderegger (2011) corpus includes rhymes between eyes/wise and eyes/dies, neither of which is marked as a rhyme by Rhymefindr.

Pronunciation differences between nineteenth-century British English and contemporary English are another source of mismatches between the Reddy and Sonderegger (2011) annotations and the rhymes identified by Rhymefindr. For example, their gold standard data defines “anew/you” as a rhyme pair, although according to Walker (and in most British pronunciation) the two words have completely different vowel sounds. Walker’s dictionary was selected because it provides a guide to historical British pronunciation, which was considered important for an historical poetics project focused on the nineteenth century.

Walker’s inclusion of allowable, or near, rhymes is another source of mismatches with the gold standard data. For example, Walker says that ale/ell syllables are allowable rhymes, so Rhymefindr tags vale/hell as a rhyme, where the gold standard data does not. Future iterations of the project will give the user the option of selecting both perfect and allowable rhymes, or only perfect rhymes, when making identifications, just as a reader of Walker’s dictionary could have chosen for their own purposes.

Unfortunately, Reddy and Sonderegger do not document their approach to creating the rhyme annotations in the Chicago Rhyming Poetry Corpus or explain how they handled different kinds of ambiguous or non-perfect rhymes. Understanding historical rhyme usage requires taking into account the various ways in which our contemporary sense of rhyme may not align with historical poetic practice. By keying rhyme identification to the constraints of particular historical dictionaries, Rhymefindr reminds users that identifying and describing rhyme is always an act of critical interpretation.

7. Future Work

With the framework of this historical dictionary-based method in place, other dictionaries will be added to expand the capacities of Rhymefindr as a rhyme identification tool and to enhance the utility of this project for comparative historical poetics. Several different rhyme dictionaries were published in the nineteenth century, including J. E. Carpenter’s A Handbook of Poetry (1868); Tom Hood’s The Rules of Rhyme (1869); Samuel W. Barnum’s A Vocabulary of English Rhymes, Arranged on a New Plan (1876); and Andrew Loring’s The Rhymer’s Lexicon (1905). Operationalizing multiple dictionaries would not only contribute to the computational analysis of rhyme, but would also enable new experiments that could test the application of different theories of rhyme over a large poetry corpus.

8. Data Availability

The Walker dictionary data needed to run Rhymefindr can be found here: https://github.com/nmhouston/Rhymefindr. They have been archived and are persistently available at: https://doi.org/10.5281/zenodo.18259162. Gold standard rhyme data from the Chicago Rhyming Poetry Corpus (Reddy and Sonderegger 2011) and outputs from the evaluation scripts can be found here: https://github.com/nmhouston/rf_eval. They have been archived and are persistently available at: https://doi.org/10.5281/zenodo.18259257.

9. Software Availability

The Rhymefindr scripts can be found here: https://github.com/nmhouston/Rhymefindr. They have been archived and are persistently available at: https://doi.org/10.5281/zenodo.18259162. The evaluation scripts can be found here: https://github.com/nmhouston/rf_eval. They have been archived and are persistently available at: https://doi.org/10.5281/zenodo.18259257.

10. Author Contributions

Natalie M. Houston: Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing

Notes

  1. For example, 95% of the 108,182 nineteenth-century poems in the Chadwyck-Healey English Poetry database are rhymed.
  2. Details are available at: https://github.com/nmhouston/rf_eval.

References

Addanki, Karteek and Dekai Wu (2013). “Unsupervised Rhyme Scheme Identification in Hip Hop Lyrics Using Hidden Markov Models”. In: International Conference on Statistical Language and Speech Processing, 39–50.  http://doi.org/10.1007/978-3-642-39593-2_3.

Baley, Julien (2023). “Evaluating Rhyme Annotations for Large Corpora: Metrics and Data”. In: Cahiers de Linguistique Asie Orientale 52 (2), 137–162.  http://doi.org/10.1163/19606028-bja10032.

Brown, Daniel G., Rebecca Hutchinson, and Carolyn E. Lamb (2024). A Systematic Mapping Review of Algorithms for the Detection of Rhymes, from Early Digital Humanities Projects to the Rise of Large Language Models. https://uwspace.uwaterloo.ca/bitstream/handle/10012/20723/rhymesysrev.pdf?sequence=1 (visited on 01/06/2026).

Bysshe, Edward (1714). The British Parnassus: or, a Compleat Common-Place-Book of English Poetry: … To which is Prefix’d, A Dictionary of Rhymes. J. Nutt.

Cohen-Vrignaud, Gerard (2015). “Rhyme’s Crimes”. In: ELH: English Literary History 82 (3), 987–1012. http://www.jstor.org/stable/24477831 (visited on 01/06/2026).

Condit-Schultz, Nathaniel (2016). “MCFlow: A Digital Corpus of Rap Transcriptions”. In: Empirical Musicology Review 11 (2), 124–147.

Drucker, Johanna (2011). “Humanities Approaches to Graphical Display”. In: Digital Humanities Quarterly 5 (1), 1–21. https://dhq-static.digitalhumanities.org/pdf/000091.pdf (visited on 01/06/2026).

Greene, Roland, Stephen Cushman, Clare Cavanagh, Jahan Ramazani, and Paul Rouzer (2012). The Princeton Encyclopedia of Poetry and Poetics. Princeton University Press.

Haider, Thomas and Jonas Kuhn (2018). “Supervised Rhyme Detection with Siamese Recurrent Networks”. In: Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Ed. by Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz. Association for Computational Linguistics, 81–86. https://aclanthology.org/W18-4509/ (visited on 01/06/2026).

Heuser, Ryan, J.D. Porter, Jonathan Sensenbaugh, Justin Tackett, Mark Algee-Hewitt, and Maria Kraxenberger (2018). Poesy. https://github.com/quadrismegistus/poesy (visited on 01/06/2026).

Hirjee, Hussein and Daniel G. Brown (2010). “Using Automated Rhyme Detection to Characterize Rhyming Style in Rap Music”. In: Empirical Musicology Review 5 (4), 121–145.  http://doi.org/10.18061/1811/48548.

Hood, Tom (1869). The Rules of Rhyme: A Guide to English Versification. With a Compendious Dictionary of Rhymes, an Examination of Classical Measures, and Comments upon Burlesque, Comic Verse, and Song-Writing. James Hogg & Son.

Jarvis, Simon (2011). “Why Rhyme Pleases”. In: Thinking Verse 1 (2).

Jarvis, Simon (2014). “What is Historical Poetics?” In: Theory Aside. Ed. by Jason Potts and Daniel Stout. Duke University Press, 97–116.

Joyce, James (1979). “Re-Weaving the Word-Web: Graph Theory and Rhymes”. In: Annual Meeting of the Berkeley Linguistics Society, 129–141.  http://doi.org/10.3765/bls.v5i0.3261.

Kao, Justine and Dan Jurafsky (2012). “A Computational Analysis of Style, Affect, and Imagery in Contemporary Poetry”. In: Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature. Ed. by David Elson, Anna Kazantseva, Rada Mihalcea, and Stan Szpakowicz. Association for Computational Linguistics, 8–17. https://aclanthology.org/W12-2502/ (visited on 01/06/2026).

Kaplan, David M. and David M. Blei (2007). “A Computational Approach to Style in American Poetry”. In: Seventh IEEE International Conference on Data Mining (ICDM 2007). IEEE, 553–558.  http://doi.org/10.1109/ICDM.2007.76.

List, Johann-Mattis, Jananan Sylvestre Pathmanathan, Nathan W. Hill, Eric Bapteste, and Philippe Lopez (2017). “Vowel Purity and Rhyme Evidence in Old Chinese Reconstruction”. In: Lingua Sinica 3, 1–17.  http://doi.org/10.1186/s40655-017-0021-8.

Malmi, Eric, Pyry Takala, Hannu Toivonen, Tapani Raiko, and Aristides Gionis (2016). “DopeLearning: A Computational Approach to Rap Lyrics Generation”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 195–204.  http://doi.org/10.1145/2939672.2939679.

Mayer, Rudolf, Robert Neumayer, and Andreas Rauber (2008). “Rhyme and Style Features for Musical Genre Classification by Song Lyrics”. In: ISMIR 2008, Proceedings of the 9th International Conference on Music Information Retrieval. Ed. by Juan Pablo Bello, Elaine Chew, and Douglas Turnbull. https://archives.ismir.net/ismir2008/2008_ISMIR_Proceedings.pdf (visited on 01/06/2026).

McCurdy, Nina, Vivek Srikumar, and Miriah Meyer (2015). “Rhymedesign: A Tool for Analyzing Sonic Devices in Poetry”. In: Proceedings of the Fourth Workshop on Computational Linguistics for Literature. Ed. by Anna Feldman, Anna Kazantseva, Stan Szpakowicz, and Corina Koolen. Association for Computational Linguistics, 12–22.  http://doi.org/10.3115/v1/W15-0702.

McDonald, Peter (2012). Sound Intentions: The Workings of Rhyme in Nineteenth-Century Poetry. Oxford University Press.

Moretti, Franco (2000). “Conjectures on World Literature”. In: New Left Review 2 (1), 54–68. https://newleftreview.org/issues/ii1/articles/franco-moretti-conjectures-on-world-literature (visited on 01/06/2026).

Pérez Pozo, Álvaro, Javier de la Rosa, Salvador Ros, Elena González-Blanco, Laura Hernández, and Mirella De Sisto (2022). “A Bridge too Far for Artificial Intelligence?: Automatic Classification of Stanzas in Spanish Poetry”. In: Journal of the Association for Information Science and Technology 73 (2), 258–267.  http://doi.org/10.1002/asi.24532.

Plamondon, Marc R. (2006). “Virtual Verse Analysis: Analysing Patterns in Poetry”. In: Literary and Linguistic Computing 21 (suppl_1), 127–141.  http://doi.org/10.1093/llc/fql011.

Plecháč, Petr (2018). “A Collocation-Driven Method of Discovering Rhymes (in Czech, English, and French Poetry)”. In: Taming the Corpus: From Inflection and Lexis to Interpretation. Ed. by Masako Fidler and Václav Cvrček. Springer International Publishing, 79–95.  http://doi.org/10.1007/978-3-319-98017-1_5.

Plecháč, Petr and Robert Kolár (2015). “The Corpus of Czech Verse”. In: Studia Metrica et Poetica 2 (1), 107–118.  http://doi.org/10.12697/smp.2015.2.1.05.

Poole, Josua (1657). The English Parnassus: Or, A Helpe to English Poesie. Tho. Johnson.

Pope, Alexander (1831). The Poetical Works of Alexander Pope. 2. William Pickering.

Popescu-Belis, Andrei, Alex R. Atrio, Bastien Bernath, Étienne Boisson, Teo Ferrari, Xavier Theimer-Lienhardt, and Giorgos Vernikos (2023). “GPoeT: a Language Model Trained for Rhyme Generation on Synthetic Data”. In: Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Ed. by Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz. Association for Computational Linguistics.  http://doi.org/10.18653/v1/2023.latechclfl-1.2.

Prins, Yopie (2008). “Historical Poetics, Dysprosody, and the Science of English Verse”. In: PMLA 123 (1), 229–234.  http://doi.org/10.1632/pmla.2008.123.1.229.

Reddy, Sravana and Kevin Knight (2011). “Unsupervised Discovery of Rhyme Schemes”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Ed. by Dekang Lin, Yuji Matsumoto, and Rada Mihalcea. Association for Computational Linguistics, 77–82. https://aclanthology.org/P11-2014/ (visited on 01/06/2026).

Reddy, Sravana and Morgan Sonderegger (2011). The Chicago Rhyming Poetry Corpus. https://github.com/sravanareddy/rhymedata/ (visited on 01/06/2026).

Rickert, William E. (1978). “Rhyme Terms”. In: Style 12 (1), 35–46. https://www.jstor.org/stable/45109024 (visited on 01/06/2026).

Sonderegger, Morgan (2011). “Applications of Graph Theory to an English Rhyming Corpus”. In: Computer Speech & Language 25 (3), 655–678.  http://doi.org/10.1016/j.csl.2010.05.005.

The CMU Pronouncing Dictionary (2026). http://www.speech.cs.cmu.edu/cgi-bin/cmudict (visited on 01/06/2026).

Walker, John [1775] (1824). A Rhyming Dictionary; Answering, at the Same Time, the Purposes of Spelling and Pronouncing the English Language, on a Plan not Hitherto Attempted. William Baynes and Son.

Walker, John [1775] (1894). The Rhyming Dictionary of the English Language: in which the Whole Language is Arranged According to its Terminations. Ed. by John Longmuir. Rev. and enlarged. George Routledge and Sons.