Patrick Burns (ISAW) Lecture at Taft Center
Please note that this lecture will take place at the Taft Center.
On Thursday (2/20) at 4:30 pm, Dr. Patrick Burns (Institute for the Study of the Ancient World) will be giving a public lecture at the Taft Center for the Humanities. His lecture will address how new developments in digital language processing can be applied to Latin corpora (full abstract below).
The Digital Afterlife of a Dead Language: Or Recovering 34 Billion(!) Latin Words from AI Training Data
Latin has been a perhaps unexpected beneficiary of recently published Large Language Model (LLM) training datasets. For example, an artificial intelligence firm just released a text repository advertising 34 billion Latin tokens, a number over 5,000 times larger than a comprehensive repository of canonically classical Latin like the Perseus Digital Library. The number is so outstandingly large relative to other Latin collections (“unfathomable” in the parlance of AI critique) that it demands a fuller accounting of what it means for humanities scholars to work with such collections. It leads us to ask questions like: What novel methods are necessary to explore such a library? How do we handle the massive amount of textual corruption found in these volumes? What tools and models can we build, and build responsibly, with that amount of textual data? In this talk, I will bring in threads from natural language processing, cryptography, and textual criticism, among other disciplines, to redefine philology at scale for our computational, LLM-inflected moment. While the presentation will lead with examples from Latin texts, the talk invites humanities scholars working in any language or literature to reflect on how issues of training data quantity and quality affect their areas of research.