Class Business
Readings for Tuesday (May 7)
- Background Readings in Linguistics Theory
- Ferdinand de Saussure, Course in General Linguistics (1959) – read pp. 114-117, 123-27.
- J. [John] R. [Rupert] Firth, “A Synopsis of Linguistic Theory, 1930-55” (Oxford: Blackwell, 1957) – read sections III-IV (pp. 7-13) [available on course Canvas site]
- Due Tuesday, May 11th: Topic Modeling Exercise
Class Cancelled on Tuesday (May 7)
- Class 11 on the course schedule (May 7) is cancelled.
- We’ll next meet on Thursday (May 9), which is labeled “Class 12” on the course schedule.
- Skip over the readings from Saussure and Firth assigned for Class 11.
- Just jump ahead in the assigned readings. That is, for Thursday May 9 continue with the course schedule by doing the readings for Class 12:
- Luis Serrano, “What Are Word and Sentence Embeddings?” (2023)
- Optional: create an account on the Cohere site mentioned in the article and try the “embedding” function in the Cohere “playground.” (See an example of results.)
- Saptarashmi Bandyopadhyay et al., “Word Embedding Demo: Tutorial” (2022) — Note: The actual interactive demo accompanying this tutorial about word embeddings (or word vectors) is assigned for the second part of Practicum 5.
- Nika Mavrody, Laura B. McGrath, Nichole Nomura, and Alexander Sherman, “Voice” (2021) — Read the “Abstract” and pp. 155-164.
- Practicum 4 will now be due Thursday (by class 12, May 9th): Topic Modeling Exercise
- Demo of Topic Modeling Tool
- Practicum 5 due by class 13 (May 14) will now be optional: Word Embedding Exercise
Text Analysis Project Proposals (continued)
Student proposals
- Shakespeare Authorship Research
- R. John Leigh, “A Scientific Approach to the Shakespeare Authorship Question.” SAGE Open, vol. 9, no. 1, Jan. 2019. https://doi.org/10.1177/2158244018823465.
- Sierra Adams, “Forensic Linguistics and Authorship Analysis.” Literary Ashland. https://literaryashland.org/?p=10579. Accessed 2 May 2024.
Discussion of Topic Modeling
Andrew Goldstone and Ted Underwood, “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us” (2014)
- Andrew Goldstone, Topic model of 100 years of literary criticism journals (visualized in Goldstone’s Dfr-browser interface)
- Andrew Goldstone et al., Topic model of 40 years of Signs: Journal of Women in Culture and Society (visualized in Goldstone’s Dfr-browser interface)
- John Guillory, Professing Criticism: Essays on the Organization of Literary Study (University of Chicago Press, 2022)
From Goldstone & Underwood, “The Quiet Transformations”
The aim of topic modeling is to identify the thematic or rhetorical patterns that inform a collection of documents….
The topics of topic modeling are not simply themes; they might also reflect rhetorical frames, cognitive schemata, or specialized idioms (of the sort that Bakhtin conceived as mixed together in social heteroglossia); if they are capacious enough, topics may even indicate a discourse in Foucault’s sense. (361)
Topics are interestingly slippery objects that require interpretation. Violence might be a reasonable one-word summary of topic 80, but it isn’t a complete description. The most common word in the topic, after all, is power—a somewhat broader concept. The topic also includes strange details, like what appear to be the names of body parts: blood, head, hands, face, and eyes. There is a coherence to this list, but it may not be the kind of coherence we ordinarily associate with the term topic. (363)
- Close Reading
- Quotes from Cleanth Brooks, “The Heresy of Paraphrase” (1947)
- Sir Thomas Wyatt, “They Flee From Me” (1557)
Discussion of Topic Modeling (continued)
Andrew Piper, excerpt from “Topoi (Dispersion),” in Enumerations: Data and Literary Study (2019) — read only pp. 66–75
And yet, despite the growing body of work on topic models, no one has stopped to ask the question “What is a topic?,” either in the classical rhetorical sense or in the computational one. If we have this new way of deriving semantic significance from texts at a large scale, how does it fit within the longer philosophical and philological traditions of understanding “topics”? What, in other words, do these lists of words mean? (67)
… I will begin with an overview of the history of thinking about topics, from Aristotle to Renaissance commonplace books to nineteenth-century encyclopedism. Understanding how topic modeling fits within this longer tradition of deriving coherent categories of thought from a surplus of information — where there has always been a surplus from a single human perspective — will help us see how computation has a distinct pre-computational past. (68)
Commonplace Books
- Wikipedia, “Commonplace Book”
- Trisha M., “Common Quotes in a Commonplace” (2016)
- Kelsey McKinney, “Social media: Nothing new? Commonplace books as predecessor to Pinterest” (2015)
- John Locke, page from one of his commonplace books.
- John Locke
- New and Easie Method of Making Common-Place-Books (1686/1706)
- Index in one of Locke’s commonplace books (Bodleian Library, Oxford, Ms. Locke f. 18, 110-111)
- Commonplace Corner, “Structuring a Commonplace Book (John Locke Method)”
- Alan Walker, “Indexing Commonplace Books: John Locke’s Method” (2001)
- Zettelkasten Method of Note-taking
- Niklas Luhmann’s Zettelkasten method (information about)
- Example Zettelkasten tool: Obsidian
- Vannevar Bush, “As We May Think” (1945)
- Information “Ontologies” and Taxonomies
- Ontotext. “What Are Ontologies?”
- Semantic Web
- RDF (Resource Description Framework)
- OWL (Web Ontology Language)
- Jeff Heflin, “An Introduction to the OWL Web Ontology Language”
- Ontology creation & management tools
- Example: Protégé (and WebProtégé)
David M. Blei, “Probabilistic Topic Models” (2013)
Imagine searching and exploring documents based on the themes that run through them. We might “zoom in” and “zoom out” to find specific or broader themes; we might look at how those themes changed through time or how they are connected to each other. Rather than finding documents through keyword search alone, we might first find the theme that we are interested in, and then examine the documents related to that theme. (77)
[Note]: Indeed calling these models “topic models” is retrospective — the topics that emerge from the inference algorithm are interpretable for almost any collection that is analyzed. The fact that these look like topics has to do with the statistical structure of observed language and how it interacts with the specific probabilistic assumptions of LDA. (78n.)
Ted Underwood, “Topic Modeling Made Just Simple Enough” (2012)
Of course, we can’t directly observe topics; in reality all we have are documents. Topic modeling is a way of extrapolating backward from a collection of documents to infer the discourses (“topics”) that could have generated them. (The notion that documents are produced by discourses rather than authors is alien to common sense, but not alien to literary theory.)
As a literary scholar, I find that I learn more from ambiguous topics than I do from straightforwardly semantic ones. When I run into a topic like “sea,” “ship,” “boat,” “shore,” “vessel,” “water,” I shrug. Yes, some books discuss sea travel more than others do. But I’m more interested in topics like this:
A topic like this one is hard to interpret. But for a literary scholar, that’s a plus. I want this technique to point me toward something I don’t yet understand, and I almost never find that the results are too ambiguous to be useful. The problematic topics are the intuitive ones — the ones that are clearly about war, or seafaring, or trade. I can’t do much with those.
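Underwood's account of topics as the discourses that "could have generated" documents can be made concrete with a toy example. The sketch below runs the generative story forward: each word of a document is produced by first drawing a topic from the document's mixture and then drawing a word from that topic. The two topics, their word probabilities, and the function name `generate_document` are hypothetical and chosen by hand for illustration; a full LDA model would instead learn them from a corpus, and would also place Dirichlet priors over the mixtures, which this sketch omits.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
# Note that "sea" and "power" appear in both topics with different weights.
TOPICS = {
    "seafaring": {"sea": 0.4, "ship": 0.3, "shore": 0.2, "power": 0.1},
    "violence": {"power": 0.4, "blood": 0.3, "hands": 0.2, "sea": 0.1},
}


def generate_document(topic_mixture, n_words, rng):
    """Generate n_words by drawing a topic for each word, then a word from it."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(n_words):
        # Step 1: pick which topic "speaks" this word.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step 2: pick a word according to that topic's distribution.
        vocab = list(TOPICS[topic])
        word_weights = [TOPICS[topic][word] for word in vocab]
        words.append(rng.choices(vocab, weights=word_weights, k=1)[0])
    return words


rng = random.Random(0)  # fixed seed so the example is repeatable
doc = generate_document({"seafaring": 0.8, "violence": 0.2}, 12, rng)
print(" ".join(doc))
```

Topic modeling works in the opposite direction: given only the generated words, it estimates which topics and mixtures were most likely to have produced them.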
Andrew Piper, excerpt from “Topoi (Dispersion),” in Enumerations: Data and Literary Study (2019) — read only pp. 66–75
Reading topologically provides a new way of attending to the form of language, this time through an attention to the latent quantities of words. It allows us to envision how figure and concentration serve as an essential foundation of human thought, and that their opposites, dispersion and formlessness, are equally essential for the process of intellectual change….Topological reading makes visible the way topics are neither firmly bounded objects stable through time, the transcendentals of philosophical thought, nor clearly evolving genealogical units, the elements of Begriffsgeschichte [history of ideas, or conceptual history] that move coherently from one form to another across linear time.
Studying topics in this way allows us to see how topics ultimately contain a sense of their own otherness, that, like the computational topics used to model them, each topic contains within itself the potentiality of all other topics in the topical space. (70)
… quotation-based models of the commonplace would be replaced in the eighteenth century by new systems of indexing knowledge at the document level, just as the open-ended lists that accompanied topics would be replaced by new forms of compressed, keyword-driven forms such as the modern encyclopedia, which would emerge in the nineteenth century as a staple of the publishing industry. “Amitie” could still be a topical keyword, but in the Encyclopedia of Diderot and D’Alembert it was no longer composed of a list of quotations, but a condensed definition of the thing itself: “the pure exchange of the spirit is simply called acquaintance; the exchange where the heart is concerned is called friendship”….
In this sense, topic modeling can be seen as a natural extension of the document-level system of indexing that became increasingly popular during and after the eighteenth century (and that had its early modern and medieval precursors). (72-73)
At the same time, the disambiguation between topics in topic models is complemented by a greater degree of ambiguity within topics….
The computational topic, by contrast, incorporates that openness within itself. Rather than group statements under a single keyword or phrase, it organizes a heterogeneity of statements under a complex semantic field. It operates according to the principle of many to many. (74)
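One way to picture the "many to many" principle is as a table that can be read in both directions: each document draws on several topics, and each topic is spread across several documents. The sketch below is only an illustration under assumed data. The document names, topic labels, proportions, and the `invert` helper are all hypothetical, not output from any model discussed in class.

```python
from collections import defaultdict

# Hypothetical document-topic proportions, of the kind a topic model reports.
doc_topics = {
    "doc_a": {"topic_1": 0.6, "topic_2": 0.4},
    "doc_b": {"topic_2": 0.7, "topic_3": 0.3},
    "doc_c": {"topic_1": 0.5, "topic_3": 0.5},
}


def invert(doc_topics, threshold=0.1):
    """List, for each topic, the documents in which its share meets the threshold."""
    topic_docs = defaultdict(list)
    for doc, mixture in doc_topics.items():
        for topic, share in mixture.items():
            if share >= threshold:
                topic_docs[topic].append(doc)
    return dict(topic_docs)


topic_docs = invert(doc_topics)
for topic, docs in sorted(topic_docs.items()):
    print(topic, "->", ", ".join(docs))
```

A commonplace heading, by contrast, would be one keyword pointing to many quotations, with each quotation filed under that single head.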