Class Business
- Readings for Next Series of Classes
- For next week (Tuesday):
- Andrew Goldstone et al., Topic model of 40 years of Signs: Journal of Women in Culture and Society (visualized in Goldstone’s Dfr-browser interface)
- For help learning to work with Dfr-browser, see the guide page “Interpreting the topic model of Signs.”
- Andrew Piper, excerpt from “Topoi (Dispersion),” in Enumerations: Data and Literary Study (2019) – read only pp. 66–75 [available on course Canvas site]
- For next week (Thursday):
- Background Readings in Linguistics Theory
- Ferdinand de Saussure, Course in General Linguistics (1959) – read pp. 114–17, 123–27.
- J. [John] R. [Rupert] Firth, “A Synopsis of Linguistic Theory, 1930–55” (Oxford: Blackwell, 1957) – read sections III–IV (pp. 7–13) [available on course Canvas site]
Due next Thursday, Oct. 27th: Topic Modeling Exercise
Text Analysis Project Proposals
Student proposals
- MonkeyLearn, “5 Sentiment Analysis Examples in Business”
- Chef Watson
- Joab Jackson, “IBM Watson Cooks Up Some New Dishes” (2014)
- Rochelle Bilow (Bon Appétit), “We Put a Computer in Charge of Our Test Kitchen for a Day, and Here’s What Happened” (2014)
- See also these Bon Appétit stories.
The Idea of Topic Modeling
David M. Blei, “Probabilistic Topic Models” (2013)
Imagine searching and exploring documents based on the themes that run through them. We might “zoom in” and “zoom out” to find specific or broader themes; we might look at how those themes changed through time or how they are connected to each other. Rather than finding documents through keyword search alone, we might first find the theme that we are interested in, and then examine the documents related to that theme. (77)
[Note]: Indeed calling these models “topic models” is retrospective — the topics that emerge from the inference algorithm are interpretable for almost any collection that is analyzed. The fact that these look like topics has to do with the statistical structure of observed language and how it interacts with the specific probabilistic assumptions of LDA. (78n.)
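Blei's point about the “specific probabilistic assumptions of LDA” can be made concrete with a toy run of the generative story the model assumes: each document draws its own mixture of topics, and each word is produced by first picking a topic from that mixture and then picking a word from that topic. The Python below is a minimal sketch under simplifying assumptions: the two topics and their word weights are invented for illustration, the topics are fixed by hand rather than drawn from their own Dirichlet prior as in full LDA, and the function names (`sample_dirichlet`, `generate_corpus`) and the `alpha=0.5` default are hypothetical choices, not taken from Blei's article.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet draw: normalize k independent Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(topics, n_docs, doc_len, alpha=0.5, seed=0):
    """Generate toy documents by running LDA's generative story forward.

    topics: list of dicts mapping word -> probability (one dict per topic;
    hand-specified here, an assumption of this sketch).
    Returns a list of (words, topic_assignments) pairs.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(n_docs):
        # 1. The document draws its own mixture of topics (theta).
        theta = sample_dirichlet(alpha, len(topics), rng)
        words, assignments = [], []
        for _ in range(doc_len):
            # 2. For each word slot, choose a topic according to theta...
            z = rng.choices(range(len(topics)), weights=theta)[0]
            # 3. ...then choose a word from that topic's word distribution.
            vocab, probs = zip(*topics[z].items())
            words.append(rng.choices(vocab, weights=probs)[0])
            assignments.append(z)
        corpus.append((words, assignments))
    return corpus


# Hypothetical toy topics, reusing words quoted elsewhere in these notes.
toy_topics = [
    {"sea": 0.4, "ship": 0.3, "shore": 0.3},
    {"power": 0.4, "blood": 0.3, "hands": 0.3},
]
for words, _ in generate_corpus(toy_topics, n_docs=2, doc_len=8, seed=42):
    print(" ".join(words))
```

Nothing here infers topics; it only runs the story forward. Topic modeling runs it in reverse, asking which topics and mixtures would most plausibly have produced an observed collection.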

Ted Underwood, “Topic Modeling Made Just Simple Enough” (2012)
Of course, we can’t directly observe topics; in reality all we have are documents. Topic modeling is a way of extrapolating backward from a collection of documents to infer the discourses (“topics”) that could have generated them. (The notion that documents are produced by discourses rather than authors is alien to common sense, but not alien to literary theory.)
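Underwood's “extrapolating backward” can be sketched with collapsed Gibbs sampling, one widely used inference method for LDA (others exist, such as variational inference, and these notes do not say which one a given project used). The sketch starts from random topic assignments and repeatedly reassigns each word in proportion to how much its document already uses each topic and how much each topic already uses that word. The function name `lda_gibbs`, the smoothing values `alpha` and `beta`, and the iteration count are illustrative assumptions; production tools add many optimizations this toy omits.

```python
import random
from collections import Counter


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents.

    Returns (topic_word, doc_topic): one Counter of word counts per topic,
    and one list of per-topic word counts per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * k for _ in docs]          # words in doc d assigned to topic t
    topic_word = [Counter() for _ in range(k)]   # times word w is assigned to topic t
    topic_total = [0] * k                        # all words assigned to topic t
    assignments = []                             # current topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_z = []
        for w in doc:
            t = rng.randrange(k)
            doc_z.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(doc_z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                t = assignments[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Weight each topic by (document's use of topic) x (topic's use of word).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word, doc_topic


# Hypothetical two-document corpus.
docs = [
    "sea ship shore sea ship".split(),
    "power blood hands power blood".split(),
]
topic_word, doc_topic = lda_gibbs(docs, k=2, seed=1)
for t, counts in enumerate(topic_word):
    print(t, [w for w, c in counts.most_common(3) if c > 0])
```

On a corpus this small the outcome depends on the random seed; on real collections the same reassignment loop gradually gathers words that tend to co-occur in the same documents into shared topics.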
As a literary scholar, I find that I learn more from ambiguous topics than I do from straightforwardly semantic ones. When I run into a topic like “sea,” “ship,” “boat,” “shore,” “vessel,” “water,” I shrug. Yes, some books discuss sea travel more than others do. But I’m more interested in topics like this:
[Image: Underwood's example of a hard-to-interpret topic]
A topic like this one is hard to interpret. But for a literary scholar, that’s a plus. I want this technique to point me toward something I don’t yet understand, and I almost never find that the results are too ambiguous to be useful. The problematic topics are the intuitive ones — the ones that are clearly about war, or seafaring, or trade. I can’t do much with those.
- WhatEvery1Says (WE1S) Project, home page (2022), https://we1s.ucsb.edu/
Discussion of Topic Modeling (continued)
Andrew Goldstone and Ted Underwood, “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us” (2014)
- Andrew Goldstone, Topic model of 100 years of literary criticism journals (visualized in Goldstone’s Dfr-browser interface)
Whether numbers add subtlety or flatten it out depends on how you use them, and a simple graph of word frequency like figure 1 is not necessarily the most nuanced approach. The graph is hard to interpret in part because these words have been wrenched out of context. Five might count editions or it might count the length of five long winters. The meanings of words are shifting and context dependent. For this reason, it’s risky to construct groups of words that we imagine are equivalent to some predetermined concept. A group of numbers may be relatively uncontroversial, but a group of, say, “philological terms” would be pretty dubious. If historicism tells us anything, it’s that the meaning of a term has to emerge from the way it’s used in a specific historical context.
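For contrast, here is a minimal sketch of the context-free count behind a simple word-frequency graph: the share of each year's tokens that belong to a predefined word group (here, a couple of number words). The function name and the tiny sample data are hypothetical, and the word list actually used for the article's figure 1 is not reproduced in these notes. The comment inside the loop marks the problem Goldstone and Underwood describe: every token counts the same regardless of what it refers to.

```python
from collections import defaultdict


def group_frequency_by_year(docs, word_group):
    """Share of each year's tokens that belong to word_group.

    docs: iterable of (year, tokens) pairs, where tokens is a list of strings.
    """
    group = {w.lower() for w in word_group}
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, tokens in docs:
        totals[year] += len(tokens)
        # Context is ignored: "five editions" and "five long winters" count alike.
        hits[year] += sum(1 for t in tokens if t.lower() in group)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical sample documents.
sample = [
    (1950, "five editions of the poem".split()),
    (1990, "five long winters".split()),
]
print(group_frequency_by_year(sample, ["five", "two"]))
# → {1950: 0.2, 1990: 0.3333333333333333}
```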
In recent years, researchers in computer science have devised exploratory techniques that can identify groups of words with more sensitivity to the discursive context. (360)
The aim of topic modeling is to identify the thematic or rhetorical patterns that inform a collection of documents….
The topics of topic modeling are not simply themes; they might also reflect rhetorical frames, cognitive schemata, or specialized idioms (of the sort that Bakhtin conceived as mixed together in social heteroglossia); if they are capacious enough, topics may even indicate a discourse in Foucault’s sense. (361)
Topics are interestingly slippery objects that require interpretation. Violence might be a reasonable one-word summary of topic 80, but it isn’t a complete description. The most common word in the topic, after all, is power—a somewhat broader concept. The topic also includes strange details, like what appear to be the names of body parts: blood, head, hands, face, and eyes. There is a coherence to this list, but it may not be the kind of coherence we ordinarily associate with the term topic. (363)
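Concretely, a topic is a ranked list of words with weights, which is why “the most common word in the topic” can be read off directly while the topic's coherence still has to be interpreted. The sketch below ranks one topic's word counts and converts them to proportions. The function name `top_words` is hypothetical, and the counts in the example are invented for illustration; they are not the actual weights of topic 80.

```python
from collections import Counter


def top_words(topic_word_counts, n=10):
    """Return the n highest-weighted (word, proportion) pairs for one topic.

    topic_word_counts: mapping word -> count of tokens assigned to this topic.
    Ties are broken alphabetically; zero-count words are dropped.
    """
    total = sum(topic_word_counts.values())
    ranked = sorted(topic_word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, c / total) for w, c in ranked[:n] if c > 0]


# Invented counts for illustration only.
example = Counter({"power": 50, "violence": 30, "blood": 10, "hands": 10})
print(top_words(example, n=3))
# → [('power', 0.5), ('violence', 0.3), ('blood', 0.1)]
```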
This change of scale made possible by the computer does not free us from the need for an interpretive methodology. Ours is drawn both from literary hermeneutics and from the methodology of the social sciences…. Quantitative approaches to literary history like ours join in the wider renewal of interest in the sociology of literature. What is best in these new approaches is a shared determination to adapt concepts and techniques from the social sciences — including quantitative techniques — in order to enhance the nuance and precision of our interpretations of literary history. (366)
Quantitative methods may be especially useful for characterizing long, gradual changes, because change of that sort is otherwise difficult to grasp. But the methods we used in this article don’t prescribe a particular scale of historical analysis; on the contrary, one of their advantages is an ability to reveal overlapping phenomena on different scales, or even transformations of the pace of change itself. (379)
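Long, gradual changes of this kind are commonly charted by averaging each document's topic proportions within each year. The sketch below assumes per-document topic counts (one list of counts of words assigned to each topic, per document) and a publication year for each document; the function name and sample numbers are hypothetical. It averages per-document proportions so each article counts equally; pooling all of a year's words instead would weight long articles more heavily, and these notes do not say which normalization the article used.

```python
from collections import defaultdict


def topic_share_by_year(doc_topic_counts, years, topic):
    """Mean share of each document's words assigned to `topic`, grouped by year.

    doc_topic_counts: one list of per-topic word counts per document.
    years: publication year of each document, in the same order.
    """
    shares = defaultdict(list)
    for counts, year in zip(doc_topic_counts, years):
        total = sum(counts)
        if total:  # skip empty documents
            shares[year].append(counts[topic] / total)
    # Each document counts equally, however long it is.
    return {year: sum(v) / len(v) for year, v in sorted(shares.items())}


# Hypothetical counts for three documents and two topics.
counts = [[3, 1], [1, 1], [0, 4]]
print(topic_share_by_year(counts, [1920, 1920, 1990], topic=0))
# → {1920: 0.625, 1990: 0.0}
```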
