Class 8 (English 197 – Spring 2024)

This is the main course website. There is also a course Canvas site for uploading assignments.

Class Business

Readings for Next Series of Classes


Text Analysis (continued)

Finish discussion of Richard Jean So & Edwin Roland, “Race and Distant Reading” (2020) (go to notes from last class)

The Idea of Topic Modeling

Explanation of topic modeling (Alan's standard basic introduction) (screenshot)

David M. Blei, “Probabilistic Topic Models” (2012)

Imagine searching and exploring documents based on the themes that run through them. We might “zoom in” and “zoom out” to find specific or broader themes; we might look at how those themes changed through time or how they are connected to each other. Rather than finding documents through keyword search alone, we might first find the theme that we are interested in, and then examine the documents related to that theme. (77)

[Note]: Indeed calling these models “topic models” is retrospective — the topics that emerge from the inference algorithm are interpretable for almost any collection that is analyzed. The fact that these look like topics has to do with the statistical structure of observed language and how it interacts with the specific probabilistic assumptions of LDA. (78n.)

Ted Underwood, “Topic Modeling Made Just Simple Enough” (2012)

Of course, we can’t directly observe topics; in reality all we have are documents. Topic modeling is a way of extrapolating backward from a collection of documents to infer the discourses (“topics”) that could have generated them. (The notion that documents are produced by discourses rather than authors is alien to common sense, but not alien to literary theory.)

As a literary scholar, I find that I learn more from ambiguous topics than I do from straightforwardly semantic ones. When I run into a topic like “sea,” “ship,” “boat,” “shore,” “vessel,” “water,” I shrug. Yes, some books discuss sea travel more than others do. But I’m more interested in topics like this:
[Figure: example topic from Ted Underwood's explanation of topic modeling]

A topic like this one is hard to interpret. But for a literary scholar, that’s a plus. I want this technique to point me toward something I don’t yet understand, and I almost never find that the results are too ambiguous to be useful. The problematic topics are the intuitive ones — the ones that are clearly about war, or seafaring, or trade. I can’t do much with those.
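
[Note]: Underwood's point that topic modeling extrapolates "backward" from documents to the topics that could have generated them may be easier to see by running the story forward. The following is a minimal sketch, not taken from Underwood or Blei, of the kind of generative process that LDA assumes: each document gets its own mix of topics, and each word is drawn by first picking a topic and then picking a word from that topic. The two toy topics, their probabilities, and the names `TOPICS`, `sample_dirichlet`, and `generate_document` are all hypothetical, invented here for illustration.

```python
import random

# Hypothetical toy topics: each maps words to probabilities (invented for illustration).
TOPICS = {
    "seafaring": {"sea": 0.3, "ship": 0.3, "shore": 0.2, "vessel": 0.2},
    "war": {"army": 0.4, "battle": 0.3, "siege": 0.3},
}


def sample_dirichlet(alpha, n, rng):
    """Draw n proportions that sum to 1 from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, alpha, rng):
    """Generate one document: choose topic proportions, then a topic and a word per token."""
    names = list(TOPICS)
    proportions = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(length):
        # Step 1: pick which topic this word comes from.
        topic = rng.choices(names, weights=proportions)[0]
        # Step 2: pick a word according to that topic's word probabilities.
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return proportions, words


rng = random.Random(197)  # fixed seed so the run is repeatable
proportions, words = generate_document(length=12, alpha=0.5, rng=rng)
print([round(p, 2) for p in proportions])
print(" ".join(words))
```

Topic modeling runs this in the opposite direction: only the words of the documents are observed, and the inference algorithm tries to estimate something like `TOPICS` and `proportions` from them.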
