Class Business
- Reading for Next 2 Classes
- Thursday (April 25)
- David M. Blei, “Probabilistic Topic Models” (2013) — (read only to end of p. 79, before the math begins)
- Ted Underwood, “Topic Modeling Made Just Simple Enough” (2012)
- Tuesday (April 30)
- Andrew Goldstone and Ted Underwood, “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us” (2014)
- Andrew Goldstone, Topic model of 100 years of literary criticism journals (visualized in Goldstone’s Dfr-browser interface)
- Due April 30: Project Concept Proposal 1 (Text Analysis Project Proposal)
- Questions?
- Your initial ideas for materials you want to work on?
Practicum 3: Text Analysis Exercise
Related Text Analysis Tools
- Voyant Tools (Stéfan Sinclair & Geoffrey Rockwell)
- Lexos (Project leads: Michael Drout, Scott Kleinman, and Mark LeBlanc)
- Python tools, e.g.:
- Natural Language Toolkit
- spaCy NLP
- See Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit (2019)
- Example tutorial: Datacamp, “NLTK Sentiment Analysis Tutorial for Beginners” (2019)
- Melanie Walsh, Introduction to Cultural Analytics & Python (2021)
- R tools
- See Matthew L. Jockers, Text Analysis with R for Students of Literature (2014)
- Video & Film Analysis Tools
- E.g., Cinemetrics (Frederic Brodbeck’s project for “measuring and visualizing movie data, in order to reveal the characteristics of films and to create a visual ‘fingerprint’ for them. Information such as the editing structure, color, speech or motion are extracted, analyzed and transformed into graphic representations so that movies can be seen as a whole and easily interpreted or compared side by side”; includes downloadable code for Python script tools used to create the metrics)
Text Analysis
Alan’s slides on the relation between text encoding and text analysis (using Yin Liu’s table of paradigms of text in “Ways of Reading, Models for Text, and the Usefulness of Dead People”)
Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method” (2012)
The general methodological problem of the digital humanities can be bluntly stated: How do we get from numbers to meaning?… In our research we’ve found it useful to think about this problem through two central terms: signal and concept. We define a signal as the behavior of the feature actually being tracked and analyzed. The signal could be any number of things that are readily tracked computationally…. A concept, on the other hand, is the phenomenon that we take a signal to stand for, or the phenomenon we take the signal to reveal. It’s always the concept that really matters to us. (Postscript)
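To make the signal/concept distinction concrete, here is a minimal Python sketch of what computing a "signal" might look like. Everything in it is an assumption made for illustration: the three-sentence toy corpus, the two-word group, and the helper names `tokenize` and `cohort_signal` are hypothetical and are not drawn from Heuser and Le-Khac's data or code. The signal is simply the relative frequency of a word group in each dated text; deciding what concept that number stands for remains the interpretive step.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs, invented for illustration only.
corpus = [
    (1800, "Her virtue and modesty were praised by all."),
    (1850, "He walked three miles along the hard white road."),
    (1900, "The red door stood open; two men walked in."),
]

# Hypothetical word group to track (a stand-in for a group of related words).
cohort = {"virtue", "modesty"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cohort_signal(text, words):
    """Return the share of tokens in `text` that belong to `words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in words) / len(tokens)


# The "signal": relative frequency of the word group, keyed by year.
signal = {year: cohort_signal(text, cohort) for year, text in corpus}

for year in sorted(signal):
    print(year, round(signal[year], 3))
```

The numbers this produces say only how often the tracked words occur; whether a falling frequency reveals anything about a broader cultural or literary change is the "concept" question the quotation raises.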
General interpretive method of text analysis
Discussion of Text Analysis (continued)
Indeed, the goal of this essay is to begin the hard work of developing a critical version of distant reading appropriate for the analysis of race and racial discourse in literature. We envision a reflexive method that is able to identify its own elisions while also pointing to new insights and opportunities for research. (72)