Class Business
- Readings for Next Week
- For Tuesday
- Stephen P. Borgatti, et al. (2009), “Network Analysis in the Social Sciences”
alternative source: pre-copyedited manuscript of the article
- Elijah Meeks and Scott B. Weingart, “Introduction to Network Analysis and Representation” — click on the tabs for “centrality,” “clustering coefficient,” etc. for brief interactive tutorials (a short code sketch of these two measures follows the reading list below)
- For Thursday
- Paola Pascual-Ferrá, Neil Alperstein, and Daniel J. Barnett, “Social Network Analysis of COVID-19 Public Discourse on Twitter: Implications for Risk Communication” (2020)
Due Thursday, Nov. 17th: Social Network Analysis Exercise (Part A)
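For a concrete sense of what “centrality” and “clustering coefficient” measure before you begin Part A of the exercise, here is a minimal sketch, assuming the networkx library; the toy graph is invented for illustration and is not drawn from any of the readings.

```python
# A minimal sketch (assumes the networkx library) of the two measures the
# Meeks & Weingart tutorial introduces: degree centrality and the clustering
# coefficient. The tiny example graph is invented for illustration.
import networkx as nx

# A toy network: nodes are people, edges are ties between them.
G = nx.Graph()
G.add_edges_from([
    ("Ada", "Ben"), ("Ada", "Cai"), ("Ben", "Cai"),
    ("Cai", "Dia"), ("Dia", "Eli"),
])

# Degree centrality: the share of all other nodes each node is directly tied to.
print(nx.degree_centrality(G))

# Clustering coefficient: how many of a node's neighbors are also tied to each
# other (Ada's neighbors Ben and Cai are connected, so Ada scores 1.0).
print(nx.clustering(G))
```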
Practicum 6: Large Language Models & Text-to-Image Models Exercise
- Adi Robertson, “Professional AI Whisperers Have Launched a Marketplace for DALL-E Prompts” (2022)
Student outputs
- Jake Houser
- Emily Franklin
- Lorna Kreusel
- Stella Jia
- [TBD]
Thinking With / Thinking About Large Language Models
Minh Hua and Rita Raley, “Playing With Unicorns: AI Dungeon and Citizen NLP” (2020)
If a complete mode of understanding is as-yet unachievable, then evaluation is the next best thing, insofar as we take evaluation, i.e. scoring the model’s performance, to be a suitable proxy for gauging and knowing its capabilities. (link)
In this endeavor, the General Language Understanding Evaluation benchmark (GLUE), a widely-adopted collection of nine datasets designed to assess a language model’s skills on elementary language operations, remains the standard for the evaluation of GPT-2 and similar transfer learning models…. Especially striking, and central to our analysis, are two points: a model’s performance on GLUE is binary (it either succeeds in the task or it does not)…. But if the training corpus is not univocal — if there is no single voice or style, which is to say no single benchmark — because of its massive size, it is as yet unclear how best to score the model. (link)
Our research questions, then, are these: by what means, with what critical toolbox or with which metrics, can AID [AI Dungeon], as a paradigmatic computational artifact, be qualitatively assessed, and which communities of evaluators ought to be involved in the process? (link)
AID, as an experiment with GPT-2, provides a model for how humanists might more meaningfully and synergistically contribute to the project of qualitative assessment going forward…. (link)
Our presupposition … is that it is not by itself sufficient to bring to bear on the textual output of a machine learning system the apparatus of critical judgment as it has been honed over centuries in relation to language art as a putatively human practice. What is striking even now is the extent to which humanistic evaluation in the domain of language generation is situated as a Turing decision: bot or not. We do not however need tales of unicorns to remind us that passable text is itself no longer a unicorn. And, as we will suggest, the current evaluative paradigm of benchmarking generated text samples — comparing output to the target data to assess its likeness — falls short when the source for generated samples is neither stable nor fully knowable. (link)
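For reference alongside Hua and Raley’s discussion of GLUE, here is a minimal sketch of what “scoring the model’s performance” on one GLUE task looks like in practice. It is not from the readings: it assumes the Hugging Face `datasets` and `transformers` libraries, and the checkpoint and 100-example sample are arbitrary choices for illustration.

```python
# A minimal sketch of benchmark-style evaluation on one GLUE task.
# Assumes the Hugging Face `datasets` and `transformers` libraries; the model
# checkpoint and the 100-example sample are arbitrary illustrative choices.
from datasets import load_dataset
from transformers import pipeline

# SST-2 is one of the nine GLUE tasks: binary sentiment classification.
validation = load_dataset("glue", "sst2", split="validation").select(range(100))

# Any text-classification model could stand in here; this checkpoint was
# fine-tuned on SST-2, so its labels align with the benchmark's.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

label_to_id = {"NEGATIVE": 0, "POSITIVE": 1}
predictions = classifier(validation["sentence"])
correct = sum(
    label_to_id[pred["label"]] == gold
    for pred, gold in zip(predictions, validation["label"])
)

# Each example is scored right-or-wrong (the binary judgment Hua and Raley
# point to), and the aggregate becomes the headline benchmark number.
print(f"Accuracy on {len(validation)} SST-2 examples: {correct / len(validation):.2%}")
```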

Emily M. Bender et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” (2021)
- Douglas Bakkum, Philip Gamblen, Guy Ben-Ary, Zenas Chao, and Steve Potter, “MEART: The Semi-Living Artist” (2007)
- Harold Cohen’s “AARON”
- Margaret A. Boden, The Creative Mind: Myths and Mechanisms, 2nd ed. (1990/2004) (PDF)
