Practicum Instructions for English 197 (Fall 2022)
Digital Humanities: Introduction to the Field
Practicum Assignment Instructions
Due various dates
30% of Final Grade
Course “practicums” are small-scale, hands-on exercises that ask students to experiment at a beginner’s level with the methods and tools of the digital humanities.
Reason for Assignment: The goal is not technical mastery but learning enough about the technologies to think about, and through, their concepts and also to discover which methods and tools might be used in a student’s future research. In many cases, experience gained in the practicums will contribute directly to the discussion of issues in class.
Outputs from Practicums: Outputs should be uploaded on Canvas as a post in the discussion board set up for each practicum. (Posts on the discussion board can include links or attachments, such as a Word document containing explanatory text, tables, and screenshots.) Posts in these practicum discussion boards will be viewable by all members of the class.
Grading of Practicums: Practicums will be graded cumulatively instead of individually. At about the middle of the quarter, Prof. Liu will provide each student with an assessment of how they have been doing in their practicums so far (e.g., whether a student is doing A or B-level work). Then at the end of the quarter, Prof. Liu will give a cumulative grade for a student’s practicums. The grade will be based on the following criteria:
Google Books Ngram Viewer Exercise
Learn to use the Google Books Ngram Viewer, including at least a few of the “advanced features” explained on the “About Ngram Viewer” page. Then use the Ngram Viewer to explore some topic or issue of interest to you.
Text Encoding Exercise
(A) Simple HTML Exercise (with an added conceptually challenging problem)
Step 1 (hands-on exercise): Go to the W3Schools HTML Tutorial page and look quickly at the explanations in the first 7 lessons linked in the left sidebar (from “HTML Introduction” to “HTML Paragraphs”). Also selectively look at the explanations for other lessons of interest to you. Be sure to look as well at the explanation of “HTML CSS.” (Students already familiar with HTML and CSS can skip these explanations.)
Then on the home page of the W3Schools HTML Tutorial, click on the “Try it Yourself” button to open a two-column interactive page where you can write HTML on the left, hit the “Run” button, and see the results rendered on the right. On this page, write HTML to create a simple web page with any content, images, and links you wish (subject, of course, to good taste and copyright laws). The page should include at least the following features:
- Text formatted in basic ways (as headers, bold, italics, etc.)
- Text in paragraph structures
- Text in lists
- A table
- An image
Also experiment with simple CSS (Cascading Style Sheets) to adjust the format/style of various elements on your test page. Students already familiar with HTML and CSS can conduct more advanced exercises if they wish.
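As a starting point, here is a minimal sketch of a page that covers the features listed above, wrapped in a short Python script that checks which features are present. All of the page content, the file name placeholder.jpg, and the helper name missing_features are invented for illustration; the HTML inside the triple quotes can be pasted directly into the left column of the “Try it Yourself” editor.

```python
# A small HTML test page held in a string; all content is placeholder text.
page = """<!DOCTYPE html>
<html>
<head>
<title>Practicum Test Page</title>
<style>
  /* Simple CSS: adjust the style of headers, paragraphs, and the table */
  h1 { color: darkblue; font-family: Georgia, serif; }
  p { line-height: 1.5; }
  table, th, td { border: 1px solid gray; border-collapse: collapse; padding: 4px; }
</style>
</head>
<body>
<h1>My Test Page</h1>
<p>This paragraph has <b>bold</b> and <i>italic</i> text.</p>
<p>This second paragraph has a <a href="https://example.com">link</a>.</p>
<ul>
  <li>First list item</li>
  <li>Second list item</li>
</ul>
<table>
  <tr><th>Tool</th><th>Purpose</th></tr>
  <tr><td>HTML</td><td>Structure</td></tr>
  <tr><td>CSS</td><td>Style</td></tr>
</table>
<img src="placeholder.jpg" alt="Description of the image" width="200">
</body>
</html>
"""

# Tags that correspond to the required features (a rough check, not a validator)
REQUIRED_TAGS = ["<h1>", "<b>", "<i>", "<p>", "<ul>", "<table>", "<img"]


def missing_features(html_text):
    """Return the required tags that do not appear in the HTML string."""
    return [tag for tag in REQUIRED_TAGS if tag not in html_text]


if __name__ == "__main__":
    print("Missing features:", missing_features(page))
```

The check is only a plain substring search, so it will not notice a tag written with attributes (for example, a table tag with a class). It is meant as a quick reminder of the checklist, not as a test of valid HTML.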
Step 2 (conceptual exercise): This is a purely conceptual exercise; you don’t actually have to do it (unless you are interested). Take a look at E. E. Cummings’ well-known poem “r-p-o-p-h-e-s-s-a-g-r” (1935). (If you are interested, you can read an analysis of the poem, also known as “Grasshopper,” here.) Think about what might be the best approach to encoding this in HTML, and be prepared to share your ideas in class discussion. (Note that multiple spaces entered in HTML code normally render as a single space.)
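To make the whitespace problem concrete, the sketch below imitates how a browser collapses runs of spaces and then shows two possible ways of keeping the spacing. The sample line is invented (it is not from the poem), the function names are hypothetical, and these are only two of several possible approaches, not a recommended answer to the exercise.

```python
import html
import re


def browser_collapse(text):
    """Approximate default HTML rendering: each run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", text).strip()


def preserve_with_pre(text):
    """Option 1: wrap the text in <pre>, which keeps spaces and line breaks as typed."""
    return "<pre>" + html.escape(text) + "</pre>"


def preserve_with_nbsp(text):
    """Option 2: turn every space into a non-breaking space and every line break into <br>."""
    escaped = html.escape(text)
    return escaped.replace(" ", "&nbsp;").replace("\n", "<br>\n")


# An invented two-line sample with irregular spacing
sample = "a   word\n      set far    apart"

if __name__ == "__main__":
    print(browser_collapse(sample))
    print(preserve_with_pre(sample))
    print(preserve_with_nbsp(sample))
```

A third route is to leave the text alone and set the CSS property white-space to pre or pre-wrap on the enclosing element. Each option raises a different question worth discussing: is the spacing part of the text itself, or part of its styling?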
(B) Imaginary TEI/XML Exercise
In this exercise, you will automatically generate a basic TEI XML encoding of Lincoln’s Gettysburg Address (Bliss copy, 1864) and then make up additional tags to encode something you think is important or of interest to you in the text.
- First download and save this text file to your computer:
lincoln-gettysburg-address-bliss-copy-1864.txt. (Save as a file with a “.txt” extension.)
- Go to the Text Encoding Initiative’s OxGarage Conversion site, and follow this process:
- For “Convert from: ?”, click on “Documents”. Among the options for document formats, choose “Plain Text (.txt).”
- Then after a wait, you will see the prompt “Convert to: ?” appear to the right. Choose among the options “TEI P5 XML Document”.
- When the next set of prompts appears at the top of the screen, click on “Choose File” and select the Lincoln text file where you stored it on your computer.
- Then click on “Convert”, which will generate a TEI/XML encoded version of the Lincoln text and download it to your computer with the file extension .xml.
- Open this .xml file in a text editor (such as Notepad, Notepad++, or TextEdit) so that you are prepared to copy and paste from it.
- Next go to the XML Viewer page on the Code Beautify site.
- Copy and paste the content of the Lincoln .xml file you previously created into the “XML Input” column on the left.
- Then click the button in the center of the page for “Beautify Format” to generate a nicely formatted version of the XML in the column at the right.
- Then click on “Download” in the center of the page to download the beautified version of the Lincoln .xml file. (Or just copy-and-paste from the column on the right into a text editor like Notepad, Notepad++, or TextEdit, and save the file as a .xml file.)
- Now think about what you would like to encode in the beautified Lincoln .xml file that seems important or interesting, and how to do it. Since this is an imaginary exercise, you do not need to study the TEI guidelines for actual tags designating elements, attributes, etc. (If you are interested, though, you can get a sense of what is possible by browsing the TEI by Example Tutorials, such as the one for encoding prose.) Just make up your own tags. For example, you could make up a tag to use in the first sentence of Lincoln’s speech such as this one (a tag for the element “metaphor” that has attributes for “type” and whatever else you wish): “Four score and seven years ago <metaphor type="genealogical" subtype="patriarchal" motive="call_for_authority">our fathers</metaphor>….” Encode part or all of the text with one or two kinds of tags of this sort.
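If you want to check that your made-up tags are well formed (every opening tag closed, attribute values in plain straight quotation marks), a few lines of Python can do it. This is an optional sketch using Python’s built-in XML parser on a short stand-in string rather than on the full converted file; the function names list_tagged and is_well_formed are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for one paragraph of the encoded speech.
# The <metaphor> tag and its attributes are made up, as in the exercise.
snippet = (
    '<p>Four score and seven years ago '
    '<metaphor type="genealogical" subtype="patriarchal" '
    'motive="call_for_authority">our fathers</metaphor> '
    'brought forth on this continent, a new nation</p>'
)


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def list_tagged(xml_text, tag):
    """Return (text, attributes) for every element whose local name is `tag`."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        # Drop any "{namespace}" prefix so the check also works inside a TEI file
        local_name = element.tag.split("}")[-1]
        if local_name == tag:
            found.append((element.text, dict(element.attrib)))
    return found


if __name__ == "__main__":
    print(is_well_formed(snippet))
    for text, attributes in list_tagged(snippet, "metaphor"):
        print(text, attributes)
```

Well formed is a weaker test than valid: the parser only checks the syntax of the tags, not whether they belong to the TEI guidelines, which is exactly what an imaginary encoding needs.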
Text Analysis Exercise
What you will need for this exercise:
- AntConc program (download)
- A stopwords list (download the following stopwords list and save it as a plain-text “.txt” file on your computer: buckley-salton.txt)
- A long document or set of documents (up to several hundred documents if you wish) that you have access to as plain text files (stored as “.txt” files). [See suggestions for sources of texts below]
Instructions for this exercise:
- Download onto your computer the AntConc text-analysis program (available for Mac, Windows, Linux). AntConc comes as a simple executable file that does not need to be “installed.” You just run the file.
- Note for Mac Users: When you try to open AntConc, there will be a security message that says the app was prevented from opening. In order to get around this, you need to click the Apple icon, go to System Preferences, then Security & Privacy, and (if you haven’t already) change your preferences to Allow apps downloaded from “App Store and identified developers.” Even if these are your settings, the system will prevent AntConc from opening (it’s not an identified developer), but there will be a caption next to the radio buttons that says something along the lines of “AntConc was prevented from opening” and a button labeled “Open Anyway.” Click Open Anyway, and you shouldn’t have any issues running the application after that.
- Find or create a plain-text (.txt) version of a long literary work or collection of works. Possible sources:
- Alan keeps a set of demo text corpora on his DH Toychest site here.
- Tutorials for AntConc:
- Basic Instructions for using AntConc: (click on thumbnails for larger screenshot images)
- Instructions for using a “stopwords” list in AntConc (to filter out common words like “the,” “of,” etc.):
- Download the following stopwords list and save it as a plain-text “txt” file on your computer: buckley-salton.txt
- In AntConc, click on “Tool Preferences” among the tabs at the top. Then follow these steps:
- Instructions for loading a “reference” comparison file(s) for use with AntConc’s “Keywords” function (to see which words are most distinctive of a text, or have the most “keyness,” compared with the reference files):
- Have available a reference comparison file, or multiple files on your computer (must be plain-text “txt” files)
- In AntConc, click on “Tool Preferences” among the tabs at the top. Then follow these steps:
- Now you can start exploring the document file(s) you chose for study using AntConc. The best way to start is to count all the words in the document(s):
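For those curious about what a word list, a stopwords filter, and a “keyness” score involve under the hood, here is a minimal Python sketch. It is not AntConc’s own code: the tokenizer is deliberately crude, the sample sentence and short stopword set are invented, and log-likelihood is shown only as one commonly used keyness statistic, which may or may not match the settings in your copy of AntConc.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def load_stopwords(path):
    """Read a stopwords file with one word per line (e.g., buckley-salton.txt)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def word_counts(text, stopwords=frozenset()):
    """Count every word in the text, skipping any word in the stopword set."""
    return Counter(word for word in tokenize(text) if word not in stopwords)


def log_likelihood(a, b, c, d):
    """Keyness of one word: a, b = its counts in the target and reference texts;
    c, d = total word counts of the target and reference texts."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_target)
    if b:
        score += b * math.log(b / expected_reference)
    return 2 * score


# An invented sample sentence and a very short stand-in stopword set
sample = "The whale, the white whale! Of the whale I sing."
stop = {"the", "of", "i"}
counts = word_counts(sample, stop)

if __name__ == "__main__":
    print(counts.most_common(3))
```

A word that is equally frequent in the target and the reference scores zero; the more lopsided its use, the higher the score. That is the intuition behind a keyword list, whatever statistic a given tool uses.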
Topic Modeling Exercise
- Read through (and, if you wish, try) the lesson plan in Shawn Graham, Ian Milligan, Scott Weingart, “Topic Modeling By Hand” (from The Historian’s Macroscope).
- Experiment with the Topic Modeling Tool, which provides a simplified, GUI front-end for the underlying MALLET topic modeling tool. (See the Topic Modeling Tool’s page linked above for download and operation instructions. Under “Optional Settings” in the program you can load a stopword list. For example, download the following stopwords list and save it as a plain-text “.txt” file on your computer: buckley-salton.txt. Or you can find stopword lists in many languages here and here.)
[Optional, more ambitious exercise: download, install, and experiment with the actual MALLET topic-modeling tool, which runs from the command line. (See The Programming Historian Tutorial “Building a Topic Model with MALLET” for instructions on installing and running MALLET).]
An ideal experiment is to topic model a relatively small collection of texts (e.g., 10 to 100 documents you have extracted as plain text and put in a folder) or a “chunked” plain-text version of a long text (e.g., a novel with separate files for each chapter). If you wish, you can use any of the ready-to-go text collections in the “Demo Corpora” section of Alan’s DH Toychest.
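If you need to “chunk” a long text yourself, the following Python sketch shows two simple ways of doing it: by a fixed number of words, or at chapter headings. The heading pattern (lines beginning with “CHAPTER”), the default chunk size, and the function names are all assumptions made for illustration; adjust them to the formatting of your own text.

```python
import re
from pathlib import Path


def chunk_by_words(text, chunk_size=1000):
    """Split the text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


def chunk_by_heading(text, heading_pattern=r"^CHAPTER[ \t]+\S+.*$"):
    """Split the text at lines that look like chapter headings.
    The heading lines themselves are dropped; empty pieces are skipped."""
    pieces = re.split(heading_pattern, text, flags=re.MULTILINE)
    return [piece.strip() for piece in pieces if piece.strip()]


def write_chunks(chunks, folder, prefix="chunk"):
    """Save each chunk as its own numbered .txt file inside `folder`."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for number, chunk in enumerate(chunks, start=1):
        file_path = out_dir / f"{prefix}_{number:03d}.txt"
        file_path.write_text(chunk, encoding="utf-8")
```

For example, write_chunks(chunk_by_heading(novel_text), "my_corpus") would fill a folder named my_corpus with one file per chapter (plus one for any front matter before the first heading), ready to load into the Topic Modeling Tool. Here novel_text and my_corpus are placeholder names.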
Word Embedding Exercise
- Try the TensorFlow Embedding Projector (by Google AI)
- Go to TensorFlow Embedding Projector.
- Try various experiments. (See this Google AI blog post for some ideas of things you can do.) For example, enter a word to explore (in the search field at the top right) and click on one of the related words that appears in the list labeled “Nearest points in the original space.” Then click on “Isolate 101 points” and observe the nearness/farness of words from other words.
- Try the Word Embedding Demo (by S. Bandyopadhyay et al.)
- Go to Word Embedding Demo. (Note: it takes a minute or two for the demo to load, during which you will see the message “downloading model…”)
- Open the “Experiments” link in a different tab. Try at least a few of the suggestions under “Basic Exploration,” “Analogies,” and “Semantic Dimensions.” P.S.: If you are interested, the creators of this demo published a research article about it: Saptarashmi Bandyopadhyay et al., “Interactive Visualizations of Word Embeddings for K-12 Students” (2022).
- Optional: For fun, try the Semantris game from Google AI (in its “Arcade” or “Blocks” variants)!
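The “nearest points” and “analogies” you see in these demos rest on simple vector arithmetic, which can be sketched in a few lines of Python. The three-dimensional vectors below are invented toy values (real models use hundreds of dimensions learned from large corpora), and the function names are hypothetical; the sketch only illustrates the idea of cosine similarity and of analogy by vector offset.

```python
import math

# Toy vectors invented for illustration only
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [-0.5, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0 unrelated, negative opposed."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)


def nearest(word, k=3):
    """Return the k words whose vectors are most similar to the given word's vector."""
    scores = [
        (cosine(vectors[word], vector), other)
        for other, vector in vectors.items()
        if other != word
    ]
    return [other for _, other in sorted(scores, reverse=True)[:k]]


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (cosine(target, vector), word)
        for word, vector in vectors.items()
        if word not in (a, b, c)
    ]
    return max(candidates)[1]


if __name__ == "__main__":
    print(nearest("king"))
    print(analogy("man", "king", "woman"))
```

With these toy values the analogy “man is to king as woman is to ?” comes out as “queen,” but only because the numbers were chosen to make it so. In a real model the result depends on the training corpus, which is part of what the demos invite you to probe.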
GPT-2 and Craiyon Exercise
- Try InferKit (by Adam King) or Write with Transformer GPT-2 (by Hugging Face). Use one or both of these demos of GPT-2 to generate the beginning of an essay, story, or some other text.
- Note: You can also try Sudowrite (by Human++). This is a paid service but allows you to sign up for a temporary free account.
- Try Craiyon (by Craiyon LLC; formerly called “DALL-E mini”). Use the Craiyon tool to generate one or more images from a text prompt. Try interesting or challenging experiments: not just a short prompt such as “flying cats,” for example, but longer prompts, or prompts combining objective and subjective terms, such as “Sad cats wearing gowns, looking up with hope at the far moon and wishing that they could be flying dragons dropping hope blossoms.” See what is and is not possible to do with the tool.
- Note: On the actual DALL-E neural network from OpenAI (a text-to-image model built on GPT-3), see this blog post.
Image generated using the Craiyon.com tool from the text prompt: “Sad cats wearing gowns, looking up with hope at the far moon and wishing that they could be flying dragons dropping hope blossoms.”
Social Network Analysis Exercise (Part A)
This is an optional practicum for those who have a Twitter account. Use one or both of the following online services to explore networks of words, hashtags, or users on Twitter. (Create a free account on these services; and then allow the association of the service with your Twitter account, which will enable the services to use the Twitter API for their searches.)
Social Network Analysis Exercise (Part B)
- Manually create a small-scale social network analysis of a literary work (or section of a work) by deciding what nodes and edges you want to study. Read through the work and record the nodes and edges. (For example: nodes = characters in a play; edges = characters who speak to each other.)
- Enter the nodes and edges in a spreadsheet using Excel or Google Sheets following the instructions here.
- (Note that for the yEd graphing program mentioned below, you need to create the Nodes and Edges tables on the same sheet in the spreadsheet, not on separate sheets. However, if you choose to use the Gephi graphing program instead (mentioned further below), the Nodes and Edges tables need to be created in separate sheets that you can import into Gephi separately in .csv format.)
- Then use the yEd graphing program (download here) to open and visualize the spreadsheet.
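Before opening a graphing program, it can help to see how little data a network really needs: a list of edges is enough to derive the nodes and a first measure such as degree (how many connections each node has). The Python sketch below uses invented character names, and the “Source” and “Target” column headers are an assumption based on common Gephi conventions; follow the linked spreadsheet instructions for the exact layout each program expects.

```python
import csv
import io
from collections import Counter

# Hypothetical edges: each pair is two characters who speak to each other
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
]


def node_list(edge_pairs):
    """Derive the alphabetical list of nodes from the edge list."""
    return sorted({name for pair in edge_pairs for name in pair})


def degrees(edge_pairs):
    """Count how many edges touch each node (undirected degree)."""
    counts = Counter()
    for source, target in edge_pairs:
        counts[source] += 1
        counts[target] += 1
    return counts


def edges_csv(edge_pairs):
    """Return the edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edge_pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(node_list(edges))
    print(degrees(edges).most_common())
    print(edges_csv(edges))
```

Even this tiny example forces the interpretive decisions the exercise is about: what counts as an edge, whether direction matters (who speaks to whom), and whether repeated exchanges should add weight.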