Class 3 (English 238 – Fall 2019)

Class Business

Plan for class:

  • Tools for DH work
  • Discussion

Tools for DH Work

“Data is . . .”

  • Gitelman and Jackson, “Introduction” to “Raw Data” is an Oxymoron:

The word data has become what is called a mass noun, so it can take a singular verb. Sentences that include the phrase “data is …” are now roughly four times as common (on the web, at least, and according to Google) as those including “data are …” despite countless grammarians out there who will insist that data is a plural. (8)

  • Daniel Rosenberg, “Data before the Fact”:

Already in the eighteenth century, stylists argued over whether the word was singular or plural, and whether a foreign word of its ilk belonged in English at all. In Latin, data is always plural, but in English, even in the eighteenth century, common usage has allowed “data” to function either as a plural or as a collective singular. (18-19)

“Data is given . . .”

  • Daniel Rosenberg, “Data before the Fact”:

The word “data” comes to English from Latin. It is the plural of the Latin word datum, which itself is the neuter past participle of the verb dare, to give. A “datum” in English, then, is something given in an argument, something taken for granted. (18) [cf., “Analyse des données”]

In these early years, the term “data” was still employed, especially in the realm of mathematics, where it retained the technical sense that it has in Euclid, as quantities given in mathematical problems, as opposed to the quaesita, or quantities sought, and in the realm of theology, where it referred to scriptural truths — whether principles or facts — that were given by God and therefore not susceptible to questioning. In the seventeenth century, in theology, one could already speak of “historical data,” but “historical data” referred to precisely the sorts of information that were outside of the realm of the empirical. These were the God-given facts and principles that grounded the historian’s ability to determine the quaesita of history. (19)

It is tempting to want to give data an essence, to define what exact kind of fact data is. But this misses the most important aspect of the term, and it obscures why the term became so useful in the mid-twentieth century. Data has no truth. Even today, when we speak of data, we make no assumptions at all about veracity. Electronic data, like the data of the early modern period, is given. It may be that the data we collect and transmit has no relation to truth or reality whatsoever beyond the reality that data helps us to construct. (37)

  • David Donoho, “50 Years of Data Science”:

Generative modeling, in which one proposes a stochastic model that could have generated the data, and derives methods to infer properties of the underlying generative mechanism. This roughly speaking coincides with traditional academic statistics and its offshoots.

Predictive modeling, in which one constructs methods which predict well over some given data universe—that is, some very specific concrete dataset. This roughly coincides with modern Machine Learning, and its industrial offshoots.
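
One way to see Donoho's contrast concretely is the sketch below. It is an illustration only, not anything from Donoho's essay: the toy data, the straight-line model, and the function names `fit_linear_gaussian` and `held_out_error` are all assumptions made for the example. The first function takes the generative stance (propose a stochastic model, then infer its parameters); the second takes the predictive stance (score any predictor by how well it does on a fixed set of held-out points).

```python
import random
import statistics


def fit_linear_gaussian(points):
    """Generative stance: assume y = slope * x + intercept + Gaussian noise,
    then estimate the parameters of that assumed mechanism (least squares)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in points]
    noise_sd = statistics.pstdev(residuals)  # estimated spread of the noise
    return slope, intercept, noise_sd


def held_out_error(predict, test_points):
    """Predictive stance: ignore the mechanism and measure mean squared
    error of any predictor on a specific held-out dataset."""
    return statistics.fmean([(predict(x) - y) ** 2 for x, y in test_points])


# Hypothetical toy data: a line plus random noise (not real observations).
rng = random.Random(0)
toy_points = [(x, 3.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(40)]
rng.shuffle(toy_points)
train, test = toy_points[:30], toy_points[30:]

slope, intercept, noise_sd = fit_linear_gaussian(train)
error = held_out_error(lambda x: slope * x + intercept, test)
print("inferred mechanism:", slope, intercept, noise_sd)
print("held-out squared error:", error)
```

The generative output is a claim about how the data came to be (a slope, an intercept, a noise level); the predictive output is only a score on one dataset, which any black-box predictor could be asked to beat.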

  • Matthew L. Jones, “How We Became Instrumentalists (Again): Data Positivism since World War II”:

By the late 1960s, most researchers in pattern recognition ultimately cared little whether neural networks in any way replicated human cognition; the networks were tools for prediction, not means for understanding the brain: “Whatever success we have had [has] been the result of an effective transformation of a perception-recognition problem into a classification problem.” (677)

By the late 1990s, a growing literature began to show, Leo Breiman argued, that “combining a multiple set of predictors, all constructed using the same data, can lead to dramatic decreases in test error.” This predictive success came at great cost. “At the end of the day, what we are left with is an almost inscrutable prediction function combining many different predictors. But the resulting predictor can be quite accurate.” Despite their epistemologically questionable status, such inscrutable combinations predict better. A bevy of techniques with snappy names emerged to create such ensembles: bagging, boosting, arcing, etc. (683)
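
Of the ensemble techniques Jones names, bagging (bootstrap aggregating) is the simplest to sketch. The code below is a minimal illustration under stated assumptions, not Breiman's own procedure or code: the toy data are invented, the base predictor is a one-nearest-neighbor lookup chosen only for brevity, and the function names are hypothetical. Each predictor is built from a resample of the same data, and the ensemble simply averages their outputs.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]


def nearest_neighbor_predictor(sample):
    """Base predictor (an assumption for this sketch): return the y value
    of the sample point whose x is closest to the query."""
    def predict(x):
        nearest = min(sample, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict


def bagged_predictor(data, n_predictors=25, seed=0):
    """Build many predictors, each from a bootstrap resample of the same
    data, and combine them by averaging their predictions."""
    rng = random.Random(seed)
    predictors = [
        nearest_neighbor_predictor(bootstrap_sample(data, rng))
        for _ in range(n_predictors)
    ]

    def predict(x):
        return statistics.mean([p(x) for p in predictors])

    return predict, predictors


# Hypothetical toy data: a noisy line (not real observations).
data_rng = random.Random(42)
toy_data = [(float(x), 2.0 * x + data_rng.gauss(0, 3)) for x in range(20)]

predict, predictors = bagged_predictor(toy_data)
print("single predictor at x=7.5:", predictors[0](7.5))
print("bagged ensemble at x=7.5: ", predict(7.5))
```

The combined `predict` function illustrates the inscrutability the passage describes: it is an average over twenty-five lookups on twenty-five resamples, with no single rule a reader could point to.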

  • Gitelman and Jackson, “Introduction” to “Raw Data” is an Oxymoron:

One omission, certainly, which this Introduction accentuates with its brief attention to English usage and the history of concepts, is any account of non-Western contexts or intercultural conjunctions that might illuminate and complicate data past and present. (11)

Priestley vs. Drucker

Joseph Priestley, A Chart of Biography (1765)

Johanna Drucker

Bowker and Star

Seventeenth-century mortality table (in Bowker and Star, Sorting Things Out, pp. 22-23)

Final thoughts

Sister Miriam Joseph, “What Are the Liberal Arts?” (2002)

Sister Miriam Joseph, What are the Liberal Arts, Figure 3
Sister Miriam Joseph, What are the Liberal Arts, Figure 4