- Upcoming readings
- A look ahead to the conceptual spreadsheet assignment due for Class 8
- Example of “easy” and “hard” spreadsheets.
- Exporting to CSV format.
- A look ahead to the in-class activity for next Tuesday (Class 9)
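As a preview of the CSV export step mentioned above, here is a minimal sketch of what a spreadsheet looks like once written out as CSV. It uses Python's standard `csv` module; the column names and rows are illustrative assumptions, not the assignment's actual data.

```python
import csv
import io

# Hypothetical spreadsheet rows (illustrative only): a header plus two records.
rows = [
    ["title", "author", "year"],
    ["Ode on a Grecian Urn", "Keats, John", 1819],
    ["Undefined By Data: A Survey of Big Data Definitions",
     "Ward, Jonathan Stuart; Barker, Adam", 2013],
]

# Write to an in-memory buffer instead of a file so the sketch is self-contained.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)
csv_text = buffer.getvalue()

# Fields that contain a comma are wrapped in double quotes on export.
print(csv_text)
```

Note that CSV keeps only cell values: formulas, formatting, and multiple sheets do not survive the export, and every value comes back as text when the file is read again.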
Plan for class:
- Discussion of big data (continued)
- Discussion of data structures & models
Epigraph for Class
John Keats, “Ode on a Grecian Urn” (1819)
“Beauty is truth, truth beauty,—that is all
Ye know on earth, and all ye need to know.”
V V V V
Jonathan Stuart Ward and Adam Barker, “Undefined By Data: A Survey of Big Data Definitions” (2013)
[The Gartner Report, 2001] proposed a three fold definition encompassing the “three Vs”: Volume, Velocity, Variety…. This definition has since been … expanded upon by … others to include a fourth V: Veracity.
Volume | Velocity | Variety | Veracity
Intel is one of the few organisations to provide concrete figures in their literature. Intel links big data to organisations “generating a median of 300 terabytes (TB) of data weekly.”
Estimates are that the big four [Google, Amazon, Microsoft and Facebook] store at least 1,200 petabytes between them. That is 1.2 million terabytes (one terabyte is 1,000 gigabytes). And that figure excludes other big providers like Dropbox, Barracuda and SugarSync, to say nothing of massive servers in industry and academia.
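The unit conversion in the estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, as the quotation itself does ("one terabyte is 1,000 gigabytes"); the constant and function names are illustrative.

```python
# Decimal (SI) storage units, matching the quotation's own convention.
GB_PER_TB = 1_000
TB_PER_PB = 1_000

def petabytes_to_terabytes(petabytes):
    """Convert petabytes to terabytes using decimal prefixes."""
    return petabytes * TB_PER_PB

big_four_pb = 1_200                        # figure quoted above
big_four_tb = petabytes_to_terabytes(big_four_pb)
big_four_gb = big_four_tb * GB_PER_TB

print(f"{big_four_pb:,} PB = {big_four_tb:,} TB = {big_four_gb:,} GB")
```

With binary prefixes (1 PB = 1,024 TB) the terabyte figure would be slightly larger, which is one reason volume estimates like this are approximate.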
Locating computers owned by HFT [High-Frequency Trading] firms and proprietary traders in the same premises where an exchange’s computer servers are housed … enables HFT firms to access stock prices a split second before the rest of the investing public. Co-location has become a lucrative business for exchanges, which charge HFT firms millions of dollars for the privilege of “low latency access.”
Daniel Rosenberg, “Data before the Fact”
There are important distinctions here: facts are ontological, evidence is epistemological, data is rhetorical. A datum may also be a fact, just as a fact may be evidence. But, from its first vernacular formulation, the existence of a datum has been independent of any consideration of corresponding ontological truth. When a fact is proven false, it ceases to be a fact. False data is data nonetheless. (18)
It is tempting to want to give data an essence, to define what exact kind of fact data is. But this misses the most important aspect of the term, and it obscures why the term became so useful in the mid-twentieth century. Data has no truth. Even today, when we speak of data, we make no assumptions at all about veracity. Electronic data, like the data of the early modern period, is given. It may be that the data we collect and transmit has no relation to truth or reality whatsoever beyond the reality that data helps us to construct. (37)
Matthew L. Jones, “How We Became Instrumentalists (Again): Data Positivism since World War II” (2018) (pp. 674, 675)
Article in Lawtomated on “Explainable AI” (2020)
Matthew L. Jones, “Querying the Archive: Data Mining from Apriori to PageRank,” p. 314
Data mining concerns databases of very large size—millions or billions of records, usually with elements of high dimensionality (meaning that every record typically comprises a large number of elements). For each record in a retail database, a data mining operation might seek unexpected relationships among the item purchased, the store’s zip code, the purchaser’s zip code, variety of credit card, time of day, date of birth, other items purchased at the same time, even every item viewed, or the history of every previous item purchased or returned. Performing reasonably fast analyses of high dimensional, messy real-world data is central to the identity and purpose of data mining, in contrast to its predecessor fields such as statistics and machine learning….
Something else is needed, something less pure—because it deals with vast impurities of dynamic data, nearly always from a particular business, governmental, or scientific research goal.
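To make "high dimensionality" and "unexpected relationships" concrete, here is a minimal sketch with invented retail records. Each record carries several fields, and the code counts which pairs of items are bought together. This is simple pair counting, the kind of step that association-rule methods build on, and not an implementation of Apriori or of any method Jones describes; all field names, values, and the `min_support` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Invented records: each one has several dimensions besides the items bought.
transactions = [
    {"store_zip": "ZIP_A", "card": "credit", "hour": 9,  "items": {"bread", "milk"}},
    {"store_zip": "ZIP_A", "card": "debit",  "hour": 17, "items": {"bread", "milk", "eggs"}},
    {"store_zip": "ZIP_B", "card": "credit", "hour": 12, "items": {"eggs", "tea"}},
    {"store_zip": "ZIP_B", "card": "credit", "hour": 18, "items": {"bread", "eggs"}},
]

# Count how often each pair of items appears in the same transaction.
pair_counts = Counter()
for record in transactions:
    for pair in combinations(sorted(record["items"]), 2):
        pair_counts[pair] += 1

# Keep only pairs seen at least min_support times (threshold chosen arbitrarily).
min_support = 2
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)
```

Even in this toy case the number of candidate pairs grows quickly with the number of distinct items, which hints at why speed on very large databases is treated as central to data mining.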
Text Structures & Models
Digital Data Structures & Models (selected)
- Relational database (“table,” “record”)
- Key-value pairs (and dictionaries or arrays)
- Associated methods
- Linked Data
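The four structures listed above can be compared by storing the same small fact in each. This is a minimal sketch, not a definitive treatment: it reads "associated methods" as methods attached to an object, which is an assumption, and the table name, class name, and `ex:` identifiers are all hypothetical.

```python
import sqlite3

# 1. Relational database: a table holds records (rows) with fixed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poems (id INTEGER PRIMARY KEY, title TEXT, author TEXT, year INTEGER)"
)
conn.execute(
    "INSERT INTO poems (title, author, year) VALUES (?, ?, ?)",
    ("Ode on a Grecian Urn", "John Keats", 1819),
)
record = conn.execute(
    "SELECT title, author, year FROM poems WHERE year = ?", (1819,)
).fetchone()
conn.close()

# 2. Key-value pairs: a dictionary maps each key to a value.
poem = {"title": "Ode on a Grecian Urn", "author": "John Keats", "year": 1819}

# 3. Associated methods: data bundled with the operations that act on it.
class Poem:
    def __init__(self, title, author, year):
        self.title = title
        self.author = author
        self.year = year

    def citation(self):
        return f"{self.author}, \"{self.title}\" ({self.year})"

# 4. Linked Data: facts as (subject, predicate, object) triples that can
#    point at other subjects. The "ex:" identifiers are made up.
triples = [
    ("ex:OdeOnAGrecianUrn", "ex:title", "Ode on a Grecian Urn"),
    ("ex:OdeOnAGrecianUrn", "ex:author", "ex:JohnKeats"),
    ("ex:JohnKeats", "ex:name", "John Keats"),
]

def objects(subject, predicate):
    """Return every object linked to a subject by a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(record)
print(poem["author"])
print(Poem(**poem).citation())
print(objects("ex:OdeOnAGrecianUrn", "ex:author"))
```

The same information appears four times, but each structure makes different questions easy: the table supports queries across many records, the dictionary supports direct lookup by key, the object carries its own behavior, and the triples let one fact link to another.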