“Data Moves: Libraries and Data Science Workflows”

Citation: “Data Moves: Libraries and Data Science Workflows.” Libraries and Archives in the Digital Age. Ed. Susan Mizruchi. Cham: Palgrave Macmillan, 2020: 211–219.

  • Abstract: Library-based collections and repositories are today advancing well beyond accumulating resources in digital form for searching, reading, and other primary access. New advances toward treating collections as “always already data” facilitate next-generation computational uses of digitized materials, for example treating collections as datasets for advanced data-mining analysis.
    In considering how library collections can serve as data for ingestion, transformation, analysis, reproduction, presentation, and circulation, it may be useful to compare examples of data workflows across disciplines. Such a comparison can identify common data-analysis “moves,” as well as points in the data trajectory that are brittle for various reasons and therefore especially need library support. Drawing on the precedent of so-called in silico science, which has had a ten-year head start in developing methods and standards for tracking the provenance of data, annotating and visualizing data-analysis workflows for reproducibility, and comparing data workflows in different fields, Liu argues that other disciplines, such as the humanities and social sciences, can exploit today’s library data collections in similar ways. The goal is twofold: first, open, shareable, and reproducible data scholarship; and second, higher-level or meta-level analysis of such scholarship. For example, might methods for comparing data workflows in the sciences (seeing, e.g., how astrophysics compares with medical science in its use of data) be extended across the disciplines to the digital humanities, digital arts, and digital social sciences?

    Beyond borrowing science data paradigms for other disciplines, Liu also reasons in the reverse direction. He draws on the twentieth-century tradition of literary and ethnographic analysis (for example, the idea of the narrative “motif” or “move,” in the Russian: mov) to suggest that humanities and social science approaches to data workflows are just as crucial as scientific ones. After all, however one analyzes data, and in whatever field, one ultimately has to tell the story of that workflow and its results. That puts the problem squarely in the domain of narrative motifs and moves, which Liu argues can be matched to data workflow moves.