Linked Data for modelling and replicating the knowledge production process in data-driven humanities research

Stapel, R.J. and Zandhuis, I. (2024) ‘Linked Data for modelling and replicating the knowledge production process in data-driven humanities research’, Digital Scholarship in the Humanities, 39, pp. i1–i8. Available at: https://doi.org/10.1093/llc/fqae038.

The steps from primary source to scholarly claim typically follow a ‘breadcrumb trail’ of wide-ranging decisions by one or more scholars. For replication studies, these decisions must be identified and made as explicit as possible so that the results can be faithfully reproduced. We use a replication study of premodern population estimates as a use case to gain more insight into the kinds of scholarly practices that need to be made explicit, and to think about possible solutions for modelling this process with future replicability in mind. We argue that Linked Data is the logical and flexible technique for modelling such a replication study. For our use case, it allows us to capture both the data itself and the individual decisions made to arrive at a scientifically derived population estimate based on, for example, geographical phenomena or source criticism. In this article, we set an agenda for the use of Linked Data in data-driven humanities research by exploring the benefits, opportunities, and challenges associated with using Linked Data to conduct a replication study in the humanities (and to promote the reproducibility of humanities research more generally). We identify three key aspects that can benefit from the flexibility offered by Linked Data: increasing the trustworthiness of quantitative studies; facilitating the reproducibility of future studies; and facilitating the interpretation of replication results.
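To illustrate what capturing 'both the data itself and the individual decisions' might look like in practice, the following is a minimal sketch rather than the authors' actual data model. It assumes that decisions are recorded as activities linked to their inputs and outputs with the W3C PROV-O properties `prov:used` and `prov:wasGeneratedBy`; the abstract does not say which vocabulary the article uses. The `ex:` prefix, all resource names, and the helper functions are hypothetical, and the graph is held in plain Python tuples instead of a triple store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative graph only: every "ex:" identifier is a hypothetical placeholder,
# not taken from the article. prov:used and prov:wasGeneratedBy are PROV-O terms.
GRAPH = {
    Triple("ex:estimate", "rdf:type", "ex:PopulationEstimate"),
    Triple("ex:estimate", "prov:wasGeneratedBy", "ex:geographicalAdjustment"),
    Triple("ex:geographicalAdjustment", "rdf:type", "ex:ScholarlyDecision"),
    Triple("ex:geographicalAdjustment", "prov:used", "ex:cleanedCount"),
    Triple("ex:cleanedCount", "prov:wasGeneratedBy", "ex:sourceCriticism"),
    Triple("ex:sourceCriticism", "rdf:type", "ex:ScholarlyDecision"),
    Triple("ex:sourceCriticism", "prov:used", "ex:primarySource"),
    Triple("ex:primarySource", "rdf:type", "ex:PrimarySource"),
}


def objects(graph, subject, predicate):
    """Return the objects of all triples matching subject and predicate, sorted."""
    return sorted(
        t.obj for t in graph if t.subject == subject and t.predicate == predicate
    )


def breadcrumb_trail(graph, entity):
    """Walk from a claim back to its sources, listing each decision and input once."""
    trail, seen, stack = [], set(), [entity]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        trail.append(node)
        # An entity points to the decision that produced it;
        # a decision points to the inputs it used.
        upstream = objects(graph, node, "prov:wasGeneratedBy") + objects(
            graph, node, "prov:used"
        )
        stack.extend(reversed(upstream))
    return trail


if __name__ == "__main__":
    for step in breadcrumb_trail(GRAPH, "ex:estimate"):
        print(step)
```

Because each decision is a node in its own right, a replication study could in principle swap out a single decision (for instance, a different source-critical judgement) and compare the resulting trail with the original, which is the kind of flexibility the abstract attributes to Linked Data.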