Infrastructures and Practices for Reproducible Research in Geography, Geosciences, and GIScience

Nüst, Daniel

Thesis (doctoral or post-doctoral)

Abstract

Reproducibility of computational research, i.e., research based on code and data, poses enormous challenges to all branches of science. In this dissertation, technologies and practices are developed to increase reproducibility and to connect it better with the process of scholarly communication with a particular focus on geography, geosciences, and GIScience. Based on containerisation, this body of work creates a platform that connects existing academic infrastructures with a newly established executable research compendium (ERC). It is shown how the ERC can improve transparency, understandability, reproducibility, and reusability of research outcomes, e.g., for peer review, by capturing all parts of a workflow for computational research. The core part of the ERC platform is software that can automatically capture the computing environment, requiring authors only to create computational notebooks, which are digital documents that combine text and analysis code. The work further investigates how containerisation can be applied independent of ERCs to package complex workflows using the example of remote sensing, to support data science in general, and to facilitate diverse use cases within the R language community. Based on these technical foundations, the work concludes that functioning practical solutions exist for making reproducibility possible through infrastructure and making reproducibility easy through user experience. Several downstream applications built on top of ERCs provide novel ways to discover and inspect the next generation of publications. To understand why reproducible research has not been widely adopted and to contribute to the propagation of reproducible research practices, the dissertation continues to investigate the state of reproducibility in GIScience and develops and demonstrates workflows that can better integrate the execution of computational analyses into peer review procedures. We make recommendations for how to (re)introduce reproducible research into peer reviewing and how to make practices to achieve the highest possible reproducibility normative, rewarding, and, ultimately, required in science. These recommendations are rest upon over 100 GIScience papers which were assessed as irreproducible, the experiences from over 30 successful reproductions of workflows across diverse scientific fields, and the lessons learned from implementing the ERC. Besides continuing the development of the contributed concepts and infrastructure, the dissertation points out broader topics of future work, such as surveying practices for code execution during peer review of manuscripts, or reproduction and replication studies of the fundamental works in the considered scientific disciplines. The technical and social barriers to higher reproducibility are strongly intertwined with other transformations in academia, and, therefore, improving reproducibility meets similar challenges around culture change and sustainability. However, we clearly show that reproducible research is achievable today using the newly developed infrastructures and practices. The transferability of cross-disciplinary lessons facilitates the establishment of reproducible research practices and, more than other transformations, the movement towards greater reproducibility can draw from accessible and convincing arguments both for individual researchers as well as for their communities.

Details zur Publikation

Release year: 2022
Language in which the publication is writtenEnglish
Event: Zenodo
Link to the full text: https://zenodo.org/record/4768096/files/PhD%20Daniel%20N%C3%BCst%20-%20WWU%20M%C3%BCnster%20-%202022.pdf?download=1