New Paper: “Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data”

December 2, 2021

The paper “Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data” was published in Harvard Data Science Review 3 (4).

The lead author was our Key Researcher Andreas Rauber; our Senior Researcher Tomasz Miksa was one of the co-authors.

Abstract

Precisely identifying arbitrary subsets of data so that they can be reproduced is a daunting challenge in data-driven science, the more so if the underlying data source is dynamically evolving. Yet an increasing number of settings exhibit exactly those characteristics: larger amounts of data are continuously ingested from a range of sources (be it sensor values, online questionnaires, documents, etc.), with error correction and quality improvement processes adding to the dynamics.

The Research Data Alliance (RDA) Working Group on Dynamic Data Citation has published 14 recommendations that are centered around time-stamping and versioning evolving data sources and identifying subsets dynamically via persistent identifiers that are assigned to the queries selecting the respective subsets. This paper provides an overview of the recommendations, reference implementations, and pilot systems deployed, and then analyzes lessons learned from these implementations. This provides a basis for institutions and data stewards considering adding this functionality to their data systems.

Download the paper here

Andreas Rauber, Bernhard Gößwein, Carlo Maria Zwölf, C. Schubert, Florian Wörister, James Duncan, Katharina Flicker, Koji Zettsu, Kristof Meixner, Leslie D. McIntosh, Reyna Jenkyns, Stefan Pröll, Tomasz Miksa, and Mark A. Parsons: Precisely and persistently identifying and citing arbitrary subsets of dynamic data.

In: Harvard Data Science Review 3(4)

Date: October 28, 2021, DOI: 10.1162/99608f92.be565013
