Publishing
Research Data Management

Data Administration - from organization to archiving

Data Organization

You should organize the data collected in a project clearly right from the start. A structured data collection helps you and the research team to find the data easily and quickly at any time during the project.

File location, folder structure and file naming

The choice of file location depends on the requirements of the project, in particular storage capacity, security standards and access management. To keep the data easy to navigate, manage it in a hierarchical folder structure in which no single folder contains too many files. Clear and consistent file naming is equally important: a file name should describe the content while remaining reasonably short. Several files can be renamed at once with tools such as Ant Renamer or GNOME Commander. All project members should agree on the access rights to the file location, the folder structure and the file naming conventions before the project starts.
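The balance between a descriptive and a short file name can be enforced with a small helper. The following is a minimal sketch only: the pattern `<project>_<description>_<YYYYMMDD>.<extension>`, the 50-character limit and the function names are assumptions made for illustration, not a convention prescribed here. Each project should agree on its own rules.

```python
import re
from datetime import date

MAX_STEM_LENGTH = 50  # assumed limit; agree on the actual value within the project


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, description: str, day: date, extension: str) -> str:
    """Build '<project>_<description>_<YYYYMMDD>.<extension>' (hypothetical convention)."""
    stem = f"{slugify(project)}_{slugify(description)}_{day:%Y%m%d}"
    if len(stem) > MAX_STEM_LENGTH:
        raise ValueError(f"file name too long ({len(stem)} > {MAX_STEM_LENGTH}): {stem}")
    return f"{stem}.{extension.lower().lstrip('.')}"


# Example with made-up project data
print(build_file_name("SoilStudy", "pH measurements site A", date(2024, 3, 5), ".CSV"))
```

Rejecting over-long names instead of silently truncating them keeps the decision about what to shorten with the person who knows the content.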

Versioning

Versioning records the changes made to a file over the lifetime of the project. The changes can be logged in a separate file in the same folder, in a higher-level ReadMe file, or directly in the file name. Versions should be numbered sequentially, and in addition to the version number the save date should be included in the form YYYYMMDD. For software development there are dedicated version control systems. For example, the URZ of TU Chemnitz provides GitLab, which enables version control with Git.
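Versioning in the file name can be sketched as follows. The pattern `<stem>_v<NN>_<YYYYMMDD>.<extension>` and the function name `next_version_name` are assumptions chosen for illustration; only the sequential number and the YYYYMMDD date come from the recommendation above.

```python
import re
from datetime import date


def next_version_name(existing: list[str], stem: str, extension: str, day: date) -> str:
    """Return the next sequentially numbered file name for the given stem."""
    pattern = re.compile(
        rf"^{re.escape(stem)}_v(\d+)_\d{{8}}\.{re.escape(extension)}$"
    )
    # Collect the version numbers of all files that follow the pattern
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    next_version = max(versions, default=0) + 1
    return f"{stem}_v{next_version:02d}_{day:%Y%m%d}.{extension}"


# Example with made-up file names
files = ["survey_v01_20240110.csv", "survey_v02_20240215.csv", "notes.txt"]
print(next_version_name(files, "survey", "csv", date(2024, 3, 5)))
```

Zero-padding the version number keeps the files in the right order when a folder is sorted by name.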

Further information on data organization

Forschungsdaten.info: Datenorganisation

Data Documentation

For easier collaboration in the project as well as later reuse, you should document the data sufficiently.

Documentation by description

To enable project partners, third parties or even yourself to reconstruct the data of a research project at a later point in time, all steps involved in creating and processing the data should be documented in a comprehensible manner. This can be done in a higher-level ReadMe file, for example, or directly in the file itself (e.g. in software projects). The documentation should be precise enough that the data can be recreated at any time by following the specified steps, or that it is clear why and how the data was subsequently adapted. If the documentation is kept outside the file, use a format that is as open as possible (e.g. .txt).

Metadata

In addition to human-readable documentation, machine-readable documentation in the form of structured metadata should also be provided. Metadata describe certain characteristics of the data, e.g. the title of the file, the name of the primary researcher or author, the project number, and the time period and location of data collection. There are different types of metadata (e.g. content, technical, administrative) and different standards depending on the scientific discipline or project requirements. Using a uniform metadata standard within a project or a research discipline makes it possible to link and jointly process different data. There are also interdisciplinary standards such as Dublin Core, which can be represented in XML or RDF.
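As an illustration of machine-readable metadata, the following sketch writes a few Dublin Core elements as XML with Python's standard library. The wrapping `metadata` element, the function name and all field values are hypothetical. Only the element names (`title`, `creator`, `identifier`, `date`, `coverage`) and the namespace belong to the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize Dublin Core elements inside a (hypothetical) <metadata> wrapper."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Example record with made-up values
record = dublin_core_record({
    "title": "Soil pH measurements",
    "creator": "Doe, Jane",
    "identifier": "PROJECT-0000",
    "date": "2024-03-05",
    "coverage": "Chemnitz, Germany",
})
print(record)
```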

Electronic Lab Notebook

The individual steps of an experiment can also be documented digitally using an Electronic Lab Notebook (ELN). In this way, ELNs are an alternative to analog lab books. In contrast to the paper form, digital lab notebooks are easier to browse, enable uniform storage of data and can be viewed anywhere. To select a suitable ELN software, the web service ELN Finder can be used.

Further information on data documentation

Data Storage

The selection of suitable storage media as well as the regular creation of backups help you to secure your data during the project duration.

Storage media and locations

There are two main points to consider when storing data. Firstly, you must decide on which devices (PC, laptop, USB stick) the data may be saved. Storage capacity, lifetime and access options, among other factors, play a role here. Secondly, the physical storage location is important: is the data only stored locally at the workplace, or can it also be accessed from home or on the road? A suitable choice should protect the data against loss and against unauthorized manipulation and access. At the institutional level, the URZ of TU Chemnitz offers various storage services for your research project.

Backup and data security

Regular backups minimize the risk of data loss. A suitable backup strategy should answer at least the following two questions: Which data must be backed up in any case? How often are backups made? The 3-2-1 backup rule is a good guideline: keep at least 3 copies, on at least 2 different storage media, with at least 1 copy stored off-site (decentralized). At TU Chemnitz, the URZ supports data backup with the backup service BAREOS. In addition to loss, data should also be protected from unauthorized access. Encrypting files and directories plays an important role here, and suitably strong passwords should be used for this. Furthermore, personal data must be anonymized.
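A backup plan can be checked against the 3-2-1 rule with a few lines of code. The sketch below reads the rule as: at least three copies, on at least two distinct storage media, with at least one copy kept away from the primary workplace. The class `BackupCopy`, its fields and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    medium: str    # e.g. "laptop SSD", "external HDD", "institutional backup service"
    offsite: bool  # True if the copy is kept away from the primary workplace


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: >= 3 copies, >= 2 distinct media, >= 1 off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


# Example plan with made-up storage media
plan = [
    BackupCopy("laptop SSD", offsite=False),
    BackupCopy("external HDD", offsite=False),
    BackupCopy("institutional backup service", offsite=True),
]
print(satisfies_3_2_1(plan))
```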

Further information on data storage

Data Archiving

After the project ends, you should select the final versions of your data and make them available for long-term reuse. The data should be easy to find and access. Good scientific practice recommends archiving for at least 10 years.

Data selection

Before data can be stored in a long-term archive or a repository (a special form of archive), a selection must be made. On the one hand, the storage capacity of archives is limited. On the other hand, archiving can incur costs depending on the storage volume. Selection criteria include verifiability (Are the data necessary to verify research results?), need (Are the data also of long-term interest?) and cost (Would a new data collection be disproportionately expensive?). The quality of the data also plays a role, and sufficient documentation and the provision of metadata are themselves quality criteria.

Persistent identifiers

A persistent identifier that is assigned to a data set and always refers unambiguously and directly to it is essential for the permanent retrieval and citability of data. Research data and other digital objects are often assigned a DOI (Digital Object Identifier). For the unique identification of persons, there is the ORCID. Scientists can use their ORCID to link themselves uniquely to their publications, research data and other products of the research process (e.g. software).
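When ORCIDs are recorded in metadata, a typing error can be caught by checking the final character, which is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The following is a minimal sketch. The function names are chosen for illustration, and the check only confirms that an identifier is well formed, not that it is registered to a particular person.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format 0000-0000-0000-000X and the trailing check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# Example identifier used here purely as test input
print(is_valid_orcid("0000-0002-1825-0097"))
```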

File formats

From the very beginning of the project, you should save your data in open file formats that remain stable over the long term, as these are suitable for later long-term archiving. PDF/A, ODT or TXT are recommended for texts, CSV, ODS or XLSX for tables, and TIFF, PNG or JPEG 2000 for images and graphics.
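A simple check of file extensions against the formats recommended above can flag files that may need converting before archiving. This is a sketch under stated assumptions: the mapping of formats to extensions (for example `.jp2` for JPEG 2000) and the function name are illustrative, and an extension alone cannot confirm that a `.pdf` file actually conforms to PDF/A.

```python
from pathlib import PurePath
from typing import Optional

# Extensions assumed for the recommended formats
RECOMMENDED = {
    "text": {".pdf", ".odt", ".txt"},  # .pdf only counts if the file is actually PDF/A
    "table": {".csv", ".ods", ".xlsx"},
    "image": {".tif", ".tiff", ".png", ".jp2"},
}


def format_category(file_name: str) -> Optional[str]:
    """Return the category of a recommended format, or None if not recommended."""
    suffix = PurePath(file_name).suffix.lower()
    for category, suffixes in RECOMMENDED.items():
        if suffix in suffixes:
            return category
    return None


# Example with made-up file names
for name in ["results_v02_20240305.CSV", "figure1.tiff", "photo.bmp"]:
    print(name, "->", format_category(name))
```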
