Subject Guides: Research Data Management: Data Documentation

Quick Guides

RDM: A Brief Guide This brief guide presents a set of good data management practices that researchers can adopt, regardless of their data management skills and levels of expertise.
Dataverse Metadata This brief guide gets you started with submitting datasets to Dataverse.

Subject Librarian

Susie Wilson

Contact:

250-960-6607

Subjects: Research Data Management

Documentation Best Practices

Proper documentation increases the accessibility and usability of your research data for you and your research team as well as future users. The following are some best practices to follow when documenting your research data.

Organization

File Naming and Versioning

Keep file names short and descriptive, and use consistent conventions. Here are some general guidelines and examples to help:

Agree upon a file naming convention with your team when planning data management
Dates: Always use the YYYYMMDD format. It is easy to read and sorts in chronological order.
Use a short, unique, and descriptive identifier such as an acronym of your project name or grant #. This will make your files easy to find.
- Add a key term summarizing the content of the file, such as GrantProposal or Questionnaire, to the file name.
Use underscores (_) as delimiters and avoid special characters, because different computer systems handle them differently.
Track versions either by updating the date and time in the file name or by using a numbering system such as v01, or v01-01 ... v01-03 ... v03-02 to track versions within different stages of the project.
- Use leading zeros (v01, not v1) so your computer can sort the versions in chronological order
Where appropriate you may also wish to include researcher initials or location information in the file name
Try to keep file hierarchies shallow:
- no more than 4 levels deep
- try to limit the number of files to around 10 files per folder
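The two hierarchy guidelines above can be checked automatically. The following is a minimal sketch, not a prescribed tool: the function name `check_hierarchy` is hypothetical, and it assumes that depth is counted from the project folder you pass in (that folder is level 0, each subfolder adds one level).

```python
import os

MAX_DEPTH = 4    # "no more than 4 levels deep"
MAX_FILES = 10   # "around 10 files per folder"

def check_hierarchy(root, max_depth=MAX_DEPTH, max_files=MAX_FILES):
    """Return warning strings for folders that are too deep or too full."""
    warnings = []
    root = os.path.abspath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Assumption: the project root is level 0; each subfolder adds a level.
        depth = 0 if rel == os.curdir else len(rel.split(os.sep))
        if depth > max_depth:
            warnings.append(f"{rel}: {depth} levels deep (limit {max_depth})")
        if len(filenames) > max_files:
            warnings.append(f"{rel}: {len(filenames)} files (limit {max_files})")
    return warnings
```

Calling `check_hierarchy("path/to/project")` returns an empty list when the folder tree stays within both limits. Because the guideline says "around 10 files", treat the file-count warnings as prompts to review a folder rather than as errors.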

Examples

DO: NBCFH_GrantProposal_20170228_v01-04.docx

DON'T: finaldraft1 or finalfinaldraft3
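If your team generates file names in a script, the convention can be encoded once and reused. The sketch below is one possible approach under stated assumptions: the function `build_file_name` and its parameter names are hypothetical, and restricting the project and content parts to letters and digits is a simplification chosen so that the underscore remains an unambiguous delimiter.

```python
import re
from datetime import date

def build_file_name(project, content, day, major, minor, extension):
    """Assemble PROJECT_Content_YYYYMMDD_vNN-NN.ext from its parts."""
    for part in (project, content):
        # Assumption: letters and digits only, so "_" stays the sole delimiter.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"use only letters and digits in {part!r}")
    stamp = day.strftime("%Y%m%d")           # YYYYMMDD sorts chronologically
    version = f"v{major:02d}-{minor:02d}"    # leading zeros keep versions in order
    return f"{project}_{content}_{stamp}_{version}.{extension}"

print(build_file_name("NBCFH", "GrantProposal", date(2017, 2, 28), 1, 4, "docx"))
# prints NBCFH_GrantProposal_20170228_v01-04.docx
```

Generating names this way keeps the date format and version padding consistent across the whole team, which is harder to guarantee when names are typed by hand.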

Resources

Organising data
by UK Data Service
Includes sample screenshot of a well organized file structure.
Organize - File Naming Guidelines
by UBC Library

File Formats

Any file format can be uploaded to the Scholars Portal Dataverse. However, to ensure the longevity, accessibility, and usability of your data, open and non-proprietary file formats are recommended.

File Type	Preferred Formats
Databases	XML, CSV
Container and Compressed files	ZIP, TAR, GZIP Note: Compressed files in .zip format are unpacked automatically when uploaded to Scholars Portal Dataverse and will preserve file structure and/or hierarchies
Images	TIFF, PNG, JPG
Sound	BWF, AIFF, FLAC, MP3
Text	TXT, CSV, PDF/A, ASCII, EPUB
Video	AVI (uncompressed), MOV (uncompressed), MPEG-4
Spreadsheets	CSV
Medical Images	DICOM
Geospatial	ESRI, SHP, GeoTIFF, DBF
Statistical analysis	SPSS (.por), R, Stata

For more file format guidance please contact the Data Services Librarian.
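Before depositing, it can help to scan a list of file names for formats that are not in the table above. This is a rough sketch only: the extension set is an assumed mapping from the table's format names to common file extensions (for example, `.dcm` for DICOM and `.mp4` for MPEG-4), and the function name `non_preferred` is hypothetical.

```python
from pathlib import PurePath

# Assumption: common extensions for formats named in the table above.
PREFERRED_EXTENSIONS = {
    ".xml", ".csv", ".zip", ".tar", ".gz", ".tif", ".tiff", ".png", ".jpg",
    ".aiff", ".flac", ".mp3", ".txt", ".epub", ".avi", ".mov", ".mp4",
    ".dcm", ".shp", ".dbf", ".por",
}

def non_preferred(file_names):
    """Return the file names whose extension is not in the assumed set."""
    return [
        name for name in file_names
        if PurePath(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]

print(non_preferred(["survey_20170228.csv", "notes.docx", "photo.PNG", "budget.xlsx"]))
# prints ['notes.docx', 'budget.xlsx']
```

An extension check cannot confirm details the table depends on, such as whether a PDF is PDF/A or whether an AVI file is uncompressed, so it is a first pass rather than a guarantee.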

Metadata

Metadata describes data like a label describes the contents of a container. A label is not strictly necessary but makes the contents of the container identifiable and discoverable. Metadata does the same for your data and also makes it citable and reusable.

Basic required and recommended fields:

Title: full title by which the dataset is known
Name(s): list the name(s) of the person or organization responsible for creating the work
Contact Information: name and email address for the main contact for the dataset
Description: summary of purpose, nature, and scope of the dataset
Subject: broad domain-specific subject category
Date(s): including the following where applicable
- Date of collection
- Time period covered
- Production date: when dataset is finalized and ready for analysis/distribution
- Deposit date: when dataset is deposited
- Distribution date: when dataset is made available for distribution
- Publication date: when dataset is made public in Dataverse
Keywords
Related Publication: publication for which dataset was created/used
Location: for geospatial data

The above list is based on "A Brief Guide: Dataverse Metadata", produced by the Metadata Subgroup of the Portage Dataverse North Working Group.
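One way to use the field list above is as a checklist against a draft metadata record before deposit. The sketch below assumes a plain Python dictionary with hypothetical key names; it is not the Dataverse metadata schema or API, and it does not distinguish required from recommended fields.

```python
# Assumption: hypothetical key names mirroring the fields listed above.
GUIDE_FIELDS = [
    "title", "names", "contact_information", "description", "subject",
    "dates", "keywords", "related_publication", "location",
]

def missing_fields(record):
    """Return the guide fields that are absent or empty in a record."""
    return [field for field in GUIDE_FIELDS if not record.get(field)]

# Hypothetical draft record with placeholder values.
record = {
    "title": "Example Survey Responses",
    "names": ["Example Research Group"],
    "contact_information": {"name": "A. Researcher", "email": "researcher@example.org"},
    "description": "Placeholder summary of purpose, nature, and scope.",
    "subject": "Social Sciences",
    "dates": {"date_of_collection": "2017-02-28"},
    "keywords": [],
}
print(missing_fields(record))
# prints ['keywords', 'related_publication', 'location']
```

Some fields apply only in certain cases (Location is for geospatial data), so a reported gap is a prompt to decide whether the field is relevant, not necessarily an error.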

Metadata Standards

Some disciplines have specific metadata standards and schemas. Browse the Disciplinary Metadata standards via the Digital Curation Centre (DCC) to find the metadata standard and controlled vocabulary lists that best suit your research.

ReadMe.txt Files

In addition to metadata, ReadMe files allow you to further document and describe your dataset for future users. ReadMe files are usually plain text files, which prolongs their life and keeps them accessible. There is no standard for ReadMe files, but they should include the above metadata along with:

Data and file overview for each file name including a short description of what data it contains and when the file was created
Licenses or restrictions placed on the data
Methodological information, including a description of the methods for data collection/generation and processing
Data-specific information for each dataset or file (as appropriate), including:
- Variable list, including full names and definitions of column headings for tabular data
- Units of measurement
- Definitions for codes or symbols used to record missing data

Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.
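As an illustration of the elements listed above, the following sketch assembles ReadMe text from a few inputs. The section headings, the function name `render_readme`, and all example values are assumptions made for this sketch; they do not reproduce the Cornell template.

```python
def render_readme(title, files, license_text, methods, variables, missing_code):
    """Build ReadMe text from a title plus file, method, and variable details."""
    lines = [f"TITLE: {title}", "", "FILE OVERVIEW"]
    for name, (description, created) in files.items():
        lines.append(f"- {name}: {description} (created {created})")
    lines += ["", "LICENSE OR RESTRICTIONS", license_text]
    lines += ["", "METHODS", methods]
    lines += ["", "VARIABLES"]
    for name, (definition, unit) in variables.items():
        lines.append(f"- {name}: {definition} (unit: {unit})")
    lines += ["", "MISSING DATA CODE", missing_code]
    return "\n".join(lines) + "\n"

# Hypothetical example values.
readme = render_readme(
    title="Example Survey Responses",
    files={"NBCFH_Questionnaire_20170228_v01-01.csv": ("survey responses", "2017-02-28")},
    license_text="CC BY 4.0",
    methods="Placeholder: describe collection and processing here.",
    variables={"height": ("participant height", "cm")},
    missing_code="-999 marks a missing value",
)
print(readme)
```

The returned string can be saved as ReadMe.txt alongside the data files. Keeping it as plain text matches the guidance above on longevity and accessibility.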

Resources

Readme Template
Created by Cornell University's Research Data Management Service Group
Guide to writing "readme" style metadata
Created by Cornell University's Research Data Management Service Group
Creating a README for your dataset
Created by Doug Brigham, UBC

FAIR Principles

The FAIR Principles are concise and measurable guidelines to ensure that research (meta)data are findable, accessible, interoperable, and reusable. Since their introduction, the FAIR Principles have become a standard to evaluate research data management tools and services and have been widely adopted by funders, publishers, and service providers (Wilkinson et al., 2018).

Findable

F1. (meta)data are assigned a globally unique and persistent identifier (e.g. DOI)
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

Accessible

A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
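A DOI is a common example of F1 and A1 working together: the identifier is persistent, and it can be turned into a web address that resolves over HTTPS, an open and standardized protocol. The sketch below shows that conversion only; the function name `doi_to_url` is hypothetical, the validity check is deliberately loose, and no network request is made.

```python
def doi_to_url(doi):
    """Normalize a DOI string to its https://doi.org/ resolver address."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]   # drop a resolver or "doi:" prefix
            break
    # Loose sanity check (assumption): DOIs start with "10." and contain "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi

print(doi_to_url("doi:10.1038/sdata.2016.18"))
# prints https://doi.org/10.1038/sdata.2016.18
```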

Interoperable

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

Reusable

R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards

Source: Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18

Attribution 4.0 International (CC BY 4.0) except where stated otherwise.