Research Data Management

New to the idea of Research Data Management? This guide will introduce you to the basics.

Subject Librarian

Susie Wilson
Contact:
250-960-6607

Documentation Best Practices

Proper documentation increases the accessibility and usability of your research data for you and your research team as well as future users. The following are some best practices to follow when documenting your research data.

Organization

File Naming and Versioning

Keep file names short and descriptive, and use consistent conventions. Here are some general guidelines and examples to help:

  • Agree upon a file naming convention with your team when planning data management
  • Dates: Always use YYYYMMDD format for dates. This format is easiest to read and sort in chronological order
  • Use a short, unique, and descriptive identifier such as an acronym of your project name or grant #. This will make your files easy to find.
    • Add a key term summarizing the content of the file to the file name, such as GrantProposal, Questionnaire, etc.
  • Use underscores (_) to separate the elements of a file name, and avoid special characters, because different computer systems handle them differently
  • Keep track of versions either by changing the date and time or by using a numbering system such as v01, or v01-01 ... v01-03 ... v03-02 to track file versions within different stages of the project.
    • Use leading 0s so your computer can sort the versions in chronological order
  • Where appropriate you may also wish to include researcher initials or location information in the file name
  • Try to keep file hierarchies shallow: 
    • no more than 4 levels deep
    • try to limit the number of files to around 10 files per folder

Examples

DO: NBCFH_GrantProposal_20170228_v01-04.docx

DON'T: finaldraft1 or finalfinaldraft3
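
The guidelines above can be sketched as a small helper that assembles a file name from its parts. This is only an illustration: the function name `build_file_name` and its parameters are hypothetical, and your team's agreed convention may order or name the elements differently.

```python
import re
from datetime import date


def build_file_name(project, content, when, major, minor, extension):
    """Assemble <project>_<content>_<YYYYMMDD>_v<major>-<minor>.<extension>."""
    for part in (project, content):
        # Allow only letters and digits so the name avoids special characters.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"unsupported characters in {part!r}")
    stamp = when.strftime("%Y%m%d")        # YYYYMMDD sorts chronologically
    version = f"v{major:02d}-{minor:02d}"  # leading zeros keep v02 ahead of v10
    return f"{project}_{content}_{stamp}_{version}.{extension}"


# Reproduces the DO example above.
print(build_file_name("NBCFH", "GrantProposal", date(2017, 2, 28), 1, 4, "docx"))
```

Because the date and version elements are zero-padded, sorting the resulting names alphabetically also sorts them chronologically.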

File Formats

Any file format can be uploaded to the Scholars Portal Dataverse; however, to ensure the longevity, accessibility, and usability of your data, open and non-proprietary file formats are recommended.

File Type                        Preferred Formats
Databases                        XML, CSV
Container and Compressed files   ZIP*, TAR, GZIP
Images                           TIFF, PNG, JPG
Sound                            BWF, AIFF, FLAC, MP3
Text                             TXT, CSV, PDF/A, ASCII, EPUB
Video                            AVI (uncompressed), MOV (uncompressed), MPEG-4
Spreadsheets                     CSV
Medical Images                   DICOM
Geospatial                       ESRI, SHP, GeoTiff, DBF
Statistical analysis             SPSS (.por), R, STATA

*Note: Compressed files in .zip format are unpacked automatically when uploaded to Scholars Portal Dataverse and will preserve file structure and/or hierarchies

For more file format guidance please contact the Data Services Librarian.
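
Before depositing, it can help to scan a list of file names for formats outside the preferred list. The sketch below is a rough check only. The extension set is an assumed, partial mapping of the formats named above to common file extensions, and an extension cannot confirm details such as whether a PDF is actually PDF/A or whether a video is uncompressed.

```python
from pathlib import Path

# Assumed, partial mapping from the preferred formats above to extensions.
PREFERRED_EXTENSIONS = {
    ".xml", ".csv", ".zip", ".tar", ".gz",
    ".tif", ".tiff", ".png", ".jpg", ".jpeg",
    ".aiff", ".flac", ".mp3",
    ".txt", ".pdf", ".epub",
    ".avi", ".mov", ".mp4",
    ".shp", ".dbf", ".por",
}


def flag_non_preferred(file_names):
    """Return the file names whose extension is not in the preferred set."""
    return [
        name for name in file_names
        if Path(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]


# Hypothetical file names, used only to show the check.
print(flag_non_preferred(["survey.csv", "notes.docx", "scan.TIFF", "budget.xlsx"]))
```

Files that are flagged are not necessarily unsuitable; they are candidates for conversion or for a conversation with the Data Services Librarian.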

Metadata

Metadata describes data like a label describes the contents of a container. A label is not strictly necessary but makes the contents of the container identifiable and discoverable. Metadata does the same for your data as well as making it citable and reusable.

Basic required and recommended fields:

  • Title: full title by which the dataset is known
  • Name(s): list the name(s) of the person or organization responsible for creating the work
  • Contact Information: name and email address for the main contact for the dataset
  • Description: summary of purpose, nature, and scope of the dataset
  • Subject: broad domain-specific subject category
  • Date(s): including the following where applicable
    • Date of collection
    • Time period covered
    • Production date: when dataset is finalized and ready for analysis/distribution
    • Deposit date: when dataset is deposited
    • Distribution date: when dataset is made available for distribution
    • Publication date: when dataset is made public in Dataverse
  • Keywords
  • Related Publication: publication for which dataset was created/used
  • Location: for geospatial data

The above list is based on "A Brief Guide: Dataverse Metadata", produced by the Metadata Subgroup of the Portage Dataverse North Working Group.
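
One way to use the field list above is as a checklist against a draft metadata record. The following sketch assumes a plain dictionary with hypothetical key names; these are not the field names used by the Dataverse software, and the record values are placeholders.

```python
# Hypothetical keys mirroring the fields listed above (not Dataverse API names).
LISTED_FIELDS = (
    "title", "names", "contact", "description", "subject",
    "dates", "keywords", "related_publication", "location",
)


def missing_fields(record):
    """Return the listed fields that are absent or empty in the record."""
    return [field for field in LISTED_FIELDS if not record.get(field)]


# Placeholder values for illustration only.
draft = {
    "title": "Example dataset title",
    "names": ["Example Researcher"],
    "contact": "researcher@example.org",
    "description": "Summary of purpose, nature, and scope.",
    "subject": "Example subject",
    "dates": {"collection": "20170228"},
}

print(missing_fields(draft))
```

Not every field applies to every dataset (location, for example, is for geospatial data), so a reported gap is a prompt to review the record rather than an error.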

Metadata Standards

Some disciplines have specific metadata standards and schemas. Browse the Disciplinary Metadata standards via the Digital Curation Centre (DCC) to find the metadata standard and controlled vocabulary lists that best suit your research.

ReadMe.txt Files

In addition to metadata, ReadMe files allow you to further document and describe your dataset to future users. ReadMe files are usually saved as plain text to prolong the life of the file and ensure its accessibility. There is no standard for ReadMe files, but they should include the above metadata along with:

  • Data and file overview listing each file name with a short description of the data it contains and when the file was created
  • Licenses or restrictions placed on the data
  • Methodological information, including a description of the methods used for data collection/generation and processing
  • Data-specific information for each dataset or file (as appropriate), including:
    • Variable list, including full names and definitions of column headings for tabular data
    • Units of measurement
    • Definitions for codes or symbols used to record missing data

Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.
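
The ReadMe elements listed above can be assembled into a plain-text file with a short script. This is a minimal sketch, not a standard template: the function `render_readme`, the section labels, and all sample values are hypothetical.

```python
def render_readme(title, files, license_text, methods, variables, missing_code):
    """Build plain-text ReadMe content from the elements listed above."""
    lines = [f"TITLE: {title}", "", "FILE OVERVIEW"]
    for name, (description, created) in files.items():
        lines.append(f"  {name}: {description} (created {created})")
    lines += [
        "",
        f"LICENSE OR RESTRICTIONS: {license_text}",
        "",
        f"METHODS: {methods}",
        "",
        "VARIABLES",
    ]
    for column, (definition, unit) in variables.items():
        lines.append(f"  {column}: {definition} [{unit}]")
    lines += ["", f"MISSING DATA CODE: {missing_code}"]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
text = render_readme(
    title="Example dataset title",
    files={"EX_Survey_20170228_v01-00.csv": ("survey responses", "20170228")},
    license_text="CC BY 4.0",
    methods="Describe data collection and processing here.",
    variables={"temp_c": ("air temperature", "degrees Celsius")},
    missing_code="-999",
)
print(text)
```

Writing the returned string to a file named ReadMe.txt alongside the data keeps the documentation in an open, long-lived format.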

FAIR Principles

The FAIR Principles are concise and measurable guidelines to ensure that research (meta)data are findable, accessible, interoperable, and reusable. Since their introduction, the FAIR Principles have become a standard to evaluate research data management tools and services and have been widely adopted by funders, publishers, and service providers (Wilkinson et al., 2018). 

Findable

F1. (meta)data are assigned a globally unique and persistent identifier (e.g. DOI)
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

Accessible

A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available

Interoperable

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

Reusable

R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards

Source: Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18

Attribution 4.0 International (CC BY 4.0) except where stated otherwise.