
Research Data Management

Learn how to better manage your research team's data throughout all phases of the research lifecycle.

Data Organization

Data Types:

Different kinds of research projects generate and collect different kinds of data.  Research data generally falls into these four categories:

  • Observational
    • Usually captured in real time and not in the laboratory
    • Often irreplaceable (e.g. a one-time event) and unlikely to be reproducible
    • E.g. astronomical observations, sensor readings, sensory observations, etc.
  • Experimental
    • Captured in the laboratory under controlled conditions
    • Likely reproducible, but can be expensive in both time and cost
    • E.g. gene sequences, microscopy, chromatograms, etc.
  • Computational/Simulation
    • Computer generated from test models
    • Likely reproducible if the computer inputs are preserved, but can be expensive in both time and cost
    • E.g. economic models, climate models, etc.
  • Derived
    • Produced from existing datasets
    • Likely reproducible, but can be expensive in both time and cost
    • E.g. text and data mining, compiled databases, etc.


File Organization

Organizing your data into files and directories is an important part of data management; it saves you time when locating a particular file at a later date.  Overall, it is best to keep your file organization clear, descriptive, and unique, with a documented naming convention.  Unique but clear descriptions of file contents aid precise file identification and discovery.  Consider the following when organizing your data and files:

  • Directory structure and folder naming conventions
  • File naming conventions
  • File versioning
  • File formats

[Figure: File directory and hierarchy]


Directory Structure & Folder Naming Conventions

Directory Structure/Folder Naming Conventions:

The name of the top-level folder or directory should include the following descriptors, and folder names should be kept under 32 characters:

  • Project title
  • Unique identifier
  • Date (yyyy or yyyymmdd)

Folder Hierarchy Example: [Project]/[Experiment]/[Instrument Used]

FOLDER SUBSTRUCTURE - The folders/directories within the substructure should be split according to a particular theme; e.g. each folder may contain a run of an experiment or a different version of a particular dataset.
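
The hierarchy above can be set up with a short script. The following is a minimal Python sketch, not a prescribed tool: the function name `make_project_tree` and the idea of passing experiments as a dictionary are assumptions made for illustration, and "under 32 characters" is read here as "fewer than 32".

```python
from pathlib import Path

MAX_FOLDER_NAME = 32  # folder names should stay under this many characters


def make_project_tree(root, project, experiments):
    """Create [Project]/[Experiment]/[Instrument Used] folders under root.

    `experiments` maps an experiment name to a list of instrument names
    (a hypothetical input shape chosen for this sketch).
    Returns the list of leaf directories that now exist.
    """
    created = []
    for experiment, instruments in experiments.items():
        for instrument in instruments:
            # Check every level of the hierarchy before creating anything
            for name in (project, experiment, instrument):
                if len(name) >= MAX_FOLDER_NAME:
                    raise ValueError(f"folder name too long: {name!r}")
            leaf = Path(root) / project / experiment / instrument
            leaf.mkdir(parents=True, exist_ok=True)
            created.append(leaf)
    return created
```

For example, `make_project_tree(".", "GN7799_2018", {"Run01": ["G1000"]})` would create `GN7799_2018/Run01/G1000` in the current directory; the experiment name `Run01` is made up for the example.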


File Naming Conventions

File Naming:

File names should give people meaningful context for the files they name, and should let people identify and distinguish similar files from one another.  In general, here are some key descriptors to consider when deciding on your file names:

  • Experiment or research project name
  • Data type
  • Experimental Conditions (e.g. temperature, lab instrument used etc)
  • Location of research
  • Researcher name/initials
  • Experiment date (or date range)
  • File version number
  • Application-specific file extension (typically 3 letters), e.g. .mov, .tif, etc.
  • Filename Example: [Project]_[Instrument]_[Date]_[Version].[ext]
    GN7799_G1000_180308_v03.tif
    • GN7799 – Experiment/project name
    • G1000 – Instrument used
    • 180308 – Experiment date
    • v03 – File version number
    • .tif – 3-letter file extension
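
Building names from their parts in code helps keep a convention consistent. The sketch below is one possible way to do this in Python; the function names are hypothetical, and it assumes that the date in the example (180308) is in YYMMDD form, i.e. 8 March 2018.

```python
from datetime import date


def build_filename(project, instrument, when, version, ext):
    """Return [Project]_[Instrument]_[Date]_[Version].[ext] with a YYMMDD date."""
    return f"{project}_{instrument}_{when:%y%m%d}_v{version:02d}.{ext}"


def parse_filename(name):
    """Split a name built by build_filename back into its labelled parts."""
    stem, ext = name.rsplit(".", 1)
    project, instrument, stamp, version = stem.split("_")
    return {
        "project": project,
        "instrument": instrument,
        "date": stamp,
        "version": int(version.lstrip("v")),
        "ext": ext,
    }
```

With these definitions, `build_filename("GN7799", "G1000", date(2018, 3, 8), 3, "tif")` returns `GN7799_G1000_180308_v03.tif`. Parsing only works if the individual parts contain no underscores themselves, which is one reason to document the convention.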

File Naming Tips:

Here are some popular tips on file naming in general:

  • Format dates as YYYYMMDD (the ISO 8601 basic format) or, for shorter names, YYMMDD
  • Keep file names short, since some software cannot handle long names; use 32 characters at most
  • Avoid special characters in file names, such as: ! @ $ % * () ‘;<>,[]{}”
  • When numbering files sequentially, use leading zeros so that files sort properly; e.g. 0001, 0002, … 1001 rather than 1, 2, … 1001
  • Avoid using spaces in file names; instead, use underscores (e.g. file_name), no separation (e.g. filename), dashes (e.g. file-name), or camel case (e.g. FileName)
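
These tips can be checked automatically. The following Python sketch is illustrative only: the function names are hypothetical, and the set of allowed characters (letters, digits, dot, underscore, dash) is an assumption that is stricter than the list of characters to avoid given above.

```python
import re

MAX_LENGTH = 32  # maximum file name length suggested in the tips
# Assumed whitelist: letters, digits, dot, underscore, and dash
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def check_filename(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check
    if not SAFE_NAME.fullmatch(name.replace(" ", "")):
        problems.append("contains special characters")
    return problems


def numbered(prefix, index, width=4):
    """Zero-pad a sequence number so that names sort in numeric order."""
    return f"{prefix}_{index:0{width}d}"
```

For instance, `check_filename("my file!.txt")` reports both the space and the special character, while `numbered("img", 2)` gives `img_0002`, which sorts before `img_0010`.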
     

README Text:

Consider creating a “README.txt” file that explains your naming convention, abbreviations, and codes to accompany your data.  For more information, see the guidance on metadata/README.txt files.
 

Bulk File Renaming:

Renaming large numbers of files by hand is impractical, but bulk renaming tools make it easy.
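
For simple cases, a short script can also do the job. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the default rule (replace spaces with underscores) is just one example, and the plan is built separately from the rename so it can be reviewed first.

```python
from pathlib import Path


def plan_renames(folder, old=" ", new="_"):
    """Return (source, target) pairs for files whose names contain `old`."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and old in path.name:
            target = path.with_name(path.name.replace(old, new))
            plan.append((path, target))
    return plan


def apply_renames(plan):
    """Rename each file, refusing to overwrite an existing target."""
    for source, target in plan:
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        source.rename(target)
```

Printing the result of `plan_renames(folder)` before calling `apply_renames` acts as a dry run, which is worth doing on a copy of the data before touching the originals.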


File Versioning

Versioning:

When your research is collaborative, keeping track of changes and versions is essential to managing your data well.  Versioning lets you go back and retrieve a particular version of a file at a later date instead of having to retrace your steps to recreate it.  You can track versions manually with a sequential numbering system, e.g. v01, v02, … v99.  You can also use version control software such as SVN.  Avoid confusing labels like “revision”, “final”, or “final2”, and remove obsolete versions.

  • File Versioning Example: DataMgmtNotesv03.txt instead of DataMgmtNotesFinalReally2.txt
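
Manual version numbers are easy to get wrong, so it can help to compute the next one. This Python sketch is an illustration only: the function name `next_version` is hypothetical, and it assumes the version is a two-digit `vNN` suffix at the end of the name, just before the extension.

```python
import re

# Matches names ending in vNN, e.g. "DataMgmtNotesv03"
VERSION_SUFFIX = re.compile(r"(?P<base>.*)v(?P<num>\d{2})")


def next_version(filename):
    """Return the file name with its vNN suffix incremented by one.

    A name without a version suffix gets v01.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    match = VERSION_SUFFIX.fullmatch(stem)
    if match is None:
        new_stem = f"{stem}v01"
    else:
        number = int(match["num"]) + 1
        if number > 99:
            raise ValueError("version numbers run from v01 to v99")
        new_stem = f"{match['base']}v{number:02d}"
    return f"{new_stem}.{ext}" if dot else new_stem
```

Here `next_version("DataMgmtNotesv03.txt")` returns `DataMgmtNotesv04.txt`. For anything beyond a handful of files, version control software handles this more reliably than naming alone.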


File Formats

Ideal File Format Types:

The file format you choose for saving your research data has long-term implications for usage and access.  For example, if the format is proprietary, its long-term accessibility and subsequent usage are unpredictable because they depend on the success and longevity of the business behind it.  Technology changes, so researchers should plan for both hardware and software obsolescence and choose file formats that will ensure long-term usage and accessibility.  The following guidelines can help you choose an appropriate file format for your research:

  • Non-proprietary
  • Uncompressed
  • Unencrypted
  • Commonly used by the general research community
  • Open, documented standards
  • Using standard character encodings (ASCII, UTF-8)

Preferred File Formats:

  • Text: XML, PDF/A, HTML, ASCII, UTF-8 (not Word)
  • Tabular Data: CSV (not Excel)
  • Still Images: TIFF, JPEG 2000, PDF, PNG, BMP (not GIF or JPG)
  • Moving Images: MOV, MPEG, AVI, MXF (not Quicktime)
  • Sounds: WAVE, AIFF, MP3, MXF
  • Databases: XML, CSV
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Containers: TAR, GZIP, ZIP
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Web Archive: WARC

Oregon State University has a table of other acceptable formats in addition to the preferred file formats.
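
As one illustration of these guidelines, tabular data can be saved as plain-text CSV with an explicit UTF-8 encoding rather than in a spreadsheet format. The sketch below uses only Python's standard `csv` module; the function names and the example columns are assumptions made for illustration.

```python
import csv


def write_csv(path, header, rows):
    """Write a header and data rows as plain-text CSV encoded in UTF-8."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read the file back as a list of rows; every value comes back as text."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

A call such as `write_csv("GN7799_G1000_180308_v01.csv", ["sample", "temp_c"], [["A", 21.5]])` produces a file that any text editor can open. Because CSV stores no type information, units and column meanings belong in the accompanying README.txt.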


WSU Libraries, PO Box 645610, Washington State University, Pullman WA 99164-5610, 509-335-9671, Contact Us