Data Types:
Different research projects generate and collect different kinds of data. Research data generally falls into four categories:
Organizing your data into files and directories is an important part of data management; it saves you time when you need to locate a particular file later. Keep your file organization clear, descriptive, and consistent, and document your naming convention. Unique, clear descriptions of file contents make files easier to identify and discover. Consider the following when organizing your data and files:
Directory Structure/Folder Naming Conventions:
Keep folder names under 32 characters. The top-level folder or directory should include the following descriptors:
Folder Hierarchy Example: [Project]/[Experiment]/[Instrument Used]
FOLDER SUBSTRUCTURE - The folders/directories within the substructure should be split according to a particular theme; e.g. each folder may contain a run of an experiment or a different version of a particular dataset.
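The hierarchy above can also be created by script so that every project starts from the same layout. The following is a minimal Python sketch, not a prescribed tool: the function name make_hierarchy and the sample folder names are hypothetical, and the length check assumes that "under 32 characters" means at most 31.

```python
from pathlib import Path
import tempfile

# Assumption: "under 32 characters" is read as a maximum of 31.
MAX_FOLDER_NAME_LENGTH = 32


def make_hierarchy(root, project, experiment, instrument):
    """Create root/project/experiment/instrument and return the deepest path."""
    parts = [project, experiment, instrument]
    for name in parts:
        if len(name) >= MAX_FOLDER_NAME_LENGTH:
            raise ValueError(f"folder name too long: {name!r}")
    target = Path(root).joinpath(*parts)
    # parents=True builds the intermediate folders; exist_ok=True makes reruns safe.
    target.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    # Hypothetical names, created in a throwaway directory for demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        created = make_hierarchy(tmp, "SoilStudy", "Exp01", "SpectrometerA")
        print(created.relative_to(tmp))
```

Rejecting over-long names before any folder is created keeps a failed call from leaving a partial hierarchy behind.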
File Naming:
File names should give meaningful context for the files they label, so that people can identify similar files and tell them apart. Here are some key descriptors to consider when deciding on your file names:
File Naming Tips:
Here are some popular tips on file naming in general:
README Text:
Consider creating a “README.txt” file to accompany your data that explains your naming convention, abbreviations, and codes. For more information, see the guidance on metadata and README.txt files.
Bulk File Renaming:
Renaming a large number of files by hand is impractical; these tools make it easier:
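Where a script is preferable to a dedicated tool, bulk renaming can be sketched in a few lines of Python. This is an illustrative example only: the function bulk_rename, the prefix_NNN numbering scheme, and the dry_run switch are assumptions made for the sketch, not part of any tool named in this guide.

```python
from pathlib import Path


def bulk_rename(folder, pattern, prefix, dry_run=True):
    """Rename files matching pattern to prefix_001.ext, prefix_002.ext, ...

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    on disk is changed, so the plan can be reviewed first.
    """
    files = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    plan = []
    for index, old_path in enumerate(files, start=1):
        new_path = old_path.with_name(f"{prefix}_{index:03d}{old_path.suffix}")
        plan.append((old_path.name, new_path.name))
        if dry_run or new_path == old_path:
            continue
        if new_path.exists():
            # Stop rather than silently overwrite an existing file.
            raise FileExistsError(f"refusing to overwrite {new_path}")
        old_path.rename(new_path)
    return plan
```

Calling the function with dry_run=True first and reading the returned pairs is a cheap safeguard, since a bulk rename is hard to undo once applied.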
Versioning:
When your research is collaborative, keeping track of changes and versions is essential to managing your data well. It lets you retrieve a particular version of a file at a later date instead of retracing your steps to recreate it. You can track versions manually with a sequential numbering system, e.g. v01, v02, … v99. You can also use version control software such as Subversion (SVN). Avoid confusing labels like “revision”, “final”, or “final2”, and remove obsolete versions.
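The manual v01, v02, … v99 scheme can be made less error-prone with a small helper that works out the next version label. The sketch below assumes file names of the form stem_vNN.ext; that pattern and the function name next_version_name are hypothetical choices for illustration.

```python
import re

# Assumed naming pattern: <stem>_v<two digits><extension>, e.g. survey_v03.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d{2})(?P<ext>\.[^.]+)$")


def next_version_name(existing_names, stem, ext):
    """Return the next file name in the stem_vNN.ext sequence."""
    highest = 0
    for name in existing_names:
        match = VERSION_PATTERN.match(name)
        # Only count files that belong to the same stem and extension.
        if match and match["stem"] == stem and match["ext"] == ext:
            highest = max(highest, int(match["num"]))
    if highest >= 99:
        raise ValueError("two-digit scheme is exhausted at v99")
    return f"{stem}_v{highest + 1:02d}{ext}"
```

For example, given existing files survey_v01.csv and survey_v02.csv, the helper proposes survey_v03.csv, and names that do not follow the pattern (such as survey_final.csv) are ignored.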
Ideal File Format Types:
The file format you choose for saving your research has long-term implications for usage and access. For example, if the format is proprietary, its long-term accessibility is unpredictable because it depends on the success and longevity of the business behind it. Technology changes, so researchers should plan for both hardware and software obsolescence and choose file formats that will ensure long-term usage and accessibility. The following are some guidelines to help you choose an appropriate file format for your research:
Preferred File Formats:
Oregon State University provides a table of other acceptable formats in addition to the preferred file formats: