This page provides recommendations for file formats to be deposited in Research Exchange. Recommendations are organized by item type and are divided into the following categories:
• Level 1: This category is for formats that WSU Libraries anticipates will best preserve the content and presentation of your data. These formats provide the highest confidence for long-term preservation.
• Level 2: This category is for formats that WSU Libraries anticipates will most likely remain usable and accessible in the long term. However, certain elements of the file, such as presentation format, might not be fully reproducible when the file is accessed in the future.
• Consider Migration: This category is for file formats that WSU Libraries is not confident will be usable in the future. While it is possible to store and preserve the raw information in these files, it is recommended that they be 'migrated' (converted) to a format more suitable for long-term preservation in order to better ensure usability and access.
Level 1 | Level 2 | Consider Migration |
---|---|---|
Plain-Text (UTF-8, UTF-16, US-ASCII) | Rich-Text | Plain-Text (other, more obscure encodings) |
PDF/A | WordPerfect | |
MS Word (docx) | MS Word (doc) | |
ODT | ||
EPUB |
The level one format recommendations, Plain-Text and PDF/A, were selected because there is a higher level of confidence that information contained in those file types will remain accessible and accurate into the future. For Plain-Text, the lack of complex formatting and the use of a standardized encoding contribute to this confidence. For PDF/A, the format embeds the elements needed for accurate future rendering (such as fonts), which contributes to this confidence.
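As an illustration of migrating a Plain-Text file from a less common encoding to UTF-8, the following is a minimal Python sketch. The function name `migrate_to_utf8` and the example encoding `cp1252` are assumptions for illustration only; the source encoding must be known in advance, since this sketch does not attempt to detect it.

```python
from pathlib import Path


def migrate_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode a plain-text file as UTF-8 (hypothetical helper).

    source_encoding must be supplied by the caller (for example "cp1252").
    """
    # Decode strictly so that a wrong source_encoding raises an error
    # instead of silently corrupting characters.
    text = src.read_bytes().decode(source_encoding, errors="strict")
    # Write bytes directly so line endings are left exactly as they were.
    dst.write_bytes(text.encode("utf-8"))
```

Keeping the original file alongside the migrated copy allows the conversion to be checked or repeated later.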
Level 1 | Level 2 | Consider Migration |
---|---|---|
TIFF (Uncompressed, LZW, Group 4 Preferred) | GIF | Photoshop Files |
JPEG/JPEG2000 | BMP | Adobe Illustrator Files |
PNG | DNG (Digital Negative) | |
RAW |
These level one recommendations were made due to the openness and extremely wide adoption of the formats. TIFF, JPEG2000, and PNG all support lossless compression, providing the highest possible quality for any future access.
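When sorting a set of image files into these categories, the file extension is not always reliable. The sketch below identifies a few of the formats listed above by their leading signature bytes. The function name `identify_image` is hypothetical, and the check is deliberately shallow: DNG and many camera RAW formats are TIFF-based and would be reported as TIFF, and only the JP2 file form of JPEG 2000 is recognized.

```python
from typing import Optional

# Leading "magic" bytes for some of the image formats listed above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),  # JP2 signature box
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]


def identify_image(header: bytes) -> Optional[str]:
    """Return a format name based on the first bytes of a file, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```

In practice the first 16 bytes of a file, for example `open(path, "rb").read(16)`, are enough for these signatures.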
Level 1 | Level 2 | Consider Migration |
---|---|---|
WAV (Broadcast Wave embedded metadata preferred) | M4A | RealAudio |
FLAC | MP3 | Windows Media Audio |
AIFF | Ogg Vorbis/Ogg Opus | |
These level one recommendations were made due to the openness and wide adoption of the formats. All level one formats are either uncompressed or support lossless compression. WAV (and Broadcast Wave) is the most commonly accepted standard for digital audio preservation. More information about Broadcast Wave metadata (and a tool for embedding it) can be found at the project site for BWF MetaEdit. The FLAC file format is an open standard that employs lossless compression to provide high-quality audio in a smaller file than uncompressed formats like WAV.
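Before depositing a WAV file, it can be useful to confirm its basic technical properties. The sketch below uses Python's standard `wave` module, which reads PCM WAV files and raises `wave.Error` for files it cannot interpret. The function name `describe_wav` is hypothetical, and the sketch does not read Broadcast Wave metadata.

```python
import wave


def describe_wav(path: str) -> dict:
    """Summarize the audio properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }
```

A file that cannot be opened this way is not necessarily unsuitable, but it is worth inspecting with a dedicated tool before deposit.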
Level 1: Wrappers | Level 1: Codecs | Level 2: Wrappers | Level 2: Codecs | Consider Migration: Wrappers | Consider Migration: Codecs |
---|---|---|---|---|---|
MKV | Uncompressed (V210 (10-bit)/8-bit YCbCr Preferred) | WebM | Motion JPEG-2000 | WMV (Windows Media) | Legacy QuickTime Video Codecs (RPZA, Sorenson Video, etc.) |
MOV | FFV1 | MPEG-4 (H.264/AVC) | RM (Real Media) | Windows Media Codecs | |
MXF | DV | MPEG-2 (H.262) | RealVideo Codecs | ||
AVI | VP9 | ||||
ProRes | |||||
Ogg Theora |
Digital video is complicated in that consideration must be given both to the file 'wrapper', which is how the information is contained, and to the file 'codec', which is how the information is represented. To use an analogy, the wrapper could be thought of as the cover and table of contents of a book (describing what information is present and how it is arranged), and the codec could be thought of as the language the text of the book is written in.
Level one recommendations for wrappers were made due to their openness and/or wide adoption, which lend confidence in their long-term viability.
Level one recommendations for codecs were made because of their openness and, in the case of Uncompressed and FFV1, their ability to store lossless data. DV files are recommended to be stored in their native DV format for two reasons: the standardized nature of the format lends confidence for future access, and the encoded data contains extensive provenance metadata.
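To make the wrapper and codec distinction concrete, the sketch below builds a command line that re-encodes a video stream with the FFV1 codec and places it in an MKV wrapper. It assumes the FFmpeg command-line tool is installed and built with its FFV1 encoder; FFmpeg is not named on this page and is used here only as one possible tool. The function name `ffv1_mkv_command` is hypothetical, and the function only builds the command rather than running it.

```python
import shlex
from typing import List


def ffv1_mkv_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command for FFV1 video in an MKV wrapper."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "ffv1",  # codec: lossless FFV1 video
        "-level", "3",   # FFV1 version 3
        "-c:a", "copy",  # copy the audio stream without re-encoding
        dst,             # wrapper: chosen by the .mkv extension
    ]


if __name__ == "__main__":
    # Print the command for review; run it with subprocess.run(...) if desired.
    print(shlex.join(ffv1_mkv_command("input.mov", "output.mkv")))
```

Note that DV material is better left in its native format, as described above, rather than re-encoded this way.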
Level 1 | Level 2 | Consider Migration |
---|---|---|
WARC | ARC | Raw website components for complex interactive websites |
Raw website components for simple websites (HTML, CSS etc.) |
The WARC format is recommended for web archiving because it has become a de facto standard, used by the largest web archiving projects such as the Internet Archive and the Library of Congress. Using a web archive format rather than raw website components better ensures future access to the intended presentation of the website.
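A WARC file is a sequence of records, each beginning with a version line and a set of named header fields followed by the captured content. The sketch below reads the header of the first record from an uncompressed WARC byte string. The function name `parse_warc_header` is hypothetical, and the sketch is intentionally minimal: it does not handle gzip-compressed `.warc.gz` files or header values continued across multiple lines.

```python
from typing import Dict, Tuple


def parse_warc_header(data: bytes) -> Tuple[str, Dict[str, str]]:
    """Return the version line and header fields of the first WARC record."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, _ = data.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("data does not start with a WARC version line")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as URLs contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields
```

Fields such as `WARC-Type` and `WARC-Target-URI` record what was captured and from where, which is part of what lets the original presentation be reconstructed later.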
Level 1 | Level 2 | Consider Migration |
---|---|---|
JSON, XML (Including schema) | XLSX (Excel) | XLS |
TSV, CSV (Tab Separated or Comma Separated Values) | ||
SQLite |
These level one formats were recommended for their open and self-contained nature. Using self-contained formats for storing data (such as a list of comma-separated values) greatly enhances the prospect of future access and use of that data when compared with complex proprietary formats.
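As one example of producing a self-contained copy of tabular data, the sketch below exports a table from a SQLite database to a UTF-8 CSV file with a header row, using only Python's standard library. The function name `export_table_to_csv` and the table name used in any call are hypothetical.

```python
import csv
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str, csv_path: str) -> int:
    """Write every row of a table to a CSV file; return the number of data rows."""
    # Table names cannot be passed as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row keeps the file self-describing
        writer.writerows(rows)
    return len(rows)
```

Depositing a short description of each column alongside the CSV helps future users interpret the values.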
Level 1 | Level 2 | Consider Migration |
---|---|---|
ZIP |
RAR | |
TAR | Stuffit Archives | |
GZIP |
Level one recommendations for archives were made due to their wide adoption and non-proprietary nature.
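Both ZIP and gzip-compressed TAR archives can be created with Python's standard library alone, as the sketch below shows. The function names `make_zip` and `make_tar_gz` are hypothetical, and the output path is assumed to lie outside the directory being archived so that the archive does not include itself.

```python
import tarfile
import zipfile
from pathlib import Path


def make_zip(src_dir: Path, zip_path: Path) -> None:
    """Store every file under src_dir in a ZIP archive with relative paths."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(src_dir).as_posix())


def make_tar_gz(src_dir: Path, tar_path: Path) -> None:
    """Store src_dir and its contents in a gzip-compressed TAR archive."""
    with tarfile.open(tar_path, "w:gz") as tf:
        # arcname keeps only the directory name, not its absolute path.
        tf.add(src_dir, arcname=src_dir.name)
```

Storing relative paths inside the archive avoids embedding details of the depositor's local file system.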