
Digital Preservation: File Formats

This guide contains information and recommendations for file formats suitable for long term digital preservation. Its scope is focused on content to be ingested into the WSU Libraries' Institutional Repository, Research Exchange, but many of the recommend

Recommendations Intro

This page provides recommendations for file formats to be deposited in Research Exchange. Recommendations are organized by item type and are divided into the following categories:

 

• Level 1: This category is for formats that WSU Libraries can anticipate will best preserve the content and presentation of your data. These formats provide the highest confidence for long term preservation.

• Level 2: This category is for formats that WSU Libraries anticipates will most likely remain usable and accessible in the long term. However, it is possible that certain elements of the file, such as presentation format, might not be 100% reproducible in future access.

• Consider Migration: This category is for file formats that WSU Libraries is not confident will be usable in the future. While it is possible to store and preserve the raw information in these files, it is recommended that they be 'migrated' (converted) to a format more suitable for long-term preservation in order to better ensure usability and access.

File Format Recommendations

• Level 1: Plain-Text (UTF-8, UTF-16, US-ASCII); PDF/A

• Level 2: Rich-Text; PDF; MS Word (docx); ODT; EPUB

• Consider Migration: Plain-Text (other, more obscure encodings); WordPerfect; MS Word (doc)

Explanation:

The level one format recommendations, Plain-Text and PDF/A, were selected due to a higher level of confidence that information contained in those file types will remain accessible and accurate into the future. In the case of Plain-Text, its lack of complex formatting and its standardized encoding contribute to this confidence. In the case of PDF/A, the fact that it embeds the elements needed for accurate future rendering (such as fonts) contributes to this confidence.
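As an illustration of the encoding point above, the following is a minimal sketch of how a depositor might check whether a plain-text file already uses one of the Level 1 encodings. The function name `detect_preferred_encoding` is hypothetical and not part of any Research Exchange tooling; the check simply tries to decode the bytes and is not a full encoding detector.

```python
from typing import Optional


def detect_preferred_encoding(data: bytes) -> Optional[str]:
    """Return the first Level 1 encoding that decodes `data`, or None.

    Hypothetical helper for illustration only.
    """
    # UTF-16 is only tried when a byte-order mark is present, because many
    # arbitrary even-length byte strings would decode as UTF-16 without error.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        candidates = ["utf-16"]
    else:
        # ASCII is a strict subset of UTF-8, so it is tried first.
        candidates = ["ascii", "utf-8"]
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


print(detect_preferred_encoding("plain text".encode("ascii")))
print(detect_preferred_encoding("caf\u00e9".encode("utf-8")))
print(detect_preferred_encoding("caf\u00e9".encode("latin-1")))
```

A result of `None` suggests the file uses some other encoding and may be a candidate for conversion to UTF-8 before deposit.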

Level 1 Level 2 Consider Migration
TIFF (Uncompressed, LZW, Group 4 Preferred) GIF Photoshop Files
JPEG/JPEG2000 BMP Adobe Illustrator Files
PNG DNG (Digital Negative)
RAW

Explanation:

These level one recommendations were made due to the openness and extremely wide adoption of the formats. TIFF, JPEG2000, and PNG support lossless compression, which preserves the image data exactly for any future access.
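To make "lossless" concrete, here is a small sketch using Python's standard `zlib` module, which implements DEFLATE, the compression method used inside PNG. The pixel data is a made-up greyscale gradient rather than a real image, and the function name is an assumption for illustration. The point is only that decompression returns exactly the original bytes.

```python
import zlib


def round_trips_losslessly(pixels: bytes) -> bool:
    """Compress and decompress `pixels`; True if nothing was lost."""
    compressed = zlib.compress(pixels, 9)
    return zlib.decompress(compressed) == pixels


# Made-up 64x64 8-bit greyscale gradient standing in for real image data.
pixels = bytes((x + y) % 256 for y in range(64) for x in range(64))
compressed = zlib.compress(pixels, 9)

print("original bytes:  ", len(pixels))
print("compressed bytes:", len(compressed))
print("identical after round trip:", round_trips_losslessly(pixels))
```

A lossy format, by contrast, discards some information on every save, so the decoded image can never be guaranteed to match the source.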

• Level 1: WAV (Broadcast Wave with embedded metadata preferred); FLAC; AIFF

• Level 2: M4A; MP3; Ogg Vorbis/Ogg Opus

• Consider Migration: RealAudio; Windows Media Audio

Explanation:

These level one recommendations were made due to the openness and wide adoption of the formats. All level one formats are either uncompressed or support lossless compression. WAV (and Broadcast Wave) is the most commonly accepted standard for digital audio preservation. More information about Broadcast Wave metadata (and a tool for embedding it) can be found at the project site for BWF MetaEdit. The FLAC file format is an open standard that employs lossless compression to provide high-quality audio in a smaller file than uncompressed formats like WAV.
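As a rough illustration of what an uncompressed WAV file contains, the sketch below uses Python's standard `wave` module to write a short test tone and read back its technical properties. The function names, the 48 kHz sample rate, and the 440 Hz tone are assumptions chosen for the example. The `wave` module does not write Broadcast Wave metadata, so that step would still need a tool such as BWF MetaEdit.

```python
import io
import math
import struct
import wave


def write_test_tone(target, seconds=0.5, rate=48000, freq=440.0):
    """Write a mono 16-bit PCM sine tone to `target` (path or file object)."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(16383 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(frames)


def describe_wav(source):
    """Return the basic technical properties stored in a WAV header."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
            "compression": wav.getcomptype(),  # "NONE" means uncompressed PCM
        }


buffer = io.BytesIO()
write_test_tone(buffer)
buffer.seek(0)
info = describe_wav(buffer)
print(info)
```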

Level 1: Wrappers Level 1: Codecs Level 2: Wrappers Level 2: Codecs Consider Migrating: Wrappers Consider Migrating: Codecs
MKV Uncompressed(V210(10bit)/8bit YCbCr Preferred) WebM Motion JPEG-2000 WMV (Windows Media) Legacy Quicktime Video Codecs (RPZA, Sorenson Video etc.)
MOV FFV1 MPEG-4(h.264,AVC) RM (Real Media) Windows Media Codecs
MXF DV MPEG-2(h.262) RealVideo Codecs
AVI VP9
ProRes
Ogg Theora

Explanation:

Digital video is complicated in that consideration must be given both to the file 'wrapper', which is how the information is contained, and to the file 'codec', which is how the information is represented. To use an analogy, the wrapper could be thought of as the cover and table of contents of a book (describing what information is present and how it is arranged), and the codec could be thought of as the language the text of the book is written in.

Level one recommendations for wrappers were made due to their openness and/or wide adoption, which lend confidence in their long-term viability.

Level one recommendations for codecs were made because of their openness and, in the case of Uncompressed and FFV1, their ability to store lossless data. DV files should be stored in their native DV format, both because the format is standardized, which lends confidence in future access, and because the encoded data carries extensive provenance metadata.
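The wrapper and codec distinction can be seen at the byte level: a wrapper can often be recognized from the first few bytes of a file, while the codec can only be learned by reading the stream descriptions stored inside the wrapper. The sketch below is a hypothetical helper that recognizes a few wrapper signatures. It is an illustration under simplifying assumptions, not a replacement for a dedicated identification tool, and it does not attempt to identify the codec.

```python
def sniff_wrapper(header: bytes) -> str:
    """Guess the wrapper family from the first 12 bytes of a file."""
    # Matroska and WebM both begin with the EBML header identifier.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "EBML (MKV or WebM)"
    # AVI is a RIFF file whose form type is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"OggS":
        return "Ogg"
    # Assumption: an 'ftyp' box at offset 4, as in MP4 and most current
    # QuickTime files; older MOV files may not start this way.
    if header[4:8] == b"ftyp":
        return "ISO base media (MOV or MP4 family)"
    return "unknown"


print(sniff_wrapper(b"\x1a\x45\xdf\xa3" + b"\x00" * 8))
print(sniff_wrapper(b"RIFF\x00\x00\x00\x00AVI "))
```

Note that the first result cannot distinguish MKV from WebM, and none of the results say which codec is inside, which is why both must be recorded separately.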

 

• Level 1: WARC

• Level 2: ARC; Raw website components for simple websites (HTML, CSS etc.)

• Consider Migrating: Raw website components for complex interactive websites

Explanation:

The WARC format is recommended for web archiving because it has become the de facto standard, used by the largest web archiving projects such as the Internet Archive and the Library of Congress. Using a web archive format rather than raw website components better ensures future access to the intended presentation of the website.
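For readers unfamiliar with the format, a WARC file is a sequence of records, each made of a small block of named header fields followed by the captured content. The sketch below builds a single `resource` record by hand to show that layout. The function name and the example URI are hypothetical, and the sketch leaves out things a real crawl would include (a `warcinfo` record, payload digests, HTTP request and response records, per-record compression), so established web archiving tools should be used for actual deposits.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri, payload, content_type,
                               when=None, record_id=None):
    """Return one WARC 'resource' record as bytes (illustrative only)."""
    # `when` should be timezone-aware; it is converted to UTC for WARC-Date.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record_id = record_id or "<urn:uuid:{}>".format(uuid.uuid4())
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        "WARC-Record-ID: {}".format(record_id),
        "WARC-Date: {}".format(when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        "WARC-Target-URI: {}".format(target_uri),
        "Content-Type: {}".format(content_type),
        "Content-Length: {}".format(len(payload)),
    ]
    # Header lines end with CRLF; a blank line separates header and content,
    # and two CRLFs close the record.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


page = b"<p>Hello</p>"
record = build_warc_resource_record(
    "http://example.org/index.html",  # placeholder URI
    page,
    "text/html",
    when=datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    record_id="<urn:uuid:00000000-0000-0000-0000-000000000000>",
)
print(record.decode("utf-8"))
```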

Level 1 Level 2 Consider Migrating
JSON, XML (Including schema) XLSX (Excel) XLS
TSV, CSV (Tab Separated or Comma Separated Values)
SQLite

Explanation:

Level one recommendations were made due to their open and self-contained nature. Using self-contained formats for storing data (such as a list of comma separated values) greatly enhances the prospect of future access and use of that data when compared with complex proprietary formats.
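As a small illustration of what "self-contained" means in practice, the sketch below writes a few made-up rows to CSV text and reads them back using only Python's standard library. The column names and values are invented for the example, and the function names are hypothetical. Everything needed to interpret the file (a header row plus delimited values) is in the text itself.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of dicts (all values as strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


# Invented sample data; note the comma inside a value, which CSV quotes.
rows = [
    {"id": 1, "site": "Pullman, WA", "ph": 7.2},
    {"id": 2, "site": "Spokane", "ph": 6.9},
]
text = rows_to_csv(rows, ["id", "site", "ph"])
print(text)
restored = csv_to_rows(text)
print(restored)
```

CSV does not record data types, so numbers come back as strings; documenting column meanings and types alongside the file (or using JSON or XML with a schema) helps future users interpret the data.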

• Level 1: ZIP; TAR; GZIP

• Consider Migration: RAR; StuffIt Archives

Explanation:

Level one recommendations for archives were made due to their wide adoption and non-proprietary nature.
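Because ZIP is so widely supported, archives can be created and checked with standard tooling in most languages. The following is a minimal sketch using Python's `zipfile` module to bundle some files and then verify the stored checksums before deposit. The function names and file names are hypothetical examples.

```python
import io
import zipfile
from typing import Dict, List


def build_zip(files: Dict[str, bytes]) -> bytes:
    """Bundle a mapping of file name to content into a ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def verify_zip(blob: bytes) -> List[str]:
    """Check every member's CRC and return the member names."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        bad_member = zf.testzip()  # None when all checksums match
        if bad_member is not None:
            raise ValueError("corrupt member: {}".format(bad_member))
        return zf.namelist()


archive = build_zip({
    "readme.txt": b"Dataset description\n",
    "data/values.csv": b"id,value\r\n1,42\r\n",
})
names = verify_zip(archive)
print(names)
```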

WSU Libraries, PO Box 645610, Washington State University, Pullman WA 99164-5610, 509-335-9671, Contact Us