
Digital Preservation: File Formats

This guide contains information and recommendations for file formats suitable for long-term digital preservation. Its scope is focused on content to be ingested into the WSU Libraries' Institutional Repository, Research Exchange, but many of the recommend

Welcome!

Welcome to the File Format recommendations for Research Exchange, WSU's Institutional Repository! This guide provides information about suggested file types to use (and not use) to help ensure the long-term viability and preservation of your research. While these recommendations are geared specifically toward files to be deposited in Research Exchange, most are broadly applicable as best practices in file format selection.

 

File Format Considerations

Working vs. Final Formats

In general, works to be deposited into Research Exchange should be in their finalized form, rather than a 'working' format. This means that, for example, a completed image should be deposited as a TIFF file, rather than the Photoshop file used to create the image.

Open vs. Proprietary Formats

When choosing a file format to save your work, consider how open or proprietary that format is, since this affects the future usability of the file. Relevant considerations include whether the format's specifications have been openly published and whether the format is subject to any patent claims.

Highly Adopted Formats

A widely adopted format is more likely to remain usable in the future, because tools that provide access to the file's content are more likely to continue to exist. Note, however, that wide adoption does not guarantee that a format will not be superseded and rendered obsolete (especially in the case of proprietary formats). One example is the RealVideo format, which was once ubiquitous on the internet and can now prove problematic to open.

Lossless vs. Lossy Formats

When creating digital objects intended for preservation, it is best to use the highest-quality format that is practical. This means that, when possible, it is advisable to use 'lossless' formats rather than 'lossy' formats. Lossy formats trade quality for file size by discarding or simplifying some of the information stored in the file. Lossless formats, while resulting in larger files, maintain all of the original information, which makes them the best source for derivatives and any future format conversions. Examples of lossless formats are WAV for audio and TIFF for still images. Corresponding lossy formats would be MP3 for audio and JPG for images.
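The difference can be illustrated with a minimal Python sketch. It is a toy model only: the sample values are invented for illustration, zlib stands in for a lossless format, and dropping the low 8 bits of each value stands in for a lossy one. It is not how WAV, TIFF, MP3, or JPG actually work internally.

```python
import zlib

# Hypothetical "original" data: 1000 audio-like sample values (16 bits each).
original = [(i * 37) % 65536 for i in range(1000)]
raw = b"".join(s.to_bytes(2, "big") for s in original)

# Lossless round trip: compress, then decompress. Every byte comes back.
restored = zlib.decompress(zlib.compress(raw))
lossless_ok = restored == raw

# Toy lossy step: keep only the top 8 bits of each sample.
# This halves the size, but the discarded low bits cannot be recovered.
lossy = bytes(s >> 8 for s in original)
recovered = [b << 8 for b in lossy]
lossy_ok = recovered == original

print("lossless round trip identical:", lossless_ok)
print("lossy round trip identical:", lossy_ok)
print("raw bytes:", len(raw), "lossy bytes:", len(lossy))
```

The lossless round trip returns exactly the bytes that went in, while the lossy version is smaller but can no longer reproduce the original values.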

An extreme example of detail being lost can be found in the following video. It was created by taking a video source and repeatedly re-encoding it as an MPEG-4 file, so that the data thrown away with each conversion compounds.
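The compounding effect of repeated lossy conversion can also be sketched numerically. The following Python example is a simplified illustration, not an MPEG-4 encoder: the hypothetical lossy_pass function stands in for one lossy re-encode by blending each value with its neighbours, and the sample values are invented.

```python
def lossy_pass(samples):
    """Toy stand-in for one lossy re-encode: blend each value with its
    two neighbours (wrapping around at the ends), discarding fine detail."""
    n = len(samples)
    return [
        (samples[i - 1] + 2 * samples[i] + samples[(i + 1) % n]) / 4
        for i in range(n)
    ]


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical original "signal" with sharp detail.
original = [0, 0, 100, 0, 0, 100, 100, 0]

current = original
errors = []
for generation in range(5):
    # Each generation is made from the previous lossy copy, not the original.
    current = lossy_pass(current)
    errors.append(squared_error(original, current))

for generation, err in enumerate(errors, start=1):
    print("generation", generation, "error vs. original:", err)
```

In this sketch each generation is made from the previous lossy copy, so the error relative to the original grows with every pass. Making derivatives from a lossless master avoids this accumulation.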

 

WSU Libraries, PO Box 645610, Washington State University, Pullman WA 99164-5610, 509-335-9671