Monday, 14 March 2011

Preservation Tools - Moving Forward

Over the last number of years, JISC and other bodies have funded a number of digital preservation projects which have resulted in some really valuable contributions to the area... now is the time to realise the benefits of this work and provide a digital preservation experience to everyday users.

To achieve this a not insignificant amount of work needs to be undertaken, namely to identify key applications and separate these from the complex systems into which they have been built. Alternatively many applications now need re-thinking and the best bits built into system which have super-ceded these applications.

File Format Identification Tools

File format identification now has a number of tools available, each with their own advantages and disadvantages, in no particular order they are:

DROID:
  • Started out as a tool to identify file types and versions of those types. :)
  • Each file version was assigned an identifier which could be referenced and re-used. :)
  • Identification of file was done via "signature", not extension matching. :)
  • Became complex as it was adjusted to suit workflows and provide much more complex information which few people understand or want :(
  • Added complexity increased the time required for each file classification, no longer a simple tool :(
FIDO:
  • A new cut down client which takes the DROID signature files and does the simple stuff again :)
FILE:
  • A built in Unix tool installed on every Unix based system in the world already! :)
  • Does not do version type identification :(
  • Does not provide a mime-type URI :(
  • Very quick to run :)
  • Has the capacity to add version type identification and there is a TODO in the code for it! :)

With the PRONOM registry now looking at providing URIs for file versions, why can't we stop coding new tools and change the FILE library. This way it could handle the version information and feed back the URIs if people want them. I've looked briefly into this and the PRONOM signatures should be easy to transport and use with the file tool.

If I get time I might well have a go at this and feed it back to the community.


1 comment:

Jack said...

Very well written article ! I just went through yours complete article and really enjoyed it. It was great reading of pros and cons of file format identification tools. Many thanks to you !
comment system