Monday, 31 October 2011

A little preservation watch tool for DROID users

Ever wondered what has changed in each new signature file released for The National Archives DROID tool?

Want a way to find out what objects a new signature file might affect or reclassify?

I have collected together all the available DROID signature files (still want more) and produced a little preservation watch tool that summarises changes between signature file versions. A summary is produced which outlines the signatures and file formats added in each new signature file. Additionally, by selecting any two of the signature files, a user can compare two specific versions.
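As a rough illustration of the kind of comparison described above (not the tool's actual code), the sketch below diffs the file formats listed in two signature files. It assumes each signature file is XML containing FileFormat elements that carry PUID and Name attributes; the function names and the sample PUIDs and names are made up for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def load_formats(xml_text):
    """Map PUID -> format name for every FileFormat element found."""
    root = ET.fromstring(xml_text)
    formats = {}
    for element in root.iter():
        # Assumption: formats are FileFormat elements with PUID/Name attributes.
        if local_name(element.tag) == "FileFormat" and element.get("PUID"):
            formats[element.get("PUID")] = element.get("Name", "")
    return formats


def compare_versions(old_xml, new_xml):
    """Summarise which PUIDs were added, removed or renamed between versions."""
    old, new = load_formats(old_xml), load_formats(new_xml)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "renamed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }


# Hypothetical miniature signature files, for illustration only.
OLD = """<FFSignatureFile Version="1"><FileFormatCollection>
<FileFormat ID="1" Name="Example Format A" PUID="x-test/1"/>
<FileFormat ID="2" Name="Example Format B" PUID="x-test/2"/>
</FileFormatCollection></FFSignatureFile>"""

NEW = """<FFSignatureFile Version="2"><FileFormatCollection>
<FileFormat ID="1" Name="Example Format A (renamed)" PUID="x-test/1"/>
<FileFormat ID="3" Name="Example Format C" PUID="x-test/3"/>
</FileFormatCollection></FFSignatureFile>"""

if __name__ == "__main__":
    print(compare_versions(OLD, NEW))
```

A fuller comparison would also look at the internal signatures themselves, since a changed byte sequence can reclassify objects even when the list of formats stays the same.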

Soon to be added will be the ability to subscribe to an email or RSS feed which alerts subscribers to new signature files and changes, allowing active preservation watch.

In order to tailor this more to a user's needs I'm contemplating allowing selection of the specific extensions/formats which users care a lot about and producing an alerting service which focusses on changes to only these types.

Thoughts welcome...

Monday, 18 July 2011

More on File Identification Tools

Since I last wrote a post on this back in March I have started some work for the Open Planets Foundation. As I said in my previous post, I see no reason to have too many unmaintainable tools when we could just pick the best one... the problem is making this choice (for some).

Which tool?

Simple - The one which is currently the most widely adopted... file.

Opinions may vary on this; however, ALL of the arguments come down to the feature set of a particular tool or the slowness of the tool when scanning billions of files.

Feature Sets and Ease of Use

File is a very simple tool which reports a mime-type and limited metadata for the file types it knows about. It only accepts single file execution; however, you can wildcard its input in the Linux shell, and it executes extremely quickly. In my testing file took 2.375s to identify 1000 files, that's 421 files a second (see the comparison to DROID and FIDO @
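The throughput figure is simply the file count divided by the elapsed time. A minimal sketch of how such a measurement could be repeated is shown below; it assumes the file command is on the PATH, and the helper names are invented for this example rather than taken from the original test.

```python
import subprocess
import time


def files_per_second(file_count, elapsed_seconds):
    """Throughput: number of files identified per second."""
    return file_count / elapsed_seconds


def time_file_command(paths):
    """Run `file` once over all paths, as a shell wildcard would expand them.

    Returns (elapsed_seconds, files_per_second). Output is discarded
    because only the timing is of interest here.
    """
    start = time.perf_counter()
    subprocess.run(["file", *paths], stdout=subprocess.DEVNULL, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, files_per_second(len(paths), elapsed)


if __name__ == "__main__":
    # The figures quoted in the post: 1000 files in 2.375 seconds.
    print(int(files_per_second(1000, 2.375)))
```

Timings like this include process start-up, so tools that launch a virtual machine pay that cost on every invocation, while a single invocation over many files spreads it out.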

Other tools offer more power in other ways. DROID fits well with The National Archives (UK) Digital Continuity project, providing a PRONOM identifier and mapping back to tools which can perform many operations on these files.

DROID is an ever-improving tool as the underlying data (the number of files it can identify) expands. However, certain decisions have made it difficult in recent times to simply profile a single file. It has become a tool which is now quite heavily integrated with other systems rather than loosely coupled.

FIDO resulted from the realisation that DROID was becoming too slow and painful for profiling a single file. Originally written in Python, FIDO performs the same classification as DROID (using the same signatures) in a much shorter time and provides the PRONOM PUID as output. FIDO provided a great proof of concept that this operation could be quick; however, it suffers from the problem that someone has wrapped the current release (0.9.5) of FIDO in Java. This slows it down significantly due to having to launch the Java VM!

FITS - The File Information Tool Set.
This final tool pretty much wraps everything. Some really useful and detailed output can be gained as a result of running not only all the identification tools, but also classification tools like exiftool and ffmpeg. Lots of detail, really slow! Also, more so than any of the others, FITS suffers from the problem of keeping up to date.

The problem with FITS is that it wraps the other tools to provide one output. In the case of FITS they still wrap DROID v4 with a very old signature file. They chose DROID v4 as this was the last version with decent command line execution, which is (I'm guessing) the way they call it. FITS wraps a great number of other tools in its distribution as well, such as exiftool, which already have package managed versions. Thus all the tools FITS uses are constantly being outdated by new versions, which then require effort to wrap. FITS is a great attempt at a very valuable tool; however, the constant updating of the tools it bundles is likely to cause many maintainability problems.

Along with all the other tools, FITS suffers from the fact that it doesn't update the DROID signature file (a format which hasn't changed) automatically. This is a simple way for a tool to keep up to date; we should not have to rely on the users doing this when the tool should just be doing it for them! People are lazy these days and expect the package to just work, or there to be an available update of the whole package (akin to App Store approaches). The users are right in this respect BTW!

File, the tool I haven't mentioned for a while, is package managed on every widely used platform now, so if there is an update, people are alerted to it. From this point it only takes one click to download the latest, greatest and fastest version.


Like Weird Al's song "Albuquerque" I have finally got to the point (but not the conclusion) ... Packaging.

In order to keep users happy we MUST learn to allow them to download and use tools which suit them.

Personally I'm fed up of Java integration, where to install a package you have to first install Maven, then install something else, then do this..... blah blah bored....

We need to start packaging cross platform tools inside one click install MSIs (Windows), RPM (Red Hat) and DEB packages (Ubuntu/Debian and the rest of Linux).

I don't care if these packages install dependencies but the user shouldn't have to take more than one step to install a tool.

Further, the tool should either self update, or the user should be prompted to update it via the package management option which their operating system already contains.

Using packages, FITS could simply be a very small meta package which depends on other packages, thus keeping the whole suite of tools up to date... independently.


* Yes a tool should be fast
* Yes a tool should be feature rich
* Most of all, you should be able to install it and keep it up to date easily!

What's Next

The billion dollar question. For my part, I've done some performance testing of the various tools and concluded that speed is determined by features. As a package gets bloated and feature rich, it becomes slower! The faster it is, the simpler it is.

What I'd like to do is make the fastest one (file) feature rich without bloating it and slowing it down. Also, file is already package managed, which saves me what appears (according to the other tools) to be a very hard job.

Some investigation on this aim is likely to follow, along with some requirements gathering and classification on critical features before things move forward.

Monday, 14 March 2011

Preservation Tools - Moving Forward

Over the last number of years, JISC and other bodies have funded a number of digital preservation projects which have resulted in some really valuable contributions to the area... now is the time to realise the benefits of this work and provide a digital preservation experience to everyday users.

To achieve this a not insignificant amount of work needs to be undertaken, namely to identify key applications and separate these from the complex systems into which they have been built. Alternatively, many applications now need re-thinking, with the best bits built into the systems which have superseded them.

File Format Identification Tools

File format identification now has a number of tools available, each with their own advantages and disadvantages. In no particular order they are:

DROID
  • Started out as a tool to identify file types and versions of those types. :)
  • Each file version was assigned an identifier which could be referenced and re-used. :)
  • Identification of a file was done via "signature", not extension matching. :)
  • Became complex as it was adjusted to suit workflows and provide much more complex information which few people understand or want :(
  • Added complexity increased the time required for each file classification, no longer a simple tool :(

FIDO
  • A new cut down client which takes the DROID signature files and does the simple stuff again :)

file
  • A built in Unix tool installed on every Unix based system in the world already! :)
  • Does not do version type identification :(
  • Does not provide a mime-type URI :(
  • Very quick to run :)
  • Has the capacity to add version type identification and there is a TODO in the code for it! :)

With the PRONOM registry now looking at providing URIs for file versions, why can't we stop coding new tools and change the FILE library instead? This way it could handle the version information and feed back the URIs if people want them. I've looked briefly into this and the PRONOM signatures should be easy to transport and use with the file tool.
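Until file can report versions itself, the nearest approximation is a wrapper that takes its mime-type output and looks up the PRONOM identifiers that could apply. The sketch below is only an illustration of that idea: the lookup table is a tiny hand-written subset, the info:pronom/ URI prefix is an assumption about how the URIs would be formed, and the function names are invented. Because a mime-type carries no version, one type maps to several candidate identifiers, which is exactly the gap version identification in file would close.

```python
import subprocess

# Illustrative subset only. PUIDs are assigned per format version, while a
# mime-type is not, so each mime-type maps to a list of candidates.
CANDIDATE_PUIDS = {
    "image/png": ["fmt/11", "fmt/12", "fmt/13"],
}

# Assumption: identifiers are exposed as URIs by adding this prefix.
URI_PREFIX = "info:pronom/"


def mime_type(path):
    """Ask the file command for the mime-type of a single path."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def candidate_uris(mime):
    """Return the candidate PRONOM URIs for a mime-type (empty if unknown)."""
    return [URI_PREFIX + puid for puid in CANDIDATE_PUIDS.get(mime, [])]


def identify(path):
    """Combine both steps: mime-type from file, then candidate URIs."""
    return candidate_uris(mime_type(path))
```

Calling identify on a PNG image would return all three candidates, since nothing in the mime-type says which version of the format the file uses.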

If I get time I might well have a go at this and feed it back to the community.

Friday, 4 March 2011

Installing Kinect on Ubuntu (A full guide)

1) sudo apt-get install libglut3-dev build-essential libusb-1.0-0-dev git-core

2) mkdir ~/kinect && cd ~/kinect

3) git clone

4) cd OpenNI/Platform/Linux-x86/Build

5) make && sudo make install

6) cd ~/kinect/

7) git clone

8) cd Sensor

9) git checkout kinect

10) cd Platform/Linux-x86/Build

11) make && sudo make install

12) Go to the NITE download page at OpenNI to download the latest NITE release for your platform, or for the impatient:

13) Save the NITE tarball to ~/kinect and untar it

14) cd ~/kinect/NITE/Nite-

15) Open Sample-User.xml, Sample-Scene.xml and Sample-Tracking.xml and replace the existing License line with the line below:
NOTE: this is case sensitive!

<License vendor="PrimeSense" key="0KOIk2JeIBYClPWVnMoRKn5cdY4="/>

16) Repeat step 15 and replace the existing MapOutputMode line with the line below in all 3 files.

<MapOutputMode xRes="640" yRes="480" FPS="30"/>

19) sudo niLicense PrimeSense 0KOIk2JeIBYClPWVnMoRKn5cdY4=

20) cd ~/kinect/NITE/Nite-

21) sudo ./install.bash

22) make && sudo make install

23) cd ~/kinect/NITE/Nite-

24) sudo adduser YOURNAME video

25) sudo nano /usr/etc/primesense/XnVHandGenerator/Nite.ini and uncomment the two config parameters it contains

26) sudo nano /etc/udev/rules.d/51-kinect.rules and add the following rules:

# ATTR{product}=="Xbox NUI Motor"
SUBSYSTEM=="usb", ATTR{idVendor}=="045e", ATTR{idProduct}=="02b0", MODE="0666"
# ATTR{product}=="Xbox NUI Audio"
SUBSYSTEM=="usb", ATTR{idVendor}=="045e", ATTR{idProduct}=="02ad", MODE="0666"
# ATTR{product}=="Xbox NUI Camera"
SUBSYSTEM=="usb", ATTR{idVendor}=="045e", ATTR{idProduct}=="02ae", MODE="0666"

27) sudo /etc/init.d/udev restart

28) cd ~/kinect/NITE/Nite-

29) ./Sample-PointViewer and PLAY