Sunday 8 June 2008

Repository Software is Dead

Repository software for digital collections, as we know it, supplies the complete solution to the client, so without the software you cannot access any of the data in your repository. This is a bad thing for object reuse and digital preservation!

Many people at conferences such as Open Repositories 2008, and in working groups like CRIG, have been talking for a long while about the importance of interoperability. However, if every platform uses a standard specification for accessing simple data objects (PDFs and their metadata), then interoperability between repository platforms stops being a problem you need to solve!

This brings me to the fact that EPrints, Fedora and, hopefully at some point, DSpace are abstracting their database and storage layers to support any type of storage platform. Thanks go to the Sun Microsystems preservation action group and open storage group for pushing this work from a commercial perspective. But we need to go further than this to get rid of the need for interoperability.

At Open Repositories 2008, a colleague, Ben O'Steen from Oxford University, and I showed how OAI-ORE (the OAI specification for Object Reuse and Exchange) can be used to enable high-level repository interoperability. This work won us $5000 but, more importantly, got the community thinking about the true power of a specification like OAI-ORE. Ben and I are now hoping to push this work down to the low-level storage, so that the objects within an ORE resource map (documents and metadata) can be referenced directly, without the need for the current repository layer. For this to happen, all objects need to be stored in their simplest form: NO WRAPPER FORMATS ALLOWED at the lowest level.
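
As a rough illustration of the "no wrapper formats" idea, here is a minimal Python sketch. It assumes a hypothetical layout in which each part of an object is a plain file on disk, next to a small JSON manifest that lists the parts. The manifest is only a simplified stand-in for an ORE resource map, not an actual OAI-ORE serialization, and the names write_object, read_aggregated and manifest.json are invented for this example.

```python
import json
import tempfile
from pathlib import Path


def write_object(root, object_id, files):
    """Store each part as a plain file and write a manifest listing the parts.

    `files` maps a file name to its raw bytes. Nothing is wrapped or packaged:
    every part stays readable with ordinary file tools.
    """
    object_dir = Path(root) / object_id
    object_dir.mkdir(parents=True, exist_ok=True)
    for name, data in files.items():
        (object_dir / name).write_bytes(data)
    # Hypothetical manifest format, standing in for a resource map.
    manifest = {"aggregation": object_id, "aggregates": sorted(files)}
    manifest_path = object_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def read_aggregated(manifest_path):
    """Read every listed part straight from storage, with no repository layer."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text())
    base = manifest_path.parent
    return {name: (base / name).read_bytes() for name in manifest["aggregates"]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_object(
            tmp,
            "item-1",
            {"paper.pdf": b"%PDF-1.4 demo", "metadata.xml": b"<title>Demo</title>"},
        )
        for name, data in read_aggregated(path).items():
            print(name, len(data), "bytes")
```

The point of the sketch is that any service, or a person with a file browser, could follow the manifest to the document and its metadata without going through the software that wrote them.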

From recent talks with Sandy Payette and Les Carr (Fedora and EPrints respectively), I envisage current repository software being reclassified as repository service software: software that can manage low-level objects but is not required in order to access them. Services that currently plug into the repository software could then act directly on the objects.

A couple of problems remain to be solved: security and the consistency of cached data. Both are especially relevant if you have more than one piece of repository service software modifying your objects.
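
To make the cache consistency problem concrete, here is a minimal Python sketch of one possible approach (an assumption for illustration, not something any of the platforms above is known to do): a service remembers a SHA-256 digest of each object it caches and compares it with the stored file before trusting its copy. The CachedObject class and its method names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class CachedObject:
    """A service's in-memory copy of one stored object (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        self.refresh()

    def refresh(self):
        """Reload the object from storage and remember its digest."""
        self.data = self.path.read_bytes()
        self.digest = hashlib.sha256(self.data).hexdigest()

    def is_stale(self):
        """True if the stored file no longer matches the cached copy."""
        return file_digest(self.path) != self.digest
```

A service would call is_stale() before acting on its cached copy and refresh() when it returns True. This only detects a change made by another service; it does not prevent two services from writing at the same time, and it does nothing for the security problem.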
