Daves thoughts on stuff: June 2008

Friday, 27 June 2008

OAI-PMH + OAI-ORE (Atom) + Pronom Droid = Pretty

I've just finished writing a wrapper (very simple!) which takes an OAI-ORE Resource Map in Atom format and classifies the objects listed in the Aggregation using the UK National Archives' technical registry (Pronom).

The wrapper provides a simple front end to the DROID tool: it takes an OAI-PMH URI, requests the latest resource maps in Atom format (ore-atom), and builds a list of the resources, which is passed directly to DROID for classification.
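To give a feel for the "build a list of resources" step, here is a minimal sketch in Python. It is not the wrapper's actual code. It assumes the Resource Map marks each aggregated resource with an Atom link whose rel is the ORE "aggregates" term, which is the convention in the later ORE 1.0 Atom serialization; earlier drafts of the spec may lay things out differently. The function name and the sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: aggregated resources are flagged with this link relation.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregated_resources(atom_xml: str) -> List[str]:
    """Return the href of every link marked as an aggregated resource."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for link in root.iter("{%s}link" % ATOM_NS)
        if link.get("rel") == ORE_AGGREGATES and link.get("href")
    ]


# Hypothetical resource map, trimmed down to the parts that matter here.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://repository.example.org/rem/1</id>
  <link rel="alternate" href="http://repository.example.org/1/"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.example.org/1/paper.pdf"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.example.org/1/data.csv"/>
</entry>"""

if __name__ == "__main__":
    for url in aggregated_resources(SAMPLE):
        print(url)
```

The resulting list of URLs is what would then be handed to DROID; how DROID is actually invoked is left out here.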

The wrapper requires OAI-PMH because it requests all records which have been modified since it last parsed the repository. This way the wrapper can be scheduled to run once a day, week, month, etc.
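The "modified since last run" request maps onto the OAI-PMH ListRecords verb with a from argument. Below is a rough Python sketch of building such a request, not the wrapper's real code. The base URL and the function name are hypothetical, and the "ore-atom" metadata prefix is taken from the description above, so check what your repository actually advertises via ListMetadataFormats.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def list_records_url(base_url, last_run, metadata_prefix="ore-atom"):
    """Build a ListRecords request for records changed since last_run.

    last_run should be a timezone-aware datetime. Day granularity
    (YYYY-MM-DD) is used because every OAI-PMH repository must support it.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_run.astimezone(timezone.utc).strftime("%Y-%m-%d"),
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and last-run date.
    last_run = datetime(2008, 6, 20, tzinfo=timezone.utc)
    print(list_records_url("http://repository.example.org/cgi/oai2", last_run))
```

A real harvester would also need to follow resumptionToken values when the repository splits a large response into several pages, and store the time of each run for the next one.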

A single DROID XML file comes back as the output.

This is all working with EPrints repository software currently.

The next stage is to do something useful with the output XML, in terms of providing useful data back to the repository manager.

Total lines of source code for the wrapper: 302 :)

Sunday, 8 June 2008

Repository Software is Dead

Repository software for digital collections, as we know it, supplies the complete solution to the client, so without the software you cannot access any of the data in your repository. This is a bad thing for object reuse and digital preservation!

Many people at conferences such as Open Repositories 2008 and from workgroups like CRIG have been talking for a long while about the importance of interoperability. However, if you get rid of the need for interoperability and use a standard specification for accessing simple data objects (PDFs and their metadata), then you don't need interoperability!

So this leads me to the fact that EPrints, Fedora and hopefully at some point DSpace are abstracting their database and storage layers to support the use of any type of storage platform. Thanks go to Sun Microsystems' preservation action group and open storage group for pushing this work from a commercial perspective. But we need to go further than this to get rid of the need for interoperability.

At Open Repositories 2008, a colleague, Ben O'Steen from Oxford University, and I proved how OAI-ORE (the OAI specification for Object Reuse and Exchange) can be used to enable high level repository interoperability. This work won us $5000 but, more importantly, got the community thinking about the true power of a specification like OAI-ORE. Ben and I are now hoping to push this work down to the low level storage, such that the objects within an ORE map (documents and metadata) can be directly referenced without the need for the current repository layer. For this to happen all objects need to be stored in their simplest form - NO WRAPPER FORMATS ALLOWED at the lowest level.

From recent talks with Sandy Payette and Les Carr (Fedora and EPrints respectively) I am envisaging that the current repository software becomes classified as repository service software which is able to manage low level objects but is not specifically required to access these objects. So current services which plug into the repository software can act directly on the objects.

There are a couple of problems to solve: security and the consistency of cached data. Both are especially applicable if you have more than one piece of repository service software modifying your objects.

CRIG / IEDemonstrator After Thoughts

IEDemonstrator is a really bad name for a project as it just says Microsoft to me, but I'm fairly sure it isn't anything to do with that most stable of web browsers.

From the workshop it has become clear to me that discussing a specification for service interaction globally is going to be impossible. This could be due to the fact that SOAP did such a good job of it and no one wants to use anything else (enough sarcasm??). I think many people left the workshop with a much better idea of how HTTP status codes (which have been around for years) already go most of the way to solving a web service model. We also realised quickly that any specification would have to be built specifically for pay services (e.g. make use of the 402 code); this would then encourage companies/institutions to supply reliable services which last more than 4 years (cough AHDS cough).
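As a rough illustration of letting HTTP status codes carry the service model, here is a small Python sketch of a client deciding what to do next from the code alone. This is my own toy mapping, not something agreed at the workshop; the function name and the action labels are invented for the example.

```python
from http import HTTPStatus


def next_action(status_code: int) -> str:
    """Map an HTTP status code to a (hypothetical) client action."""
    if status_code == HTTPStatus.PAYMENT_REQUIRED:  # 402: a pay service
        return "pay"
    if status_code == HTTPStatus.SERVICE_UNAVAILABLE:  # 503: try again later
        return "retry-later"
    if 200 <= status_code < 300:
        return "proceed"
    if 300 <= status_code < 400:
        return "follow-redirect"
    if 400 <= status_code < 500:
        return "fix-request"
    return "report-failure"


if __name__ == "__main__":
    for code in (200, 301, 402, 404, 503):
        print(code, next_action(code))
```

The point is that a client needs no service-specific error vocabulary for the common cases, since the codes already say enough to choose a sensible next step.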

Friday, 6 June 2008

First Post - CRIG DRY Workshop

Well there's a surprise!

CRIG DRY Workshop in Bath is where I am now. So what's happening:

People have been talking about services and proposed projects to provide authoritative and complete services to users/agents/repositories. A couple of themes have come out of the morning session for me:

SKOS: A lot of projects (incl. Library of Congress) are using this RDF language to describe subjects and properties. Each provides access to this information in so many different ways that it is hard to see how to interact in a consistent manner.

Service Interaction (read on as the name is not that descriptive)

This moves us on from the Open Storage stuff I've been working on (again, more later in another blog post) into how we facilitate the use of services and discover how to interact with these services. We are pushing for the use of HTTP codes! CRIG it.

Tis it for now....
