Daves thoughts on stuff

Wednesday, 8 April 2009

Less talk, less code, more data - The Preserv2 Data Registry

Yes, "less talk, more code" (oxfordrepo.blogspot.com) is a good saying, but I'm going to argue in this post that what we really need is more data! Having a ton of available services and a load of highly complex and well-considered data models is all well and good, but without data all of these services are useless. A repository is not a repository until it has something in it (Harnad).

If we look outside the repository community for a minute, we find a web community that is accumulating a whole ton of data, Wikipedia being the main point of reference here. Yet in the repository community we are not harnessing this open linked data model to enhance our own data.

I have been working in the area of digital preservation for a while, and the PRONOM file format registry (TNA UK) has been my friend for many years and contains some valuable data. However, I am concerned with the way I see it progressing. The main thing I use the PRONOM registry for is as a complement to DROID for file format information, and the data there is not even that complete. I am particularly concerned at the size of the new data model and the sheer effort that is going to be required to fill it with the data it specifies.

Why not look to the linked data web to see how to tie a series of smaller systems together to make a much more powerful and easier-to-maintain one?

This is where I have started with the preserv2 registry available at http://p2-registry.ecs.soton.ac.uk/.

The Preserv2 registry is a semantic knowledge base (built on RDF triples) with a SPARQL endpoint, RESTful services and a basic browser. Currently the data is focussed on file formats and is basically made up of the PRONOM database ported from a complex XML schema into simple RDF triples. On top of this I'm beginning to add data from DBpedia (Wikipedia RDF'd) and making links between the PRONOM data and the DBpedia data!
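To give a feel for what talking to a SPARQL endpoint looks like, here is a minimal sketch that only builds a query URL and does not touch the network. The `/sparql` path, the `build_query_url` helper and the use of `rdfs:label` are assumptions made for illustration; the registry's real endpoint path and vocabulary may well differ. Passing the query in a `query` GET parameter is the standard SPARQL protocol convention.

```python
from urllib.parse import urlencode

# Assumed endpoint path: the post only gives the registry's base URL.
ENDPOINT = "http://p2-registry.ecs.soton.ac.uk/sparql"


def build_query_url(label, endpoint=ENDPOINT):
    """Return a GET URL asking for resources whose rdfs:label equals `label`.

    Hypothetical helper: rdfs:label is a guess at how format names are
    stored, not something the registry is known to use.
    """
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        'SELECT ?format WHERE { ?format rdfs:label "%s" } LIMIT 10' % escaped
    )
    # The SPARQL protocol carries the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_query_url("Portable Document Format"))
```

The resulting URL could then be fetched with any HTTP client, and the same query text could be pointed at DBpedia's own endpoint to pull in the linked data.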

Already this is helping us build a greater knowledge base, and the cost of gathering and compiling this data is very low. On top of that, the registry took me less than a week to construct!

So "Go forth and make links" (Wendy Hall) is exactly what I'm now doing. With enough data you will be able to make complex OWL-S rules that can be used to accurately deduce facts such as which formats are at risk.
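As a toy illustration of how links plus a simple rule can yield a new fact, the sketch below keeps a handful of triples in memory and flags a format as "at risk" when nothing linked to it is known to read it. Every identifier, the `ex:canRead` predicate and the rule itself are made up for this example; a real registry would express this in its own vocabulary and rule language.

```python
# Toy triples: all URIs and the ex:canRead predicate are hypothetical.
TRIPLES = {
    ("pronom:fmt/A", "owl:sameAs", "dbpedia:Format_A"),
    ("pronom:fmt/B", "owl:sameAs", "dbpedia:Format_B"),
    ("dbpedia:Tool_1", "ex:canRead", "dbpedia:Format_A"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def at_risk(fmt, triples=TRIPLES):
    """Flag a format when no tool is known to read it or any linked alias."""
    # Follow the owl:sameAs links so facts stated about the linked
    # resource also count for the registry's own identifier.
    aliases = {fmt} | objects(fmt, "owl:sameAs", triples)
    readers = {s for s, p, o in triples if p == "ex:canRead" and o in aliases}
    return not readers


if __name__ == "__main__":
    for fmt in ("pronom:fmt/A", "pronom:fmt/B"):
        print(fmt, "at risk" if at_risk(fmt) else "ok")
```

The point of the sketch is that the "reader" fact lives only on the linked side, yet the link is enough to answer the question for the registry's own entry.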

Wednesday, 21 January 2009

EPrints 3.2 - Amazon S3/Cloudfront Plug-in

A quick post to say that we have just successfully tested an EPrints 3.2 (svn) install with the new Storage Controller plugged into Amazon S3!

This has quite a lot of implications for both EPrints and other projects wanting to provide external services which operate on objects in a repository. We hope to bring people more news on this at the upcoming Open Repositories 2009 conference in Atlanta.

For more information on all this, check out the storage section on the Preserv2 website at www.preserv.org.uk.

Thursday, 18 September 2008

Institutions hate repositories... one simple reason.

Open access is not enough!

People want to give Open Access to some of their materials at their institution. However, the IR software is seen as a means to manage all institutional content, not just the part that is Open Access and forms the external image of the institution.
The problem exists in the other direction as well: repository software is trying to solve these problems itself, so people are not likely to use the software until that support is included.

So what do we end up with...

Lots of Repository Islands which aren't interoperable with each other!

So if we solve the access and copyright issue, will people use the software? Errrr, no. At this point the software is an all-in-one solution and not a service which can be utilised by current institutional practice... Give up...?

No!

Focus on providing a service, e.g. something which can manage your digital resources, and enable it to plug into existing institutional services. Some software would argue it supports this already. OK, good, so don't try to solve the problem again if it is just an integration issue.

To the repositories: Decouple! Build a set of services, build ways of plugging services together and allow the community to pick 'n' mix.

To the institution: You already have access control systems; ask your Information/Computer Systems department. You probably already have a Content Management System for educational resources for students (Blackboard? It integrates with an LDAP server), and these use external services to manage access and authentication! Here are a few services for you... LDAP, RADIUS, eduroam, Domain Controller.

Till next time!

Saturday, 19 July 2008

Is winning in Casinos a Bad Thing?

Totally off my usual topics, but by playing short games in the casinos in Vegas and quitting while ahead I'm up by just over $100. It's not a lot, but considering I've only been playing 1c machines and $5 - $10 blackjack I think that's quite cool! However, since I haven't lost yet, does that mean I'll now want to continue... could it get addictive? Considering I'm going to Atlantic City next week, is this bad news?

As for the technical note, follow the rules of blackjack on Wikipedia (http://en.wikipedia.org/wiki/Blackjack) and make sure you buy in enough for 10-12 hands at minimum bet, and never bet more unless Wikipedia says you should! Also, when you are up after 3-4 decks' worth of rounds... get out!

This may not be the longest game in the world but you take the money off the Casino!
Thank you, Venetian/Palazzo Las Vegas!

Wednesday, 16 July 2008

#crigshow - Conference 2 - Worldcomp

Agents and Web Services... Why no collaboration?

Out of all the presentations at Worldcomp, this one struck me as covering one of the most obvious yet least explored areas for research in computer science. Probably the best-known agent system is the one used by the travel industry, where companies have standard ways of interfacing with each other to find details of travel and hotels available on a global scale. This is no mean feat given the number of companies hooking into this network.

So why doesn't the same exist for web services or if there is such a system why isn't everyone in the open community using it?

Surely the point of web services is for people to discover and use them in their own scenarios just like the agents in the travel industry do. OK so maybe the problem lies in the fact that there are so many communities that there will never be a specific use case or framework and thus hosting a generic web service network becomes infinitely hard with the number of different APIs and Implementations.

OK so if you are going to use Agents in Web Services what issues do you need to consider? Also what do you gain through doing this?

One of the key ideas which came out of a talk at Worldcomp is to use agents as the intelligent front to a web service. This enables an agent to keep track of a set of web services, including information about each specific web service such as availability, versions, changing cost and an offline copy if the service allows this. So the agent becomes a Rendezvous Point for a series of web services.
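A rough sketch of such a rendezvous point, assuming nothing more than the attributes listed above, might look like this. The `ServiceRecord` and `RendezvousAgent` names, and the "pick the cheapest available service" policy, are invented for illustration and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceRecord:
    """What the agent remembers about one web service (hypothetical fields)."""
    url: str
    version: str
    cost: float
    available: bool = True
    offline_copy: Optional[str] = None  # path to a local copy, if allowed


class RendezvousAgent:
    """Keeps track of a set of web services on behalf of its clients."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceRecord] = {}

    def register(self, name: str, record: ServiceRecord) -> None:
        self._services[name] = record

    def set_available(self, name: str, available: bool) -> None:
        self._services[name].available = available

    def cheapest_available(self) -> Optional[str]:
        """Return the name of the cheapest service that is currently up."""
        candidates = [
            (record.cost, name)
            for name, record in self._services.items()
            if record.available
        ]
        return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    agent = RendezvousAgent()
    agent.register("a", ServiceRecord("http://example.org/a", "1.0", cost=2.0))
    agent.register("b", ServiceRecord("http://example.org/b", "1.1", cost=1.0))
    print(agent.cheapest_available())
```

Clients would ask the agent rather than each service directly, so changes in cost or availability only have to be tracked in one place.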

So why aren't we seeing more collaboration between the Agent community and the Web Services community?

Monday, 14 July 2008

#crigshow - Conference 1 - Oscelot

This open source day (#osdiii) hosted by Oscelot was an unconference which soon became based heavily around the Blackboard platform. This was expected, as the majority of people attending it were then going on to attend the BbWorld conference. With the title of the conference being Open Source and yet the main topic being a Closed Source product, this gave an opening for the CRIG team to promote the wider Open Source community to those who are focused on Blackboard use cases.

The day was a success for the team as we promoted good practices in web development, standards and resource management, and the fact that the people who manage an eLearning platform have a responsibility to the content they hold.

From our point of view, we discovered: if Blackboard is the industry leader in learning management systems, then the repository community has big problems when it comes to archiving these resources with the methodologies each community currently practises.

More Collaboration and Awareness please!

Friday, 27 June 2008

OAI-PMH + OAI-ORE (Atom) + Pronom Droid = Pretty

I've just finished writing a wrapper (very simple!) which takes an OAI-ORE Resource Map in Atom format and classifies the objects which are listed in the Aggregation using the National Archives (UK) technical registry (PRONOM).

The wrapper provides a simple front end to the DROID tool. It takes an OAI-PMH URI, requests the latest resource maps in Atom format (ore-atom), and creates a list of the resources, which are passed to DROID to classify directly.
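The "create a list of the resources" step could be sketched roughly as below. This is not the wrapper's actual code: it assumes the resource map marks aggregated resources with Atom `link` elements whose `rel` is the ORE `aggregates` term, as in the ORE 1.0 Atom serialization, and the version of the spec in use at the time may have laid things out differently. The `aggregated_resources` helper and the sample entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# rel value used for aggregated resources in the ORE 1.0 Atom serialization.
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Made-up resource map for illustration only.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/rem/1</id>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/files/1/paper.pdf"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://example.org/files/1/data.csv"/>
  <link rel="alternate" href="http://example.org/record/1"/>
</entry>"""


def aggregated_resources(atom_xml):
    """Return the href of every link marked as an aggregated resource."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for link in root.iter(ATOM + "link")
        if link.get("rel") == AGGREGATES
    ]


if __name__ == "__main__":
    for url in aggregated_resources(SAMPLE):
        print(url)
```

The resulting list of URLs is what would then be handed to DROID for classification.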

The wrapper requires OAI-PMH because it requests all records which have been modified since it last parsed the repository. This way the wrapper can be scheduled to run once a day/week/month etc.
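The "modified since last run" request maps onto the standard OAI-PMH `ListRecords` verb with a `from` argument. A minimal sketch of building that request follows; the `list_records_url` helper and the example base URL are hypothetical, and `ore-atom` is assumed to be the metadata prefix the repository exposes.

```python
from urllib.parse import urlencode


def list_records_url(base_url, last_run=None, metadata_prefix="ore-atom"):
    """Build an OAI-PMH ListRecords URL, optionally limited to recent changes.

    last_run is a UTC date such as "2008-06-20"; when omitted, the
    repository is asked for every record (e.g. on the first run).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_run:
        # "from" is the OAI-PMH argument for selective harvesting by datestamp.
        params["from"] = last_run
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository base URL.
    print(list_records_url("http://example.org/cgi/oai2", "2008-06-20"))
```

A scheduled job would store the date of each successful run and pass it in as `last_run` the next time round.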

A single DROID xml file comes back as the output.

This is all working with EPrints repository software currently.

Next stage is to do something useful with the output xml in terms of providing useful data back to the repository manager.

Total lines of source code for the wrapper: 302 :)