Thursday, 5 July 2012

Jason Scott and the Archive Team

If you haven't heard of the Archive Team, I suggest you look them up and even become an archiving activist, or even a hero!

I had heard of the Archive Team before and, like many, I believed that this rogue group were overly "energetic hackers" with an agenda to preserve the lives of people that companies destroy.

This is an agenda I fully support, but I had not been inspired to take part until recently, when I had the absolute pleasure of listening to a keynote speech by Jason Scott, one of the founding members of the Archive Team. Jason sums himself up as an "energetic hacker", and he is a man who shares many of my beliefs, which I sum up in this post.

1) Publishing is moving too fast for archives to keep up.

Jason compares archiving people's lives on the web, in systems like GeoCities, MobileMe and others, to trying to catch fireflies. They fly up and, while you have time to say "How Pretty", you have no time to realise what the firefly does and its value to the community.

Once you do realise the value, they are already gone.

This is true of nature, buildings and even people themselves! In many cases there is nothing we can do, but what about digital information?

Cathy Marshall presents a survey carried out at the Library of Congress of digital lives with some children and one quote stuck out:

"Why don't we just save facebook as this is our diary, yearbook, guinness book of records..."

This is absolutely something that can be done, and something that the archive team are doing...great... but why aren't others?

One word... Policy

2) Just get on with it, ask for forgiveness later.

It is much easier to ask for forgiveness than it is to reconstruct a building that has been knocked down. Once it is gone, nothing can be done. Something that has been saved can still be removed...

Jason and the Archive Team apply this policy, archiving whole websites and people's public content, indexing it and making it available on torrent sites. This has saved a lot of people's content from the trash can.

Again, POLICY is a blocker to people just doing this. The other blocker is the belief that people will not want you to archive and share their data... let's address that...

3) People know what they want, so ask them!

Question for you to consider: Should your medical record be shared?

What if it was shared with researchers worldwide so that a cure might be found quicker?

What if your medical record became the top hit for your name on Google?

It is this last point that seems to be key to most people. Many people don't mind historical information being made available; they just don't want such information to be any more prominent than it was.

Finally, the hardest thing to achieve is to allow the user to still be in control of their data and to have it deleted if they want. A big problem with all of these systems is that companies and archives assume ownership of people's data. It is not their data; it belongs to the curators of the data, and control should remain with those people.

If you lend a dinner set to a friend or neighbour for a big party they are planning, you expect to be able to get it back, not for them to take control of it and then throw it away instead of giving it back to you.


It is time that policy was changed. It definitely should not get in the way of progress!

Wednesday, 22 February 2012

Drupal 7 - From Blank to Working

I have never set up a site from scratch in Drupal before, but I am impressed by how easy it is to use for users who don't want to delve into a terminal and try to understand templating that way. The problem is that getting from a fresh install to a basic site is actually quite hard! This post lists the steps I went through to make it work; it assumes you have already set up an admin account. The following steps look at building a simple site in the Bartik theme.

1) Set up your development environment - Make sure you have two browsers open, one in Incognito/Private browsing mode, so that you can see how changes look to users who are not logged in.

2) Log in and create a main page. This is a basic page and will be called something like node/1. Via Configuration -> Site Information you will need to set it as the home page (set the other options while you are here).

3) Change the theme to Bartik in Appearance and then click settings to customise your colour scheme and default icon (note that icons need to be the correct size).

4) Go to People and then click the Permissions tab. Make sure that all users can search your site. You can then add the search block (via Structure -> Blocks) to the header (or other) section of your page.

5) Go to Structure -> Menus and click "list links" against "user menu". Then click add link and add a Login link with the url user/login and enable this. You can do the same with the url user/register to get a registration link. Refreshing the non-logged in page should now show these links at the top right. Turning off registration can be done via Configuration -> Account Settings.

6) From Structure -> Blocks create a new block with the following content and add it to the header section of your page:

<style type="text/css">
.breadcrumb {
display: none;
}
</style>

This should give you a basic, usable site.

Friday, 20 January 2012

Making Debian Changelogs from Github repositories

One of the many things that irks me is the gap between good developers who put all their code on platforms such as GitHub, and those who then actually bother to put some effort into packaging up their code for easy platform installation.

I have come to the realisation that this is mainly due to the pedantic nature of packaging formats and platform lock-in. One such example is the exacting format of the Debian changelog...

GitHub2Changelog is a bit of code that I knocked together to help in this situation. It takes a GitHub repository URL and builds a Debian changelog from the repository commits and tags.

By looking at the tags and commits, it works out which commits are related to which tags (something GitHub APIv3 doesn't do) and then returns the result to you already formatted.
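To make the idea concrete, here is a minimal sketch of that grouping step in Python. It is not the service's actual code (which is written in PHP); it assumes a linear history, commits supplied oldest first, and tag names such as "v0.2". The names Commit, group_commits_by_tag and format_changelog are made up for this illustration, as are the default distribution and urgency values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Commit:
    sha: str
    message: str
    author: str      # "Name <email>", as the changelog trailer expects
    date: datetime   # timezone-aware


def group_commits_by_tag(commits, tags):
    """Assign each commit to the first tag at or after it.

    commits: list of Commit, oldest first (assumed linear history).
    tags: dict mapping tag name -> sha of the tagged commit.
    Returns (releases, pending): releases is a list of (tag, commits),
    oldest release first; pending holds commits made after the last tag.
    """
    tag_by_sha = {sha: name for name, sha in tags.items()}
    releases, pending = [], []
    for commit in commits:
        pending.append(commit)
        tag = tag_by_sha.get(commit.sha)
        if tag is not None:
            releases.append((tag, pending))
            pending = []
    return releases, pending


def format_changelog(package, releases, distribution="unstable", urgency="low"):
    """Render releases in Debian changelog layout, newest release first."""
    entries = []
    for tag, commits in reversed(releases):
        # Assumption: a leading "v" on the tag is not part of the version.
        version = tag[1:] if tag.startswith("v") else tag
        tagged = commits[-1]
        lines = [f"{package} ({version}) {distribution}; urgency={urgency}", ""]
        for commit in reversed(commits):
            summary = (commit.message.strip().splitlines() or [""])[0]
            lines.append(f"  * {summary}")
        lines.append("")
        # Trailer: one leading space, two spaces before the RFC 2822 date.
        lines.append(f" -- {tagged.author}  {format_datetime(tagged.date)}")
        entries.append("\n".join(lines))
    return "\n\n".join(entries) + "\n"
```

Calling group_commits_by_tag with the commit list and a tag-to-sha mapping, then passing the releases to format_changelog with a package name, yields one changelog entry per tag. Commits made after the last tag are returned separately so the caller can decide whether to list them as unreleased.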

The service is built in PHP, and is web based with both a pretty front end and API access.

Ironically, since I've now committed the code to GitHub here, I now need to use the service on itself and build the easy-to-install packages. More on that soon...

Thursday, 19 January 2012

DepositMOre - The Prototype

Building on the success of DepositMO and SWORDv2, I thought it would be a good idea to put a quick HTML5 client together to save myself some pain.

The basic premise of this web-based client is to automatically search for "your stuff" in a number of ways and then allow it all to be submitted to a repository in one click.

First target for me was easychair. This service is used as an online conference submission and review system. In a nutshell, if an author wants to get accepted into a conference, easychair is one system which they WILL have to battle with in order to submit their content. As a result, there is a strong potential that easychair knows about many publications which should also be present in other systems.

From the main screen in easychair it is possible to navigate and find the many conference publications which you have submitted. Each publication is tied to a conference and it can take a substantial number of clicks to navigate between each publication.


DepositMOre is a modular system which is intended to be a home for many services which locate your publications. The first module to be developed is for easychair.

By simply providing your login credentials to the DepositMOre system, it will not only list all your authored items from easychair but also check whether these are present in your locally detected repository. If they are not deposited and they should be, then one click will do this for you.
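The "is it already deposited?" check can be pictured as a simple comparison between the two lists. The Python sketch below is only an illustration of that idea, not the prototype's code: matching on a normalised title is an assumption made here, and normalise_title, find_undeposited and the item dictionaries are hypothetical.

```python
import re
import unicodedata


def normalise_title(title):
    """Lower-case a title, drop accents and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_only.lower())
    return " ".join(words)


def find_undeposited(found_items, repository_titles):
    """Return the found items whose title is not yet in the repository.

    found_items: list of dicts with at least a "title" key (assumed shape).
    repository_titles: iterable of titles already held by the repository.
    """
    known = {normalise_title(title) for title in repository_titles}
    return [
        item for item in found_items
        if normalise_title(item["title"]) not in known
    ]
```

Anything returned by find_undeposited would be a candidate for the one-click deposit. A real client would probably want a stronger key than the title, such as an identifier shared by both systems, where one exists.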

A combination of HTML5 and SWORD2 makes this process quick and seamless! Multiple items can be submitted at once, and as each is submitted you can instantly click a link to your item and view it in the repository.

The following video gives a demo of the prototype in action. We hope to continue development with the support of a funded project.

Technologies Used

  • HTML/JavaScript/jQuery/PHP

  • SWORD2 PHP Library - Stuart Lewis -