Tuesday 8 September 2009

Thoughts on digitization, data deluge and linking

It's been a while since I've put a post up, probably because I've been busy and also trying to tidy up a lot of stuff before starting on new projects.

In this post then: Digitisation

I never really appreciated how big the area of digitisation is, or how many non-repository people are actively involved in it. There are a great many projects (more than 50) digitising resources, and these include national libraries. Items being digitised include everything from postcards and newspapers to full books and old journals.

So what's the problem here? Simple: how many people are digitising the same things? Yes, I know there is so much out there that this is unlikely to be the case, but it brings me nicely to the problem of information overload. There is already more valuable information on the internet than we can possibly handle effectively, so how do you ensure that any resources you digitise for open access on the web can be found and used?

I don't normally say this, but perhaps we should look at physical libraries for the answer. Libraries are a very good central point where you can find publications related to all subject areas, and if your local library does not have a copy then it will try to find one somewhere else.

How then does this map onto the web? Web sites become the library and links become the references to additional items or items this site does not contain. Simple, right? Unfortunately, with the 50+ projects I can count already, this leads to 50+ different web sites, all with differing information presented in different ways. Because each web site is presented in a totally different way, they are not in fact libraries, which pride themselves on a standard way of organising resources. Instead, web sites become books. So to find resources we have to rely on search engines and federation, which puts us back where we started, with a problem of information overload.

Unfortunately I don't have an answer to this problem, but I do know that links hold the key to the solution. Each website at the moment is simply an island of information; what is desperately required is for the authors and the community to establish links to these resources. If digitisation houses are curating refereed resources, then the simplest way to link to these would be to put information about them on Wikipedia.

This would be my final point then: Wikipedia is actually a good thing, simply because of the community aspect. However, it also provides many other huge benefits:

  • External resources such as photos have to have a licence

  • In annotating a page/item you create links and establish facts which are made available through the semantic version of Wikipedia (DBpedia)

  • Wikipedia is an easy way to establish your presence on the linked data web (linkeddata.org)
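
To make the DBpedia point a little more concrete, here is a minimal sketch of how a Wikipedia page address maps onto a linked data identifier. It assumes the English Wikipedia and relies on DBpedia's convention that a resource URI mirrors the article title; the function name `dbpedia_uri` is my own invention for illustration, not part of any library.

```python
from urllib.parse import quote, unquote, urlparse


def dbpedia_uri(wikipedia_url: str) -> str:
    """Return the DBpedia resource URI for an English Wikipedia article URL.

    Assumption: DBpedia resource URIs reuse the article title, so
    .../wiki/Some_Title corresponds to http://dbpedia.org/resource/Some_Title.
    """
    parsed = urlparse(wikipedia_url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError("expected an English Wikipedia article URL")
    # Decode the title, normalise spaces to underscores, then re-encode it.
    title = unquote(parsed.path[len("/wiki/"):]).replace(" ", "_")
    return "http://dbpedia.org/resource/" + quote(title, safe="_(),'")


if __name__ == "__main__":
    print(dbpedia_uri("https://en.wikipedia.org/wiki/Charles_Dickens"))
```

So once a digitised collection is linked from an author's Wikipedia page, that same page has a counterpart identifier on the linked data web that other datasets can point at.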



So if you are digitising books by an author, add a link to them on the author's Wikipedia page. If you are digitising a collection of World War images, add links to some of these on Wikipedia and Flickr.

Establish links and help yourself to help everyone else.