I’ve been quietly exploring the Linked Data world for some time, and thinking about how cultural heritage information might play in that space.
On the face of it, we don’t have the type of information which is readily expressed as assertions, i.e. simple statements of “fact”, such as you find in dbpedia. There’s lots of uncertainty, lots of imprecision (“provenance: Asia” – give me a break!), and many – sometimes conflicting – opinions about the history of material culture.
However, I think that there is also enough hard data to make the exercise worthwhile. I think, too, that we can usefully represent the uncertainty and imprecision as Linked Data, though the resulting RDF may be a little more complex than the sets of simplistic triples which you tend to find, for example, in dbpedia.
[image: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ ]
One thing I think we do need is some common points of reference. At present, the standard way of publishing, say, a museum collection as Linked Data is to invent a set of URLs for the collection objects (which is fine and necessary), but then to do the same for all related entities, such as people and places. Thus the British Museum Linked Data has:
…which is the identifier for the person Vardanes II. This URL is fine, but it is specific to the BM’s database. I assume he is the same Vardanes II who is described in:
but there is nothing in the BM Linked Data to confirm (or refute) this.
One thing which I think we need in the cultural heritage field is a single Linked Data authority resource for personal identity. This authority should contain enough information about each person to allow automated processes to calculate the likelihood that two people are actually the same individual. Key facts would be name(s), and date and place of birth and death. [Sticking to dead people avoids Data Protection and libel issues.] Other potentially useful facts could be added: titles, gender, nationality, occupation, etc. However, the goal would not be to build an encyclopaedia of personal data, just to hold enough facts about each person to allow identity matching.
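To make "calculate the likelihood" a little more concrete, here is a minimal Python sketch of a weighted match score over the key facts listed above. Everything in it is an assumption for illustration: the `Person` record shape, the weights, the one-year tolerance on dates and the use of `difflib.SequenceMatcher` for name similarity are hypothetical choices, not part of any existing authority service. Facts missing from either record are left out of the score rather than counted as mismatches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional

@dataclass
class Person:
    """Hypothetical identity record: just enough facts to support matching."""
    names: List[str]
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    birth_place: Optional[str] = None
    death_place: Optional[str] = None

# Hypothetical weights: how much each fact contributes to the overall score.
WEIGHTS: Dict[str, float] = {
    "name": 0.4,
    "birth_year": 0.2,
    "death_year": 0.2,
    "birth_place": 0.1,
    "death_place": 0.1,
}

def name_score(a: Person, b: Person) -> Optional[float]:
    """Best 0..1 string similarity across all name variants; None if either has no names."""
    if not a.names or not b.names:
        return None
    return max(
        SequenceMatcher(None, x.lower(), y.lower()).ratio()
        for x in a.names
        for y in b.names
    )

def year_score(x: Optional[int], y: Optional[int], tolerance: int = 1) -> Optional[float]:
    """1.0 if two years agree within the (assumed) tolerance, 0.0 if not, None if unknown."""
    if x is None or y is None:
        return None
    return 1.0 if abs(x - y) <= tolerance else 0.0

def place_score(x: Optional[str], y: Optional[str]) -> Optional[float]:
    """Naive case-insensitive place comparison; None if either place is unknown."""
    if x is None or y is None:
        return None
    return 1.0 if x.strip().lower() == y.strip().lower() else 0.0

def match_likelihood(a: Person, b: Person) -> float:
    """Weighted average over the facts both records actually contain (0.0 if none)."""
    scores = {
        "name": name_score(a, b),
        "birth_year": year_score(a.birth_year, b.birth_year),
        "death_year": year_score(a.death_year, b.death_year),
        "birth_place": place_score(a.birth_place, b.birth_place),
        "death_place": place_score(a.death_place, b.death_place),
    }
    known = {k: v for k, v in scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in known)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in known.items()) / total_weight
```

A real service would need more than this (for example, normalising inverted name forms such as "Newton, Isaac", and representing approximate or uncertain dates), but the shape of the calculation stays the same: compare the facts both records share and combine them into a single score.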
I’ve grabbed the 150,000 or so person entries from dbpedia which have dates of birth and death, and set them up as a stand-alone database. Having done this, I wanted to use the personal information in the Open Plaques data for a comparison test, to see how straightforward this sort of automated matching might be. I thought that there should be quite a big overlap between the sort of people who are considered noteworthy enough to be in Wikipedia, and those considered worthy of plaques.
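Comparing each of the roughly 3,500 Open Plaques people with each of the 150,000 dbpedia people would mean about 3,500 × 150,000 = 525,000,000 pairwise comparisons. A common way to keep this manageable (assumed here, not necessarily the approach taken in this test) is "blocking": index one side by a cheap key and only compare records that share it. The minimal Python sketch below uses death year as the blocking key, with a hypothetical ±1 year tolerance; the record identifiers are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical minimal record shape: (identifier, death year or None).
Record = Tuple[str, Optional[int]]

def build_death_year_index(records: Iterable[Record]) -> Dict[int, List[str]]:
    """Group record identifiers by death year; records without one are skipped."""
    index: Dict[int, List[str]] = defaultdict(list)
    for identifier, death_year in records:
        if death_year is not None:
            index[death_year].append(identifier)
    return index

def candidate_matches(
    death_year: Optional[int], index: Dict[int, List[str]], tolerance: int = 1
) -> List[str]:
    """Identifiers whose death year lies within +/- tolerance years of the given one."""
    if death_year is None:
        return []  # no blocking key: such a record would need a different strategy
    found: List[str] = []
    for year in range(death_year - tolerance, death_year + tolerance + 1):
        found.extend(index.get(year, []))  # .get avoids creating empty entries
    return found
```

Only the candidates returned for each plaque person would then need a fuller comparison of names, dates and places.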
Extracting the personal information from Open Plaques was my first challenge. I had found biographical details in the XML for plaques, but they were presented as escaped CDATA – not useful for my purposes. After a conversation about this with Jez Nicholson, nicely structured XML person authority data appeared as though by magic.
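Why is escaped markup inside CDATA awkward to work with? An XML parser hands back the contents of a CDATA section (or of escaped `&lt;...&gt;` text) as a single string, so element-level queries cannot see the structure inside it. The Python sketch below illustrates this with hypothetical element names (`plaque`, `inscription`, `person`, `name`) and a fictional person, not the real Open Plaques format.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: person details embedded as markup inside a CDATA section.
CDATA_VERSION = """<plaque>
  <inscription><![CDATA[<person><name>Jane Doe</name><born>1850</born></person>]]></inscription>
</plaque>"""

# Hypothetical structured equivalent: the same details as real XML elements.
STRUCTURED_VERSION = """<plaque>
  <person><name>Jane Doe</name><born>1850</born></person>
</plaque>"""

def person_names(xml_text: str) -> list:
    """Return the text of every person/name element the parser can actually see."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.findall(".//person/name")]

cdata_root = ET.fromstring(CDATA_VERSION)
print(repr(cdata_root.find("inscription").text))  # the markup survives only as a plain string
print(person_names(CDATA_VERSION))       # []
print(person_names(STRUCTURED_VERSION))  # ['Jane Doe']
```

The embedded string could be re-parsed as a second step, but only if it happens to be well-formed; properly structured XML removes that step altogether.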
There is also an XML “index” which lists all the people who are represented in Open Plaques. I used this index in an XSLT transform, which grabbed all the records it mentioned and put them into a single source XML document (well, two: the XSLT processor ran out of memory, so the job had to be split into two chunks). In the process, my transform reported 9 Open Plaques URLs in the index which didn’t point to a real resource – useful error-checking.
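The author did this step in XSLT. As a rough illustration of the same pattern in another language, here is a Python sketch that walks a list of record URLs, merges the records it can fetch and parse into combined documents (split into chunks, much as the job had to be split in two), and reports URLs that yield nothing usable. The `fetch` function, the `people` root element and the chunk size are hypothetical; in practice `fetch` would make the HTTP request.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical fetcher: returns the XML text for a URL, or None if nothing is there.
Fetch = Callable[[str], Optional[str]]

def merge_records(
    urls: Iterable[str], fetch: Fetch, chunk_size: int
) -> Tuple[List[ET.Element], List[str]]:
    """Merge fetched records into <people> documents of at most chunk_size records each.

    Returns (chunks, failures); failures lists URLs that returned nothing
    or returned something that is not well-formed XML.
    """
    chunks: List[ET.Element] = []
    failures: List[str] = []
    current = ET.Element("people")
    for url in urls:
        text = fetch(url)
        record = None
        if text is not None:
            try:
                record = ET.fromstring(text)
            except ET.ParseError:
                record = None
        if record is None:
            failures.append(url)  # the equivalent of "doesn't point to a real resource"
            continue
        current.append(record)
        if len(current) == chunk_size:  # chunk is full: start a new document
            chunks.append(current)
            current = ET.Element("people")
    if len(current) > 0:
        chunks.append(current)
    return chunks, failures
```

For a quick offline test, a dictionary's `get` method can stand in for `fetch`, e.g. `merge_records(list(pages), pages.get, chunk_size=2)`.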
“But,” I hear you say, “surely Open Plaques’ XML outputs aren’t ‘proper’ Linked Data?” Here is part of an example:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<address>1 Market Square, Newent, United Kingdom</address>
```
This isn’t RDF. However, from my point of view it is just as good, because it has the two attributes I need:
* it is machine-processible, so I can select, transform and analyse the data
* it uses persistent, dereferenceable URLs to identify the concepts it describes, so I can fetch any related material I’m interested in. (You either append ‘.xml’ to the URL, or make an HTTP request specifying that you want an XML response. Either way, you can use XSLT’s standard document() function to access related resources.)
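The two ways of dereferencing a URL as XML can be sketched in Python as follows. This is an illustration under assumptions, not Open Plaques documentation: `https://example.org/plaques/1` is a placeholder URL, and `application/xml` is assumed to be an acceptable media type for the `Accept` header. The helper `address_of` pulls the `<address>` text out of a record like the fragment quoted above.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Union

def xml_url(resource_url: str) -> str:
    """Option 1: append '.xml' to the resource URL."""
    return resource_url.rstrip("/") + ".xml"

def xml_request(resource_url: str) -> urllib.request.Request:
    """Option 2: keep the URL and ask for XML via the Accept header (content negotiation)."""
    return urllib.request.Request(resource_url, headers={"Accept": "application/xml"})

def address_of(xml_data: Union[str, bytes]) -> Optional[str]:
    """Return the text of the <address> element (the root or any descendant), if any."""
    root = ET.fromstring(xml_data)
    element = root if root.tag == "address" else root.find(".//address")
    return element.text if element is not None else None

# Fetching for real needs network access, so it is not run here:
# with urllib.request.urlopen(xml_request("https://example.org/plaques/1")) as response:
#     print(address_of(response.read()))
```

Passing the raw response bytes straight to the parser lets it honour the document’s own encoding declaration.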
I now have a Modes data file containing just over 3,500 person records, and I’m all set to try my comparison with dbpedia. I’ll let you know how I get on.
About Richard Light
Richard has worked in the cultural heritage information retrieval field for over 30 years. A founding staff member of the Museum Documentation Association (now Collections Trust) from 1977 to 1991, he helped develop the current UK museum standards framework.
Since then, as a freelancer, Richard has also specialised in markup technologies as they apply to cultural heritage information resources. With a particular interest in the potential of Linked Data techniques for cultural heritage, he has helped widely used classification schemes (UDC, BLISS, SHIC, Israel Museum) in their moves towards a Linked Data (SKOS) manifestation. Further posts on his XML, museums and linked data blog.