One of the purposes of Open Plaques is to provide an interesting geographical dataset that external projects can use licence-free. We are trialling a few projects before fully opening up the API, so please contact us if you have a good idea and want to use our data.
Ian Ozsvald, author of The AI Cookbook, has been experimenting with Optical Character Recognition (OCR) of plaque photographs and has created a challenge for other AI hackers to advance the work.
Ian says, “The challenge aims to automatically read Flickr images of plaques and then to use computer vision and optical character recognition tools to transcribe the text with human-level accuracy.”
“Currently I have a manual process which gives a human-like result (99% accuracy including spaces and punctuation errors). I’m working on an automated process: http://blog.aicookbook.com/2010/07/automatic-plaque-transcription-using-python-work-in-progress/”
“I have a demonstration system written in Python: http://aicookbook.com/wiki/Automatic_plaque_transcription which can be started in 30 minutes by any Python coder (or converted to another language by a competent programmer). The demo downloads a set of 30 plaque images, passes them through the open source tesseract OCR tool and scores the resulting transcriptions.”
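As a rough illustration of the scoring step Ian describes, here is a minimal Python sketch that compares OCR output against a hand-typed transcription. It assumes the score is a character-level edit distance (Levenshtein), which this post does not confirm; the function names and the sample strings are made up for illustration. It does not call tesseract itself and only scores text that some OCR tool has already produced.

```python
from statistics import mean, median


def edit_distance(truth: str, guess: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning truth into guess."""
    previous = list(range(len(guess) + 1))
    for i, t_char in enumerate(truth, start=1):
        current = [i]
        for j, g_char in enumerate(guess, start=1):
            cost = 0 if t_char == g_char else 1
            current.append(min(
                previous[j] + 1,         # drop a character from truth
                current[j - 1] + 1,      # add a character from guess
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def score_transcriptions(pairs):
    """Return the per-image errors together with their mean and median."""
    errors = [edit_distance(truth, guess) for truth, guess in pairs]
    return errors, mean(errors), median(errors)


# Hypothetical (hand transcription, OCR output) pairs, not real plaque data.
SAMPLE_PAIRS = [
    ("CHARLES DICKENS lived here", "CHARLES DICKENS lived here"),
    ("CHARLES DICKENS lived here", "CHARLE5 DlCKENS livcd here"),
    ("CHARLES DICKENS lived here", ""),
]

if __name__ == "__main__":
    errors, average, middle = score_transcriptions(SAMPLE_PAIRS)
    print("per-image errors:", errors)
    print("mean:", average, "median:", middle)
```

Reporting the median next to the mean is a deliberate choice here: a single unreadable photograph can dominate an average, so the two numbers together give a fairer picture of how a transcription system is doing.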
“Ultimately I’d like to have a system that can run inside an iPhone, transcribing plaques as they’re photographed and uploading the results into openplaques with little manual entry for the human to do. The bigger picture is to understand how humans read text in the real (messy!) world so we can create augmented reality applications on mobile devices – imagine if your phone could ‘read’ a poster in the street and augment the display with location, background, details and propose a calendar entry for you – all from pointing the device at text in the real world.”
“I’m looking for hackers to join me in this project, to that end I’m offering an Amazon voucher (£25 or equivalent value) as a monthly prize to anyone who has the best automatic, open source (with no human involvement!) transcription system. Hackers of all levels are welcome.”
Any AI hackers out there who are interested in participating should check the AI Cookbook Google Group and/or the AI Cookbook wiki.
One of our users has made progress, bringing the average error down to 44 characters (though that average is quite skewed – some recognitions are OK, some are still awful):
http://groups.google.com/group/aicookbook/browse_thread/thread/4853cefff5d70231
There’s plenty of ground to cover yet and easy wins to be had with geo-tags, open source OCR tools and more for anyone who wants to get involved!
Ian.
If you are looking for an OCR tool that works on any platform (Windows, Linux or Mac), a site that does the OCR for you online may appeal: goodocr.com. The results look promising to me (English only).