The data “gold” rush is upon us. We live in a digitally enabled world with data-driven services all around us, from online banking to near-instant food delivery. In these times of fast-paced change, our archive data will become increasingly important, delivering value to the economy and wider society. The Geospatial Commission (GC), in collaboration with its Partner Bodies, has produced the next in the series of best practice guides, which we hope will help everyone navigate archive data projects.
Earlier this year, the GC commissioned a project, titled ‘Archive Data Capture Methodologies’, to identify how archived location data is currently being ‘digitised’ and what new methods are being used and developed to meet future demand. Led by the British Geological Survey (BGS), the project team included experts from the Coal Authority, HM Land Registry, the Valuation Office Agency, the United Kingdom Hydrographic Office and the Office for National Statistics.
The hidden value of 'location data' in our archives
We started by finding out what work was happening, both in the UK and across the world, in the field of data extraction, or ‘map archive mining’. Map archive mining involves taking location and associated data from sources such as old paper maps, documents and photos, and turning it into digital location data.
We reached out to our contacts and were overwhelmed by the positive response. We met people using automated machine learning techniques to extract building footprints from maps, using artificial intelligence to calculate areas at risk of flooding from historic maps, georeferencing data virtually through crowd capture techniques, and bringing historic data together in new ways and across disciplines to answer real-world questions.
We know that today most data is born digital. This means that the data is created, stored and delivered digitally. This, of course, was not always so. Things looked very different in the past, when much of the UK’s location data was collected by hand and stored in non-digital formats. The BGS and other UK geospatial agencies have been collecting data since the 19th century, long before computers and the internet. As a result, there is a large store of (sometimes dusty!) old data that, until recent advances in technology, would have taken several lifetimes to put to use for the benefit of the public.
While location data held in archives may hold significant value, much of this data is currently not accessible. There are many steps required to convert data from non-digital to digital form. Our Extracting Data from Archives: Best Practice Guide shares useful processes that will save time and money when undertaking this fundamental digital step.
Some of our key take-home points are:
- Carry out reconnaissance to understand what state your archive material is in before you start digitising. Is the archive indexed? Do the documents require special handling? Are there IP or security issues?
- Use industry standards, where they exist, throughout a data extraction pipeline. Following data standards will increase interoperability and confidence in the extracted data, potentially increasing its value as a product or service.
- Consider the full range and potential value of archive materials and extracted data, as well as any wider applications and cascade uses.
- Avoid partial extraction, as future use cases that are currently unknown will likely require wider content.
Opening up this wealth of national data will likely play a key role in our response to many of the 21st century's challenges, from climate change through to poverty and much more.
The Partner Bodies referred to above include the British Geological Survey, Coal Authority, HM Land Registry, Valuation Office Agency and UK Hydrographic Office.
You can find more data improvement resources for data managers here: Best practice guidance and tools for geospatial data managers.