Location data is becoming increasingly familiar to us in our daily lives, with new applications constantly emerging - from navigation tools, to coordinating emergency service response, to finding my nearest Indian restaurant.
Numerous reports, such as ‘Initial Analysis of the Potential Geospatial Economic Opportunity’, suggest that significantly more value - up to £11bn per year - could be released through better and wider use of location data. The Geospatial Commission has been set up to unlock the latent value from the UK’s geospatial sector by increasing the use of location data and encouraging greater innovation with it.
As the Commission’s Data and Standards Lead in the Implementation team, my role is hands-on and technical. It focuses on how we can improve access to government geospatial data, remove barriers to its use, and increase the potential to join datasets across domains. As well as representing the Commission on various boards and technical committees, a key part of my role is working with our six partner bodies to deliver a programme of projects designed to ensure that the nation gets the maximum value out of its data.
Why are we not able to get full value now?
There is an often-quoted statistic that data scientists spend 80% of their time searching for and preparing data and only 20% actually doing analysis and generating insight. Anecdotally, and from my own experience, it feels about right! What that means is that, at the moment, we as a nation are not able to use our valuable location data to the fullest extent, and we have highly skilled analysts (a relatively scarce resource) spending most of their time doing low-value, frustrating and tedious work.
In an attempt to explain why it is so difficult to use location data (although the same applies to almost any data in truth), I have adapted Maslow's Hierarchy of Needs to make a 'Hierarchy of Data Needs'.
The rationale behind Maslow's Hierarchy of Needs is that an individual cannot reach their full potential (self-actualisation) unless all their lower-level needs are met first (eg food and water, security, relationships and emotional support).
In the same way, I would argue that you cannot achieve true knowledge and insight from data unless all the basic foundations are in place:
- If you cannot find data (or even know where to look) you cannot even start
- If you can find the data but can't access it you are just frustrated
- If you can find and access it but it is unusable for your purpose due to format or quality issues, you are back to square one
- If you can find and access the right data but you can't easily join it to other data sets, you will still potentially waste time and run the risk of making false connections
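To illustrate that last point, here is a minimal Python sketch of how joining two datasets on a free-text name can create a false connection, while joining on a shared unique identifier does not. The records, field names and identifiers are all invented for illustration and do not come from any real dataset.

```python
# Hypothetical records: every name, field and identifier below is invented.
properties = [
    {"id": "P001", "name": "Mill House", "town": "Ashford"},
    {"id": "P002", "name": "Mill House", "town": "Bakewell"},
]
surveys = [
    {"property_id": "P002", "name": "Mill House", "flood_risk": "high"},
]


def join_on(left, right, left_key, right_key):
    """Return every (left, right) pair whose key values match exactly."""
    return [(l, r) for l in left for r in right if l[left_key] == r[right_key]]


# Joining on the name links the survey to both properties, one of them wrongly.
by_name = join_on(properties, surveys, "name", "name")

# Joining on the identifier links the survey only to the property it describes.
by_id = join_on(properties, surveys, "id", "property_id")

print(len(by_name), "matches by name;", len(by_id), "match by identifier")
```

The name-based join looks plausible but silently attaches a flood survey to the wrong house, which is exactly the kind of false connection that is hard to spot later.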
Collaborating to make location data more useable
The Geospatial Commission has been set up to provide strategic oversight of the geospatial landscape in the UK and as a key part of that, we are working closely with our six ‘partner bodies’ to make their data easier to find, access and use. The partner bodies (often referred to as the ‘Geo6’) will be familiar to many: Ordnance Survey, UK Hydrographic Office, HM Land Registry, British Geological Survey, Coal Authority and the Valuation Office Agency.
Over the last 15 months the Geo6 and others have been collaborating on a programme of data improvement projects aimed at developing consistent data standards, and improving the discoverability, accessibility, reusability and quality of public sector geospatial datasets.
These projects have focussed on three broad themes:
- Data discoverability - How do we make our data more easily discoverable by search engines? How do we help people understand which data sets are authoritative? How do we think consistently about what data we are able to release and under what constraints? How can we improve Data.Gov.UK as a search portal for geospatial data?
- Data licensing - How can we harmonise the terms under which Geo6 data is licensed? How can we share data more freely across government departments? Can licences be made machine-readable and what are the benefits of doing so?
- Data linking - What is best practice in creating and managing unique identifiers? Is there value in creating authoritative 'correlation' relationships (ie where features are related but not the same) between government data sets and publishing them for anyone to use?
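As a rough illustration of the data linking theme, the Python sketch below shows what a published 'correlation' table might look like, and how a user could follow it rather than working out the links from scratch. The dataset names, identifiers and relationship types are assumptions made up for this example; they are not a Geo6 format.

```python
from collections import defaultdict

# Hypothetical correlation table: each row says that a feature in one dataset
# is related to (but not the same as) a feature in another dataset.
correlations = [
    # (source dataset, source id, relationship, target dataset, target id)
    ("buildings", "B100", "stands_on", "land_titles", "T555"),
    ("buildings", "B101", "stands_on", "land_titles", "T555"),
    ("land_titles", "T555", "overlies", "geology", "G9"),
]

# Index the table once so lookups do not rescan every row.
links = defaultdict(list)
for src_ds, src_id, relation, dst_ds, dst_id in correlations:
    links[(src_ds, src_id)].append((relation, dst_ds, dst_id))


def related(dataset, feature_id):
    """Return the features directly related to the given feature."""
    return links.get((dataset, feature_id), [])


def follow(dataset, feature_id):
    """Follow relationships step by step, returning every feature reached."""
    reached, queue = [], [(dataset, feature_id)]
    seen = {(dataset, feature_id)}
    while queue:
        current = queue.pop(0)
        for _relation, dst_ds, dst_id in related(*current):
            if (dst_ds, dst_id) not in seen:
                seen.add((dst_ds, dst_id))
                reached.append((dst_ds, dst_id))
                queue.append((dst_ds, dst_id))
    return reached


print(follow("buildings", "B100"))
```

Because each link is keyed on unique identifiers and carries an explicit relationship type, one published table could be reused by every analyst who needs to move from a building to its land title to the underlying geology.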
Catalogues, licensing and linked identifiers
We have been working on these topics for a while now. The first step has been to review the landscape, understand what currently happens across the Geo6 - good and bad - and see what data we all hold.
In April 2019 we published detailed Data Catalogues for each agency, covering both published and unpublished data. These can now be found on Data.Gov.UK by searching for ‘geospatial data’.
A new single Data Exploration Licence (DEL) was produced to harmonise and simplify the access and use of geospatial data held by our Partner Bodies, allowing anyone to freely access data for research, development and innovation purposes.
We are now getting to the point where we have identified best practice and potential solutions. The first in a series of guides is the Linked Identifiers Best Practice Guide - based on the work of the Geo6, but peer reviewed by a wide range of colleagues from outside those organisations.
This guide gives practical steps to format your data, which will make it easier and quicker for people to search your data and join it with other datasets. If this best practice is used across all organisations, it will make it easier to join data and carry out more complex analysis on the country’s infrastructure and natural features - from housing and railways to roads, rivers and forests.
We are continuing to build on the work we have done to date and in the coming months you will see several more tools and guides published. Later this month we will publish guidance on the best ways to ensure that your data sets are found by internet search engines and appear higher in the rankings of search results.
We are producing tools for people who publish data to enable them to maximise the amount of data they share. The Dataset Risk Assessment tool will help to assess the risk of releasing a dataset against a range of criteria such as security, GDPR etc, propose mitigating actions where risks are identified and hopefully encourage data stewards to think ‘why can’t I share this data?’ rather than ‘why should I share this data?’
There will also be a tool for assessing the authority of a dataset and this will also provide a ‘flag’ that can be used to highlight the most authoritative data to users.
We are running a project looking at how to document and share relationships between datasets - so that they do not have to be created from scratch by each new user, and we are investigating how new technology can help to unlock the mass of information that is currently locked away on manuscript documents in the archives of the Geo6.
There will be more on all of these in later blogs, so watch this space. We are keen to involve more people in our work, so if you have found any of these topics interesting, have used any of our guides or tools, or just have comments based on your experience, please get in touch via email@example.com