Data Management Specialist at Ordnance Survey, Howard Askew, describes his recent work on Data Improvement for the Geospatial Commission and how a new tool will help to tackle two of the big challenges faced by users of geospatial data today:
- How do you find the data you need...?
- ...And how do you know if you can trust it?
The COVID-19 pandemic response brought the power of geospatial data analysis into sharp focus. Now more than ever, we need to maximise the smart use of location data, analysis and insight to make good decisions to drive economic, social and environmental recovery.
Where do you start and how do people find data they can trust?
To start to answer this question we gathered colleagues from around 30 public bodies from England, Scotland, Wales and Northern Ireland. We stuck them in a virtual room and together we explored how to signpost new data users to relevant organisations.
It’s little surprise that, for most, an internet search engine is the first port of call. Experienced users might visit individual organisations’ or departmental websites directly, whilst others find themselves on data portals such as data.gov.uk.
Whatever route users take, our research shows they encounter significant challenges:
- There are so many varied organisations, departments, agencies and arm’s length bodies.
- They cover different geographies and jurisdictions and sometimes overlap.
- They can use different terminology for similar subjects, topics or concepts.
- There is enormous variation in the way data is published and labelled, so the most useful data may be lost among older, less relevant, or duplicated resources.
This ‘Data Discovery’ is the first stumbling block on the road to improved access to better location data and data-driven decision making. The challenge is not insignificant; however, this is where discovery metadata really demonstrates its value.
What is Metadata?
Think of the labels on all the cans in a supermarket. If the labels were missing, how would you know whether you were buying baked beans, mushy peas or dog food? Now imagine you are directed to the aisle where the beans are kept, only to find that none of the cans has a label! You know you're getting beans because you're in the bean aisle, but the disappointment, wasted effort and frustration you feel when you bring home salad beans instead of baked beans effectively ruins date night! Metadata works like those labels: it helps us organise, share and discover data assets. Metadata is not a new concept. There are many documented metadata standards, all with roughly the same intent; one such standard, recommended by Government for geospatial data, is UK Gemini.
What may come as a surprise is that many of our publicly held datasets have missing, incomplete or incorrect metadata. We found huge variation in the approaches UK data publishers take to data discovery. This work is often seen as secondary to an organisation’s public task and deprioritised in favour of frontline services.
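A first practical step towards better metadata is simply spotting the gaps. The Python sketch below checks a metadata record (assumed here to be a plain dictionary) for fields that are absent or blank. The field names in `REQUIRED_FIELDS` are a hypothetical subset chosen for illustration; they are not the full list of UK Gemini mandatory elements, which a real check would take from the standard itself.

```python
"""Flag missing or empty discovery-metadata fields in a record.

Assumption: REQUIRED_FIELDS is a small, hypothetical subset chosen for
illustration; it is NOT the full list of UK Gemini mandatory elements.
"""

REQUIRED_FIELDS = (
    "title",
    "abstract",
    "keywords",
    "topic_category",
    "responsible_organisation",
)


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings and empty lists/tuples as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_fields(record: dict, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent or blank, in order."""
    return [field for field in required if is_blank(record.get(field))]


def completeness(record: dict, required=REQUIRED_FIELDS) -> float:
    """Share of required fields that are populated, from 0.0 to 1.0."""
    return 1 - len(missing_fields(record, required)) / len(required)
```

Run over a whole catalogue, `completeness` gives a rough per-record score that could help a publisher decide which records to fix first.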
Mapping data publishers to themes and keywords
I started with data.gov.uk – envisaged as a single portal for all public sector datasets. I wanted to explore a data-driven approach to the challenge and used its Application Programming Interface (API) to retrieve information on all the resources listed there. What can we learn from this?
- 55,000 datasets, from 1,400 data publishers.
- Topics ranging from Environment and Mapping (most common) down to Crime and Justice, and Defence (least common).
- Many records were not labelled with a topic.
- In some cases, topics listed did not align with recognised classification schemes – such as the United Nations Global Geospatial Information Management (UN-GGIM) Fundamental Themes or the INSPIRE themes that feature in the UK Gemini metadata standard.
This hints at two of the wider challenges around improving discovery – incentivising organisations to use metadata labels, and managing reference data across sectors. Is there a consistent language that can be used across organisations?
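The data.gov.uk exercise can be sketched as follows, assuming the portal exposes the standard CKAN Action API (`package_search`) at the URL shown and that a dataset's topic is held in a field called `theme-primary`; both the URL and the field name are assumptions to check against the current data.gov.uk API documentation. `fetch_all` pages through every record, and `summarise` counts datasets, distinct publishers (the CKAN `organization` field) and datasets per topic, grouping unlabelled records under `(no topic)`.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: data.gov.uk serves the standard CKAN Action API at this URL.
API_URL = "https://data.gov.uk/api/action/package_search"
# Assumption: the name of the field that holds a dataset's topic/theme.
TOPIC_KEY = "theme-primary"


def fetch_all(api_url=API_URL, page_size=1000):
    """Page through CKAN package_search and yield every dataset record."""
    start = 0
    while True:
        query = urllib.parse.urlencode({"rows": page_size, "start": start})
        with urllib.request.urlopen(f"{api_url}?{query}") as resp:
            result = json.load(resp)["result"]
        records = result["results"]
        yield from records
        start += len(records)
        # Stop on an empty page or once the reported total has been reached.
        if not records or start >= result["count"]:
            break


def topic_of(record, key=TOPIC_KEY):
    """Look for the topic as a top-level field, then among CKAN 'extras'."""
    if record.get(key):
        return record[key]
    for extra in record.get("extras", []):
        if extra.get("key") == key and extra.get("value"):
            return extra["value"]
    return None


def summarise(records, key=TOPIC_KEY):
    """Count datasets, distinct publishers and datasets per topic."""
    publishers = set()
    topics = Counter()
    total = 0
    for rec in records:
        total += 1
        org = rec.get("organization") or {}
        if org.get("name"):
            publishers.add(org["name"])
        topics[topic_of(rec, key) or "(no topic)"] += 1
    return {"datasets": total, "publishers": len(publishers), "topics": topics}
```

Calling `summarise(fetch_all())` and then `summary["topics"].most_common()` would list topics from most to least common. The `page_size=1000` default assumes the server allows at least that many rows per request.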
Digging deeper, I focused on the 22,000 geospatial datasets from central government bodies and agencies, collecting their metadata labels and text mining their titles and abstracts for potential keywords. For example, from the abstract…
“With Code-Point Open, you can display on a map any information that contains a postcode; for example, customer records. For basic route planning, Code-Point Open will locate your starting and destination postcode.”
… potential keywords could include ‘postcode’ and ‘route planning’.
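One simple way to generate candidate keywords like these is sketched below: split each title or abstract at punctuation and at stop words, keep the runs of remaining words as candidate phrases, and count every one- and two-word sub-phrase across the corpus. This is loosely modelled on the candidate-selection step of RAKE (Rapid Automatic Keyword Extraction); it is not necessarily the method used in this work, and the stop-word list is a tiny illustrative assumption.

```python
import re
from collections import Counter

# Assumption: a small, generic stop-word list for illustration only.
STOP_WORDS = frozenset(
    "a an and any are as at be by can for from in is it of on or that the "
    "this to will with you your".split()
)
# Punctuation that ends a candidate phrase.
PHRASE_BREAK = re.compile(r'[,.;:!?()"]')
# Words, allowing internal hyphens such as "code-point".
WORD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def candidate_phrases(text):
    """Split text at punctuation and stop words into runs of content words."""
    phrases = []
    for fragment in PHRASE_BREAK.split(text.lower()):
        run = []
        for word in WORD.findall(fragment):
            if word in STOP_WORDS:
                if run:
                    phrases.append(tuple(run))
                run = []
            else:
                run.append(word)
        if run:
            phrases.append(tuple(run))
    return phrases


def keyword_counts(texts, max_words=2):
    """Count every sub-phrase of up to max_words words across all texts."""
    counts = Counter()
    for text in texts:
        for phrase in candidate_phrases(text):
            for n in range(1, max_words + 1):
                for i in range(len(phrase) - n + 1):
                    counts[" ".join(phrase[i : i + n])] += 1
    return counts
```

Applied to the Code-Point Open abstract above, the counts include `postcode` (twice), `code-point open` (twice) and `route planning`, alongside noise such as `display` and `example` that a later ranking step would need to filter out.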
I ranked 5,000 such words and phrases based on their popularity in search engines and shared the highest-ranking terms for discussion with publishing bodies. We worked together to refine the list and manage similar and overlapping terms. For example, ‘aerial photography’ from the Environment Agency relates to the same topic as ‘imagery’ from Ordnance Survey.
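The ranking and merging step can be sketched as a function that folds synonymous terms into a preferred term and then sorts by popularity. How search-engine popularity was measured is not described here, so `popularity` is assumed to be a plain mapping of term to score supplied from some external source, and `synonyms` is a hand-curated mapping from variant to preferred term, such as `{"aerial photography": "imagery"}`.

```python
from collections import defaultdict


def rank_terms(popularity, synonyms, top_n=None):
    """Merge synonymous terms and rank the merged terms by popularity.

    popularity: {term: score}; scores are assumed to come from an external source
    synonyms:   {variant: preferred_term}, e.g. {"aerial photography": "imagery"}
    Returns a list of (preferred_term, total_score, sorted_variants),
    highest total first; ties are broken alphabetically.
    """
    totals = defaultdict(float)
    variants = defaultdict(set)
    for term, score in popularity.items():
        preferred = synonyms.get(term, term)
        totals[preferred] += score  # pool the scores of all variants
        variants[preferred].add(term)
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    if top_n is not None:
        ranked = ranked[:top_n]
    return [(t, totals[t], sorted(variants[t])) for t in ranked]
```

Summing the scores of merged variants is one design choice; taking the maximum instead would avoid rewarding a term simply for having many variants.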
Signposting authoritative data publishers
Another challenge for new data users is around ‘trust’. Where there are many sources of data on a topic, how do you know which to choose? Ordnance Survey led a project on this subject for the Geospatial Commission last year and created the Authoritative Data Assessment tool for data publishers. By ‘authoritative data’, we mean:
Officially recognised data of appropriate quality provided by trustworthy organisations.
This is often related to the authority, remit or public task of the organisation. It is also related to confidence in the data, for example, the capture, maintenance and quality management processes. The Authoritative Data Assessment tool looks at both aspects to produce a Bronze, Silver or Gold ranking at a dataset level.
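How the two aspects combine into a single ranking is not described here, so the sketch below is purely hypothetical: it assumes each aspect (organisational authority and confidence in the data) has been scored from 0 to 3, and that a dataset takes the lower of the two scores. That "weakest link" rule is an illustrative assumption, not the Authoritative Data Assessment tool's actual method.

```python
from enum import IntEnum


class Rank(IntEnum):
    NONE = 0
    BRONZE = 1
    SILVER = 2
    GOLD = 3


def dataset_rank(authority: int, confidence: int) -> Rank:
    """Hypothetical rule: each aspect is scored 0-3 and the dataset takes
    the lower of the two, so strength in one aspect cannot mask weakness
    in the other."""
    for name, score in (("authority", authority), ("confidence", confidence)):
        if not 0 <= score <= 3:
            raise ValueError(f"{name} score must be between 0 and 3, got {score}")
    return Rank(min(authority, confidence))
```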
In the new tool we created, Geospatial Themes and Related Data Publishers, we included a column to indicate where an organisation may be considered an ‘Authoritative Provider’ at a theme or topic level.
This can be informative, for example:
- Where data on a theme is published by more than one body, each authoritative in a different geographical region
- Where local publishers are authoritative, though national aggregations are published by a central body or bodies
- Where different publishers are authoritative in different aspects of a data theme
The ‘Authoritative Provider’ flag does not imply the organisation is the only provider for a topic, or that the organisation is authoritative in all aspects of a topic.
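The theme-level flag can be pictured as one extra column in a theme-to-publisher table. The sketch below models each row as a Python dataclass; the field names, including `coverage` and `aspect`, are hypothetical. Because the flag is not exclusive, `authoritative_providers` can return several publishers for one theme (for example, one per region), while `providers` still lists every publisher, flagged or not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemePublisher:
    """One row of a theme-to-publisher table (field names are hypothetical)."""
    theme: str
    publisher: str
    coverage: frozenset        # regions covered, e.g. frozenset({"Wales"})
    aspect: str = ""           # optional: the part of the theme covered
    authoritative: bool = False


def providers(rows, theme):
    """All publishers listed for a theme, authoritative or not."""
    return [r for r in rows if r.theme == theme]


def authoritative_providers(rows, theme, region=None, aspect=None):
    """Publishers flagged authoritative for a theme, optionally narrowed
    by region and/or aspect. Several rows may match: the flag is not
    exclusive, and unflagged publishers remain listed by providers()."""
    return [
        r
        for r in providers(rows, theme)
        if r.authoritative
        and (region is None or region in r.coverage)
        and (aspect is None or r.aspect == aspect)
    ]
```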
The Data Improvement Programme is just one step in the longer journey towards Mission 2 of the UK’s Geospatial Strategy – Improve access to better location data. The work has highlighted the challenges ahead and given us many valuable lessons which are informing our next steps.
We are working with the Geospatial Commission, its Partner Bodies and the broader data publishing community to tackle the challenges highlighted by the programme, for example:
- To promote more consistent use of metadata for geospatial data resources, we have developed a Metadata Best Practice Guide for Publishers which will be published shortly.
- We are sharing our location insight expertise with those working on cross-government data initiatives such as the Data Standards Authority, work on National Digital Twins, and the Office for National Statistics’ Integrated Data Programme (IDP).
- To further support new data users starting their search, we are sharing our findings with the data.gov.uk team.
- And we are looking at our own discovery metadata processes to ensure our core data products are easily findable, accessible, interoperable and reusable.
We hope the list of data publishers by topic will be a useful resource for new data users while we tackle some of these challenges. We welcome feedback from users and from other data publishers who wish to be included.