Needles in Haystacks: Quality Control for Large Datasets

Large-scale flooding has been in the news more and more over the last few years. Anyone who has witnessed the aftermath of these floods will understand how devastating they can be, and how important it is to have adequate flood insurance.

With the majority of UK insurers using the JBA 5m UK Flood Map for risk analysis and pricing, having the most accurate data possible in our products is very important. Since there are over 9 billion individual 5m by 5m pixels in the UK Flood Map, you might think that an error in the odd square here or there wouldn’t matter.

However, if your house happened to sit on that square, you might find that your ability to get flood insurance was affected.

With such a large quantity of data, how do you make sure it’s correct without taking too long to check everything?

For such large datasets, it’s impractical to have someone check everything manually – if it took just a couple of seconds to check each square, it would still take around 600 years to check the entire map.
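The estimate above can be reproduced with a quick back-of-envelope calculation. The sketch below uses only the two figures given in the text (9 billion pixels, two seconds each) and assumes, purely for illustration, that one person checks non-stop with no breaks. The function and constant names are hypothetical.

```python
# Back-of-envelope estimate of a fully manual check.
PIXELS = 9_000_000_000                     # "over 9 billion" 5m by 5m pixels (from the text)
SECONDS_PER_PIXEL = 2                      # "a couple of seconds" per square (from the text)
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: checking continues around the clock


def manual_check_years(pixels: int, seconds_per_pixel: float) -> float:
    """Years one person would need to inspect every pixel without a break."""
    return pixels * seconds_per_pixel / SECONDS_PER_YEAR


years = manual_check_years(PIXELS, SECONDS_PER_PIXEL)
print(f"{years:.0f} years")  # 570 years
```

The result is about 570 years, in line with the "around 600 years" quoted above. With realistic working hours the figure would be several times larger.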

Instead, we use computers to carry out automated checks. When quality criteria for the data can be defined quantitatively (for example, flood depths may not be less than zero), a computer can look for problems much faster than a human and will make fewer mistakes. To use the analogy of looking for a needle (an error) in a haystack (a large dataset), a computer doing automated checks acts as a large magnet that can remove all the needles very quickly, while a human would be a farmhand examining each individual strand of hay.
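As an illustration of a quantitative criterion, here is a minimal sketch of the "flood depths may not be less than zero" check. It is not JBA's actual tooling: the grid layout (a small list of rows holding depths in metres) and the function name are assumptions made for the example, and a real map would be processed in tiles with an array library rather than plain lists.

```python
from typing import Iterator, List, Tuple

Grid = List[List[float]]  # hypothetical layout: rows of flood depths in metres


def find_negative_depths(grid: Grid) -> Iterator[Tuple[int, int, float]]:
    """Yield (row, column, depth) for every pixel that breaks the depth >= 0 rule."""
    for r, row in enumerate(grid):
        for c, depth in enumerate(row):
            if depth < 0:
                yield r, c, depth


# Made-up sample data with one deliberately invalid pixel.
sample = [
    [0.0, 0.3, 1.2],
    [0.0, -0.1, 0.8],
    [0.5, 0.0, 0.0],
]
errors = list(find_negative_depths(sample))
print(errors)  # [(1, 1, -0.1)]
```

Because the rule is a simple comparison, the same check can be applied to every pixel in exactly the same way, which is what makes it suitable for automation.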

The continuing need for human validation

However, although computers greatly reduce the time it takes to check data, some quality criteria cannot easily be translated into a form that a computer can understand.

Examples include criteria that require judgement (A is not too different from B) and things that just look “odd”. Here a human can identify quality issues that a computer can’t. To return to the previous analogy, a large magnet can efficiently remove needles from a haystack, but it can’t tell you if the whole haystack is blue. A human is therefore still needed to ensure the final quality of the data.

Looking to the future

One area in which future developments may further improve the speed and consistency of data quality control is machine learning.

This is an area of artificial intelligence that is used for a wide range of purposes, from recommending what you should watch next on Netflix to diagnosing heart disease. At JBA, we are developing specialist machine learning tools to help construct and check our flood maps to ensure that our data meet the highest possible standards.

To find out more about our work, get in touch.
