What is the relationship between AI and data?

The collection, use, governance and deletion of data underpin AI technologies, because AI systems rely on machine-learning techniques that are trained on large amounts of data. Recent advances in AI have been driven by increased computational power and larger datasets.

Machine learning is data-driven. By learning from existing data (known as ‘training data’), systems can identify patterns to make predictions or calculate probabilities. Based on what they learn from training data, systems can perform a variety of functions, like playing chess, recognising faces or assessing welfare benefit applications.
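As a rough illustration of learning from training data, the sketch below estimates a probability for a new case by looking at the most similar examples it has already seen. The data, the function name predict_probability and the choice of three neighbours are all hypothetical, and real AI systems use far larger datasets and more sophisticated techniques.

```python
def predict_probability(training_data, new_value, k=3):
    """Estimate the chance of a positive label from the k most similar training examples."""
    # Rank the training examples by how close their feature value is to the new value.
    nearest = sorted(training_data, key=lambda example: abs(example[0] - new_value))[:k]
    # The share of positive labels among those neighbours is the estimated probability.
    positives = sum(label for _, label in nearest)
    return positives / len(nearest)


# Hypothetical toy training data: (feature value, label), where 1 means "positive".
training_data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

low = predict_probability(training_data, 2.5)   # close to the examples labelled 0
high = predict_probability(training_data, 6.5)  # close to the examples labelled 1
print(low, high)
```

The system is never told a rule such as "values above 5 are positive". It infers the pattern from the examples, which is why the quality and coverage of the training data matter so much.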

AI systems use both personal and non-personal data. Personal data includes any information that relates to an identified or identifiable living person. Data protection laws apply to the collection, storage and sharing of personal data. These laws set out when and how personal data can legally be used in AI technologies.

High-quality data is the lifeblood of successful AI technologies that can be hugely beneficial to people and society. Data that is high-quality, relevant, FAIR (Findable, Accessible, Interoperable and Reusable), free from bias, and compliant with legal and ethical standards can be used in AI to deliver significant benefits. These potential benefits include some of the technologies discussed in the survey, such as early cancer detection or creating efficiencies through automating and simplifying processes.

It is important that data in AI technologies is used fairly and in line with legal frameworks. For example, a key principle of data protection law is the requirement to collect and process as little data as possible, and to keep it for no longer than needed. This is known as ‘data minimisation’.
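To make the idea of data minimisation concrete, the sketch below keeps only the fields a task needs and drops records held past a retention period. The field names, the one-year retention period and the minimise function are hypothetical choices for illustration. What counts as necessary data, and how long it may be kept, depends on the purpose and the applicable law.

```python
from datetime import date, timedelta

# Hypothetical: the only fields this particular task is assumed to need.
NEEDED_FIELDS = {"age_band", "postcode_area"}
# Hypothetical retention period.
RETENTION = timedelta(days=365)


def minimise(records, today):
    """Drop records past the retention period and strip fields that are not needed."""
    kept = []
    for record in records:
        if today - record["collected_on"] > RETENTION:
            continue  # held for longer than needed, so it is not kept
        kept.append({key: value for key, value in record.items() if key in NEEDED_FIELDS})
    return kept


records = [
    {"name": "A. Example", "age_band": "30-39", "postcode_area": "AB",
     "collected_on": date(2024, 1, 10)},
    {"name": "B. Example", "age_band": "40-49", "postcode_area": "CD",
     "collected_on": date(2021, 5, 2)},
]

minimised = minimise(records, today=date(2024, 6, 1))
print(minimised)
```

Here the older record is discarded entirely, and the remaining record no longer carries the person's name.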

However, some AI systems use huge datasets collected from the internet, which can include personal data that has been used without consent. For example, companies have been fined significant amounts for using images collected from the web to create online databases for facial recognition software.