Understanding the EU AI Act as a Data Scientist


What is the EU AI Act?

The EU AI Act is a regulatory framework proposed by the European Union to govern the development, deployment, and use of AI systems within the member states. The Act aims to ensure that AI is developed and used in a manner that is both safe and respects fundamental rights.
The Act follows a risk-based approach to define which regulatory measures to apply to AI systems.
The idea is to be strict only when an AI system is likely to have an impact on safety, fundamental rights, and societal values. There are three levels of risk: unacceptable, high, and low.
In this blog post, I’ll go through the parts of the Act that a data scientist is most likely to care about: the difference between the risk levels, the requirements for compliance, the size of the fines for non-compliance, and how this new regulation might change the way we put models in production.
Let’s start by understanding the different risk levels.

How are the AI risk levels defined?

According to the Act, every AI system must be categorized into one of the following three risk categories: unacceptable risk, high risk, or low risk. The category is determined by the system’s potential impact on safety and fundamental rights. Let’s take a closer look at each risk level.

Unacceptable Risk

The Act forbids the development and use of three types of AI systems.
  1. Manipulative systems: These are systems that, through subliminal techniques or by exploiting the vulnerabilities of specific groups, can distort people's behavior in a manner that is likely to cause them or another person psychological or physical harm.
  2. Social scoring systems: The Act also prohibits any public authority from using social scoring systems to rank or categorize people. (Kind of similar to the one in the Black Mirror episode).
  3. Real-time biometric identification systems: ‘Real-time’ remote biometric identification systems in publicly accessible spaces for the purpose of law enforcement are also forbidden unless they are used for:
    1. A targeted search for specific potential victims of crime or missing children.
    2. Prevention of imminent threats to life or terrorist attacks.
    3. Prosecution of a perpetrator or suspect of a crime.

High Risk

High-risk systems are probably the ones we should focus on since they are the most heavily regulated under the Act. The list of high-risk systems is long, but, generally speaking, if an AI system is intended to be used as a safety component of a product, or as a product itself, it is likely to be categorized as high-risk. Some specific examples of high-risk AI systems are:
  • Operation of critical infrastructure: AI systems intended to be used as safety components in the management and operation of road traffic and the supply of water, gas, heating, and electricity.
  • Education: AI systems intended to be used for the purpose of assessing students in educational and vocational training institutions.
  • Recruitment: AI systems intended to be used for recruitment or selection of people, notably for advertising vacancies, screening or filtering applications, and evaluating candidates during interviews or tests.
  • Credit Scores: AI systems intended to be used to evaluate people's creditworthiness or establish their credit score, with the exception of AI systems put into service by small-scale providers for their own use.
These are only a few examples; for the complete list, check out Annex III of the EU AI Act, or ask our EU AI GPT bot to check the risk level of your use case.

Low Risk

We could say that everything that is not prohibited or high-risk falls into the low-risk category. These systems are still subject to transparency obligations, and users must be provided with clear information about the system's capabilities and limitations, but they are not subject to additional regulatory requirements.

How to comply with the EU AI Act?

As mentioned before, the Act introduces requirements and compliance measures mostly for high-risk AI systems. These requirements can be grouped into risk management, data and data governance, technical documentation, record-keeping, transparency, human oversight, and accuracy, robustness, and cybersecurity.
Let’s go through all of these requirements to understand better what they mean.

Requirements for high-risk AI systems

  • Risk management system
Developers and operators of high-risk AI systems must implement a risk management approach. This involves assessing potential risks, implementing safeguards, and ensuring transparency throughout the system's lifecycle.
  • Data and data governance
The Act mentions that high-risk AI systems that involve training a model with data should be developed on the basis of training, validation, and testing datasets that are subject to appropriate data governance and management practices. Some of the listed practices are:
  • relevant design choices of the data and data collection process
  • relevant data preprocessing
  • examination of possible biases in the data
  • identification of any possible data gaps or shortcomings and how they can be addressed
The Act also states that the training, validation, and test datasets should be relevant, representative, error-free, and complete, although the specific rules for how this should be validated are still missing. A minimal sketch of how such checks could be documented in code follows below.
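The Act does not prescribe how these checks should be carried out, but a lightweight starting point is to document them in code. Below is a minimal sketch, assuming a pandas DataFrame with a hypothetical protected attribute and a binary target; the column names and checks are illustrative, not requirements taken from the Act.

```python
import pandas as pd

def data_governance_report(df: pd.DataFrame, target: str, protected: str) -> dict:
    """Collect a few basic data-quality facts worth documenting alongside a dataset:
    missing values (possible data gaps) and target prevalence per group
    (a first, crude look at possible biases)."""
    return {
        # Share of missing values per column, a rough proxy for data gaps.
        "missing_share": df.isna().mean().round(3).to_dict(),
        # Target prevalence overall and per protected group, to spot imbalance.
        "target_rate_overall": float(df[target].mean()),
        "target_rate_by_group": df.groupby(protected)[target].mean().to_dict(),
        "n_rows": len(df),
    }

# Illustrative usage with a toy dataset (column names are hypothetical).
df = pd.DataFrame({
    "income": [42_000, 55_000, None, 38_000],
    "gender": ["f", "m", "f", "m"],
    "defaulted": [0, 1, 0, 1],
})
print(data_governance_report(df, target="defaulted", protected="gender"))
```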
  • Technical documentation
Before putting the system on the market, a comprehensive technical documentation should be prepared. This includes information about the system's design, functioning, and intended use.
  • Record-keeping
High-risk AI systems should automatically record events (logs) while they are operating, to ensure a level of traceability of the system’s functioning. A minimal logging sketch follows.
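The Act leaves the exact logging format open. As a minimal sketch using Python's standard logging module, each prediction could be recorded as a structured event with a timestamp, a model version, and a hash of the inputs; the field names and file path here are assumptions, not something prescribed by the Act.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

# Write structured prediction logs to a file so the system's operation is traceable.
logging.basicConfig(filename="predictions.log", level=logging.INFO, format="%(message)s")
logger = logging.getLogger("high_risk_ai_audit")

def log_prediction(features: dict, prediction, model_version: str) -> None:
    """Record one prediction event: when it happened, which model produced it,
    a hash of the inputs (to avoid storing raw personal data), and the output."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_hash": hashlib.sha256(json.dumps(features, sort_keys=True).encode()).hexdigest(),
        "prediction": prediction,
    }
    logger.info(json.dumps(event))

# Illustrative call; the feature names and model version are hypothetical.
log_prediction({"income": 42_000, "age": 31}, prediction=0, model_version="credit-scoring-1.3.0")
```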
  • Transparency and provision of information to users
High-risk AI systems should be transparent enough to allow users to interpret the system’s output and limitations. Specifically, they should inform the user about the level of accuracy achieved on the validation and test sets, as well as any known or foreseeable circumstances that may affect that level of accuracy. Other transparency requirements are listed in Article 13 of the Act.
  • Human oversight
High-risk AI systems should be designed and developed in such a way that their maintainers can oversee them during the period in which the AI system is in use. This means that providers of the AI system should be able to monitor its operation, so that signs of anomalies, dysfunctions, and unexpected performance can be detected and addressed as soon as possible.
  • Accuracy, robustness, and cybersecurity
The relevant performance metrics of high-risk AI systems should be declared in the accompanying instructions for use.
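One way to address both the transparency and the accuracy-reporting points above is to export the performance obtained on the validation and test sets as a small report that can ship with the instructions for use. A minimal sketch with scikit-learn metrics follows; the metric choice and file name are assumptions.

```python
import json
from sklearn.metrics import accuracy_score, roc_auc_score

def performance_report(y_val, p_val, y_test, p_test, threshold: float = 0.5) -> dict:
    """Summarize model performance on the validation and test sets so it can be
    included in the instructions for use and the technical documentation."""
    def _metrics(y_true, proba):
        preds = [int(p >= threshold) for p in proba]
        return {
            "accuracy": float(accuracy_score(y_true, preds)),
            "roc_auc": float(roc_auc_score(y_true, proba)),
        }
    return {"validation": _metrics(y_val, p_val), "test": _metrics(y_test, p_test)}

# Toy example with made-up labels and predicted probabilities.
report = performance_report(
    y_val=[0, 1, 1, 0], p_val=[0.2, 0.8, 0.6, 0.3],
    y_test=[1, 0, 1, 0], p_test=[0.7, 0.4, 0.9, 0.1],
)
with open("instructions_for_use_performance.json", "w") as f:
    json.dump(report, f, indent=2)
```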
 
 

Testing your system in a regulatory sandbox

To help ensure that all the previously mentioned requirements are met, the Act proposes the creation of AI regulatory sandboxes. These sandboxes are controlled environments that, in principle, should facilitate the development, validation, and testing of AI systems before they are put on the market.
The Act mentions that if significant risks to people's health, safety, or fundamental rights are identified while developing and testing these systems, they must be mitigated immediately. If they cannot be mitigated, development and testing must be suspended until the issues are addressed.
The specifics of how these regulatory sandboxes will work and how to access them are not yet defined and are expected to be laid out in an implementing act. The Act does mention, however, that to support innovation, small-scale providers and startups will be given priority access to the regulatory sandboxes.

Implementing a Monitoring System

By post-market monitoring, the Act refers to a set of regulations that focus on keeping an eye on AI systems once they are already in use. It aims to ensure these systems' ongoing safety, effectiveness, and compliance. Here's an explanation of some key points:
  1. Monitoring system: Providers of high-risk AI systems should establish a monitoring system that:
    1. Systematically collects, documents, and analyses relevant data provided by users on the performance of high-risk AI systems throughout their lifetime.
    2. Allows the provider to evaluate the continuous compliance of the AI systems with the requirements.
  2. Monitoring plan: The monitoring system should be based on a post-market monitoring plan, which should be part of the technical documentation of the AI system.
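As a sketch of what such a monitoring system might compute, assuming the provider collects predictions together with outcomes later reported by users, the realized performance per period can be compared against the accuracy declared in the documentation. The column names and the tolerance are illustrative assumptions.

```python
import pandas as pd

def post_market_check(events: pd.DataFrame, declared_accuracy: float,
                      tolerance: float = 0.05) -> pd.DataFrame:
    """Aggregate logged predictions joined with user-reported outcomes per month
    and flag periods where realized accuracy drops below the declared level."""
    events = events.copy()
    events["correct"] = (events["prediction"] == events["outcome"]).astype(int)
    events["month"] = events["timestamp"].dt.to_period("M")
    monthly = (
        events.groupby("month")["correct"]
        .agg(realized_accuracy="mean", n_predictions="count")
        .reset_index()
    )
    monthly["alert"] = monthly["realized_accuracy"] < declared_accuracy - tolerance
    return monthly

# Toy monitoring data; in practice this would come from the prediction logs
# and from outcomes reported back by users of the system.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-02-25"]),
    "prediction": [1, 0, 1, 1],
    "outcome": [1, 0, 0, 0],
})
print(post_market_check(events, declared_accuracy=0.90))
```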

What are fines and penalties?

What happens if a provider doesn’t comply with the previous requirements? Well, it is a bit complicated, since the exact fines depend on the nature, gravity, and duration of the infringement, as well as the size and market share of the entity committing the infringement.
But, broadly speaking, fines can be separated into three categories.
  1. Applying forbidden AI practices or non-compliance with the data and data governance requirements: Administrative fines of up to €30 000 000 or, if the offender is a company, up to 6% of its total worldwide annual turnover for the preceding financial year, whichever is higher.
  2. Non-compliance with any of the other requirements: Administrative fines of up to €20 000 000 or, if the offender is a company, up to 4% of its total worldwide annual turnover for the preceding financial year, whichever is higher.
  3. Supplying incorrect, incomplete, or misleading information to the competent authorities: Administrative fines of up to €10 000 000 or, if the offender is a company, up to 2% of its total worldwide annual turnover for the preceding financial year, whichever is higher.
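To put these numbers in perspective: the applicable cap is whichever is higher, the fixed amount or the percentage of worldwide annual turnover. A quick back-of-the-envelope sketch (the category labels are my own shorthand for the three tiers above):

```python
# Fine caps per infringement category as listed above:
# (fixed cap in euros, share of total worldwide annual turnover).
FINE_TIERS = {
    "prohibited_practice_or_data_governance": (30_000_000, 0.06),
    "other_requirements": (20_000_000, 0.04),
    "incorrect_information_to_authorities": (10_000_000, 0.02),
}

def max_fine(category: str, annual_turnover_eur: float) -> float:
    """Upper bound of the administrative fine: the higher of the fixed cap
    and the percentage of total worldwide annual turnover."""
    fixed_cap, turnover_share = FINE_TIERS[category]
    return max(fixed_cap, turnover_share * annual_turnover_eur)

# Example: a company with a €2B turnover applying a prohibited practice
# faces a cap of €120M (6% of turnover), since that exceeds the €30M fixed cap.
print(f"€{max_fine('prohibited_practice_or_data_governance', 2_000_000_000):,.0f}")
```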

How does this affect my day-to-day job as a Data Scientist?

Well, the good news is that there will be plenty of work for data scientists.
If you are working on a system that might be qualified as high-risk, there are some things you should be aware of.
  • Your ML models should be developed on the basis of training, validation, and testing datasets. The data collection, design choices, preprocessing, and biases of your data should be well documented before putting your model in a production environment. It is also mentioned that these datasets should be relevant, representative, and error-free.
  • Data governance best practices should be applied to these datasets.
  • Your AI system should have technical documentation that demonstrates that it complies with all the requirements. This documentation should be kept up to date.
  • Users of the AI system should have access to the obtained performance of the system on the validation and test datasets.
  • Any change to the AI system and its performance should be reported in the technical documentation.
  • It will also be necessary to document the expected lifetime of the system and any necessary maintenance measures.
  • AI systems should be monitored in such a way that their maintainers can identify and address signs of anomalies, dysfunctions, and unexpected performance as soon as possible.
  • Before putting the AI system into production, consider testing it in an AI regulatory sandbox.
  • Design and implement a post-market monitoring plan to systematically collect, document, and analyze relevant data provided by users.
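Much of this checklist comes down to keeping a structured, versioned record next to the model itself. The sketch below shows one possible shape for such a record; the fields are an assumption about what could be useful, not an official template from the Act.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class ModelDocumentation:
    """A lightweight record of the facts the checklist above asks for:
    datasets, design choices, performance, expected lifetime, and changes."""
    model_name: str
    model_version: str
    intended_use: str
    dataset_versions: dict            # e.g. {"train": "v3", "validation": "v3", "test": "v3"}
    preprocessing_steps: list
    known_biases_and_gaps: list
    validation_metrics: dict
    test_metrics: dict
    expected_lifetime: str
    maintenance_measures: list
    change_log: list = field(default_factory=list)

    def record_change(self, description: str) -> None:
        # Every change to the system or its performance gets a dated entry.
        self.change_log.append({"date": date.today().isoformat(), "change": description})

# Hypothetical example for a credit-scoring model.
doc = ModelDocumentation(
    model_name="credit-scoring",
    model_version="1.3.0",
    intended_use="Pre-screening of consumer loan applications; final decision by a human.",
    dataset_versions={"train": "2023-Q4", "validation": "2023-Q4", "test": "2024-Q1"},
    preprocessing_steps=["impute missing income with median", "one-hot encode employment type"],
    known_biases_and_gaps=["under-representation of applicants under 21"],
    validation_metrics={"roc_auc": 0.83},
    test_metrics={"roc_auc": 0.81},
    expected_lifetime="24 months, subject to quarterly review",
    maintenance_measures=["quarterly retraining", "monthly performance review"],
)
doc.record_change("Retrained on 2024-Q1 data; ROC AUC on the test set unchanged.")
print(json.dumps(asdict(doc), indent=2))
```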
 
💡
At NannyML, we are helping companies make their machine learning models compliant with the EU AI Act. To learn more, check out how to prepare your team for the EU AI Act Implications.
 
The Act has the potential to change the way we do data science, particularly for data scientists working on high-risk AI systems. However, many aspects still need clarification, such as how the regulatory sandboxes will work, what the process for obtaining an official risk categorization will be, and what the timeline for enforcement is. We anticipate that future versions of the Act will provide more clarity on these matters.
In the meantime, the responsible thing to do is to stay informed and proactively lay the groundwork for future AI systems. By doing so, we can ensure that we are well-prepared to comply with the regulations when the time comes.
 
1️⃣
Paper by MIT, Harvard & Other Institutions shows that 91% of ML Models Degrade in Time
2️⃣
How to set up a monitoring flow with model performance at the core: ML Monitoring Workflow


Written by

Santiago Víquez

Machine Learning Developer Advocate at NannyML