Fixing AI Vulnerabilities found by AI Red Teams

Like a castle wall under siege, AI systems need rigorous testing to expose weaknesses. Enter the AI red team. AI red teams use human testers to uncover safety, fairness, privacy, and accuracy risks in AI systems. Red teaming may unearth a multitude of vulnerabilities, from biased decision-making to manipulation through crafted inputs. Ignoring these risks is akin to leaving the castle gates wide open.

But there is a practical problem.  

Fixing multiple vulnerabilities is hard because there are no clear standards for prioritizing the different types of vulnerabilities an AI red team might find.

Vulnerabilities can outnumber an organization's resources and ability to fix them all at once. Deciding which vulnerability to address first can require time-consuming discussions that delay or block remediation work.

There is no industry-wide standard for prioritizing vulnerabilities across different types of AI risk. Even within a single risk category, such as privacy, prioritizing fixes is hard: the privacy community has no generally agreed-upon vulnerability scoring standard.

The taxonomy of AI risks includes several that cannot all be optimized simultaneously. For example, accuracy may be at odds with privacy. Privacy protections for datasets include adding noise, allowing plausible deniability, or removing outlying data points that could identify individuals. All of these privacy-protective mechanisms can reduce accuracy. So when an AI organization faces a choice between accuracy and privacy, it is unclear which provides more safety to society.
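The noise-versus-accuracy tension can be made concrete with a small sketch. The example below uses the Laplace mechanism from differential privacy to compute a noisy mean; the function name, parameters, and dataset are hypothetical illustrations, not a production implementation. A smaller privacy budget (epsilon) means stronger privacy but larger expected error.

```python
import random

def noisy_mean(values, epsilon, value_range):
    """Differentially private mean via the Laplace mechanism (illustrative sketch).

    `values` are assumed to lie in [0, value_range]; `epsilon` is the
    privacy budget. Smaller epsilon = more noise = less accuracy.
    """
    true_mean = sum(values) / len(values)
    # Sensitivity of the mean when each value is bounded by value_range.
    sensitivity = value_range / len(values)
    scale = sensitivity / epsilon
    # Laplace(0, scale) noise as the difference of two exponential draws.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_mean + noise

ages = [23, 35, 41, 52, 29, 64, 37, 45]
true_mean = sum(ages) / len(ages)
# Strong privacy (epsilon = 0.1): typically a large error.
print(abs(noisy_mean(ages, epsilon=0.1, value_range=100) - true_mean))
# Weak privacy (epsilon = 10): typically a small error.
print(abs(noisy_mean(ages, epsilon=10.0, value_range=100) - true_mean))
```

Running this repeatedly shows the tradeoff directly: the strongly private estimate wanders far from the true mean, while the weakly private one stays close.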

A scoring system could help organizations prioritize fixes. 
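As a thought experiment, such a scoring system might combine a severity rating, an exploitability rating, and a per-category weight, in the spirit of CVSS. The categories, weights, and 1-5 scales below are hypothetical placeholders, not an established standard.

```python
# Hypothetical weights per AI risk category (not an industry standard).
RISK_WEIGHTS = {"safety": 1.0, "privacy": 0.9, "fairness": 0.8, "accuracy": 0.7}

def score(category, severity, exploitability):
    """Combine severity and exploitability (each rated 1-5) with a category weight."""
    return RISK_WEIGHTS[category] * severity * exploitability

# Example red-team findings: (description, category, severity, exploitability).
findings = [
    ("prompt injection leaks training data", "privacy", 4, 5),
    ("biased loan approvals", "fairness", 5, 2),
    ("hallucinated citations", "accuracy", 3, 4),
]

# Sort the remediation queue from highest to lowest score.
queue = sorted(findings, key=lambda f: score(f[1], f[2], f[3]), reverse=True)
for description, category, sev, expl in queue:
    print(f"{score(category, sev, expl):5.1f}  {description}")
```

Even a rough rubric like this turns an open-ended debate into a review of the weights and ratings, which is faster to resolve.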

In the meantime, a practical first step for organizations is to run focused red-team exercises with clear goals and priorities. Managing and mapping vulnerabilities at an early stage is more effective than running an AI red team against all possible risks and then trying to prioritize the results. AI red-team exercises should be grounded in a transparent threat-modeling process, a clear definition of success, and resources committed to remediation.

About the author 


Dr. Rebecca Balebako is a certified privacy professional (CIPP/E, CIPT, Fellow of Information Privacy) who has helped multiple organizations improve their privacy through research, analysis, and engineering.

Our Vision

We work with companies to build data protection solutions that are lasting and valuable, thereby protecting privacy as a human right.
