What does “AI red team” even mean? Is it just like a security red team?
Without a clear definition of what an AI “red-team” is, it is difficult to discuss the pros and cons and how to run an exercise effectively. It is also difficult to describe what other testing should complement AI red-teaming. I assume an “AI red-team” is a group of human experts who run non-automated tests on an AI system. But beyond that, it is unclear what anyone else means when they talk about AI red teams.
AI red teams need to differ from security red teams in three key ways.
- AI privacy tests should include clear-box testing. Clear-box testing means the AI red-team has access to or clarity on the training data, the model itself, or other information about how the model was built. Closed-box testing means the AI red-team has access only to the outputs of the system (the “box” of the system is closed to testers). Effective measures for protecting privacy in AI can only be tested with a clear-box method. For example, protections such as applying differential privacy to the training data cannot be measured from the output alone. Therefore, output-only AI testing will be less effective for privacy, and I suspect for other elements of AI risk.
- AI testing need not assume malicious activity. An AI red-team can emulate a motivated, external adversary, but testing can and should also cover other types of flaws. Vulnerabilities that arise without any malicious actor often include “own-goals” introduced by the organization responsible for the model through mistakes, poor planning, or a lack of awareness of the risk. These vulnerabilities should be included in AI testing efforts.
- AI red-team testing need not include a blue team. Red-team testing in security and military exercises is often designed to test the “blue-team”, that is, the defense and detection capability. For many of the AI risks around human rights, it is unclear who or what the “blue-team” would be and how it would be measured. Therefore, AI testing must include a scan for all vulnerabilities, and should not be limited to testing defense and detection.