The Perplexing Paradox of Anonymity: A Legal versus Engineering Perspective
The word “anonymous” holds a magical allure for companies seeking to protect their data. It conjures up images of secrecy, security, and the ability to safeguard sensitive information from prying eyes.
For companies that collect and store user data, anonymity offers several potential benefits. First, it can help to reduce the risk of data breaches and leaks. By de-identifying data, companies can make it more difficult for unauthorized individuals to access and exploit sensitive information. This can protect the privacy of their users and mitigate legal and reputational risks.
Second, anonymity can promote innovation and collaboration. By removing the fear of personal identification, companies can share data more freely, enabling openness and experimentation. This applies to companies building AI or ML systems, as well as to companies fine-tuning models with their own smaller datasets. In these cases, anonymizing the training data can help with experimentation and may also protect the model’s outputs and behavior.
In addition to these practical benefits, anonymity can also appeal to companies on a philosophical level. Many businesses believe in the importance of individual privacy and want to minimize the amount of personal data they collect and store. Anonymity can align with these values and help companies to position themselves as responsible stewards of user data.
With all these benefits, you would think we all agree on what anonymous means. Sadly, we don’t. There remains a gap between the legal and regulatory definitions of “anonymous” and what privacy engineers like myself mean by it. In general, the legal definitions lag behind what those of us working on privacy-enhancing technologies (PETs) consider to be truly safe. When a company says it is “anonymizing” a dataset, that just opens up more questions for me. Are they simply removing strong identifiers and leaving everything else (which might be fine legally)? Are they applying differential privacy (my gold standard)? Or are they doing something in between? These methods, all called anonymization, carry such a range of risks that it is hard to know how well the data is actually protected.
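To make that gap concrete: removing strong identifiers is essentially a column filter, while differential privacy adds calibrated noise with a mathematical guarantee. Here is a minimal sketch of the Laplace mechanism, the standard building block of differential privacy; the dataset and parameter values are invented for illustration.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon), so the released
    value barely changes in distribution when any one person's record is
    added or removed from the dataset.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query. One person changes a count by at most 1,
# so the sensitivity is 1. Smaller epsilon means stronger privacy, more noise.
ages = [34, 29, 51, 42, 38]
noisy_count = laplace_mechanism(len(ages), sensitivity=1.0, epsilon=0.5)
print(f"True count: {len(ages)}, DP count: {noisy_count:.1f}")
```

The key design point is that the protection comes from the noise distribution, not from which columns were dropped, which is why many privacy engineers treat it as the gold standard.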
The difference between regulators’ and privacy engineers’ definitions of “anonymous” reflects the challenges of protecting privacy in the digital age. What may be fine from a legal standpoint now may not stand up to attacks in the future.
Over my past 13 years working in privacy, I’ve seen organizations caught on the back foot as new attacks are developed. As attackers become more sophisticated, they find new ways to link “anonymized” data to individuals. Anonymization is therefore not a one-time process; it is an ongoing effort that needs to be constantly reviewed and updated.
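A toy sketch shows how such linking works. The tables and column names below are invented, but the pattern mirrors the classic re-identification studies that joined “anonymized” records to public records on quasi-identifiers such as ZIP code, birth date, and sex.

```python
import pandas as pd

# "Anonymized" dataset: names removed, but quasi-identifiers retained.
medical = pd.DataFrame({
    "zip": ["02138", "02139", "02138"],
    "birth_date": ["1970-07-31", "1965-02-13", "1980-11-02"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],
})

# Public dataset (e.g., a voter roll) with names and the same quasi-identifiers.
voters = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones"],
    "zip": ["02138", "02139"],
    "birth_date": ["1970-07-31", "1965-02-13"],
    "sex": ["F", "M"],
})

# Joining on quasi-identifiers re-attaches names to "anonymous" records.
reidentified = medical.merge(voters, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

No strong identifier was needed; the combination of a few ordinary attributes was enough to single people out.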
However, all is not lost. Having the right process in place can help your privacy program stay up to date. Even if you don’t choose to use differential privacy to anonymize your dataset, there are ways to keep your program resilient.
Overall, organizations can consider two steps. First, build a business process for updating data protection. This starts with being clear internally about what “anonymous” means and building a process to log and test the protective measures. That internal clarity pays off for as long as the data or models are retained or used.
Second, consider a technical process to test and monitor the “anonymization.” This may include regular vulnerability testing or measuring the re-identifiability of the data. (Yes, my company offers these services.)
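What such monitoring looks like varies by team and tooling. As one illustrative and much-simplified measure, you can track the size of the smallest group of records sharing the same quasi-identifier values in each release, i.e., its k-anonymity; the helper and data below are hypothetical.

```python
import pandas as pd

def min_equivalence_class(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Return k for k-anonymity: the size of the smallest group of records
    sharing the same quasi-identifier values. k == 1 means at least one
    record is unique on those attributes and thus highly re-identifiable."""
    return int(df.groupby(quasi_identifiers).size().min())

release = pd.DataFrame({
    "zip": ["02138", "02138", "02139"],
    "age_band": ["30-39", "30-39", "40-49"],
    "diagnosis": ["asthma", "flu", "diabetes"],
})

k = min_equivalence_class(release, ["zip", "age_band"])
print(f"This release is {k}-anonymous over (zip, age_band)")
```

Running a check like this on every release, and alerting when k drops below a threshold, is one simple way to make “anonymization” an ongoing, measurable property rather than a one-time claim.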
These two efforts have long-term benefits for company resilience, reputation, and trust.
In conclusion, anonymization is an ever-evolving process that requires constant vigilance and adaptation to maintain its effectiveness in safeguarding sensitive data. Companies must proactively engage in a two-pronged approach: establishing robust business processes to update and enhance data protection measures, and implementing rigorous technical procedures to test, monitor, and refine their anonymization practices. By embracing this ongoing commitment to data privacy, organizations can cultivate greater resilience, protect their reputation, and foster trust among their users.