Artificial Intelligence, Real Risks: Privacy Considerations for Generative AI

Published: Feb. 17, 2023

Updated: Feb. 18, 2023

Generative AI is not new, but it has garnered new levels of attention and adoption in recent months. Models like DALL-E, Stable Diffusion, and Midjourney have flooded the Internet with AI-generated art, and in November 2022, ChatGPT took the world by storm with its easy-to-use interface and high-quality text generation. While generative AI has enormous potential, companies interested in using this technology should understand its legal and ethical risks.

This post focuses on some key privacy risks and compliance requirements, offering four main takeaways:

  • Training data may include personal data that was collected in violation of privacy laws, which may taint the model and any products that use it.
  • Depending on how they’re used, AI models may qualify as automated decision-making, which creates heightened obligations.
  • Privacy obligations, such as honoring data subject requests, may be difficult to fulfill in practice, given the way data is collected and used.
  • Performing data protection impact assessments requires examining how personal data is processed, which can be challenging with generative AI because its processing operations can be complex and obscure. 

What is generative AI?

As the name suggests, generative AI refers to artificial intelligence models that generate content in various forms, including text, images, audio, and even protein structures. The models consist of deep-learning algorithms that are fed massive datasets of a certain type of content. In general, current models learn to recognize the content, distinguish between AI-generated and non-AI-generated content, and eventually produce new content that approximates the non-AI-generated content in the training data. Models can be refined to serve a more specific function – for example, to provide instructions or respond to questions in a lifelike manner – using additional data and human feedback.

Privacy Risks & Requirements

Training Data

Because AI requires vast amounts of training data, developers rely on a wide variety of data sources, including third-party data sets (paid and open source), data harvested from consumer apps, scraped data, and, in rarer cases, data collected directly by the developer. Many of these data sets have uncertain provenance and contain personal data, which implicates several obligations under applicable data protection laws. The developers of these models may face steep penalties if they do not obtain and use training data in compliance with such laws.

For example, under the General Data Protection Regulation (“GDPR”), companies must base their collection and use of personal data on one of six “legal bases.” Companies must also inform individuals of their rights with regard to personal data (such as the right to access and delete certain data) and enable people to exercise these rights. In the last year, EU and UK authorities have levied multimillion-euro fines and banned services, such as Clearview AI’s facial recognition software, for allegedly violating these obligations by training their software on publicly available data sets. And in the U.S., the FTC has sought “algorithmic disgorgement” against companies that have used deceptive data practices to build AI models, requiring them to destroy ill-gotten data and the models built with it.

For these reasons, generative AI developers should source training data carefully. Whether they obtain this data directly, through scraping, or from a third-party data provider, they should ensure that it is collected, used, and disclosed in compliance with applicable privacy laws. Use of synthetic data (artificially generated data that’s designed to mimic real-world data) is another option.
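
One way to make this kind of sourcing review concrete is to keep a provenance record for each training data source and flag any source that holds personal data without a documented legal basis. The sketch below is illustrative only and is not a compliance tool: the DataSource fields, the example source names, and the flag_sources function are all hypothetical, and the legal-basis labels are shorthand for the six bases listed in GDPR Article 6(1).

```python
from dataclasses import dataclass
from typing import List, Optional

# Shorthand labels (our own naming) for the six legal bases in GDPR Article 6(1).
GDPR_LEGAL_BASES = {
    "consent",
    "contract",
    "legal_obligation",
    "vital_interests",
    "public_task",
    "legitimate_interests",
}


@dataclass
class DataSource:
    """Hypothetical provenance record for one training data source."""
    name: str
    origin: str  # e.g. "third_party", "scraped", "first_party", "synthetic"
    contains_personal_data: bool
    legal_basis: Optional[str] = None  # None means nothing has been documented


def flag_sources(sources: List[DataSource]) -> List[str]:
    """Return the names of sources that hold personal data but have no
    recognized legal basis on record. This only checks documentation;
    it does not decide whether the claimed basis is actually valid."""
    return [
        s.name
        for s in sources
        if s.contains_personal_data and s.legal_basis not in GDPR_LEGAL_BASES
    ]


# Example inventory. Treating the synthetic set as free of personal data is
# an assumption made for illustration; in practice that is a fact-specific question.
sources = [
    DataSource("licensed_photo_set", "third_party", True, "consent"),
    DataSource("forum_scrape", "scraped", True),
    DataSource("synthetic_faces", "synthetic", False),
]
flagged = flag_sources(sources)
print(flagged)
```

A record like this does not establish compliance on its own, but it gives legal and engineering teams a shared list of sources that need follow-up before training begins.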

Companies using third-party generative AI tools may also face legal, practical, and reputational risks due to the provider’s noncompliance. For example, assume you run an app where users can create custom avatars. A generative AI tool integrated into the app allows users to upload photos of themselves and generates faces for the avatars based on these photos. If the tool developer violates applicable privacy laws, you may find yourself under a regulator’s microscope as well. In addition, if the tool developer becomes subject to algorithmic disgorgement, you would lose your avatar functionality – and users with it. The PR fallout could also result in users abandoning the app, as well as loss of revenue, funding, and partnerships. Finally, depending on their terms, AI developers may use any personal data your users provide for their own purposes, such as further training their models or providing the personal data verbatim in response to other users’ queries.

For these reasons, companies should conduct due diligence on AI providers’ compliance with privacy laws, including in relation to the training data and the way that the provider will use any personal data inputs, and should include strong contractual protections where possible.

Automated Decisions & Profiling

Various privacy laws regulate the use of automated data processing to make decisions that affect individuals. Generative AI may implicate compliance obligations under these laws. Under the GDPR, individuals “have the right not to be subject to a decision based solely on automated processing, including profiling,” that has legal or similarly significant effects (GDPR Article 22(1)). Privacy laws in Colorado, Virginia, and Connecticut give individuals the right to opt out of personal data processing for purposes of profiling. Connecticut follows the GDPR in limiting this to “solely” automated profiling, while Colorado’s draft privacy regulations distinguish among several types of automated processing based on the level of human involvement. California is in the process of developing regulations covering automated decision-making.

Other Requirements

Numerous privacy laws give individuals the right to access, delete, and correct their personal data. These rights may be difficult to honor where the personal data has been absorbed into the AI model. Even if it is possible to honor them, it is unclear whether the model would need to be retrained without the deleted data (or with the corrected data).

A data protection impact assessment may also be necessary for use of AI tools, which would require a detailed understanding of how the underlying technology works and documentation of how personal data will be processed, whether the processing is necessary and proportionate to the purpose of processing, risks to individuals’ rights, and safeguards that will be used to mitigate these risks. Examining a generative AI model in this level of detail may be time-consuming and require cooperation with the AI developer, who may not be willing (or able) to provide details regarding how the model works.
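
The elements of an assessment described above can be tracked in a simple structured record, which makes it easier to see which parts are still waiting on information from the AI developer. The following is a minimal sketch under stated assumptions: the DpiaRecord class, its field names, and the example values are hypothetical and are not drawn from any regulator's template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DpiaRecord:
    """Hypothetical working record for a data protection impact assessment."""
    tool_name: str
    processing_description: str = ""          # how personal data will be processed
    necessity_and_proportionality: str = ""   # why processing fits the purpose
    risks_to_individuals: List[str] = field(default_factory=list)
    safeguards: List[str] = field(default_factory=list)

    def open_items(self) -> List[str]:
        """List the sections that have not been filled in yet."""
        items = []
        if not self.processing_description.strip():
            items.append("processing_description")
        if not self.necessity_and_proportionality.strip():
            items.append("necessity_and_proportionality")
        if not self.risks_to_individuals:
            items.append("risks_to_individuals")
        if not self.safeguards:
            items.append("safeguards")
        return items


# Example: an assessment for the avatar tool that is only partly complete.
record = DpiaRecord(
    tool_name="avatar_generator",
    processing_description="User photos are uploaded and sent to a third-party model.",
    risks_to_individuals=["Provider may reuse uploaded photos for further training"],
)
print(record.open_items())
```

An empty list from open_items only means every section has some content; whether that content is adequate remains a legal judgment.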

Generative AI may trigger additional compliance requirements, so companies that develop or use this technology should understand the privacy laws applicable to them and thoroughly evaluate their compliance obligations.