The Clarkson Law Firm v. OpenAI class action highlights key legal and ethical challenges in the rapidly evolving field of artificial intelligence, specifically relating to the use of personal data for training AI systems. Filed in June 2023 in the United States District Court for the Northern District of California, the lawsuit underscores growing concerns over privacy, intellectual property, and the regulatory responsibilities of AI developers. The plaintiffs allege that OpenAI used personal data without consent, in violation of state and federal privacy laws.
The legal claims made in the Clarkson case include allegations that OpenAI breached the California Consumer Privacy Act (CCPA) by collecting personal information without user consent, the Fair Credit Reporting Act (FCRA) by using data in an unauthorised context, the federal Wiretap Act by allegedly capturing personal information without notification, and the Computer Fraud and Abuse Act (CFAA) by accessing data without proper authorisation. The lawsuit demands damages and injunctive relief to prevent further unauthorised data use, drawing attention to the potentially substantial liabilities for AI companies in managing data acquisition responsibly.
The Clarkson complaint reflects broader industry scrutiny. OpenAI, along with Microsoft as a significant stakeholder, faces allegations that approximately 300 billion words were collected without user consent, including personal data, for training its ChatGPT model. This lawsuit asserts that OpenAI bypassed legally compliant data acquisition methods and allegedly failed to register as a data broker, a regulatory requirement aimed at promoting transparency in personal data usage.
Beyond the privacy concerns, the case against OpenAI includes claims about insufficient safeguards to restrict access by children under 13, invoking similar criticisms faced by other tech giants. The plaintiffs’ request for $3 billion in damages further signals the seriousness of the perceived privacy violations and calls for accountability in data handling by AI companies.
In parallel, similar lawsuits illustrate the growing legal pressure on generative AI developers. For example, Andersen v. Stability AI et al. addresses copyright issues stemming from image scraping, Getty Images v. Stability AI involves the alleged use of 12 million images without permission, and Copilot Authors v. Microsoft, GitHub, and OpenAI questions whether AI can lawfully use licensed code snippets without attribution. These cases collectively highlight the need for robust intellectual property and privacy frameworks that align with AI advancements, as well as ethical considerations that balance innovation with user rights.
The legal landscape surrounding AI development is thus growing more intricate, suggesting a pressing need for clear regulations and ethical standards to mitigate risks in data handling, intellectual property rights, and content generation. Future cases will likely continue to test the boundaries of AI’s impact on individual rights, prompting further discourse on the responsibilities of AI companies.
As of November 2024, there have been significant developments in the legal cases involving OpenAI, Stability AI, and other generative AI developers.
Clarkson Law Firm v. OpenAI
The class-action lawsuit filed by the Clarkson Law Firm against OpenAI in June 2023, alleging unauthorised use of personal data for AI training, has seen notable developments. In May 2024, a federal judge in California dismissed the initial complaint, citing its excessive length and inclusion of irrelevant allegations. However, the judge permitted the plaintiffs to file an amended complaint, which they have since done. The case is currently proceeding through the pre-trial stages, with both parties engaged in discovery and pre-trial motions.
Andersen v. Stability AI et al.
In Andersen v. Stability AI Ltd., filed in January 2023, artists Sarah Andersen, Kelly McKernan, and Karla Ortiz allege that Stability AI, Midjourney, and DeviantArt infringed their copyrights by using their artworks without permission to train AI models. On August 12, 2024, Judge William Orrick of the Northern District of California issued an order granting in part and denying in part the defendants’ motions to dismiss. The court allowed the plaintiffs’ direct copyright infringement claims to proceed, particularly concerning the storage of their works on the defendants’ systems. However, claims of unjust enrichment, breach of contract, and certain other allegations were dismissed. The court did not address the fair use defense at this stage, indicating that such determinations are premature without a more developed factual record.
Getty Images v. Stability AI
In Getty Images v. Stability AI, initiated in March 2023, Getty Images contends that Stability AI unlawfully copied approximately 12 million of its images to train the Stable Diffusion AI model. In December 2023, the UK High Court dismissed Stability AI’s applications for summary judgment and strike-out, allowing Getty Images’ claims to proceed to trial. The court found that Getty Images presented a credible case of copyright infringement, particularly regarding the use of its images in training the AI model. This decision underscores the court’s recognition of potential copyright violations in the context of AI training datasets.
Copilot Authors v. Microsoft, GitHub, and OpenAI
In Doe 1 et al. v. GitHub, Inc. et al., filed in January 2023, a group of software developers allege that Microsoft, GitHub, and OpenAI infringed their copyrights by allowing GitHub Copilot to reproduce licensed code snippets without proper attribution. On July 8, 2024, Judge Jon Tigar of the Northern District of California dismissed the majority of the plaintiffs’ claims, including those under the Digital Millennium Copyright Act (DMCA) and certain state law claims. However, the court allowed two claims to proceed: one alleging breach of open-source licenses and another for breach of contract. The court’s decision narrows the scope of the lawsuit but permits the plaintiffs to pursue claims related to the alleged misuse of open-source code.
These cases underscore the increasing legal scrutiny faced by AI developers concerning data usage and intellectual property rights. The outcomes of these proceedings are likely to have significant implications for the AI industry, particularly in establishing legal precedents for data acquisition and usage in AI training.
The progression of these lawsuits underscores a pivotal shift in the legal landscape as it adapts to the rapid evolution of artificial intelligence. Each of these cases reveals not only the complexities surrounding copyright and privacy law in the context of AI but also signals broader legal and ethical questions that courts, lawmakers, and stakeholders must now confront. The impact of these cases will likely influence legislative action, shape judicial interpretations of existing laws, and propel the establishment of new regulations specifically designed for AI.
Copyright Law in the Context of AI Training Data
The cases of Andersen v. Stability AI et al. and Getty Images v. Stability AI directly challenge the notion that data scraping of copyrighted materials, even when publicly accessible online, can be permissible for AI training purposes. Courts are being asked to decide whether AI developers can freely use publicly available creative works without infringing on copyright protections. The outcomes could set influential precedents in intellectual property law, clarifying whether and how copyright laws apply to training datasets. This may prompt AI companies to secure licenses for copyrighted material or develop original datasets, thereby reshaping AI training practices.
Defining Fair Use in AI Development
The dismissal of some claims in Copilot Authors v. Microsoft, GitHub, and OpenAI highlights the emerging legal interpretations around fair use and copyright exceptions. Although the court allowed specific contractual claims to proceed, the decision to dismiss DMCA and related copyright claims points to the legal intricacies in defining “transformative use” and determining fair use for AI. If fair use becomes applicable in AI training contexts, it may open the door to broader interpretations of data usage under copyright law. Conversely, if stricter limitations are imposed, AI companies may face higher barriers in accessing diverse datasets, leading to more conservative AI development.
Consumer Privacy and Data Use Regulation
The Clarkson lawsuit, although its initial complaint was dismissed, brought critical attention to privacy concerns and the need for explicit regulatory guidelines on how personal data can be used for AI training. As the amended complaint and similar suits proceed, privacy advocates may push for updates to privacy laws such as the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR) to better account for the nuances of AI data usage. These updates could require clearer disclosures, consent processes, and possibly registration as a data broker. Such changes would enforce transparency and accountability, obligating AI developers to respect individual rights and limiting unauthorised data usage.
Establishing Ethical Standards for AI Development
These lawsuits have highlighted the ethical dimensions of AI, particularly regarding transparency and respect for intellectual property. Legislative bodies may increasingly view AI development through an ethical lens, potentially requiring developers to obtain explicit permissions and secure fair compensation for contributors of data and creative content. Ethical AI development standards might emerge as part of a broader regulatory framework, encompassing not only copyright compliance but also user consent and safeguards against discriminatory or exploitative practices in training datasets.
Impact on Business Models and Compliance Costs for AI Firms
The legal challenges could alter the business models of AI companies, with a shift towards proprietary datasets, licensed data acquisition, or even partnerships with copyright holders. Increased compliance costs are likely if these cases lead to tighter regulations around data acquisition, as companies will need to invest in lawful data sourcing, maintain robust compliance frameworks, and defend against potential litigation. The outcomes may also incentivise companies to develop new data protection technologies, such as synthetic data or federated learning, to mitigate privacy risks.
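To make the federated learning idea concrete, below is a minimal sketch of federated averaging (FedAvg) in Python. It is an illustrative assumption of how such a system might be structured, not any company’s actual implementation: each simulated client trains a toy linear model on its own private data, and the server aggregates only the resulting weights, so raw data never leaves a client.

```python
# Minimal sketch of federated averaging (FedAvg): clients train locally on
# private data; the server only averages model weights and never sees raw data.
# All names and the toy linear-regression model are illustrative assumptions.
import numpy as np

def local_update(weights, X, y, lr=0.05, steps=10):
    """Run a few gradient steps on one client's private data (toy linear model)."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """One round of FedAvg: collect locally trained weights and average them."""
    local_weights = [local_update(global_weights, X, y) for X, y in clients]
    return np.mean(local_weights, axis=0)

# Toy usage: three clients, each holding private (X, y) samples drawn from the
# same underlying relationship y = 2*x0 - 1*x1 + noise.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, clients)
print(w)  # approaches [2.0, -1.0] without raw data ever leaving a "client"
```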
Influence on Future Legislation and AI Governance
Policymakers will likely be compelled to revisit existing frameworks, potentially drafting new legislation that balances innovation with consumer rights. These cases could catalyse the introduction of AI-specific regulations akin to the regulatory standards for consumer data protection in digital commerce. Internationally, countries may follow suit, establishing harmonised AI governance principles. Such frameworks could address the dual need to foster technological progress and ensure that AI systems adhere to legal standards.
Encouraging Precedent-Setting Decisions
These cases mark a significant legal threshold, with courts playing a decisive role in defining AI developers’ responsibilities and the limits of acceptable data usage. The judiciary’s decisions may establish foundational principles for AI accountability, offering legal precedents that other jurisdictions might adopt or adapt. In doing so, courts will shape expectations of responsible AI development, with implications for privacy, intellectual property, and ethics.
In conclusion, these cases are part of a broader transformation of legal and regulatory frameworks surrounding AI, likely steering the direction of law towards robust protections for data originators and stricter controls on AI training practices. The legal impact extends beyond individual case outcomes, setting a trajectory for more sophisticated and accountable AI governance that balances innovation with respect for individual rights and ethical standards.
Older version of the article continues below:
The title of the latest privacy class action lawsuit against OpenAI is Clarkson Law Firm v. OpenAI. It was filed on June 28, 2023, in the United States District Court for the Northern District of California. The lawsuit is still in its early stages, and it is not yet clear how it will be resolved. However, it is a sign of the growing concern about the privacy implications of artificial intelligence. As AI technology continues to develop, we are likely to see more lawsuits like this one. We will keep updating this blog post as the case develops.
The lawsuit was filed by a group of individuals who allege that OpenAI violated their privacy by using their personal data to train its AI models without their consent. The lawsuit specifically alleges that OpenAI used data scraped from the internet, including social media posts, blog posts, and Wikipedia articles, to train its AI models. The lawsuit also alleges that OpenAI did not disclose to users that their data was being used for this purpose.
The lawsuit seeks damages for the plaintiffs’ alleged privacy violations, as well as an injunction preventing OpenAI from continuing to use users’ data in this way.
Here are some of the specific claims made in the class action lawsuit:
- OpenAI violated the California Consumer Privacy Act (CCPA) by collecting and using personal information without users’ consent.
- OpenAI violated the Fair Credit Reporting Act (FCRA) by collecting and using personal information for a purpose not authorized by the FCRA.
- OpenAI violated the federal Wiretap Act by collecting and using personal information without users’ knowledge or consent.
- OpenAI violated the federal Computer Fraud and Abuse Act (CFAA) by accessing and using personal information without authorization.
OpenAI’s growing number of legal challenges
The class-action lawsuit filed against OpenAI adds to the growing number of legal challenges faced by companies developing and deploying AI technologies. In this case, sixteen individuals have sued OpenAI, alleging that their personal information was collected without proper notice through the company’s AI products, particularly those based on ChatGPT.
The lawsuit, filed in a Federal Court in San Francisco, accuses OpenAI of bypassing legal data acquisition methods and unlawfully gathering personal information without compensation to the individuals involved. The plaintiffs’ law firm, Clarkson, argues that OpenAI’s actions infringe upon privacy rights, intellectual property rights, and potentially violate copyright laws. Additionally, Microsoft, which has made a significant financial commitment to OpenAI, has been named as a defendant in the lawsuit.
According to the complaint, OpenAI and Microsoft deviated from established protocols by engaging in data scraping activities, collecting approximately 300 billion words from various sources without proper consent. This data allegedly included personal information obtained without permission, violating privacy laws and ethical data practices. Furthermore, OpenAI is accused of failing to comply with the legal requirement to register as a data broker.
The lawsuit also raises concerns about OpenAI’s insufficient measures to prevent children under 13 from accessing its AI tools, echoing previous criticisms directed at other tech giants. The plaintiffs seek damages totaling $3 billion from Microsoft and OpenAI, although this figure is likely a placeholder and not an actual assessment of damages.
This lawsuit is part of a broader trend where companies involved in AI development face legal disputes related to data usage, intellectual property, defamation, and other legal implications. Similar concerns have emerged in cases such as Japan’s warning to OpenAI regarding privacy regulations and Getty Images’ lawsuit against Stability AI for unauthorized use of copyrighted photographs in training an AI system.
Furthermore, OpenAI previously faced a defamation lawsuit by a radio host who claimed that ChatGPT wrongfully accused him of fraud. These cases shed light on the potential risks associated with AI-generated content and the legal ramifications that may arise.
Overall, the legal challenges faced by OpenAI reflect the complex legal landscape surrounding AI development and underscore the need for robust regulations and ethical practices in the field.
Other ongoing cases against generative AI developers:
- Andersen v. Stability AI et al. (filed January 2023): This class action lawsuit alleges that three artists’ copyrights were infringed by generative AI platforms Midjourney, Stability AI, and DeviantArt, which trained their AI on images scraped from the internet without permission.
- Getty Images v. Stability AI (filed March 2023): This lawsuit alleges that Stability AI copied 12 million of Getty Images’ images without permission to train its Stable Diffusion AI art tool.
- Copilot Authors v. Microsoft, GitHub, and OpenAI (filed January 2023): This class action lawsuit alleges that Microsoft, GitHub, and OpenAI violated copyright law by allowing Copilot, a code-generating AI system trained on billions of lines of public code, to regurgitate licensed code snippets without providing credit.