Since the advent of the Internet, scrapers, platforms, and content owners alike have tried to identify the legal bounds and restrictions for web scraping. Scrapers want to access content at scale, and platforms seek to rebuff the same activity. However, there is no single “web scraping law,” and the activity may be subject to a mix of state and federal laws. And when disputes do end up in litigation, rulings may be so circumscribed that it can be difficult to identify consistent guiding principles.
Historically, web scraping lawsuits often focused on a range of claims, including the Computer Fraud and Abuse Act (CFAA), contract claims related to user agreements, and tortious interference. See our prior blogs for more details. Under these prevailing theories, courts generally found that platforms with publicly available content had limited recourse against scrapers.
But the rapid rise of artificial intelligence (AI) appears to be reshaping the focus and strategy in recent web scraping litigation, particularly for publicly available information. It’s widely known that AI companies have scraped dozens of terabytes of text from the Internet to train their models and create cutting-edge products. Now that those products are on full display—many with enormous market value—platforms and content owners are refocusing on the legality of scraping publicly available content.
While courts are fielding a deluge of recent scraping-related lawsuits, Reddit generally has led the charge with multiple lawsuits that reflect both historic and new legal theories. The resolution and path of these cases may signal a new direction for scraping litigation, especially in the AI context, going forward.
Will Contract or Copyright Prevail?
In June 2025, Reddit sued Anthropic in California court under several theories: breach of contract, unjust enrichment, trespass to chattels, tortious interference, and unfair competition. Notably, Reddit did not allege copyright infringement or violation of any other federal law that would require bringing the suit in federal court. See Reddit, Inc. v. Anthropic, PBC, No. CGC25625892 (Cal. Super. Ct. filed June 4, 2025).
Among other points, Reddit argued that Anthropic:
- Violated Reddit’s User Agreement by engaging in prohibited scraping;
- Impeded Reddit’s ability to comply with its obligations to its users under the User Agreement on account of scraping;
- Diminished Reddit’s server capacity and hindered Reddit’s ability to provide its services to users; and
- Ignored or circumvented Reddit’s anti-scraping measures.
The parties quickly became—and remain—consumed with whether state or federal court is the proper venue for this dispute, and the answer to that question may have a material effect on which party prevails, in or out of court.
While Reddit focuses on breach of contract and other state law claims, Anthropic contends that the majority of Reddit’s causes of action are not substantively different from rights protected by copyright law, and therefore those claims are preempted by the Copyright Act. As of this post, Anthropic has successfully removed the case to the United States District Court for the Northern District of California, and Reddit has moved to remand the case to state court, a motion that Anthropic has opposed.
The litigation is poised to be one of the first cases to build on X Corp. v. Bright Data, where the Northern District of California (the same court overseeing the Reddit case) dismissed X Corp.’s claim that Bright Data’s scraping of X users’ posts violated X’s Terms of Service, holding that the contract claim was preempted by the Copyright Act. The court found that permitting X Corp.’s claims to proceed would allow X Corp. to “entrench its own private copyright system that rivals, even conflicts with, the actual copyright system enacted by Congress,” and to “exercis[e] a copyright owner’s right to exclude where it has no such right.”
As this case unfolds, we hope to receive some clarity on the relative strength of contract and copyright claims and defenses in this new era of web scraping litigation.
DMCA Section 1201 – CFAA by Another Name?
Only a few months after suing Anthropic in state court in California, Reddit filed a lawsuit against Perplexity AI and other defendants in federal district court in New York. See Reddit, Inc. v. SerpApi LLC et al., No. 25-cv-8736 (S.D.N.Y. filed Oct. 22, 2025).
Here, Reddit takes a different tack and brings various claims under the Digital Millennium Copyright Act (DMCA). Reddit does not allege copyright infringement, but rather that the defendants violated the DMCA’s prohibition on the “circumvention of technological control measures” under 17 U.S.C. § 1201(a)(1)(A), a theory that seems to strategically mirror some aspects of previously popular CFAA claims.
Section 1201(a)(1)(A) was created in 1998 under Title I of the DMCA as a way to mitigate piracy risks around emerging technologies of the time, such as DVDs, ebooks, and software. It prohibits the circumvention of technological measures that control access to a copyrighted work. Courts have clarified that the anti-circumvention provision is indifferent to the strength or success of the control measure. See Universal City Studios, Inc. v. Reimerdes, 111 F. Supp. 2d 294, 318 (S.D.N.Y. 2000), aff’d on other grounds sub nom. Universal City Studios, Inc. v. Corley, 273 F.3d 429 (2d Cir. 2001).
Indeed, like the CFAA-web scraping battles of the recent past, Reddit’s complaint focuses on the defendants’ methods of access, including circumvention of Reddit’s registered user-identification limits, IP-rate limits, captcha bot protection, and anomaly-detection tools, as well as the defendants’ identity masking practices and their noncompliance with Reddit’s robots.txt directive and User Agreement.
This DMCA approach brings the action right into the copyright arena while strategically avoiding the difficulties of copyright preemption addressed in X Corp. v. Bright Data, and it is already gaining traction among web scraping plaintiffs. For example, a month after Reddit’s filing, several YouTube content creators sued Nvidia Corporation under the same DMCA theory, alleging that Nvidia unlawfully circumvented YouTube’s access barriers to scrape the plaintiffs’ videos to train Nvidia’s generative AI model. A few weeks after that complaint, Google brought a lawsuit against SerpApi with the same claims. Early 2026 brought additional cases into the arena: the same YouTube content creator plaintiffs that sued Nvidia filed a nearly identical lawsuit against Snap, Inc., and a separate YouTube content creator plaintiff initiated a class action against Meta for scraping in violation of the DMCA.
While the flurry of new litigation could shed meaningful light on these issues, a fully briefed and argued case pending before the Second Circuit, which concerns the strength of certain access control and circumvention principles for DMCA section 1201 claims, may indicate sooner whether this newer litigation strategy is viable. See Yout LLC v. Recording Industry Association of America, Inc., 633 F. Supp. 3d 650 (D. Conn. 2022), appeal argued, No. 22-02760 (2d Cir. argued Feb. 4, 2024).
Industry Developments
Against the backdrop of these recent legal challenges is a broader tension on the Internet between (1) data owners and hosts, and (2) data scrapers. So far, that tension has created licensing deals and third-party tools that aim to broker data exchanges between the main players.
In the two complaints Reddit filed, Reddit claims that the defendants have refused to enter into licensing deals with the social media platform, even though other large companies, such as OpenAI and Google, have already agreed to such deals. However, as Anthropic did in its opposition to Reddit’s motion to remand, defendants may take issue with and resist such licensing deals when the social media platform does not actually own the underlying user-generated content.
These dynamics appear to be motivating companies to create and offer tools and services that facilitate the exchange of data through some pre-defined process, such as Cloudflare’s Pay Per Crawl program, which enables site owners to monetize their content by setting prices for crawler access to their sites.
As these cases progress, it will become clearer how the industry will evolve to address the legal considerations around certain scraping activities—including the role of AI companies, their relationship to major social media platforms, and the tools and services that facilitate this relationship.
