AI Briefing: Copyright battles bring Meta and OpenAI datasets under the microscope - Digiday

Last week saw not one but two high-profile AI legal battles under the spotlight, with updates in separate copyright cases against Meta and OpenAI.

Court documents unsealed in an AI copyright case against Meta raised new questions about the use of e-books from the book piracy site Library Genesis (LibGen). They also raise new questions about how much CEO Mark Zuckerberg and other Meta execs knew about Meta teams’ use of pirated content to help train its Llama models.

Court documents allege Meta employees sought to remove copyright information, including headers and other identifiers, from various materials. One filing shows an internal Meta document with a suggestion to remove lines containing words like “ISBN,” “copyright,” and “all rights reserved.” Another filing includes messages between employees discussing the desire to compete with AI rivals, including beating OpenAI’s GPT-4, while also describing French rival Mistral as “peanuts.”

Other documents include parts of Zuckerberg’s testimony from his December deposition. Zuckerberg said broad characterizations make the use of pirated content seem “like a bad thing” but added that Meta’s teams “think through this carefully because there are often more nuances than is kind of apparent at first.” (Meta did not reply to Digiday’s request for comment about the court documents.)

Books in the LibGen dataset include titles by top authors, including Ta-Nehisi Coates and Sarah Silverman, who are among the authors who filed the lawsuit. Zuckerberg claimed not to be familiar with LibGen. However, the plaintiff’s attorney then asked if Meta would do business with a company that brags about using pirated materials.

“In general, if someone is broadcasting loudly that they’re doing something that is illegal, that would be a pretty big red flag that I’d want us to look at carefully before engaging with them in any way,” Zuckerberg said.

When asked by an attorney if Meta should not be downloading materials from websites known to have pirated materials, Zuckerberg said YouTube hosts “some percent” of pirated content even if most of the content is “kind of good and they have the license to do.”

“Early on, I think that people did make some assertions about YouTube’s intent on this, and they were less mature about developing their IP rights management,” Zuckerberg said. “But even then, I don’t think that I would’ve said I wouldn’t want people at Meta not to use YouTube, at that point. So — so I don’t know.”

Other documents suggest Meta execs were aware Llama’s training data had LibGen content and other copyrighted materials from sources like CommonCrawl. Documents also suggest Meta teams knew there could be blowback and potential fines under the EU AI Act if the use of LibGen were uncovered. One document mentioned Meta teams suggesting datasets should be red-teamed to filter out potential information about bio-weapons and harmful stereotypes.

NYT v. OpenAI and Microsoft

Revelations in the Meta case come as tech companies face more scrutiny over the types of content used to train large language models. In a separate lawsuit between The New York Times and OpenAI, attorneys presented oral arguments in court that outlined the key points each side is building its case on. In both cases, plaintiffs allege tech companies stripped copyright information from content used to train AI models.

“You’re leaving people open for massive copyright infringement without the ability to trace it,” said Steven Lieberman, an attorney representing the New York Daily News, which filed a separate case against OpenAI and Microsoft. “It’s like it causes the alarm system in your house to go down.”

Beyond court — publishers ink new AI deals

Last week, Axios and OpenAI announced a new partnership that includes funding new local Axios newsrooms in four cities, including Pittsburgh, Pa. and Kansas City, Mo. The deal also gives Axios access to OpenAI’s tech to build new AI products, processes and systems. In a blog post about the deal, Axios CEO Jim VandeHei wrote that the three-year deal will also give all Axios staff access to OpenAI’s enterprise version.

That wasn’t the only news last week about AI-powered news. The Associated Press and Google also announced a new partnership that includes the AP providing a feed of real-time information to Google’s Gemini app. The companies’ blog posts didn’t disclose the terms of the deal or what it’ll entail, but noted the plan will help “enhance the usefulness of results” within the Gemini app. Kristin Heitmann, the AP’s chief revenue officer, stated the updates are part of the companies’ ongoing relationship and “based on working together to provide timely, accurate news and information to global audiences.”

Beyond Axios and the AP’s plans to expand AI news, another company starting with “A” took a step back. Last week, Apple suspended its use of AI news alerts, following criticism for generating inaccuracies in AI-summarized notifications. Meanwhile, a new report by DoubleVerify detailed a network of more than 200 websites generating “AI slop” that mimics real publishers while misleading adtech vendors and buyers.

Other AI-related stories from across Digiday

By now, spotting influencers in major ads and at events is all but mainstream — but what if artificial intelligence allowed influencers to tap into old-school product placement without actually having to shoot in person with brands?

Due to its increased focus on individuals’ avatars and their appearances, the culture of Zepeto revolves heavily around virtual fashion.

President Trump’s second term will be different from his first. It seems his relationships with the media, tech and marketing industries already show as much.
