Meta’s Use of Pirated Material to Train AI, and Why You Should Care
It all started with a piece in The Atlantic by Alex Reisner (https://www.theatlantic.com/technology/archive/2025/03/libgen-meta-openai/682093/) revealing that Meta, the organisation behind social media sites such as Facebook and Instagram, has been using a library of pirated written material to train its generative AI. Of course, this is a somewhat simplistic starting point. There has been ongoing outrage throughout creative communities for years now, including legal action brought by artists against DeviantArt, Midjourney and others over the use of copyrighted images to train AI (https://www.theartnewspaper.com/2024/05/10/deviantart-midjourney-stable-diffusion-artificial-intelligence-image-generators). Similarly, a group of authors, including Paul Tremblay and Mona Awad, brought a lawsuit against OpenAI for book scraping (https://www.theguardian.com/books/2023/jul/05/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books) that was partially dismissed in February 2024 (https://www.theguardian.com/books/2024/feb/14/two-openai-book-lawsuits-partially-dismissed-by-california-court). But the recent furore, and the sense of betrayal within the writing community, is suddenly focused squarely on this issue.