Franklin Matters: What does AI get trained on? Copyrighted material, apparently without permission of the owner

Tuesday, July 11, 2023

Aside from the fact that AI is neither artificial nor "intelligent", ChatGPT was trained on data with a cutoff in 2021 (about 2 years ago, and getting older each day). As the lawsuit below claims, that training data also included copyrighted material used without the owners' permission.

"Tools like ChatGPT, a highly popular chatbot, are based on large language models that are fed vast amounts of data taken from the internet in order to train them to give convincing responses to text prompts from users.

The lawsuit against OpenAI claims the three authors “did not consent to the use of their copyrighted books as training material for ChatGPT. Nonetheless, their copyrighted materials were ingested and used to train ChatGPT.” The lawsuit concerning Meta claims that “many” of the authors’ copyrighted books appear in the dataset that the Facebook and Instagram owner used to train LLaMA, a group of Meta-owned AI models.

The suits claim the authors’ works were obtained from “shadow library” sites that have “long been of interest to the AI-training community”."

Continue reading the article online ->

https://www.theguardian.com/technology/2023/jul/10/sarah-silverman-sues-openai-meta-copyright-infringement