Stop me if you think that you've heard this one before: In 2021, GitHub and OpenAI (two companies either owned by or heavily backed by Microsoft) launched a coding tool named Copilot. The tool, much like OpenAI's ChatGPT, was built on vast amounts of existing code scraped from public sources, which enabled it to generate code and suggest it to programmers.
Well, not everyone is happy with this process. In November 2022, a class action lawsuit was filed against Microsoft, GitHub, and OpenAI alleging “software piracy on an unprecedented scale,” notes The Verge. Now, the tech firms are firing back, asking the federal court hearing the case to dismiss it on the grounds that the piracy claims do not hold up.
In GitHub's dismissal filing, the company claims the suit “fails on two intrinsic defects: lack of injury and lack of an otherwise viable claim.” Meanwhile, OpenAI's filing raises a similar argument that the suit relies on “hypothetical events” to “allege a grab bag of claims that fail to plead violations of cognizable legal rights.”
What Is Scraping?
OpenAI's tools (including DALL·E and ChatGPT) rely on scraping: collecting enormous, publicly available data sets and using them to train the AI. For DALL·E, that potentially means Getty Images, Flickr, and other image collections. From all the data it scraped, the AI can then create “new” images, text, or code. But is it copyright infringement?
This month, notes CNN, Getty Images filed suit against Stability AI (which makes an art-generating AI called Stable Diffusion), claiming: “Getty Images believes artificial intelligence has the potential to stimulate creative endeavors. Accordingly, Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems in a manner that respects personal and intellectual property rights. …Stability AI did not seek any such license from Getty Images and instead, we believe, chose to ignore viable licensing options and long standing legal protections in pursuit of their stand-alone commercial interests.”
As we’ve noted before, generative AI is creating a new frontier in IP law by relying so heavily on data scraping to train the technology. The entry of large companies like Getty Images into the mix is likely to step up pressure to regulate this new industry.