Humans, Generative AI, and Learning from Copyrighted Materials
Iterating Toward Openness
AUGUST 2, 2023
Recently there’s been a fairly heated conversation about whether AI should be allowed to train on copyrighted materials.

Blatantly, demonstrably untrue: the GPT3 dataset is a little over 600GB, primarily Wikipedia, Books corpuses, WebText, and 2016-2019 CommonCrawl. The MacBook Air I am typing this on has more free disk space than that.