Humans, Generative AI, and Learning from Copyrighted Materials
Iterating Toward Openness
AUGUST 2, 2023
Blatantly, demonstrably untrue: the GPT-3 dataset is a little over 600 GB, primarily Wikipedia, book corpora, WebText, and 2016-2019 Common Crawl. The MacBook Air I am typing this on has more free disk space than that. But the topic of AI training data also intersects with another topic that’s near and dear to my heart: copyright.