       THE HUGE GREY AREA IN THE ANTHROPIC RULING
       
       2025-06-28
       
        This week, AI firm Anthropic (the folks behind Claude) found themselves the
        focus of attention of the U.S. District Court for the Northern District of
        California.
       
       NEW LAWS FOR NEW TECHNOLOGIES
       
       The tl;dr is: the court ruled that (a) piracy for the purpose of training an
       LLM is still piracy, so there'll be a separate case about the fact that
       Anthropic did not pay for copies of all the books their model ingested, but
       (b) training a model on books and then selling access to that model, which can
       then produce output based on what it has "learned" from those books, is
       considered transformative work and therefore fair use.
       
       Compelling arguments have been made both ways on this topic already, e.g.:
       * Some folks are very keen to point out that it's totally permitted for humans
       to read, and even memorise, entire volumes, and then use what they've learned
       when they produce new work. They argue that what an LLM "does" is not
       materially different from an impossibly well-read human.
        * By way of counterpoint, it's been observed that such a human would still be
        personally liable if the "inspired" output they subsequently created was
        derivative to the point of violating copyright, but we don't yet have a
        strong legal model for assessing AI output in the same way. (The Disney &
        Universal vs. Midjourney case, covered in a BBC News article linked below,
        is going to be very interesting!)
        * Furthermore, it might be impossible to conclusively determine whether the
        way GenAI works is fundamentally comparable to human thought. And that's the
        thing that got me thinking about this particular thought experiment.
       
       A MOMENT OF PHILOSOPHY
       
       Here's a thought experiment:
        Suppose I trained an LLM on all of the books of just one author (plus enough
        additional language that it was able to meaningfully communicate). Let's take
        Stephen King's 65 novels and 200+ short stories, for example. We'll sell
        access to the API we produce.
       
   IMG Monochrome photograph showing a shelf packed full of Stephen King's novels.
       
        The output of this system would be heavily biased by the limited input it's
        been given: anybody familiar with King's work would quickly spot that the AI's
        mannerisms echoed his writing style. Appropriately prompted - or just by
        chance - such a system would likely produce whole chapters of output that
        would surely be considered a substantial infringement of the original work,
        right?
        If I make KingLLM, I'm going to get sued, and rightly so.
        But if we accept that (and assume that the U.S. District Court for the
        Northern District of California would agree)... then this ruling on Anthropic
        carries a curious implication: that once enough content has been ingested, the
        operation of the LLM is, in itself, no longer copyright infringement.
       Which raises the question: where is the line? What size of corpus must a
       system be trained upon before its processing must necessarily be considered
       transformative of its inputs?
        Clearly, trying to answer that question leads to a variant of the sorites
        paradox. Nobody can credibly say that, for example, an input of twenty million
        words is enough to make a model transformative, but that with just one word
        fewer it's forever ripping off what little knowledge it has!
        But as more of these copyright holder vs. AI company cases come to trial,
        it'll be interesting to see where the courts draw the line between what is
        fair use and what is infringing.
       And wherever the answers land, I'm sure there'll be folks like me coming up
       with thought experiments that sit uncomfortably in the grey areas that remain.
       
       LINKS
       
  HTML Anthropic
  HTML The Verge news article about the Anthropic case
  HTML Scan of the court ruling
  HTML BBC News article about Disney & Universal vs. Midjourney