            1 .SH seirdy
            2 An experiment to test GitHub Copilot's legality
            3 .
            4 .QS
            5 This article was posted on 2022-07-01 by Rohan Kumar
            6 .FS
            7 https://seirdy.one/posts/2022/07/01/experiment-copilot-legality/
            8 gemini://seirdy.one/posts/2022/07/01/experiment-copilot-legality/index.gmi
            9 .FE
           10 and is now republished in this newspaper, with permission (CC-BY-SA 4.0).
           11 .
           12 .
           13 .SS
           14 Preface
           15 .
           16 .PP
           17 I am not a lawyer.
           18 This post is satirical commentary on:
           19 .
           20 .IP \(bu
           21 The absurdity of Microsoft and OpenAI’s legal justification for GitHub Copilot.
           22 .
           23 .IP \(bu
           24 The oversimplifications people use to argue against GitHub Copilot (I don’t like it when people agree with me for the wrong reasons).
           25 .
           26 .IP \(bu
           27 The relationship between capital and legal outcomes.
           28 .
           29 .IP \(bu
           30 How civil cases seem like sporting events where people “win” or “lose”, rather than opportunities to improve our understanding of law.
           31 .
           32 .PP
           33 In the process, I intentionally misrepresent how the judicial system works:
           34 I portray the system the way people like to imagine it works.
           35 Please don’t make any important legal decisions based on anything I say.
           36 .
           37 .PP
           38 The only section you should take seriously is
           39 “Context: the relevant technologies”.
           40 .
           41 .
           42 .SS
           43 Introduction
           44 .
           45 .PP
           46 GitHub is enabling copyleft violation \fBat scale\fR with Copilot.
           47 GitHub Copilot encourages people to make derivative works of source code without complying with the original code’s license.
           48 This facilitates the creation of permissively-licensed or proprietary derivatives of copyleft code.
           49 .
           50 .PP
           51 Unfortunately, challenging Microsoft (GitHub’s parent company) in court is a bad idea:
           52 their legal budget probably ensures their victory, and they likely already have a comprehensive defense planned.
           53 How can we determine Copilot’s legality on a level playing field? We can create legal precedent that they haven’t had a chance to study yet!
           54 .
           55 .PP
           56 A chat with Matt Campbell about a speech synthesizer gave me a horrible idea.
           57 I think I know a way to find out if GitHub Copilot is legal:
           58 we could use its legal justification against another software project with a smaller legal budget.
           59 Specifically, against a speech synthesizer.
           60 The outcome of our actions could set a legal precedent to determine the legality of Copilot.
           61 .
           62 .
           63 .SS
           64 Context: the relevant technologies
           65 .
           66 .PP
           67 Let’s cover the technologies and actors at play before I start my evil monologue.
           68 .
           69 .
           70 .SS
           71 Exhibit A: GitHub Copilot
           72 .
           73 .PP
           74 GitHub Copilot is a predictive autocompletion service for writing software.
           75 It’s powered by OpenAI Codex,
           76 .FS
           77 https://openai.com/blog/openai-codex/
           78 .FE
           79 a language model based on GPT-3.
           80 .FS
           81 https://en.wikipedia.org/wiki/GPT-3
           82 .FE
           83 It was trained using the source code of public repositories hosted on GitHub, regardless of their licensing.
           84 In response to a Request for Comments from the US Patent and Trademark Office, OpenAI claimed that “Artificial Intelligence Innovation”, such as code written by GitHub Copilot, should be considered “fair use”.
           85 .FS
           86 See “Comment Regarding Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation”, submitted by OpenAI to the USPTO.
           87 https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
           88 .FE
           89 .
           90 .PP
           91 Many of the code snippets it suggests are exact copies of source code from various GitHub repositories.
           92 For an example, see this tweet:
           93 “I don't want to say anything but that's not the right license Mr Copilot.”
           94 .FS
           95 https://nitter.net/mitsuhiko/status/1410886329924194309
           96 https://twitter.com/mitsuhiko/status/1410886329924194309
           97 .FE
           98 by Armin Ronacher.
           99 .FS
          100 https://lucumr.pocoo.org/about/
          101 .FE
          102 It contains a screen recording of Copilot suggesting this Quake code.
          103 .FS
          104 https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb10315479fc00086f08e25d968b4b43c49/code/game/q_math.c#L552
          105 .FE
          106 When prompted to do so, it obediently fills in a permissive license.
          107 That permissive license violates the Quake code’s GPL-2.0 license.
          108 Copilot provides no indication that a license violation is taking place.
          109 .
          110 .PP
          111 GitHub performed its own research into the matter.
          112 .FS
          113 I doubt anybody worth their salt would count on a company to hold itself accountable, but at least they tried.
          114 .FE
          115 You can read about it on their blog:
          116 “GitHub Copilot research recitation”,
          117 .FS
          118 https://github.blog/2021-06-30-github-copilot-research-recitation/
          119 .FE
          120 by Albert Ziegler.
          121 .FS
          122 https://github.com/wunderalbert
          123 .FE
          124 I’m not convinced that it accounts for the fact that suggested code might have mechanical alterations to match surrounding text, while still remaining close enough to its training data to be a license violation.
          125 .
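.PP
To make the concern about mechanical alterations concrete, here is a minimal sketch, assuming a simple checker that ignores identifier names and comments before comparing two snippets.
It is not how GitHub measured recitation; the function names, the "ID" placeholder, and the toy snippets are all hypothetical, and it uses Python's own tokenizer only because that ships with the standard library.

```python
from __future__ import annotations

import difflib
import io
import keyword
import tokenize

# Token types that carry layout or commentary rather than program structure.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(source: str) -> list[str]:
    """Return the token strings of `source`, with identifiers replaced by "ID"."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # renaming a variable no longer changes the result
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a: str, b: str) -> float:
    """Similarity ratio (0.0 to 1.0) of two snippets after normalization."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()


# Hypothetical example: the "suggestion" only renames things and drops a comment.
original = "def area(width, height):\n    # compute\n    return width * height\n"
suggested = "def size(w, h):\n    return w * h\n"

raw_ratio = difflib.SequenceMatcher(None, original, suggested).ratio()
normalized_ratio = similarity(original, suggested)
```

In this toy case the raw text comparison reports a difference, while the normalized comparison treats the two snippets as the same token sequence.
A check that only looks for exact textual matches would miss that kind of copy.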
          126 .
          127 .SS
          128 Exhibit B: The Eloquence speech synthesizer
          129 .
          130 .PP
          131 I recently had a chat with Matt on IRC about screen readers and different types of speech synthesizers.
          132 I mentioned that while I do like some variety, I always find myself returning to the underrated robotic voice of eSpeak NG.
          133 .FS
          134 https://github.com/espeak-ng/espeak-ng/
          135 .FE
          136 He shared some of my fondness, and also shared his preference for a similar speech synthesizer called Eloquence.
          137 .
          138 .PP
          139 Downloads of Eloquence are easy to find (it’s even included with the JAWS screen reader), but I struggle to find any “official” pages about the original Eloquence.
          140 Nuance acquired Eloquent Technology, the developer of Eloquence.
          141 Microsoft later acquired Nuance.
          142 .
          143 .
          144 .SS
          145 Eloquence sample audio
          146 .
          147 .PP
          148 Matt recorded this sample audio clip of Eloquence reading some text.
          149 .FS
          150 https://seirdy.one/a/eloquence.mp3
          151 .FE
          152 The text is from the introduction of “Best practices for inclusive textual websites”.
          153 .FS
          154 https://seirdy.one/posts/2020/11/23/website-best-practices/
          155 .FE
          156 .
          157 .QP
          158 My primary focus is inclusive design.
          159 Specifically, I focus on supporting underrepresented ways to read a page.
          160 Not all users load a page in a common web-browser and navigate effortlessly with their eyes and hands.
          161 Authors often neglect people who read through accessibility tools, tiny viewports, machine translators, “reading mode” implementations, the Tor network, printouts, hostile networks, and uncommon browsers, to name a few.
          162 I list more niches in the conclusion.
          163 Compatibility with so many niches sounds far more daunting than it really is:
          164 if you only selectively override browser defaults and use plain-old, semantic HTML (POSH), you’ve done half of the work already.
          165 .
          166 .PP
          167 I like the Eloquence speech synthesizer.
          168 It sounds similar to the robotic yet predictable voice of my beloved eSpeak NG, but with improved overall quality.
          169 Unfortunately, Eloquence is proprietary.
          170 .
          171 .
          172 .SS
          173 Exhibit C: Deep learning speech synthesis
          174 .PP
          175 Deep learning speech synthesis
          176 .FS
          177 https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis
          178 .FE
          179 is a recent approach to speech synthesizer creation.
          180 It involves training a deep neural network on voice samples, and using the trained model to generate speech similar to a real human voice.
          181 One synthesizer using deep learning speech synthesis is Mozilla’s TTS.
          182 .FS
          183 https://github.com/mozilla/TTS
          184 .FE
          185 .
          186 .PP
          187 Zero-shot approaches could allow a pre-trained model to generate multiple different voices.
          188 YourTTS
          189 .FS
          190 https://doi.org/10.48550/arXiv.2112.02418
          191 .FE
          192 is one such example.
          193 This could allow us to synthetically re-create a person’s voice more easily.
          194 .
          195 .
          196 .SS
          197 My horrible plan
          198 .
          199 .PP
          200 My horrible plan revolves around going through two different lawsuits to set some judicial precedents; these precedents could improve the odds of succeeding in a lawsuit against Microsoft for Copilot’s licensing violations.
          201 .
          202 .PP
          203 If this succeeds, we have new legal justification that GitHub Copilot is illegal; if it fails, we have still gained a means to legally re-create proprietary software.
          204 It’s a win-win situation.
          205 .
          206 .
          207 .SS
          208 Part One: set a precedent
          209 .
          210 .IP 1.
          211 Train a modern text-to-speech (TTS) engine using the voice of a proprietary one made by a company with a small legal budget.
          212 Keep the model’s internals hidden.
          213 .
          214 .IP 2.
          215 Then release the final TTS under a permissive license.
          216 Remember, we’re still keeping the machine-learning model hidden!
          217 .
          218 .IP 3.
          219 Wait for that company to file suit.
          220 .FS
          221 If the stars align, you could file an anticipatory suit against the company.
          222 Declaratory judgements are common in disputes over intellectual property rights.
          223 
          224 https://en.wikipedia.org/wiki/Declaratory_judgment
          225 .FE
          226 .
          227 .IP 4.
          228 Win or lose the case.
          229 .
          230 .
          231 .SS
          232 Part Two: use that precedent against Microsoft’s Nuance
          233 .
          234 .PP
          235 Our goal here is to get the same legal outcome as the low-stakes “trial run” of Part One.
          236 .
          237 .PP
          238 Microsoft owns Nuance.
          239 Nuance previously bought Eloquent Technology, the developers of the Eloquence speech synthesizer.
          240 .
          241 .IP 1.
          242 Repeat Part One against Nuance speech synthesizers, including Eloquence.
          243 Go to court.
          244 .
          245 .IP 2.
          246 Have the ruling from Part One cited as legal precedent.
          247 .
          248 .IP 3.
          249 Achieve the same outcome as Part One, demonstrating that we have indeed set precedent that works against Microsoft’s legal department.
          250 .
          251 .
          252 .SS
          253 Implications of the outcomes
          254 .
          255 .PP
          256 If we \fIwin\fR both cases:
          257 Microsoft has the legal high ground.
          258 Making a derivative of a copyrighted work using a machine-learning algorithm allows us to bypass copyright licenses.
          259 .
          260 .PP
          261 If we \fIlose\fR both cases:
          262 Microsoft does not have the legal high ground.
          263 We have good judicial precedent against Microsoft to use when filing suit for Copilot’s behavior.
          264 .
          265 .PP
          266 Either way, it’s an absolute win for free software.
          267 Taking down Copilot protects copyleft code from being turned into proprietary derivatives (and by extension, protects software freedom).
          268 But if we accidentally win these two low-stakes “test” cases, we still gain something else:
          269 we can liberate huge swaths of proprietary software, starting with speech synthesizers.
          270 .
          271 .
          272 .SS
          273 Update: on satire
          274 .
          275 .PP
          276 This post isn’t “satire through-and-through” like something from The Onion.
          277 Rather, my intent was to make some clear points, but extrapolate them to absurdity to highlight other problems.
          278 I don’t think I was clear enough when doing this.
          279 I’m sorry.
          280 .
          281 .PP
          282 Copilot has been found to suggest significant amounts of code that is dangerously similar to existing works.
          283 It does this without disclosing obligations that come with those works’ licenses.
          284 Training a model on copyrighted works may not be wrong in and of itself; however, using that model to generate new works that are not sufficiently distinct from original works is where things get problematic.
          285 Copilot’s users could apply proprietary licenses to the generated works, defeating the point of copyleft.
          286 .
          287 .PP
          288 When a tool almost exclusively encourages problematic behavior, the makers of that tool should have put thought into its implications.
          289 GitHub and OpenAI have not demonstrated a sufficiently careful approach.
          290 .
          291 .PP
          292 I don’t think that “going after” a smaller player just to manipulate our legal system is a good thing to do.
          293 The fact that this idea seems plausible to some of my readers shows how warped our perception of the judicial system is.
          294 Even if it’s accurate (I doubt it’s accurate, but I’m not certain), it’s sad.
          295 Judicial systems incentivise too much predatory behavior.
          296 .
          297 .
          298 .SS
          299 Corrections
          300 .PP
          301 It’s come to my attention that Eloquence may or may not still belong to Nuance.
          302 Further research is needed.
          303 Eloquent Technology was acquired by SpeechWorks in 2000.