Researchers and users discovered that GitHub Copilot would reproduce verbatim sections of GPL-licensed code from its training data, including the original copyright and attribution headers.

GitHub Copilot was trained on public code repositories, including GPL-licensed projects. Researchers demonstrated that it could be prompted to reproduce large blocks of code verbatim, complete with the original copyright and licence attribution comments, and in some cases users received suggestions that were direct copies of GPL-licensed code without any notice of the licence obligations. A class action lawsuit (Doe v. GitHub) was filed in November 2022 alleging that Copilot violated the DMCA and open source licence agreements.
How the Production Safety Framework maps to this failure
This was a combined D3 + D2 failure. D3 failed at training time: no systematic licence classification was applied to the training data, so the model learned patterns from GPL code with no constraint against reproducing it. D2 failed at inference time: no output similarity check was deployed to detect verbatim memorisation. This case established that training data governance (D3) must include intellectual property classification, not just privacy classification.
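To make the D2 control concrete, here is a minimal sketch of what an inference-time output similarity check could look like: index character n-grams from licence-restricted training code, then flag any suggestion that shares an n-gram with that index. The corpus, n-gram length, and function names are illustrative assumptions, not Copilot's actual implementation.

```python
# Hypothetical sketch of a D2-style output similarity check: flag a model
# suggestion when it shares long character n-grams with indexed GPL code.
# NGRAM length and the sample corpus are illustrative assumptions.

NGRAM = 40  # character n-gram length used for matching


def ngrams(text: str, n: int = NGRAM) -> set[str]:
    """All character n-grams of the text (empty set if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(licensed_snippets: list[str]) -> set[str]:
    """Index every n-gram that appears in licence-restricted training code."""
    index: set[str] = set()
    for snippet in licensed_snippets:
        index |= ngrams(snippet)
    return index


def is_verbatim_match(suggestion: str, index: set[str]) -> bool:
    """True if any n-gram of the suggestion also appears in the restricted index."""
    return not ngrams(suggestion).isdisjoint(index)
```

A production system would use hashing or a suffix index rather than an in-memory set, but the principle is the same: compare generated output against restricted training material before it reaches the user.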
Specific PSF controls mapped to each failure point
The lawsuit was filed in November 2022. GitHub added a 'duplication detection' filter in 2022 that suppresses suggestions matching roughly 150 characters of public code from its training set. Legal proceedings were ongoing at the time of writing.
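GitHub has not published the filter's internals; a toy approximation of the behaviour described above (suppressing any suggestion containing a 150-character run found verbatim in indexed public code) might look like:

```python
# Toy approximation of a duplication-detection filter: suppress a suggestion
# if any 150-character window of it occurs verbatim in indexed public code.
# The threshold and exact-substring matching strategy are assumptions based
# on the publicly described behaviour, not GitHub's implementation.

THRESHOLD = 150


def should_suppress(suggestion: str, public_code: list[str]) -> bool:
    """Return True if a 150-character window of the suggestion matches public code."""
    if len(suggestion) < THRESHOLD:
        return False
    for i in range(len(suggestion) - THRESHOLD + 1):
        window = suggestion[i:i + THRESHOLD]
        if any(window in source for source in public_code):
            return True
    return False
```

The naive window scan is O(n·m); a real filter would use rolling hashes or a suffix index over the corpus, but the observable behaviour is the same: long verbatim overlaps are blocked, shorter ones pass through.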
The AIDA exam tests PSF knowledge across all 8 domains. Free to take, immediately verifiable.