Where CORPUS Fits in the Current Landscape
CORPUS is being built in the open. Some of what you read here is live and some is still design intent, so expect it to evolve.
The current landscape leaves both creators and companies without a viable path forward. Scraping erodes trust and invites litigation. Buy-outs concentrate access in the hands of a few. Major-label deals solve a legal problem without solving the verification problem behind it. Collective rights systems were never designed for training data.
The patchwork is widening, not narrowing. Any framework that intends to replace it has to meet three conditions at the same time.
Three conditions a viable framework must meet
- Legal compliance. Music is explicitly licensed for training, not swept in by default. The EU is closing the grey area fast: Article 4 of the DSM Directive lets rights holders opt out of the text and data mining exception, and several CMOs, including GEMA, already have. The EU AI Act adds transparency obligations for general-purpose models, including a public summary of the content used for training and a policy for respecting those opt-outs. Unlicensed datasets are not only risky but structurally non-compliant with what is coming.
- Fair compensation. Royalties have to reflect the contribution and ongoing influence of each work in AI training — not a one-off buy-out price, and not a flat stream count.
- Economic scalability. Access has to be affordable for startups, SMEs, and cultural institutions, not just the platforms with nine-figure catalogue budgets. Polarization between scraping and buy-outs makes the field unsustainable for everyone in the middle.
A framework that meets one or two of these but not all three pushes its users into the gap between them. The detailed case for why each existing approach falls short — scraping, buy-outs, Fair Use and DSM exceptions, and the verification gap behind major-label deals — is in Why CORPUS.
What CORPUS does
CORPUS meets all three conditions by shifting licensing to the input side. Contributions enter through explicit opt-in, are evaluated for quantity, quality, and originality relative to the existing corpus, and earn ongoing royalties plus CRPS as they shape downstream models.
The protocol is open and auditable, the data stays inside CORPUS infrastructure, and the commercial layer sits on top. The architecture is described in The Three-Layer Architecture.
Next: The Input-Side Shift.