Skip to main content

The Input-Side Shift

Public beta

CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

Collective rights management was built for an industry where music is reproduced. It counts copies, broadcasts, and performances. AI training does not fit that model — not because the rules are old, but because the unit of value has moved. CORPUS licenses on the input side: each contribution is assessed at ingest and earns a weighting that determines royalty flow forever after.

Why CMOs don't fit AI training

CMOs like GEMA, SACEM, and PRS were created to license fixed works as they get copied or transmitted. Their membership agreements were written long before machine learning, and most do not actually grant rights to license training. AI training does not reproduce existing recordings; it produces new audio from learned patterns. That is a different legal act, and most CMOs cannot cleanly authorize it.

Different CMOs have responded in different, uncoordinated ways:

  • Sweden — STIM runs a pilot opt-in license for AI training with attribution and royalties.
  • France — SACEM has taken an authorization-first stance: AI data-mining of its repertoire requires explicit consent.
  • Germany — GEMA has proposed a 30% revenue share for rights-holders with minimum royalties, and floated extending collective licensing to AI training — an approach whose legal basis is unsettled.

No one yet has a coordinated answer to who decides what can be licensed, who gets paid, or whether the CMOs hold the rights they claim to exercise over AI training.

Why "copy logic" breaks for generative output

The deeper problem is conceptual. CMOs administer one specific dimension of value: the transactional. They count units; they track reproductions.

A therapeutic robot generating music for a patient, or a game engine producing a one-time soundtrack for a player, does not produce copies. The output exists in that moment and may never be heard again. There is no broadcast, no public performance, no recording stored on a platform. Retrofitting copy-based licensing onto these uses repeats the overreach and friction the industry already saw with downloads and streaming.

What CORPUS does instead

CORPUS licenses on the input side. The decisive moment is not the detection of copies — it is the licensing of contributions for training.

Each contribution is assessed at ingest for quantity, quality, and diversity, and assigned an input weighting that stays fixed once set. The weighting does not generate payment directly. It determines how the contributor participates in the royalty pool once models trained on their works are licensed and deployed.

Where output-side systems measure copies, CORPUS measures contribution. For outputs with identifiable revenue, the Generative Output Bonus adds an output-side allocation on top — based on measured similarity, never on copy detection. The result is an economic logic that can recognize value beyond the transaction.

Next: Where CORPUS Came From.