Frequently Asked Questions

Positioning

Have the Major Label deals solved the problem?

No, at least not the problem that actually matters. The deals solve a legal problem: companies now have a contractual framework for using catalog content. They do not solve the verification problem: no method exists to verify from the outside what data a model was actually trained on, because a training set cannot be read off model weights. The full argument is in Why CORPUS and The Wrong Debate.

Will the new licensed Suno models be worse?

Probably, if they are truly trained only on licensed data. Model quality depends directly on the breadth and diversity of training data, and existing licensed-only competitors produce measurably weaker outputs. The commercial incentive to keep the unlicensed base intact while monetising through licensed catalogs is structural; whether Suno acts on it is unverifiable from the outside.

Is CORPUS a Suno competitor?

Not primarily. Suno optimises for a consumer use case (text in, finished song out) which needs maximum training-data breadth. CORPUS develops real-time systems for autonomous and interactive contexts: adaptive soundscapes in vehicles, therapeutic music in healthcare, responsive environments in robotics. These need contextual precision and rights-clarity. Consumer-facing generation built on the corpus is not ruled out, but it is not what CORPUS is built around. See What CORPUS Is.

Why can't healthcare and automotive companies use Suno-like models?

Because the legal status of training data is a procurement requirement, not a risk assessment. A hospital or automotive OEM cannot integrate a system whose foundation is legally contested. Active litigation against Suno and Udio in the multi-billion-dollar range makes their models structurally unusable for these domains, regardless of output quality. See Compliance and Regulation.

Can AI-generated output be traced back to specific training data?

No, not reliably. Once a model is trained on millions of tracks across billions of parameter updates, the influence of any individual track on a given output is mathematically indistinguishable from noise. This is why CORPUS attributes value on the input side (at the moment a contribution enters the corpus) rather than trying to retroactively identify which works shaped which outputs. See The Input-Side Shift.

The protocol

What's the difference between input and output attribution?

Input attribution assigns value at the moment a contribution enters the training pipeline, based on how it enriches the corpus (quantity, quality, diversity). Output attribution tries to identify which training works shaped a given generated output, which is currently not technically possible for generative models. CORPUS uses input attribution as its base. The Generative Output Bonus complements it: where a generated output has identifiable revenue, a defined share of that revenue is allocated among contributions according to their measured similarity to the output. That payout follows from the contributor agreement and implies no claim that the output copies or derives from anyone's work. See How Your Music Is Evaluated.
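As a rough illustration of the Generative Output Bonus, the sketch below splits a share of one output's revenue in proportion to similarity scores. The function name, the 10% bonus share, the proportional rule, and the similarity values are all assumptions made for this example; the answer above only states that a defined share is allocated by measured similarity.

```python
def allocate_output_bonus(revenue, bonus_share, similarities):
    """Split bonus_share of revenue across contributions, pro rata to similarity.

    All parameters are hypothetical: the actual share and similarity measure
    are defined by the contributor agreement, not by this sketch.
    """
    if not 0.0 <= bonus_share <= 1.0:
        raise ValueError("bonus_share must be between 0 and 1")
    bonus_pool = revenue * bonus_share
    total_similarity = sum(similarities.values())
    if total_similarity <= 0:
        # No measurable similarity: nothing is allocated.
        return {work_id: 0.0 for work_id in similarities}
    return {
        work_id: bonus_pool * score / total_similarity
        for work_id, score in similarities.items()
    }


# Hypothetical output earning 1000.0, with an assumed 10% bonus share.
payouts = allocate_output_bonus(
    1000.0, 0.10, {"track_a": 0.5, "track_b": 0.25, "track_c": 0.25}
)
```

The payout here depends only on the similarity measurement and the agreed share, which matches the point that it implies no claim about copying or derivation.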

What are CRPS?

CRPS (Corpus Participation Rights) are a lasting stake in the system, accumulated alongside royalty points. Royalties reflect current income from active model usage; CRPS reflect the historical fact of having helped build the corpus. The exact legal form (Genussrechte, tokenised securities, cooperative hybrid) is still being evaluated in consultation with legal counsel. See CRPS.

What happens to my music if I withdraw it?

Withdrawal stops future training: no new model will be trained on the withdrawn work. What it does not do is "untrain" models that already used it; that is technically impossible. CORPUS treats those works as royalty-eligible for models they have already shaped, and already-issued CRPS remain. The contributor-side detail is in Withdrawal and Rights.

How is my diversity score calculated?

A contribution is compared against the existing library along three axes (timbral, harmonic, and percussive) at the moment of ingest. Works that fill sparse areas earn higher scores; works in saturated areas score lower. The bonus is protected for five years, then decays asymptotically toward a permanent floor of 30%. See Relational originality scoring and Temporal Dynamics.
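One way to picture "sparse areas score higher" is a nearest-neighbor distance per axis. The sketch below is only an illustration under assumptions: the feature vectors, the k-nearest-neighbor distance, the d / (1 + d) normalization, and the averaging across axes are all hypothetical and are not described in the source as the CORPUS scoring method.

```python
import math

AXES = ("timbral", "harmonic", "percussive")


def knn_mean_distance(vec, library_vecs, k=3):
    """Mean Euclidean distance from vec to its k nearest library vectors."""
    nearest = sorted(math.dist(vec, other) for other in library_vecs)[:k]
    return sum(nearest) / len(nearest)


def diversity_score(work, library, k=3):
    """Average per-axis novelty in [0, 1]; higher means a sparser neighborhood."""
    per_axis = []
    for axis in AXES:
        neighbours = [item[axis] for item in library]
        if not neighbours:
            per_axis.append(1.0)  # empty library: everything is novel
            continue
        d = knn_mean_distance(work[axis], neighbours, k)
        per_axis.append(d / (1.0 + d))  # squash distance into [0, 1)
    return sum(per_axis) / len(per_axis)


# Hypothetical library clustered near the origin on every axis.
library = [
    {axis: (0.1 * i, 0.1 * i) for axis in AXES}
    for i in range(5)
]
dense_work = {axis: (0.2, 0.2) for axis in AXES}    # lands in the cluster
sparse_work = {axis: (5.0, 5.0) for axis in AXES}   # lands far from it

dense_score = diversity_score(dense_work, library)
sparse_score = diversity_score(sparse_work, library)
```

Because the score is computed against the library as it exists at ingest, the same work would score differently depending on what has already been contributed.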

If I contribute later, does my work score lower than a similar earlier one?

On the diversity axis, yes, and this is by design. The diversity score is an incentive to fill the corpus, not a verdict on a work's originality, so a contribution landing where the corpus is already dense adds less new range and scores lower there than the first arrival did. Three things keep this fair:

Diversity is one of three dimensions. Quantity and quality are scored independently, so a lower diversity score does not by itself mean low earnings. A second excellent track in a dense neighborhood is still excellent; it simply adds less range.
The early advantage is temporary. The diversity bonus is protected for five years and then decays toward a floor, so early movers do not keep a permanent advantage over later similar work. See Temporal Dynamics.
The threshold is governed. Where the line between novel and saturated sits is a value judgement, set by the scoring jury, not fixed unilaterally.
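The five-year protection and the decay toward a floor can be sketched as a multiplier on the diversity bonus. The exponential shape and the three-year half-life are assumptions chosen for illustration, as is reading the 30% floor from the diversity-score answer as 30% of the original bonus; the source only says the decay is asymptotic.

```python
def diversity_bonus_multiplier(
    years_since_ingest,
    protected_years=5.0,   # protection period stated in the FAQ
    floor=0.30,            # permanent floor stated in the FAQ
    half_life_years=3.0,   # hypothetical: the decay rate is not specified
):
    """Fraction of the original diversity bonus still in effect."""
    if years_since_ingest <= protected_years:
        return 1.0
    elapsed = years_since_ingest - protected_years
    # Exponential decay of the part above the floor; approaches but never
    # goes below the floor.
    return floor + (1.0 - floor) * 0.5 ** (elapsed / half_life_years)


# Multiplier at a few hypothetical ages of a contribution.
timeline = {years: diversity_bonus_multiplier(years) for years in (0, 5, 8, 20, 50)}
```

Under these assumptions, an early contribution keeps its full bonus for five years, after which its advantage over later similar work shrinks steadily without ever reaching zero.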

What is federated learning and why does it protect my data?

Federated learning is a training approach where the data never leaves CORPUS infrastructure. Licensees who need to train their own architectures would submit a model specification; training would run on our servers against the licensed corpus, and only the resulting model weights would leave. The data stays controlled technically, not just contractually. This path is under development; at launch, CORPUS's default is to license CORPUS-trained models. See Access Models.

Participation

Who can contribute, only professionals?

No. CORPUS is built for musicians, musicologists, sound designers, and anyone whose work can enrich the corpus, independent of label affiliation, chart performance, or follower counts. There is a short application step (linked profile, sample tracks) to keep the contributor base defensible, but the door is deliberately wide. See How to Join.

When do royalties actually start flowing?

Royalties begin flowing once commercial licensing is live and revenue accumulates in the central pool. During the public beta, contributions enter under provisional non-commercial terms: internal training for R&D, prototypes, and partner demos, but no public release of trained models. When commercial licensing opens, contributors opt in explicitly per contribution. See How Royalties Flow and Today's terms are provisional.

How does CORPUS make money?

CORPUS earns revenue from two streams: model licensing (licensees pay for deployment of CORPUS-trained models in their products) and Catalog Intelligence (annotation and search-API services applied to partners' own catalogs). A defined share of net revenue covers operations and reserves; the rest forms the royalty pool that flows back to contributors. See How Royalties Flow and Catalog Intelligence.
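The arithmetic behind the pool can be sketched in a few lines. The 40% retained share, the revenue figure, and the pro rata split by royalty points are assumptions for illustration only; the source states just that a defined share of net revenue covers operations and reserves and the rest forms the royalty pool.

```python
def royalty_pool(net_revenue, retained_share):
    """Net revenue left for contributors after operations and reserves."""
    if not 0.0 <= retained_share <= 1.0:
        raise ValueError("retained_share must be between 0 and 1")
    return net_revenue * (1.0 - retained_share)


def split_by_points(pool, points):
    """Hypothetical pro rata split of the pool by royalty points."""
    total_points = sum(points.values())
    if total_points <= 0:
        return {name: 0.0 for name in points}
    return {name: pool * p / total_points for name, p in points.items()}


# Assumed figures: 200,000 in net revenue, 40% retained for operations and reserves.
pool = royalty_pool(200_000.0, 0.40)
shares = split_by_points(pool, {"contributor_a": 300, "contributor_b": 100})
```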

Where do CORPUS servers live, and what about non-EU contributors and licensees?

Today, CORPUS runs on self-administered servers in Germany under EU data protection law. As the corpus scales, this evolves into a federated global server network with nodes operating under the same standards. Non-EU contributors and licensees are not excluded; the EU jurisdiction applies to where the data is stored and processed, not to who can participate. See How Contributions Are Protected.
