The Scoring System in Detail
CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

The attribution system translates contributions into stable scores that determine long-term royalty flows. Scoring happens once — at the moment a contribution enters CORPUS — and stays fixed except for metadata improvements or fraud and error corrections. During the public beta the scoring formula itself is still being calibrated, so individual scores can shift across the corpus as the system tunes; once the protocol stabilises, scores are fixed at ingest. This page describes the mechanics; the dimensions and rationale are in How Contributions Are Evaluated.
Baseline contribution scoring

Each work receives a baseline score depending on input format and completeness.
- Format. An uncompressed WAV earns more points than an MP3 — for example, 100 vs. 50 — because lossy compression discards information that may matter for training.
- Stems. Separated tracks add to the baseline. Stems let models learn the individual elements of a production, which a stereo mix does not expose.
- MIDI and metadata. MIDI files, session metadata, and annotations all add points. They are first-class contributions, not optional extras.
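As a rough illustration of how a baseline could be assembled from these components, here is a minimal sketch. Only the 100 vs. 50 format example comes from the text above; the other point values, the `Contribution` fields, and the `baseline_score` function are hypothetical placeholders, not CORPUS's actual formula.

```python
from dataclasses import dataclass

# Illustrative values only. The WAV/MP3 pair mirrors the example in the text;
# every other number is an assumption made for this sketch.
FORMAT_POINTS = {"wav": 100, "mp3": 50}
STEM_POINTS = 10       # hypothetical, per separated stem
MIDI_POINTS = 15       # hypothetical, awarded once if MIDI is supplied
METADATA_POINTS = 5    # hypothetical, per session-metadata or annotation item


@dataclass
class Contribution:
    audio_format: str
    stem_count: int = 0
    has_midi: bool = False
    metadata_items: int = 0


def baseline_score(c: Contribution) -> int:
    """Sum the format points and the completeness bonuses."""
    fmt = c.audio_format.lower()
    if fmt not in FORMAT_POINTS:
        raise ValueError(f"no point value defined for format {c.audio_format!r}")
    score = FORMAT_POINTS[fmt]
    score += STEM_POINTS * c.stem_count
    if c.has_midi:
        score += MIDI_POINTS
    score += METADATA_POINTS * c.metadata_items
    return score
```

Under these assumed values, a WAV with four stems, MIDI, and two metadata items would score well above a bare MP3, which is the ordering the baseline is meant to produce.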
Contribution integrity checks

Before any quality or originality scoring, a contribution has to clear the integrity checks:
- Non-musical content is filtered out.
- Illegal samples and uncleared covers are caught through similarity searches against databases of copyrighted songs.
- Corrupted files are excluded.
- Missing vocal consent holds the contribution until the named performer's consent is on file.
- AI-generated material is screened through dedicated detection systems as part of the ingestion pipeline. Detection is an adversarial moving target; CORPUS plans to complement it with watermarking or hash-based provenance tools as they prove reliable. See Verification and Provenance.
Flagged works are held in a pending state and are not counted in distributions until they have been reviewed. Contributors can appeal. See Dispute Resolution.
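The checks above can be pictured as a gate that runs before any scoring. The sketch below is an assumption-laden illustration: the `Submission` fields, both thresholds, and the split between "excluded" and "pending" outcomes are hypothetical, and the similarity and AI-likelihood values stand in for whatever the real detection systems report.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    PENDING = "pending"    # held for review, not counted in distributions
    EXCLUDED = "excluded"  # filtered out entirely


@dataclass
class Submission:
    is_corrupted: bool = False
    is_musical: bool = True
    max_similarity: float = 0.0   # best match against copyrighted-song databases, 0..1
    has_named_vocalist: bool = False
    vocal_consent_on_file: bool = False
    ai_likelihood: float = 0.0    # output of a detection system, 0..1


SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off
AI_THRESHOLD = 0.8          # hypothetical cut-off


def integrity_check(s: Submission):
    """Return (status, reasons) for a submission; no scoring happens here."""
    # Assumption: corrupted and non-musical files are dropped outright.
    if s.is_corrupted:
        return Status.EXCLUDED, ["corrupted file"]
    if not s.is_musical:
        return Status.EXCLUDED, ["non-musical content"]

    # Assumption: everything else that trips a check is held for review.
    reasons = []
    if s.max_similarity >= SIMILARITY_THRESHOLD:
        reasons.append("possible uncleared sample or cover")
    if s.has_named_vocalist and not s.vocal_consent_on_file:
        reasons.append("vocal consent missing")
    if s.ai_likelihood >= AI_THRESHOLD:
        reasons.append("possible AI-generated material")
    return (Status.PENDING if reasons else Status.ACCEPTED), reasons
```

Returning the list of reasons alongside the status keeps the information a reviewer or an appealing contributor would need.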
Production quality assessment

Production-quality scoring is currently under development. The planned mechanism is an evaluation system based on Music Information Retrieval (MIR) benchmarks (spectral balance, dynamic range, noise levels, stereo field) that awards bonus points to high-quality recordings. Early results suggest the approach can reliably identify recordings of high production quality. Until the assessment ships, quality is assessed only through the integrity checks above.
Once live, contributors will be able to revise and resubmit flawed material, supported by automated feedback that highlights the specific issues found.
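Since the assessment has not shipped, the following is only a toy illustration of what measurement-driven feedback could look like. It uses crest factor (peak-to-RMS ratio) as a simple stand-in for dynamic range and a clipped-sample ratio as a second check; both choices, the function names, and the thresholds are assumptions of this sketch, not the planned CORPUS metrics.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples in the range -1..1."""
    if not samples:
        raise ValueError("no samples")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor_db(samples):
    """Peak-to-RMS ratio in decibels, used here as a crude dynamics proxy."""
    level = rms(samples)
    if level == 0:
        raise ValueError("signal is silent")
    peak = max(abs(x) for x in samples)
    return 20 * math.log10(peak / level)


def clipped_fraction(samples, clip_level=0.999):
    """Share of samples at or above the clipping level."""
    return sum(1 for x in samples if abs(x) >= clip_level) / len(samples)


def quality_feedback(samples, min_crest_db=6.0, max_clipped=0.001):
    """Return a list of human-readable issues; thresholds are hypothetical."""
    issues = []
    if crest_factor_db(samples) < min_crest_db:
        issues.append("low dynamic range (crest factor below threshold)")
    if clipped_fraction(samples) > max_clipped:
        issues.append("clipping detected")
    return issues
```

An empty list means no issue was found by these two checks. A production system would need many more measurements, including the spectral and stereo-field ones named above.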
Relational originality scoring

Contributions are compared against the existing library at the moment of ingest along three axes: timbral (what the recording sounds like), harmonic (the notes and chords it uses), and percussive (its rhythmic profile). Works that fill sparse areas earn additional points; oversaturated areas are weighted lower. Originality is never judged in absolute terms — always in relation to what already exists.
The analysis goes beyond genre tags. Originality can arise from unusual harmonic language, experimental production, or cultural specificity just as much as from genre novelty.
Because the library evolves, the diversity component is subject to gradual recalibration after a protection period. See Temporal Dynamics.
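One way to picture the relational comparison in this section is a nearest-neighbour density estimate per axis: a work far from its closest neighbours sits in a sparse area and scores higher. The sketch below assumes each work is already represented by one feature vector per axis; the vector contents, the choice of `k`, and the `d / (1 + d)` squashing are all hypothetical and are not taken from CORPUS.

```python
import math

AXES = ("timbral", "harmonic", "percussive")


def axis_originality(vec, library_vecs, k=3):
    """Score in [0, 1): mean distance to the k nearest library works, squashed."""
    if not library_vecs:
        return 1.0  # assumption: nothing to compare against counts as fully sparse
    distances = sorted(math.dist(vec, other) for other in library_vecs)
    nearest = distances[:k]
    d = sum(nearest) / len(nearest)
    return d / (1.0 + d)


def originality(work, library, k=3):
    """Per-axis scores plus their unweighted mean (the weighting is an assumption)."""
    scores = {
        axis: axis_originality(work[axis], [w[axis] for w in library], k)
        for axis in AXES
    }
    scores["combined"] = sum(scores[axis] for axis in AXES) / len(AXES)
    return scores
```

Because the score depends on the library passed in, the same work would score lower once similar material has accumulated around it, which matches the relative framing described above.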
Metadata as first-class contribution

Annotations (genre, mood, instrumentation, cultural context) are critical to corpus value. Contributors can add or refine these for their own works, and others can contribute annotations too, either voluntarily or in exchange for incentives. All verified annotation work is rewarded with points, because metadata quality is as vital to training as the recordings themselves.
Together, these mechanisms discourage mass-uploading and reward contributions that genuinely expand the corpus. Originality becomes progressively harder to achieve as the dataset grows — a dynamic that pushes contributors toward new territory rather than repetition.
Next: How Royalties Flow.