The Scoring System in Detail

Public beta

CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

An illustration of a seated figure beside a large clock showing a date display (Mo 31 7) and a coral starburst behind it. In front of the figure is a panel labelled SCORE with three diamond icons of different sizes, suggesting graded scoring values over time.

The attribution system translates contributions into stable scores that determine long-term royalty flows. Scoring happens once, at the moment a contribution enters CORPUS, and the score then stays fixed with three exceptions: metadata improvements, corrections for fraud or error, and the gradual recalibration of the diversity component described under Relational originality scoring. During the public beta the scoring formula itself is still being calibrated, so individual scores can shift across the corpus as the system is tuned; once the protocol stabilises, scores are fixed at ingest. This page describes the mechanics; the dimensions and rationale are in How Contributions Are Evaluated.

Baseline contribution scoring

A pictogram showing a saxophone on the left, then a soundwave, then an arrow pointing right to two diamond shapes — a contribution becomes a baseline of points.

Each work receives a baseline score depending on input format and completeness.

  • Format. An uncompressed WAV earns more points than an MP3 — for example, 100 vs. 50 — because lossy compression discards information that may matter for training.
  • Stems. Separated tracks add to the baseline. Stems let models learn the individual elements of a production, which a stereo mix does not expose.
  • MIDI and metadata. MIDI files, session metadata, and annotations all add points. They are first-class contributions, not optional extras.
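The additive structure of the baseline can be sketched as follows. This is a minimal illustration, not the CORPUS formula: only the WAV and MP3 values (100 vs. 50) come from the example above, and every other point value, the `Contribution` type, and the `baseline_score` function are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass

# Only the WAV/MP3 pair (100 vs. 50) is taken from the text, where it is
# given as an example. All other point values are assumed placeholders.
FORMAT_POINTS = {"wav": 100, "mp3": 50}
STEM_POINTS = 10              # assumed bonus per separated stem
MIDI_POINTS = 20              # assumed bonus when a MIDI file is attached
SESSION_METADATA_POINTS = 10  # assumed bonus for session metadata
ANNOTATION_POINTS = 5         # assumed bonus per annotation


@dataclass
class Contribution:
    """Hypothetical description of what a contributor uploaded."""
    audio_format: str
    stem_count: int = 0
    has_midi: bool = False
    has_session_metadata: bool = False
    annotation_count: int = 0


def baseline_score(c: Contribution) -> int:
    """Sum the format baseline and the completeness bonuses."""
    fmt = c.audio_format.lower()
    if fmt not in FORMAT_POINTS:
        raise ValueError(f"no baseline defined for format {c.audio_format!r}")
    score = FORMAT_POINTS[fmt]
    score += STEM_POINTS * c.stem_count
    if c.has_midi:
        score += MIDI_POINTS
    if c.has_session_metadata:
        score += SESSION_METADATA_POINTS
    score += ANNOTATION_POINTS * c.annotation_count
    return score
```

Under these assumed values, a WAV with four stems and a MIDI file would score 100 + 40 + 20 = 160, while a bare MP3 would score 50.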

Contribution integrity checks

A pictogram showing a saxophone on the left, a blue soundwave, then a STOP sign blocking an arrow that points toward a coral-coloured library star on the right — integrity checks filter what enters the library.

Before any quality or originality scoring, a contribution has to clear integrity:

  • Non-musical content is filtered out.
  • Illegal samples and uncleared covers are caught through similarity searches against databases of copyrighted songs.
  • Corrupted files are excluded.
  • Missing vocal consent holds the contribution until the named performer's consent is on file.
  • AI-generated material is screened through dedicated detection systems as part of the ingestion pipeline. Detection is an adversarial moving target; CORPUS plans to complement it with watermarking or hash-based provenance tools as they prove reliable. See Verification and Provenance.

Flagged works are held in a pending state and not counted in distributions until reviewed, with appeal options for contributors. See Dispute Resolution.
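The gate described above can be sketched as a function that collects flags and holds any flagged work as pending. This is a simplified illustration under stated assumptions: the field names, flag strings, and the `Submission` and `IntegrityResult` types are hypothetical, each check is reduced to a precomputed boolean, and every failed check is treated the same way (held for review) even though the real pipeline may exclude some categories outright.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    ACCEPTED = "accepted"
    PENDING = "pending"   # held until reviewed; contributor may appeal


@dataclass
class Submission:
    """Hypothetical outcome of the upstream detectors for one upload."""
    is_musical: bool
    is_readable: bool                  # False for corrupted files
    matches_copyrighted_work: bool     # result of a similarity search
    has_vocals: bool
    vocal_consent_on_file: bool
    flagged_ai_generated: bool


@dataclass
class IntegrityResult:
    status: Status
    flags: List[str] = field(default_factory=list)

    @property
    def counts_in_distributions(self) -> bool:
        # Pending works are not counted until review clears them.
        return self.status is Status.ACCEPTED


def check_integrity(s: Submission) -> IntegrityResult:
    """Run every check and collect all flags rather than stopping at the first."""
    flags: List[str] = []
    if not s.is_musical:
        flags.append("non_musical")
    if not s.is_readable:
        flags.append("corrupted_file")
    if s.matches_copyrighted_work:
        flags.append("uncleared_sample_or_cover")
    if s.has_vocals and not s.vocal_consent_on_file:
        flags.append("missing_vocal_consent")
    if s.flagged_ai_generated:
        flags.append("suspected_ai_generated")
    status = Status.PENDING if flags else Status.ACCEPTED
    return IntegrityResult(status, flags)
```

Collecting every flag, instead of returning on the first failure, gives a reviewer or an appealing contributor the full list of issues in one pass.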

Production quality assessment

A pictogram showing a VU meter at the top, two green and coral checkmarks on the left, a colour-graded soundwave in the centre representing spectral and dynamic measurements, and three diamonds on the right — quality checks award additional points.

Production-quality scoring is currently under development. The planned mechanism is an evaluation system based on Music Information Retrieval (MIR) benchmarks (spectral balance, dynamic range, noise levels, stereo field) that awards bonus points to high-quality recordings. Early results suggest the approach can reliably identify recordings of high production quality. Until the assessment ships, the integrity checks above are the only screen applied on the quality axis.

Once live, contributors will be able to revise and resubmit flawed material, supported by automated feedback that highlights the specific issues found.
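Because the assessment is not yet live, the following is only an illustration of the kind of measurement and feedback described above, not the planned CORPUS implementation. It computes two simple signal statistics, crest factor (peak-to-RMS ratio, used here as a rough dynamic-range proxy) and the share of clipped samples, and turns them into human-readable issues. The thresholds and function names are assumptions, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import List, Sequence


def crest_factor_db(samples: Sequence[float]) -> float:
    """Peak-to-RMS ratio in decibels; higher means more dynamic headroom."""
    if not samples:
        raise ValueError("empty signal")
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("silent signal has no defined crest factor")
    return 20.0 * math.log10(peak / rms)


def clipped_ratio(samples: Sequence[float], ceiling: float = 0.999) -> float:
    """Fraction of samples at or above the assumed full-scale ceiling."""
    return sum(1 for x in samples if abs(x) >= ceiling) / len(samples)


def quality_feedback(
    samples: Sequence[float],
    min_crest_db: float = 6.0,         # assumed threshold
    max_clipped_ratio: float = 0.001,  # assumed threshold
) -> List[str]:
    """Return a list of specific issues; an empty list means none were found."""
    issues: List[str] = []
    if crest_factor_db(samples) < min_crest_db:
        issues.append("low dynamic range: signal may be over-compressed")
    if clipped_ratio(samples) > max_clipped_ratio:
        issues.append("clipping: too many samples at full scale")
    return issues
```

A production assessment would also cover spectral balance, noise levels, and the stereo field, which this sketch leaves out.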

Relational originality scoring

A pictogram showing a saxophone on the left, a small two-way arrow pair around a compact soundwave in the centre under a delta-question-mark symbol, and a large coral library star on the right — measuring the difference between a contribution and the existing corpus.

Contributions are compared against the existing library at the moment of ingest along three axes: timbral (what the recording sounds like), harmonic (the notes and chords it uses), and percussive (its rhythmic profile). Works that fill sparse areas earn additional points; oversaturated areas are weighted lower. Originality is never judged in absolute terms — always in relation to what already exists.
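One way to make "sparse" and "oversaturated" concrete is a nearest-neighbour density estimate, sketched below. This is an assumed technique, not one the source confirms: it presumes each work has already been reduced to one feature vector per axis, measures the mean distance to the `k` closest library items on each axis, and maps that distance to a weight between 0 and 1. The choice of `k`, the `scale` parameter, the equal averaging across axes, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

AXES = ("timbral", "harmonic", "percussive")
Vector = Sequence[float]


def knn_mean_distance(query: Vector, library: List[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from the query to its k nearest library items."""
    if not library:
        return math.inf  # nothing to compare against: maximally sparse
    nearest = sorted(math.dist(query, item) for item in library)[:k]
    return sum(nearest) / len(nearest)


def sparsity_weight(mean_distance: float, scale: float = 1.0) -> float:
    """Map a distance to [0, 1]: 0 in a saturated spot, near 1 in empty space."""
    if math.isinf(mean_distance):
        return 1.0
    return mean_distance / (mean_distance + scale)


def originality_score(
    features: Dict[str, Vector],
    library: Dict[str, List[Vector]],
    k: int = 3,
) -> float:
    """Average the per-axis sparsity weights (equal weighting is an assumption)."""
    weights = [
        sparsity_weight(knn_mean_distance(features[axis], library[axis], k))
        for axis in AXES
    ]
    return sum(weights) / len(weights)
```

The score depends entirely on what the library already holds. The same recording submitted to a fuller library would sit closer to its neighbours and earn a lower weight, which is the relational behaviour described above.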

The analysis goes beyond genre tags. Originality can arise from unusual harmonic language, experimental production, or cultural specificity just as much as from genre novelty.

Because the library evolves, the diversity component is subject to gradual recalibration after a protection period. See Temporal Dynamics.

Metadata as first-class contribution

A pictogram showing a saxophone next to a document with two cycling arrows around it, then a soundwave, then two diamonds — annotations cycle through revisions and earn points as a first-class contribution.

Annotations (genre, mood, instrumentation, cultural context) are critical to corpus value. Contributors can add or refine these for their own works, and others can contribute annotations too, either voluntarily or in exchange for incentives. All verified annotation work is rewarded with points, because metadata quality is as vital to training as the recordings themselves.


Together, these mechanisms discourage mass-uploading and reward contributions that genuinely expand the corpus. Originality becomes progressively harder to achieve as the dataset grows — a dynamic that pushes contributors toward new territory rather than repetition.

Next: How Royalties Flow.