Verification and Provenance

Public beta

CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

Ensuring that contributions are legally usable is central to CORPUS's value. Provenance is not stamped on at the end; it is established at ingest, recorded immutably, and carried forward into every downstream use of the data. Three checks matter here: that the contribution is what it claims to be, that its origin is human, and that its metadata holds up to scrutiny.

The ingestion pipeline

Every upload passes through automated checks before it enters the library: music detection, duplicate detection, cover detection, AI-generated-content detection, stem verification, and a vocal/personality-rights match against the named singer in the collaboration agreement. Each result is logged and visible in the contributor's dashboard.

The full per-step mechanics are in The Contribution Process.
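Here is a minimal Python sketch of the check sequence. It assumes each detector has already reduced its verdict to a simple field on the upload. The `Upload` fields, the check names, `run_ingestion`, and the `AI_FLAG_THRESHOLD` value are all hypothetical, because the real detectors and their interfaces are not described here. The sketch only shows the shape of the flow: every check runs, every result is logged whether it passes or not, and any flag sends the upload to review instead of straight into the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Outcome(Enum):
    PASS = "pass"
    FLAG = "flag"  # routed to human review before acceptance


@dataclass
class Upload:
    # Hypothetical fields: stand-ins for what real detectors would compute.
    upload_id: str
    is_music: bool
    duplicate_of: Optional[str]
    is_cover: bool
    ai_score: float  # hypothetical synthetic-origin score in [0, 1]
    stems_match_mix: bool
    voice_matches_named_singer: bool


@dataclass
class CheckResult:
    check: str
    outcome: Outcome
    detail: str = ""


AI_FLAG_THRESHOLD = 0.5  # hypothetical cut-off, not a documented CORPUS value


def _result(name: str, ok: bool, detail: str) -> CheckResult:
    """Build a PASS result, or a FLAG result carrying a reason."""
    return CheckResult(name, Outcome.PASS if ok else Outcome.FLAG, "" if ok else detail)


# One function per check named in the pipeline description (names are illustrative).
CHECKS: List[Callable[[Upload], CheckResult]] = [
    lambda u: _result("music_detection", u.is_music, "no music detected"),
    lambda u: _result("duplicate_detection", u.duplicate_of is None,
                      f"duplicate of {u.duplicate_of}"),
    lambda u: _result("cover_detection", not u.is_cover, "possible cover"),
    lambda u: _result("ai_content_detection", u.ai_score < AI_FLAG_THRESHOLD,
                      f"synthetic-origin score {u.ai_score:.2f}"),
    lambda u: _result("stem_verification", u.stems_match_mix,
                      "stems do not match mix"),
    lambda u: _result("vocal_rights_match", u.voice_matches_named_singer,
                      "voice does not match named singer"),
]


def run_ingestion(upload: Upload) -> Tuple[str, List[CheckResult]]:
    """Run every check, keep every result, and decide the upload's status."""
    log = [check(upload) for check in CHECKS]  # full log, pass or flag
    flagged = any(r.outcome is Outcome.FLAG for r in log)
    return ("in_review" if flagged else "accepted"), log
```

The sketch keeps the full log instead of stopping at the first failure. That matches the point that every result is visible to the contributor, not only the one that caused a flag.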

Detecting AI-generated uploads

CORPUS actively screens uploads for synthetic origin. Submissions are checked using dedicated detection systems as part of the ingestion pipeline, and flagged material goes to review before acceptance.

Detection methods and generation techniques both evolve continuously — this is an adversarial, moving target, not a problem that gets solved once. CORPUS plans to complement detection with additional safeguards as they prove reliable in practice: watermarking, hash-based provenance tools, and other emerging signals.

In parallel, the protocol is being built to adapt as jurisdictions move toward mandatory provenance disclosure. The EU AI Act is the most concrete example; the regulatory framing is in Compliance and Regulation.

Peer review and metadata validation

Metadata and quality checks are partly community-driven. The risks this creates (bias, neglect, capture) are managed by introducing the review tier in phases:

Initial deployments rely on invite-only expert reviewers.
From 2026 onward, CORPUS introduces structured systems — examples under design include reviewer tiers and a strike system — supported by audit logs and explicit escalation paths.

The point is that peer review is logged, appealable, and traceable. A flagged contribution is not silently sidelined; the contributor sees the flag, can respond, and can escalate. The contributor-facing route runs through the Action Hub on Your Dashboard; the appeal route ends at Dispute Resolution.
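The review guarantees above (logged, appealable, traceable) can be pictured as a small state machine with an append-only audit log. This is a sketch only. The state names, the `TRANSITIONS` table, and `ReviewCase` are hypothetical. Reviewer tiers and the strike system are still under design, so they are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical states and allowed moves for a flagged contribution.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "flagged": frozenset({"contributor_responded"}),
    "contributor_responded": frozenset({"cleared", "upheld"}),
    "upheld": frozenset({"escalated"}),
    "escalated": frozenset({"dispute_resolution"}),
    "cleared": frozenset(),             # terminal: flag withdrawn
    "dispute_resolution": frozenset(),  # terminal: handed to Dispute Resolution
}


@dataclass
class ReviewCase:
    contribution_id: str
    reason: str
    state: str = "flagged"
    # Append-only audit trail: (UTC timestamp, actor, from_state, to_state).
    audit_log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The flag itself is the first audited event, so it is never silent.
        self._record("reviewer", "", self.state)

    def _record(self, actor: str, from_state: str, to_state: str) -> None:
        ts = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((ts, actor, from_state, to_state))

    def transition(self, to_state: str, actor: str) -> None:
        """Move to a new state only if the table allows it, and log the move."""
        allowed = TRANSITIONS.get(self.state, frozenset())
        if to_state not in allowed:
            raise ValueError(f"cannot move from {self.state!r} to {to_state!r}")
        self._record(actor, self.state, to_state)
        self.state = to_state
```

Rejecting moves the table does not allow means a case cannot skip straight from a flag to a final outcome without the contributor's response being recorded.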

Provenance carries forward

Verification at ingest is the first link in a longer chain. Once a contribution enters the library, its consent record, ownership splits, and metadata are bound to it across every training run that uses it — and, where the technology supports it, into the outputs of the models trained on it. The mechanics of carrying provenance forward into outputs (ISCC codes, provenance hashes) sit in Audit Trail and Transparency.
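One way to picture a contribution's terms being "bound" to it is a provenance record whose hash covers the consent record, the ownership splits, the metadata, and a digest of the audio. Each training run then lists the hashes it consumed. This sketch rests on assumptions: the field names, the basis-point split convention, and `training_manifest` are hypothetical. SHA-256 over canonical JSON is used here only for illustration. It is not the ISCC algorithm, and it is not CORPUS's actual provenance hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical schema: field names are illustrative, not CORPUS's own.
    contribution_id: str
    audio_sha256: str              # digest of the uploaded audio bytes
    consent_scope: Tuple[str, ...]  # uses the contributor agreed to
    splits_bps: Dict[str, int]     # ownership splits in basis points
    metadata: Dict[str, str]

    def __post_init__(self) -> None:
        # Assumed invariant: splits must account for 100% (10,000 bps).
        if sum(self.splits_bps.values()) != 10_000:
            raise ValueError("ownership splits must sum to 10,000 basis points")

    def provenance_hash(self) -> str:
        # Canonical JSON (sorted keys, no whitespace) so equal records hash equally.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def training_manifest(run_id: str, records: List[ProvenanceRecord]) -> Dict[str, object]:
    """Pin a training run to the exact provenance records it consumed."""
    entries = sorted(r.provenance_hash() for r in records)  # order-independent
    digest = hashlib.sha256("\n".join(entries).encode("utf-8")).hexdigest()
    return {"run_id": run_id, "provenance_hashes": entries, "manifest_sha256": digest}
```

Because the hash covers consent and splits, changing either one produces a new hash. A manifest built this way therefore records the exact terms under which each contribution was used in that run.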
