Audit Trail and Transparency

Public beta

CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

Auditability is both a governance tool and a legal necessity. Every contribution, license, training run, and royalty flow in CORPUS is logged in an append-only, tamper-evident registry. Past records cannot be altered without detection, which is what turns "we comply" from a vendor representation into something the system can demonstrate on demand.

An append-only registry

The registry records the things that determine who is owed what and on what authority:

  • Contributions — the file, the upload-time collaboration agreement, the consent declaration, the integrity-check results, the scoring decision.
  • Licenses — every commercial agreement, the dataset composition it covers, the contributor splits it carries.
  • Training runs — which dataset, which model, which licensee.
  • Royalty flows — the per-model point totals, the distribution rounds, the per-contributor allocations.

Because the registry is append-only, history is reconstructible. A dispute over a payout calculation, a question about whether a track was in a given training set, an audit request from a licensee or regulator — all resolve against the same record.
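One common way to make an append-only log tamper-evident is to chain entries by hash, so each record commits to the one before it. The sketch below is illustrative only: the `Registry` class, the `entry_hash` helper, and every field name are assumptions made for this example, not the CORPUS implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def entry_hash(prev_hash: str, kind: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps(
        {"prev": prev_hash, "kind": kind, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical in-memory append-only log (an assumption for illustration)."""

    def __init__(self):
        self._entries = []

    def append(self, kind: str, payload: dict) -> str:
        # Each new entry records the hash of its predecessor.
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        digest = entry_hash(prev, kind, payload)
        self._entries.append(
            {"prev": prev, "kind": kind, "payload": payload, "hash": digest}
        )
        return digest

    def verify(self) -> bool:
        """Recompute every hash; an edit to any past entry breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            expected = entry_hash(prev, entry["kind"], entry["payload"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


registry = Registry()
registry.append("contribution", {"track": "track-001", "consent": True})
registry.append(
    "training_run",
    {"dataset": "ds-01", "model": "model-a", "licensee": "lic-9"},
)
print(registry.verify())  # True

# Altering a past record is detected on the next verification.
registry._entries[0]["payload"]["consent"] = False
print(registry.verify())  # False
```

In a scheme like this, anyone holding the latest hash can check that no earlier entry has been changed, which is the property that lets a payout dispute or an audit request be resolved against the same record.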

Provenance carried into model outputs

Audit at the dataset level is one half of the story. The other is keeping the link between training data and generated material visible across the ecosystem.

CORPUS will support embedding attribution metadata into model outputs through mechanisms such as ISCC codes, C2PA content credentials, and provenance hashes. The practical case: a game studio or mobility partner can verify the provenance of a licensed model before integration into its engine or device — not on the basis of a vendor attestation, but by reading the metadata the model itself emits.
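As a rough illustration of the integrator's side of that check, the sketch below recomputes a provenance hash from metadata emitted with a model output and compares it with the license the integrator expects. The hashing scheme, the `provenance_hash` and `verify_output_metadata` functions, and the metadata field names are all assumptions for this example; the page does not specify how CORPUS provenance hashes are built, and this does not use the ISCC or C2PA formats.

```python
import hashlib
import json


def provenance_hash(model_id: str, dataset_id: str, license_id: str) -> str:
    """Hypothetical scheme: SHA-256 over canonical JSON of three identifiers."""
    canonical = json.dumps(
        {"dataset": dataset_id, "license": license_id, "model": model_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_output_metadata(metadata: dict, expected_license_id: str) -> bool:
    """Check that emitted metadata is self-consistent and names the expected license."""
    try:
        recomputed = provenance_hash(
            metadata["model"], metadata["dataset"], metadata["license"]
        )
    except KeyError:
        return False  # a required field is missing
    return (
        metadata["license"] == expected_license_id
        and metadata.get("provenance_hash") == recomputed
    )


# Metadata as a licensed model might emit it (illustrative values).
emitted = {"model": "model-a", "dataset": "ds-01", "license": "lic-9"}
emitted["provenance_hash"] = provenance_hash("model-a", "ds-01", "lic-9")

print(verify_output_metadata(emitted, "lic-9"))  # True
print(verify_output_metadata({**emitted, "dataset": "ds-02"}, "lic-9"))  # False
```

A bare hash like this only shows that the metadata is internally consistent. Proving who issued it would need a signature, which is what signed formats such as C2PA content credentials provide.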

Regulatory compliance, by construction

The registry is the structural foundation for compliance with EU rules on training data:

  • DSM Directive Article 4: CORPUS does not rely on the text-and-data-mining exception or its rights-holder opt-out. Every contribution is opt-in, and the registry holds the consent record for each one.
  • EU AI Act Article 53: the public training-content summary required of general-purpose AI providers can be produced from the dataset-composition log kept for each training run.
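To make the two points above concrete, here is a minimal sketch of how a per-run summary and a consent check might be derived from registry entries. The entry layout, the field names, and the `training_summary` function are hypothetical, and the output is not the official Article 53 summary template.

```python
from collections import Counter

# Hypothetical registry entries; field names are illustrative assumptions.
log = [
    {"kind": "contribution", "track": "t1", "contributor": "alice", "consent": True},
    {"kind": "contribution", "track": "t2", "contributor": "bob", "consent": True},
    {"kind": "contribution", "track": "t3", "contributor": "alice", "consent": True},
    {
        "kind": "training_run",
        "run": "run-1",
        "dataset": "ds-01",
        "model": "model-a",
        "licensee": "lic-9",
        "tracks": ["t1", "t2", "t3"],
    },
]


def training_summary(entries, run_id):
    """Summarise one training run and flag tracks lacking a consent record."""
    contributions = {e["track"]: e for e in entries if e["kind"] == "contribution"}
    run = next(
        e for e in entries if e["kind"] == "training_run" and e["run"] == run_id
    )
    # A track with no contribution entry, or with consent not set, is flagged.
    without_consent = [
        t for t in run["tracks"]
        if not contributions.get(t, {}).get("consent", False)
    ]
    per_contributor = Counter(
        contributions[t]["contributor"] for t in run["tracks"] if t in contributions
    )
    return {
        "run": run["run"],
        "dataset": run["dataset"],
        "model": run["model"],
        "track_count": len(run["tracks"]),
        "tracks_per_contributor": dict(per_contributor),
        "tracks_without_consent": without_consent,
    }


summary = training_summary(log, "run-1")
print(summary["track_count"])  # 3
print(summary["tracks_per_contributor"])  # {'alice': 2, 'bob': 1}
print(summary["tracks_without_consent"])  # []
```

Because both the consent records and the run composition come from the same log, the summary and the consent check cannot drift apart from what was actually recorded.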

The detail — exact obligations, scope, the licensee's position — is in Compliance and Regulation. What this page documents is the source: the records exist, are tamper-evident, and can be produced under audit, because the protocol writes them as part of its normal operation.

Oversight

Contributor-elected boards review reports, and external audits are to be added as the system matures. Auditability is not a feature CORPUS extends as a courtesy; it is the evidentiary basis on which the entire licensing model rests.

Where audit raises a contested decision rather than a technical record, the path forward is in Dispute Resolution.