Compliance and Regulatory Alignment

The EU AI Act is in force, the DSM Directive has been transposed across the EU, and rights organisations have begun exercising opt-outs in ways that make unlicensed training structurally non-compliant in major markets. CORPUS makes compliance a property of the architecture rather than a vendor attestation: the records exist, are tamper-evident, and can be produced under audit. This page sets out what that means under each rule that matters: the EU AI Act, the DSM Directive's TDM opt-out, personality rights in vocal performance, and data residency and security.
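One common way to make a record log tamper-evident is a hash chain, in which each entry commits to the hash of the entry before it. The sketch below assumes that technique purely for illustration; this page does not specify CORPUS's actual mechanism, and the names `AuditLog`, `append` and `verify` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry commits to its predecessor."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        # Canonical JSON (sorted keys) so an unchanged record always hashes the same.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = self._digest(prev_hash, record)
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # An edited record no longer matches its stored hash; a re-hashed edit
        # no longer matches the next entry's "prev" link.
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["record"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"event": "upload", "track": "t1", "consent": "ai_training"})
log.append({"event": "training_run", "run": "r1", "tracks": ["t1"]})
assert log.verify()
log.entries[0]["record"]["consent"] = "none"  # simulated after-the-fact edit
assert not log.verify()
```

A chain on its own only detects edits if the latest hash is also held somewhere the operator cannot rewrite, for example shared with the licensee or auditor at the time of each run.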
EU AI Act
The EU AI Act (Regulation (EU) 2024/1689) sets documentation and copyright-compliance obligations for providers of general-purpose AI (GPAI) models. The provisions most directly relevant to a training-data foundation sit in Article 53 and Annex XI. GPAI obligations have been in force since 2 August 2025; AI Office enforcement begins 2 August 2026. Non-compliance carries fines of up to €15 million or 3% of global annual turnover, whichever is higher.
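Because the ceiling is whichever figure is higher, the fixed €15 million amount binds for providers with global annual turnover below €500 million (where 3% equals €15 million), and the percentage binds above it. A minimal sketch of that arithmetic; the function name is hypothetical, and it computes only the statutory maximum, not any fine actually imposed.

```python
def max_gpai_fine_eur(global_annual_turnover_eur: int) -> int:
    """Ceiling described above: EUR 15 million or 3% of global annual
    turnover, whichever is higher (hypothetical helper, whole euros)."""
    percentage_cap = global_annual_turnover_eur * 3 // 100  # rounded down to whole euros
    return max(15_000_000, percentage_cap)


print(max_gpai_fine_eur(200_000_000))    # 15000000: the fixed amount binds
print(max_gpai_fine_eur(2_000_000_000))  # 60000000: the percentage binds
```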
- Article 53(1)(d) — public training-content summary. GPAI providers must publish a sufficiently detailed summary of training content, following the mandatory template published by the AI Office in July 2025. CORPUS produces the underlying material as a property of the system: every training run logs its exact dataset composition, every license is registered with the contributor splits it carries, every contribution carries its consent record from upload. See Audit Trail.
- Article 53(1)(c) — copyright policy and TDM opt-out compliance. GPAI providers must put in place a copyright policy that identifies and respects reservations of rights expressed under Article 4(3) of the DSM Directive. CORPUS sidesteps the opt-out problem entirely by licensing through explicit, AI-specific contributor opt-in (see DSM Directive and the TDM opt-out, below).
- Annex XI — technical documentation. GPAI providers must maintain documentation of training and evaluation, with detail on data sources, curation methods, and design choices. CORPUS provides the training-side material that feeds into this documentation.
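To illustrate how logged run composition could feed these documents, the sketch below aggregates one training run into summary figures. It assumes a run is logged as a list of track IDs and that each track carries a licence ID and a consent record ID from upload; `Contribution`, `summarise_run` and every field name are hypothetical, and the output is not the AI Office template's actual structure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """Hypothetical per-track record captured at upload."""
    track_id: str
    contributor_ids: tuple
    licence_id: str
    consent_record_id: str


def summarise_run(run_track_ids, contributions, registered_licences):
    """Aggregate one training run's logged composition into summary figures."""
    by_id = {c.track_id: c for c in contributions}
    missing = [t for t in run_track_ids if t not in by_id]
    if missing:
        # A run that used tracks with no provenance record cannot be documented.
        raise KeyError(f"tracks without provenance: {missing}")
    used = [by_id[t] for t in run_track_ids]
    unlicensed = [c.track_id for c in used if c.licence_id not in registered_licences]
    if unlicensed:
        raise ValueError(f"tracks under unregistered licences: {unlicensed}")
    return {
        "tracks": len(used),
        "contributors": len({cid for c in used for cid in c.contributor_ids}),
        "tracks_per_licence": dict(Counter(c.licence_id for c in used)),
        "all_consented": all(c.consent_record_id for c in used),
    }


contribs = [
    Contribution("t1", ("alice", "bob"), "L1", "c-001"),
    Contribution("t2", ("carol",), "L1", "c-002"),
    Contribution("t3", ("alice",), "L2", "c-003"),
]
print(summarise_run(["t1", "t3"], contribs, {"L1", "L2"}))
```

Failing loudly on an unregistered track or licence, rather than skipping it, reflects the idea that a run which cannot be fully documented should not be reported as if it were.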
The Act treats documentation as an obligation; CORPUS treats it as a property the system must have to be worth using. Operational details, such as the GPAI Code of Practice and the AI Office's guidance on how detailed the template summary must be, are still being elaborated. The architectural commitment is fixed: every licensee can produce the records, not merely claim them.
DSM Directive and the TDM opt-out
Article 4 of the EU's Digital Single Market (DSM) Directive permits commercial text and data mining (TDM) only where rights holders have not reserved their rights. This reservation, the TDM opt-out, must be expressed in an appropriate manner and, for content made publicly available online, in machine-readable form. Strategies among collective management organisations (CMOs) and other rights holders have diverged sharply:
- SACEM in France opted out explicitly for AI training in October 2023.
- GEMA in Germany pursued litigation. In GEMA v OpenAI, the Munich Regional Court ruled in November 2025 that while the TDM exception generally covers AI training, models that reproduce training data substantially (memorisation) fall outside the exception — exposing providers to copyright liability on a per-work basis.
- STIM in Sweden took the opposite tack, launching opt-in collective licensing pilots for AI training.
For a licensee assembling training data through CMO-by-CMO licensing, three things matter:
- Unlicensed training is no longer a legal grey area in the EU. Where an opt-out has been validly exercised, or where memorisation can be demonstrated under the Munich line, training on the affected repertoire is unlawful regardless of how the data was collected.
- CMO strategies are not coordinated. A licensee inherits the coordination problem. CORPUS resolves this by making every contribution an explicit, recorded opt-in at upload, independent of CMO-level decisions about Article 4.
- Membership agreements were written before AI. Most CMO membership terms cover public performance and mechanical reproduction, not machine learning. Whether CMOs even hold the rights to license training is unresolved in most jurisdictions. CORPUS licenses training rights directly from contributors with explicit AI-specific consent.
The structural position of a CORPUS-trained model under the DSM Directive is therefore simple: every track in the training set entered with explicit, AI-specific contributor consent, and the consent record is part of the provenance trail. The opt-out question does not arise.
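The practical difference from an opt-out regime is the default: a track is excluded unless a recorded, AI-specific consent exists, rather than included unless a reservation is found. A minimal sketch of that selection rule, assuming consent records carry a set of scopes; `ConsentRecord`, `eligible_for_training` and the `"ai_training"` scope label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical upload-time consent record for one track."""
    track_id: str
    scopes: frozenset  # e.g. {"ai_training"}; scope labels are illustrative


def eligible_for_training(track_ids, consents):
    """Split tracks into (eligible, excluded); only an explicit AI-training scope passes.

    A missing record, or consent limited to other uses such as public
    performance, both lead to exclusion.
    """
    by_track = {c.track_id: c for c in consents}
    eligible, excluded = [], []
    for tid in track_ids:
        record = by_track.get(tid)
        if record is not None and "ai_training" in record.scopes:
            eligible.append(tid)
        else:
            excluded.append(tid)
    return eligible, excluded


consents = [
    ConsentRecord("t1", frozenset({"ai_training"})),
    ConsentRecord("t2", frozenset({"public_performance"})),
]
print(eligible_for_training(["t1", "t2", "t3"], consents))  # (['t1'], ['t2', 't3'])
```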
Personality rights and vocal performance
For tracks with vocals, personality rights apply alongside copyright in most European jurisdictions and cannot be cleared by the rights holders of the underlying composition alone. CORPUS makes no exceptions: vocal tracks enter the library only where the named singer has explicitly consented, and that consent is captured in the upload-time collaboration agreement. See Ownership and Consent.
For licensees deploying voice-generating models — particularly in advertising, entertainment, or any application where the output could be perceived as a specific person's voice — this matters at procurement. A model trained on vocal performances that were not personality-rights-cleared carries exposure that the licensee inherits.
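The vocal admission rule described for CORPUS reduces to a set check: every named singer must appear among those who gave explicit voice consent. A minimal sketch, assuming the collaboration agreement yields both sets; `Upload` and `admit` are hypothetical, and rejecting a vocal track with no named singer is an assumption of this sketch. It covers only the personality-rights condition, not the other consent checks a track must pass.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    """Hypothetical upload with the parties recorded in its collaboration agreement."""
    track_id: str
    has_vocals: bool
    named_singers: set = field(default_factory=set)
    voice_consents: set = field(default_factory=set)  # singers who explicitly consented


def admit(upload: Upload) -> bool:
    """Personality-rights gate: admit a vocal track only if every named singer consented."""
    if not upload.has_vocals:
        return True  # no personality-rights condition to check for instrumentals
    if not upload.named_singers:
        return False  # assumption: a vocal track with no named singer cannot be cleared
    return upload.named_singers <= upload.voice_consents  # subset test


print(admit(Upload("t2", True, {"ana"}, {"ana"})))         # True
print(admit(Upload("t3", True, {"ana", "ben"}, {"ana"})))  # False
```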
Data residency and infrastructure
CORPUS infrastructure runs on self-administered servers in Germany under EU data protection law, with a federated global server network planned as the corpus scales. Storage, training, and audit records remain within EU jurisdiction unless an engagement specifies otherwise.
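The residency rule can be read as a placement guard: a storage, training or audit location is allowed if it is in an EU member state, or if a specific engagement explicitly permits it. A minimal sketch, assuming locations are tagged with ISO 3166-1 alpha-2 country codes; `placement_allowed` and its override parameter are hypothetical.

```python
# The 27 EU member states as ISO 3166-1 alpha-2 codes (Greece is "GR" here).
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


def placement_allowed(country_code: str, engagement_allowed: frozenset = frozenset()) -> bool:
    """Allow EU placements by default; anything else needs an explicit engagement override."""
    code = country_code.upper()
    return code in EU_MEMBER_STATES or code in engagement_allowed


print(placement_allowed("DE"))                    # True
print(placement_allowed("US"))                    # False
print(placement_allowed("US", frozenset({"US"})))  # True
```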
Security alignment follows established frameworks:
- ISO/IEC 27001 for risk and information security management.
- SOC 2 for audited controls on secure operations.
Formal certification will follow at commercial scale. Aligning with both frameworks from the outset keeps CORPUS compatible with the procurement requirements of partners in regulated industries.
For a licensee deploying in the EU, the difference is between satisfying a compliance obligation and absorbing the legal risk that comes with vendor representations. The records exist whether or not anyone asks for them — see Why CORPUS for why that matters in contrast to attestation-based deals.