The Three-Layer Architecture
CORPUS is being built in the open. Some of what you read here is live and some is still design intent; expect it to evolve.
CORPUS faces a structural choice: become a platform — a centralized entity that holds data, builds models, and sells products — or become a protocol, an open infrastructure others can build on, with CORPUS operating specific commercial services on top. A platform concentrates control. A protocol distributes it. For a system that claims to reverse the extractive logic of the existing music economy, the architecture has to be consistent with the argument.
The three-layer design is intended to resolve the tension between openness, protection, and commercial viability.
Layer 1: The open protocol
The rules of the system — how contributions are evaluated, how licenses are structured, how provenance is tracked, how participation rights are allocated — are designed to be transparent, auditable, and ultimately open.
Concretely: the scoring methodology, the audit framework, and the data standards for contributions will be published and independently verifiable. Openness is the precondition for contributor trust, and contributor trust is the precondition for the corpus to grow.
Layer 2: Controlled data access through federated learning
Raw audio cannot be recalled once released. Traditional dataset licensing — where files are shipped to a licensee — offers no structural protection against misuse.
Layer 2 is designed around federated learning. A licensee submits a model architecture, training runs inside CORPUS infrastructure, and only the resulting model weights leave. The data is controlled technically, not just contractually. This path is in active development and has not yet been validated at scale; at launch, the default is licensing CORPUS-trained models. See Access Models and What Remains Open.
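The boundary described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration. It assumes a model is a flat weight vector and the corpus is a list of feature/target pairs standing in for audio. `DataEnclave`, `TrainingJob`, and `sgd_linear` are invented names, not CORPUS interfaces. The licensee supplies a training routine, which runs against data it never receives, and the enclave rejects any output that is not a weight vector of the submitted shape. The sketch shows a single training site, so it illustrates the compute-to-data boundary rather than aggregation across several sites. A Python object is not an isolation mechanism; a real deployment would need sandboxed execution.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a "model" is a flat list of float weights and the
# private corpus is a list of (features, target) pairs in place of audio.
Weights = List[float]
Example = Tuple[Tuple[float, ...], float]
TrainFn = Callable[[Weights, Sequence[Example]], Sequence[float]]


@dataclass
class TrainingJob:
    """What a licensee submits (hypothetical): initial weights plus a training routine."""
    licensee: str
    initial_weights: Weights
    train: TrainFn


@dataclass
class DataEnclave:
    """Holds the corpus and returns only weights (hypothetical; not a real sandbox)."""
    _corpus: List[Example] = field(repr=False)
    audit_log: List[str] = field(default_factory=list)

    def run(self, job: TrainingJob) -> Weights:
        # Training executes here, next to the data; the licensee never receives it.
        weights = list(job.train(list(job.initial_weights), tuple(self._corpus)))
        # Only a numeric vector with the submitted shape may leave the boundary.
        if len(weights) != len(job.initial_weights) or not all(
            isinstance(w, (int, float)) for w in weights
        ):
            raise ValueError("output must be a weight vector of the submitted shape")
        self.audit_log.append(f"{job.licensee}: trained on {len(self._corpus)} examples")
        return [float(w) for w in weights]


def sgd_linear(weights: Weights, data: Sequence[Example],
               lr: float = 0.1, epochs: int = 200) -> Weights:
    """Licensee-side training code: SGD on squared error for y ~ w . x."""
    for _ in range(epochs):
        for x, y in data:
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Usage: the licensee gets back weights close to [2.0, 1.0], never the corpus.
enclave = DataEnclave([((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 3.0)])
weights = enclave.run(TrainingJob("licensee-a", [0.0, 0.0], sgd_linear))
```

In this sketch, the shape check stands in for "only the resulting model weights leave." A routine that tries to append corpus values to its output is rejected before anything is returned.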

Layer 3: The proprietary semantic pipeline
CORPUS's ability to generate high-resolution semantic descriptions of music — narrative function, emotional arc, structural dynamics, timbral character, contextual fit — is proprietary intellectual property and the primary source of competitive differentiation. It is what makes the training data valuable beyond its raw audio content. See The Semantic Layer for what this layer actually does and what it produces.
Technology stance
The protocol borrows the structural logic of Web3 systems (automated, trustless execution of agreements) without committing to any particular implementation. Whether blockchain becomes part of the technical foundation will depend on whether it meets the system's scalability and adaptability requirements. In the early stages of development, CORPUS prioritises flexibility over technology lock-in.
Why three layers, not one
The three layers are distinct in their logic:
- The protocol is open because trust requires transparency.
- The data is controlled because protection requires enforcement.
- The pipeline is proprietary because innovation requires incentive.
The separation is also mirrored in the institutional structure: a foundation, an IP entity, and a commercial entity. The result is three layers technically and three entities legally, a structure designed so that commercial pressure cannot override contributor protections.