The Three-Layer Architecture

Public beta

CORPUS is being built in the open. Some of what you read here is live, some is still design intent — expect it to evolve.

CORPUS faces a structural choice: become a platform — a centralized entity that holds data, builds models, and sells products — or become a protocol, an open infrastructure others can build on, with CORPUS operating specific commercial services on top. A platform concentrates control. A protocol distributes it. For a system that claims to reverse the extractive logic of the existing music economy, the architecture has to be consistent with the argument.

The three-layer design resolves the tension between openness, protection, and commercial viability by giving each concern its own layer.

Layer 1: The open protocol

The rules of the system — how contributions are evaluated, how licenses are structured, how provenance is tracked, how participation rights are allocated — are designed to be transparent, auditable, and ultimately open.

Concretely: the scoring methodology, the audit framework, and the data standards for contributions will be published and independently verifiable. Openness is the precondition for contributor trust, and contributor trust is the precondition for the corpus to grow.

Layer 2: Controlled data access through federated learning

Raw audio cannot be recalled once released. Traditional dataset licensing — where files are shipped to a licensee — offers no structural protection against misuse.

Layer 2 is designed around federated learning: a licensee submits a model architecture, training runs inside CORPUS infrastructure, and only the resulting model weights leave. The data is controlled technically, not just contractually. This path is in active development and not yet validated at scale; the default at launch is licensing CORPUS-trained models. See Access Models and What Remains Open.
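The flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CORPUS's implementation: the `Enclave` class, the `train_inside` method, and the toy `linear_sgd` model are hypothetical names, and audio is replaced by short numeric feature lists so the example stays runnable.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a sample is a (features, target) pair standing in for
# audio plus annotations; weights are a plain list of floats.
Sample = Tuple[Sequence[float], float]
Weights = List[float]
# The licensee's submission: a function from samples to trained weights.
TrainFn = Callable[[Sequence[Sample]], Weights]


def linear_sgd(samples: Sequence[Sample], epochs: int = 200, lr: float = 0.05) -> Weights:
    """Toy licensee-supplied architecture: linear regression fitted by SGD."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


class Enclave:
    """Hypothetical stand-in for the infrastructure that holds the corpus."""

    def __init__(self, samples: Sequence[Sample]) -> None:
        self.__samples = list(samples)  # held privately; no method returns it
        self.audit_log: List[str] = []

    def train_inside(self, licensee: str, train_fn: TrainFn) -> Weights:
        # Training happens here, next to the data; only weights are returned.
        weights = train_fn(self.__samples)
        self.audit_log.append(f"{licensee}: trained on {len(self.__samples)} samples")
        return list(weights)


enclave = Enclave([([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)])
weights = enclave.train_inside("example-licensee", linear_sgd)
```

The licensee receives `weights` and nothing else, and each run leaves an entry in `audit_log`. The sketch leaves out the hard part: in a real deployment the submitted training code would have to run in a sandbox that prevents it from copying data out, which plain Python does not provide.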

[Figure: musicians on the left feed into a central library. From the library, arrows fan out to three models, and from each model to a column of icons representing different application domains.]

One shared corpus, many models, many applications. Each model is trained on a subset of the corpus appropriate to its application domain; the same contribution can sit in several subsets at once.
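The overlap between subsets can be illustrated with a small sketch. The contribution IDs and domain names below are invented for the example, and `build_subsets` is a hypothetical helper, not part of any published CORPUS data standard.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical mapping: contribution ID -> application domains it is suited to.
contribution_domains: Dict[str, Set[str]] = {
    "track-001": {"film-scoring", "game-audio"},
    "track-002": {"game-audio"},
    "track-003": {"film-scoring", "accessibility"},
}


def build_subsets(tags: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: one training subset per domain, one model per subset."""
    subsets: Dict[str, List[str]] = defaultdict(list)
    for contribution_id in sorted(tags):
        for domain in sorted(tags[contribution_id]):
            subsets[domain].append(contribution_id)
    return dict(subsets)


subsets = build_subsets(contribution_domains)
```

Here `track-001` ends up in both the `film-scoring` and `game-audio` subsets, so it would contribute to two different models while being stored only once in the corpus.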

Layer 3: The proprietary semantic pipeline

CORPUS's ability to generate high-resolution semantic descriptions of music — narrative function, emotional arc, structural dynamics, timbral character, contextual fit — is proprietary intellectual property and the primary source of competitive differentiation. It is what makes the training data valuable beyond its raw audio content. See The Semantic Layer for what this layer actually does and what it produces.

Technology stance

The protocol borrows the structural logic of Web3 systems — automated, trustless execution of agreements — without committing to any particular implementation. Whether blockchain becomes part of the technical foundation depends on scalability and adaptability; for the early stages of development, CORPUS prioritises flexibility over technology lock-in.

Why three layers, not one

The three layers are distinct in their logic:

The protocol is open because trust requires transparency.
The data is controlled because protection requires enforcement.
The pipeline is proprietary because innovation requires incentive.

This separation is also mirrored in the institutional structure: foundation, IP entity, commercial entity. Three layers technically, three entities legally — designed so that commercial pressure cannot override contributor protections.

Next: From Platform to Protocol.
