Where CORPUS Came From

Public beta

CORPUS is being built in the open. Some of what you read here is live, and some is still design intent; expect it to evolve.

CORPUS started in artistic work, not as a policy project. Sofilab, a Munich-based sound design and innovation lab, began experimenting with generative music models in the late 2010s. The technology moved fast; the training data did not. The available material was low-quality, poorly annotated, or legally unusable. The performance of generative models, it turned out, depends at least as much on their data as on their architecture.

Two lines of work, one bottleneck

In parallel, Sofilab was building adaptive sound systems for automotive, medical, and robotics partners. That work required something different from preproduced tracks: sound that behaves, not sound that is played. A vehicle's sonic environment had to respond to driving dynamics and driver state. A therapeutic device had to adapt to a patient's condition in real time. Manual preproduction does not scale to those cases: every additional sensor or behavioral parameter multiplies the number of conditions that must be covered, so the total grows exponentially.

Both lines hit the same constraint: there was no rights-cleared, diverse, well-annotated corpus to train on. And no one was building one.

From artistic frustration to infrastructural necessity

The realization was that this is not a research problem to be solved once. It is an infrastructure problem: the music economy will need a licensing system designed for AI training, just as it needed one for radio in the 1920s and for streaming in the 2000s.

CORPUS is that licensing system, designed from the start to:

Who's behind it now

CORPUS is operated by Sofilab (Munich, Germany). The protocol is designed to evolve from a platform that Sofilab operates today into an open infrastructure administered by a dedicated foundation as the system reaches scale. The institutional roadmap is described in The Foundation.

Next: The Three-Layer Architecture.