Darius Foo

Satisfying Music Theory

This post is about my experiments in generating music using SMT (satisfiability modulo theories) solvers.

As a teaser, here is a procedurally-generated harmonization of the first line of Joy to the World. Given the melody (in the soprano voice), the system produces the other three voices in a way that makes harmonic sense.

X: 1
T: Joy to the World
L: 1/16
K: C
M: 4/4
Q: 1/4=110
V: sop name="S"
V: alto name="A"
V: tenor name="T" clef=treble-8
V: bass name="B" clef=bass middle=d transpose=-24
%
V: sop
c4 B3 A1 G6 F2 E4 D4 C6 G2 A6 A2 B6 B2 c6
V: alto
e4 d4 G6 d2 G6 G6 c6 F4 A2 e2 e1 e4 B1 c6
V: tenor
A4 D4 G6 A2 E1 E3 D4 E4 E6 A6 G4 G1 G3 A6
V: bass
"vi" A4 "V" G3 "ii" F1 "I" G6 "ii" A1 "iii" d1 "V" G6 "I" G6 "I" c6 "vi" d6 "ii" G2 "iii" B3 "iii" G3 "vi" A6

More examples here. The rest of the post describes the journey (or at least the first part of it, since it appears to be far from over!).

Procedural music composition

I've always found Renaissance polyphony deeply beautiful and intriguing. As an example, here is a performance of Sicut Cervus, one of the hallmark pieces of the period.

Here is a visualization of the piece which shows how the four voices move and interact. The voices have individual character, sometimes engage with each other, and all fit together to create cohesive harmony.

Here's another example, Though Amaryllis Dance in Green.

Check out the visualization (watch at 2x speed). As an analysis describes it,

What makes the piece special - and very fun to hear and to sing - is the liveliness of Byrd's setting. The five parts are extremely independent of each other, each filled with syncopations and leaps, like five dancers improvising, individually but in relation to each other. It makes me think of this wonderful painting by Frank Schomer Lichtner.

How are such intricate pieces created? Can it be done systematically?

The simple approach of choosing chords first, then choosing notes that fit them, wouldn't result in independent voices. Perhaps a nice, pleasing melody could serve as a base; the other voices could then be written around it, ensuring they work with the harmony, which now emerges from the voices themselves. Presumably this would get harder with each voice added, as new ideas would force revisions of earlier voices to accommodate them and keep everything cohesive, and the process would repeat until the piece is complete.

This sort of interleaved and iterative process of refinement might remind one of constraint propagation algorithms, and indeed, computer-aided composition using constraint solvers is its own subfield. The "omnidirectional" nature of the solving process points to modelling the entire thing as a single constraint satisfaction problem for more effective propagation.

At this point I was curious to see how well a simple, baseline approach with an SMT solver would work. Writing small prototypes is always fun, just to get a feel for the workflow and requirements, and if the idea would even work at all. SMT solvers are a staple in my area of research, so how far could I get by simply writing some constraints in Python, solving them with Z3, then rendering the result?

This turned out to be way more of a rabbit hole than I anticipated.

Vision

Many existing systems tackle specific musical tasks, e.g. generating rhythm, automatic harmonization, generating melody given a rhythm and chord progression, generating a countermelody, etc.

What I hoped to build was something more general, more of a logical foundation for all these tasks. A follow-up goal would then be to build a mixed-initiative composition tool on top of this foundation, perhaps even one accessible to casual creators.

In particular, I was inspired by work on ASP-based procedural narrative generation, which encodes stories in logical form and searches for solutions to procedural generation tasks. Section 6 of this paper describes how in this very general setting, changing the amount of initial input leads to many modes of interaction with the system. Specializing them to our musical use case:

  1. Tabula rasa generation: using the background theory (e.g. harmony and melody constraints) to generate a piece from scratch
  2. Constrained generation: finishing an incomplete piece; satisfying a melody (harmonization), chord progression, etc.
  3. Generation based on a partial piece: starting with a piece that satisfies constraints (e.g. a well-known classic work), remove constraints to generate variations

There is a fourth use case which other systems in the typed music composition community highlight:

  4. Validation: if the piece does not satisfy constraints, the violated constraints point to mistakes in the piece or (more likely) inadequacies in our music theory

Encoding

There are many possible logical encodings for musical scores and concepts, with different tradeoffs with respect to expressiveness and solver performance. For example, a naive one would be to represent rhythm as a fixed grid of boolean variables, which would work for some kinds of music. The Strasheela thesis covers numerous encoding schemes, so I'll just go over my specific choices.

Pitch is represented by bounded integer variables.

Rhythm is represented by duration variables. Each note and chord has a duration, whose start is an offset from the beginning of the piece. The first onset is 0, the last is strictly smaller than the total duration of the piece, and the onsets within a part are monotonically increasing.

Chords are treated the same way, with a duration and pitch (class). Ties and measures are not represented and are treated as a rendering concern.

Whether a note is a rest is represented by a boolean variable, which causes the pitch value to be ignored iff it is true.
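To make this concrete, here is a minimal sketch of the per-note variables using Z3's Python API. The names and bounds are illustrative, not the actual implementation:

from z3 import And, Bool, Int, Solver

s = Solver()
TOTAL = 64  # length of the piece in sixteenths (illustrative)

def make_note(i, lo=48, hi=84):
    # Variables for one note: pitch, onset, duration, and a rest flag.
    pitch = Int(f"pitch_{i}")
    onset = Int(f"onset_{i}")
    dur = Int(f"dur_{i}")
    rest = Bool(f"rest_{i}")
    s.add(And(pitch >= lo, pitch <= hi))             # bounded pitch
    s.add(And(dur >= 1, onset >= 0, onset < TOTAL))  # onset lies within the piece
    return pitch, onset, dur, rest

part = [make_note(i) for i in range(4)]

# The first onset is 0, and onsets within a part strictly increase;
# here each note simply starts where the previous one ends.
s.add(part[0][1] == 0)
for (_, o1, d1, _), (_, o2, _, _) in zip(part, part[1:]):
    s.add(o2 == o1 + d1)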

Next, melody and harmony.

To apply melodic constraints, we must know if two notes are consecutive, and to apply harmonic constraints, we must know if a chord and note are simultaneous. In a very general setting like ours, this information may not yet be available, since it may be produced as a result of the search.

For example, we won't know if a chord and note are simultaneous if we have not yet determined their rhythm, which may in turn be affected by harmonic constraints, as a certain rhythm may not be allowed if it results in a dissonant chord. This is called the inaccessible score context problem, from Section 6.3 of the Strasheela thesis.

The solution we use is logical implication, which this representation makes easy. A chord and note are simultaneous if their duration variables overlap, and harmony constraints are applied if the simultaneous note is not a rest.

Two adjacent notes are consecutive if they are both not rests, and melody constraints are applied if notes are consecutive.
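A sketch of what these guarded constraints might look like, with illustrative names and toy consonance/interval predicates standing in for the real rules:

from z3 import And, Bool, Implies, Int, Not, Or, Solver

s = Solver()

# Illustrative variables: one chord (c_*) and two adjacent notes (n_*, m_*).
c_on, c_dur, c_root = Int("c_on"), Int("c_dur"), Int("c_root")
n_on, n_dur = Int("n_on"), Int("n_dur")
n_pitch, n_rest = Int("n_pitch"), Bool("n_rest")
m_pitch, m_rest = Int("m_pitch"), Bool("m_rest")

# The chord and note are simultaneous if their time intervals overlap.
simultaneous = And(c_on < n_on + n_dur, n_on < c_on + c_dur)

# Guarded harmony constraint: it only fires if the note overlaps the chord
# and is not a rest. Consonance here is a toy placeholder (root, major
# third, or fifth above the chord root).
consonant = Or(*[(n_pitch - c_root) % 12 == k for k in (0, 4, 7)])
s.add(Implies(And(simultaneous, Not(n_rest)), consonant))

# Guarded melody constraint: adjacent notes are consecutive only if neither
# is a rest; here their interval is bounded by a fifth, as a toy rule.
s.add(Implies(And(Not(n_rest), Not(m_rest)),
              And(n_pitch - m_pitch <= 7, m_pitch - n_pitch <= 7)))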

To enforce harmony constraints, we need a list of allowed pitches given a chord: a relation or table constraint, in constraint programming terms. In typical SMT fashion, we instead unroll everything into a big disjunction.
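For example, a table relating chords to their allowed pitch classes might be unrolled like this (the table is a simplified stand-in for the real one):

from z3 import And, Int, Or, Solver

s = Solver()
chord = Int("chord")  # index into the chord table below
pc = Int("pc")        # pitch class (0..11) of a simultaneous note

# Allowed pitch classes for a few diatonic chords in C major.
TABLE = {
    0: (0, 4, 7),   # I:  C E G
    1: (2, 5, 9),   # ii: D F A
    4: (7, 11, 2),  # V:  G B D
}

# The relation, unrolled into one big disjunction.
s.add(Or(*[And(chord == c, Or(*[pc == p for p in ps]))
           for c, ps in TABLE.items()]))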

False starts

I tried several other approaches before settling on SMT. My initial naive encodings with Z3 had disappointing performance. I then tried extending FaCiLe with table constraints, but implementing propagators correctly is notoriously difficult and I experienced that firsthand. Then I tried or-tools, which worked well until I encountered a limitation with reified constraints. At this point I went back to Z3 and committed to finding an efficient encoding.

The takeaway: start with SMT. There probably is an efficient SMT encoding. If there isn't (and you have confirmed this!), you will likely have to look into a custom theory solver for your domain.


Even with SMT, straying from the golden path and using newer SMTLIB features tends to lead to frustration, e.g. bugs and lacking support in solver APIs.

Rust will find a way

My original Python implementation got unreasonably slow past a certain point.

$ ./run.sh examples/ode.py
music4.so not built, using Python implementation
Constraint (melody)             0.01s
Constraint (basics)             0.03s
Constraint (rhythm)             0.00s
Constraint (same ending rhythm) 0.00s
Constraint (no breathing)       0.00s
Constraint (no breathing)       0.00s
Constraint (no breathing)       0.00s
Constraint (simultaneity)       1.51s
Constraint (consonance)         23.50s
Constraint (melody intervals)   0.10s
Constraint (note durations)     0.04s
Constraint (chord durations)    0.02s
Solving                         59.69s
sat
Ode to Joy.musicxml
Ode to Joy.abc

It turns out that Z3's Python API has significant overhead. I found other examples where just adding the constraints took 5s but solving them was instant, as evidenced by running Z3 directly on the generated SMTLIB file.
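A cheap way to check this on your own problem is to dump the accumulated constraints to an SMTLIB file and time Z3 on it directly (the constraints here are a placeholder):

from z3 import Int, Solver

s = Solver()
x = Int("x")
s.add(x > 0)  # ... the real constraints go here

# Time `z3 problem.smt2` against the end-to-end Python run to separate
# API overhead from actual solving time.
with open("problem.smt2", "w") as f:
    f.write(s.to_smt2())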

The usual move from here is to rewrite in Rust, so that's what I did. This significantly improved solving time...

$ ./run.sh examples/ode.py
Using music4.so
Constraints     446.5ms
Solving         15.1s
Sat
Ode to Joy.musicxml
Ode to Joy.abc

... at the cost of having to maintain a second, slightly different implementation. I would have liked to retire the Python version, but there are reasons to keep it around.

Maybe it could be used for differential testing someday...

Examples

Now for some examples of generated pieces. The constraints used for these may be found here. Feel free to tweak them and generate your own pieces.

Tabula rasa

In the absence of other constraints, all we require here is that consecutive notes form pleasing, familiar intervals.

Countermelody

Harmonization

X: 1
T: Ode to Joy
L: 1/16
K: C
M: 4/4
Q: 1/4=110
V: sop name="S"
V: alto name="A"
V: tenor name="T" clef=treble-8
V: bass name="B" clef=bass middle=d transpose=-24
%
V: sop
E4 E4 F4 G4 G4 F4 E4 D4 C4 C4 D4 E4 E6 D2 D8 E4 E4 F4 G4 G4 F4 E4 D4 C4 C4 D4 E4 D6 C2 C8
V: alto
B8 A4 G8 D4 A,8 A,8 G,6 B,6 E2 B2 B2 A2 G2 A8 E2 F2 F2 G4 G2 G2 A8 D2 D2 C6 E2 B4 E4 A6 E2 C8
V: tenor
G8 d2 d2 g8 d4 A4 D2 A2 E2 A6 d2 G2 C2 G8 d4 a2 b2 a2 a8 d4 G8 A2 F2 C4 G4 c8 B8 A6 e2 e8
V: bass
"iii" G8 "ii" d8 "V" d8 "ii" A8 "vi" A8 "ii" B4 "vi" c2 "V" g8 "I" d4 "iii" a2 "V" d'4 "ii" c'8 "V" f4 "ii" g4 "vi" d4 "vi" A4 "ii" A2 "V" A2 "iii" d4 "V" c6 "ii" c2 "vi" G2 "V" d2 "vi" g2 "V" g2 "iii" d'2 "iii" d'2 "ii" d'2 "vi" c'2 "I" c'8

Related work

The most closely related works are the constraint programming libraries Strasheela (Oz), from which I took a lot of inspiration, and cluster-engine (Common Lisp). Others are gecodeMCP (C++) and OMClouds (Common Lisp).

I have not yet tried most of them, but given their maturity I'm assuming they can handle all these benchmarks. One salient difference is the language the user must write constraints in (annotated above).

Another is that SMT allows arbitrary constraints in standard theories (e.g. LIA) to be expressed. With constraint solvers, the set of usable constraints is typically large but limited (no arbitrary disjunction!) and non-uniform across solvers, though solvable with more efficient specialized algorithms. They may be more appropriate when the set of constraints required is well understood (not yet the case in this work).
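As a toy illustration, a disjunction over arbitrary LIA constraints is a one-liner in SMT, whereas a CP solver would typically need reification or a dedicated global constraint:

from z3 import And, Int, Or, Solver

s = Solver()
a, b = Int("a"), Int("b")

# Either the two voices are a third apart (minor or major), or the lower
# voice holds a pedal tone while the upper stays within an octave above it.
s.add(Or(a - b == 3, a - b == 4, And(b == 60, a - b >= 0, a - b <= 12)))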

Many of these libraries have been integrated into composition IDEs, such as Opusmodus, OpenMusic, and PWGL (seemingly defunct, links no longer work). Rhythm-Box and Melodizer (mentioned earlier) are components of OpenMusic.

Other systems this work was inspired by are ANTON, which uses ASP for harmonization; MusicTools, an Agda library that also discharges musical synthesis via SMT; and Type-Guided Music Composition, an approach that uses weighted refinement types to validate and synthesize music.

Conclusion

This is very much early work. Nevertheless, I'm publishing it to document my progress, get feedback, and allow others to play with the code. I've spent way too long working on this in isolation.

Here are a few ideas for how this work could be brought forward.

The prototype is currently limited to a single key (C maj) and 6 diatonic chords. Lifting these restrictions or improving the background theory (e.g. richer melody constraints, rhythmic constraints, more contrapuntal rules) would be great. The potential of SMT solvers (optimization, soft constraints, unsat cores, alternative solvers) could be further exploited. The system could be applied to more interesting musical puzzles, e.g. fugues, which would likely require new kinds of global constraints.

While it would be nice to integrate this into a composition IDE or build a new one for casual creators, I have no immediate plans to do so. There are many UX problems to be solved for a good UI design. In particular, the UI should be cognizant of the limitations and potential of the use of SMT, and should provide the right affordances for creating music without requiring users to dive into the underlying theory. The NP-hardness of SMT suggests that the UI should provide a means to localize solving.

The stability of produced outputs is another consideration: the unconstrained parts of the piece could come out entirely different between two solver calls. This would be a problem if users are working with the system in true mixed-initiative fashion, using its feedback to refine a synthesized piece. My previous work suffered from the same problem and mitigated it with manual pinning of parts of the solution, but I'm sure there is a better way.
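As a minimal sketch of what pinning could look like with the Z3 Python API (the pitch variables here are illustrative): solve once, then re-assert the model's values for the regions the user wants frozen before solving again.

from z3 import Int, Solver, sat

s = Solver()
pitches = [Int(f"pitch_{i}") for i in range(8)]
for p in pitches:
    s.add(p >= 60, p <= 72)

assert s.check() == sat
model = s.model()

# Pin the first four notes to their previous values before re-solving,
# so only the unpinned tail of the piece can change.
for p in pitches[:4]:
    s.add(p == model[p])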

Code here.