Skip to main content
Darius Foo

ER Diagrams and Alloy

I've been curious for a while about how Alloy and ER diagrams are related. Both are technically logics for describing relational models. It makes sense that we should be able to encode ER diagrams in Alloy and get access to the latter's model-finding capabilities.

Here's a straightforward, by-example mapping between both formalisms, paying special attention to the differences.

Entity sets, relationship sets, and cardinality

A direct way to represent an entity set is as an Alloy signature. Relationship sets map to relations (or fields) of the signature.

sig e1 {
  r1: one e2
}
sig e2 {}

The default multiplicity of a relation is one, declared explicitly here. This means that r1 is one-to-many: every e1 is related to exactly one e2, but there may be e2s not related to any e1s.

Here is an equivalent ER diagram1:

G e1 e1 r1 r1 e1->r1 (1, 1) e2 e2 r1->e2 (0, n)

Relations in Alloy are directed. To specify the converse cardinality, we can use a signature fact, using ~ to invert the relation:

sig e1 {
  r1: e2
}
sig e2 {}{ one this.~r1 }

This means that each e1 is associated with exactly one e2 and vice versa.

G e1 e1 r1 r1 e1->r1 (1, 1) e2 e2 r1->e2 (1, 1)

Other cardinality constraints use different keywords:

Cardinality constraint Alloy multiplicity
(0, 0) none
(0, 1) lone
(1, 1) one
(1, n) some
(0, n) set

More on encodings later. Let's look at some applications first.

Checking properties

With this encoding, we can already reason mechanically about ER diagrams. For example, we can verify simple properties, such as: that the cardinality constraints we state do what we intend them to.

Cardinality constraints essentially rule out some models. For example, in the following scenario, the constraint means...

G e1 e1 r1 r1 e1->r1 c e2 e2 r1->e2
Constraint Meaning
(0, 0) prevents e1 from being related to e2 at all
(0, 1) prevents e1 from being related to e2 more than once
(1, 1) prevents e1 from being related to e2 more or less than one
(1, n) prevents e1 from being related to e2 less than once
(0, n) prevents nothing

Given is (1, 1), all the models of a particular ER diagram should satisfy properties like:

  1. All e1s are related by r1
  2. No e1s are related by r1 multiple times

This can all be expressed in Alloy as follows. The check command is used to verify a property, producing a counterexample if it exists within the given finite scope.

sig e1 {
  r1: e2
}
sig e2 {}

assert no_unrelated_e1 {
  all e: e1 | some e.r1
}

assert no_multiply_related_e1 {
  no e: e1 | #e.r1 > 1
}

check no_unrelated_e1 for 2
check no_multiply_related_e1 for 2
2 commands were executed. The results are:
   #1: No counterexample found. no_unrelated_e1 may be valid.
   #2: No counterexample found. no_multiply_related_e1 may be valid.

whereas the same properties will not hold of e2, which makes sense because we left it unconstrained.

assert no_unrelated_e2 {
  all e: e2 | some e.~r1
}

check no_unrelated_e2 for 2
Executing "Check no_unrelated_e2 for 2"
   Actual scopes: 2 e1, 2 e2
   Solver=sat4j Bitwidth=4 MaxSeq=2 SkolemDepth=1 Symmetry=20 Mode=batch
   84 vars. 10 primary vars. 116 clauses. 5ms.
   Counterexample found. Assertion is invalid. 5ms.

The counterexample produced demonstrates rather obviously that it is possible to have an e2 not related to an e1.

G e1_0 e1_0 e2_0 e2_0 e1_0->e2_0 r1 e2_1 e2_1

Generating example models

Alloy is also able to check if a set of constraints is satisfiable, producing a witness. An immediate use of this is to look at examples of models, to get a feel for what an ER diagram means.

The simplest way to do this is to write a command specifying only the scope

run {} for 2

and select "Show New Solution" repeatedly.

We can also write constraints to select models, to confirm their presence. Say we want a model where e1 and e2 are related:

run { some e: e1 | some e.r1 } for 2

Minimum and maximum cardinality

Finding models can help us answer other kinds of questions, like:

G e1 e1 r1 r1 e1->r1:w (1, 1) e2 e2 r2 r2 e2->r2:e (1, 1) r1:e->e2 (0, n) r2:w->e1 (0, 1)

Given this ER diagram, and assuming we have 2 e1s, how many e2s can we have at minimum/maximum?

We encode the ER diagram, then use run to see if we can find a model containing 2 e1s and 2 e2s. This is possible. However, there are no satisfying models with more than 2 e2s. Why? They would violate the lone constraint, as there would then be two e2s related to the same e1 via r2.

sig e1 {
  r1: one e2
}{ lone this.~r2 }

sig e2 {
  r2: one e1
}

run {} for exactly 2 e1, exactly 2 e2
run {} for exactly 2 e1, exactly 3 e2
2 commands were executed. The results are:
   #1: Instance found. run$1 is consistent.
   #2: No instance found. run$2 may be inconsistent.

Similarly, we see that the minimum number of e2s is 1. This is because of the one constraint on r1.

run {} for exactly 2 e1, exactly 1 e2
run {} for exactly 2 e1, exactly 0 e2
2 commands were executed. The results are:
   #1: Instance found. run$1 is consistent.
   #2: No instance found. run$2 may be inconsistent.

Programmatic use of Alloy

To build tools on top of this encoding, a way to invoke Alloy programmatically is required.

Alloy does not have a CLI. However, it has a well-documented Java API and examples of use. Here's a minimal example of how it might be used to process an Alloy source file and export the resulting model to, e.g., GraphViz or JSON, to interoperate with other tools. The default XML exporter does not seem to work, crashing when I tried it.

Minimal.java
import edu.mit.csail.sdg.alloy4.*;
import edu.mit.csail.sdg.ast.*;
import edu.mit.csail.sdg.ast.Module;
import edu.mit.csail.sdg.parser.*;
import edu.mit.csail.sdg.translator.*;

public final class Minimal {
  public static void main(String[] args) throws Err {
    A4Reporter rep = A4Reporter.NOP;
    for (String filename : args) {
      Module world = CompUtil.parseEverything_fromFile(rep, null, filename);
      for (Command command : world.getAllCommands()) {
        A4Solution ans = TranslateAlloyToKodkod.execute_command(rep, world.getAllReachableSigs(), command, new A4Options());
        if (ans.satisfiable()) {
          // crashes
          // ans.writeXML("output.xml");
          for (Sig s : ans.getAllReachableSigs()) {
            if (!s.label.contains("this")) {
              continue;
            }
            System.out.println(s.label + ":");
            System.out.println(ans.eval(s, 0));
            for (Sig.Field f : s.getFields()) {
              System.out.println("  " + f.label + ": " + ans.eval(f, 0));
            }
          }
        }
      }
    }
  }
}
run.sh
#!/bin/bash
jar=/Applications/Alloy.app/Contents/Resources/org.alloytools.alloy.dist.jar
set -ex
javac -cp $jar Minimal.java
java -cp .:$jar Minimal erd.als 2> /dev/null

Attributes and aggregation

A Method for Expressing Integrity Constraints in Database Conceptual Modeling (2020) is the only other article I found which also discusses encodings. In Section 4.3.2, they give a translation scheme very briefly.

I agree with their choices for attributes and values and have paraphrased them in the table below.

ERD Alloy Example
Domain Signature sig EmpId {}
Value Singleton sub-signature one sig John extends EmpId {}
Attribute Field sig Employee {id: EmpId}
Key attribute Separate uniqueness assumption fact {#employee = #employee.id}

Attributes in ER diagrams are usually simple and do not have much structure, but if we really want, we can now check properties about them.

The remaining choice they make is how to model relationship sets.

However, we chose to use signatures for representing both the entity types and the relationship types. We think this captures the relationship set semantics more accurately, and provides its own structure.

This is useful for encoding aggregation. Here's an example.

sig Bottle {}
sig Session {}
one sig Opened {
  rel: Bottle -> lone Session
}
sig Member {}
one sig Tasted {
  rel: Member -> Opened
}

run {} for exactly 1 Session, exactly 2 Bottle, exactly 2 Member 

The tradeoff is that writing constraints for ternary relations and above gets complex, and that the default visualization is less intuitive. It seems better to use Alloy's relations if aggregation isn't required.


  1. We're using min-max notation. ↩︎