I became aware of a very interesting library recently. It uses a combination of dependent types and typeclasses to support a surprisingly usable and boilerplate-free approach to the representation of binders.
It deserves to be more widely known, so here is a small tutorial covering the intuition behind it, and how to use it, with a minimum of category theory.
From Stdlib Require Import Utf8.
Require Import Binding.Lib Binding.Set.
We’ll use a lambda calculus to explain the key ideas.
First, the syntax of expressions, expr, can be thought of as a functor over some type of variables V.
We define it as follows.
Inductive expr (V : Set) : Set :=
| ret : val V → expr V
| app : expr V → expr V → expr V
with val (V : Set) : Set :=
| var : V → val V
| lam : expr (inc V) → val V.
Arguments ret {V} v.
Arguments app {V}.
Arguments var {V} x.
Arguments lam {V} e.
The only unexpected thing is the extra inc applied to the scope in the type of the lambda body, which indicates the presence of a binder.
There is otherwise no explicit binder, similar to the use of de Bruijn indices.
With this, V may be more appropriately thought of as a scope: a set of free variables that may occur in a given expr V.
More on this in a bit.
First, what does the functorial view buy us?
For one, fmap, of type (A → B) → expr A → expr B, corresponds to renaming: replacing the variables of type A at the leaves of a term with variables of type B, while preserving the structure of the term.
For another, a substitution can be thought of as a function A → expr B. This makes sense because it maps variables to expressions of a possibly-different scope.
The action of substitution takes a substitution and an expression and produces an expression, i.e. it has type (A → expr B) → expr A → expr B, which is just monadic bind.
Both of these hint at structure that we can take advantage of: some way to factor out lemmas relating to renaming and substitution, so developments don’t have to be bogged down with them.
Let’s get these definitions and some boilerplate out of the way, so we can see some examples.
#[global]
Instance SetPureCore_val : SetPureCore val :=
{ set_pure := @var }.
Fixpoint emap {A B : Set} (f : A [→] B) (e : expr A) : expr B :=
match e with
| ret v => ret (vmap f v)
| app e1 e2 => app (emap f e1) (emap f e2)
end
with vmap {A B : Set} (f : A [→] B) (v : val A) : val B :=
match v with
| var x => var (f x)
| lam e => lam (emap (lift f) e)
end.
#[global]
Instance FunctorCore_emap : FunctorCore expr := @emap.
#[global]
Instance FunctorCore_vmap : FunctorCore val := @vmap.
Fixpoint ebind {A B : Set} (f : A [⇒] B) (e : expr A) : expr B :=
match e with
| ret v => ret (vbind f v)
| app e1 e2 => app (ebind f e1) (ebind f e2)
end
with vbind {A B : Set} (f : A [⇒] B) (v : val A) : val B :=
match v with
| var x => f x
| lam e => lam (ebind (lift f) e)
end.
#[global]
Instance BindCore_ebind : BindCore expr := @ebind.
#[global]
Instance BindCore_vbind : BindCore val := @vbind.
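With these instances in place, the structure we alluded to earlier can be stated: the functor and monad laws specialise to the standard renaming and substitution lemmas. Roughly, and only for intuition (the library states and proves these in a more general form than the equations below):

fmap id e = e
fmap g (fmap f e) = fmap (g ∘ f) e
bind var e = e
bind g (bind f e) = bind (fun x => bind g (f x)) e

The first two say that renaming respects identities and composition; the third that substituting each variable for itself is the identity; the last that consecutive substitutions can be fused into one.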
The only nonstandard things in the definitions are the use of the infix operators [→] and [⇒] where we would expect function arrows (→), and the lift combinator.
A [→] B can roughly be understood as A → B, and A [⇒] B as A → expr B.
The latter is a substitution.
lift lifts both of these over inc: its type is, roughly, (A [→] B) → (inc A [→] inc B), and analogously for [⇒].
(This is a simplified, specialised view, to get some intuition before we are exposed to the full generality of the library.)
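To ground that intuition, here is a sketch of what inc and lift (in the renaming case) might look like. NZ is the constructor written &0; NS is an assumed name for the injection of pre-existing variables; the library's actual definitions are more general than this:

Inductive inc (V : Set) : Set :=
| NZ : inc V          (* the freshly bound variable, written &0 *)
| NS : V → inc V.     (* an already-existing variable, weakened *)

Definition lift' {A B : Set} (f : A → B) (x : inc A) : inc B :=
  match x with
  | NZ => NZ           (* the bound variable is left alone *)
  | NS a => NS (f a)   (* old variables are renamed, then re-weakened *)
  end.

For substitutions, lift must additionally weaken the images: NZ is sent to var NZ, and the value an old variable maps to is renamed with NS so that it makes sense in the extended scope.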
With the definitions down, we can look at some examples.
Well-scoped terms
First, the type of a term tells us how many free variables may occur in it.
In other words, this is a well-scoped representation.
A val ∅ is a closed term.
Definition a1: val ∅ := lam (ret (var (&0))).
Definition a4: val (inc ∅) := var (&0).
Definition a2: val (inc ∅) := lam (ret (var (&1))).
&0 and &1 are variables, represented as de Bruijn indices,
with types inc ∅, inc (inc ∅), etc.
Definition b1 : inc ∅ := &0.
Definition b3 : inc (inc ∅) := &1.
∅ is defined as Empty_set, so it has no inhabitants, which is consistent with how it denotes the absence of free variables.
Next, the scope in a term's type is an upper bound on the free variables that actually occur in the term. In other words, weakening of scopes is allowed by construction.
Here we have an identity function, but we’ve given it a type indicating that it has up to one free variable.
This is fine.
It is also fine to give &0 a similarly more “permissive” type.
Definition a3: val (inc ∅) := lam (ret (var (&0))).
Definition b2 : inc (inc ∅) := &0.
Expressions with multiple free variables can be typed by weakening appropriately.
Definition a10 : expr (inc ∅) := ret (var (&0)).
Definition a11 : expr (inc (inc ∅)) := ret (var (&0)).
Definition a13 : expr (inc (inc ∅)) := ret (var (&1)).
Definition a12 : expr (inc (inc ∅)) := app a11 a13.
Substitution
The substitution operation used to descend into lambda bodies is called subst.
It has type expr (inc A) → expr A → expr A, consistent with how it removes one free variable.
Definition a42 (v: val (inc ∅)) : val ∅ := subst v a1.
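Intuitively, subst is bind with a single-point substitution: the bound variable goes to the given value, and every other variable maps to itself. In pseudocode (NS is an assumed name for the successor constructor of inc, and in the library the function would additionally have to be packaged as a [⇒] value):

subst e v ≈ bind (fun x => match x with NZ => v | NS y => var y end) e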
What is typically called parallel substitution, where a substitution is applied to a term, is called bind.
Here is an instance of it with a specialised type.
Definition a6 (γ: (inc ∅) [⇒] ∅) (a: val (inc ∅)) : val ∅ :=
bind γ a.
Typically we will quantify over substitutions of a given type (commonly, γ : V [⇒] ∅, to say that the codomain consists of closed values),
and not construct one manually, but here’s how one can be made out of a function,
to give some intuition for how things work.
Definition a8 (a: val (inc ∅)) : val ∅ :=
let γ1 : (inc ∅) [⇒] ∅ := Build_Sub val _ _ _ (fun s => a1) in
bind γ1 a.
The parameter s can be matched on to replace a specific variable.
Very interestingly, the type of the substitution γ constrains what s can be!
The match on s is exhaustive even though we only cover the &0 case… given enough annotations.
&0 is notation for NZ.
In words, γ replaces only one free variable, and we have to give a value for only that one.
Definition a9 (a: val (inc ∅)) : val ∅ :=
let γ : (inc ∅) [⇒] ∅ := Build_Sub val _ _ _ (fun s =>
match s with
| NZ => a1
end)
in
bind γ a.
Experiences with this approach in formalisations
A great showcase of what the library can do is the formalisation for the paper Untyped Logical Relations at Work: Control Operators, Contextual Equivalence and Full Abstraction.
We are developing a formalisation in a similar style.
The use of a well-scoped representation shines when definitions pertain to e.g. closed terms only. Associating the closedness property with terms by construction not only saves us from re-proving the same closedness obligations over and over, but seems more expressive than explicit premises, as it is possible to carry around such assumptions without having to prove them in cases where the definition appears contravariantly.
For example, when proving transitivity of a relation, if closedness premises are used, we will not be able to prove closedness of the middle element; the appropriate place for the property is together with the element, not in the relation.
Parameter exp : Type.
Parameter closed : exp → Prop.
Parameter related : exp → exp → Prop.
Definition rel (e1 e2: exp) : Prop :=
closed e1 → closed e2 → related e1 e2.
From Stdlib Require Import Classes.RelationClasses.
Goal Transitive rel.
unfold Transitive, rel. intros.
(* we are stuck, as we cannot prove y is closed *)
Abort.
The big question: how does this compare to a first-order representation of names, like de Bruijn indices with Autosubst? It’s better in some ways but worse in others. Better because the syntax is well-scoped by construction, and many properties come for free. Worse because more annotation is needed. Perhaps this is an artifact of the typeclass-heavy design of the library, and could be lessened by making it less generic.