This post is about one particularly thorny obstacle on the journey to developing a mechanisation of staged logic for my PhD thesis.
First, some background on staged logic. It is a specification logic for higher-order programs with effects.
Here is the syntax of its formulae. It is an untyped lambda calculus with specifications and logical connectives.
Like other logics, it comes with an entailment relation, initially defined as a lifting of metalogic implication: one formula entails another if every model of the first is also a model of the second.
This definition is known under many names in different communities.

- In other logics, such as separation logic, it is written ⊢ and called entailment, as a lifting of metalogic implication
- In TLA, where the two objects are temporal formulae representing state machines, it is written ⇒ and called refinement
- When the two objects are programs, it is called program approximation, with the symmetric version called behavioural equivalence. In the context of compiler correctness, it is also a simple form of forward (or backward, depending on who you ask) simulation
- When the two objects are languages, it is called inclusion and written ⊆
- When the two objects are specifications, the relation is called subsumption
- When the two objects are types, it is instead called subtyping, written <:
Entailment intuitively means that the models or behaviours of the left-hand side are contained in those of the right-hand side.
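The common thread in all of these readings can be sketched as inclusion of sets of behaviours. The following toy Python rendering is entirely my own analogy (not the staged-logic definition, whose formula appears above); all names in it are made up for illustration:

```python
# Model each object (formula, program, language, type) by the set of
# behaviours it admits; entailment / refinement / inclusion / subtyping
# is then just the subset relation on those sets.
def entails(lhs, rhs):
    return lhs <= rhs  # set inclusion

# A "stronger" spec admits fewer behaviours than a "weaker" one.
spec_strong = {('input', 1, 'output', 2)}
spec_weak = {('input', 1, 'output', 2),
             ('input', 1, 'output', 3)}

assert entails(spec_strong, spec_weak)       # stronger entails weaker
assert not entails(spec_weak, spec_strong)   # but not the other way
```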
We started with this and it worked great for interactive proofs in Rocq for a while.
Delimited continuations and the bubble-up semantics
However, there were problems with shift and the bubble-up semantics. We had shift results and no way to relate them under sequencing. Associativity of sequencing broke, and rewriting wouldn't work because the Proper instances it requires could not be proved.
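To make the control operator concrete for readers who haven't met it: here is a minimal CPS encoding of shift/reset in Python. This is a standard textbook-style sketch, not the staged logic's bubble-up semantics, and every name in it is my own:

```python
# Minimal CPS (continuation-passing style) encoding of shift/reset.
# A computation is a function that takes a continuation k and returns
# the final answer.

def unit(v):
    """Return v to the current continuation."""
    return lambda k: k(v)

def bind(m, f):
    """Sequence: run m, then run f applied to its result."""
    return lambda k: m(lambda v: f(v)(k))

def reset(m):
    """Delimit the continuation: run m to completion inside the prompt."""
    return unit(m(lambda x: x))

def shift(f):
    """Capture the continuation up to the nearest reset as a function
    and hand it to f; the captured context is discarded from below."""
    return lambda k: f(lambda v: unit(k(v)))(lambda x: x)

def run(m):
    """Run a complete computation with the identity continuation."""
    return m(lambda x: x)

# reset ((shift (fun k -> k (k 2))) + 1):
# shift captures the context [_ + 1], so k (k 2) = (2 + 1) + 1 = 4.
example = reset(bind(shift(lambda k: bind(k(2), k)),
                     lambda x: unit(x + 1)))
```

Running `run(example)` applies the captured continuation twice, illustrating why results involving shift cannot simply be compared syntactically: the surrounding sequencing context is part of their meaning.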
Many indexed non-solutions
One of these attempts was a relation we called gentails, for "generalised entails" (a name which also autocorrects poorly).
A similar problem in the lambda calculus
It turns out that this issue shows up in the lambda calculus.
Consider how to prove that these two expressions are equivalent. According to the definition above, we would have to compare their evaluation behaviour. Suppose we have a big-step semantics. Lambdas are values under it, so each side evaluates immediately to itself; in other words, no more evaluation takes place. If we try this for the two programs above, we end up with distinct syntactic objects that are not equal, though we can clearly see that they behave the same.
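The post's concrete expressions were rendered as math, so as a stand-in, here is a tiny call-by-value big-step evaluator in Python applied to a standard pair with the same flavour: the identity function λx. x and its eta-expansion λx. (λy. y) x. Both are already values, so evaluation returns them unchanged and syntactically unequal, yet applying them to the same closed argument yields the same result:

```python
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)

def subst(t, x, v):
    """Substitute value v for variable x in term t.
    (Naive, capture-permitting; fine for the closed terms used here.)"""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    return ('app', subst(t[1], x, v), subst(t[2], x, v))

def evaluate(t):
    """Call-by-value big-step evaluation. Lambdas are values:
    evaluation stops as soon as it reaches one."""
    if t[0] == 'lam':
        return t
    if t[0] == 'app':
        f = evaluate(t[1])
        a = evaluate(t[2])
        return evaluate(subst(f[2], f[1], a))
    raise ValueError('stuck: free variable')

ID = ('lam', 'x', ('var', 'x'))                    # λx. x
ETA = ('lam', 'x', ('app', ('lam', 'y', ('var', 'y')),
                    ('var', 'x')))                 # λx. (λy. y) x

# Both sides are already values, so evaluation returns them as-is...
assert evaluate(ID) == ID and evaluate(ETA) == ETA
# ...and they are distinct syntactic objects:
assert ID != ETA
# Yet applied to the same closed argument, they agree:
arg = ('lam', 'z', ('var', 'z'))
assert evaluate(('app', ID, arg)) == evaluate(('app', ETA, arg))
```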
The way I originally described it to people (with the view that staged logic was a logic) was that we had "syntax in the model", which led to a rabbit hole of looking into denotational approaches to interpret that syntax.
However, staged logic is equally viewed as an abstract programming language, and though it seems obvious in hindsight, the problem we had with bubbles is exactly the one described here with the lambda calculus.
Contextual equivalence
The solution is a stronger notion of equivalence: contextual equivalence (with the asymmetric version being called contextual refinement).
It seems customary to call it the "gold standard" for a notion of equivalence, so I will repeat that here.
Why does it solve the problem? Consider the statement again.
We know that these expressions are equivalent informally because no matter the argument we supply to each, we'll get the same result. We might even have justified this to ourselves by mentally evaluating both sides with some symbolic value.
In other words, if we only had some way to continue evaluating these expressions, we could easily tell that they were equivalent. The problem is that we can't do this, because they are already values - the big-step semantics has nothing more to say about them.
If only we had a way to describe hypothetical, future inputs to those programs...
Well, that's what contextual refinement offers. It quantifies over all contexts, so in every context where the programs don't get stuck and produce derivations, we are able to continue running them. In other words, the context gives us a way to talk about how each lambda will eventually be used.
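The quantification is over all contexts, so it can't be tested by enumeration, but probing two programs with a few applied contexts conveys the idea. This is a loose Python analogy (plain callables standing in for closed lambda terms, contexts as functions from program to observation), not a proof technique:

```python
# Two observationally equivalent programs, as Python callables:
identity = lambda x: x
eta_expanded = lambda x: (lambda y: y)(x)

# Contexts as functions: each takes a program and yields an observation.
contexts = [
    lambda p: p(42),                   # C = [.] 42
    lambda p: p(p)("hello"),           # C = ([.] [.]) "hello"
    lambda p: p(lambda z: z + 1)(2),   # C = ([.] (lambda z. z+1)) 2
]

# The two programs are distinct objects (the analogue of being
# syntactically unequal values)...
assert identity is not eta_expanded
# ...but every probing context observes the same behaviour:
assert all(c(identity) == c(eta_expanded) for c in contexts)
```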
How would we actually prove this particular equivalence? Direct induction over contexts is known to be "notoriously difficult". The two tenable proof methods are bisimulations and logical relations. I am more familiar with the latter, as it is much more commonly used in mechanised settings.
The idea there is to strengthen the definition of equivalence so that it says more precisely when two expressions are equivalent. To cope with advanced language features, the definition will have to be inductive over the structure of types and/or the number of evaluation steps the program takes.
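As an illustration of what "inductive over the number of steps" means, here is my own sketch of a standard step-indexed logical relation for an untyped call-by-value calculus (not the definition from the paper cited below): two lambdas are related at index n when arguments related at any smaller index yield related results.

```latex
% Value relation: lambdas related at index n if, for every j < n and
% every pair of arguments related at j, the applications are related.
(\lambda x.\, e_1,\ \lambda x'.\, e_2) \in V_n \iff
  \forall j < n.\ \forall (v_1, v_2) \in V_j.\
    (e_1[v_1/x],\ e_2[v_2/x']) \in E_j

% Expression relation: if e_1 reaches a value within the step budget,
% e_2 reaches a value related at the remaining index.
(e_1, e_2) \in E_j \iff
  \forall i \le j.\ e_1 \to^{i} v_1 \implies
    \exists v_2.\ e_2 \to^{*} v_2 \wedge (v_1, v_2) \in V_{j-i}
```

The step index is what makes the definition well-founded in the absence of types: the recursion decreases on n, not on any type structure.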
It should also come as no surprise that the Proper instances we agonised over for so long are just simple corollaries of contextual equivalence. Hence this relation can be used for rewriting.
With that, we seem to be finally able to formalise staged logic.
Conclusion
Discovering all this was a long and agonising journey, as the connections between these topics are typically discussed in very disparate settings. I still don't know of any source which connects them like this, so I wanted to put it all in words, so that the next person who needs this connection need not suffer a similar fate.
I'm grateful to the authors of a paper which pointed this out particularly clearly: Program equivalence in an untyped, call-by-value functional language with uncurried functions. Moreover, their work is in an untyped setting, a relatively interesting and underexplored one in which to apply logical relations.