Reasoning about Caml Functions
Tags: formalmethods caml coq proofsThe computer science department at the University of Minnesota has recently introduced a new course titled “Advanced Programming Principles”. Generally, the course is focused on teaching students how to use modern programming languages (i.e., typed, typesafe, functional) in a productive and principled way.
A staple of functional programming is recursion. Recursive functions are ones that are defined in terms of themselves. Typically, recursive functions map arguments to a set of base cases that relate arguments to initial values (values not relying on selfreference) and a set of recursive cases that relate arguments to selfreferencing expressions. The canonical example of a recursive function is the one used to computed the Fibonacci sequence.
Reasoning about recursive functions can often be done using induction. Inductive reasoning requires that one show a property holds in all base cases and all recursive cases. The latter arguments may rely on a hypothesis that the property holds for all function applications on “smaller” inputs. The notion of what “smaller” is can be complicated, however in recursive functions acting on finite data structures, “smaller” usually refers to some measure of input data structure size. Referring back to Fibonacci example, recursive function applications in a recursive definition of the Fibonacci sequence always act on arguments less than the defined input arguments. Therefore, the notion of “smaller” could be $<$ on natural numbers for such definitions of the Fibonacci sequence.
In what follows, I will inductively reason about a property of a recursive function written in Caml. Many similar exercises can be found elsewhere on the internet, in academic papers, and textbooks. What may be interesting to some is that I follow informal reasoning with formal reasoning done in Coq using a shallow embedding. Shallow embeddings rely heavily on the meaning of the target language semantics (i.e., the language we are embedded an object language in, in this case it is Coq) to capture the meaning of the object langangue (i.e., the language under study, in this case Caml) where deep embeddings rely heavily on a definition of the object language semantics within the target language.
Order Preserving List Insertion
Two functions will be considered for this exercise: a boolean function that returns true when an input list of integers is in ascending order and a function that places an integer in the list and yields a new list.
let rec sorted l = match l with
 [ ] > true
 x::[] > true
 x1::x2::xs > x1 <= x2 && sorted (x2::xs)
let rec place e l = match l with
 [ ] > [e]
 x::xs > if e < x then e::x::xs else x :: (place e xs)
I would like to show that for all sorted integer lists $xs$ and integers $x$ that the evaluation result of applying place to $x$ and $xs$ is a sorted list. Showing this property requires some understanding of what it means to evaluate Caml expressions. I will not go into this here, but one may look here for some intuition about Caml evalution semantics.
Informal Proof
I would like to show that for all possible values of l : int
list
and e : int
, if sorted
l
evaluates to true
then
sorted (place el)
evaluates to
true
. Notice here I am confusing the written words
and fragments of Ocaml. My intention is to make reading this informal
proof more intuitive.
The obvious way to try to prove this problem is by induction on the
size of l
. When considering a nonempty
l
, the case when e>=m
must be
considered and it must be shown that sorted (m::(place e n))
where
l = m::n
. There are two issues here:

the head of the list resulting from evaluating
place e n
must be extracted due to the third pattern inmatch
, this value may be greater, equal, or less thane
and will require further case analysis 
sorted (place e n)
will be the same size asl
so the induction hypothesis cannot be applied. One might unfold the definition again, consider both case of issue 1, and then the inductive hypothesis can be applied. However, a variant of issue 1 will remain.
The proof that will be presented avoids these issues through a lemma.
Lemma: for all n : int list
, m : int
, and e : int
if sorted (m::n)
evaluates to true
and e>=m
then sorted (m::(place e n)
evaluates to true
.
Let n
be an arbitrary integer list such that
sorted (m::n)
evaluates to
true
. Let e
and
m
be arbitrary integers such that
e>=m
. It then must be shown that sorted
(m::(place e n))
evaluates to
true
. We will show this by induction on the size
of n
.
Case analysis of data structures of type int list
yields two case; one where n
is empty and one
where n
is not. In the case where
n
is empty it must be shown sorted (m::(place e
[]))
evaluates to true
. The
expression place e []
may be replaced with
[e]
by evaluation. Therefore, it must be shown
that sorted (m::e::[])
evaluates to to
true
. This expression may be replace with m<=e &&
sorted [e]
, then m<=e && true
by evaluation. Therefore, it must be shown that m<=e &&
true
evaluates to true
which
evaluation and assumptions yield.
In the case where n
is not empty we must show
sorted (m::(place e n))
evaluates to
true
where n
is equal to
o::p
, o
is an integer and
p
is an integer list. Evaluation and case analysis
of place
yields that place e n
may be replace with either e::o::p
assuming
e<o
or o::(place e p)
assuming
e>=o
.
When replacing with e::o::p
and assuming
e<o
it must be shown that sorted
(m::e::o::p)
evaluates to
true
. The expression may be replaced with m<=e &&
e<=o && sorted (o::p)
by evaluation. Evaluation,
e>=m
, and e<o
yields it
suffices to show true && sorted (o::p)
evaluates
to true
. Evaluation of the assumption sorted
(m::o::p)
yields m<=o && sorted
(o::p)
evaluates to true
which
yields true && sorted (o::p)
evaluates to
true
immediately.
When replacing with o::(place e p)
and assuming
e>=o
it must be shown that sorted (m::o::(place e
p))
evaluates to true
. It
suffices to show m<=o && sorted (o::(place e p))
evaluates to true
. The inductive hypothesis, for
all l : int list
, m : int
, and
e : int
if sorted (m::l)
evaluates to true
and e>=m
and
l
is smaller than n
then
sorted (m::(place e l)
evaluates to
true
, can be used to show that sorted (o::(place
e p))
evaluates to true
;
p
is shorter than n
, sorted
(o::p)
evaluates to true
by
evaluation of the assumption sorted (m::o::p)
, and
e>=o
by assumption.
■
Theorem: for l : int list
and e : int
, if sorted l
evaluates to true
then sorted (place e l)
evaluates to true
.
Let l
be an aribitrary integer list such that
sorted l
evaluates to
true
. Let e
be an arbitrary
integer. It then must be shown that sorted (place e
l)
evaluates to true
. This will
be shown by case analysis on the forms of integer lists.
In the case where l
is empty it must be shown
sorted (place e [])
evaluates to
true
. The expression place e
[]
may be replaced with [e]
by
evaluation. Therefore, it must be shown that sorted
[e]
evaluates to to true
. This
expression immediately evaluates to true
.
In the case where l
is not empty it must be shown
that sorted (place e (m::n))
evaluates to
true
where l = m::n
. Analysis
of place
yields that we must consider two
subcases; when e<m
or
e>=m
. When e<m
, place e
(m::n)
evaluates to e::m::n
and
it must be shown that sorted e::m::n
evaluates to
true
. This expression may be replace with e<m &&
sorted m::n
by evaluation and this evaluates to
true
by our assumptions. When
e>=m
, place e (m::n)
evaluates
to m::(place e n)
and it must be shown that
sorted (m::(place e n)
evaluates to
true
which may be proved using the lemma above.
Formal Proof
The property we are trying to prove is simple. However, tracking assumptions, cases, and the constraints they place on quantified terms is error prone when done informally. Furthermore, much of the informal proof relies on evaluating Caml terms; something that is best done by computers. Therefore, we will use the Coq to verify that our informal proofs is indeed a correct proof. However, we will use the structure of the informal proof when constructing this formal proof in Coq. Specifically, we will see that the informal lemma proved above will be useful in our formal proof.
Coq is a proof assistant built on the Calculus of Construction. Practically speaking, it provides a dependently typed functional programming language. Due to the CurryHoward correspondence typed languages can be interpreted as proof systems where types in the language are formulas and terms in the language are proofs. What follows assumes a familiarity with Coq that can be gained through the many Coq tutorials on the internet. The Coq code is intermixed with discussion. However, to use the following code, it must be entered into Coq in the order it appears.
Using proof assistants can be tedious because fundamental theorems must be proved before they can be used. For instance, all basic mathematical properties used in the informal proof (e.g., inequalities on natural numbers) must be proved before they can be utilized. Fortunately, Coq provides many theories that can be used as libraries.
Require Import Coq.Bool.Bool.
Require Import Coq.Arith.EqNat.
Require Import Coq.Lists.List.
Definitions in terms of booleans are not given for the less than and less than or equal to relations.
Fixpoint blt_nat (m n : nat) : bool :=
match m with
 O => match n with
 O => false
 _ => true
end
 S m => match n with
 S n => blt_nat m n
 _ => false
end
end.
Definition ble_nat (m n: nat) := orb (blt_nat m n) (beq_nat m n).
The first definition gives the meaning for less than on natural numbers. This syntax should look fairly similar to Caml syntax. The definition is inductive:
 Zero is less than every natural number with exception to itself.
 A natural number is less than another if it’s predecessor (the result of subtracting one) is less than the predecessor of the other.
With these definitions,
trichotomy
can now be established for inequalities on the natural numbers. This
result will be crucial when reasoning about how placement of an
element in list will impact it’s sortedness. Trichotomy is proved
through use of the lemma.
Lemma sym_blt_false_beq_true:
forall (m n:nat),
blt_nat m n = false >
blt_nat n m = false >
beq_nat m n = true.
intros m.
induction m.
intros.
destruct n.
auto.
simpl in H.
inversion H.
intros.
simpl in *.
destruct n.
simpl in H0.
inversion H0.
simpl in H0.
apply (IHm n H H0).
Qed.
Theorem trichotomy_nat:
forall (m n:nat),
blt_nat m n = true \/
beq_nat m n = true \/
blt_nat n m = true.
intros.
destruct (blt_nat m n) eqn:fst.
left.
auto.
right.
destruct (blt_nat n m) eqn:thd.
right.
auto.
left.
eapply (sym_blt_false_beq_true m n fst thd).
Qed.
From trichotomy it may be proved that the less than relation implies the less than or equal to relation.
Lemma blt_imp_ble :
forall (m n:nat),
blt_nat m n = false > ble_nat n m = true.
intros.
unfold ble_nat.
remember (trichotomy_nat n m).
case o.
intros.
rewrite H0.
auto.
intros.
case H0.
intros.
rewrite H1.
rewrite orb_comm.
auto.
intros.
rewrite H1 in H.
inversion H.
Qed.
At this point there is enough of a theory to make an attempt on the
final result. Therefore,a shallow embedding of the and
definitions are given in Coq. These definitions rely on the
built in polymorphic list type in Coq. Ignoring syntactic differences,
their definitions are very similar to the ones given in
Caml. Furthermore, due to the semantics of Coq the results in this
write up will assume that the Caml and Coq implementations of these
functions are semantically equivalent (something that should actually
be proved).
Fixpoint place (e : nat) (l : list nat) :=
match l with
 nil => e :: nil
 x :: xs => if blt_nat e x
then e :: l
else x :: (place e xs)
end.
Fixpoint sorted l :=
match l with
 nil => true
 x1 :: l' =>
match l' with
 nil => true
 x2 :: xs =>
andb (ble_nat x1 x2) (sorted l')
end
end.
Now it can be proved that the tail of a sorted list is sorted. In the informal proof, it was easy to see that by evaluation the tail of a sorted list is sorted. In Coq, we must do a bit more work therefore, it is captured in a lemma.
Lemma sorted_is_recursive:
forall (a:nat) (l:list nat),
sorted (cons a l) = true > sorted l = true.
intros.
simpl in H.
destruct l eqn:Hl.
auto.
apply andb_true_iff in H.
decompose [and] H.
auto.
Qed.
Now we may prove the lemma we used in the informal argument.
Lemma lemma : forall (l:list nat) (a e:nat),
sorted (a :: l) = true >
ble_nat a e = true >
sorted (a :: place e l) = true.
induction l.
intros.
simpl.
rewrite H0.
auto.
intros.
simpl.
destruct (blt_nat e a) eqn:test.
rewrite H0.
simpl.
unfold ble_nat.
rewrite test.
simpl.
simpl in H.
apply andb_true_iff in H.
decompose [and] H.
auto.
assert (H':=H).
apply sorted_is_recursive in H.
apply blt_imp_ble in test.
assert (IR:=IHl a e H test).
rewrite IR.
simpl in H'.
apply andb_true_iff in H'.
decompose [and] H'.
rewrite H1.
auto.
Qed.
As it was in the informal proof, this lemma yields the theorem.
Theorem theorem :
forall (l:list nat) (e : nat),
sorted l = true > sorted (place e l) = true.
intros.
destruct l.
auto.
simpl.
destruct (blt_nat e n) eqn:elta.
simpl.
unfold ble_nat.
rewrite elta.
simpl.
simpl in H.
auto.
apply blt_imp_ble in elta.
remember (lemma l n e H elta).
auto.
Qed.