Reasoning about Caml Functions

The computer science department at the University of Minnesota has recently introduced a new course titled “Advanced Programming Principles”. Generally, the course is focused on teaching students how to use modern programming languages (i.e., typed, type-safe, functional) in a productive and principled way.

A staple of functional programming is recursion. Recursive functions are ones that are defined in terms of themselves. Typically, recursive functions map arguments to a set of base cases that relate arguments to initial values (values not relying on self-reference) and a set of recursive cases that relate arguments to self-referencing expressions. The canonical example of a recursive function is the one used to computed the Fibonacci sequence.

Reasoning about recursive functions can often be done using induction. Inductive reasoning requires that one show a property holds in all base cases and all recursive cases. The latter arguments may rely on a hypothesis that the property holds for all function applications on “smaller” inputs. The notion of what “smaller” is can be complicated, however in recursive functions acting on finite data structures, “smaller” usually refers to some measure of input data structure size. Referring back to Fibonacci example, recursive function applications in a recursive definition of the Fibonacci sequence always act on arguments less than the defined input arguments. Therefore, the notion of “smaller” could be $<$ on natural numbers for such definitions of the Fibonacci sequence.

In what follows, I will inductively reason about a property of a recursive function written in Caml. Many similar exercises can be found elsewhere on the internet, in academic papers, and textbooks. What may be interesting to some is that I follow informal reasoning with formal reasoning done in Coq using a shallow embedding. Shallow embeddings rely heavily on the meaning of the target language semantics (i.e., the language we are embedded an object language in, in this case it is Coq) to capture the meaning of the object langangue (i.e., the language under study, in this case Caml) where deep embeddings rely heavily on a definition of the object language semantics within the target language.

Order Preserving List Insertion

Two functions will be considered for this exercise: a boolean function that returns true when an input list of integers is in ascending order and a function that places an integer in the list and yields a new list.

  let rec sorted l = match l with
  | [ ] -> true
  | x::[] -> true
  | x1::x2::xs -> x1 <= x2 && sorted (x2::xs)
  let rec place e l = match l with
  | [ ] -> [e]
  | x::xs -> if e < x then e::x::xs else x :: (place e xs)

I would like to show that for all sorted integer lists $xs$ and integers $x$ that the evaluation result of applying place to $x$ and $xs$ is a sorted list. Showing this property requires some understanding of what it means to evaluate Caml expressions. I will not go into this here, but one may look here for some intuition about Caml evalution semantics.

Informal Proof

I would like to show that for all possible values of l : int list and e : int, if sorted l evaluates to true then sorted (place el) evaluates to true. Notice here I am confusing the written words and fragments of Ocaml. My intention is to make reading this informal proof more intuitive.

The obvious way to try to prove this problem is by induction on the size of l . When considering a non-empty l , the case when e>=m must be considered and it must be shown that sorted (m::(place e n)) where l = m::n. There are two issues here:

  1. the head of the list resulting from evaluating place e n must be extracted due to the third pattern in match , this value may be greater, equal, or less than e and will require further case analysis

  2. sorted (place e n) will be the same size as l so the induction hypothesis cannot be applied. One might unfold the definition again, consider both case of issue 1, and then the inductive hypothesis can be applied. However, a variant of issue 1 will remain.

The proof that will be presented avoids these issues through a lemma.

Lemma: for all n : int list, m : int, and e : int if sorted (m::n) evaluates to true and e>=m then sorted (m::(place e n) evaluates to true.

Let n be an arbitrary integer list such that sorted (m::n) evaluates to true. Let e and m be arbitrary integers such that e>=m. It then must be shown that sorted (m::(place e n)) evaluates to true. We will show this by induction on the size of n.

Case analysis of data structures of type int list yields two case; one where n is empty and one where n is not. In the case where n is empty it must be shown sorted (m::(place e [])) evaluates to true. The expression place e [] may be replaced with [e] by evaluation. Therefore, it must be shown that sorted (m::e::[]) evaluates to to true. This expression may be replace with m<=e && sorted [e], then m<=e && true by evaluation. Therefore, it must be shown that m<=e && true evaluates to true which evaluation and assumptions yield.

In the case where n is not empty we must show sorted (m::(place e n)) evaluates to true where n is equal to o::p, o is an integer and p is an integer list. Evaluation and case analysis of place yields that place e n may be replace with either e::o::p assuming e<o or o::(place e p) assuming e>=o.

When replacing with e::o::p and assuming e<o it must be shown that sorted (m::e::o::p) evaluates to true. The expression may be replaced with m<=e && e<=o && sorted (o::p) by evaluation. Evaluation, e>=m, and e<o yields it suffices to show true && sorted (o::p) evaluates to true. Evaluation of the assumption sorted (m::o::p) yields m<=o && sorted (o::p) evaluates to true which yields true && sorted (o::p) evaluates to true immediately.

When replacing with o::(place e p) and assuming e>=o it must be shown that sorted (m::o::(place e p)) evaluates to true. It suffices to show m<=o && sorted (o::(place e p)) evaluates to true. The inductive hypothesis, for all l : int list, m : int, and e : int if sorted (m::l) evaluates to true and e>=m and l is smaller than n then sorted (m::(place e l) evaluates to true, can be used to show that sorted (o::(place e p)) evaluates to true; p is shorter than n, sorted (o::p) evaluates to true by evaluation of the assumption sorted (m::o::p), and e>=o by assumption.

Theorem: for l : int list and e : int, if sorted l evaluates to true then sorted (place e l) evaluates to true.

Let l be an aribitrary integer list such that sorted l evaluates to true. Let e be an arbitrary integer. It then must be shown that sorted (place e l) evaluates to true. This will be shown by case analysis on the forms of integer lists.

In the case where l is empty it must be shown sorted (place e []) evaluates to true. The expression place e [] may be replaced with [e] by evaluation. Therefore, it must be shown that sorted [e] evaluates to to true. This expression immediately evaluates to true.

In the case where l is not empty it must be shown that sorted (place e (m::n)) evaluates to true where l = m::n. Analysis of place yields that we must consider two sub-cases; when e<m or e>=m. When e<m, place e (m::n) evaluates to e::m::n and it must be shown that sorted e::m::n evaluates to true. This expression may be replace with e<m && sorted m::n by evaluation and this evaluates to true by our assumptions. When e>=m, place e (m::n) evaluates to m::(place e n) and it must be shown that sorted (m::(place e n) evaluates to true which may be proved using the lemma above.

Formal Proof

The property we are trying to prove is simple. However, tracking assumptions, cases, and the constraints they place on quantified terms is error prone when done informally. Furthermore, much of the informal proof relies on evaluating Caml terms; something that is best done by computers. Therefore, we will use the Coq to verify that our informal proofs is indeed a correct proof. However, we will use the structure of the informal proof when constructing this formal proof in Coq. Specifically, we will see that the informal lemma proved above will be useful in our formal proof.

Coq is a proof assistant built on the Calculus of Construction. Practically speaking, it provides a dependently typed functional programming language. Due to the Curry-Howard correspondence typed languages can be interpreted as proof systems where types in the language are formulas and terms in the language are proofs. What follows assumes a familiarity with Coq that can be gained through the many Coq tutorials on the internet. The Coq code is intermixed with discussion. However, to use the following code, it must be entered into Coq in the order it appears.

Using proof assistants can be tedious because fundamental theorems must be proved before they can be used. For instance, all basic mathematical properties used in the informal proof (e.g., inequalities on natural numbers) must be proved before they can be utilized. Fortunately, Coq provides many theories that can be used as libraries.


Require Import Coq.Bool.Bool.
Require Import Coq.Arith.EqNat.
Require Import Coq.Lists.List.

Definitions in terms of booleans are not given for the less than and less than or equal to relations.

Fixpoint blt_nat (m n : nat) : bool :=
  match m with
    | O => match n with
             | O => false
             | _ => true
           end
    | S m => match n with
               | S n => blt_nat m n
               | _ => false
             end
  end.

Definition ble_nat (m n: nat) := orb (blt_nat m n) (beq_nat m n).

The first definition gives the meaning for less than on natural numbers. This syntax should look fairly similar to Caml syntax. The definition is inductive:

  1. Zero is less than every natural number with exception to itself.
  2. A natural number is less than another if it’s predecessor (the result of subtracting one) is less than the predecessor of the other.

With these definitions, trichotomy can now be established for inequalities on the natural numbers. This result will be crucial when reasoning about how placement of an element in list will impact it’s sortedness. Trichotomy is proved through use of the lemma.


Lemma sym_blt_false_beq_true:
    forall (m n:nat),
      blt_nat m n = false ->
      blt_nat n m = false ->
      beq_nat m n = true.
intros m.
induction m.
    intros.
    destruct n.
      auto.
      simpl in H.
      inversion H.
   intros.
   simpl in *.
   destruct n.
     simpl in H0.
     inversion H0.
     simpl in H0.
     apply (IHm n H H0).
Qed.

Theorem trichotomy_nat:
    forall (m n:nat),
      blt_nat m n = true \/
      beq_nat m n = true \/
      blt_nat n m = true.
intros.
destruct (blt_nat m n) eqn:fst.
   left.
   auto.
   right.
   destruct (blt_nat n m) eqn:thd.
     right.
     auto.
     left.
     eapply (sym_blt_false_beq_true m n fst thd).
Qed.

From trichotomy it may be proved that the less than relation implies the less than or equal to relation.


Lemma blt_imp_ble :
    forall (m n:nat),
      blt_nat m n = false -> ble_nat n m = true.
intros.
unfold ble_nat.
remember (trichotomy_nat n m).
case o.
  intros.
  rewrite H0.
  auto.
  intros.
  case H0.
    intros.
    rewrite H1.
    rewrite orb_comm.
    auto.
    intros.
    rewrite H1 in H.
    inversion H.
Qed.

At this point there is enough of a theory to make an attempt on the final result. Therefore,a shallow embedding of the and definitions are given in Coq. These definitions rely on the built in polymorphic list type in Coq. Ignoring syntactic differences, their definitions are very similar to the ones given in Caml. Furthermore, due to the semantics of Coq the results in this write up will assume that the Caml and Coq implementations of these functions are semantically equivalent (something that should actually be proved).

Fixpoint place (e : nat) (l : list nat) :=
  match l with 
    | nil => e :: nil
    | x :: xs => if blt_nat e x 
                   then e :: l  
                   else  x :: (place e xs)
  end.

Fixpoint sorted l :=
match l with
| nil => true
| x1 :: l'  => 
  match l' with
  | nil => true
  | x2 :: xs =>
    andb (ble_nat x1 x2) (sorted l')
  end
end.

Now it can be proved that the tail of a sorted list is sorted. In the informal proof, it was easy to see that by evaluation the tail of a sorted list is sorted. In Coq, we must do a bit more work therefore, it is captured in a lemma.


Lemma sorted_is_recursive:
  forall (a:nat) (l:list nat),
    sorted (cons a l) = true -> sorted l = true.
intros.
simpl in H.
destruct l eqn:Hl.
  auto.
  apply andb_true_iff in H.
  decompose [and] H.
  auto.
Qed.

Now we may prove the lemma we used in the informal argument.

Lemma lemma : forall (l:list nat) (a e:nat), 
sorted (a :: l) = true -> 
ble_nat a e = true -> 
sorted (a :: place e l) = true.
induction l.
  intros.
  simpl.
  rewrite H0.
  auto.
  intros.
  simpl.
  destruct (blt_nat e a) eqn:test.
    rewrite H0.
    simpl.
    unfold ble_nat.
    rewrite test.
    simpl.
    simpl in H.
    apply andb_true_iff in H.
    decompose [and] H.
    auto.
    assert (H':=H).
    apply sorted_is_recursive in H.
    apply blt_imp_ble in test.
    assert (IR:=IHl a e H test).
    rewrite IR.
    simpl in H'.
    apply andb_true_iff in H'.
    decompose [and] H'.
    rewrite  H1.
    auto.
Qed.

As it was in the informal proof, this lemma yields the theorem.

Theorem theorem :
    forall (l:list nat) (e : nat), 
      sorted l = true -> sorted (place e l) = true.
intros.
destruct l.
  auto.
  simpl.
  destruct (blt_nat e n) eqn:elta.
    simpl.
    unfold ble_nat.
    rewrite elta.
    simpl.
    simpl in H.
    auto.
    apply blt_imp_ble in elta.
    remember (lemma l n e H elta).
    auto.
Qed.

Comments