Proving symmetry of serialization of nats · Beginner Questions

I'm making progress on this, but I thought it would be a good example to analyze.

I am trying to show that serializing a number to bytes and then deserializing it results in the original number. Here are the function definitions:

fun digit_list :: "nat ⇒ nat list ⇒ nat list" where
"digit_list 0 s = s" |
"digit_list n s = digit_list (n div 10) (n mod 10 # s)"

fun string_of_nat :: "nat ⇒ string" where
"string_of_nat n = map char_of (digit_list n '''')"

fun multiply_digit :: "nat ⇒ nat ⇒ nat" where
"multiply_digit a digit = a * 10 + digit"

fun nat_of_string :: "string ⇒ nat" where
"nat_of_string s = foldl multiply_digit 0 (map of_char s)"

string_of_nat produces a string given a nat, i.e. value string_of_nat 123 = ''123'', and
nat_of_string produces a nat given a string, i.e. value nat_of_string ''123'' = 123.

I know there's a show implementation for nat, but I couldn't find the inverse, so I just implemented them both. I figure it's good practice.

theorem
  shows "nat_of_string (string_of_nat n) = n"
proof (induction n)
  case 0
  show ?case by auto
next
  case (Suc n)
  then show ?case
    apply(auto simp add: foldl_map)
qed

I added the foldl_map simplification rule as an attempt to leverage the fact that there are already simplification rules for of_char (char_of c), and this did simplify the proof goals. Here's the goal that I'm left with (in the Suc n case):

foldl (λa x. a * 10 + x mod 256) 0 (digit_list (Suc n div 10) [Suc n mod 10]) = Suc n

I imagine that I need an intermediate lemma to aid in further simplification, but I've been unable to think of any that result in a discharged goal. I think one complication is that digit_list is guaranteed to create a list of nats that are each between 0-9, since the idea is to serialize individual digits. I could prove a lemma related to that, but I don't see how that will actually help in simplification.

Manuel Eberl (Nov 25 2021 at 09:58):

Oof, you've made your life pretty difficult there. Basically, the more additional parameters (such as accumulators) you have, the more difficult the induction proofs will be because you have to generalise.

Manuel Eberl (Nov 25 2021 at 09:59):

If you do a straightforward induction on that statement, you will run into trouble, because in the statement you call digit_list n [], but in every recursion step of digit_list, it modifies the accumulator, so you won't be able to apply the induction hypothesis.

Manuel Eberl (Nov 25 2021 at 09:59):

So the first thing you need is a way to get rid of the accumulator, i.e. relate the behaviour of digit_list n acc to the behaviour of digit_list n []

Manuel Eberl (Nov 25 2021 at 09:59):

lemma digit_list_to_Nil: "digit_list n acc = digit_list n [] @ acc"

Manuel Eberl (Nov 25 2021 at 10:00):

You will also need to prove map (of_char ∘ char_of) (digit_list n []) = digit_list n [], which only holds because all the numbers in digit_list are less than 10 (in particular less than 256).

Manuel Eberl (Nov 25 2021 at 10:00):

lemma set_digit_list: "set (digit_list xs acc) ⊆ {0..9} ∪ set acc"
  by (induction xs acc rule: digit_list.induct) auto
[...]
  have "map (of_char ∘ char_of) (digit_list n []) = digit_list n []"
    by (intro map_idI) (use set_digit_list[of n "[]"] in auto)

Manuel Eberl (Nov 25 2021 at 10:02):

With all of that, you should be able to prove foldl multiply_digit 0 (digit_list n []) = n via induction on n. But you're going to have to use an induction rule like less_induct to make it work, or a prove a custom one that allows you to go from n to n div 10.

Manuel Eberl (Nov 25 2021 at 10:04):

lemma nat_div10_induct[case_names 0 div10]:
  assumes "P (0 :: nat)" "⋀n. n > 0 ⟹ P (n div 10) ⟹ P n"
  shows   "P n"
  using assms
  apply induction_schema
    apply force
   apply (rule wf_measure[of id])
  apply auto
  done

Manuel Eberl (Nov 25 2021 at 10:04):

So this is how to solve your problem. However, I would have started out quite a bit differently: keep your functions as simple as possible. Don't introduce accumulators if you can avoid it.

Manuel Eberl (Nov 25 2021 at 10:07):

You could have defined digit_list so that it returns the reverse list of digits instead:

fun rev_digit_list :: "nat ⇒ nat list" where
  "rev_digit_list n = (if n = 0 then [] else n mod 10 # rev_digit_list (n div 10))"

fun nat_of_rev_digit_list :: "nat list ⇒ nat" where
  "nat_of_rev_digit_list [] = 0"
| "nat_of_rev_digit_list (x # xs) = x + 10 * nat_of_rev_digit_list xs"

It should be very easy to show that these two commute, and then you can define your nat_of_string etc. in terms of these just by reversing. And you can still show correspondence to your more complicated (and presumably more efficient, although it probably hardly matters in practice) definitions above afterwards.

Manuel Eberl (Nov 25 2021 at 10:08):

So TLDR: Always keep your definitions as simple as possible. It will make proofs easier. Any additional optimisations can still be introduced afterwards in a modular fashion.

Stream: Beginner Questions

Topic: Proving symmetry of serialization of nats

Alex Weisberger (Nov 24 2021 at 01:18):

Manuel Eberl (Nov 25 2021 at 09:58):

Manuel Eberl (Nov 25 2021 at 09:59):

Manuel Eberl (Nov 25 2021 at 09:59):

Manuel Eberl (Nov 25 2021 at 09:59):

Manuel Eberl (Nov 25 2021 at 10:00):

Manuel Eberl (Nov 25 2021 at 10:00):

Manuel Eberl (Nov 25 2021 at 10:02):

Manuel Eberl (Nov 25 2021 at 10:04):

Manuel Eberl (Nov 25 2021 at 10:04):

Manuel Eberl (Nov 25 2021 at 10:07):

Manuel Eberl (Nov 25 2021 at 10:08):