Hi everyone,

I'm making progress on this, but I thought it would be a good example to analyze.

I am trying to show that serializing a number to bytes and then deserializing it results in the original number. Here are the function definitions:

```
fun digit_list :: "nat ⇒ nat list ⇒ nat list" where
"digit_list 0 s = s" |
"digit_list n s = digit_list (n div 10) (n mod 10 # s)"
fun string_of_nat :: "nat ⇒ string" where
"string_of_nat n = map char_of (digit_list n '''')"
fun multiply_digit :: "nat ⇒ nat ⇒ nat" where
"multiply_digit a digit = a * 10 + digit"
fun nat_of_string :: "string ⇒ nat" where
"nat_of_string s = foldl multiply_digit 0 (map of_char s)"
```

`string_of_nat`

produces a string given a `nat`

, i.e. `value string_of_nat 123 = ''123''`

, and

`nat_of_string`

produces a `nat`

given a `string`

, i.e. `value nat_of_string ''123'' = 123`

.

I know there's a `show`

implementation for `nat`

, but I couldn't find the inverse, so I just implemented them both. I figure it's good practice.

The theorem I'm trying to prove is then:

`nat_of_string (string_of_nat n) = n`

,

And here is my attempted proof:

```
theorem
shows "nat_of_string (string_of_nat n) = n"
proof (induction n)
case 0
show ?case by auto
next
case (Suc n)
then show ?case
apply(auto simp add: foldl_map)
qed
```

I added the `foldl_map`

simplification rule as an attempt to leverage the fact that there are already simplification rules for `of_char (char_of c)`

, and this did simplify the proof goals. Here's the goal that I'm left with (in the `Suc n`

case):

`foldl (λa x. a * 10 + x mod 256) 0 (digit_list (Suc n div 10) [Suc n mod 10]) = Suc n`

I imagine that I need an intermediate lemma to aid in further simplification, but I've been unable to think of any that result in a discharged goal. I think one complication is that `digit_list`

is guaranteed to create a list of `nats`

that are each between 0-9, since the idea is to serialize individual digits. I could prove a lemma related to that, but I don't see how that will actually help in simplification.

I appreciate any thoughts on this.

Oof, you've made your life pretty difficult there. Basically, the more additional parameters (such as accumulators) you have, the more difficult the induction proofs will be because you have to generalise.

If you do a straightforward induction on that statement, you will run into trouble, because in the statement you call `digit_list n []`

, but in every recursion step of `digit_list`

, it modifies the accumulator, so you won't be able to apply the induction hypothesis.

So the first thing you need is a way to get rid of the accumulator, i.e. relate the behaviour of `digit_list n acc`

to the behaviour of `digit_list n []`

Something like this:

```
lemma digit_list_to_Nil: "digit_list n acc = digit_list n [] @ acc"
```

You will also need to prove `map (of_char ∘ char_of) (digit_list n []) = digit_list n []`

, which only holds because all the numbers in `digit_list`

are less than 10 (in particular less than 256).

For this, the rule `map_idI`

may be useful, e.g.

```
lemma set_digit_list: "set (digit_list xs acc) ⊆ {0..9} ∪ set acc"
by (induction xs acc rule: digit_list.induct) auto
[...]
have "map (of_char ∘ char_of) (digit_list n []) = digit_list n []"
by (intro map_idI) (use set_digit_list[of n "[]"] in auto)
```

With all of that, you should be able to prove `foldl multiply_digit 0 (digit_list n []) = n`

via induction on `n`

. But you're going to have to use an induction rule like `less_induct`

to make it work, or a prove a custom one that allows you to go from `n`

to `n div 10`

.

Such a rule could e.g. look like this:

```
lemma nat_div10_induct[case_names 0 div10]:
assumes "P (0 :: nat)" "⋀n. n > 0 ⟹ P (n div 10) ⟹ P n"
shows "P n"
using assms
apply induction_schema
apply force
apply (rule wf_measure[of id])
apply auto
done
```

So this is how to solve your problem. However, I would have started out quite a bit differently: keep your functions as simple as possible. Don't introduce accumulators if you can avoid it.

You could have defined `digit_list`

so that it returns the reverse list of digits instead:

```
fun rev_digit_list :: "nat ⇒ nat list" where
"rev_digit_list n = (if n = 0 then [] else n mod 10 # rev_digit_list (n div 10))"
```

and similarly for the inverse of `rev_digit_list`

:

```
fun nat_of_rev_digit_list :: "nat list ⇒ nat" where
"nat_of_rev_digit_list [] = 0"
| "nat_of_rev_digit_list (x # xs) = x + 10 * nat_of_rev_digit_list xs"
```

It should be very easy to show that these two commute, and then you can define your `nat_of_string`

etc. in terms of these just by reversing. And you can still show correspondence to your more complicated (and presumably more efficient, although it probably hardly matters in practice) definitions above afterwards.

So TLDR: Always keep your definitions as simple as possible. It will make proofs easier. Any additional optimisations can still be introduced afterwards in a modular fashion.

This was a really in depth answer, exactly the kind of info I was looking for - many thanks. I like that I can now choose to either avoid my problem by simplifying the definitions, but also have some more advanced techniques to try out as well.

"Avoiding" the problem by avoiding an accumulator argument led to a proof for me (and that feels great, because I've spent quite a few days puzzling over this):

```
fun digit_list :: "nat ⇒ nat list" where
"digit_list 0 = []" |
"digit_list n = n mod 10 # digit_list (n div 10)"
fun string_of_nat :: "nat ⇒ string" where
"string_of_nat n = map char_of (digit_list n)"
fun nat_of_string :: "string ⇒ nat" where
"nat_of_string [] = 0" |
"nat_of_string (d # ds) = (of_char d) + 10 * (nat_of_string ds)"
theorem ser_deser: "nat_of_string (string_of_nat n) = n"
apply(induction n rule: digit_list.induct)
apply(auto)
done
```

I'm not exactly clear why I needed to use `digit_list.induct`

as the induction rule, but plain induction on n didn't work immediately. I'll try and spend some time to understand that more. I saw in the proof goals that the shape of the term was very close to the induction hypothesis, so it was able to be substituted to carry out the proof - that's my main learning from this.

Thanks again!!

Alex Weisberger has marked this topic as resolved.

Why you had to use `digit_list.induct`

as an induction rule is easy to explain: if you unfold the definitions of `nat_of_string`

and `string_of_nat`

, you will see that your goal has as its ‘innermost term’ `digit_list n`

(i.e. `digit_list`

is the first function being applied to `n`

). So if you're doing induction over `n`

, in order to actually do something (i.e. unfold the recursive definition of `digit_list`

, you will have to do your induction in a way that fits the recursion of `digit_list`

. The ‘standard’ induction on `nat`

doesn't help you because it goes from `Suc n`

to `n`

, but you need to go from `n`

to `n div 10`

.

By the way, a few more things:

First, I usually prefer to use `definition`

for non-recursive functions and unfold the definitions manually (by doing `unfolding foo_def`

or `simp add: foo_def`

). The `fun`

command is really mostly useful for recursive functions.

Second, I would define `digit_to_list`

not by pattern matching, because that way you get as a second simp rule `digit_to_list (Suc n) = Suc n mod 10 # digit_to_list (Suc n div 10)`

which is kind of ugly.

You can do this instead:

```
fun digit_list :: "nat ⇒ nat list" where
"digit_list n = (if n = 0 then [] else n mod 10 # digit_list (n div 10))"
lemmas [simp del] = digit_list.simps
lemma digit_list_0 [simp]: "digit_list 0 = []"
and digit_list_rec [simp]: "n > 0 ⟹ digit_list n = n mod 10 # digit_list (n div 10)"
by (subst digit_list.simps; simp; fail)+
```

It's important to delete `digit_list.simps`

from the simpset, because otherwise the simplifier will keep rewriting with that rule forever. Another way to achieve the same thing is to use a not-so-well-known feature of the function package: conditional equations:

```
function digit_list :: "nat ⇒ nat list" where
"digit_list 0 = []"
| "n > 0 ⟹ digit_list n = n mod 10 # digit_list (n div 10)"
by auto
termination by lexicographic_order
```

The only disadvantage of this is that we lose executability because the code generator cannot handle conditional code equations. But we can restore executability by just proving the original definition as a code equation:

```
lemma digit_list_code [code]:
"digit_list n = (if n = 0 then [] else n mod 10 # digit_list (n div 10))"
by simp
```

I appreciate those tips - definitely going to start applying them.

Note that your current implementation of `string_of_nat`

gives you the empty string for `0`

. In case that that's not what you intended. :smile:

Oh that's interesting - luckily, `nat_of_string (string_of_nat 0)`

still returns 0 (I would hope so, or else the proof would be incorrect!).

To actually serialize to the digit `0`

, I can't think of an implementation change to `digit_list`

, but `string_of_nat`

could handle the `0`

case explicitly:

```
definition string_of_nat :: "nat ⇒ string" where
"string_of_nat n = (if n = 0 then [CHR 0x00] else map char_of (digit_list n))
```

This breaks some of the exiting proofs, but I'll play around with fixing them.

Yup, that's what I would have done. You probably don't want `CHR 0x00`

though, that's a 0 byte. The character 0 is `CHR ''0''`

. You can also simply write ''0'' for `[CHR ''0'']`

.

By the way, the reason I was doing this proof was for a blog post about refinement: https://concerningquality.com/refinement/.

I would love to hear any feedback on it. I'm hoping to write more about proving things with Isabelle in the future.

Last updated: Sep 25 2022 at 23:25 UTC