Stream: Beginner Questions

Topic: Calculation proofs have unintuitive behavior


view this post on Zulip isauser (Feb 23 2026 at 22:17):

I've noticed that calculation proofs sometimes behave in weird ways, failing with cryptic errors (which btw aren't listed in the reference manual). Some examples:

The lemma below fails with "Vacuous calculation result":

theory Repro
  imports Main "HOL-Data_Structures.Sorting"
begin

fun partition3 :: "('a::linorder) ⇒ 'a list ⇒ 'a list × 'a list × 'a list" where
"partition3 x xs = (
  filter (λy. y < x ) xs,
  filter (λy. y = x ) xs,
  filter (λy. y > x ) xs)"

fun quicksort3 :: "('a::linorder) list ⇒ 'a list" where
"quicksort3 [] = []" |
"quicksort3 (x # xs) = (
  let (ls, es, gs) = partition3 x xs in
  quicksort3 ls @ x # es @ quicksort3 gs)"

lemma "sorted (quicksort3 xs)"
proof(induction xs rule: quicksort3.induct, simp)
  case (2 x xs)
  obtain ls es gs where parts: ‹(ls, es, gs) = partition3 x xs› by auto
  have ls_smallest: ‹∀a ∈ set (quicksort3 ls). ∀b ∈ set (x # es @ quicksort3 gs). a ≤ b› sorry
  have ‹sorted (quicksort3 (x # xs)) = sorted (quicksort3 ls @ x # es @ quicksort3 gs)›
    using parts by simp

  (* Comment out this line, and the errors disappear: *)
  also have ‹… = sorted ((quicksort3 ls) @ (x # es @ quicksort3 gs))› by simp

  also (* calculation: sorted (quicksort3 (x # xs)) = sorted (quicksort3 (x # xs)) *)
  have ‹… = (sorted (quicksort3 ls) ∧ sorted (x # es @ quicksort3 gs)
        ∧ (∀a ∈ set (quicksort3 ls). ∀b ∈ set (x # es @ quicksort3 gs). a ≤ b))›
    by (simp add: sorted_append)
  also (* Error: Vacuous calculation result: *)
  have ‹… = (sorted (quicksort3 ls) ∧ sorted (x # es @ quicksort3 gs))›
    using ls_smallest by simp
  show ?case sorry
qed

end

It seems that the line also have ‹… = sorted ((quicksort3 ls) @ (x # es @ quicksort3 gs))› breaks the calculation proof. This step is redundant, but Isabelle seems to accept redundant steps in other situations:

notepad begin
  fix P1 P2 P3 P4 Q :: bool
  assume assms: P1 ‹P1 = P2› ‹P2 = P3› ‹P3 = P4›
  have ‹P1 = P2› using assms by simp
  also have ‹… = P3› using assms by simp
  also have ‹… = P3› by simp (* Redudant step *)
  also have ‹… = P4› using assms by simp
end

In the next example, Isabelle accepts one calculation step, but not another identical one:

notepad begin
  fix P1 P2 P3 P4 Q :: bool
  assume assms: P1 ‹P1 = P2› ‹P2 = P3› ‹P3 = P4› Q
  have ‹P1 ∧ Q = P2 ∧ Q› using assms by simp
  also have ‹… = P3 ∧ Q› using assms by simp
  also have ‹… = P4 ∧ Q› using assms by simp (* No matching trans rules for calculation *)
end

Are these known bugs that will be fixed in a future release? Or, if this behavior is expected, is there some other way to write transitive chain calculations that don't have these weird quirks? Namely, if I can prove:

have 1: ‹A = B› sorry
have 2: ‹B = C› sorry
have ‹A = C› using 1 2 by simp

Then I should be also able to write it like this (regardless of what A, B and C look like, for any chain length):

have ‹A = B› sorry
also have ‹… = C› sorry
finally have ‹A = C› by simp

view this post on Zulip Alex Mobius (Feb 23 2026 at 22:41):

In your first example, what essentially happens is Isabelle sees two statements of the form "A = B" and "B = B" and tries to apply a transitivity rule. There are two transitivity rules for equality, and you may either get "A = B" (so your step was redundant) or "A = A" (this is what happens in your first example; thus you get "(* calculation: sorted (quicksort3 (x # xs)) = sorted (quicksort3 (x # xs)) *)", a useless calculation rule that cannot be applied to the rest of the proof). So redundant steps may either work or break, because Isabelle tries to smartly select the correct transitivity rule and there's not enough information; however, it's fine to just delete them.

Concerning your next example with "P1 /\ Q = P2 /\ Q", it does not say what you think it says, because = binds tighter than logical operators (you can use <--> as an equality that binds less tightly). So your first step is "P1 & ((Q = P2) & Q)" (which is true because all of these are True), your second step is RHS of the first step equal to P3, and Q; so, (((Q = P2) & Q) = P3) & Q". That is also of course true because all of the variables are True. Then on the next line the transitive step ("also") fails because the first line wasn't an equality, and the second line also isn't an equality. What's a little counterintuitive is that it doesn't have anything to do with what comes after "also", the "also" itself fails.

Answering your final question, "simp" is not deterministic and has an upper bound on the complexity of combinations it tries, while a calculation chain can get in theory to any length without impacting the easibility of the proof. In your second snippet, btw, it is better to write finally have ‹A = C› . with a dot instead of by simp, because the result is just restating the calculation, there's no further simplification needed.

view this post on Zulip Alex Mobius (Feb 23 2026 at 22:43):

I would suggest not despairing and sticking with calculation proofs; they do take a bit of getting used to, but this is often the nicest way to perform e.g. assumption or goal rewrites by equivalencies, not to mention the ordered statements.

view this post on Zulip isauser (Feb 23 2026 at 23:45):

Thank you, this is very helpful! I still don't really get why the first example fails though. You say that there are two transitivity rules for equality, but in the print_trans_rules output I only see one transitivity rule ⟦?r = ?s; ?s = ?t⟧ ⟹ ?r = ?t and one symmetry rule ?s = ?t ⟹ ?t = ?s that have an equality in both the premise and the conclusion. I don't see how you can derive B = B from the first rule though. And in any case this is very counter-intuitive IMO, as it breaks the expected chain of calculations from left to right.

view this post on Zulip Alex Mobius (Feb 23 2026 at 23:49):

Oh, rather there are two possible orders of substitution. When using equality with calculation it performs a substitution in the other term, so rules are of the form ?s = ?t ==> ?P ?t ==> ?P ?s and `?P ?s ==> ?s = ?t ==> ?P ?t". Which means you either substitute B -> B in A = B, or substitute A <- B in B = B. At least that is my understanding.

view this post on Zulip isauser (Feb 24 2026 at 00:58):

This is starting to look like a bug. According to the manual, the second and further also should be equivalent to note calculation = trans [OF calculation this]. And this seems to work until you write down a redundant calculation step - then for some reason it breaks:

notepad begin
  fix P Q R S T :: bool
  assume ‹P = Q› ‹Q = R› ‹R = S› ‹S = T›
  have ‹P = Q› using ‹P = Q ›by simp
  print_facts

  (* First `also`: calculation = this = (P = Q)*)
  also have ‹… = R› using ‹Q = R› by simp
  print_facts
  thm trans[OF calculation this]

  (* Next `also`: calculation = thm trans[OF calculation this] = (P = R)*)
  also have ‹… = S› using ‹R = S› by simp
  print_facts
  thm trans[OF calculation this]

  (* Next `also`: calculation = thm trans[OF calculation this] = (P = S)*)
  also have ‹… = S› using ‹R = S› by simp
  print_facts
  thm trans[OF calculation this]

  (* Next `also`: calculation should equal: thm trans[OF calculation this] = (P = S), but actually equals: P = P *)
  also have ‹… = T› using ‹S = T› by simp
  print_facts
  thm trans[OF calculation this]

  finally have ‹P = T› .
end

view this post on Zulip isauser (Feb 24 2026 at 01:20):

If I apply the rule trans explicitly, I get a different error:

also(trans) have ‹… = T› using ‹S = T› by simp (* Vacuous calculation result: P = S *)
print_facts
thm trans[OF calculation this]

It seems you were right, and Isabelle picked a different trans rule that resulted in calculation := P = P (though I still have no idea how to debug this and find out which one exactly). Perhaps it initially tried to apply trans, saw it fail, and that's why it went with something else. But the "vacuous calculation" error still doesn't explain what's wrong.

view this post on Zulip Alex Mobius (Feb 24 2026 at 01:27):

FWIW in here:

notepad begin
  fix P1 P2 P3 P4 Q :: bool
  assume assms: P1 ‹P1 = P2› ‹P2 = P3› ‹P3 = P4›
  have ‹P1 = P2› using assms by simp
  also have ‹… = P3› using assms by simp
  also have ‹… = P3› by simp (* Redudant step *)
  also have ‹… = P4› using assms by simp
end

you have the same behaviour, you're just one line away from triggering the error :)

looks like also has a wider set of rules than just trans, indeed - the manual does say the translation it provides are approximate. I however really don't get why it's important what behvaiour we get here... garbage in, garbage out, just don't write P=P in your proofs :)

view this post on Zulip Alex Mobius (Feb 24 2026 at 01:28):

The vacuous calculation error just means that it reproduced a result that was already in print_facts somewhere. That's usually not what you want, so it's an error.

view this post on Zulip isauser (Feb 24 2026 at 01:30):

It's not a critical issue - it just creates friction. Maybe I just want to parenthesize certain terms for clarity - it shouldn't break the program IMO.

view this post on Zulip Manuel Eberl (Feb 24 2026 at 12:48):

There are a number of issues with the calculation feature in Isar, in my opinion. It is a very versatile tool, but also quite brittle as a result of that. In any case, changing what it does at this point is basically impossible because it would break lots of existing proofs.


Last updated: Mar 04 2026 at 20:34 UTC