Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] [ISSUE] PIDE reports wrong error line and mess...


view this post on Zulip Email Gateway (Apr 02 2026 at 17:50):

From: Bartosz Glowacki <cl-isabelle-users@lists.cam.ac.uk>

Dear Isabelle developers,

I found a difference between isabelle build and the PIDE server (use_theories) outputs when by sorry appears as the final tactic before qed in a calc-terminated proof block:

Minimal reproducer:

theory BugDemo
imports "HOL-Combinatorics.Multiset_Permutations"
begin

text \\<open>
map_index_self: mapping the index function over a distinct list gives [0..<length xs].
\\<close>

lemma map_index_self:
assumes "distinct xs"
shows "map (index xs) xs = [0..<length xs]"
proof -
have "xs = map (\\<lambda>i. xs ! i) [0..<length xs]"
by (simp add: map_nth)
also have "map (index xs) \\<dots> = map id [0..<length xs]"
by (intro map_cong) simp_all
finally show ?thesis
by sorry     (* PIDE reports error here - wrong *)
Qed         (* isabelle build reports error here - correct *)

end

The real error is semantic — qed finds no open goal context — and isabelle build diagnoses it correctly. PIDE reports a spurious syntax error at the wrong line instead. While this is not a serious bug, it may be misleading for the end-user to receive the syntax error instead of the real issue they should focus on.

This was found using a differential fuzzing framework that compares error output between isabelle build and the PIDE server on mutated AFP theories. The mutation by simp -> by sorry triggered this mismatch in 10/10 independent runs. I am happy to share the framework and logs.

This was found as part of my dissertation work on differential testing of the Isabelle platform.

Kindest regards,
Bartosz Glowacki
King's College London

view this post on Zulip Email Gateway (Apr 07 2026 at 12:16):

From: Fabian Huch <huch@in.tum.de>

'by sorry' is syntactically ill-formed, and there is no surprise that an
interactive session reports errors differently from a non-interactive
build because

a) a headless build does not need to keep track of as much markup
information and

b) an interactive session tries to recover from problems (to some
degree), such that one can continue editing in the presence of errors.

Fabian

On 4/2/26 19:05, Bartosz Glowacki (via cl-isabelle-users Mailing List)
wrote:

Dear Isabelle developers,

I found a difference between isabelle build and the PIDE server
(use_theories) outputs when by sorry appears as the final tactic
before qed in a calc-terminated proof block:

Minimal reproducer:

theory BugDemo
  imports "HOL-Combinatorics.Multiset_Permutations"
begin

text \\<open>
  map_index_self: mapping the index function over a distinct list
gives [0..<length xs].
\\<close>

lemma map_index_self:
  assumes "distinct xs"
  shows   "map (index xs) xs = [0..<length xs]"
proof -
  have "xs = map (\\<lambda>i. xs ! i) [0..<length xs]"
    by (simp add: map_nth)
  also have "map (index xs) \\<dots> = map id [0..<length xs]"
    by (intro map_cong) simp_all
  finally show ?thesis
    by sorry     (* PIDE reports error here - wrong *)
Qed         (* isabelle build reports error here - correct *)

end

The real error is semantic — qed finds no open goal context — and
isabelle build diagnoses it correctly. PIDE reports a spurious syntax
error at the wrong line instead. While this is not a serious bug, it
may be misleading for the end-user to receive the syntax error instead
of the real issue they should focus on.

This was found using a differential fuzzing framework that compares
error output between isabelle build and the PIDE server on mutated AFP
theories. The mutation by simp -> by sorry triggered this mismatch
in 10/10 independent runs. I am happy to share the framework and logs.

This was found as part of my dissertation work on differential testing
of the Isabelle platform.

Kindest regards,
Bartosz Glowacki
King's College London

view this post on Zulip Email Gateway (Apr 12 2026 at 19:15):

From: Makarius <makarius@sketis.net>

On 02/04/2026 19:05, Bartosz Glowacki (via cl-isabelle-users Mailing List) wrote:

I found a difference between isabelle build and the PIDE server (use_theories)
outputs when by sorry appears as the final tactic before qed in a calc-
terminated proof block:

First note that "tactic" is not a category of the Isar proof language. The
"by" element is called a "command", and its argument a "proof method", or just
"method". The "sorry" command cannot be given as argument to "by": it will not
work for structural reasons.

Anyway, back to the main points: You have just discovered the tip of the
iceberg of plenty of fundamental differences how batch-build vs. PIDE editing
works. That can be explained from historical reasons over many decades,
because quite different side-conditions apply to these two main modes of
Isabelle (in the past there were > 2 such modes).

In general, batch-compilers vs. IDE environments routinely work quite
differently, so this is to be expected --- even if it is annoying. For
example, I find the Scala plugin of IntelliJ IDEA particularly weak in this
respect: the IDE feedback is only a crude approximation of the real compiler
and its errors.

For Isabelle processing of theory and proof documents, we are better than
that, but there is plenty of room for further improvement.

The situation is going to improve once again soon, because the last remains of
the old ML and Isar toplevel command loops disappear, and this gives more
freedom in the semantics of command application (e.g. partial failure, while
retaining structural integrity of the text).

theory BugDemo
  imports "HOL-Combinatorics.Multiset_Permutations"
begin

The real error is semantic — qed finds no open goal context — and isabelle
build diagnoses it correctly. PIDE reports a spurious syntax error at the
wrong line instead. While this is not a serious bug, it may be misleading for
the end-user to receive the syntax error instead of the real issue they should
focus on.

Why not stop using this misleading "bug" word. Above you say yourself, that it
is not a serious bug.

This was found using a differential fuzzing framework that compares error
output between isabelle build and the PIDE server on mutated AFP theories. The
mutation by simp -> by sorry triggered this mismatch in 10/10 independent
runs. I am happy to share the framework and logs.

I would say the only problem on our side is that the documentation is too
implicit about the many differences of "isabelle build" vs. the PIDE editor
session.

On the other hand, everybody knows that after using Isabelle practically for a
brief time.

Makarius

view this post on Zulip Email Gateway (Apr 13 2026 at 21:36):

From: Bartosz Glowacki <cl-isabelle-users@lists.cam.ac.uk>

Dear Makarius,

Thank you for the context. I take your point that batch build and PIDE sessions work under fundamentally different side-conditions, and that differences in error reporting are expected to some degree. I will avoid calling these "bugs."

However, the by sorry case was just the entry point. The differential framework has now run hundreds of iterations of mutated AFP theories through both isabelle build and the PIDE server (use_theories), and after filtering out false positives from my own framework, several patterns remain where the PIDE user is concretely disadvantaged compared to batch build. I wanted to flag these as observations rather than defect reports (and maybe area for improvements on the server side):

  1. Server stops after first error, hides later independent failures.
    When a theory has two independent broken proofs (e.g. lines 354 and 537), isabelle build reports both, but the server reports only the first and stops. A user fixes line 354, re-checks, and is then surprised by the second failure at line 537 that was there all along. (Observed in Randomised_Social_Choice/Rankings.)

  2. Server appears to carry stale state across use_theories calls.
    Across six consecutive mutations of Risk_Free_Lending.thy, the server persistently reports line 118 regardless of where the actual error is (lines 42, 177, or 2265 in different mutations). isabelle build correctly reports the actual error location each time. This looks like the server caching a failure from an earlier invocation without fully invalidating the affected region.

  3. Server floods intermediate proof-step errors from a single root cause.
    For Median_Of_Medians_Selection.thy, a single duplicated by (auto ...) line causes isabelle build to report 3 errors, while the server reports 7 - scattering error markers across proof steps at lines 426, 445, 454, 461 that are all consequences of the one mutation. The user in jEdit sees what looks like five separate problems, and is confused, where to actually apply the fix.

  4. Consistent off-by-one line numbers.
    Across five iterations and two different theories (Roy_Floyd_Warshall.thy, another), the server reports a command error one line before where isabelle build reports it (e.g. server says line 78, build says line 79 for the same obtain command). This is reproducible and systematic — always off by exactly 1 in the same direction.

  5. Different error attribution depth in nested proofs.
    For GenClock.thy, the server reports the innermost failing qed (line 431), while isabelle build reports the outermost qed (line 455). This is arguably a design choice rather than a problem - the server's pinpointing may actually be more useful.

I understand that some of these (especially 3 and 5) may be inherent to the parallel/incremental processing model and not worth changing. And case 2 may be an artifact of how my framework calls use_theories in sequence. But cases 1 and 4 seem like places where the interactive experience could be improved without fundamental architectural changes.

I am happy to share the full framework, mutation logs, and theory files for any of these cases. I would really like to know what your opinion on this is.

Regarding my overall approach: the framework performs differential testing by taking AFP theories, applying targeted mutations (substituting proof methods, duplicating commands, introducing sorry), running them through both isabelle build and the PIDE server, and comparing the error output systematically. The goal is to find places where the two modes diverge in ways that affect the user.

Kindest regards,
Bartosz Glowacki
King's College London


From: Makarius <makarius@sketis.net>
Sent: Sunday, April 12, 2026 9:15 PM
To: Bartosz Glowacki <bartosz.gowacki@kcl.ac.uk>; cl-isabelle-users@lists.cam.ac.uk <cl-isabelle-users@lists.cam.ac.uk>
Cc: Mohammad Ahmad Abdulaziz Ali Mansour <mohammad.abdulaziz@kcl.ac.uk>; Karine Even-Mendoza <karine.even_mendoza@kcl.ac.uk>
Subject: Re: [isabelle] [ISSUE] PIDE reports wrong error line and message for by sorry before qed in calc proof

On 02/04/2026 19:05, Bartosz Glowacki (via cl-isabelle-users Mailing List) wrote:

I found a difference between isabelle build and the PIDE server (use_theories)
outputs when by sorry appears as the final tactic before qed in a calc-
terminated proof block:

First note that "tactic" is not a category of the Isar proof language. The
"by" element is called a "command", and its argument a "proof method", or just
"method". The "sorry" command cannot be given as argument to "by": it will not
work for structural reasons.

Anyway, back to the main points: You have just discovered the tip of the
iceberg of plenty of fundamental differences how batch-build vs. PIDE editing
works. That can be explained from historical reasons over many decades,
because quite different side-conditions apply to these two main modes of
Isabelle (in the past there were > 2 such modes).

In general, batch-compilers vs. IDE environments routinely work quite
differently, so this is to be expected --- even if it is annoying. For
example, I find the Scala plugin of IntelliJ IDEA particularly weak in this
respect: the IDE feedback is only a crude approximation of the real compiler
and its errors.

For Isabelle processing of theory and proof documents, we are better than
that, but there is plenty of room for further improvement.

The situation is going to improve once again soon, because the last remains of
the old ML and Isar toplevel command loops disappear, and this gives more
freedom in the semantics of command application (e.g. partial failure, while
retaining structural integrity of the text).

theory BugDemo
imports "HOL-Combinatorics.Multiset_Permutations"
begin

The real error is semantic — qed finds no open goal context — and isabelle
build diagnoses it correctly. PIDE reports a spurious syntax error at the
wrong line instead. While this is not a serious bug, it may be misleading for
the end-user to receive the syntax error instead of the real issue they should
focus on.

Why not stop using this misleading "bug" word. Above you say yourself, that it
is not a serious bug.

This was found using a differential fuzzing framework that compares error
output between isabelle build and the PIDE server on mutated AFP theories. The
mutation by simp -> by sorry triggered this mismatch in 10/10 independent
runs. I am happy to share the framework and logs.

I would say the only problem on our side is that the documentation is too
implicit about the many differences of "isabelle build" vs. the PIDE editor
session.

On the other hand, everybody knows that after using Isabelle practically for a
brief time.

Makarius


Last updated: May 02 2026 at 13:17 UTC