[isabelle] Isabelle2023-RC2: Sledgehammer prioritized ove... · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle2023-RC2: Sledgehammer prioritized ove...

Email Gateway (Jul 30 2023 at 10:17):

From: Dominique Unruh <cl-isabelle-users@lists.cam.ac.uk>
Hello,

in RC2 (and also prior RCs), when I run sledgehammer (as a command),
then Isabelle uses all available threads to do the sledgehammer, without
continuing to parse the document after the sledgehammer.

The following example (also attached as a .thy) illustrates this:

Until the sledgehammer is finished, nothing happens in the rest of the
proof.

In Isabelle 2022, the behavior was different (on the same system). Even
with a sledgehammer command still running, parsing of the rest of the
document would continue. Even methods would be evaluated (e.g., apply
simp etc.) albeit a bit slower.

The previous behavior was highly convenient: It was possible leave
sledgehammer running on some subgoals while already working on the goals
below, considerably speeding up the proof and removing idle times.

VyA8ivCdj4PBBrnA.png
Scratch.thy

Email Gateway (Jul 31 2023 at 07:36):

From: Jasmin Blanchette <jasmin.blanchette@ifi.lmu.de>
Dear Dominique,

I wasn't aware of the back-and-forth of behavior you are describing. I suspect the behavior is originating from Isabelle/PIDE, because even when the user invokes Sledgehammer with the "dont_slice" option (which instructs Sledgehammer to use only one thread), with recent development versions processing does not continue. This is the behavior I'm familiar with, but I can see why the other behavior is useful.

Have you tried the Sledgehammer panel? It should allow you to continue with your proof. (If you prefer the command to the panel, I'd be curious to hear why.)

Best,
Jasmin

Email Gateway (Jul 31 2023 at 08:13):

From: Dominique Unruh <cl-isabelle-users@lists.cam.ac.uk>
Dear Jasmin,

indeed, the panel seems to block the processing less for some reason.
(It's hard to tell whether the amount of blocking is the same as it was
in 2022, but at least processing continues.)

I have stopped using the panel at some point and started using the
command because the following workflow enables a very fast use of
sledgehammer that the panel cannot keep up with:

* I write my Isar proof steps one by one (e.g., have "..." also have
"..." also have "..." etc.)

* After each proof step, I insert a sledgehammer command (via
configured abbreviation /sl//ctrl-;/ which inserts three lines /try0
sledgehammer by -///).

* In steps that sledgehammer cannot process, I can also add some
method calls before the sledgehammer command.

The advantages of this:

* I can start several sledgehammers at once. Of course, only I have a
very powerful machine they will be processed one by one anyway, but
at least I don't need to manually trigger one when the previous
finishes as I would do with the panel.

* I can see due to the coloring which ones have finished.
* Should I notice a problem with some proof step higher up and fix
it, the sledgehammers of steps below it will restart automatically,
I don't need to restart them as I would with the panel.

* For changes that break a bunch of proof steps throughout the theory
(e.g., minor change in a definition), it is a highly efficient
process to first go through all broken steps and put the
sledgehammer command there, go away for a while, and then come back
when all is done and see which steps need manual attention.
(Important: go through the finished sledgehammers last-to-first,
otherwise all the others restart.)

It may sound a little complicated, but this workflow speeds things up a
lot for heavy sledgehammer use. It would be quite sad if the blocking
behavior of the sledgehammer command would be permanent... I understand
too little of PIDE to see how it happens.

Best wishes,
Dominique.

Email Gateway (Jul 31 2023 at 08:28):

From: Peter Lammich <lammich@in.tum.de>
I'm indeed using a very similar workflow, copying in lines of the form

subgoal sledgehammer sorry

into my apply scripts, getting myself a coffee, and then see how many
goals I still have to solve myself. Very useful on subgoals prodcued
from verification condition generators. (The ideal feature here would be
an option to sledgehammer to stop processing as soon as the first proof
has been found)

Email Gateway (Jul 31 2023 at 09:01):

From: Jasmin Blanchette <jasmin.blanchette@ifi.lmu.de>
Hi Peter,

Have you tried "sledgehammer [max_proofs = 1]"? You can also set it once and for all globally as

sledgehammer_params [max_proofs = 1]

in your theory file.

Best,
Jasmin

Email Gateway (Jul 31 2023 at 13:10):

From: Peter Lammich <lammich@in.tum.de>
Addendum:

I also observe the reported blocking behaviour in my workflow. In 2022,
Isabelle would process about four "subgoal sledgehammer sorry" lines in
parallel, while they are now processed one by one.

This is a regression against 2022 (at least for my main-usage pattern of
sledgehammer)

Also parameters like [dont_slice, max_proofs=1] have no effect any more.
In the below example, 2022 would process all subgoals in parallel, while
2023 goes through them one-by-one, utilizing only one thread.

Email Gateway (Aug 02 2023 at 10:37):

From: Dominique Unruh <cl-isabelle-users@lists.cam.ac.uk>
Dear all,

one thing that maybe helps to narrow this down: The same effect happens
with try0. It's harder to notice because try0 usually is quite fast,
but sometimes it does take several seconds. So it seems it is not
something related to sledgehammer itself.

Best wishes,
Dominique.

VyA8ivCdj4PBBrnA.png

Email Gateway (Aug 06 2023 at 13:24):

From: Makarius <makarius@sketis.net>
This sounds like another case for "hg bisect" together with isabelle/Admin/init

Without tangible evidence from the history, discussions make hardly any sense.

Makarius

Email Gateway (Aug 07 2023 at 09:20):

From: Peter Lammich <lammich@in.tum.de>
I get

The first bad revision is:
changeset:   76406:40a365360680
user:        wenzelm
date:        Wed Nov 02 11:34:24 2022 +0100
summary:     more accurate outer syntax keywords (see also
94b2690ad494): base session could be anything, e.g. ZF vs. HOL;

Note that this is the same changeset that introduced the regression with
proof state not being displayed!

Here's a summary if the regressions that have been reported, and the
first bad revisions:

print-theorems displays too much: at 76813:92a547feec88
no proof-state in Output after commands (other than lemma): 40a365360680
sledgehammer blocks all threads: 76406:40a365360680

Email Gateway (Aug 08 2023 at 16:02):

From: Makarius <makarius@sketis.net>
Yes, because 'sledgehammer' needs to be categorized as "diag" command in order
to be made parallel by default.

So the change
https://isabelle.sketis.net/repos/isabelle-release/rev/40d50936484c restores
the parallel "subgoal sledgehammer [dont_slice,max_proofs=1] sorry" of yours.

I don't know of the other fine points that have been discussion on this thread
have changed as well.

If someone wants to try the above release repository version and finds
remaining regressions from Isabelle2022, then "hg bisect" will greatly help to
sort it out.

(There are < 48h left to produce Isabelle2023-RC3 and I am still busy in other
corners.)

Makarius

Email Gateway (Aug 11 2023 at 15:32):

From: Dominique Unruh <cl-isabelle-users@lists.cam.ac.uk>
Hello,

I can confirm that the problem disappeared for me in RC3. :)

Best wishes,
Dominique.

Last updated: Jul 26 2025 at 12:43 UTC