Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Automating the auto-tools


view this post on Zulip Email Gateway (Aug 19 2022 at 07:47):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I'm in jEdit, so I first open up a second view of jEdit with "View/New
View".

I have a theorem:

theorem emS_is_unique1:
"!u. (!x. ~(x IN u)) --> u EQ emS"

I highlight it, and run my script. It creates a new file which imports
my main file, like this:

theory sTs_Simp_X
imports sTs_Simp
begin

theorem emS_is_unique1_S:
"!u. (!x. ~(x IN u)) --> u EQ emS"
apply(auto) sledgehammer
nitpick[sat_solver=MiniSat_JNI]
nitpick[sat_solver=SAT4J]
oops

theorem emS_is_unique1_Sn:
"~(!u. (!x. ~(x IN u)) --> u EQ emS)"
apply(auto) sledgehammer
nitpick[sat_solver=MiniSat_JNI]
nitpick[sat_solver=SAT4J]
oops
end

It's importing all my nitpick and sledgehammer settings from the main file.

It runs sledgehammer and nitpick on the theorem and its negation. It
takes a while, that's why I have the two jEdit views. I watch sTs_Simp_X
in one jEdit view and can work on sTs_Simp in the other.

The continuous prover works on sTs_Simp_X while I'm working on sTs_Simp.
Of course, if I do a save on my main file, the auto-provers start over
in sTs_Simp_X, which is why I try to leave it alone, but that's hard to
do. I want to click around in sTs_Simp_X to see if anything is
happening. I reduce the 20 provers down to e, spass, and z3, to speed
things up; I'll probably put more back in.

I haven't completely figured out how the continuous prover works. In a
big file, it sometimes seems to go into a dormant state if I don't go
all the way to the bottom and click on the line where the "end"
statement is.

I would attach the script, but it's done in a scripting language for
WinEdt. jEdit has a scripting language, but I don't know how to use it
except for very simple macros for key sequences.

I now return to the drudgery of tutorials and exercises.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:47):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
You'd actually want this:

nitpick[sat_solver=MiniSat_JNI]
nitpick[sat_solver=SAT4J]
apply(auto)
sledgehammer

Instead of this:

apply(auto)
sledgehammer
nitpick[sat_solver=MiniSat_JNI]
nitpick[sat_solver=SAT4J]

Since nitpick would only work on the subgoal after the apply(auto).

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:48):

From: Lars Noschinski <noschinl@in.tum.de>
It should process more or less to the point where you are looking at in
the editor. This saves a lot of computing effort, when you are editing
something in the middle of a big theory -- otherwise the cpu would
constantly be spinning, heating up your computer and wasting your
battery power.

-- Lars

view this post on Zulip Email Gateway (Aug 19 2022 at 07:48):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Okay. I guess the general rule is that if some code can be seen in a
jEdit view, then the continuous prover will work on the file up to that
point.

To belabor the point, I suppose if I had 10 jEdit views open of 10
different .thy files, the continuous prover would try to work on all of
them up to the code that could be seen in each view.

That's how it appears. I re-generate the tester file from the theorem. I
click back on the jEdit view which has my main file. jEdit asks me if I
want to reload the re-generated file. I say yes, and the re-generated
tester file gets reloaded in both jEdit windows, and the continuous
prover starts working away on the view that shows it. All of the code
for the tester file can be seen, so the continuous prover works until it
gets to the end.

I broke the two tester theorems into four: Sledgehammer on the theorem.
Nitpick on the theorem. Nitpick on the negation. Sledgehammer on the
negation.

LEARNING BY ACCIDENT WITH AUTO-TOOLS

By accident, I found out that importing my complete file will allow
Sledgehammer and Nitpick to find things they otherwise couldn't find,
which makes sense to do for the purposes of trying to find inconsistencies.

Being back in my original file, rather than a simplified version for
experimenting, I had the incorrect axiom uncommented out,

(x IN u <-> x IN v) --> u = v,

instead of the commented correct version,

!u. !v. (!x. x IN u <-> x IN v) --> u = v

In my main file, I have the theorem stated before another axiom because
I don't want the axiom to be used to prove the theorem.

Well, in my tester file, where the negation of the theorem is being
tested by Nitpick and Sledgehammer, spass and e found proofs for the
negation using my additional axiom, though yices and Z3 didn't.

Lesson? Axioms. You don't want them, which has been addressed. If you
use them, be afraid. Very afraid.

Additional lesson? I want to put back in some of those provers to see
who can find the inconsistency and who can't.

The prover e tells me:

"The prover found a type-unsound proof ... even though a supposedly
type-sound encoding was used (or, less likely, your axioms are
inconsistent). Please report this to the Isabelle developers."

We understand that Jasmin or another develop had his peers in mind when
he said, "or, less likely, your axioms are inconsistent".

I choose the spass metis theorem. After inserting it, in the output
window, it tells me "Metis: the assumptions are inconsistent".

I comment out the wrong formula and un-comment the right formula. The
theorem is proved, and nothing else shows up wrong at this point.

Sledgehammer or Nitpick completing their attempts is slowest when I try
to disprove what's true, and try to prove what's not true.

Of course, I have the timeouts for both set to 120 seconds.

I need to experiment on Linux under VirtualBox to see what kind of
performance I get. I dread setting it all up, and that kind of setup
isn't totally trouble free either.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:48):

From: Makarius <makarius@sketis.net>
Yes. It also means that a lot of editor views on different spots in some
library can cost a lot of performance for continous checking.

You can also control the range of checking a little bit: the Prover
Session panel has a "Check" button, which tells to prover to check the
whole of the current buffer, until you "Cancel" or recommence normal
editing.

Makarius

view this post on Zulip Email Gateway (Aug 19 2022 at 07:49):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
With a 480 second timeout, these are provers which found the
inconsistency due to the trivial mistake in the axiom:

e
metis
spass
spass_new
z3_tptp
remote_e_sine
remote_iprover
remote_iprover_eq
remote_leo2
remote_satallax
remote_vampire

Of note is that z3_tptp found it but not z3. Note to self: Leave z3_tptp in.

Provers that attempted it:

metis smt dummy_thf e spass spass_new spass_old
z3_tptp cvc3 yices z3 remote_e_sine remote_e_tofof
remote_iprover remote_iprover_eq remote_leo2 remote_satallax
remote_snark remote_vampire remote_waldmeister

Note to self: Put the remote provers in that were successful. They
shouldn't add too much of a burden on my CPU.

With a 480 second timeout, the right axiom in, and me waiting much
longer than 480 seconds, sledgehammer didn't find a proof for the
negated theorem.

I ran with a 480 second timeout, with these provers:

e metis spass spass_new z3 z3_tptp remote_e_sine remote_iprover
remote_iprover_eq remote_leo2 remote_satallax remote_vampire

At 480 seconds, e, metis, spass, and spass_new timed out. They had all
been sucking up 25% of my CPU processing. At that point z3 kicked in
using 25% of my CPU until it got an error after 2 minutes. Sledgehammer
was still running, so I figured it was waiting on the remote provers to
signal that they had timed out after 480 seconds of processing, which,
it appears, was going to be spread out over however many minutes it
would take for me to get my 480 seconds of processing, the remote
servers also trying to prove theorems for the rest of the world, trying
in vain, at least for the negation of mine, I hope.

And those prover processes don't terminate very well. You have to help
them out, and kill them sometimes. For Windows, it helps to have the
process explorer utility:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

However, killing them seems to cause problems, so I think it's best to
let them terminate normally.

I learned something important here. Every prover's going to get its 480
seconds to try and do something.

But every time I do a keystroke in a file that the tester file is
importing, the continuous prover starts over, so I need to change my
script to accommodate for that.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:49):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I'm not editing big files yet, although they've become big enough that I
try not to do a save on a file in my other editor, or jEdit will reload
the whole thing and prove it all from the top.

I'll try to remember the check button, but I forsee building up theories
hierarchically so I can build the heap for what I'm doing. For anything
that's in the heap, speed's not a problem, as far as I can tell. Speed
problems for sessions in src/HOL went away after I built the heap for them.

The way you have things working is not a problem with me. I've started
to use 2 views as my standard setup. The edits stay in sync, and the
prover for the two views is the same prover.

Any keystroke in a file that my test file imports causes the prover to
start over in that file, but I now create a copy of my working file for
my tester file to import, so that doesn't happen.

The continuous prover works good for me. I don't have any longing for
manually stepping through a theory like in PG, but I have decently fast
computer. It's not extraordinarily fast. It's an i3 with 16GB of ram.

Regards,
GB


Last updated: Apr 25 2024 at 01:08 UTC