Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Sledgehammer and Nitpick from the command line?


view this post on Zulip Email Gateway (Aug 19 2022 at 07:54):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I have a theorem that I want to test with Sledgehammer and Nitpick by
calling isabelle from the command line.

For the Sledgehammer test, it's in a theory like this:

theory z0
imports z0i
begin
sledgehammer_params[verbose=true,isar_proof=true,timeout=10000000000,provers="
smt e spass z3_tptp z3
"]
theorem really_important_result_neg: "~(the Riemann conjecture is true)"
apply(auto)
sledgehammer
oops
end

Is there a way to do this so that I get to see the Sledgehammer output?

I've broken a theory with 4 different tests into 4 different theories
with one test each using Sledgehammer or Nitpick.

If they're all in one theory, then I have to run them sequentially. If I
have 4 different theories, and I want to run them all at the same time,
then I have to have 5 jEdit views open.

I probably don't want to be running 4 tests at the same time, but
there's a good chance I at least want to run 2 at a time. Tests being
what they are, I want to terminate them sometimes. Starting and
terminating processes can mess up the one jEdit session that's managing
all the views.

I could start up a completely independent jEdit session, other than
they're not completely independent because they'll use the same home
folder. Well, I could set the environment so they use different home
folders.

Anyway, what I really want to do is run them all from the command line.
If I'm testing the negation of a theorem that's been proved, then I'm
not expecting Sledgehammer to prove the negation. I could start a test
in a console window with an 8 hour timeout, and then work in jEdit
independently, where I tend to start and shutdown jEdit quite a bit.

I looked at "isabelle tty" briefly, but that didn't look like the
solution, though it may be. I searched in the Sledgehammer manual on
"command", and I didn't find anything.

Thanks,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:54):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Being able to run Sledgehammer and Nitpick from batch/bash files would
create lots of convenient possibilities.

It's not just about testing what you've proved. If had a new theorem I
was about to start working on, then I could run a 30 second Sledgehammer
and Nitpick test on both the theorem and its negation, which would be
the 4 tests. I could then start 1 or 2 longer tests before starting my
attempt to prove the theorem.

It would also conveniently allow me to run two different groups of
provers independently on the same theorem in different console windows,
the two groups being the local provers and the remote provers. It's good
to get Sledgehammer to terminate normally, but you don't know when the
remote provers are going to timeout.

I'm about to start an overnight experiment with the timeout set to 4
hours, but I want predictability, so I'm not going to put any of the
remote provers in. I'd need to have two independent jEdit sessions
running, like I talk about below, to not end up with a situation where
the remote provers never come back, and Sledgehammer is still running 8
hours later.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:56):

From: Jasmin Blanchette <jasmin.blanchette@gmail.com>
Hi Gottfried,

The "system.pdf" manual explains how to process Isabelle theories on the command line. I never remember the exact syntaxes and have to look it up each time, but look for the "isabelle usedir" tool. (There might be other tools as well.) This is nothing specific to Sledgehammer.

Regards,

Jasmin

view this post on Zulip Email Gateway (Aug 19 2022 at 07:57):

From: Makarius <makarius@sketis.net>
Long-running IDE sessions do not make much sense. The tty or unattended
batch script are the right approach for that.

You need to take care that Isabelle/Scala/jEdit present a Unicode
rendering of the prover sources to you, so copy-paste into a plain
"isabelle tty" session will not work. You can use "Reload with Encoding"
in jEdit to use UTF8 instead of the default UTF8-Isabelle -- it will
reveal the true form of the prover in put sources that can then be copied
into the raw process. The latter might still cause additional problems on
Windows. It might require some experimentation with the Cygwin-Terminal
to see how it works.

Makarius

view this post on Zulip Email Gateway (Aug 19 2022 at 07:57):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
(I sent this email in earlier today, but it never showed up on the list.
The other had an attached zip file containing my experiment files, so
that might have been why it got rejected.)

I tried that, but I'm doing something in a theory that you wouldn't
normally do, which is leaving a sledgehammer command under a theorem,
and wanting usedir to give me sledgehammer feedback like I get in jEdit.
This would probably be a feature request, which I'm sure you don't have
time for.

I'll try using i3p: http://pu.inf.uni-tuebingen.de/i3p/

If I start sledgehammer in jEdit with a long timeout, I have to be
careful not to disturb jEdit by scrolling around. Otherwise, it'll start
sledgehammer again, new prover processes will get started, and old
prover processes won't get terminated.

Here are some details if you have time. I can get a decent solution with
either i3p, Proof General, or jEdit. All of them have some GUI overhead,
and overhead from having multiple sessions running, but that's the cost
of doing business. Part of the solution is to run these tests on a
different computer.

DETAILS

If I use "usedir" on a theory which has a sledgehammer command in it, an
eprover.exe process gets started (sometimes) that never gets terminated,
and usedir gives some warning messages even though it finishes building
the HTML, GraphBrowser.jar, and session.graph in the browser_info folder.

Further experimenting, I find out that what happens is not always the
same. I'm using sledgehammer in a way it's not meant to be used, so
what's happening is meaningless to some degree.

With five provers in, I get five warning messages like on the second
line below:

Building z0 ...
sh:
/cygdrive/e/Isabelle2012/contrib/exec_process-1.0/x86-cygwin/exec_process:
Bad address

I take two provers out, and an exec_process.exe process gets started
which never gets terminated. With sledgehammer only running "e" and
"spass", an "eprover.exe" process gets started that never gets terminated.

With no sledgehammer command in, usedir work as expected.

Here is the theory with sledgehammer:

theory z0
imports z0i
begin
sledgehammer_params[verbose=true,isar_proof=false,timeout=120,provers="
z3 e spass z3_tptp cvc3
"]
theorem emS_is_unique_Sn:
"~(\<forall>u. (\<forall>x. x nIN u) \<longrightarrow> u = emS)"
apply(auto)
sledgehammer
oops
end

Here is the file it imports:

theory z0i
imports Main
begin
typedecl --"The primitive set type: everything is a set."
sT
consts --"Set membership predicate: to be axiomatized by subsequent axioms."
inS::"sT \<Rightarrow> sT \<Rightarrow> bool" (infixl "IN" 55)
abbreviation
ninS::"sT \<Rightarrow> sT \<Rightarrow> bool" (infixl "nIN" 55)
where "x nIN y == \<not>(x IN y)"
consts --"The empty set: naively named as a constant."
emS::sT
axiomatization --"Axiom of existence: the empty set is empty." where
eeA:"\<forall>x. \<not>(x IN emS)"
axiomatization --"Axiom of extention: set equality." where
exA: "\<forall>u. \<forall>v. (\<forall>x. x IN u
\<longleftrightarrow> x IN v) \<longrightarrow> u = v"
theorem emS_is_unique --"The empty set is unique.":
"\<forall>u. (\<forall>x. x nIN u) \<longrightarrow> u = emS"
by (metis eeA exA)
end

Here is ROOT.ML

(*
no_document use_thy "ThisTheory";
use_thy "ThatTheory";
*)
use_thy "z0";

Here is the usedir command:

isabelle usedir -b HOL z0

Here are my usedir etc/settings

ISABELLE_USEDIR_OPTIONS="-i true -g true -M max -p 1 -q 2 -v true
-V outline=/proof,/ML"

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:57):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Okay. I'll keep that in mind, though it's not obvious to me right now
how I'm supposed to use that information to create a batch/bash file to
automate everything.

My idea right now is to use i3p. I used it a little in the past, and it
seems to be more of a light weight IDE. I can call it directly with a
batch file, unlike Proof General under Cygwin. I would have to manually
start the proving with i3p, but that's okay.

If I really get serious about testing theorems then I'll set up a true
Linux install on a quad core desktop that I'm not using.

Thanks,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 07:57):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I think using a second session of jEdit with the environment set to a
different home folder is the best solution for what I'm trying to do.
Running things with a batch file sounds good as an idea, but there's a
lot of interaction between jEdit and Sledgehammer that would be hard to
duplicate.

The main way I keep in control is by entering "sledgehammer" when I want
it to start, and otherwise, making sure the "sledgehammer" command isn't
in the file. Sledgehammer is very eager to start running.

I have to make sure "sledgehammer" isn't in before trying to edit the
file in any way. If I try to edit the timeout value with "sledgehammer"
in, it'll jump down to the sledgehammer step after a single keystroke.

Also, if I delete "sledgehammer" while the provers are still running, it
seems to be good at shutting them down. I'll see a process such as
"eprover.exe" taking up 25% of the CPU processing, and when I delete
"sledgehammer", it'll kill the process.

I start a second copy of jEdit with a different home folder, so it
doesn't write over my main jEdit settings, and then I split the screen
vertically four or more times with "View / Splitting / Split
Vertically". I haven't gotten to the point where I try to run
sledgehammer in 4 files and nitpick in 2 files all at the same time.
That will require 7 split windows.

I haven't figured out how independent the continuous prover allows
sledgehammer to be when running in two different files. I might need to
start up 3 or more copies of jEdit

I'm trying to see if I can compile the prover satallax for Cygwin.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 08:01):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Jens,

At least with Windows, installing Java is near effortless. In trying to
get Suse, Fedora, and Ubuntu going under VirtualBox, I thought I was
going to have to install Java, and it's not effortless because Sun Java
is never part of the install these days, and they replace Sun Java with
openSDK Java. * JAVA INCLUDED WITH BOTH CYGWIN & LINUX*

Last year I ran Isabelle2011 under Ubuntu, and I had to install Sun Java
to do it, which sudo makes reasonably easy, but you have to hunt down a
site for sudo to get it from, and such a site is never going to be an
official download site, so installing Sun Java on Linux can become a
security risk.

Somewhere in the process of installing Linux distributions this year,
the thought occurred to me, "Hmmm, I wonder if Makarius put the JVM
runtime under contrib for the linux distribution." I opened up
Isabelle2012_bundle_x86_64-linux.tar.gz, and there I saw the folder
jdk-6u31_x86_64-linux. Thanks Makarius for no installation necessary on
the big components, like Cygwin and Java, which the user used to have to
intall, Cygwin not being straightforward at all, and Sun Java on Linux
not being straightforward.

One thing that used to concern me was Java constantly updating itself.
I'd think, "I better keep these old Windows Java install files because a
Java update could break Isabelle." Now the Java runtime version gets
fixed for a particular version of Isabelle.

NORTON QUIT DELETING ISABELLE.EXE

The thread "Windows Command Line 2012" brought my attention back to
"Isabelle.exe" which is now run to start Isabelle.

I thought maybe I had messed up my Isabelle2012 install, so I
reinstalled it again, and found out that Norton now finds "Isabelle.exe"
acceptable.

One of Norton's scan methods is "reputation". So if Norton thinks a
program is acting suspicious, and the program has no reputation because
it's a new release, then it quarantines the program. Norton File Insight
now tells me about "Isabelle.exe":

1) Very Few Users: Fewer than 5 users in the Norton Community have used
this file.
2) Mature: This file was released 1 month ago.
3) Good: Norton has given this file a favorable ratting.

At the time Norton was deleting Isabelle.exe, the rating was "unknown"
because Norton knew nothing about the "Isabelle.exe".

OTHER ANTIVIRUS INFORMATION

I could use free antivirus software, but Norton has protected me for
several years, so I go to a local chain store and pay $40 for a 3
computer license of Norton 360 instead of paying $59 or $89 on their
site. Plus, I think they have one of the better firewalls.

Developers playing tricks to get Linux/Unix programs to run on Windows
take a chance on getting them blocked by antivirus programs. I've never
had Norton ever complain about Cygwin, but it's a native or semi-native
application of Windows, based on what I know about it.

The "alt_ergo" executable "alt-ergo-0.94-mingw.exe" got rejected by
Norton, so minGW executables can get the reject.

A couple of months ago, Norton deleted the install file for the Coq beta
version, which I was trying to use in conjunction with a type theory
book. I unquarantined the Coq installer and installed it, then when
experimenting with it, I selected "compile" from some menu and Norton
shut down the compile executable and deleted it. I figured I didn't need
to be doing taht anyway, so I uninstalled Coq, to wait for another day.

Anyway, a person intent on using a program can tell the antivirus
program to unblock it, but people trying out a new program will reject
it if it ends up being too much trouble.

COMPILING PROVERS FOR CYGWIN and E's SLOW STARTUP

The reason I reinstalled Isabelle2012 was because I had been seeing
eprover.exe show up as a process, but then it stopped showing up.

It turns out that eprover.exe takes a little over 35 seconds to start up
on Cygwin. I had switched to using a timeout of 30 seconds, and I had
installed gcc and OCaml for Cygwin, so I thought I might have messed up
the Cygwin under the contrib folder, but that wasn't the case

I successfully compiled leo2, satallax, why3, and alt-ergo for Cygwin.

Leo2 built with no hassles. I had problems with the first try of
satallax, but I retried with new, untarred sources, and it magically
built per the instructions.

For alt-ergo, I had to build why3. Why3 got some kind of error at the
end, but a "make install" still seemed to make it work good enough.

There's a minGW exe for alt-ergo, but I rebuilt it because when I run
alt-ergo with the minGW exe, I get this error:

"alt_ergo": A prover error occurred:
Unknown input format: tff1

To build a Cygwin binary for alt-ergo, I had to build "ocamlgraph", and
the key to success there was not building the GUI, and to get the
correct make file, I had to uninstall the GTK library that was causing
problems.

I get the same error with the Cygwin binary, so after reading the
description of alt-ergo, I assume alt-ergo does some specialized proving.

VAMPIRE 0.6 FOR CYGWIN, INSTALLING LINUX TO GET VAMPIRE

Trying to run Vampire for Cygwin, I get this error:

"vampire": A prover error occurred:
User error: forced_options is not a valid option

The Cygwin version is 0.6, and I assume the Linux version is 1.8.

The latest version of Vampire is the local prover that I can't on
Cygwin. It seems a shame to have to have to install Linux to get it,
when I have all the other local provers, but Vampire appears to be one
of the best for first-order logic:

http://www.cs.miami.edu/~tptp/CASC/J6/WWWFiles/DivisionSummary1.html
<http://www.cs.miami.edu/%7Etptp/CASC/J6/WWWFiles/DivisionSummary1.html>

I got that link off of the satallax web page:

http://www.ps.uni-saarland.de/~cebrown/satallax/
<http://www.ps.uni-saarland.de/%7Ecebrown/satallax/>

Satallax seems to be at a distinct disadvantage when competing in prover
competitions with Sledgehammer, considering that Sledgehammer uses
Satallax, in addition to many other provers.

CREATING 10 FILES TO TEST ONE THEOREM

I finish up my "the user reports back from the field on using
Sledgehammer and Nitpick to test theorems".

First, when using Sledgehammer and Nitpick in jEdit, where
"sledgehammer" (or nitpick) is the only proof command in a file, it
helps to know that there are only three states:

1) Sledgehammer is running because a "sledgehammer" command exists in
the file, and Sledgehammer hasn't finished.

2) Sledgehammer has finished running, the results are in the output
panel, and nothing has been done to the jEdit session to make
Sledgehammer start over, where Sledgehammer starting over will erase the
old results. "Anything" could be editing the file in any way, or working
in another file that's also visible in the same jEdit process. The
safest way to keep Sledgehammer from starting over is to only scroll
through the Sledgehammer results until you know you don't want them
anymore. That can be a problem when Sledgehammer is constantly updating
its results, but I foresee only running Sledgehammer for hours and hours
when no results are being found.

3) Sledgehammer is prevented from running because a "sledgehammer"
command doesn't exist in the file, where deleting a "sledgehammer"
command will cause the continuous prover to terminate the prover
processes that are running. (Deleting running sledgehammer commands
before closing a file or closing jEdit is the best thing to do.)

There are commands in the Sledgehammer manual, such as "sledgehammer
running_provers", but that doesn't apply to jEdit that I can tell. If I
try to run such a command, sledgehammer immediately starts over. I
actually use this eagerness of sledgehammer to start as a way to use it
like a batch file.

Okay, so to test one theorem and its negation, I create 10 different
files that all import their own copy of the main THY file. These 10
files get loaded into 10 different jEdit sessions, where I use the
switch "-j -settings=/.isabelle/%ISAVERSION%/jedit%1" to tell jEdit
where to store the settings, which require me to copy my main home jEdit
folder to 10 different folders, such as
/users/me/.isabelle/Isabelle2012/jedit1.

I break the test out into 10 files because I want to learn how the
different provers work, want to close jEdit for provers which have
finished, and I want to terminate and start provers without causing
other provers to start over.

Right now, I'm grouping them like this:

(1 e spass_new remote_vampire remote_e_sine remote_iprover_eq)
(*2 z3_tptp spass metis vampire remote_e_tofof remote_iprover
remote_snark*)
(3 z3 yices cvc3 smt alt_ergo remote_waldmeister)
(4 satallax leo2)
(*5 remote_leo2 remote_satallax remote_e remote_z3_tptp remote_cvc3
remote_z3*)

Groups 1 and 2 are first-order provers, and I break them up because "e"
is going to run as long as I let it.

Group 3 is mainly the SMT provers.

Group 4 is the higher-order provers, and they'll run as long as I let
them. Group 5 is the remote duplicates.

My WinEdt script can create some or all of the 10 files, and then for
each file created, it calls a new process of jEdit using a corresponding
settings folder. Sledgehammer and nitpick start automatically if a
"sledgehammer" or "nitpick" command is in the file. I arrange the 10
windows where I want them, and jEdit will remember the settings.

I set the default for both Sledgehammer and Nitpick at 48 hours. The
idea is that I can always terminate by deleting the sledgehammer or
nitpick command, but it will always take me, say, 10 minutes to pick up
where a test left off if it finished after 10 minutes.

Not knowing how the provers work, I assume the first x minutes for a
prover is always the same. So a prover run for 1 hour twice is only a 1
hour test, where a 2 hour test has done something different the second
hour than the first.

As it turns out, m
[message truncated]


Last updated: Apr 25 2024 at 04:18 UTC