From: Jasmin Blanchette via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Dear all,
The "try0" and "try" commands were introduced to give an easy way to try out various tactics and tools. "try0" only applies a selection of tactics, whereas "try" additionally runs "solve_direct", Quickcheck, and reduced versions of Nitpick and Sledgehammer.
I know people use "try0", including some experts, but I'm trying to find out whether "try" is used. The feedback I have about it, as well as my own impression, is that it just uses too much CPU all at once to be useful, making the machine overheat. Also, because Nitpick and Sledgehammer are not the real thing, after a failed "try" one might still want to run them (esp. Sledgehammer), so "try" is not helping much. Finally, "solve_direct" and Auto Quickcheck are run by default, and you can also enable Auto Nitpick (which I've done for years), so in the end, "try" is really just "try0 + reduced Sledgehammer + lots of heat".
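For concreteness, a minimal interaction looks roughly like this (the printed suggestion will of course vary with the goal):
lemma "rev (rev xs) = xs"
  try0   (* runs a fixed selection of proof methods; prints e.g. "Try this: by simp" *)
  try    (* same selection plus solve_direct, Quickcheck, and reduced Nitpick/Sledgehammer *)
  by simp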
My proposal:
Remove the "try" command.
Rename "try0" to "try".
(Maybe?) Enable Auto Nitpick.
(Auto Nitpick is a reduced version of Nitpick that doesn't take that much CPU and won't run for more than 5 s. It's useful on nonexecutable conjectures like "hd [] = 0" and other problems with an axiomatic flavor.)
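(That conjecture in context, as a sketch; Auto Nitpick corresponds to the Isabelle/jEdit option "auto_nitpick":
lemma "hd [] = (0::nat)"
  nitpick  (* reports a countermodel, since "hd []" is underspecified *)
  oops
)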
Any opinions?
Jasmin
From: Peter Lammich <lammich@in.tum.de>
Hi,
I'm using try0 very often. I only use nitpick if I don't trust my
lemma.
My typical sequence for finding a proof of a lemma that is simple
enough that I can hope for a few-line apply script goes as
follows:
unfolding relevant-defs
auto simp: relevant-simps
ALLGOALS (try0 or sledgehammer)
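Concretely, a toy rendering of that sequence (the definition, simp rule, and goal here are only stand-ins for the actual development):
definition rev2 :: "'a list ⇒ 'a list" where "rev2 xs = rev (rev xs)"

lemma "rev2 xs = xs"
  unfolding rev2_def                 (* 1. unfold the relevant definitions *)
  apply (auto simp: rev_rev_ident)   (* 2. auto with the relevant simp rules *)
  (* 3. any subgoals that remain: try0 or sledgehammer;
        here auto already closes the goal *)
  done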
The feature I miss most here would be a sledgehammer configuration
that saves resources while focusing on proof finding, i.e., when the
first prover has found a proof, suspend the other provers and try proof
reconstruction. Once proof reconstruction succeeds, terminate and
don't use up any more resources.
This configuration would be beneficial when sledgehammer is run on many
goals in parallel, in my case, the many VCs that a verification
condition generator has left over.
For better usability, it could be combined with a sledgehammer
(allgoals) option, to prevent the current
subgoal sledgehammer sorry
subgoal sledgehammer sorry
...
subgoal sledgehammer sorry
boilerplate.
I already requested this feature two years ago,
https://lists.cam.ac.uk/mailman/htdig/cl-isabelle-users/2018-February/msg00043.html
but only got a proposed solution that worked on an unofficial branch of
Isabelle.
From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
I use "try" routinely and find it essential. Here are a few notes:
(1) I find it much more convenient to type "try" directly into the input while trying
to sketch out a proof development than to use the separate Sledgehammer interface.
(To me, the latter seems more useful for searching for perhaps better or faster
proofs without disturbing the existing text.)
I use "try" during proof development to try to see if a proof step can be quickly
verified as true, without necessarily having to tediously decompose it and write
out all the steps. I am willing to wait a little bit for this, though it tends to
come with some risk of overloading the prover (more on this below). Many times,
a proof by "smt" is produced, which I keep temporarily and then later improve,
either by again using "try" or sledgehammer to find a better proof (which is often
possible later thanks to additional facts added while fleshing out the theory),
or by explicitly expanding the proof to eliminate the use of "smt".
(2) "try" is extremely helpful for becoming familiar with the behavior of the various
proof methods, and with theories one has not used before. Of course, one must
not expect it to do magic, even though sometimes it seems like it does.
(3) "try" will find proofs that "try0" misses, due to apparent time limits in "try0".
For example, "try0" will typically not find find proofs by "force" or "fastforce"
that take ten seconds or more, even if all the necessary facts have been given.
The limits seem to be removed in "try", which allows the long proofs to be found.
(4) There appear to be some strange anomalies in "try" regarding the scheduling of the
various proof methods it tries. One would expect that "try" would run the various
methods concurrently as much as possible. However, in some situations, "try" can
run indefinitely without finding a proof, whereas "try0" will immediately report
a proof by "force" or "fastforce" that takes only a few tens of milliseconds.
(5) "Heat" while using "try" seems to be related less to CPU usage than it does to memory
usage. Using "try" can quickly blow up the prover heap and put the computer into
thrashing conditions, unless suitable limits have been imposed when invoking Isabelle.
The desktop that I generally use for proof development has 24GB of RAM and runs Ubuntu.
I have typically limited the ML heap to 8GB and the Java heap to 4GB and this generally
allows me to do other things like browse the web and read email along with running
Isabelle, without entering a thrashing situation. Lately I have found it necessary
to increase the ML heap limit to 12GB to avoid excessive garbage collection,
and this will produce thrashing sometimes, especially in conjunction with the ridiculous
memory consumption of Firefox and Thunderbird.
Note that "try" can be less than useless on a memory-poor system. Although I was
successfully using a 6GB laptop for some time, I always had to be careful about using "try"
with it, which put a significant hitch into my normal development routine.
About two years ago I found that I was no longer able to do anything useful on
that platform. At the moment, 16GB is still a useful configuration, but barely.
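One way to impose such limits (a sketch, assuming the standard settings-file mechanism and the usual ML_OPTIONS / JEDIT_JAVA_OPTIONS variables; the values mirror the figures above):
# $ISABELLE_HOME_USER/etc/settings
ML_OPTIONS="--maxheap 8G"                      # Poly/ML heap limit
JEDIT_JAVA_OPTIONS="-Xms512m -Xmx4g -Xss16m"   # Java heap for Isabelle/jEdit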
(6) I believe there to be issues in the infrastructure regarding the cancellation of a
"try" that has already been started. With a fair amount of frequency, if you delete
a running "try" from the document, the search will not be cancelled immediately,
but rather Poly will run for arbitrarily long amounts of time at high CPU,
apparently until the originally scheduled tasks have completed. In previous
versions of Isabelle, I have observed an apparent blow-up in the scheduling of
future tasks that can produce a backlog that can take many minutes to clear, though
this seems to have improved in recent Isabelle versions.
It is difficult and frustrating to try to understand what is going on during periods
of overload, because the "Isabelle monitor" does not get any data from Poly during
precisely the periods of time when you want to know what is happening.
So all you see is happy, normal behavior: computing is going on, garbage collection
is taking place regularly and not too frequently, and the number of queued tasks is
stable; then, when an overload sets in, the display stops for the duration.
When the overload clears, you just get a straight line drawn from the conditions before
to the conditions after, so you learn nothing.
(7) When the prover seems to have been spammed as a result of "try", I often have to make a
guess about whether it will be faster to wait for the overload to clear, or to just
restart Isabelle and re-check the context from the beginning.
My conclusion: I find "try" to be extremely useful, and would be very unhappy to see it go away.
However, I suspect that fixes to some of the issues I have described above would make it less
touchy to use.
- Gene Stark
From: "John F. Hughes" <jfh@cs.brown.edu>
As a relative novice, I use "try" a lot, especially for Eugene Stark's
reason (2): it teaches me what the various solvers are good at. I also happen
to like "nitpick", not so much because it finds errors in my theorems, but
because it finds errors in the way I've stated my theorems; the way we
write things in Isabelle is rather a lot like the way we write things in
mathematics, but not completely, and Nitpick often warns me that I've
tumbled into another of the rough spots.
Another note in praise of nitpick: in doing some work in synthetic affine
and projective geometry, I wanted to exhibit a four-point affine plane. To
do so required writing out the incidence relation (which points are on
which lines), which was a case-by-case nightmare to write and read. But
when instead I asked "try" to prove the theorem that every affine plane
contains at least 5 points, nitpick instantly generated the standard
4-point affine plane for me. That's kind of a backwards approach, but it
was, as I say, very useful.
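The general trick, stripped of the geometry (a generic sketch, not the actual affine-plane development): to obtain a small model, claim that every structure is bigger than it is and read off nitpick's counterexample.
lemma "∃x1 x2 x3 x4 x5 :: 'a. distinct [x1, x2, x3, x4, x5]"
  nitpick  (* counterexample: a type 'a too small to hold 5 distinct elements *)
  oops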
From: Lawrence Paulson <lp15@cam.ac.uk>
This is amazing and shows that we are nearly able to do experimental mathematics.
In fact, I have used nitpick in some tricky combinatorial proofs to check corner cases. It was very effective.
Larry Paulson
From: Manuel Eberl <eberlm@in.tum.de>
For my Master's Thesis (compiling functional programs to density
functions) I effectively had to implement a fragment of typed Lambda
calculus and prove all kinds of rules about typing and substitution [1].
These rules are somewhat intricate, and I effectively proved them by
coming up with a plausible-looking conjecture and then tweaking it until
QuickCheck stopped complaining.
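In miniature, the workflow was (with a made-up conjecture, not one from the thesis):
lemma "rev (xs @ ys) = rev xs @ rev ys"
  quickcheck  (* counterexample, e.g. xs = [a1], ys = [a2] *)
  oops

lemma "rev (xs @ ys) = rev ys @ rev xs"
  by simp     (* the tweaked conjecture survives quickcheck and is provable *)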
I don't think I've used Nitpick that much, perhaps because I have less
intuition about what kinds of things one can expect it to work on well.
With QuickCheck, I have a very clear idea of what kinds of propositions
it can and cannot do.
Speaking of "experimental mathematics" in general, I converted a big
unreadable SMT proof from the area of social choice theory (that was
found by other people) into a structured proof by experimentation in
Isabelle a few years ago [2]: I had a big list of facts that somehow
implied False together (and "by smt" could show it), but no intuitive
idea /why/ they implied False.
I then picked two or three of these facts and used "auto" to find out
what they boil down to together, deleted them from the list, added
that consequence instead, and checked whether the list was still
unsatisfiable.
Rinse and repeat until the list is a singleton and you have a structured
proof!
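In miniature (an artificial stand-in for the actual social-choice facts):
lemma
  assumes A: "x < (5::int)" and B: "y < 5" and C: "x + y = z" and D: "z ≥ 10"
  shows False
proof -
  (* combine two facts into what they boil down to together ... *)
  from A B have AB: "x + y < 10" by linarith
  (* ... and check that the condensed list AB, C, D is still contradictory *)
  from AB C D show False by linarith
qed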
Manuel
[1]:
https://www.isa-afp.org/browser_info/current/AFP/Density_Compiler/PDF_Target_Semantics.html,
see e.g. "free_vars_cexpr_subst_aux"
[2]: https://link.springer.com/chapter/10.1007%2F978-3-030-29007-8_14
From: marco caminati via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
I also used Nitpick to compute and exhibit compositions of event structures, basically using it as a constraint solver.
Nice and faster than other approaches, though sometimes you need to fiddle a bit with the parameters.
Let's see who else :)
From: Jasmin Blanchette via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Hi Peter,
For better usability, it could be combined with a sledgehammer
(allgoals) option, to prevent the current
subgoal sledgehammer sorry
subgoal sledgehammer sorry
...
subgoal sledgehammer sorry
boilerplate.
I have requested this feature already two years ago
https://lists.cam.ac.uk/mailman/htdig/cl-isabelle-users/2018-February/msg00043.html
I'm surprised I didn't answer, but as you can guess the patch wasn't ready to be pushed, for a variety of reasons. With some work it could have been, though; but like Makarius, I had other reforms in mind that I wanted to do first, before complicating the code base. And surely there is no need to pollute the namespace with names like "sledgehammer_first" and "sledgehammer_all". In fact, I believe this facility could have been built modularly on top of the existing code, without modifying Isabelle's code base. (In ML, one can invoke Sledgehammer on a specific subgoal etc.)
but only got a solution proposal that worked on an unofficial branch of Isabelle.
I think it's a good idea indeed, and that we have enough CPU to do it: Although the default Sledgehammer timeout is 30 seconds, most of the action happens in the first few seconds, and once some satisfactory proof is found for one of the goals, the CPU time could be allocated to the other goals instead of trying to improve on that proof (unless all goals have been proved and more time is available, at which point the old behavior is desirable).
Sledgehammer desperately needs an overhaul, away from the old model of "one goal * 4 (or so) provers * 30 seconds". Doing it right involves going back to the drawing board and refactoring code, as opposed to just adding code. The Sledgehammer code base is already complicated enough from all kinds of additions (mostly by me) and cannot take more ad hoc extensions. Since we introduced time slicing (already in 2010), the way to get better performance is to mix and match the provers, e.g. maybe we should give 40 seconds to Vampire and only 20 seconds to SPASS, and so on. A refactoring that achieves this could also achieve the multiple-goal objective (and it could be any set of goals, not just all of them or a single one).
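(For reference, what today's options can already express is roughly a flat prover list and a single shared timeout, e.g.
sledgehammer [provers = "cvc4 vampire spass e", timeout = 40]
whereas per-prover budgets of the kind sketched above go beyond the current interface.)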
The good news is that Martin Desharnais, who contributed thousands of lines of virtually bug-free code to the (co)datatype package in 2014 during an internship at TUM, has shown himself willing to take over Sledgehammer maintenance and development, at least for the next few years. So both the reform and your idea have been duly noted, and if Martin is so inclined and is granted access to the Isabelle repository again, it might materialize -- hopefully before you've taken the matter into your own hands and written your own "sledgehammer_all" along the lines of what I wrote above. ;)
Jasmin