Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Unfold short tactics in Isar. (blast, etc)


view this post on Zulip Email Gateway (Aug 05 2020 at 03:51):

From: Georgy Dunaev <georgedunaev@gmail.com>
How to deal with short proofs like "by blast"?
They leave gaps in the reader's knowledge and force one to re-prove theorems.

I mean the best option would be to obtain some information about the proof
(not just "Yes, proved" or "No").

I understand that sometimes these proofs are the results of oracles, but the aim
is to obtain a more verbose proof.

a) Has anyone considered unfolding such a short proof into something in
Isar style? (from ... have ... qed)

b) The other option is to obtain a proof term from the proved theorem. How can this be done?

Let's use this example from Isabelle/ZF (ZF-ex).

"
text‹The singleton problems are much harder in HOL.›

lemma singleton_example_1: "∀x ∈ S. ∀y ∈ S. x ⊆ y ⟹ ∃z. S ⊆ {z}"
by blast
"

Sincerely Yours,
G. D.

view this post on Zulip Email Gateway (Aug 05 2020 at 10:10):

From: Lawrence Paulson <lp15@cam.ac.uk>
Dear Georgy

Just to clarify one point: blast doesn’t use oracles and overall the significance of oracles in Isabelle proofs is small. I understand that for a beginner it could be interesting and instructive to see a blast proof broken into smaller parts, but (unfortunately) it’s rare for Isabelle’s automation to prove anything that isn’t obvious. Our problem really is that many obvious facts can only be proved with elaborate formal arguments that take much time and labour to construct.

To take your example: if ∀x ∈ S. ∀y ∈ S. x ⊆ y, then clearly all elements of S must equal one another (by antisymmetry of ⊆), from which the conclusion ∃z. S ⊆ {z} obviously follows. I agree it is impressive that blast can prove this in a single step.
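
Spelled out as a structured Isar proof, the same argument might look roughly as follows (a sketch in Isabelle/HOL rather than ZF syntax; the lemma name and the intermediate steps are purely illustrative, not taken from any library):

lemma singleton_example_1':
  assumes "∀x ∈ S. ∀y ∈ S. x ⊆ y"
  shows "∃z. S ⊆ {z}"
proof (cases "S = {}")
  case True
  then show ?thesis by auto
next
  case False
  then obtain a where a: "a ∈ S" by blast
  have "S ⊆ {a}"
  proof
    fix x assume x: "x ∈ S"
    from assms x a have "x ⊆ a" and "a ⊆ x" by blast+
    then have "x = a" by (rule subset_antisym)
    then show "x ∈ {a}" by simp
  qed
  then show ?thesis by blast
qed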

Larry Paulson

view this post on Zulip Email Gateway (Aug 05 2020 at 10:37):

From: Manuel Eberl <eberlm@in.tum.de>
To add to what Larry said: unlike systems such as Coq or Lean, Isabelle
does not, by default, allow you to look at ‘the proof’ that was
constructed. The reason for that is that Isabelle's design differs
considerably from those systems. In Coq, proof terms fall out of the
system basically for free anyway. In Isabelle, you can get something
like this, but you have to do extra implementation work and invest
extra CPU time and memory to get it. So usually, it's not done at all.

There /is/ an Isabelle feature called ‘proof terms’ that you can switch
on and then you can look at the ‘proof’ (in some sense) that was found,
even if it was constructed by blast or auto. But this proof will usually
be big, cluttered with ‘unnecessary’ steps, and completely unreadable
for a human, much like e.g. SMT proofs that you can get from Z3. Because
proof terms are more of an afterthought in Isabelle, none of the
automation even tries to produce small and beautiful proof terms.
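
For illustration, switching them on and inspecting the result might look
roughly like this (a sketch only; the exact mechanism – the session option
record_proofs versus setting Proofterm.proofs in ML – varies between
Isabelle versions):

theory Scratch
  imports Main
begin

(* Ask the kernel to record full proof terms from here on; for whole
   sessions, the option record_proofs = 2 in the ROOT file plays the
   same role. *)
ML ‹Proofterm.proofs := 2›

lemma imp_example: "(A ⟶ B) ⟶ A ⟶ B"
  by blast

(* Display the recorded proof term; even for this tiny lemma the
   output is already fairly verbose. *)
full_prf imp_example

end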

However, generally, Isabelle's automation does not perform magic. It
really is surprisingly simple. It is a combination of a good library of
rewrite rules, simple general-purpose automation for classical
reasoning, some specialised decision procedures, and some specialised
rewriting procedures. Blast, for instance, is ‘just’ a tableau prover,
if I recall correctly.

The situation is a bit different with provers like "smt" and "metis".
Sledgehammer calls occasionally produce very puzzling one-line proofs
that I find very hard to retrace by hand. Sledgehammer does have an
"Isar proof generation" feature that tries to produce more structured
proofs, but I never really got it to produce anything illuminating and
often it fails entirely. I have no idea how it works either.
Sledgehammer, all in all, indeed feels a bit magical – unlike the more
pedestrian methods like blast and auto.

(@everyone else: Please do correct me if I am wrong; I am not really an
expert on this automation. Some of it is probably almost as old as I am
and my understanding of it is sometimes rather vague.)

Cheers,

Manuel

view this post on Zulip Email Gateway (Aug 05 2020 at 10:54):

From: Georgy Dunaev <georgedunaev@gmail.com>
Thanks for the feedback!

Indeed, it is more of a psychological/philosophical effect.
I think a fact is understood completely when it's proven.
(Maybe it's just an irrational fear of losing control over proofs and of
degenerating into mindlessly trying different tactics.)

I didn't mean that the obvious proof should be separated into parts; I was
just curious about the technical possibilities of recovering proofs. But I
agree, the possible drawback is that the proof may not be elegant or even
human-readable, so it's a complicated issue.

Kind regards,
G. D.

view this post on Zulip Email Gateway (Aug 05 2020 at 10:54):

From: Lawrence Paulson <lp15@cam.ac.uk>
Also important to note: back around 1977, Robin Milner introduced the LCF architecture (with an abstract type of theorems defined within a strongly typed functional programming language, namely ML) with the specific aim of eliminating the need to store proofs. He had observed a simple fact: proofs take up too much memory. Since then, you may retort, memory capacity has increased by three or four orders of magnitude. But so have our proofs.

Larry

view this post on Zulip Email Gateway (Aug 05 2020 at 11:30):

From: Manuel Eberl <eberlm@in.tum.de>
Indeed they have!

I wonder if the memory requirements of storing proof terms for
everything are a problem for systems that do this. I vaguely recall Leah
Neukirchen giving a talk about proofs by reflection in Coq and listing
"saves memory because there are no proof terms associated with the
theorem" as an advantage – so that seems to indicate that proof term
size can indeed cause performance problems in Coq.

However, I think the performance situation in these systems is not quite
as bad as it is in Isabelle (with proof terms enabled) because they have
definitional equality and implicit normalisation, which Isabelle does
not. In Isabelle, every rewrite step must be recorded explicitly.

In any case, I have it on good authority that some people are planning
to improve proof terms in Isabelle, so let's see what comes out of that.
Mario Carneiro seemed quite interested in them for the purpose of
importing Isabelle proofs into other systems.

Still, considering the fact that whenever our tools become more
capable/performant, proof applications seem to scale up almost
immediately until they are again at the very limit of what the system
can do, it is probably a good thing that we can just "switch off" proof
terms if we have to.

Manuel

view this post on Zulip Email Gateway (Aug 05 2020 at 11:59):

From: Mario Carneiro <di.gama@gmail.com>
Mario Carneiro is watching this thread with great interest. :) I think that
the fact that proof terms in Isabelle are large is strongly correlated with
the fact that they are not stored. (A good comparison is the observation that
software is still about as fast from the user's perspective as it was 20
years ago, even though processor speed and memory capacity have increased by
orders of magnitude over that period. We've just managed to fill the
available capacity with more work and less efficient software, with the
reasoning that a human can't tell the difference between 10 milliseconds
and 3 nanoseconds.) I personally have no doubt that it is possible to
improve on the current situation by several orders of magnitude, although I
don't know how feasible such improvements are given the current architecture
and social context. The general attitude that proof terms "don't matter" is
terribly destructive to attempts to improve their efficiency, though. They
enable all manner of proof export, cross-linking of libraries, and external
checking. I don't plan to settle for less than the entire standard
library + AFP, but as I understand it this is completely out of the range
of feasibility for the current proof term mode, because too much processing
is spent putting things into lambda calculus rather than just logging
kernel calls.

To give some sense of what I think is possible, I have managed to enable
proof export for the CaDiCaL SAT solver such that the runtime is not much
affected (+0–10%), and the resulting proof-checking time is also
negligible. Disk space can be a problem for large automated proofs if
you aren't doing any proof processing, but it is easier to trade
disk space for more implicit data when starting from a fully elaborated
proof baseline; besides, disk space is relatively cheap, so I'm not
overly concerned about this. Of course it would be better if the proofs
were actually optimized, but I'll take what I can get.

Mario

view this post on Zulip Email Gateway (Aug 05 2020 at 12:16):

From: Lawrence Paulson <lp15@cam.ac.uk>
This observation may be true for Microsoft Word, but it is certainly not true for Isabelle. I don’t have the precise numbers at my fingertips but I recall for example that the MicroJava session took about 45 minutes to run in the late 1990s. And here it is today:

Finished HOL-MicroJava (0:01:10 elapsed time, 0:05:44 cpu time, factor 4.90)

That is an improvement of nearly 2 orders of magnitude, partly achieved through parallelism (by a factor of almost 5).

Larry

view this post on Zulip Email Gateway (Aug 05 2020 at 12:26):

From: Mario Carneiro <di.gama@gmail.com>
If you look at the same program over time, it will appear to get faster
because the hardware is changing and the software isn't. Similarly here, if
you look at the MicroJava session over time it has improved. But I would
bet there are other sessions that exist now that also take 45 minutes,
perhaps even more. They just use different proof methods and prove
different things. In one sense this is certainly a gain - we can now use
methods that would not be practical in the past, and tackle much bigger
problems now than before. But it is this exact fact that means that
computer systems will always live at human timescales: we don't
optimize past the point where we can notice the difference, and we don't
attempt to run programs that take too long (depending on the context, that
upper bound might be 1 second or 30 seconds for an editor action, several
minutes for a theory compiled on the spot, and hours to weeks for a full
development).

view this post on Zulip Email Gateway (Aug 05 2020 at 14:30):

From: Lawrence Paulson <lp15@cam.ac.uk>
It is a common view that improvements in software performance have come for free from hardware improvements alone. But in the case of Isabelle a simple calculation shows that this is no explanation. A processor in the late 1990s might have a clock speed of 400 MHz. Simply multiplying by today’s speedup factor of 40 would suggest that today we have a 16 GHz processor, so obviously something else is going on. In the case of Isabelle it’s the huge investment in parallelism.

I was curious to see how many really big problems there are in the AFP. It turns out that there are only six (out of 547) in the “slow” category. The slowest of these (Iptables_Semantics) has a time limit of 17 hours. It’s a far cry from the era in which running jobs overnight was a regular necessity.

Given the major role of proof objects in Coq, we can be sure that they are optimised to the maximum extent. Yet it’s clear that Coq users try to avoid creating proof objects altogether. Formalisations minimise the role of proof objects by working as much as possible with primitive recursive functions (which are evaluated within the kernel and don’t create proof objects).

Larry Paulson

view this post on Zulip Email Gateway (Aug 05 2020 at 15:15):

From: Manuel Eberl <eberlm@in.tum.de>
It should be mentioned that the slow iptables session, as far as I
understand, is an application of generated code (a piece of software
that was verified in Isabelle/HOL) to a really big example application,
and that is the only thing that is slow about it. Not the actual proofs.

For Flyspeck, it is somewhat similar. I think this is some kind of
exhaustive search procedure that is proven correct and then the
generated code is executed, and that is what is so slow about it.

Neither of these actually involves Isabelle itself in the slow bits.

I don't know about the other ones, but I think those might be genuine
Isabelle proofs that are simply a bit slow.

Of the four others, two take about 30 minutes and two take about 60
minutes, with 8 threads each. That's not so bad. Bicategories and
JinjaThreads are probably slow because they are just huge (in terms of
lines of code). With ConcurrentGC and AODV I don't really know why they
are so slow. Perhaps they use an exorbitant number of automated case
distinctions?

Manuel

view this post on Zulip Email Gateway (Aug 23 2020 at 18:40):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Just to be sure: Something like a proof object is nevertheless created
when a proof method is invoked, right? My understanding was that
following the LCF approach means that Isabelle will never trust methods
like blast to produce correct results but always require them to
provide proof objects that are then checked by a minimal trusted kernel
for correctness. Is this the case, or did I misunderstand something?

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 24 2020 at 08:39):

From: Lawrence Paulson <lp15@cam.ac.uk>
No proof objects are created. Soundness is enforced through the abstract type discipline.

Here is an analogy: imagine a programming project that requires a dictionary datatype. Somebody implements the dictionary using red-black trees, hash tables or whatever, and provides an interface with abstract operations such as the empty dictionary, insert, lookup, update, delete. Those operations are the only means available to work with dictionaries outside the data structure definition itself, and in that way the integrity of the internal representation is protected. Provided that the code implementing the representation is correct, and assuming of course a type-safe programming language, all dictionary operations are guaranteed to deliver the correct results regardless of how many errors there may be in other parts of the program.

Maintaining a separate audit trail for each dictionary would provide a means of checking the dictionary operations at runtime (in particular to verify that lookup delivers correct results), but it's unnecessary and would cause the program to quickly run out of memory.

Proof objects are precisely such audit trails. That they are unnecessary (and indeed unaffordable) was Robin Milner's insight, around 1975. It is noteworthy that in proof assistants such as Coq that represent propositions by types, great efforts are made to use Bool or Prop rather than Type in order to keep proof objects to a minimum.
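
To make the analogy concrete, here is a toy LCF-style kernel sketched in ML, runnable as an ML block inside a theory (purely illustrative: the names and the minimal rule set are invented for the sketch, and Isabelle's real Thm module is of course far richer):

ML ‹
  (* Toy LCF-style kernel: values of type thm can only be built through
     the exported inference rules, so every thm is a theorem by
     construction.  No proof object is stored; a thm records only its
     hypotheses and its conclusion. *)
  signature TOY_KERNEL =
  sig
    datatype form = Atom of string | Imp of form * form
    type thm                          (* abstract: constructor not exported *)
    val assume : form -> thm          (* A |- A *)
    val impI : form -> thm -> thm     (* discharge hypothesis A *)
    val impE : thm -> thm -> thm      (* modus ponens *)
    val hyps_of : thm -> form list
    val concl_of : thm -> form
  end;

  structure Toy_Kernel :> TOY_KERNEL =
  struct
    datatype form = Atom of string | Imp of form * form
    datatype thm = Thm of form list * form
    fun assume a = Thm ([a], a)
    fun impI a (Thm (hyps, c)) =
      Thm (List.filter (fn h => h <> a) hyps, Imp (a, c))
    fun impE (Thm (hyps1, Imp (a, b))) (Thm (hyps2, a')) =
          if a = a' then Thm (hyps1 @ hyps2, b)
          else raise Fail "impE: antecedent does not match"
      | impE _ _ = raise Fail "impE: not an implication"
    fun hyps_of (Thm (hyps, _)) = hyps
    fun concl_of (Thm (_, c)) = c
  end;

  (* Example: derive |- A --> A.  Nothing about *how* it was derived
     survives in the resulting value. *)
  val a = Toy_Kernel.Atom "A";
  val triv = Toy_Kernel.impI a (Toy_Kernel.assume a);
›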

Larry

view this post on Zulip Email Gateway (Aug 24 2020 at 12:57):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Thanks a lot for your explanation.

I don’t want to claim that I fully understand Isabelle’s technique
already, but I hope I got the essence of it.

My understanding is as follows: You could let proof methods build proof
objects whose correctness you check afterwards. Proof objects would be
of some algebraic data type, and you would use the data constructors of
this type to build proof objects. To get to the Isabelle style, you
essentially replace these data constructors by operations of an abstract
data type whose values correspond to proofs. The properties of the
abstract data type ensure that it is impossible for you to construct
incorrect proofs. Is that roughly how it works?

In any case, it’s reassuring that in Isabelle there is indeed only this
small kernel you have to trust.

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 24 2020 at 13:04):

From: Freek Wiedijk <freek@cs.ru.nl>
Dear all,

Larry wrote:

No proof objects are created. Soundness is enforced
through the abstract type discipline.

Henk Barendregt calls what is implemented in the HOL family
of systems _ephemeral_ proof objects. His image is of
someone writing a proof with a stick in the sand at the sea,
and the sea washing away the writing all the time. At the
end the full proof has been written but nothing remains.

Freek

view this post on Zulip Email Gateway (Aug 24 2020 at 13:08):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Interesting analogy!

From what I understand, the invocations of the functions you use for
constructing a new fact collectively form a proof of this fact, but no
actual data resembling the proof is ever constructed by these functions.

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 24 2020 at 13:20):

From: Freek Wiedijk <freek@cs.ru.nl>
Dear Wolfgang,

In most systems (at least in Isabelle and HOL Light) there
is a switch that you can turn on/off to allow you to store
the proof object in memory, but in normal operation this
switch is always off.

Freek

view this post on Zulip Email Gateway (Aug 24 2020 at 13:23):

From: Lawrence Paulson <lp15@cam.ac.uk>

On 24 Aug 2020, at 14:03, Freek Wiedijk <freek@cs.ru.nl> wrote:

Henk Barendregt calls what is implemented in the HOL family
of systems _ephemeral_ proof objects. His image is of
someone writing a proof with a stick in the sand at the sea,
and the sea washing away the writing all the time. At the
end the full proof has been written but nothing remains.

That image is evocative but doesn’t indicate how you prevent the “someone” from writing a faulty proof. Here is Milner himself:

we’re not so concerned with checking or generating proofs as with performing proofs. Thus, we don’t normally store or display proofs but only the results of them - i.e. theorems. These form an abstract type on which the only allowed operations are the inference rules …; this ensures that a well-typed program cannot perform faulty proofs (it may not prove the theorem expected but the result will be a theorem!). If extra security or formal proof-checking is desired, full proofs are easily generated -- only minor changes in the implementation of the abstract type for theorems would be required. The principal aims then in designing ML were to make it impossible to prove non-theorems yet easy to program strategies for performing proofs.

Gordon, Milner, Morris, Newey, and Wadsworth
A Metalanguage for interactive proof in LCF
POPL '78: Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
January 1978 Pages 119–130
https://doi.org/10.1145/512760.512773
http://www-public.imtbs-tsp.eu/~gibson/Teaching/CSC4504/ReadingMaterial/GordonMMNW78.pdf

