[isabelle] "blast" eliminates unrelated flex-flex pairs i... · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] "blast" eliminates unrelated flex-flex pairs i...

Email Gateway (Aug 19 2022 at 12:56):

From: Lawrence Paulson <lp15@cam.ac.uk>
I’m not aware of a change, but it’s possible.

There is a trade-off here: on the one hand, these constraints can accumulate excessively, causing performance problems and making the subgoals unreadable. But on the other hand, eliminating them in the obvious way (using constant functions) destroys information.

If you get these constraints in a proof, even if it currently works, it might be a good idea to look for a different proof (by explicitly instantiating a variable that is currently left uninstantiated) to get rid of them yourself.

Larry Paulson

Email Gateway (Aug 19 2022 at 12:56):

From: Makarius <makarius@sketis.net>
It is definitely a good idea to avoid relying on such boundary situations,
like pending flex-flex pairs, back tracking after 'apply' etc.

Anyway, the reason why blast works slightly differently in Isabelle2013-2
(and the obsolete Isabelle2013-1) is this detail from the NEWS:

* SELECT_GOAL now retains the syntactic context of the overall goal
state (schematic variables etc.). Potential INCOMPATIBILITY in rare
situations.

The change to SELECT_GOAL (210bca64b894) ensures that the goal context is
preserved more rigorously, with add-on information like maxidxs, flex-flex
pairs etc. That was the very purpose of the change, in order to prevent
other oddities from the following mailing list threads about "SELECT_GOAL
and schematic variables":

https://lists.cam.ac.uk/mailman/htdig/cl-isabelle-users/2013-March/msg00118.html
http://www.mail-archive.com/isabelle-dev@mailbroy.informatik.tu-muenchen.de/msg04268.html

Note that in these threads there was never any talk about "bugs" nor
"fixes" -- the notoriously meaningless words. The NEWS entry correctly
anticipates some "potential INCOMPATIBILITY", although I did not know
about this particular case when writing that piece of text.

It is funny that blast smashes flex-flex pairs in the first place, and
thus affects the overall goal state, not just the subgoal where it is
applied.

That detail goes back to Isabelle/93a84eb6c522 from 02-Jan-1998, where
Larry writes in the log entry "Blast_tac now squashes flex-flex pairs
immediately", but it is unclear why. Maybe Larry remembers why he was
smashing flex-flex pairs on purpose 15 years ago.

Trying without, i.e. removing the following line
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2013-2/src/Provers/blast.ML#l1261
shows that there is no change in the success of the immediately reachable
universe of Isabelle + AFP examples.

Maybe there is a different impact on top-secret proofs that are locked-up
in some cellar like the Lost Ark in some Indiana Jones movie, but who
knows?

Makarius

Email Gateway (Aug 19 2022 at 12:56):

From: David Greenaway <david.greenaway@nicta.com.au>
Alas, I am not the author of these thousands-of-line proofs, but merely
the poor soul who is trying to get them working on Isabelle2013-2. The
original authors have long since moved on to greener pastures.

While I agree it would be ideal to refactor the proof, one tends to take
the path of least resistance when they have 400k lines of other proof
script they also have to get working on the next Isabelle version...

Cheers,
David

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

Email Gateway (Aug 19 2022 at 12:56):

From: David Greenaway <david.greenaway@nicta.com.au>
On 10/12/13 05:36, Makarius wrote:

On Mon, 9 Dec 2013, Lawrence Paulson wrote:

If you get these constraints in a proof, even if it currently works, it
might be a good idea to look for a different proof (by explicitly
instantiating a variable that is currently left uninstantiated) to get
rid of them yourself.

It is definitely a good idea to avoid relying on such boundary
situations, like pending flex-flex pairs, back tracking after 'apply'
etc.

Anyway, the reason why blast works slightly differently in
Isabelle2013-2 (and the obsolete Isabelle2013-1) is this detail from
the NEWS:

* SELECT_GOAL now retains the syntactic context of the overall goal
state (schematic variables etc.). Potential INCOMPATIBILITY in rare
situations.

The change to SELECT_GOAL (210bca64b894) ensures that the goal context
is preserved more rigorously, with add-on information like maxidxs,
flex-flex pairs etc. That was the very purpose of the change, in
order to prevent other oddities from the following mailing list
threads about "SELECT_GOAL and schematic variables":

https://lists.cam.ac.uk/mailman/htdig/cl-isabelle-users/2013-March/msg00118.html
http://www.mail-archive.com/isabelle-dev@mailbroy.informatik.tu-muenchen.de/msg04268.html

Ah, that helps explains why this suddenly came up when I couldn't see
any relevant changes to "blast" itself in the Isabelle2013/Isabele2013-1
diff.

Note that in these threads there was never any talk about "bugs" nor
"fixes" -- the notoriously meaningless words. The NEWS entry correctly
anticipates some "potential INCOMPATIBILITY", although I did not know
about this particular case when writing that piece of text.

When I use the word "bug", I tend to mean a behaviour of a computer
program that is unexpected, unintended and undesirable. There is
certainly subjectivity in whether particular behaviours are unexpected,
unintended and/or undesirable, but that doesn't make the word "bug"
meaningless.

It is funny that blast smashes flex-flex pairs in the first place, and
thus affects the overall goal state, not just the subgoal where it is
applied.

Indeed it is funny. I would even go so far as to say it is unexpected,
unintended and undesirable.

Trying without, i.e. removing the following line
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2013-2/src/Provers/blast.ML#l1261
shows that there is no change in the success of the immediately reachable
universe of Isabelle + AFP examples.

Maybe there is a different impact on top-secret proofs that are
locked-up in some cellar like the Lost Ark in some Indiana Jones
movie, but who knows?

Thanks for the suggestion.

I will make the change and see how it affects all the top-secret proofs
available in the dark cellars around here and report back.

Cheers,
David

Email Gateway (Aug 19 2022 at 12:57):

From: David Greenaway <david.greenaway@nicta.com.au>
Making this change to blast fixed the proofs that were broken by the
change in behaviour in Isabelle2013-1, and doesn't cause any new proof
breakage.

Thanks for the help!

Cheers,
David

Email Gateway (Aug 19 2022 at 12:57):

From: Makarius <makarius@sketis.net>
conclusion yet. There is merely an indication that it is probably better
not to smash flex-flex here, but some uncertainty remains.

Larry might have had very good reasons to smash the flex-flex pairs in
1998. These reasons might no longer apply, or the experiments so far did
not get to the point where the difference is visible.

One needs to understand that in the old times people had much more time
thinking about problems and working on solutions. For example, the ML
compiler required many hours to work on the (then quite small) examples,
giving time for the person behind it to reconsider the present idea -- and
there was not the huge distraction of online noise of today.

Makarius

Email Gateway (Aug 19 2022 at 12:57):

From: Makarius <makarius@sketis.net>
That thinking is now very common, but it is counter-productive far serious
software development, especially due all the extra complexity that has
been piled up in the past decades and people no longer knowing how it was
all done. We are running into a major problem here -- the next big
software crisis, which will be also an open-source software crisis in
particular.

Some years ago, I've ventured to extend our little island of happiness
(based on Isabelle/ML) into the "real world", using Isabelle/Scala and
Isabelle/jEdit etc. Moreover there was the ambition to make it work on
all major platforms. It was both surprising and depressing how low the
quality of these things from the "mainstream" actually are, and all these
people using terminology like "features", "bugs", "fixes", "unfortunate"
without much reflection.

Anyway, this is getting a bit far off this thread ...

Makarius

Email Gateway (Aug 19 2022 at 12:57):

From: Tobias Nipkow <nipkow@in.tum.de>
Dear Makarius,

Your constant language policing makes this mailing list extremely noisy and
unpleasant. You should leave the terminology with which people report their
findings to them. To suggest that the quality of some software is influenced by
the language in which users write about it is bizzare.

Tobias

Email Gateway (Aug 19 2022 at 12:57):

From: Makarius <makarius@sketis.net>
I did not mean to continue this thread at that point, but I do confirm
again what I've tried to communicate there in plain and simple words, in
the hope that at least a few people will understand.

Makarius

Email Gateway (Aug 19 2022 at 12:57):

From: "Yannick Duchêne (Hibou57 )" <yannick_duchene@yahoo.fr>
Not in vain.

Email Gateway (Aug 19 2022 at 12:57):

From: Makarius <makarius@sketis.net>
Larry, can you give further hints about that?

Alternatively, someone else can volunteer to make further experiments on
all reachable Isabelle applications, to consolidate the empirical
findings. This will only take a few hours.

In particular the following questions are relevant:

* Do flex-flex pairs after invocation of blast actually happen in
practice, i.e. does blast proof reconstruction introduce any by
itself?

* Is there any performance impact on changing the behaviour? So far
the success or failure was not affected, but there were no
measurements of timing so far.

Makarius

Email Gateway (Aug 19 2022 at 12:58):

From: Lawrence Paulson <lp15@cam.ac.uk>
I’m puzzled about this. I thought the issue was that flex-flex constraints were suddenly being smashed now, when they were not before. And I would certainly not have wanted every blast call to smash all flex-flex constraints, because this can lose completeness.

Note that flex-flex pairs attach to theorems, and therefore to proof states as a whole, and not to individual subgoals.

Larry

Email Gateway (Aug 19 2022 at 12:58):

From: Lawrence Paulson <lp15@cam.ac.uk>
You shouldn’t have to redo the proofs. If you are lucky, doing the proof in Isabelle2013 may give you a clue as to how the variables are being instantiated, so that you can add a constraint somewhere. And flex-flex constraints are very rare.

Do you really have 400k lines of Isabelle proof scripts? All of the Isabelle/HOL theories (including examples and libraries) are barely more than that.

Larry Paulson

Email Gateway (Aug 19 2022 at 12:58):

From: David Greenaway <david.greenaway@nicta.com.au>
On 11/12/13 08:42, Lawrence Paulson wrote:

You shouldn’t have to redo the proofs. If you are lucky, doing the
proof in Isabelle2013 may give you a clue as to how the variables are
being instantiated, so that you can add a constraint somewhere. And
flex-flex constraints are very rare.

Again, these tend to pop up in very large proofs which I have very
little understanding or context of. Add in a few moving parts, such as
Schirmer's SIMPL VCG [1], our own tactics (for example, bundled with
[2]), and ugly proofs done on a deadline, it all becomes a bit much.

Tracking down the original problem involved carefully comparing
line-by-line the proof output between Isabelle2013 and Isabelle2013-1
and seeing where they diverged.

I am certainly not advocating this proof style, merely trying to cope
with it.

Do you really have 400k lines of Isabelle proof scripts? All of the
Isabelle/HOL theories (including examples and libraries) are barely
more than that.

~/l4.verified # rm -rf isabelle
~/l4.verified # wc -l */.thy | tail -1
536136 total

Cheers,
David

[1]: http://afp.sourceforge.net/browser_info/current/HOL/Simpl/Vcg.html
[2]: http://ssrg.nicta.com.au/projects/TS/autocorres/

Email Gateway (Aug 19 2022 at 12:58):

From: David Greenaway <david.greenaway@nicta.com.au>
As a single data point, I am not seeing any performance change on my
proof scripts. (A slight performance improvement after the change, but
well within the margin of noise.)

Cheers,
David

Email Gateway (Aug 19 2022 at 12:58):

From: Lawrence Paulson <lp15@cam.ac.uk>
This is a very unfortunate situation to be in. But I guess it’s no different from other kinds of legacy nightmare code.

You’ll probably have to bite the bullet and redo some of these proofs. I assume they are very lengthy apply-chains rather than structured proofs. The irony is that it might have been easier to write structured proofs in the first place.

Larry Paulson

Email Gateway (Aug 19 2022 at 12:58):

From: Makarius <makarius@sketis.net>
no longer conforms to what you would do today. Although it is hard to
tell precisely, since flex-flex pair are extremely rare, and defining an
intended meaning operates almost at 0 entropy.

Anyway, after heating my office with more CPU power (running Isabelle +
AFP + IsaFoR a few times), I have the impression that there is nothing to
see on this thread, and we can just move on. This means we can probably
remove the extra clean-up step from blast in coming Isabelle versions.

Further explanations when the current build has terminated ...

Makarius

Email Gateway (Aug 19 2022 at 12:58):

From: Makarius <makarius@sketis.net>
Here is some raw information about afp-2013-2:

$ wc -c $(find afp-2013-2 -name "*.thy") | tail -1
39685684 total

$ wc -l $(find afp-2013-2 -name "*.thy") | tail -1
885191 total

So the public material seems to outweigh the top-secret stuff.

Makarius

Email Gateway (Aug 19 2022 at 12:58):

From: Lars Noschinski <noschinl@in.tum.de>
I guess the rarity of flex-flex pairs depends on the kind of proofs you
do. I recently wrote a VCG for program refinement, consisting mainly of
a large set of introduction rules. I remember seeing flex-flex pairs
quite often for the intermediate results (although the usually would
disappear when the VCG successfully completed).

-- Lars
signature.asc

Email Gateway (Aug 19 2022 at 12:59):

From: Gerwin Klein <Gerwin.Klein@nicta.com.au>
Yes, that’s very similar to the kinds of proofs we have.

flex-flex pairs are still rare in our proofs, but they do pop up occasionally.

Cheers,
Gerwin

Email Gateway (Aug 19 2022 at 12:59):

From: Makarius <makarius@sketis.net>
There were various delays, failures, non-termination, but mostly due to
other reasons. We have again a situation where AFP feels very "heavy"
(even on 8 cores), and the many repository clones and branches sometimes
cause confusion what needs to be tested.

Anyway, according to the current situation in Isabelle2013-2, there are
merely these few spots where flexflex-pairs occur and blast smashes them:

(line 99 of "~~/src/HOL/Cardinals/Wellorder_Embedding.thy")
(line 406 of "~~/src/HOL/Library/Ramsey.thy")
(line 408 of "~~/src/HOL/Library/Ramsey.thy")
(line 318 of "$AFP/thys/Markov_Models/ex/PCTL.thy")
(line 376 of "$AFP/thys/Stuttering_Equivalence/PLTL.thy")
(line 534 of "$AFP/thys/Stuttering_Equivalence/PLTL.thy")

All of these are harmless. Doing nothing about flexflex pairs within
blast means they are smashed a bit later at the end of the proof, as
usual.

I've recently seen other proof tools like "fast" introducing left-over
flexflex pairs, but the only problem was some divergence in the enclosing
goal infrastructure, where they were treated non-uniformly by accident.

So it looks like the global flexflex context of a goal state (the
"tpairs") is best left alone by regular proof tools. They just accumulate
monotonically like maxidx, and final results are somehow standardized by
the system infrastructure.

Thus tools that do require pending fleflex constraints to work, will have
better chances.

We should try that, but of course observations might change again in the
coming months before the next release.

Makarius

Email Gateway (Aug 19 2022 at 13:00):

From: Lawrence Paulson <lp15@cam.ac.uk>
I need to stress a basic fact: Flex-Flex constraints invariably involve function variables. This means that the proof involved an inference rule containing a function variable, such as imageI. It is worth attempting to identify which rule is to blame (the variable name may be a clue) and to supply a suitably instantiated instance of the rule to blast.

Larry

Email Gateway (Aug 19 2022 at 13:00):

From: Makarius <makarius@sketis.net>
Just for the record: there is something wrong with this experiment on
AFP Collections and/or JinjaThreads, but I don't know yet what really
happens. Total existence failure of the test environment ...

We should close this thread on isabelle-users for now, since David has his
workaround already to get L4.verified running.

Further software archeology on isabelle-dev at some later point -- I am on
travel next week.

Makarius

Email Gateway (Aug 19 2022 at 13:16):

From: David Greenaway <David.Greenaway@nicta.com.au>
Hi all,

While updating some large proofs to Isabelle2013-1, I discovered
a change in the behaviour of "blast" in the presence of flex-flex pairs.
In particular, in Isabelle2013-1, "apply blast" appears to eliminate
flex-flex pairs, even if the particular goal blast is working on doesn't
actually involve any schematics. This can make the larger theorem being
worked on unprovable.

For example, consider the following snippet:

(* Silly lemma to generate a flex-flex pair. *)
schematic_lemma "!!x y a b. (1 = 1) & f ((x = y) = (a = b)) = ?P x y a b"
apply (subst (1 3) eq_commute)
apply (rule conjI)
(* Two subgoals exist; only the second involves schematics. *)
apply blast

In Isabelle2013 the "apply blast" would leave the flex-flex constraints
unchanged. In Isabelle2013-1 and Isabelle2013-2, the constraint
disappears. In more complex proofs, this can make finishing the proof
impossible.

Is this behaviour of Isabelle2013-1/2 expected, or is it a regression?

(I don't have a good grasp on the finer points of higher order
unification and flex-flex constraints, so please forgive me if I am
using incorrect terminology.)

Cheers,
David

Last updated: Jul 03 2025 at 01:50 UTC