Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle2013-RC1 Sledge output finicky


view this post on Zulip Email Gateway (Aug 19 2022 at 09:40):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
The remote provers tend to only work when used in a Sledgehammer command
by themselves.

This command finishes after a few seconds with no results from either ATP.

sledgehammer[minimize=smart,preplay_timeout=3,timeout=30,verbose,max_relevant=smart,provers="
remote_vampire z3_tptp
"]

This command finishes after a few seconds with a proof.

sledgehammer[minimize=smart,preplay_timeout=3,timeout=30,verbose,max_relevant=smart,provers="
remote_vampire
"]

The ATP remote_satallax is predictable in the same exact way. By itself,
it returns a proof. With z3_tptp, it doesn't.

In general, the ATPs are only working if they're run by themselves.

They're still very useful.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 09:41):

From: Makarius <makarius@sketis.net>
I can confirm that in both in Isabelle2012 and Isabelle2013-RC1 executing
many processes in parallel on Windows/Cygwin is very fragile. It seems to
have been always like that, but only with Isabelle2012 we could speak of
reasonably Windows support in the first place. Now that so many other
thinks are working better than before, such drop-outs become more
apparent.

Let's see if we can figure out a solution to the problem ...

Makarius

view this post on Zulip Email Gateway (Aug 19 2022 at 09:42):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
On 1/25/2013 2:49 PM, Makarius wrote:

In general, the ATPs are only working if they're run by themselves.

I can confirm that in both in Isabelle2012 and Isabelle2013-RC1
executing many processes in parallel on Windows/Cygwin is very
fragile. It seems to have been always like that...

As to Isabelle2012, I assume you say that because you haven't been using
Cygwin as your primary platform, but you're underselling Isabelle2012,
as far as Sledgehammer goes.

I've got most of my proofs I have now from having used Isabelle2012 with
Sledgehammer, where I was usually running a minimum of 8 ATPs on the
first attempt.

And that's why I isolated it down to Vampire needing to run by itself
now in Isabelle2013-RC1. With Isabelle2012, about 5 ATPs would usually
return a similar proof, but on a particular theorem, which I was running
Sledgehammer on again, there was only one good proof of about 250ms,
which I already had, but which I wasn't getting again, and I didn't know
where it had come from. I finally ran Vampire by itself to get the proof
again.

...but only with Isabelle2012 we could speak of reasonably Windows
support in the first place. Now that so many other thinks are working
better than before, such drop-outs become more apparent.

I suppose Isabelle2012 was when Sledgehammer started being useful on
Cygwin for Sledgehammer, but with Isabelle2013, I see that the e prover
spawns about 15 processes, when it used to only start 1 or 2.

Complications are the price of progress.

Let's see if we can figure out a solution to the problem ...

Exactly. We're a team, and I'll do my share of button pushing, copying
and pasting, inserting commands, and such.

But as I so profoundly said in the past, there are no problems with
Isabelle, there are only workarounds.

For me, it would probably be running Linux under VirtualBox, which I
need to do anyway because I can only get certain local ATPs under Linux.
I'll do that after all the release candidates, unless I need to do it
sooner.

I already knew that Vampire was probably the best ATP at getting proofs
for what I'm doing, but it might be better than I thought, and I might
have been missing some Vampire proofs I should have been getting with
Isabelle2012, so I'll pay more attention to that now.

However, running multiple ATPs is useful for more than just finding a
good, fast proof. Two different ATPs may return a fast proof differing
in only one proof fact, but one proof fact may be more logically
appealing than the other, such as that one is a low level theorem, and
the other is at a much higher level.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 09:42):

From: Makarius <makarius@sketis.net>
In the meantime David Matthews, who is the master behind Poly/ML, managed
to work around the problem with Cygwin and multithreaded execution of
external processes.

Once his change shows up officially on his polyml-code/fixes-5.5 branch, I
will repackage it for Isabelle and produce Isabelle2013-RC2.

In the history of Isabelle on Windows, this will be the first time where
all regular test sessions are running through, even HOL-Nitpick_Examples
with its many externaly Java processes.

Stay tuned and keep fingers crossed ...

Makarius

view this post on Zulip Email Gateway (Aug 19 2022 at 09:42):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Let the magic continue in convenient parallel processes. I have
considered many times how the low-level engine magicians don't get their
due credit.

I am tuned in to the Isabelle channel.

Regards,
GB

view this post on Zulip Email Gateway (Aug 19 2022 at 09:43):

From: Makarius <makarius@sketis.net>
It is more like the architects of Stonehenge -- nobody remembers them now,
and nobody has the slightest idea how they did it.

Makarius

view this post on Zulip Email Gateway (Aug 19 2022 at 10:00):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I finally got some good data.

I attach 3 screenshots for a total of 135+154+159 KB. Not bad for
capturing a lot of screen space.

Here's the story: I had two views open, a left view and a right view.

1) I did a CNTL-V to insert what I call the "sledge_long" command. (Why
use 1 when you can use 18, although there's a few I leave out.)

2) Sledgehammer finished after only a few seconds and didn't give me any
feedback. That's shown in the 1st screenshot.

3) I clicked into the other view and switched to the file with the
sledgehammer command. It automatically started running sledgehammer, but
quit after a few seconds, without giving any real feedback in the output
panel. I didn't capture that screenshot.

4) I clicked back into the left view and switched to a file which the
sTs.thy file imports. That caused the right view to start running
Sledgehammer again, but this time Sledgehammer worked right. The 2nd
screen capture shows the right view running Sledgehammer.

5) The right view finished running Sledgehammer after a couple of
minutes, I clicked back into the right view, and I scrolled down to the
bottom of the output panel. The 3rd screen shot shows the right view
finished, with the last of the Sledgehammer results. It finished
normally given the 90 second timeout.

Here is my cheap seats observation: With Sledgehammer and parallel
processing, nothing ever happens exactly the same twice. I hope that helps.

That parallel processing, it's tricky stuff, and I've now noticed that
one of those processes didn't get terminated, at the expense of 25% of
the CPU processing. Icon meters in the taskbar. Necessary, very necessary.

Just now getting back to jEdit, the Sledgehammer command was still
there. I clicked back and forth between the two windows, it kicked out
fast without proving anything, and after more clicking around, it
started running right and was visible in both views.

Regards,
GB
1_sledge quit after a few seconds.png
2_sledge running in right view.png
3_sledge finished in right view.png


Last updated: Apr 25 2024 at 08:20 UTC