Stream: Mirror: Isabelle Development Mailing List

Topic: [isabelle-dev] AFP: "enable proof in session Lorenz_C1"^


view this post on Zulip Email Gateway (Feb 29 2024 at 09:39):

From: Makarius <makarius@sketis.net>
An odd (unexplained) change has occurred on AFP:

changeset: 14197:ddf90847bfa5
user: immler
date: Tue Feb 27 15:10:13 2024 +0100
files: thys/Ordinary_Differential_Equations/Ex/Lorenz/C1/Lorenz_C1.thy
thys/Ordinary_Differential_Equations/ROOT
description:
enable proof in session Lorenz_C1

Looking briefly into the history reveals this evidence:

changeset: 8561:13b3e24a71b0
user: wenzelm
date: Mon Nov 20 07:50:03 2017 +0100
files: thys/Ordinary_Differential_Equations/Ex/Lorenz/Lorenz_C1.thy
thys/Ordinary_Differential_Equations/ROOT
description:
adapted to timing on lxcisa0 with threads=8: (8:08:58 elapsed time, 47:18:51
cpu time, factor 5.81);

Although lxcisa0 has grown old, it is still representative for the performance
baseline of testing.

Yesterday, I've started the session on more current hardware to see how it
works now, but that is still running. Maybe something else is wrong on that
machine.

Further note that the standard nightly build (isabelle_cronjob) is affected:
it will probably terminate within 1-3 more days. See also lrzcloud2 on
https://isatest.sketis.net/devel/cronjob-main.log

So what is the reasoning behind the change ddf90847bfa5? The log does not say
anything tangible.

Of course it is better to check things formally, instead of having them
commented-out. So far, I thought that Lorenz_C0 was there to be tested
properly, and Lorenz_C1 an untested add-on example that is unlikely to fail on
its own account.

If huge and heavy things like Lorenz_C1 should get a more formal status in
AFP, we probably need a session group "very_very_slow" or alternatively
"untested" to say that it is not tested regularly.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 10:04):

From: Makarius <makarius@sketis.net>
I've sent this with CC to immler.emails.immler_email according to
AFP/38e4f6bce76e metadata, but that address did not work.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 10:25):

From: Makarius <makarius@sketis.net>
It has now terminated as follows (on of1.proof.cit.tum.de):

Finished Lorenz_C1 (11:29:26 elapsed time, 67:46:46 cpu time, factor 5.90)

That is something to be investigated. Either there is something odd in the
machine configuration, or this bulky test now takes much longer than in 2017.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 10:29):

From: Tobias Nipkow <nipkow@in.tum.de>
I have forwarded your email to a working email address.

Tobias

smime.p7s

view this post on Zulip Email Gateway (Feb 29 2024 at 10:34):

From: Fabian Huch <huch@in.tum.de>
On 2/29/24 10:38, Makarius wrote:

An odd (unexplained) change has occurred on AFP:

changeset:   14197:ddf90847bfa5
user:        immler
date:        Tue Feb 27 15:10:13 2024 +0100
files: thys/Ordinary_Differential_Equations/Ex/Lorenz/C1/Lorenz_C1.thy
thys/Ordinary_Differential_Equations/ROOT
description:
enable proof in session Lorenz_C1

I heard from Fabian that nowadays the session runs reasonably fast (6
hrs -- Flyspec-Tame-computation is nearly 5 hrs). But this does not seem
to fit with your results.

Looking briefly into the history reveals this evidence:

changeset:   8561:13b3e24a71b0
user:        wenzelm
date:        Mon Nov 20 07:50:03 2017 +0100
files: thys/Ordinary_Differential_Equations/Ex/Lorenz/Lorenz_C1.thy
thys/Ordinary_Differential_Equations/ROOT
description:
adapted to timing on lxcisa0 with threads=8: (8:08:58 elapsed time,
47:18:51 cpu time, factor 5.81);

Although lxcisa0 has grown old, it is still representative for the
performance baseline of testing.

Yesterday, I've started the session on more current hardware to see
how it works now, but that is still running. Maybe something else is
wrong on that machine.

Further note that the standard nightly build (isabelle_cronjob) is
affected: it will probably terminate within 1-3 more days. See also
lrzcloud2 on https://isatest.sketis.net/devel/cronjob-main.log

If it should be nightly we really need high-end hardware here (not 1/4
slice of a Xeon 4316) -- we won't have free access to these machines for
much longer anyway.

So what is the reasoning behind the change ddf90847bfa5? The log does
not say anything tangible.

Of course it is better to check things formally, instead of having
them commented-out. So far, I thought that Lorenz_C0 was there to be
tested properly, and Lorenz_C1 an untested add-on example that is
unlikely to fail on its own account.

If huge and heavy things like Lorenz_C1 should get a more formal
status in AFP, we probably need a session group "very_very_slow" or
alternatively "untested" to say that it is not tested regularly.

Im am in favor of having an "untested" groups vs. commenting out proofs.

Fabian


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 10:56):

From: Makarius <makarius@sketis.net>
On 29/02/2024 11:33, Fabian Huch wrote:

On 2/29/24 10:38, Makarius wrote:

An odd (unexplained) change has occurred on AFP:

changeset:   14197:ddf90847bfa5
user:        immler
date:        Tue Feb 27 15:10:13 2024 +0100
files: thys/Ordinary_Differential_Equations/Ex/Lorenz/C1/Lorenz_C1.thy
thys/Ordinary_Differential_Equations/ROOT
description:
enable proof in session Lorenz_C1

I heard from Fabian that nowadays the session runs reasonably fast (6 hrs --
Flyspec-Tame-computation is nearly 5 hrs). But this does not seem to fit with
your results.

On which hardware precisely?

This question also touches Intel vs. ARM: the native ARM Poly/ML is incredibly
fast in most applications, but can be slower in some of these computations. I
did not investigate this systematically so far.

Presently we are testing Intel only.

Of course it is better to check things formally, instead of having them
commented-out. So far, I thought that Lorenz_C0 was there to be tested
properly, and Lorenz_C1 an untested add-on example that is unlikely to fail
on its own account.

If huge and heavy things like Lorenz_C1 should get a more formal status in
AFP, we probably need a session group "very_very_slow" or alternatively
"untested" to say that it is not tested regularly.

Concerning the unequal, but related brothers Lorenz_C0 and Lorenz_C1, I have
now collected all available timing information since 2017. The command-line
for that is "isabelle build_status -S Lorenz_C0 -l 10000", but it requires
access to a non-public PostgreSQL server. The result is in the attached .csv:
nothing special to be seen here, looks fine.

Makarius

Lorenz_C0.csv

view this post on Zulip Email Gateway (Feb 29 2024 at 12:04):

From: Makarius <makarius@sketis.net>
I now recall, that "slow" and "very_slow" tests are traditionally understood
in the context of ML_system_64 = true, and that the Lorenz computations were
slow in the default 64_32 model.

So everything looks like it has always been since 2017.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 15:24):

From: Fabian Immler <fabian.immler@gmail.com>
On Thu, Feb 29, 2024 at 10:39 Makarius <makarius@sketis.net> wrote:

So what is the reasoning behind the change ddf90847bfa5? The log does not
say
anything tangible.

You are right, unfortunately that ended up as a „parrot“ commit message.

>

Of course it is better to check things formally, instead of having them
commented-out.

That is the motovation for the change: it is better to check things
formally. Talking to Fabian Huch, I got the impression that now there are
more compute resources than five or six years ago.

So far, I thought that Lorenz_C0 was there to be tested

properly, and Lorenz_C1 an untested add-on example that is unlikely to
fail on
its own account.

Mathematically, Lorenz_C0 is of not much value. Lorenz_C1 is a formal proof
for the computational part of Tucker‘s proof of Smale‘s 14th problem. So
Lorenz_C1 is of actual value and worth testing „regularly“.

„Regularly“ could also be weekly, monthly, or even just once for every
Isabelle release. It probably makes sense to reflect that in a new session
group „very_very_slow“ or the like.

On Thu, Feb 29, 2024 at 11:55 Makarius <makarius@sketis.net> wrote:

On 29/02/2024 11:33, Fabian Huch wrote:

I heard from Fabian that nowadays the session runs reasonably fast (6 hrs

view this post on Zulip Email Gateway (Feb 29 2024 at 16:30):

From: Makarius <makarius@sketis.net>
On 29/02/2024 16:23, Fabian Immler wrote:

Of course it is better to check things formally, instead of having them
commented-out.

That is the motovation for the change: it is better to check things formally.
Talking to Fabian Huch, I got the impression that now there are more compute
resources than five or six years ago.

There are indeed a few more hardware resources, and hopefully more to come
very soon, but right now the production test environment is rather underpowered.

I am speaking here of the official isabelle_cronjob nightly build with
repeatable performance measurements. Although many people don't know about
that, it is the main way to oversee Isabelle/AFP resource requirements. If
that does not work properly, we are "flying blind" --- we used to do that over
several years around 2015 +/-.

That build environment defines what can be active as "slow" or "very_slow".
Starting with April 2024 there will be a new situation, hopefully with some
new hardware somewhere. That is necessary, because the LRZ will shut down its
experimental free-as-in-beer Linux cluster.

I do hope that within a few weeks there will be regular AFP tests on good
hardware. Here is an example timing on the super high-end consumer hardware
organized by Fabian Huch:

AFP/ddf90847bfa5
Host of1.proof.cit.tum.de

ML_PLATFORM="x86_64-linux"
ML_HOME="/u/home/wenzelm/.isabelle/contrib/polyml-5.9.1/x86_64-linux"
ML_SYSTEM="polyml-5.9.1"
ML_OPTIONS="--minheap 1500"

threads=8

Finished HOL (0:01:58 elapsed time, 0:06:19 cpu time, factor 3.21)
Finished HOL-Analysis (0:04:03 elapsed time, 0:19:05 cpu time, factor 4.70)
Finished Ordinary_Differential_Equations (0:01:02 elapsed time, 0:03:43
cpu time, factor 3.59)

Finished Lorenz_C0 (0:05:06 elapsed time, 0:29:38 cpu time, factor 5.80)
Finished Lorenz_C1 (3:17:28 elapsed time, 19:25:10 cpu time, factor 5.90)
3:34:09 elapsed time, 21:06:14 cpu time, factor 5.91

Not bad. But also note that ARM is much slower: the analogous test on Mac
Studio M1 from some hours ago is still running, here are some results so far:

Finished HOL (0:02:08 elapsed time, 0:07:09 cpu time, factor 3.33)
Finished HOL-Analysis (0:04:34 elapsed time, 0:21:46 cpu time, factor 4.77)
Finished Ordinary_Differential_Equations (0:01:06 elapsed time, 0:04:11
cpu time, factor 3.81)
Finished Lorenz_C0 (0:17:58 elapsed time, 1:45:23 cpu time, factor 5.86)

So far, I thought that Lorenz_C0 was there to be tested
properly, and Lorenz_C1 an untested add-on example that is unlikely to
fail on
its own account.

Mathematically, Lorenz_C0 is of not much value. Lorenz_C1 is a formal proof
for the computational part of Tucker‘s proof of Smale‘s 14th problem. So
Lorenz_C1 is of actual value and worth testing „regularly“.

„Regularly“ could also be weekly, monthly, or even just once for every
Isabelle release. It probably makes sense to reflect that in a new session
group „very_very_slow“ or the like.

OK, lets see what will emerge rather soon.

Right now Lorenz_C1 needs to remain inactive to avoid bombing the
isabelle_cronjob AFP test: Today it actually failed due to file-system
shortage, so it did not bump into the next day.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Feb 29 2024 at 22:36):

From: Makarius <makarius@sketis.net>
I've now killed that test after it had accumulated > 60h CPU time without
terminating.

See further AFP/fd41e17fc7bd, where we are back to sanity for now.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev


Last updated: Apr 27 2024 at 12:25 UTC