Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] AFP and Library


view this post on Zulip Email Gateway (Aug 22 2022 at 15:55):

From: Tobias Nipkow <nipkow@in.tum.de>
Hi Makarius,

You have concerted many if not all AFP sessions that use theories from Library
such that they are now based on the Library image. This means that every time a
Library theory changes, sonething like half the AFP needs to be rebuilt even
though only a few of the AFP sessions may import the modified Library theory.
This happens fairly frequently and I wonder what your rational for the change was.

Tobias
smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 15:56):

From: Tobias Nipkow <nipkow@in.tum.de>
Gerwin wrote:

"Most AFP entries were already based on that, because we’d otherwise be building
HOL-Library about 50 times or so instead of only once per test. It’s a trade-off."

Well, we would only be building a few popular entries in Library multiple times.
Indeed it is a trade-off and we win a little(?) by starting from a Library image
but we lose big time if only a single theory in Library changes. It is a bit
like amortized complexity: on average, you win, but in real time you may have to
wait much longer. Since Library changes frequently, I do wonder about this
trade-off.

Tobias
smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 15:56):

From: Lars Hupel <hupel@in.tum.de>
Just as a side note for your consideration: Jenkins builds the AFP incrementally, i.e., only changed sessions. A change in HOL-Library will entail a lot of sessions being rebuilt, even if just a single theory has changed.

Maybe we should look into which library theories change the most or split library into multiple sessions.

Cheers
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 15:56):

From: Lars Hupel <hupel@in.tum.de>
Just as a side note for your consideration: Jenkins builds the AFP incrementally, i.e., only changed sessions. A change in HOL-Library will entail a lot of sessions being rebuilt, even if just a single theory has changed.

Maybe we should look into which library theories change the most or split library into multiple sessions.

Cheers
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 15:56):

From: Makarius <makarius@sketis.net>
On 28/08/17 08:15, Tobias Nipkow wrote:

Gerwin wrote:

"Most AFP entries were already based on that, because we’d otherwise be
building HOL-Library about 50 times or so instead of only once per test.
It’s a trade-off."

This is indeed the traditional explanation for providing various
"library" images in Isabelle and using them in the applications, e.g. in
AFP. In recent years we have seen a trend of increasing that quite
substantially.

In particular, Isabelle2017 will have the following important starting
points for applications:

HOL
HOL-Library
HOL-Computational-Algebra
HOL-Analysis | HOL-Algebra | HOL-Number_Theory | HOL-Nonstandard-Analysis

After changing any of them, there will be substantial follow-up build
times of other sessions depending on them. This is a natural consequence
of providing good library sessions and getting lots of applications on top.

There is a second aspect for the HOL-Library image only: it is somehow
the canonical supplement to the main HOL image.

Thus there are occasional tendencies in the applications to provide a
session "Foo" and also "Foo-Library" that "merges" the HOL-Library
aspect later on, even after building a whole stack of sessions. This
complicates session specifications and causes further multiplication of
build times. Since we cannot really merge session images, the main
approach to avoid the problem is to put HOL-Library at the very start of
the session dependencies and to remove the Foo-Library variant eventually.

Well, we would only be building a few popular entries in Library
multiple times. Indeed it is a trade-off and we win a little(?) by
starting from a Library image but we lose big time if only a single
theory in Library changes.

This sounds more like the file-based build model of Coq, but without
actual "object files" to store intermediate states.

It is a theoretical question if we can move over there, practically we
can't change the fundamental approache on the spot and expect that
things still work afterwards.

(I have occasionally discussed the inherent differences of the build
model with the Coq guys, and am still convinced that our tradition of
doing it scales better. Empirical proof: size and complexity of AFP today.)

I do agree that in the past 1-2 years there is an increasing pain to be
felt in maintaining Isabelle + AFP, up to the point that less
maintenance and fewer important reforms happen. In the past 3 years, the
size of Isabelle + AFP material has doubled, while build and test
infrastructure had stagnated since 2012. Only recently this has improved
again (but I don't mean the Jenkins setup, which I simply ignore).

What would greatly help is to have proper time measurement for nightly
builds of AFP within the regular Isabelle cronjob, with its persistent
database support in the background (see also
http://isabelle.in.tum.de/devel/build_status). The software is all there
in Isabelle/Scala; lacking is only some half-decent test hardware (e.g.
a cloud node with 8 cores + 64 GB should be sufficient).

At some point, "isabelle build" could use the persistent timing
information from the database to make better scheduling of big builds,
including adhoc changes to the actual session graph (e.g. changing
parent sessions vs. session imports).

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 15:56):

From: Tobias Nipkow <nipkow@in.tum.de>
On 28/08/2017 09:07, Lars Hupel wrote:

Maybe we should look into which library theories change the most or split
library into multiple sessions.

Thank you for this constructive suggestion. What would happen if there was an
AFP_Library with only those Library theories that are needed at least twice in
the AFP? I analyzed the sources. Only half the theories in Library are used at
least twice in the AFP. Then I analyzed the last 123 builds on Jenkins: 13 of
them did not involve any theory in AFP_Library and the AFP build would have
completed much more quickly (often in a few minutes rater than 1 hour). There is
an argument here for a dedicated AFP_Library.

Tobias

Cheers
Lars

On 28 August 2017 08:15:10 CEST, Tobias Nipkow <nipkow@in.tum.de> wrote:

Gerwin wrote:

"Most AFP entries were already based on that, because we’d otherwise be building
HOL-Library about 50 times or so instead of only once per test. It’s a trade-off."

Well, we would only be building a few popular entries in Library multiple times.
Indeed it is a trade-off and we win a little(?) by starting from a Library image
but we lose big time if only a single theory in Library changes. It is a bit
like amortized complexity: on average, you win, but in real time you may have to
wait much longer. Since Library changes frequently, I do wonder about this
trade-off.

Tobias

On 27/08/2017 13:35, Tobias Nipkow wrote:

Hi Makarius,

You have concerted many if not all AFP sessions that use theories from
Library
such that they are now based on the Library image. This means that every
time a
Library theory changes, sonething like half the AFP needs to be rebuilt
even
though only a few of the AFP sessions may import the modified Library
theory.
This happens fairly frequently and I wonder what your rational for the
change was.

Tobias

smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 15:57):

From: Makarius <makarius@sketis.net>
Some more notes on this.

The present reform of session-qualified theory names in Isabelle2017
still allows old-fashioned unqualified imports from existing parent
session images. It eases the upgrade of existing project sources, but it
means that the parent session structure still has an impact on the
theory name space.

Right after the Isabelle2017 release we can get rid of that, and parent
sessions vs. imported sessions (keyword 'sessions' in the ROOT) become
interchangeable -- as far as logical theory names are concerned (not the
time for loading theories).

This opens many possibilities to rearrange things, either statically in
the ROOT files, or dynamically by a smarter version of "isabelle build".
For example, all AFP sessions with < 30s runtime could be built in one
big ML process, for improved parallel performance and to avoid loading
the base session many times.

There are many more possibilities, especially with full timing
information of all sessions, theories, commands available to the build
process.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 15:57):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>

Maybe we should look into which library theories change the most or split library into multiple sessions.

My personal experience suggests that substantial rebuilds after pull are
seldom triggered by changes to HOL-Library; most times, that is due to
changes to HOL-Main / HOL-Complex theories — however this is not based
on any accountable evidence, so it might indeed be interesting to know
which candidates in HOL-Library have considerable velocity.

In the past, we occasionally split off sessions from HOL-Library
(sometimes also HOL-ex) as separate parts, the latest of these being
HOL-Computation_Algebra. Though scalability always played a part in the
considerations, we always managed to find a conceptual distinction to
make clear how the contents of the newly established session should look
like.

After a casual look at

$ wc -l src/HOL/Library/*.thy | sort -n

the following rough ideas come to my mind:

a) Shift further material to Computational_Algebra, e. g. Extended* or
material on orders and lattices.

b) Shift material to Datastructures, e. g. RBT*

c) Split off technical »utilities« like Simps_Case_Conv, Pattern_Aliases
and Code_ to a separate session HOL-Utilities (»tools« is already used
by a different tradition).

Cheers,
Florian

P. S. Since that is a post-release issue and the original issue not
relevant for users working on stable releases, we should move this
discussion to isabelle-dev.
signature.asc

view this post on Zulip Email Gateway (Aug 22 2022 at 15:57):

From: Tobias Nipkow <nipkow@in.tum.de>
As long as this is transparent, you are of course welcome to optimize "isabelle
build" for tasks like the AFP. Everything that involves changes to the AFP needs
to be discussed first.

Tobias
smime.p7s

view this post on Zulip Email Gateway (Aug 22 2022 at 15:57):

From: Tobias Nipkow <nipkow@in.tum.de>
On 31/08/2017 13:26, Florian Haftmann wrote:

Maybe we should look into which library theories change the most or split library into multiple sessions.

My personal experience suggests that substantial rebuilds after pull are
seldom triggered by changes to HOL-Library; most times, that is due to
changes to HOL-Main / HOL-Complex theories — however this is not based
on any accountable evidence, so it might indeed be interesting to know
which candidates in HOL-Library have considerable velocity.

From my analysis of these 123 revisions, 10% of the builds could have been
reduced from 30-60 mins to a few minutes by a less bloated Library. This is
clearly not dramatic but it is helpful.

In the past, we occasionally split off sessions from HOL-Library
(sometimes also HOL-ex) as separate parts, the latest of these being
HOL-Computation_Algebra. Though scalability always played a part in the
considerations, we always managed to find a conceptual distinction to
make clear how the contents of the newly established session should look
like.

I certainly support moving theories out of Library for conceptual reasons, and
it may also have the desired side effect.

Concerning Data_Structures please note that this directory is dedicated to
conceptually clean and simple models of data structures and not to efficient
implementations. Hence it is not the right home for Library/RBT*.

Tobias

After a casual look at

$ wc -l src/HOL/Library/*.thy | sort -n

the following rough ideas come to my mind:

a) Shift further material to Computational_Algebra, e. g. Extended* or
material on orders and lattices.

b) Shift material to Datastructures, e. g. RBT*

c) Split off technical »utilities« like Simps_Case_Conv, Pattern_Aliases
and Code_ to a separate session HOL-Utilities (»tools« is already used
by a different tradition).

Cheers,
Florian

P. S. Since that is a post-release issue and the original issue not
relevant for users working on stable releases, we should move this
discussion to isabelle-dev.

smime.p7s


Last updated: Nov 21 2024 at 12:39 UTC