[isabelle] Performance regression in 2016-1 with characte... · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Performance regression in 2016-1 with characte...

Email Gateway (Aug 22 2022 at 14:42):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>
In 96015aecfeba I have addressed this by a dedicated simproc for
equality on characters. This should be possible to be ported into a
separate theory for Isabelle2016-1.

From the same changeset it can be seen that other tools reyling on
concrete string values like state spaces use to provide their own proof
devices for that.

The deeper reason behind this seems to be that simp rules tailored for
abstract symbolic reasoning are often not suitable for calculating with
concrete values and vice versa.

Unfortunately the most elegant solution, a simproc or simp rules for
congruences »num1 mod num3 = num1 mod num2« did not work out since there
are two many other rules and simprocs interfering with mod.

Cheers,
Florian
signature.asc

Email Gateway (Aug 22 2022 at 14:42):

From: Thomas.Sewell@data61.csiro.au
Sorry I haven't been around for this discussion.

Thanks Florian for the patch. We've also found a similar fix, we'll
adopt one or
the other locally until the next release.

About Makarius' question:

Since such
post-release problems of users have happened routinely in recent years,
I would like to know what can be done about it.

It's worth thinking about this. Ideally we'd see releases go out with no
problems
remaining in them.

However, the only strategy that has been considered yet is to wait long
enough
during the pre-release phase for all "serious users" to update their
proofs. I don't
think we're going to be able to do that. On this occasion, we had a
volunteer, and the
INCOMPATIBILITY level wasn't so bad (the changes to supremum sets being
the worst
offenders), and still, we finished the update process shortly after the
release became
official. This was a good release. In previous releases we've been
preoccupied with
other things, and have stayed on the previous Isabelle version for months.

Another possible strategy would be to split Isabelle releases between
compatible
releases (new features, PIDE extensions etc) and incompatible releases
(INCOMPATIBILITY allowed). "Serious users" could then do push-button
tests and
offer feedback on all compatible releases. This is known to be a reasonable
strategy, since it's what pretty much every mature system out there does,
but it's probably too much to hope that Isabelle development would change
so substantially.

We can also suggest particular test cases that match the kinds of
problems we
have locally. For instance, it's obvious that we could stress-test
string comparisons
with a fairly short theory, and contribute that as a test case. Likewise
our problems
with large records. But it's a lot of work to discover an essential test
case.
For this release, it took three of us most of a day to
narrow down the performance problems we were seeing from polymorphic
constants
on C-types to the simple issue of string comparison.

Anyway, if anyone else has any bright ideas how we can help, we'd be
happy to.

Cheers,
Thomas.

Email Gateway (Aug 22 2022 at 14:43):

From: Makarius <makarius@sketis.net>
This sounds like requiring 5-10 more manpower behind Isabelle
development and release process.

Even just having "patch levels" with small amendments for published
releases (as done for Coq), already requires significant extra time.

Isabelle releases do not even have version numbers: every release is a
consolidated state of ongoing linear development. In the past, the
consolidation required 2-3 weeks, but we are now approaching 6-8 weeks
(with relatively little happening in that long time).

Makarius

Email Gateway (Aug 22 2022 at 14:43):

From: Lawrence Paulson <lp15@cam.ac.uk>
Usually there are a couple of weeks’ notice before the pre-release phase even starts… couldn’t at least some users choose that time to download the current development version to take a look at how it might affect their work?

Larry Paulson

Email Gateway (Aug 22 2022 at 14:43):

From: Makarius <makarius@sketis.net>
The full schedule for the Isabelle2016-1 release is recorded here:
http://sketis.net/2016/release-candidates-for-isabelle2016-1

The informal snapshot Isabelle2016-1-RC0 was published 07-Oct-2016, i.e.
9.5 weeks before the final release (13-Dec-2016). RC0 is really just an
informal preview.

The point to hop on a new release for professional users is RC1, or RC2
at the latest. There can be a tradeoff: getting on the train earlier
could mean more work to figure out remaining problems of the release
candidate, getting on the train tool late could mean that some
inconvenience have to remain in the release.

My impression is that we've had again the second case several times:
notable problems were discovered too late and there was no realistic
change to revisit them.

Makarius

Email Gateway (Aug 22 2022 at 14:52):

From: Thomas.Sewell@data61.csiro.au
Hello Isabelle users, developers.

We've noticed a serious performance regression involving characters and
strings in Isabelle-2016-1.

This can be demonstrated as follows:

lemma "''ab'' ~= ''ba''"
using [[simp_trace]]
by simp

In earlier Isabelle versions, it takes a few dozen steps to prove this
rule. In 2016-1, the characters
are compared by expanding rule Char_eq_Char_iff which requires
normalising "k mod 256" for
various k, which takes many many steps.

One workaround we tested is to adjust the simpset to ensure that the
code rules nat_of_char_code
and integer_of_char_simps are used, which solves the problem.

The C typing framework used by the seL4 proofs, Cogent proofs,
AutoCorres etc, uses strings to
encode field names and type identifiers. The impact of this change is
big enough to explode the
time to parse seL4 into Isabelle by a factor of 4 or more. There are
also some other string users
around.

We're sorry we didn't notice how significant the impact was until after
the release.

For the moment we'll patch around this by adjusting the simpset at our
end. The Isabelle version
should probably be adjusted as well though.

Cheers,
Thomas.

Email Gateway (Aug 22 2022 at 14:53):

From: Makarius <makarius@sketis.net>
Can you explain what went wrong in the release process?

The first release candidate on 01-Nov-2016, more than 6 weeks ago.

Makarius

Email Gateway (Aug 22 2022 at 14:53):

From: Makarius <makarius@sketis.net>
Sorry, I've got this wrong.
http://sketis.net/2016/release-candidates-for-isabelle2016-1 says:

On 28-Oct-2016 the release candidate Isabelle2016-1-RC1 was published.
Serious testing by users is now required, to expose remaining problems.

That was exactly 7 weeks ago -- a very long time. Since such
post-release problems of users have happened routinely in recent years,
I would like to know what can be done about it.

Maybe we should offer some "premium user" program: Improvements of
published releases are charged $$$.

Makarius

Email Gateway (Aug 22 2022 at 14:53):

From: Gerwin.Klein@data61.csiro.au

On 17.12.2016, at 01:26, Makarius <makarius@sketis.net> wrote:

On 16/12/16 14:55, Makarius wrote:

On 16/12/16 05:54, Thomas.Sewell@data61.csiro.au wrote:

We're sorry we didn't notice how significant the impact was until after
the release.

Can you explain what went wrong in the release process?

The first release candidate on 01-Nov-2016, more than 6 weeks ago.

Sorry, I've got this wrong.
http://sketis.net/2016/release-candidates-for-isabelle2016-1 says:

On 28-Oct-2016 the release candidate Isabelle2016-1-RC1 was published.
Serious testing by users is now required, to expose remaining problems.

That was exactly 7 weeks ago -- a very long time. Since such
post-release problems of users have happened routinely in recent years,
I would like to know what can be done about it.

I don’t think there is much that can be done from the release side. 7 weeks is a good period for testing for “normal” users, but it is not a lot of time to update > 500kloc of proof from one Isabelle release to another when there are too many other pressing things to do. We did manage to test functionality in time, but not performance, and that was mostly a function of what testing hardware we had available on our end (a particular bottleneck this year).

Maybe we should offer some "premium user" program: Improvements of
published releases are charged $$$.

Might be an idea, but currently not necessary, at least not for us :-) It’s relatively easy to change things on our side, it would just have been nicer for the rest of the community if we had managed to find the problem earlier. Maybe nobody else needs that many string comparisons — the performance regression didn’t seem to show up in a major way in the AFP or distribution after all.

Cheers,
Gerwin

Last updated: Jun 30 2025 at 20:23 UTC