Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Proof of concept: BNF-based records


view this post on Zulip Email Gateway (Aug 22 2022 at 16:38):

From: Makarius <makarius@sketis.net>
Thanks to everybody who contributed to this thread.

Here is a summary of important points that are relevant for the next
round of renovation of the Isabelle/HOL record package.

Some partial solutions (for small records) are:

https://github.com/seL4/l4v/tree/master/tools/c-parser/recursive_records

http://isabelle.in.tum.de/repos/isabelle/file/3107dcea3493/src/HOL/Library/Datatype_Records.thy

view this post on Zulip Email Gateway (Aug 22 2022 at 16:44):

From: Lars Hupel <hupel@in.tum.de>
Dear list,

I'd like to demonstrate that it is quite simple to base records on BNFs.

The short story is: it takes about 160 lines of ML code and 50 lines of
syntax declarations to achieve almost (one big caveat, see below)
feature parity with the standard record package (sources attached).

Looking for your input on whether this is useful, and whether it belongs
in HOL-Library, the AFP, or the bin.

What is supported? Pretty much the old syntax:

bnf_record ('a, 'b) foo =
    field_1 :: 'a
    field_2 :: 'b

term "⦇ field_1 = 3, field_2 = () ⦈"
term "foo ⦇ field_1 := y ⦈"

Also, record construction syntax is supported out of the box for
arbitrary single-free-constructor types:

datatype x = X (a: int) (b: int)

term "⦇ a = 1, b = 2 ⦈"

Update syntax for existing types can be installed like this:

local_setup ‹BNF_Record.mk_update_defs @{type_name x}›

term "(X 1 2) ⦇ b := 3 ⦈ = X 1 3"

What isn't supported?

Record extension: a "bnf_record" is a plain single-constructor
datatype, so there is no "more" field and no extension mechanism (this
is the big caveat mentioned above).

How is it implemented?

  1. Define a single-constructor datatype.
  2. Generate update functions for each field.
  3. Lots of syntax sugar.
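
For concreteness, a declaration like the "foo" example above expands to
roughly the following (a sketch; the exact names and form of the
generated constants may differ):

datatype ('a, 'b) foo = make_foo (field_1: 'a) (field_2: 'b)

definition field_1_update :: "('a ⇒ 'a) ⇒ ('a, 'b) foo ⇒ ('a, 'b) foo" where
  "field_1_update f r = make_foo (f (field_1 r)) (field_2 r)"

definition field_2_update :: "('b ⇒ 'b) ⇒ ('a, 'b) foo ⇒ ('a, 'b) foo" where
  "field_2_update f r = make_foo (field_1 r) (f (field_2 r))"

The syntax layer then lets "⦇ field_1 = x, field_2 = y ⦈" stand for
"make_foo x y", and "r ⦇ field_1 := x ⦈" for an application of
field_1_update.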

Why?

The short reason: the "record" package isn't localized.

The longer reason: there are more problems with records, e.g. there is
no support for nested recursion (they do not constitute BNFs), and even
"copy_bnf" can't fix the absence of a suitable "size" function (which
makes termination proofs impossible).
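
To illustrate (a small sketch; the record, type, and function names are
made up):

record 'a point =
  xval :: 'a
  yval :: 'a

copy_bnf ('a, 'z) point_ext  (* registers the record's extension type as a BNF *)

datatype 'a tree = Leaf | Node "'a tree point"  (* nested recursion: now accepted *)

fun count :: "'a tree ⇒ nat" where
  "count Leaf = 0"
| "count (Node p) = 1 + count (xval p) + count (yval p)"

The final "fun" is the problem: even with the BNF registration there is
no size function for the record, so the automatic termination proof has
nothing to measure and the definition is rejected.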

Why not?

The internal construction of standard records is much faster. Half of
the 50 lines of syntax declarations are needed to un-declare the
existing record syntax. Unfortunately, raw "syntax" rules can't be
bundled. Importing the theory breaks all existing record syntax.

What is the motivation for this?

Lem supports records. CakeML uses them for some core types, and
specifies nested recursion through them. Whenever CakeML updates, I can
either change the generated sources to include a bunch of magic
incantations to make termination proofs work, or I can change Lem to
emit "bnf_record" instead of "record".* As a third option (somewhere
between band-aid and proper solution), Jasmin has already suggested a
few months to port the size plugin to records. I attempted that but
unfortunately it took me too much time and ultimately I didn't manage
to. This here was much quicker to pull off (approx. 3 h) and fixes an
immediate problem I had. (Jasmin did however prevent me from writing a
plugin that generates update functions for all datatypes.)

Cheers
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 16:45):

From: Makarius <makarius@sketis.net>
On 08/02/18 22:36, Lars Hupel wrote:

I'd like to demonstrate that it is quite simple to base records on BNFs.

The short story is: it takes about 160 lines of ML code and 50 lines of
syntax declarations to achieve almost (one big caveat, see below)
feature parity with the standard record package (sources attached).

This is all very strange. No discussion whatsoever and now there is even
a change on the Isabelle repository:
http://isabelle.in.tum.de/repos/isabelle/rev/7929240e44d4

The history of the official Isabelle record package is long and
entangled. Initially its main idea was to support extensible records in
a simple manner (see the paper by Naraschewski and Wenzel from 1998).
Later, some guys from Sydney added many more features to make the
package scalable for the L4.verified project -- that made the
implementation also very complex. Thus it has become difficult to update
and "localize" it. Moreover any connection to BNF datatypes (which are
very resource hungry) is likely to bomb applications with many record
fields (or record extensions).

It is presently unclear to me where the scalability features are used
in practice. If L4.verified no longer requires that, we can probably
remove 80% of the record package and then update it easily. If scalable
records are still required, it will need more work to update.

Can some Data61 people say more about their current applications of the
record package?

Looking for your input on whether this is useful, and whether it belongs
in HOL-Library, the AFP, or the bin.

Spontaneously, I would have said HOL-ex or the bin. By putting it in
HOL-Library, you make certain claims of quality and sustained
maintenance. I.e. the first posting I can imagine on isabelle-users:

"I've found out about HOL-Library.Datatype_Records by accident -- there
is no NEWS entry and no documentation whatsoever. That tool appears to
be in conflict with the existing record package. Can anybody please fix
this ASAP, such that I can use theories with regular records together
with these new BNF records? Anyway, what is the roadmap? Can I expect
new BNF records will eventually supersede old-fashioned typedef records?"

What is supported? Pretty much the old syntax:

bnf_record ('a, 'b) foo =
    field_1 :: 'a
    field_2 :: 'b

term "⦇ field_1 = 3, field_2 = () ⦈"
  term "foo ⦇ field_1 := y ⦈"

Also, record construction syntax is supported out of the box for
arbitrary single-free-constructor types:

datatype x = X (a: int) (b: int)

term "⦇ a = 1, b = 2 ⦈"

Update syntax for existing types can be installed like this:

local_setup ‹BNF_Record.mk_update_defs @{type_name x}›

term "(X 1 2) ⦇ b := 3 ⦈ = X 1 3"

That is record notation with "make" and "update" operations on top of
existing datatypes. It was actually the plan in 1996/1997, before the
extensible record scheme emerged in 1998. We did not do this on a whim;
it was based on requirements from applications.

Why?

The longer reason: there are more problems with records, e.g. there is
no support for nested recursion (they do not constitute BNFs), and even
"copy_bnf" can't fix the absence of a suitable "size" function (which
makes termination proofs impossible).

Can you explain this further? Is there an inherent problem of the
general idea of the regular record package wrt. BNFs? Can the BNF
experts comment on that?

Why not?

The internal construction of standard records is much faster. Half of
the 50 lines of syntax declarations are needed to un-declare the
existing record syntax. Unfortunately, raw "syntax" rules can't be
bundled. Importing the theory breaks all existing record syntax.

BTW, there is an old Isabelle development principle to be monotonic wrt.
existing tools. A fork and clone is bad enough, one that disrupts what
is already there is worse: entropy, decay.

What is the motivation for this?

Lem supports records. CakeML uses them for some core types, and
specifies nested recursion through them. Whenever CakeML updates, I can
either change the generated sources to include a bunch of magic
incantations to make termination proofs work, or I can change Lem to
emit "bnf_record" instead of "record".* As a third option (somewhere
between band-aid and proper solution), Jasmin has already suggested a
few months to port the size plugin to records. I attempted that but
unfortunately it took me too much time and ultimately I didn't manage
to. This here was much quicker to pull off (approx. 3 h) and fixes an
immediate problem I had. (Jasmin did however prevent me from writing a
plugin that generates update functions for all datatypes.)

This hints at various discussions in the dark. We have relatively little
traffic on the isabelle-users and isabelle-dev mailing lists, so why not
use them for proper discussions?

As a reminder:

* isabelle-users is for anything that is relevant to users of official
Isabelle releases (past, present, future releases).

* isabelle-dev is for anything concerning the ongoing development
process of Isabelle repository versions between releases.

* It does not make sense to cross-post on both lists; isabelle-dev is
the smaller list, essentially a subset of isabelle-users.

Mailing lists are not for real-time "chats". Often the relevant people
are not immediately reactive (e.g. busy somewhere else or even on
vacation). This means it needs more than just a few days to draw
conclusions from the absence of postings.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:45):

From: Tobias Nipkow <nipkow@in.tum.de>
Lars discussed this with me. There is no harm in putting it in Library as long
as the functionality and limitations of this approach are clearly stated in the
theory, which they are currently not. This needs to be included. Then people
know what they are letting themselves in for.

Tobias

view this post on Zulip Email Gateway (Aug 22 2022 at 16:46):

From: Lars Hupel <hupel@in.tum.de>

Can you explain this further? Is there an inherent problem of the
general idea of the regular record package wrt. BNFs? Can the BNF
experts comment on that?

I explained that in my email: "copy_bnf" makes a record a BNF, but for
the record to be usable in a function definition, it also requires a
size setup, which as of now is manual.
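
Concretely, the manual part looks something like this (a sketch, reusing
the hypothetical "point" record from my earlier mail): one can state a
size function by hand,

definition size_point :: "('a ⇒ nat) ⇒ ('a, 'z) point_scheme ⇒ nat" where
  "size_point f p = f (xval p) + f (yval p) + 1"

but there is no user-level command to hook it into the machinery that
"fun" consults for termination; that wiring goes through the ML
interfaces in src/HOL/Tools/BNF/bnf_lfp_size.ML.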

BTW, there is an old Isabelle development principle to be monotonic wrt.
existing tools. A fork and clone is bad enough, one that disrupts what
is already there is worse: entropy, decay.

As I said, raw syntax cannot be bundled. I expect people to have a brief
look over what they're including, and the way the syntax setup works is
exceedingly obvious.

This hints at various discussions in the dark. We have relatively little
traffic on the isabelle-users and isabelle-dev mailing lists, so why not
use them for proper discussions?

The problems with record + nested recursion had been laid out on the
mailing lists a few months ago.

view this post on Zulip Email Gateway (Aug 22 2022 at 16:46):

From: Lars Hupel <hupel@in.tum.de>
Fair point. I will add a few paragraphs there.

view this post on Zulip Email Gateway (Aug 22 2022 at 16:46):

From: Gerwin.Klein@data61.csiro.au

It is presently unclear to me where the scalability features are used
in practice. If L4.verified no longer requires that, we can probably
remove 80% of the record package and then update it easily. If scalable
records are still required, it will need more work to update.

Can some Data61 people say more about their current applications of the
record package?

We still do require it, and in fact make more use of the scalability than ever, because seL4 keeps growing and we keep adding support for further architectures.

We also depend on extensibility of records in a few places.

Looking for your input on whether this is useful, and whether it belongs
in HOL-Library, the AFP, or the bin.

Spontaneously, I would have said HOL-ex or the bin. By putting it in
HOL-Library, you make certain claims of quality and sustained
maintenance. I.e. the first posting I can imagine on isabelle-users:

"I've found out about HOL-Library.Datatype_Records by accident -- there
is no NEWS entry and no documentation whatsoever. That tool appears to
be in conflict with the existing record package. Can anybody please fix
this ASAP, such that I can use theories with regular records together
with these new BNF records? Anyway, what is the roadmap? Can I expect
new BNF records will eventually supersede old-fashioned typedef records?"

:-)

I must admit that we have been maintaining a similar (very bare-bones) additional datatype record package on the side for about a decade. We use it in addition to the standard record package to model C structs, because they can contain recursive pointer types, which we couldn’t do with normal records. I.e. it might make sense to have both, as long as it is clear what each is for, what the differences are, and why they can’t/shouldn’t be the same.

The longer reason: there are more problems with records, e.g. there is
no support for nested recursion (they do not constitute BNFs), and even
"copy_bnf" can't fix the absence of a suitable "size" function (which
makes termination proofs impossible).

Can you explain this further? Is there an inherent problem of the
general idea of the regular record package wrt. BNFs? Can the BNF
experts comment on that?

I’d be interested in that too. It’d be nice to properly localise and BNF-ify records eventually.

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 22 2022 at 16:46):

From: Makarius <makarius@sketis.net>
Can you point to the sources of a few such big record definitions?

Such examples will be required even for a canonical localization effort
of the package that retains its feature richness.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:46):

From: Makarius <makarius@sketis.net>
People often don't have a choice: imports are often indirect. E.g. when
you use theories from HOL-Algebra, the regular record package will be
used. Combining this with bnf_record will break it.

It is normally possible to add new things without disrupting existing
things, and thus retain the good manners of monotonicity.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:47):

From: Makarius <makarius@sketis.net>
Can you point to the sources of this alternative datatype record
package? How does it treat record syntax?

One important aspect of localizing the record package is actually the
concrete syntax. When that is done properly, syntax could be attached to
different term constructions, independently of the underlying record
representation.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:47):

From: Peter Lammich <lammich@in.tum.de>
Is there a way to do advanced syntax (e.g. nonterminal, syntax,
translations, parse_translation, print_ast_translation) properly (e.g.
localized)?

view this post on Zulip Email Gateway (Aug 22 2022 at 16:47):

From: Makarius <makarius@sketis.net>
Yes, one merely needs to connect local entities with global items in the
background theory (the so-called foundation), according to the usual
policies of local_theory contexts.

Here the foundation consists of raw syntax and translations, and the
connection might require further tricks that have not been used before.

The whole localization business is about such tricks -- to make things
possible that look impossible at first sight.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:48):

From: Peter Lammich <lammich@in.tum.de>
So I don't feel so bad anymore about writing syntax translations that
depend on a configuration flag from the local context ... I did this
trick recently to enable advanced syntax in a bundle ... the problem was
that some translations that match on the outermost constant of an
expression are not supposed to return the identity ... this required
further workarounds and makes it all feel like a bad hack.

Peter

view this post on Zulip Email Gateway (Aug 22 2022 at 16:48):

From: Makarius <makarius@sketis.net>
Can you point to the sources?

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 16:48):

From: Peter Lammich <lammich@in.tum.de>
No, but I can include a stripped-down example. Basically, the print and
parse translations raise Match if they are disabled by a configuration
option.

See the last lines of the attached text for an example of how it is
supposed to work.

The decomposition of strings looks ugly; I basically copied it from
String_Syntax.ML.
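
In essence the trick looks like this (a stripped-down sketch with
made-up names; the real example is more involved):

syntax "_fancy" :: "logic ⇒ logic"  ("FANCY _")

ML ‹
  val fancy_syntax = Attrib.setup_config_bool @{binding fancy_syntax} (K false)
›

parse_translation ‹
  [(@{syntax_const "_fancy"}, fn ctxt => fn [t] =>
      if Config.get ctxt fancy_syntax
      then t  (* enabled: perform the actual translation (here just identity) *)
      else raise Match  (* disabled: the translation does not apply *))]
›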

view this post on Zulip Email Gateway (Aug 22 2022 at 16:49):

From: Thomas.Sewell@data61.csiro.au
Hi all. I'm the guy from Sydney who added stuff to the record package.
I'm just back from holiday, and I should try to clear up some
misunderstandings.

I think the BNF record extension mechanism is fine. I think (but haven't
checked) that it's the record update syntax which is being used. This is
just syntax for any constant whose name ends in "_update". You can play
with this by hand if you want: define the constant spam_update and use
the syntax "r (| spam := 1 |)". This won't create conflicts with the
record package, since the record package knows which constants it
"owns".

The history, as I recall, is that the extensible record mechanism is
quite old. Norbert Schirmer's Simpl package tends to require large
records, so he adjusted the implementation. He managed to support about
100 variables, and then I adjusted it again to get up to 1000.

My adjustment made a lot of changes to how the record type is defined
internally. But I tried to avoid changing the "surface layer" which
users interact with. I think Makarius is slightly wrong to describe this
as new features; it was meant to be just a performance upgrade within
the machinery.

I did make a mistake. I didn't understand the code generator well, and
in fact I still don't. I had hoped at the time that my changes were
benign and would result in code being generated fairly normally. I was
wrong, apparently. Other authors subsequently added quite a lot of code
to produce theorems that allow the code generator to simulate a simpler
definition for records. This conflicts with my performance goals
somewhat. With all these changes together, the package is now quite
complicated.

We can't "quote" the big record definition that is done, because it's
called in ML by
another package. But we can describe the performance constraints very
easily:
somewhere upwards of 700 variables. I think that our most challenging
configuration
might be up well about 1000 now.

There's one key big record for any given inclusion of a C program,
i.e. one in any image. It's a flat record, not an extension and never
extended.

Quoting myself on this list from nearly 10 years ago, you can test an
n-variable record definition quite easily:

ML {*
fun add_test_rec b n =
  let
    (* field i: a boolean field whose name is b suffixed with i *)
    fun var i =
      (Binding.suffix_name (string_of_int i) b, @{typ bool}, Mixfix.NoSyn)
  in
    Record.add_record {overloaded = false} ([], b) NONE (map var (1 upto n))
  end
*}

setup {* add_test_rec @{binding test_rec} 20 *}
setup {* add_test_rec @{binding test_rec_b} 200 *}

We also use a number of smaller records, and do use the extension
feature, but I don't think we're different from other users in any
interesting way.

We had a live discussion about record performance problems more than
5 years ago. Around then we were having critical difficulties getting
images to build on pretty much any hardware. I tracked down the problem
at the time. It was due to both the code generator support for extra
record features and a change to the way proof terms were stored.

This problem was then solved in Poly/ML. When David Matthews added
shareCommonData as a feature that the GC called occasionally, the
problem went away. Now the relevant image builds are just slow. I can
explain this in more detail, if anyone really cares, but on our side we
no longer need to do anything about it.

The record package is now quite complicated. Various people would like
to localise it. Part of the problem is that it doesn't have a single
author any more. I made a quick attempt to do some localising in the
past, but I quickly hit bits of the syntax translations and simprocs
that I avoided changing in the past and that I still don't understand.

There are probably many other agendas as well.

Cheers,
Thomas.

view this post on Zulip Email Gateway (Aug 22 2022 at 16:49):

From: Dmitriy Traytel <traytel@inf.ethz.ch>
Back from vacation here as well. I'll try to clarify some things from the BNF perspective. BTW: the term "datatype-based" for Lars' tool would be more appropriate than "BNF-based" (whereas the datatypes themselves are "BNF-based").

On 14 Feb 2018, at 15:01, Makarius <makarius@sketis.net> wrote:

The history of the official Isabelle record package is long and
entangled. Initially its main idea was to support extensible records in
a simple manner (see the paper by Naraschewski and Wenzel from 1998).
Later, some guys from Sydney added many more features to make the
package scalable for the L4.verified project -- that made the
implementation also very complex. Thus it has become difficult to update
and "localize" it. Moreover any connection to BNF datatypes (which are
very resource hungry) is likely to bomb applications with many record
fields (or record extensions).

I dimly remember that Andreas Lochbihler had an application with a long sequence of record extensions (depth 5 or 6), but I cannot find the corresponding formalization in Isabelle+AFP.

Why?

The longer reason: there are more problems with records, e.g. there is
no support for nested recursion (they do not constitute BNFs), and even
"copy_bnf" can't fix the absence of a suitable "size" function (which
makes termination proofs impossible).

Can you explain this further? Is there an inherent problem of the
general idea of the regular record package wrt. BNFs? Can the BNF
experts comment on that?

As Lars has pointed out: there is no problem with records w.r.t. BNFs. One can register any record as a BNF using the copy_bnf command (potentially inheriting some dead variables from the underlying product type used in the construction of records). A BNF does not give one a size function though.
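
For example, for a two-field record such as the "point" sketched earlier
in this thread, the registration is a one-liner:

copy_bnf ('a, 'z) point_ext

where the extra 'z covers the "more" slot of the underlying extension
type.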

Jasmin already suggested a few months ago to port the size plugin to
records. I attempted that, but unfortunately it took me too much time
and ultimately I didn't manage to finish. This was much quicker to pull
off (approx. 3 h) and fixes an immediate problem I had. (Jasmin did,
however, prevent me from writing a plugin that generates update
functions for all datatypes.)

This hints at various discussions in the dark. We have relatively little
traffic on the isabelle-users and isabelle-dev mailing lists, so why not
use them for proper discussions?

Actually, this was on the mailing list and I am surprised that nobody has posted the link yet:

https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2017-August/msg00016.html

However, for some reason, I can't find the last mail from Jasmin (to Lars, cc'ing me and isabelle-users) in this thread in the mailman archives. So I guess we should call it "in the twilight" rather than "in the dark". Here is its content (of which in particular the last paragraph is relevant to this thread):

On 13 Aug 2017, at 17:47, Blanchette, J.C. <j.c.blanchette@vu.nl> wrote:

Hi Lars,

A good starting point could be to decouple record syntax from record
definitions. There's no reason why

datatype foo = Bar (x: nat) (y: nat)

couldn't be treated as a record, including update functions.

I'm not sure what you mean exactly by "treated as a record". Records have a quite different feel, with their polymorphic "more" field.
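
For example (standard record behaviour, with illustrative names), every
record value has an implicit "more" slot that is visible in the scheme
syntax:

record pnt =
  px :: int
  py :: int

term "⦇ px = 1, py = 2, … = m ⦈"  (* m is the "more" part; type 'a pnt_scheme *)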

I remember looking at records a few years ago and finding out that many of the properties and concepts shared with datatypes (or the "ctr_sugar" abstraction) have different names or different forms. A thorough analysis of the current situation would be a good idea before adding anything.

I'm already working on a plugin that produces "*_update" functions (this allows for
record update syntax, albeit not construction syntax).

I'm not sure what your plans are, but I would argue against using the datatype plugin for this, because such plugins are enabled by default. More than enough constants are already generated for datatypes. Perhaps we could discuss alternatives offline if you like.

Cheers,

Jasmin

Stepping back and looking at the different requirements (of Lars and Data61) gives me the impression that it would be useful to have a plugin that enables record syntax and update functions for datatypes. There are many questions that need to be discussed: Should it work for single-constructor datatypes only? What about recursive single-constructor datatypes? (The answer to the latter seems to be yes, thinking of Data61's recursive pointer type.)

However, this plugin definitely needs to be disabled by default.

There are ways to achieve this (cf. the "transfer" option and plugin of primrec in src/HOL/Tools/BNF/bnf_lfp_rec_sugar.ML and src/HOL/Tools/BNF/bnf_lfp_rec_sugar_transfer.ML), but they feel more like a workaround than a solution: we are using an additional configuration option to disable the plugin by default. Ideally, there would be a cleaner way to specify the defaults directly in the plugin infrastructure when creating a plugin interpretation (i.e. in the Plugin.interpretation function).

This plugin would be clearly conceptually separated from the record package (which we will never supersede with datatypes, because of efficiency). Probably much of the tool that Lars created can be reused in such a plugin, but without introducing a new record package and the confusion associated with it.

Dmitriy

view this post on Zulip Email Gateway (Aug 22 2022 at 16:49):

From: Andreas Lochbihler <mail@andreas-lochbihler.de>

I dimly remember that Andreas Lochbihler had an application with a long sequence of record extensions (depth 5 or 6), but I cannot find the corresponding formalization in Isabelle+AFP.

Such a long sequence appeared in early versions of my Monomorphic_Monad formalisation in
the AFP. (There was a record for a monad, an extension for exceptions, another one for
state, etc.) But I dropped the whole record approach because the fixed order of record
extensions made the whole approach too inflexible. Today, Monomorphic_Monad is based on
locales.

All my records are small in terms of fields and I have frequently used extensions, but
usually no deeper than three levels. Some examples can be found in the unfinished FSCPI
formalisation at:

https://www.ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/information-security-group-dam/research/projects/FCSPI/CoSP.zip

Andreas

view this post on Zulip Email Gateway (Aug 22 2022 at 16:50):

From: "Achim D. Brucker" <brucker@spamfence.net>
(re-sending, as I first used a non-subscribed email address)

Hi,
We are currently porting (at this point in time, manually) the
extensible encoding of class models from my PhD thesis [A] to
records. Our main application at the moment is a formalization of the
Document Object Model (DOM) and partly HTML (as an extension of the
DOM) for reasoning over browser APIs that modify DOM/HTML instances
(a first AFP submission is planned for the near future).

Here, we easily end up with 5 and more extensions, and also rather
complex type polynomials as extension types.

We moved from a datatype-based approach to records to benefit from the
record syntax and also to re-use the existing extensibility of records
(instead of implementing our own using records or tuples, as we did in
our old object-oriented datatype package).

Best,
Achim

[A] Achim D. Brucker, Burkhart Wolff: An Extensible Encoding of
Object-oriented Data Models in HOL. J. Autom. Reasoning 41(3-4):
219-249 (2008).
https://www.brucker.ch/bibliography/abstract/brucker.ea-extensible-2008-b


Last updated: Nov 21 2024 at 12:39 UTC