[isabelle] Code generation in the AFP · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Code generation in the AFP

Email Gateway (Aug 22 2022 at 17:08):

From: Lars Hupel <hupel@in.tum.de>
Dear list,

sparked by a question from Tobias, I would like to draw attention to the
current (technical) state of code generation as used in numerous AFP
entries.

For now, let's only talk about self-contained code, i.e. no external
dependencies required. Code that can be compiled with just "export_code
... checking ...".

All Jenkins boxes have a uniform setup with GHC, ocamlc, scalac, and
Poly/ML present. (Pinning the versions of GHC and ocamlc to make them
independent of the platform version is a different story, too.)

Hence, it is no problem to use both "checking" and "in". People use
both, and most (all?) generated artifacts are listed in
"$AFP_BASE/.hgignore":

thys/Tree-Automata/code/haskell/generated/Nat.hs
thys/Tree-Automata/code/haskell/generated/Ta.hs
thys/Tree-Automata/code/ml/generated/Ta.ML
thys/Tree-Automata/code/ocaml/generated/Ta.ml
...

However, using "export_code ... in ... file ..." does not mean that the
generated code is being compiled. For that, one needs to use "checking".
But "checking" does not generate files, so when one wants to provide a
session that allows both

a) downstream users to use "isabelle build" to produce artifacts
b) testing (compiling) generated artifacts to avoid regressions

one has to use both commands.

It is additionally complicated by the fact that the AFP submission
system, for security reasons, forbids writing files into "$AFP_BASE".
Submitting a session that either uses "export_code ... in ... file ..."
directly or indirectly (via a different session) will lead to build failure.

Maybe we should move towards allowing both use cases in a single
command; coupled with a system option that's on by default determining
whether actual files are written. One could then envision running

isabelle build -o codegen_artifacts=false ...

in the submission system, whereas nothing else changes for users.

Opinions?

Cheers
Lars

Email Gateway (Aug 22 2022 at 17:08):

From: Makarius <makarius@sketis.net>
In principle, writing files into the source file-system space is
considered a legacy feature since > 10 years, but I guess that I am now
the only one who still remembers this.

So just the usual questions:

* Why is there a need to write files into the source directory in the
first place? (Apart from old habits inherited from the 1970s or 1980s.)

* Which fine-tuning of Isabelle functionality is required to get rid
of it?

It is important to note that the PIDE document model does not support
the concept of files generated during execution. So it would be better
to get rid of it altogether, or if there are good reasons for it to
support it properly. Both requires to understand the true purpose behind it.

Makarius

Email Gateway (Aug 22 2022 at 17:08):

From: Lars Hupel <hupel@in.tum.de>

In principle, writing files into the source file-system space is
considered a legacy feature since > 10 years, but I guess that I am now
the only one who still remembers this.

I vaguely remember talks about handling "code_export" through PIDE, but
I'm not sure how far up your priorities that is.

* Why is there a need to write files into the source directory in the
first place? (Apart from old habits inherited from the 1970s or 1980s.)

Code generation. It could also happen to an entirely different
directory, sure.

* Which fine-tuning of Isabelle functionality is required to get rid
of it?

Users should be able to obtain code that they can compile. Possibly
without having to open an editor. Possibly by using "isabelle build" for
that.

I personally have no strong opinion on the workflow, but I suspect
others might.

Email Gateway (Aug 22 2022 at 17:08):

From: Makarius <makarius@sketis.net>
This sounds like a special form of output of "resources" that have a
"path" location (relative to a theory. E.g. in Isabelle/ML it could be
done by (write_resource path text), analogously to (writeln text).

The result could be absorbed by the PIDE markup for the theory, and
retried later by command-line tools. (This assumes that we already have
persistent PIDE databases as result of "isabelle build", but we are
close to that.)

Further applications of the same idea:

* Generated LaTeX document source: document output is always emitted,
but not yet used with latex; the "isabelle document" tool does that
later on, based on a persistent PIDE database.

* Generated HTML web pages (or even a web server), based on a
persistent PIDE database -- essentially a continuation of
"isabelle.preview" in Isabelle/jEdit.

Makarius

Email Gateway (Aug 22 2022 at 17:08):

From: Lars Hupel <hupel@in.tum.de>
Am I right in assuming that you are proposing a set of new functions for
dealing with resources, that require some additional PIDE infrastructure?

This is potentially a good idea anyway, yet we still have to figure out

1) where generated program text should go in the first place; in
particular, I'd be skeptical of any scheme that would put it e.g. below
"$ISABELLE_HOME_USER"

2) a somewhat reasonable way to control writing to the file system and
checking the results without theory authors having to repeat themselves

I think the most common use case to support here is that people want to
run Isabelle as part of their build system ("make", "cabal", ...) to
obtain sources that are then compiled together with some additional
code. Internalizing that entire process into Isabelle is virtually
impossible due to the unlimited amount of build tools out there.

Email Gateway (Aug 22 2022 at 17:09):

From: "Achim D. Brucker" <brucker@spamfence.net>
Hi,

we are relying on generating source files for the test setups generated
by HOL-TestGen. Getting rid of writing generated source files in general
would be a severe limitation. Generating files in a dedicated output
directory (similar to, or even the same as, the one used for the document
generation) should be fine.

Being able to integrate the compilation of the generated source files (and
external dependencies) into the build system (e.g., by providing a build.sh
in the source output directory similar to custom document generation) would
be a big plus ;-).

Cheers,

Achim
signature.asc

Email Gateway (Aug 22 2022 at 17:09):

From: Makarius <makarius@sketis.net>
On 27/04/18 16:26, Lars Hupel wrote:

This sounds like a special form of output of "resources" that have a
"path" location (relative to a theory. E.g. in Isabelle/ML it could be
done by (write_resource path text), analogously to (writeln text).

The result could be absorbed by the PIDE markup for the theory, and
retried later by command-line tools. (This assumes that we already have
persistent PIDE databases as result of "isabelle build", but we are
close to that.)

Am I right in assuming that you are proposing a set of new functions for
dealing with resources, that require some additional PIDE infrastructure?

This is potentially a good idea anyway, yet we still have to figure out

1) where generated program text should go in the first place; in
particular, I'd be skeptical of any scheme that would put it e.g. below
"$ISABELLE_HOME_USER"

The output of write_resource would be stored in the PIDE markup
database, what is now log/My_Session.db and still lacking PIDE information.

Other tools could pick it up by Isabelle/Scala operations (still to be
provided). One such tool could be "isabelle resources" to extract
material from the database and copy it into a given directory. From
there, one could continue in the old-fashioned manner of the
1970s/1980s, with tools like "make" and its modern derivatives.

2) a somewhat reasonable way to control writing to the file system and
checking the results without theory authors having to repeat themselves

Is that what I have described above or something else?

The general idea is that "isabelle build" or Isabelle/jEdit sessions
never write to the file-system -- that would be stateful -- but only to
the stateless PIDE database.

I think the most common use case to support here is that people want to
run Isabelle as part of their build system ("make", "cabal", ...) to
obtain sources that are then compiled together with some additional
code. Internalizing that entire process into Isabelle is virtually
impossible due to the unlimited amount of build tools out there.

Most tools I've ever seen can be assimilated by Isabelle. People who
don't want this, and can still stick to older traditions, e.g. like this
in a Makefile:

isabelle build ... # turn session sources into session database
isabelle resources ... # extract certain files into a directory
do whatever ...

Makarius

Email Gateway (Aug 22 2022 at 17:09):

From: Lars Hupel <hupel@in.tum.de>

Is that what I have described above or something else?

The general idea is that "isabelle build" or Isabelle/jEdit sessions
never write to the file-system -- that would be stateful -- but only to
the stateless PIDE database.

Sounds like it, yes. But just to clarify: is this a short- or long-term
solution?

(There's no rush, anyway.)

Most tools I've ever seen can be assimilated by Isabelle. People who
don't want this, and can still stick to older traditions, e.g. like
this
in a Makefile:

That's an entirely different discussion though, centered around who'd be
willing to put in the effort and maintain build tools packaged as
Isabelle components (or similar). But as long as there's a convenient
command-line tool, it's a largely orthogonal discussion.

Email Gateway (Aug 22 2022 at 17:09):

From: Makarius <makarius@sketis.net>
Both short- and long-term, in a sense. Long-term, because the basic idea
has been in the pipeline for a long time, in some form or another.
Short-term, because it might actually happen really soon, if we are very
lucky for the coming release.

Makarius

Email Gateway (Aug 22 2022 at 17:09):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>
Dear all,

I would also appreciate a more systematic treatment of generated code in
the context of its applications.

Just a few further observations:

The following files apparently contain generated code but a part of
the history:
thys/LTL_to_DRA/Code/LTL_to_DRA_Translator.sml
thys/Depth-First-Search/dfs.ML
thys/Refine_Imperative_HOL/benchmarks/NestedDFS/isabelle/NDFS_Benchmark_export.sml
thys/Refine_Imperative_HOL/benchmarks/Dijkstra/isabelle/dijkstra_export.sml
thys/Refine_Imperative_HOL/benchmarks/Heapmap/isabelle/heapmap_export.sml
thys/LTL/example/rewrite_example.sml

Having generated data in the history is always a bad idea. The least
thing to todo is to remove this and put them into .hgrc (a second-best
thing, though).

A cursory grep -rIPn ghc * on the AFP reveals a couple of ad-hoc
constructions to build tools from generated code using GHC.

Maybe it is time to think how such compiler invocations and their
results and be integrated smoothly into a Isabelle session. In that
longer past I never attempted this due to lack of examples.

Cheers,
Florian
signature.asc

Email Gateway (Aug 22 2022 at 17:09):

From: Lars Hupel <hupel@in.tum.de>
Do we really need to add even more moving parts to the simple act of
running "isabelle build"?

It is definitely worth discussing how to deal with external
dependencies, but I believe a solution should live outside of a session
build.

Cheers
Lars

Email Gateway (Aug 22 2022 at 17:14):

From: Makarius <makarius@sketis.net>
I have already started working in that direction: see
Isabelle/ac82ee617a75 with the following NEWS entries:

* ML *

Operation Export.export emits theory exports (arbitrary blobs), which
are stored persistently in the session build database.

* System *

The command-line tool retrieves theory exports from the session build
database.

There are no applications yet. It is likely that there will be further
refinements, when it comes into practice.

I am presently also thinking in the direction to export proof terms
(without storing them in ML), maybe even OpenTheory output.

Makarius

Last updated: Jul 18 2025 at 20:25 UTC