Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Embedded languages in Isabelle


view this post on Zulip Email Gateway (Aug 22 2022 at 20:41):

From: Makarius <makarius@sketis.net>
Here are further examples for embedded languages in Isabelle:

Isabelle2019/src/Pure/Tools/rail.ML
Isabelle2019/src/Pure/Syntax/simple_syntax.ML

Both are based on Isabelle parser combinators. There is also proper
support for formal positions and Prover IDE markup in the rail
application -- without that, an Isabelle language is not complete.
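
For readers who have not used them, Isabelle's parser combinators (see
src/Pure/General/scan.ML) build parsers by composing small ones. The
following is a minimal illustrative sketch, not code from rail.ML, and
the exact combinator signatures may vary between Isabelle versions:

```isabelle
ML \<open>
  (* sketch: combine parsers over symbol lists with -- , --| and >> *)
  val word = Scan.many1 Symbol.is_letter >> implode;   (* e.g. "foo" *)
  val space = Scan.many1 (fn s => s = " ");
  (* parse two space-separated words, e.g. "foo bar", as a pair *)
  val two_words = word --| space -- word;
\<close>
```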

(Note on browsing sources in Isabelle/Pure: Documentation panel, entry
src/Pure/ROOT.ML + its explanation at the top of the file.)

I don't think that your first idea to change the Isabelle term language
will work out in general: certain assumptions about lexical syntax and
identifiers are deeply rooted in the system. There are some explanations
on formal names in section 1.2 of the "implementation" manual (see the
Documentation panel in Isabelle/jEdit). In particular, the system may
transform names within a term in certain situations, e.g. "x" could
become "x_" or "xa" or "?xa" later on.

You did not say anything about the application context nor the overall
motivation. There might be more plain and basic approaches, e.g. an
application notation \<app> where this newly allocated Isabelle symbol
is rendered as a thin space.
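
To illustrate the idea (all names here are hypothetical, and the
thin-space rendering would have to be configured via an etc/symbols
entry):

```isabelle
(* hypothetical explicit application operator, written with a new symbol *)
consts app :: "('a => 'b) => 'a => 'b"
notation app (infixl "\<app>" 900)

(* a term "f \<app> x" then stands for ordinary application "f x",
   with \<app> rendered as a thin space in the Prover IDE *)
```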

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 20:41):

From: Daniel Kirchner <daniel@ekpyron.org>
Thank you very much for your pointers!
I already looked at Pure/Syntax/simple_syntax.ML, but Pure/Tools/rail.ML
indeed looks quite helpful, especially with respect to the Prover IDE markup.
(I already tried to implement that in my preliminary approaches, but the
example will prove very helpful, even though I might move to a different
approach altogether for my current application, at least for now; see below.)

To say a bit more about my application: I'm constructing a shallow semantic
embedding of a philosophical logic (Edward Zalta's "Theory of Abstract
Objects"). That logic is quite different from HOL in multiple ways, which makes
this an interesting research project. For example, the language contains two
kinds of atomic formulas: Fxy, "x and y 'exemplify' F", and xyF, "x and y
'encode' F", neither of which can be implemented as mere function application
in an embedding in HOL. The variable names carry type information, e.g. x and
y have a different type than F and G, which in turn have a different type from
e.g. \kappa, etc. Some of these types map directly to types in HOL; others I
need to implement by guarding behind additional constants.
My basic implementation maps all these concepts to constructs in the Isabelle
term language in a working, but not user-friendly, way -- e.g. I need
special syntax for both exemplification and encoding, and at times explicit type
annotations (resp. "type guard constants"), etc.
In other words, I have a working implementation of the logic, but not of its
syntax (i.e. I can map formulas of the logic to corresponding "not-so-nice-
looking" formulas in Isabelle's term language, but this mapping is manual
effort).

So what I'm currently trying to research is how close I can get to the actual
syntax of the target language directly in Isabelle, so that a person who is
used to the target logic has an easier time working with the embedding.

The target logic itself also defines new symbols in terms of its basic symbols,
so the implementation should be extensible, much like mixfix
annotations on constants or "syntax" commands. That's why I'm trying to avoid
duplicating an implementation of a priority grammar. (The definitions in the
target logic are not quite the same as Isar definitions, so I will
probably introduce new outer syntax commands for them, but in the end they
will translate to "plain", but more complex, definitions satisfying the narrowly
defined inferential role of definitions in the hyper-intensional free logic of
the target system.)

At the moment I'm trying only to parse the actual syntax of the language,
not to print it, i.e. a formula given in the syntax of the target logic will
be translated to its representation in Isabelle's term language, and further
reasoning will be done in Isabelle's term language directly. (I think this
should alleviate the problem of transformed names like x_ and xa; at a later
stage it might also be interesting to go further, print in the target
logic's syntax, and continue reasoning in it as well, for which this indeed
will be an issue.)

Given that I basically need to support something like mixfix notation for
extending the syntax, and that reproducing a suitable variant of the Isabelle
term language manually from strings or cartouches seems like an
unnecessary duplication of work (especially since Pure/Syntax/syntax.ML, Pure/
Syntax/syntax_phases.ML, etc., are not exactly meant to be used like this),
I'm now trying to stay mainly within the Isabelle term language itself.
Interestingly, only now, after looking through the syntax parsing ML code, do I
realize that I can do much more than I thought within its boundaries.

Simplified, that approach will look like:

consts valid :: "p => bool"
consts emb_implies :: "p => p => p"
nonterminal q
syntax valid :: "q => bool" ("[\<Turnstile> _]")
  emb_implies :: "q => q => q" (infixl "->" 8)
  emb_not :: "q => q" ("\<not>_" [40] 70)
  "_atomic_formula" :: "id_position => q" ("_")
etc.

and then installing a parse_ast_translation for "_atomic_formula" which splits
up e.g. "Fxy" into "(_exe F x y)" and "xyF" into "(_enc x y F)", which is then
translated to the corresponding term in a parse_translation.
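
Such a translation might hypothetically be sketched as below (simplified:
with id_position the incoming AST actually carries a position constraint
that would first have to be stripped, and the one-letter-per-variable
convention is assumed from the description above):

```isabelle
parse_ast_translation \<open>
  [("_atomic_formula", fn _ => fn [Ast.Variable s] =>
      let
        (* one letter per term: "Fxy" ~> (_exe F x y), "xyF" ~> (_enc x y F) *)
        val letters = map String.str (String.explode s);
        val head =
          if Char.isUpper (String.sub (s, 0)) then "_exe" else "_enc";
      in Ast.mk_appl (Ast.Constant head) (map Ast.Variable letters) end)]
\<close>
```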

So far this is working out quite well and is alleviating the need to deal with
markup myself for now, so I will see how far I will get with this approach.

Regarding your suggestion to introduce something like "\<app>", printed as a
thin space: I actually tried this at one point and attempted to introduce
"\<delim>", to be printed as a zero-width space, but (1) this still makes it
harder to type formulas in the target logic and might actually be quite
confusing, and (2) I had some issues with jEdit, at least with zero-width
spaces (I'm not sure about thin spaces anymore) -- unfortunately I don't quite
remember what these issues were right now.

In any case: I think I can work with this for now. Thank you very much for
your help so far, and for clarifying that the ML parsing
infrastructure for Isabelle's term language is probably best not to be re-/
abused for purposes like this.

In general in the long run it would, of course, be nice to have a self-
contained "reentrant" and "public" version of the parser for Isabelle's term
language in ML, in which every aspect could be customized, including the
lexicon and tokenizing and maybe even the generation of "fresh" variable
names, etc. - but I realize that this is way too much to ask especially for my
very specific and probably rather exotic use case :-).

Best wishes,
Daniel Kirchner

view this post on Zulip Email Gateway (Aug 22 2022 at 20:42):

From: Makarius <makarius@sketis.net>
On 03/10/2019 17:29, Daniel Kirchner wrote:

To say a bit more about my application: I'm constructing a shallow semantic
embedding of a philosophical logic (Edward Zalta's "Theory of Abstract
Objects"). That logic is quite different from HOL in multiple ways, which makes
this an interesting research project. For example, the language contains two
kinds of atomic formulas: Fxy, "x and y 'exemplify' F", and xyF, "x and y
'encode' F", neither of which can be implemented as mere function application
in an embedding in HOL. The variable names carry type information, e.g. x and
y have a different type than F and G, which in turn have a different type from
e.g. \kappa, etc. Some of these types map directly to types in HOL; others I
need to implement by guarding behind additional constants.

Thanks for the general application outline -- it is always important to
keep the big picture in mind. A language that is very different from the
Isabelle/Pure lambda calculus (which is reused in the Isabelle/HOL
library) definitely poses a challenge. It will require very careful
exploration to do just the right thing; otherwise it will lead to an
unmaintainable blob of "code" that nobody can understand or maintain.

The target logic itself also defines new symbols in terms of its basic symbols,
so the implementation should be extensible, much like mixfix
annotations on constants or "syntax" commands. That's why I'm trying to avoid
duplicating an implementation of a priority grammar. (The definitions in the
target logic are not quite the same as Isar definitions, so I will
probably introduce new outer syntax commands for them, but in the end they
will translate to "plain", but more complex, definitions satisfying the narrowly
defined inferential role of definitions in the hyper-intensional free logic of
the target system.)

Note that there are different levels of complexity rolled into Isabelle
term notation. A command like 'notation' with plain mixfix is relatively
close to the priority grammar: if you subtract special things like
'binder' from the mixfix syntax, it is even simpler. In contrast,
'syntax' + 'translations' (or 'parse_translation' etc.) opens a not so
clearly defined space of ad-hoc syntax additions. If your own language
does not require that, it would greatly simplify a standalone
implementation of it (one that is independent of the Isabelle "syntax
phases").
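
Using the constants from the example discussed in this thread, the two
levels can be contrasted roughly as follows (a sketch only; the
priorities are arbitrary):

```isabelle
consts valid :: "p => bool"
consts emb_implies :: "p => p => p"

(* level 1: plain mixfix, stays within the priority grammar *)
notation emb_implies (infixl "->" 8)

(* level 2: 'syntax' + 'translations', a new grammar production
   plus a rewrite on the AST level *)
syntax "_valid" :: "p => bool" ("[\<Turnstile> _]")
translations "[\<Turnstile> A]" == "CONST valid A"
```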

At the moment I'm trying only to parse the actual syntax of the language,
not to print it, i.e. a formula given in the syntax of the target logic will
be translated to its representation in Isabelle's term language, and further
reasoning will be done in Isabelle's term language directly. (I think this
should alleviate the problem of transformed names like x_ and xa; at a later
stage it might also be interesting to go further, print in the target
logic's syntax, and continue reasoning in it as well, for which this indeed
will be an issue.)

BTW, printing also means document output of the sources in HTML or
LaTeX. There is an implicit assumption that adjacent letters form one
word, not individual identifiers. Compare this with LaTeX typesetting of
$abc$ vs. \emph{abc}. Presently, there is no proper way in the document
preparation system to change the typesetting of names, but it is
conceivable that a future notion of antiquotations within terms could
also go through document presentation.

I'm now trying to stay mainly within the Isabelle term language itself.
Interestingly, only now, after looking through the syntax parsing ML code, do I
realize that I can do much more than I thought within its boundaries.

Simplified, that approach will look like:

consts valid :: "p => bool"
consts emb_implies :: "p => p => p"
nonterminal q
syntax valid :: "q => bool" ("[\<Turnstile> _]")
  emb_implies :: "q => q => q" (infixl "->" 8)
  emb_not :: "q => q" ("\<not>_" [40] 70)
  "_atomic_formula" :: "id_position => q" ("_")
etc.

and then installing a parse_ast_translation for "_atomic_formula" which splits
up e.g. "Fxy" into "(_exe F x y)" and "xyF" into "(_enc x y F)", which is then
translated to the corresponding term in a parse_translation.

This looks much better to me. It de-escalates the overall complexity of
the implementation by some orders of magnitude.

Apart from id_position you can also experiment with cartouches for
certain well-defined sub-languages.
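
A cartouche-based variant might hypothetically look like the following,
where my_target_parser stands for one's own parser for the target logic
(a name invented here for illustration) and the "_constrain" wrapper
carries the position information:

```isabelle
syntax "_target_formula" :: "cartouche_position => q" ("_")

parse_translation \<open>
  let
    fun target_tr _ [(c as Const ("_constrain", _)) $ Free (s, _) $ p] =
          (* s is the literal cartouche text, p its encoded position *)
          c $ my_target_parser s $ p
      | target_tr _ ts = raise TERM ("target_tr", ts);
  in [("_target_formula", target_tr)] end
\<close>
```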

In general in the long run it would, of course, be nice to have a self-
contained "reentrant" and "public" version of the parser for Isabelle's term
language in ML, in which every aspect could be customized, including the
lexicon and tokenizing and maybe even the generation of "fresh" variable
names, etc. - but I realize that this is way too much to ask especially for my
very specific and probably rather exotic use case :-).

This sounds a bit like marketing talk for "software frameworks". Such
software components don't exist, and where people have tried, it has
led to maintenance nightmares.

The inner syntax implementation of Isabelle/Pure is the opposite of
that. It has many special tricks based on the particular situation to
make it just work by accident. I have reworked the architecture and
implementation many times in the past 25 years. A few more reforms of it
are still in the pipeline, e.g. subterm PIDE markup as seen in the tiny
rail example.

Such reforms naturally break bad applications of it -- or the other way
round: bad applications prevent further reforms.

Makarius


Last updated: Apr 26 2024 at 01:06 UTC