[isabelle] Isabelle code for getting in-memory representa... · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle code for getting in-memory representa...

Email Gateway (Jan 30 2021 at 02:01):

From: Alex Meyer <alex153@outlook.lv>
Hello!

I am trying to find Isabelle code (parser and data structures for AST/abstract syntax trees) that can parse theory file (tree of loaded theory files) into in-memory structure like abstract syntax tree.

It would be nice to have some quick pointers to the source files. All the details, of course, I am reading myself in source code and implementation/developer manuals (Isabelle Cookbook especially has chapters about parsing).

So far I have found parser and AST structure for Pure:
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Syntax/parser.ML
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Syntax/ast.ML
and for complete theory files:
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Isar/parse.ML
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Isar/local_theory.ML

But I am trying to grasp the following 3 things:

1) even the simplest theories contain infix commands that introduce new symbols with additionl grammatical rules with infix (even ToyLIst have them). Isar/parse.ML has commands for processing infix/mixfix - am I right that parse.ML tries to do such parsing in 2 steps: 1) builds extended grammar which includes additional grammatical rules from infix commands; 2) parses original theory file with extended grammar in the second pass?

2) one usually works with trees fo theory files. Is Isar/parse.ML the top level parser that loads most specific theory file and and constructs the instance of Isar/local_theory.ML. And when use request to load the base theory files, then Isar/parse.ML can decorate/extende Isar/local_theory.ML instance with the additional branches/leaves that correspond to the data in the base files?

3) Is parsing (the trees of theory files) really so simple? Or I am on the wrong path?
I had the question with similar ideas but with different technologies in mind last summer: https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2020-July/msg00025.html and I was strongly discouraged from such parsing and in-memory manipulation of theory files.
But now - as I am looking on the Isabelle code - it seems to me that such parsing can be done with src/Pure/Isar code and arriving at the in-memory theory representation is not that hard? Or I am wrong?

My intention is to have full control over the representation and manipulation of the code in theory files:
1) maybe I would like to parse or generate thy files with the grammatical framework, e.g., translate the sentence in natural language into thy file (similar efforts have been done in Coq). Just for the purposes of representation and open-ended deduction, no need to have certified proofs.
2) maybe I would like to preprocess code of theory files for the processing with neural networks in the style of https://arxiv.org/abs/2006.09265 I have no clear thought yet, but I guess that this research can be extended if I can have full controle for the in-memory structurs of theory files.

So - it would be nice to have some pointers. Just pointers, all remaining I will explore myself.

I am sure that I will managed to do this, but pointers can make some shortcuts.

Thanks,
Alex

Email Gateway (Feb 03 2021 at 12:58):

From: Makarius <makarius@sketis.net>
On 30/01/2021 03:00, Alex Meyer wrote:

My intention is to have full control over the representation and manipulation
of the code in theory files:
1) maybe I would like to parse or generate thy files with the grammatical
framework, e.g., translate the sentence in natural language into thy file
(similar efforts have been done in Coq). Just for the purposes of
representation and open-ended deduction, no need to have certified proofs.
2) maybe I would like to preprocess code of theory files for the processing
with neural networks in the style of https://arxiv.org/abs/2006.09265 I have
no clear thought yet, but I guess that this research can be extended if I can
have full controle for the in-memory structurs of theory files.

This "full control" is not going to work: Isabelle cannot be put into a box
like that, it is far too flexible and powerful.

The language of Isabelle is an open-ended framework for arbitrary semantic
embeddings, usually implemented in Isabelle/ML. Over the decades, I have
provided means to "tap" some aspects of the internal representation of
Isabelle languages, with external markup in Isabelle/PIDE.

I had the question with similar ideas but with different technologies in mind
last summer:
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2020-July/msg00025.html
and I was strongly discouraged from such parsing and in-memory manipulation of
theory files.

That thread already provides good answers. In short: don't do it, but change
your perspective on the problem.

So - it would be nice to have some pointers. Just pointers, all remaining I
will explore myself.

https://github.com/Deducteam/isabelle_dedukti
https://github.com/UniFormal/MMT
https://github.com/qaware/isabelle-afp-search

At the bottom of these tools there are two different approaches:

(1) heavy headless PIDE session (as in "isabelle dump"): one big
Isabelle/ML/Scala process to crunch everything and export certain aspects

(2) session build + export database (as in "isabelle export" or
Isabelle/Dedukti above): this is a regular "isabelle build" with certain options

Generally note that the proper language for "Isabelle systems programming" is
Isabelle/Scala: not funny scripting languages like Python.

The Isabelle/Dedukti application shows how to wrap rather simple Scala modules
into Isabelle command line tools. That is for Isabelle2020, but it should be
easy to adapt to Isabelle2021 which will appear within approx. 2 weeks.

Makarius

Email Gateway (Feb 05 2021 at 19:39):

From: Wenda Li <wl302@cam.ac.uk>
Dear Alex,

2) maybe I would like to preprocess code of theory files for the processing with neural networks in the style of https://arxiv.org/abs/2006.09265 <https://arxiv.org/abs/2006.09265> I have no clear thought yet, but I guess that this research can be extended if I can have full controle for the in-memory structurs of theory files.

Regarding https://arxiv.org/abs/2006.09265, I was writing a parser for .thy files in Python to extract information. The code should be available from the supplementary material of the ICLR submission (https://openreview.net/forum?id=Pzj6fzU6wkj <https://openreview.net/forum?id=Pzj6fzU6wkj>). The framework is hacky and fragile but may be OK for early experimentation or quick prototyping. Feel free to reuse it as you see fit.

For serious and robust development, I, too, believe the Isabelle/Scala interface is the best option. Dominique’s scala-isabelle library (https://github.com/dominique-unruh/scala-isabelle <https://github.com/dominique-unruh/scala-isabelle>) could be of great use.

Wenda

Email Gateway (Feb 05 2021 at 19:58):

From: Makarius <makarius@sketis.net>
I still don't understand the purpose of it.

Why not use Isabelle/Scala directly, it is an integral part of Isabelle?

Makarius

Email Gateway (Feb 05 2021 at 20:56):

From: Wenda Li <wl302@cam.ac.uk>
Sorry for the confusion I caused, Makarius. I was referring to the scala-isabelle library as an example of interacting with Isabelle through a Scala interface. Depending on one's needs, using Isabelle/Scala directly could, of course, be the idiomatic solution.

Wenda

Last updated: Jul 18 2025 at 20:25 UTC