Isabelle to XML · Beginner Questions

Hi, I want to write a tool that can consume, manipulate, and produce THY files. For instance, a tool that can tell me which lemmas in a THY file are unused. How might I do this? It would be ideal if there were a tool that could export/import THY files to/from XML files, as I know how to parse XML. Does that exist? Thanks.

Wolfgang Jeltsch (Sep 10 2023 at 17:29):

Hmm, converting to and from a particular file format simply because you know how to parse it doesn’t seem to be a good idea. Wouldn’t it be better to parse theory files directly?

John Wickerson (Sep 12 2023 at 08:24):

Let me generalise my question then: how can I access and modify the AST of a THY file? I don’t want to build my own parser/pretty-printer for THY files, as that would be reinventing the wheel.

Wolfgang Jeltsch (Sep 12 2023 at 14:38):

That’s a good general question. :smile: Unfortunately, I’m not deep enough into Isabelle that I could satisfyingly answer this question; others here might be able to do so. That said, my understanding is that there is some ML code in the core part of Isabelle for creating and modifying abstract syntax trees for _inner_ syntax (terms and types), but _outer_ syntax fragments, the parts _around_ terms and types, are not really parsed but rather processed directly using ML code, with _some_ additional foundation using finite state machines.

Xiaokun Luan (Sep 19 2023 at 08:59):

I'm looking for the same thing as you, also as a beginner. I noticed that there is a SideKick panel in jEdit IDE showing the structure of the file. But I haven't figured out how to call SideKick parser from the command line.

Yutaka Nagashima (Sep 21 2023 at 01:00):

John Wickerson said:

Let me generalise my question then: how can I access and modify the AST of a THY file? I don’t want to build my own parser/pretty-printer for THY files, as that would be reinventing the wheel.

Dear John,

I noticed you're keen on writing a tool for identifying unused lemmas in a THY file. While the idea is great, I believe using the AST of a THY file might not yield the results you're hoping for.

For instance, if you prove a lemma named "foo" and store it in the default simp set of the simp method in a THY file. Determining whether "foo" has been used or not becomes challenging when we rely solely on the AST of the THY file without running Isabelle.

Just wanted to share this observation to potentially save you some time and effort. Let me know if you have any questions or if there's any other way I can assist!

Best wishes,
Yutaka

Lukas Stevens (Sep 21 2023 at 10:58):

If you are just interested in finding unused theorems, then you can use unused_thms

Lukas Stevens (Sep 21 2023 at 11:00):

More generally, while it is certainly possible to obtain a syntax tree of the outer syntax, you can't really parse inner syntax of Isabelle without running Isabelle itself. This is due to the complexity of inner syntax that allows you to do things like parse translations, adhoc overloading, etc.

Lukas Stevens (Sep 21 2023 at 11:03):

There is a linter that works with the AST of the outer syntax. Perhaps @Fabian Huch can point you to the file where the AST of the outer syntax is obtained.

Wenda Li (Sep 21 2023 at 11:25):

Lukas Stevens said:

There is a linter that works with the AST of the outer syntax. Perhaps Fabian Huch can point you to the file where the AST of the outer syntax is obtained.

We should have such terrific tools highlighted somewhere they are easily overlooked and underutilised...

Lukas Stevens (Sep 21 2023 at 11:40):

With NixOS, you just do isabelle.withComponents (components: [ components.linter ]) :D

Lukas Stevens (Sep 21 2023 at 11:41):

We could add it to isabelle.systems

Lukas Stevens (Sep 21 2023 at 11:42):

It is already at https://isabelle.systems/addons

Wenda Li (Sep 21 2023 at 11:47):

Lukas Stevens said:

We could add it to isabelle.systems

This website is awesome!

Fabian Huch (Sep 21 2023 at 12:10):

There is no "Isabelle AST" for outer syntax, at least not in the classical sense. The reason for this is that defining syntax in Isabelle is a user-space operation, i.e., no static syntax tree can even exist. The abstraction that we have are categorized tokens, i.e., you can statically tokenize Isabelle theories, where the token information yields you what kind it is.
If you want more than that, you will want to work with Isabelle code that has already been parsed by the actual syntax parsers (i.e., you need to process the content in Isabelle first). Then, you get the tokens with semantic markup -- syntax trees still don't exist as they don't exist in the parsing language. But the markup allows you to look at individual command spans and analyze tokens, defined entities, etc.
You could export that markup into xml (in fact, the representation is in xml internally), but you really want to use the machinery inside Isabelle -- otherwise you'll have to reinvent everything rather than just using a faithful implementation. You can do that either in Isabelle/Scala or Isabelle/ML.

Fabian Huch (Sep 21 2023 at 12:13):

The Isabelle linter does not actually parse Isabelle: Rather, it parsers the anti-patterns that people should avoid.

Fabian Huch (Sep 21 2023 at 12:19):

Finally, the actual problem of finding unused theorems can not be solved on a syntactical level as there are many ways in which theorems can be used that do not involve mentioning its name (e.g., fact, implicit rule, simpset or other lemma collections, ...)

Fabian Huch (Sep 21 2023 at 12:23):

If you insist on still trying to have static syntax trees of Isabelle, I can point you to the work of Yiannos Stathopolous: He built an Isabelle/Isar parser (at least on the Pure level IIRC) a while ago, though maintenance is an issue, and I'm not sure the code is publicly available (you can ask him though).

Last updated: Jul 13 2025 at 04:38 UTC

Stream: Beginner Questions

Topic: Isabelle to XML

John Wickerson (Sep 10 2023 at 09:18):