From: Alex Meyer <firstname.lastname@example.org>
Regarding my previous question - I am reading a lot and now I am aware of the:
1) Isabelle already has read(parse, check) facilities for the inner syntax (the main target of my efforts) that returns list of fully annotated terms or types. The question may be - how to parse this textual information, but this - I guess - is being handled by additional efforts - next point:
2) Google search for "isabelle theory as xml" returns articles about getting ACL2 specification from Isabelle theories, getting MMT representation from the Isabelle theories - so - I am guess, that by reading those articles I will arrive at conclusion that my transformation problem is already solved and that there can be already existing tools.
So, for now I am going ahead with reading. Sorry for asking before diving in articles and Google.
No: Alex Meyer
Nosūtīts: sestdiena, 2021. gada 30. janvāris 04:00
Kam: Isabelle Users <email@example.com>
Tēma: Isabelle code for getting in-memory representation (abstract syntax trees) for complete theory file (tree of loaded theories)
I am trying to find Isabelle code (parser and data structures for AST/abstract syntax trees) that can parse theory file (tree of loaded theory files) into in-memory structure like abstract syntax tree.
It would be nice to have some quick pointers to the source files. All the details, of course, I am reading myself in source code and implementation/developer manuals (Isabelle Cookbook especially has chapters about parsing).
So far I have found parser and AST structure for Pure:
and for complete theory files:
But I am trying to grasp the following 3 things:
1) even the simplest theories contain infix commands that introduce new symbols with additionl grammatical rules with infix (even ToyLIst have them). Isar/parse.ML has commands for processing infix/mixfix - am I right that parse.ML tries to do such parsing in 2 steps: 1) builds extended grammar which includes additional grammatical rules from infix commands; 2) parses original theory file with extended grammar in the second pass?
2) one usually works with trees fo theory files. Is Isar/parse.ML the top level parser that loads most specific theory file and and constructs the instance of Isar/local_theory.ML. And when use request to load the base theory files, then Isar/parse.ML can decorate/extende Isar/local_theory.ML instance with the additional branches/leaves that correspond to the data in the base files?
3) Is parsing (the trees of theory files) really so simple? Or I am on the wrong path?
I had the question with similar ideas but with different technologies in mind last summer: https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2020-July/msg00025.html and I was strongly discouraged from such parsing and in-memory manipulation of theory files.
But now - as I am looking on the Isabelle code - it seems to me that such parsing can be done with src/Pure/Isar code and arriving at the in-memory theory representation is not that hard? Or I am wrong?
My intention is to have full control over the representation and manipulation of the code in theory files:
1) maybe I would like to parse or generate thy files with the grammatical framework, e.g., translate the sentence in natural language into thy file (similar efforts have been done in Coq). Just for the purposes of representation and open-ended deduction, no need to have certified proofs.
2) maybe I would like to preprocess code of theory files for the processing with neural networks in the style of https://arxiv.org/abs/2006.09265 I have no clear thought yet, but I guess that this research can be extended if I can have full controle for the in-memory structurs of theory files.
So - it would be nice to have some pointers. Just pointers, all remaining I will explore myself.
I am sure that I will managed to do this, but pointers can make some shortcuts.
Last updated: Sep 28 2021 at 19:14 UTC