[isabelle] Parsing term is easy with Scala, can similar a... · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Parsing term is easy with Scala, can similar a...

Email Gateway (Apr 04 2021 at 12:37):

From: Alex Meyer <alex153@outlook.lv>
I have came up with the code (using https://github.com/dominique-unruh/scala-isabelle/) that parses term into tree-like structure:

val setup = Isabelle.Setup(isabelleHome = Path.of(isabelleHome), logic = "HOL", build=false)
implicit val isabelle: Isabelle = new Isabelle(setup)
val ctxt = Context("HOL.Imperative_HOL.ex.Imperative_Quicksort")

val test_term = Term(ctxt, "two_integer_max_case_def a b = (case a > b of True \\<Rightarrow> a | False \\<Rightarrow> b)")
println("After test term")
println(test_term.getClass())
println("test_term: " + test_term.pretty(ctxt))
val jsonString = write(test_term)
println("After write test term")

def t_report2(term: Term, curr_string: String): String = term match {
case Const(p_name, p_type) => curr_string + " [Const " + p_name + "] "
case Free(p_name, p_type) => curr_string + " [Free " + p_name + "] "
case Var(p_name, p_index, p_type) => curr_string + " [Var " + p_name + p_index + "] "
case Abs(p_name, p_type, p_body_term) => curr_string + " [Abs " + p_name + p_body_term.pretty(ctxt) + t_report2(p_body_term, curr_string) + "] "
case Bound(p_index) => curr_string + " [Bound " + p_index + "] "
case App(p_term_1, p_term_2) => curr_string + " [App " + p_term_1.pretty(ctxt) + t_report2(p_term_1, curr_string) + " ||| " + p_term_2.pretty(ctxt) + t_report2(p_term_2, curr_string) + "] "
//final case class Const(name: String, typ: Typ) // Corresponds to ML constructor 'Const'
//final case class Free(name: String, typ: Typ) // Corresponds to ML constructor 'Free'
//final case class Var(name: String, index: Int, typ: Typ) // Corresponds to ML constructor 'Var'
//final case class Abs(name: String, typ: Typ, body: Term) // Corresponds to ML constructor 'Abs'
//final case class Bound private (index: Int) // Corresponds to ML constructor 'Bound'
//final case class App private (fun: Term, arg: Term) // Corresponds to ML constructor '$'
case _ => curr_string + " [Other] "
}
println(t_report2(test_term, ""))

The resulting string representation is something like (I am still doing the right indentation):

[App (=) (two_integer_max_case_def a b)

[App (=) [Const HOL.eq] |||

two_integer_max_case_def a b

[App two_integer_max_case_def a

[App two_integer_max_case_def

[Free two_integer_max_case_def] |||

a [Free a]

] |||

b [Free b]

]

] |||

case b < a of True ⇒ a | False ⇒ b

[App case_bool a b

[App case_bool a

[App case_bool

[Const Product_Type.bool.case_bool] |||

a [Free a]

] ||| b [Free b] ] ||| b < a [App (<) b [App (<) [Const Orderings.ord_class.less] ||| b [Free b] ] ||| a [Free a] ] ] ]

This is the result which I tried to find all the time! And now I am wondering - what I am missing here? Why so many people have told me that this endeavour is so hard? But, currently, unfortunatly, it applies to the terms (inner syntax) only. Such can is possible because of the (from https://javadoc.io/doc/de.unruh/scala-isabelle_2.13/latest/de/unruh/isabelle/pure/Term.html):

sealed abstract class Term
final case class Const(name: String, typ: Typ) // Corresponds to ML constructor 'Const'
final case class Free(name: String, typ: Typ) // Corresponds to ML constructor 'Free'
final case class Var(name: String, index: Int, typ: Typ) // Corresponds to ML constructor 'Var'
final case class Abs(name: String, typ: Typ, body: Term) // Corresponds to ML constructor 'Abs'
final case class Bound private (index: Int) // Corresponds to ML constructor 'Bound'
final case class App private (fun: Term, arg: Term) // Corresponds to ML constructor '$'

(And I am still digesting what it means because in Java someone has have introspection/reflection into the function name and arguments but not into function body/code content, but this Scala approach seemingly allows to dive into the content of the body of the class or function!).

My question is - is there similar code (with mathc) possible for parsing theory? i.e. is there
sealed abstract class Theory (or MLValue)
final case class Theorem(...)
final case class Definition(...)

Something like that? Or something completely different one should use for the theory (outer syntax)?

Isabelle provides parsing and pretty-printing facilities for the terms.

Alex

Email Gateway (Apr 05 2021 at 00:09):

From: Dominique Unruh <unruh@ut.ee>
Hello,

And now I am wondering - what I am missing here? Why so many people have
told me that this endeavour is so hard?

I don't think anyone said that it is hard to parse a term. That is easy and
well-supported both in Isabelle/ML and via scala-isabelle.

What is difficult is to translate a theory into a data-structure.

So, to summarize, what is available is (both in ML and Scala):

- Converting a string into a term object (parsing)
- Inspecting a term (because a term is a datastructure).
- Creating theorem objects (e.g., by proving them, or by loading them
from an existing theory).

- Inspecting a theorem object (namely by getting the term corresponding
to it).

- Creating theory objects (e.g., by executing a .thy file).
- Getting all theorems from a theory. (And then inspecting them as
above.)

- Splitting a theory file into commands (like "lemma ..." or "apply
...").

- Parsing a command (but not into an inspectable datastructure, you
simply get a function that can be applied to the the current state of a
theory)

What is not available (to the best of my knowledge):

- Get an AST-like datastructure from parsing a theory. That is because
this is just never creating in the process of parsing a .thy file.

What is available (but I don't know how to best extract it, for this
Makarius' emails are relevant):

- Getting the PIDE markup of a theory. Basically, this is some
annotation of the theory (e.g., it might say something like "characters
5-15 are the name of a theorem" and "character 100-101 are the name of a
variable of type nat"). I don't know the extent of information available
here.

Best wishes,
Dominique.

Last updated: Jul 18 2025 at 20:25 UTC