From: Rafal Kolanski <xs@xaph.net>
Dear Isabelle Users,
We need something like sloccount (count lines of code in a file with 
comments removed) for Isabelle. It seems I have to hack up such a tool, 
but I worry that trying to do so quickly may result in not getting it 
right. I would much rather piggyback on a tool written by someone else 
already.
Does anyone know of such a tool, or a hackable approximation of one?
Sincerely,
Rafal Kolanski.
From: Makarius <makarius@sketis.net>
I don't know any approximations, only the official Isabelle functions for 
that.  (According to Isabelle2012.)
This is how it works in Isabelle/Scala, e.g. with the "isabelle scala" 
toplevel loop:
val syntax = isabelle.Outer_Syntax.empty
syntax.scan("theory (* (* \n *) *) A {* *}")
You can then walk through the token list and remove comments, count 
newlines etc.
Note that the empty outer syntax does not know about any keywords, but 
that should be OK in this application.
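For instance, one way to do that walk (a sketch only, relying on the Token
operations is_comment and source from the Isabelle2012 Isabelle/Scala API)
is:
  // drop comment tokens, then count the newline characters of what remains
  val tokens = syntax.scan("theory (* (* \n *) *) A {* *}")
  val newlines = tokens.filterNot(_.is_comment).map(_.source.count(_ == '\n')).sum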
There are also similar operations in src/Pure/Thy/thy_syntax.ML, but 
Isabelle/ML system programming is a bit old-fashioned.  It is worth 
spending a little time getting acquainted with Isabelle/Scala; such an 
investment will pay off rather quickly for other projects.
Makarius
From: "C. Diekmann" <diekmann@in.tum.de>
Hi,
[...] or a hackable approximation of one?
$ cat *.thy | awk '/\(\*/,/\*\)/{ next } 1' | wc -l
Removes all (* ... *) blocks and counts the lines. Empty lines, text {* ... *}
blocks and -- "..." comments are not stripped.
With kind regards,
Cornelius
From: Makarius <makarius@sketis.net>
This thread appears to be inconclusive.  Did you prefer awk or Scala?
Anyway, here is a little toy for isabelle scala (as of Isabelle2012), 
which can be run on its toplevel loop:
import isabelle._
def lines_without_comments(text: String): Int =
{
   val newlines =
     for {
       token <- Outer_Syntax.empty.scan(text).iterator
       if !token.is_comment
       c <- token.source.iterator
       if c == '\n'
     } yield 1
   val n = (0 /: newlines)(_ + _)
   if (text.endsWith("\n")) n else n + 1
}
To get the text of a file, you can use scala.io like this:
val text = scala.io.fromFile("A.thy").mkString
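and then, for example, count its lines with the function above:
  lines_without_comments(text)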
Next time there will be more Isabelle-ish I/O operations.
Makarius
From: Rafal Kolanski <xs@xaph.net>
On 01/08/12 06:53, Makarius wrote:
On Wed, 25 Jul 2012, Rafal Kolanski wrote:
We need something like sloccount (count lines of code in a file with
comments removed) for Isabelle.
This thread appears to be inconclusive. Did you prefer awk or Scala?
Anyway, here is a little toy for isabelle scala (as of Isabelle2012),
which can be run on its toplevel loop:
I have run your program on the attached Scratch.thy, and got back the 
result "19", which is identical to the result of wc -l. That can't be 
right! Any thoughts? I have just started learning Scala today in order 
to understand this snippet, and I don't yet see what's wrong.
Sincerely,
Rafal Kolanski.
Scratch.thy
From: Christian Sternagel <c-sterna@jaist.ac.jp>
Dear Rafal,
Scala's "for" is more or less like Haskell's list comprehension. The 
above for-expression iterates over the tokens obtained by scanning the 
text, skips tokens that are comments, then iterates over the characters 
of each remaining token and yields a 1 for every newline character; 
summing these 1s gives the number of newlines outside of comments.
Furthermore, I think there is a typo in the script: the last 
if-then-else should have swapped branches, i.e., add 1 if the file ends 
with a newline character.
After correcting the typo, the script counts all newlines that are not 
part of comments. Your attached Scratch.thy had 20 lines. It contains 
only a single comment (at least what _.is_comment considers a comment), 
namely (* more comments\n  *), which contains just a single newline; 
thus the script is right.
To see how the different tokens are categorized you can do the following:
$ isabelle scala
import isabelle._
val text = scala.io.Source.fromFile("Scratch.thy").mkString
val tokens = Outer_Syntax.empty.scan(text).toList
tokens.foreach { x => print(x.toString + "\n") }
hope this helps
chris
From: Makarius <makarius@sketis.net>
On Wed, 1 Aug 2012, Christian Sternagel wrote:
} yield 1
val n = (0 /: newlines)(_ + _)
if (text.endsWith("\n")) n else n + 1
}
Furthermore, I think there is a typo in the script: the last if-then-else
should have swapped branches, i.e., add 1 if the file ends with a newline
character.
My idea was to count the lines independently of the termination of the 
last line.  If the text ends with "\n", the last line has already been 
counted as part of the tokens.  The example Scratch.thy has 20 lines but 
only 19 newline characters.
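For example (with no comments involved), both of the following count as 
two lines:
  lines_without_comments("a\nb")     // 1 newline, no terminating "\n": 1 + 1
  lines_without_comments("a\nb\n")   // 2 newlines, terminating "\n": 2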
Anyway, this is just a toy.  It shows that there are fine points that 
might need to be addressed, so a decent programming language helps.
import isabelle._
val text = scala.io.Source.fromFile("Scratch.thy").mkString
val tokens = Outer_Syntax.empty.scan(text).toList
tokens.foreach { x => print(x.toString + "\n") }
Thanks for the correction of scala.io.Source.fromFile.
The last line can be done like this:
tokens foreach println
Makarius
From: Makarius <makarius@sketis.net>
Your example contains various Isar markup commands like this:
section {* ... *}
   text {* ... *}
These are not comments, they are part of the formal text (and they consist 
of more than one token anyway).
You could filter out both (* ... *) and {* ... *} tokens like this:
if !(token.is_comment || token.kind == Token.Kind.VERBATIM)
Although that is not very precise, since verbatim tokens can occur in 
other non-markup commands as well, say 'ML' or 'method_setup'.
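Spliced into the earlier lines_without_comments, such a filter might look 
like this (a rough sketch under that caveat; the function name is just 
for illustration, and import isabelle._ is assumed as before):
  def lines_without_comments_or_verbatim(text: String): Int =
  {
    val newlines =
      for {
        token <- Outer_Syntax.empty.scan(text).iterator
        // skip both (* ... *) comments and {* ... *} verbatim tokens
        if !(token.is_comment || token.kind == Token.Kind.VERBATIM)
        c <- token.source.iterator
        if c == '\n'
      } yield 1
    val n = (0 /: newlines)(_ + _)
    if (text.endsWith("\n")) n else n + 1
  }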
It requires a little bit of outer syntax parsing to detect
MARKUP_COMMAND VERBATIM_TOKEN
separated by whitespace or comments (or tags).  See 
src/Pure/Thy/thy_output.ML:present_thy/markup for the fully precise way to 
detect that form.
To discriminate markup command tokens in Scala you need information from 
the outer syntax taken from the prover session, let's say from 
Isabelle/Pure or HOL as an approximation (assuming that the application 
does not introduce its own markup commands).  The following table has 
been guessed from a search for Outer_Syntax.markup_command in the ML 
sources:
val markup_command =
     Set(
       "header",
       "chapter",
       "section",
       "subsection",
       "subsubsection",
       "text",
       "text_raw",
       "sect",
       "subsect",
       "subsubsect",
       "txt",
       "txt_raw")
This set can then be used like a predicate in Scala, with normal 
application notation:
token.is_command && markup_command(token.content)
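Before reaching for parser combinators, a simple hand-rolled pass over 
the token list can already drop each markup command together with the 
verbatim token following it.  A rough sketch (strip_markup is just an 
illustrative name, and is_space is assumed to be available on Token in 
addition to the operations used above):
  def strip_markup(tokens: List[Token]): List[Token] =
    tokens match {
      case cmd :: rest if cmd.is_command && markup_command(cmd.content) =>
        // skip whitespace/comments between the command and its {* ... *} argument
        val rest1 = rest.dropWhile(t => t.is_space || t.is_comment)
        rest1 match {
          case body :: rest2 if body.kind == Token.Kind.VERBATIM => strip_markup(rest2)
          case _ => strip_markup(rest1)
        }
      case t :: rest => t :: strip_markup(rest)
      case Nil => Nil
    }
The surviving tokens can then be fed into the same newline count as before.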
Then you probably want to use parser combinators from the Scala library to 
work out some structure of the Isar token list.  See also 
src/Pure/Isar/parse.scala and some other places that use that Parse 
module.
Moreover, you probably also want to take care of vacuous lines after the 
stripping.
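For instance, if the stripped text has been reassembled into a single 
string (stripped_text here is hypothetical), the non-blank lines could be 
counted like this:
  // count only lines that still contain something after the stripping
  val non_blank = stripped_text.split("\n", -1).count(_.trim.nonEmpty)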
How much precision you want to apply depends on what you want to count 
and achieve in the end.
Makarius
From: Rafal Kolanski <xs@xaph.net>
Thank you very much. This gives me a very solid basis to start with,
even with my day-old knowledge of Scala. It also prevents me
from having to write yet another parser. Perhaps the resulting script
(after being tuned) will be of use to somebody. Once I have it, I will
share.
Sincerely,
Rafal Kolanski.