From: Rafal Kolanski <xs@xaph.net>
Dear Isabelle Users,
We need something like sloccount (count lines of code in a file with 
comments removed) for Isabelle. It seems I have to hack up such a tool, 
but I worry that trying to do so quickly may result in not getting it 
right. I would much rather piggyback on a tool written by someone else 
already.
Does anyone know of such a tool, or a hackable approximation of one?
Sincerely,
Rafal Kolanski.
From: Makarius <makarius@sketis.net>
I don't know any approximations, only the official Isabelle functions for 
that.  (According to Isabelle2012.)
This is how it works in Isabelle/Scala, e.g. with the "isabelle scala" 
toplevel loop:
val syntax = isabelle.Outer_Syntax.empty
syntax.scan("theory (* (* \n *) *) A {* *}")
You can then walk through the token list and remove comments, count 
newlines etc.
Note that the empty outer syntax does not know about any keywords, but 
that should be OK in this application.
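For instance, one way to do that walk (a sketch only, relying on the Token
operations is_comment and source from the Isabelle2012 Isabelle/Scala API)
is:
  // drop comment tokens, then count the newline characters of what remains
  val tokens = syntax.scan("theory (* (* \n *) *) A {* *}")
  val newlines = tokens.filterNot(_.is_comment).map(_.source.count(_ == '\n')).sum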
There are also similar operations in src/Pure/Thy/thy_syntax.ML, but 
Isabelle/ML system programming is a bit old-fashioned.  It is worth 
spending a little time getting acquainted with Isabelle/Scala; such an 
investment will pay off rather quickly for other projects.
Makarius
From: "C. Diekmann" <diekmann@in.tum.de>
Hi,
[...] or a hackable approximation of one?
$ cat *.thy | awk '/\(\*/,/\*\)/{ next } 1' | wc -l
Removes all (* ... *) blocks and counts the lines. Empty lines, text {* ... *}
blocks and -- "..." comments are not stripped.
With kind regards,
Cornelius
From: Makarius <makarius@sketis.net>
This thread appears to be inconclusive.  Did you prefer awk or Scala?
Anyway, here is a little toy for isabelle scala (as of Isabelle2012), 
which can be run on its toplevel loop:
import isabelle._
def lines_without_comments(text: String): Int =
{
   val newlines =
     for {
       token <- Outer_Syntax.empty.scan(text).iterator
       if !token.is_comment
       c <- token.source.iterator
       if c == '\n'
     } yield 1
   val n = (0 /: newlines)(_ + _)
   if (text.endsWith("\n")) n else n + 1
}
To get the text of a file, you can use scala.io like this:
val text = scala.io.fromFile("A.thy").mkString
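and then, for example, count its lines with the function above:
  lines_without_comments(text)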
Next time there will be more Isabelle-ish I/O operations.
Makarius
From: Rafal Kolanski <xs@xaph.net>
On 01/08/12 06:53, Makarius wrote:
On Wed, 25 Jul 2012, Rafal Kolanski wrote:
We need something like sloccount (count lines of code in a file with
comments removed) for Isabelle.
This thread appears to be inconclusive. Did you prefer awk or Scala?
Anyway, here is a little toy for isabelle scala (as of Isabelle2012),
which can be run on its toplevel loop:
I have run your program on the attached Scratch.thy, and got back the 
result "19", which is identical to the result of wc -l. That can't be 
right! Any thoughts? I have just started learning Scala today in order 
to understand this snippet, and I don't yet see what's wrong.
Sincerely,
Rafal Kolanski.
Scratch.thy
From: Christian Sternagel <c-sterna@jaist.ac.jp>
Dear Rafal,
Scala's "for" is more or less like Haskell's list comprehension. The 
above for-expression iterates over the tokens obtained by scanning the 
text, skips tokens that are comments, then iterates over the characters 
of each remaining token and yields a 1 for every newline character; 
summing these 1s gives the number of newlines outside of comments.
Furthermore, I think there is a typo in the script: the last 
if-then-else should have swapped branches, i.e., add 1 if the file ends 
with a newline character.
After correcting the typo, the script counts all newlines that are not 
part of comments. Your attached Scratch.thy had 20 lines. It contains 
only a single comment (at least what _.is_comment considers a comment), 
namely (* more comments\n  *), which contains just a single newline; 
thus the script is right.
To see how the different tokens are categorized you can do the following:
$ isabelle scala
import isabelle._
val text = scala.io.Source.fromFile("Scratch.thy").mkString
val tokens = Outer_Syntax.empty.scan(text).toList
tokens.foreach { x => print(x.toString + "\n") }
hope this helps
chris
From: Makarius <makarius@sketis.net>
On Wed, 1 Aug 2012, Christian Sternagel wrote:
} yield 1
val n = (0 /: newlines)(_ + _)
if (text.endsWith("\n")) n else n + 1
}
Furthermore, I think there is a typo in the script: the last if-then-else
should have swapped branches, i.e., add 1 if the file ends with a newline
character.
My idea was to count the lines independently of the termination of the 
last line.  If the text ends with "\n", the last line has already been 
counted as part of the tokens.  The example Scratch.thy has 20 lines but 
only 19 newline characters.
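For example (with no comments involved), both of the following count as 
two lines:
  lines_without_comments("a\nb")     // 1 newline, no terminating "\n": 1 + 1
  lines_without_comments("a\nb\n")   // 2 newlines, terminating "\n": 2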
Anyway, this is just a toy.  It shows that there are fine points that 
might need to be addressed, so a decent programming language helps.
import isabelle._
val text = scala.io.Source.fromFile("Scratch.thy").mkString
val tokens = Outer_Syntax.empty.scan(text).toList
tokens.foreach { x => print(x.toString + "\n") }
Thanks for the correction of scala.io.Source.fromFile.
The last line can be done like this:
tokens foreach println
Makarius
From: Makarius <makarius@sketis.net>
Your example contains various Isar markup commands like this:
section {* ... *}
   text {* ... *}
These are not comments, they are part of the formal text (and they consist 
of more than one token anyway).
You could filter out both (* ... *) and {* ... *} tokens like this:
if !(token.is_comment || token.kind == Token.Kind.VERBATIM)
Although that is not very precise, since verbatim tokens can occur in 
other non-markup commands as well, say 'ML' or 'method_setup'.
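Spliced into the earlier lines_without_comments, such a filter might look 
like this (a rough sketch under that caveat; the function name is just 
for illustration, and import isabelle._ is assumed as before):
  def lines_without_comments_or_verbatim(text: String): Int =
  {
    val newlines =
      for {
        token <- Outer_Syntax.empty.scan(text).iterator
        // skip both (* ... *) comments and {* ... *} verbatim tokens
        if !(token.is_comment || token.kind == Token.Kind.VERBATIM)
        c <- token.source.iterator
        if c == '\n'
      } yield 1
    val n = (0 /: newlines)(_ + _)
    if (text.endsWith("\n")) n else n + 1
  }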
It requires a little bit of outer syntax parsing to detect
MARKUP_COMMAND VERBATIM_TOKEN
separated by whitespace or comments (or tags).  See 
src/Pure/Thy/thy_output.ML:present_thy/markup for the fully precise way to 
detect that form.
To discriminate markup command tokens in Scala you need information from 
the outer syntax taken from the prover session, let's say from 
Isabelle/Pure or HOL as an approximation (assuming that the application 
does not introduce its own markup commands).  The following table has 
been guessed from a search for Outer_Syntax.markup_command in the ML 
sources:
val markup_command =
     Set(
       "header",
       "chapter",
       "section",
       "subsection",
       "subsubsection",
       "text",
       "text_raw",
       "sect",
       "subsect",
       "subsubsect",
       "txt",
       "txt_raw")
This set can then be used like a predicate in Scala, with normal 
application notation:
token.is_command && markup_command(token.content)
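Before reaching for parser combinators, a simple hand-rolled pass over 
the token list can already drop each markup command together with the 
verbatim token following it.  A rough sketch (strip_markup is just an 
illustrative name, and is_space is assumed to be available on Token in 
addition to the operations used above):
  def strip_markup(tokens: List[Token]): List[Token] =
    tokens match {
      case cmd :: rest if cmd.is_command && markup_command(cmd.content) =>
        // skip whitespace/comments between the command and its {* ... *} argument
        val rest1 = rest.dropWhile(t => t.is_space || t.is_comment)
        rest1 match {
          case body :: rest2 if body.kind == Token.Kind.VERBATIM => strip_markup(rest2)
          case _ => strip_markup(rest1)
        }
      case t :: rest => t :: strip_markup(rest)
      case Nil => Nil
    }
The surviving tokens can then be fed into the same newline count as before.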
Then you probably want to use parser combinators from the Scala library to 
work out some structure of the Isar token list.  See also 
src/Pure/Isar/parse.scala and some other places that use that Parse 
module.
Moreover, you probably also want to take care of vacuous lines after the 
stripping.
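For instance, if the stripped text has been reassembled into a single 
string (stripped_text here is hypothetical), the non-blank lines could be 
counted like this:
  // count only lines that still contain something after the stripping
  val non_blank = stripped_text.split("\n", -1).count(_.trim.nonEmpty)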
How much precision you want to apply depends on what you want to count 
and achieve in the end.
Makarius
From: Rafal Kolanski <xs@xaph.net>
Thank you very much. This gives me a very solid basis to start with,
even with my day-old knowledge of Scala. It also prevents me
from having to write yet another parser. Perhaps the resulting script
(after being tuned) will be of use to somebody. Once I have it, I will
share.
Sincerely,
Rafal Kolanski.