[isabelle] Clone detection for Isabelle theories · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Clone detection for Isabelle theories

Email Gateway (Aug 22 2022 at 13:47):

From: Lars Hupel <hupel@in.tum.de>
Dear list,

I'd like to announce the availability of a technology preview of code
clone detection for Isabelle theories.

This is the result of a Master's Thesis carried out by Dominik Vinan
under my supervision, with the help of Benjamin Hummel. Benjamin's
company is the vendor of ConQAT, an open-source software quality
assurance toolkit, for which the student has implemented an Isabelle
frontend.

The purpose of code clone detection is simple: to find duplicated
specification or proof fragments in Isabelle theories.

I will present the tool and its accompanying paper at the Isabelle
Workshop in Nancy, but it is already possible to download and use it:

Paper
<http://www.in.tum.de/~nipkow/Isabelle2016/Isabelle2016_9.pdf>

ConQAT+Isabelle bundle

<https://www21.in.tum.de/~hupel/downloads/isabelle-clones/isabelle-clones-preview.zip>

The bundle should contain all instructions for feeding theories into
ConQAT. It should work on macOS and Linux (no guarantees for Windows).
The result is an HTML report of detected clones which can be viewed in
the browser.

If you try it out, please report any problems (or that you didn't have
problems) to me via mail so that the student and me can iron out bugs
before ITP.

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:47):

From: Johannes Waldmann <johannes.waldmann@htwk-leipzig.de>
Interesting.

As far as I understood from the paper,
you are parsing source code with libisabelle,
which in turn requires a full Isabelle installation.
(Then - is parsing safe? Or is there danger of
code injection?)

I'd hope there is something like Language.Isabelle,
(just AST data type and parser - no semantics). cf.
https://hackage.haskell.org/package/haskell-src-exts
https://hackage.haskell.org/package/language-javascript

very hypothetically - is there a formal Isabelle(Isar) grammar
from which AST representation and parser/printer (in Haskell)
could be generated (mostly) automatically?

Best regards, J. Waldmann.

Email Gateway (Aug 22 2022 at 13:48):

From: Lars Hupel <hupel@in.tum.de>
Hi Johannes,

As far as I understood from the paper,
you are parsing source code with libisabelle,
which in turn requires a full Isabelle installation.
(Then - is parsing safe? Or is there danger of
code injection?)

no, it is unsafe; yes, there is danger of code injection. Loading
Isabelle theories with libisabelle is equivalent to loading them into
Isabelle/jEdit.

very hypothetically - is there a formal Isabelle(Isar) grammar
from which AST representation and parser/printer (in Haskell)
could be generated (mostly) automatically?

To quote the paper:

"Isabelle/Isar is the surface syntax for Isabelle theories. Because it
is user-extensible, it is impossible to parse statically."

Due to the presence of arbitrarily complicated parse translations (inner
syntax) and syntax parsers (outer syntax) written in ML the syntactic
analysis of Isar is equivalent to the halting problem. In that sense
Isar is very much like Perl.

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:48):

From: Johannes Waldmann <johannes.waldmann@htwk-leipzig.de>
Hi.

Quoting out of context, on purpose:

... Isar is very much like Perl.

Great! Use this for advertising.
Should open up a whole new population of users ...

User-defined syntax is perhaps
more a matter of readability
(I guess it's mostly about operator precedences,
and a few special forms - unless you actively abuse it).
But code injection really scares me.

Mind you, Haskell is scary too, because I can do

{-# language TemplateHaskell #-}
import Language.Haskell.TH
$(runIO (print 42) >> return [])
main = return ()

which will do IO during compilation.

But that needs the language pragma to be present
(at the start of the file)
and I can call ghc (the compiler) with "-XSafe"
and it will reject such code outright.

This "safe Haskell" is a recent addition
to the language (well, to GHC actually).
https://ghc.haskell.org/trac/ghc/wiki/SafeHaskell

I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)

Best regards, Johannes.

Email Gateway (Aug 22 2022 at 13:48):

From: Manuel Eberl <eberlm@in.tum.de>

But that needs the language pragma to be present
(at the start of the file)
and I can call ghc (the compiler) with "-XSafe"
and it will reject such code outright.

From the Haskell documentation:

‘Safe Haskell, however, does not offer compilation safety. During
compilation time it is possible for arbitrary processes to be
launched, using for example the custom pre-processor flag. This can be
manipulated to either compromise a users system at compilation time,
or to modify the source code just before compilation to try to alter
set Safe Haskell flags.’

Also, I don't recall the details, but I think there were some problems
with Safe Haskell in the past. If you want to be (more) sure, you
probably still have to sandbox compilation and execution on the OS level.

I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)
First of all, the ability to write custom tactics in ML is an integral
part of Isabelle. Preventing people from doing that is too immense a
restriction for it to be a useful feature in pretty much any context
except course homework (where writing tactics is usually not done –
although I did in fact do that for one of my homework submissions to
make my life easier).

Secondly, if I recall, Isabelle's ‘safe’ mode isn't actually very safe.
I don't know what it prevents and what it does not prevent, but even
without the ability to embed arbitrary ML code, you can do all kinds of
nasty things with code generation and the ‘value’ command. Isabelle is a
huge system and as far as I am aware, none of it was ever designed with
a ‘malicious’ user in mind.

I guess if one were to go over the entire code base of Isabelle with the
express intent of finding possible security problems and then ensuring
that all of these are disallowed in the safe mode could result in a
version of Isabelle that provides a reasonable amount of confidence in
its safety, but that would be a big endeavour and the only real use case
would be automatic homework processing.

Considering that you can do the same thing with considerably less
effort, no restrictions imposed on the expressivity of Isabelle, and
with much better confidence in the safety of the approach by using
sandboxes/containers, I don't think there is any reason to invest time
and energy into a ‘safe mode’.

Cheers,

Manuel

Email Gateway (Aug 22 2022 at 13:48):

From: Lars Hupel <hupel@in.tum.de>

... Isar is very much like Perl.

Great! Use this for advertising.
Should open up a whole new population of users ...

This was on purpose ;-)

User-defined syntax is perhaps
more a matter of readability
(I guess it's mostly about operator precedences,
and a few special forms - unless you actively abuse it).
But code injection really scares me.

I think your mental model of Isabelle is slightly off. There's no point
in discussing the design decisions made decades ago when we're really
discussing how clone detection works with the system we have right now.

To rephrase: Isabelle has arbitrary user-extensible syntax and it has
arbitrary executable ML code. If you want to do realistic syntax
analysis today you have to accept that.

I'm not saying that in the future there couldn't be some other
mechanism. But it is what it is right now.

I wonder why Isabelle plans to abandon
its "safe" flag (as Lars mentioned in
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00034.html
)

There are no plans to abandon it. It has already been abandoned.

But the reasoning goes as follows: Many people understand many different
things about what "secure mode" is supposed to mean. The fact that it
won't run custom ML code means neither that

a) malicious users are prevented from tricking the system into accepting
theorems which are not actually proven (e.g. by using axiomatization)

b) malicious users are prevented from affecting the operating system or
the machine running Isabelle

But it means that many legitimate things (e.g. custom tactics, loading
AFP libraries which contain ML code) are impossible.

Because of that it wasn't really useful to begin with. Having a "secure"
flag which doesn't actually protect anything is worse than having no
flag at all.

Again, in the future, there could be different mechanisms. But before
doing so we actually need to come up with a solid threat model and a
specification on what "secure" should mean.

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:48):

From: Lars Hupel <hupel@in.tum.de>
Sorry, that probably came off harsher than intended. I was trying to say
that Isabelle supports all these things today and there is no reliable
way to turn them off.

Last updated: Oct 09 2025 at 01:37 UTC