[isabelle] soundness of Isabelle/HOL · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] soundness of Isabelle/HOL

Email Gateway (Aug 18 2022 at 19:01):

From: Matthias Schmalz <Matthias.Schmalz@inf.ethz.ch>
Hi,

Out of curiosity and for my thesis introduction, I have the following
question.
I am wondering whether there is a theorem prover out there that gives
stronger soundness guarantees than Isabelle/HOL and whether there is
empirical evidence showing that the difference is practically relevant.
I would also like to know when the last unsoundness bug in Isabelle's
inference core has been observed.

I already know that Isabelle follows the LCF approach and that HOL is
built from a modest number of axioms using conservative extension
methods. It is therefore very likely that proofs by Isabelle are
correct. I also know that this soundness guarantee is restricted to the
inference core; for example, nothing prevents users from configuring the
parser to parse "False" as "True" and therefore giving the impression
that "False" can be proved. (And of course, soundness rests on the
assumption that compiler, ML libraries, operating system, and hardware
behave correctly.)

Many thanks in advance,
Matthias

Email Gateway (Aug 18 2022 at 19:01):

From: Lars Noschinski <noschinl@in.tum.de>
I think HOL Zero tries to make sure that you cannot reconfigure parser
and pretty-printer in a way that confuses the user. If you search for
mark@proof-technologies.com on this list, you should find some
discussion on this topic.

-- Lars

Email Gateway (Aug 18 2022 at 19:01):

From: Ramana Kumar <rk436@cam.ac.uk>
HOL Zero (http://proof-technologies.com/holzero.html) was mentioned already.
HOL Light has had some self-verification applied (
http://www.cl.cam.ac.uk/~jrh13/papers/holhol.html).
Jitawa is a theorem prover verified in HOL (
http://www.cl.cam.ac.uk/~mom22/jitawa/).
For more about the idea you mentioned, parsing "False" as "True", see
Pollack Inconsistency (http://www.cs.ru.nl/~freek/pubs/rap.pdf).

Email Gateway (Aug 18 2022 at 19:02):

From: Ramana Kumar <rk436@cam.ac.uk>
Sorry - Jitawa is the verified runtime. The self-verifying prover on top of
it is called Milawa (http://www.cs.utexas.edu/~jared/milawa/Web/).

Email Gateway (Aug 18 2022 at 19:02):

From: Lawrence Paulson <lp15@cam.ac.uk>
Forgive me if I launch into my usual lecture: I have never seen a piece of work spoilt by soundness bugs in a theorem prover. I have seen many pieces of work spoilt by unrealistic models, incorrect axioms or proofs of irrelevant properties. Soundness is vital, but (contrary to a widely held belief) it doesn't absolve users of the need to know what they are doing.

Larry Paulson

Email Gateway (Aug 18 2022 at 19:02):

From: "\"Mark\"" <mark@proof-technologies.com>
I would summarise the potential soundness-related vulnerabilities of
LCF-style systems as follows:

Small risk that the inference kernel still has problems with the
design/implementation of its logic. This might include:
- a) errors in tricky term operations such as substitution (most or
  perhaps every HOL system has had problems with this at some stage);
- b) errors in the inference rules or axiom/definition commands (this risk
  is amplified if the kernel implements various derivable inference rules as
  primitives).
Vulnerabilities in the implementation of the LCF-style architecture.
This might include:
- a) backdoors to creating theorems, thus sidestepping the logic (e.g. by
  importing theorems from disk purely as statements without proof);
- b) not keeping track of theory (e.g. not maintaining a list of the
  definitions of each constant in the theory, thus making it difficult to
  review definitions and what has been assumed);
- c) ability to overwrite crucial ML values and/or abstract datatypes
  (such as the list of axioms or the datatype for theorems);
- d) vulnerability from aspects of the implementation language (e.g.
  OCaml's mutable strings, OCaml's Obj.magic) .
Problems associated with the pretty printer (and arguably the parser)
fooling the user about what has been proved. This might include:
- a) ambiguously displaying theorems (such as not annotating theorems with
  their types, or not distinguishing between overloaded entities). This has
  become know recently as Pollack Inconsistency;
- b) vulnerability to having the pretty printer overwritten or extended in
  a problematic way.

My HOL Zero system addresses most of these vulnerabilities, although there
is little that can be done about some of the 2c), 2d) and 3b)
vulnerabilities if an ML toplevel is allowed. It is implemented in OCaml,
but does manage to avoid the mutable strings problem. I will be porting it
to SML sometime soon hopefully, which is a fundamentally more secure
language. I give out $100 bounty rewards for anyone uncovering new
unsoundness-related vulnerabilities. The website includes a list of all
soundness-related vulnerabilities that have been uncovered (even those found
by myself). See the HOL Zero page at:
http://www.proof-technologies.com/holzero.html

The inference kernels of most other LCF-style systems are actually a bit
complex, involving a few thousand lines of ML. As of 2012, the risks of
vulnerability 1 are probably quite small because they get used so much, but
there have been some unsoundnesses uncovered in the 1990s, and I would very
much like developers to clearly publish all discovered soundness-related
vulnerabilities in their systems.

HOL Light manages to keep the inference kernel down to about 500 lines
(depending on exactly how you count the lines) by various measures,
including: not implementing derivable inference rules in the kernel (other
than TRANS) and having a monolithic theory instead of maintaining a theory
hierarchy. John Harrison has proved the consistency of the HOL Light
kernel, but this just addresses vulnerability 1. Indeed I can prove false
without leaving a trace in the HOL Light state in about 5 lines of ML!

I'm fairly sure that alll existing HOL systems (other than HOL Zero) still
have various known vulnerabilities in their pretty printer, although some
have been addressed.

Mark Adams.

on 30/1/12 10:00 AM, Matthias Schmalz <Matthias.Schmalz@inf.ethz.ch> wrote:

Email Gateway (Aug 18 2022 at 19:02):

From: "\"Mark\"" <mark@proof-technologies.com>
Larry, you rightly point out that there are more important worries for
theorem prover users. But this doesn't mean that soundness-related
vulnerabilities should be ignored now.

And also, importantly, I would say that theorem proving hasn't yet reached
its potential. Nowadays formal proof is largely confined to university
departments and a handful of reputed specialist companies. If/when theorem
proving does become big, commercial realities mean that there will be
considerable outsourcing of proofs to contractors. Undoubtedly commercial
pressures will tempt some contractors to maliciously exploit
soundness-related vulnerabilities, and so we should be taking steps now to
address this. I'm sure there would have been many in 1970s/80s pooh-poohing
the risk of computer users exploiting security-related vulnerabilities.

Mark.

on 30/1/12 11:51 AM, Lawrence Paulson <lp15@cam.ac.uk> wrote:

Forgive me if I launch into my usual lecture: I have never seen a piece of
work spoilt by soundness bugs in a theorem prover. I have seen many pieces
of work spoilt by unrealistic models, incorrect axioms or proofs of
irrelevant properties. Soundness is vital, but (contrary to a widely held
belief) it doesn't absolve users of the need to know what they are doing.

Larry Paulson

On 30 Jan 2012, at 09:59, Matthias Schmalz wrote:

Hi,

Out of curiosity and for my thesis introduction, I have the following
question.
I am wondering whether there is a theorem prover out there that gives
stronger soundness guarantees than Isabelle/HOL and whether there is
empirical evidence showing that the difference is practically relevant.
I would also like to know when the last unsoundness bug in Isabelle's
inference core has been observed.

I already know that Isabelle follows the LCF approach and that HOL is
built from a modest number of axioms using conservative extension methods.
It is therefore very likely that proofs by Isabelle are correct. I also
know
that this soundness guarantee is restricted to the inference core; for
example, nothing prevents users from configuring the parser to parse
"False"
as "True" and therefore giving the impression that "False" can be proved.
(And of course, soundness rests on the assumption that compiler, ML
libraries, operating system, and hardware behave correctly.)

Many thanks in advance,
Matthias

Email Gateway (Aug 18 2022 at 19:02):

From: Randy Pollack <rpollack@inf.ed.ac.uk>
---------- Forwarded message ----------
From: Randy Pollack <rpollack0@gmail.com>
Date: Mon, Jan 30, 2012 at 9:50 AM
Subject: Re: [isabelle] soundness of Isabelle/HOL
To: "\"Mark\"" <mark@proof-technologies.com>
Cc: lp15@cam.ac.uk, Matthias.Schmalz@inf.ethz.ch,
cl-isabelle-users@lists.cam.ac.uk

A well known approach to soundness, not discussed here yet, is
independent checking.

As Larry points out, a user must read and understand the statements of
theorems s/he wants to believe, and all definitions they hereditarily
depend on.

Further there must be confidence that whatever form these theorems are
presented in is correctly captured by a proof tool's internal
representation (issues of printing/parsing/overloading, etc).

But the actual proofs are no cause to worry, at least in principle, at
least in a system like Coq or Isabelle that can produce complete,
independently checkable proof scripts. in some completely unambiguous
notation.

Randy Pollack

Email Gateway (Aug 18 2022 at 19:02):

From: "\"Mark\"" <mark@proof-technologies.com>
That's right, what I didn't say was that proof checking can address many of
the remaining soundness-related vulnerabilities (and HOL Zero is very much
intended to be used in precisely this way, as a proof checker). For
example, any concern that the datatype for theorems is being overwritten by
the proof script can be addressed by recording the proof and exporting it to
another system.

But this proof checking can still suffer problems if the system importing
the proof is not suitably trustworthy. For example, if both systems have a
pretty printer that have the same vulnerability (e.g. not distinguishing
between two overloaded HOL variables), then the problem won't get noticed in
either system.

Mark.

on 30/1/12 2:50 PM, Randy Pollack <rpollack0@gmail.com> wrote:

A well known approach to soundness, not discussed here yet, is
independent checking.

As Larry points out, a user must read and understand the statements of
theorems s/he wants to believe, and all definitions they hereditarily
depend on.

Further there must be confidence that whatever form these theorems are
presented in is correctly captured by a proof tool's internal
representation (issues of printing/parsing/overloading, etc).

But the actual proofs are no cause to worry, at least in principle, at
least in a system like Coq or Isabelle that can produce complete,
independently checkable proof scripts. in some completely unambiguous
notation.

Randy Pollack
--
On Mon, Jan 30, 2012 at 7:59 AM, "Mark" <mark@proof-technologies.com>
wrote:

Larry, you rightly point out that there are more important worries for
theorem prover users. But this doesn't mean that soundness-related
vulnerabilities should be ignored now.

And also, importantly, I would say that theorem proving hasn't yet
reached
its potential. Nowadays formal proof is largely confined to university
departments and a handful of reputed specialist companies. If/when
theorem
proving does become big, commercial realities mean that there will be
considerable outsourcing of proofs to contractors. Undoubtedly
commercial
pressures will tempt some contractors to maliciously exploit
soundness-related vulnerabilities, and so we should be taking steps now
to
address this. I'm sure there would have been many in 1970s/80s
pooh-poohing
the risk of computer users exploiting security-related vulnerabilities.

Mark.

on 30/1/12 11:51 AM, Lawrence Paulson <lp15@cam.ac.uk> wrote:

Forgive me if I launch into my usual lecture: I have never seen a piece
of
work spoilt by soundness bugs in a theorem prover. I have seen many
pieces
of work spoilt by unrealistic models, incorrect axioms or proofs of
irrelevant properties. Soundness is vital, but (contrary to a widely
held
belief) it doesn't absolve users of the need to know what they are
doing.

Larry Paulson

On 30 Jan 2012, at 09:59, Matthias Schmalz wrote:

Hi,

Out of curiosity and for my thesis introduction, I have the following
question.
I am wondering whether there is a theorem prover out there that gives
stronger soundness guarantees than Isabelle/HOL and whether there is
empirical evidence showing that the difference is practically relevant.
I would also like to know when the last unsoundness bug in Isabelle's
inference core has been observed.

I already know that Isabelle follows the LCF approach and that HOL is
built from a modest number of axioms using conservative extension
methods.
It is therefore very likely that proofs by Isabelle are correct. I also
know
that this soundness guarantee is restricted to the inference core; for
example, nothing prevents users from configuring the parser to parse
"False"
as "True" and therefore giving the impression that "False" can be
proved.
(And of course, soundness rests on the assumption that compiler, ML
libraries, operating system, and hardware behave correctly.)

Many thanks in advance,
Matthias

Email Gateway (Aug 18 2022 at 19:03):

From: Konrad Slind <konrad.slind@gmail.com>
Hi all,

Matt Kaufmann and I ran a workshop on topics related to this issue
(Trusted Extensions of Interactive Theorem Provers). See

http://www.cs.utexas.edu/~kaufmann/itp-trusted-extensions-aug-2010/

for the homepage and

http://www.cs.utexas.edu/~kaufmann/itp-trusted-extensions-aug-2010/summary/summary.pdf

for a summary of the techniques.

Cheers,
Konrad.

Email Gateway (Aug 18 2022 at 19:03):

From: Ramana Kumar <rk436@cam.ac.uk>
On independent checking, relevant projects (apart from HOL Zero) include
OpenTheory ( http://www.gilith.com/research/opentheory/) and dedukti (
http://www.lix.polytechnique.fr/dedukti/).

On Jan 30, 2012 2:56 PM, "Randy Pollack" <rpollack@inf.ed.ac.uk> wrote:

---------- Forwarded message ----------
From: Randy Pollack <rpollack0@gmail.com>
Date: Mon, Jan 30, 2012 at 9:50 AM
Subject: Re: [isabelle] soundness of Isabelle/HOL
To: "\"Mark\"" <mark@proof-technologies.com>
Cc: lp15@cam.ac.uk, Matthias.Schmalz@inf.ethz.ch,
cl-isabelle-users@lists.cam.ac.uk

A well known approach to soundness, not discussed here yet, is
independent checking.

As Larry points out, a user must read and understand the statements of
theorems s/he wants to believe, and all definitions they hereditarily
depend on.

Further there must be confidence that whatever form these theorems are
presented in is correctly captured by a proof tool's internal
representation (issues of printing/parsing/overloading, etc).

But the actual proofs are no cause to worry, at least in principle, at
least in a system like Coq or Isabelle that can produce complete,
independently checkable proof scripts. in some completely unambiguous
notation.

Randy Pollack
--

On Mon, Jan 30, 2012 at 7:59 AM, "Mark" <mark@proof-technologies.com>
wrote:

Larry, you rightly point out that there are more important worries for
theorem prover users. But this doesn't mean that soundness-related
vulnerabilities should be ignored now.

And also, importantly, I would say that theorem proving hasn't yet
reached
its potential. Nowadays formal proof is largely confined to university
departments and a handful of reputed specialist companies. If/when
theorem
proving does become big, commercial realities mean that there will be
considerable outsourcing of proofs to contractors. Undoubtedly
commercial
pressures will tempt some contractors to maliciously exploit
soundness-related vulnerabilities, and so we should be taking steps
now to
address this. I'm sure there would have been many in 1970s/80s
pooh-poohing
the risk of computer users exploiting security-related vulnerabilities.

Mark.

on 30/1/12 11:51 AM, Lawrence Paulson <lp15@cam.ac.uk> wrote:

Forgive me if I launch into my usual lecture: I have never seen a
piece of
work spoilt by soundness bugs in a theorem prover. I have seen many
pieces
of work spoilt by unrealistic models, incorrect axioms or proofs of
irrelevant properties. Soundness is vital, but (contrary to a widely
held
belief) it doesn't absolve users of the need to know what they are
doing.

Larry Paulson

On 30 Jan 2012, at 09:59, Matthias Schmalz wrote:

Hi,

Out of curiosity and for my thesis introduction, I have the following
question.
I am wondering whether there is a theorem prover out there that gives
stronger soundness guarantees than Isabelle/HOL and whether there is
empirical evidence showing that the difference is practically
relevant.
I would also like to know when the last unsoundness bug in Isabelle's
inference core has been observed.

I already know that Isabelle follows the LCF approach and that HOL is
built from a modest number of axioms using conservative extension
methods.
It is therefore very likely that proofs by Isabelle are correct. I
also
know
that this soundness guarantee is restricted to the inference core; for
example, nothing prevents users from configuring the parser to parse
"False"
as "True" and therefore giving the impression that "False" can be
proved.
(And of course, soundness rests on the assumption that compiler, ML
libraries, operating system, and hardware behave correctly.)

Many thanks in advance,
Matthias

Email Gateway (Aug 18 2022 at 19:08):

From: Makarius <makarius@sketis.net>
Before everything is repeated, see also the following thread from Jan
2011:
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2011-January/msg00047.html

IIRC, the thread also contains some explanations why your points 2c), 2d)
and 3b) concerning the "ML toplevel" are merely an accidental feature of
your implementation that follows the classic Cambridge HOL paradigm. The
Isabelle/ML toplevel is integrated with the prover differently -- it would
easily allow to amend these issues if they were of practical relevance,
but there are more imporant things to do.

Makarius

Last updated: Jul 18 2025 at 20:25 UTC