[isabelle] Verify the legitimacy of a proof? · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Verify the legitimacy of a proof?

Email Gateway (Aug 22 2022 at 15:40):

From: scott constable <sdconsta@syr.edu>
Hi All,

Let's suppose I have some lemma

lemma ComplexProperty: "something interesting"
...
done

whose proof cites numerous other lemmas in other .thy files, possibly
written by other persons with malicious intentions. Is there a command in
Isabelle which I could use to determine the legitimacy of the entire proof
tree of ComplexProperty? That is, I want to ensure that no lemma in the
proof tree ended with "sorry", or if any axioms were used, I want to know
which ones, etc.

Thanks,

Scott Constable

Email Gateway (Aug 22 2022 at 15:40):

From: "C. Diekmann" <diekmann@in.tum.de>

That is, I want to ensure that no lemma in the
proof tree ended with "sorry",

If you make a command line run with isabelle build, every use of sorry
is counted as cheating and makes the build fail (but only at the end of the
build, not at the time when the thy with the sorry is processed).

This raises a new question: can I enable cheating (quick_and_dirty) mode in
a command line build?

One question answered, one not answered, one new problem ;-)

Cheers,
Cornelius

Email Gateway (Aug 22 2022 at 15:41):

From: Makarius <makarius@sketis.net>
Isabelle does not protect against malicious intentions. It would require
a quite different system to do that, one that you won't like to use.

The other big provers (e.g. Coq) are similar in this respect.

HOL-Zero is a notable exception in targeting a market of potentially
malicious (ab-)users, but it is not a "big prover".

Makarius

Email Gateway (Aug 22 2022 at 15:41):

From: Simon Wimmer <wimmersimon@gmail.com>
Hi Scott,

you can also check which oracles a theorem depends on from the ML level.
A hack to look for the oracle that is used to implement sorry can look like
this:

theory Scratch
imports Main
keywords "check_sorry" :: diag
begin

ML ‹
val get_oracles = Proofterm.all_oracles_of o Proofterm.strip_thm o
Thm.proof_body_of

val contains_sorry = exists (fn (a, _) => a = "Pure.skip_proof") o
get_oracles

fun report_sorry ctxt =
if Context_Position.is_visible ctxt then
Output.report [Markup.markup Markup.bad "Proof arises from sorry
oracle!"]
else ();

fun check_sorry ctxt th =
if contains_sorry th then report_sorry ctxt else ()

fun check_sorry_cmd thm_ref st =
let
val ctxt = Toplevel.context_of st
val th = Proof_Context.get_fact_single ctxt thm_ref
in check_sorry ctxt th end

val _ =
Outer_Syntax.command @{command_keyword check_sorry} "Check theorem for
sorry"
(Parse.thm >> (fn (th, _) => Toplevel.keep (check_sorry_cmd th)));
›

(* Usage: *)
lemma one_add_1_eq_3:
"1 + 1 = 3"
sorry

check_sorry HOL.refl
check_sorry one_add_1_eq_3

Cheers,
Simon

Email Gateway (Aug 22 2022 at 15:41):

From: Manuel Eberl <eberlm@in.tum.de>
Unfortunately, it is very easy to circumvent this. I don't recall who
found this originally, but you can hide the ‘taint’ of a theorem by
going through a type class instantiation:

lemma one_add_1_eq_3:
"(1::nat) + 1 = 3"
sorry

class foo = semiring_1 +
assumes foo: "1 + 1 = 3"

instance nat :: foo
by intro_classes (rule one_add_1_eq_3)

lemmas one_add_1_eq_3' = foo [where ?'a = nat]

check_sorry one_add_1_eq_3'

Cheers,

Manuel

Email Gateway (Aug 22 2022 at 15:41):

From: Makarius <makarius@sketis.net>
"Found this originally" sounds very funny to me.

Of course, the problem of oracles vs. type classes instantiations is as
old as oracles and type class instantiations in Isabelle. It is rather
well-known for insiders.

So we are back to the new meta-problem from recent years: even power
users don't know anymore what Isabelle is and what it does, and
especially what it does not.

Makarius

Email Gateway (Aug 22 2022 at 15:41):

From: Tobias Nipkow <nipkow@in.tum.de>
On 08/07/2017 13:30, Makarius wrote:

On 08/07/17 13:01, Manuel Eberl wrote:

Unfortunately, it is very easy to circumvent this. I don't recall who
found this originally, but you can hide the ‘taint’ of a theorem by
going through a type class instantiation:

"Found this originally" sounds very funny to me.

Of course, the problem of oracles vs. type classes instantiations is as
old as oracles and type class instantiations in Isabelle. It is rather
well-known for insiders.

I am sure Manuel meant no disrespect. No doubt you were aware of this behaviour
from the start.

So we are back to the new meta-problem from recent years: even power
users don't know anymore what Isabelle is and what it does, and
especially what it does not.
It is usually due to a fanciful interpretation of the documentation. The latter
says that "The system always records oracle invocations within derivations of
theorems by a unique tag." but does not claim that these tags are also always
inherited in the way one is all too likely to assume. There you are, Manuel.

Tobias
smime.p7s

Email Gateway (Aug 22 2022 at 15:41):

From: Makarius <makarius@sketis.net>
On 08/07/17 15:33, Tobias Nipkow wrote:

On 08/07/2017 13:30, Makarius wrote:

On 08/07/17 13:01, Manuel Eberl wrote:

Unfortunately, it is very easy to circumvent this. I don't recall who
found this originally, but you can hide the ‘taint’ of a theorem by
going through a type class instantiation:

"Found this originally" sounds very funny to me.

Of course, the problem of oracles vs. type classes instantiations is as
old as oracles and type class instantiations in Isabelle. It is rather
well-known for insiders.

I am sure Manuel meant no disrespect. No doubt you were aware of this
behaviour from the start.

No, and neither did I mean any disrespect to Manuel.

So we are back to the new meta-problem from recent years: even power
users don't know anymore what Isabelle is and what it does, and
especially what it does not.
It is usually due to a fanciful interpretation of the documentation. The
latter says that "The system always records oracle invocations within
derivations of theorems by a unique tag." but does not claim that these
tags are also always inherited in the way one is all too likely to
assume.

The problem remains open: How can true expertise about how Isabelle
really works be reconstructed? We've seen a slow and steady decline in
the past 10 years, and myself writing hundreds of pages of documentation
helped only very little.

Makarius

Email Gateway (Aug 22 2022 at 15:41):

From: Manuel Eberl <eberlm@in.tum.de>

I am sure Manuel meant no disrespect. No doubt you were aware of this
behaviour from the start.
Indeed I only wanted to stress that I was definitely not the one to
discover this behaviour and that it has been well-known for some time,
and that I do not know anything beyond that.

Manuel

Email Gateway (Aug 22 2022 at 15:41):

From: Makarius <makarius@sketis.net>
Like this:

isabelle build -o quick_and_dirty

Makarius

Email Gateway (Aug 22 2022 at 15:41):

From: Makarius <makarius@sketis.net>
Back to the very start. This is how I usually do it:

* Everything needs to work cleanly with "isabelle build" in batch
mode, with default options (no quick_and_dirty).

* Superficial inspection of syntax. The substring "axiomatization"
points to most ways to use axioms in practice (but very obscure
possibilities remain).

* Inspection if add-on Isabelle/ML setup is used (normally not
required at all).

* Inspection of definitions, statements, proofs (preferably in
structured Isar). Does it all "make sense", "look reasonable" etc.?

The latter is often the main problem: definitions that are not what one
would expect or like to see, or proofs that don't expose the reasoning
properly.

In that sense, I would say that a "legitimate" proof needs to be one
that is nicely structured and presented, based on clear definitions.

Extra points for proofs that are checked reasonably fast.

Makarius

Email Gateway (Aug 22 2022 at 15:42):

From: Mark Adams <mark@proof-technologies.com>
Yes, and it is mainly intended to be used as a proof checker, via proof
objects. You export your formal proof as proof objects, then import
these into HOL Zero. In this sense it is perfectly capable - it has
successfully checked in one session of a few hours the 1.4 billion
primitive inference steps of 2 of the 4 parts of Flyspeck (the main text
and the linear inequalities).

Mark.

Email Gateway (Aug 22 2022 at 15:42):

From: scott constable <sdconsta@syr.edu>
Thanks all, I think this has been a productive discussion :)

I would like to respond to a point Makarius brought up earlier:

Isabelle does not protect against malicious intentions. It would require
a quite different system to do that, one that you won't like to use.

The other big provers (e.g. Coq) are similar in this respect.

I'm also familiar with Coq, and I do think Coq should be better in this
respect. In Coq, proofs are themselves objects with a given type. So they
can be checked or examined, for instance by dumping them to the console. So
to check the legitimacy of a theorem in Coq, I believe it would suffice to
walk the proof tree by recursively expanding each non-atomic node, and thus
ensure that the proof tree is composed entirely of legitimate proof objects.

Am I wrong about this? If not, might there be a similar approach in
Isabelle?

Scott

Email Gateway (Aug 22 2022 at 15:42):

From: Lawrence Paulson <lp15@cam.ac.uk>
To protect against malicious intentions would turn Isabelle into a form of security software. But the guarantees we get from the latter are quite different from what we get from a formal proof. Ultimately security claims depend upon trusting a lot of complicated mechanisms, such as certificate authorities and cryptosystems. We are not a lot better off than when a model checker comes back with nothing.

However, we work with formal proofs, which can be examined, even interactively. We do not have to work on the basis that "X has been proved, therefore X is true", but rather "We have been given a proof of X; Is it credible?" Then we can look at any part of this proof where we have doubts. A devious user has many ways to try to fool us, but it's not so easy if he has to supply the full source code and we insist on legibility throughout. The effort we choose to invest in this would depend on how important X is and how much we distrust the person who supplied the proof.

Larry Paulson

Email Gateway (Aug 22 2022 at 15:42):

From: Lars Hupel <hupel@in.tum.de>
Maybe it's time for an Underhanded Isabelle Contest?

<http://www.underhanded-c.org/>
<https://en.wikipedia.org/wiki/Underhanded_C_Contest>

Cheers
Lars

Email Gateway (Aug 22 2022 at 15:42):

From: Tobias Nipkow <nipkow@in.tum.de>
Scott,

Of course you are right and if some prover does not guarantee correctness on its
own, that is the standard answer: proof terms. Isabelle has had proof terms for
a long time now, but their checker is also part of Isabelle and uses the same
infrastructure of terms, types etc. Hence it is only as reliable as that
infrastructure is. An independent checker would improve the situation. That's
why, for example, for SAT solvers independent and verified proof checkers are
becoming available.

The discussion about "malicious intentions" never ceases to amaze me. Using the
interfaces that a system offers is not malicious by definition of the term
interface. If it requires wizard status to know what is ok and what not, that is
a design decision one can take, but then one need not be suprised that
non-wizards may perceive this as risky.

Appealing to readable proofs is not a panacea as Mark Adams' email shows: the
proofs about the linear inequalities he refers to are by necessity machine
generated and beyond human checking. But we would still like to have strong
correctness guarantees.

Tobias
smime.p7s

Email Gateway (Aug 22 2022 at 15:43):

From: Ondřej Kunčar <kuncar@in.tum.de>
Regardless of the question whether Isabelle should be a security software or whether there are some malicious users, in my opinion, every theorem prover should be able to answer the following question:

“Given a theorem X, which nondefinitional axioms and/or oracles (e.g., sorry) were used to prove X.” (using Isabelle’s terminology, other provers might use a different terminology).

As we saw in this thread, to obtain an answer to this question is in case of Isabelle
1) neither simple (as documented by Simon’s ML code)
2) nor reliable (as pointed out by Manuel).

I hope we can improve on this in foreseeable future.

Best,
Ondřej
smime.p7s

Email Gateway (Aug 22 2022 at 15:44):

From: Peter Lammich <lammich@in.tum.de>
At this point, I second Ken. Although having human-readable machine
checked proofs is certainly a nice feature to get an idea of how a
proof works, the main points to check should be the statement of the
theorem, including the definitions (syntax tweaks, etc) it uses, as
well as the axioms its proof uses. Then, one relies on the logical
inference kernel that the proof is actually correct.
In particular, auxiliary lemmas and definitions only used for the
proof, but not in the main statement of the theorem, should be
irrelevant for trusting the theorem.

This principle should, in first place, be independent of whether the
user is malicious or not. However, in Isabelle, the malicious user has
a lot of possibilities to hide tweaks from a reviewer, while, in
HOLZero, these possibilities are very limited (if existent at all).

Actually, Isabelle contains many tools in this spirit, for example,
even the simplifier or classical reasoner apply some transformations on
the theorems provided to them. More high-level tools like function
package or datatype package even do really complex proofs, completely
hidden from the user.

Email Gateway (Aug 22 2022 at 15:45):

From: Lawrence Paulson <lp15@cam.ac.uk>
I think that we are talking at cross purposes here. Of course we want our systems to be sound, and a kernel architecture is a great help in this, as we have known for 40 years. But even with a kernel architecture, it is easy for a result to be misrepresented. See e.g.

Avra Cohn. The Notion of Proof in Hardware Verification. J. Autom. Reasoning5(2): 127-139 (1989)

The proof kernel is no defence whatever against misrepresentation or misunderstanding, so it’s important that formal documents are openly available for inspection.

Larry Paulson

On 10 Jul 2017, at 10:10, Peter Lammich <lammich@in.tum.de> wrote:

On Sa, 2017-07-08 at 19:59 +0100, Lawrence Paulson wrote:

"We have been given a proof of X; Is it credible?"

At this point, I second Ken. Although having human-readable machine
checked proofs is certainly a nice feature to get an idea of how a
proof works, the main points to check should be the statement of the
theorem, including the definitions (syntax tweaks, etc) it uses, as
well as the axioms its proof uses. Then, one relies on the logical
inference kernel that the proof is actually correct.
In particular, auxiliary lemmas and definitions only used for the
proof, but not in the main statement of the theorem, should be
irrelevant for trusting the theorem.

This principle should, in first place, be independent of whether the
user is malicious or not. However, in Isabelle, the malicious user has
a lot of possibilities to hide tweaks from a reviewer, while, in
HOLZero, these possibilities are very limited (if existent at all).

Actually, Isabelle contains many tools in this spirit, for example,
even the simplifier or classical reasoner apply some transformations on
the theorems provided to them. More high-level tools like function
package or datatype package even do really complex proofs, completely
hidden from the user.

--
Peter

Last updated: Oct 08 2025 at 20:22 UTC