[isabelle] Proved False using normalization by evaluation · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Proved False using normalization by evaluation

Email Gateway (Feb 20 2025 at 17:18):

From: Kevin Kappelmann <kevin.kappelmann@tum.de>
Dear list,

I just proved False using the normalization method in Isabelle/HOL, see
below. It seems like TYPE(T) arguments are generalized(?) to TYPE(?'a)
arguments when running the method (even when dropping the code equation
for the constant to be normalized).

Is this a known problem and is there anything I can do to stop the
normalizer from tinkering with TYPE(T) arguments?

Background: I used normalization by evaluation to work on a large data
structure in an Isabelle/HOL proof for which the performance of the
simplifier is insufficient.

Best wishes,

Kevin

theory Scratch
   imports
     Main
begin

definition [code del]: "test ≡ card (UNIV :: 'a set) = 1"

(*TYPE(unit) changes to TYPE('a) even though [code del] is used for test*)
value [nbe] "test TYPE(unit)"

lemma oh_oh: "test TYPE(unit) = test TYPE(bool)"
   apply normalization
   done

theorem "False"
proof -
   have "test TYPE(unit) = True" unfolding test_def by simp
   moreover have "test TYPE(unit) = False" unfolding oh_oh by (simp add:
test_def)
   ultimately show False by simp
qed


end

Email Gateway (Feb 20 2025 at 17:45):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>
Oops. Here is a direct illustration that "normalization" is contradictory:

lemma "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

Stepan

On 20-Feb-25 6:10 PM, Kevin Kappelmann wrote:

Dear list,

I just proved False using the normalization method in Isabelle/HOL,
see below. It seems like TYPE(T) arguments are generalized(?) to
TYPE(?'a) arguments when running the method (even when dropping the
code equation for the constant to be normalized).

Is this a known problem and is there anything I can do to stop the
normalizer from tinkering with TYPE(T) arguments?

Background: I used normalization by evaluation to work on a large data
structure in an Isabelle/HOL proof for which the performance of the
simplifier is insufficient.

Best wishes,

Kevin

```isabelle

theory Scratch
imports
Main
begin

definition [code del]: "test ≡ card (UNIV :: 'a set) = 1"

(*TYPE(unit) changes to TYPE('a) even though [code del] is used for
test*)
value [nbe] "test TYPE(unit)"

lemma oh_oh: "test TYPE(unit) = test TYPE(bool)"
apply normalization
done

theorem "False"
proof -
have "test TYPE(unit) = True" unfolding test_def by simp
moreover have "test TYPE(unit) = False" unfolding oh_oh by (simp
add: test_def)
ultimately show False by simp
qed

end

```

Email Gateway (Feb 20 2025 at 18:01):

From: Manuel Eberl <manuel@pruvisto.org>
Just for the benefit of readers on this mailing list who are not
familiar with the NBE method: this is not a proof method that goes
through the Isabelle kernel. It is an oracle, and one with a significant
trusted code base. Moreover, I think it is used much less than the code
generator oracle (the "eval" method), so issues like this one can remain
unnoticed for a long time.

So, although I do not think it was previously known that you could prove
"False" with NBE, it is not very surprising either. And it does not
mean that Isabelle as a system is broken.

Unfortunately, I don't know enough about the internals of NBE to really
comment on this issue otherwise. I just wanted to make sure this was
stated somewhere clearly on the mailing list.

Cheers,

Manuel

On 20/02/2025 18:45, Stepan Holub (via cl-isabelle-users Mailing List)
wrote:

Oops. Here is a direct illustration that "normalization" is
contradictory:

lemma "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

Stepan

On 20-Feb-25 6:10 PM, Kevin Kappelmann wrote:

Dear list,

I just proved False using the normalization method in Isabelle/HOL,
see below. It seems like TYPE(T) arguments are generalized(?) to
TYPE(?'a) arguments when running the method (even when dropping the
code equation for the constant to be normalized).

Is this a known problem and is there anything I can do to stop the
normalizer from tinkering with TYPE(T) arguments?

Background: I used normalization by evaluation to work on a large
data structure in an Isabelle/HOL proof for which the performance of
the simplifier is insufficient.

Best wishes,

Kevin

```isabelle

theory Scratch
imports
Main
begin

definition [code del]: "test ≡ card (UNIV :: 'a set) = 1"

(*TYPE(unit) changes to TYPE('a) even though [code del] is used for
test*)
value [nbe] "test TYPE(unit)"

lemma oh_oh: "test TYPE(unit) = test TYPE(bool)"
apply normalization
done

theorem "False"
proof -
have "test TYPE(unit) = True" unfolding test_def by simp
moreover have "test TYPE(unit) = False" unfolding oh_oh by (simp
add: test_def)
ultimately show False by simp
qed

end

```

Email Gateway (Feb 20 2025 at 19:22):

From: Tobias Nipkow <nipkow@in.tum.de>
I went back to the paper we wrote about NBE 15 years ago
https://www21.in.tum.de/~nipkow/pubs/jfp12.html. This is what it says about types:

In most cases, these type reconstructions are unique, as follows from the
structure of normal terms in the simply-typed lambda calculus. However, in the
presence of polymorphic constants, the most general type could be more general
than intended. For example, let f be a polymorphic constant of type “(’a => ’a)
=> bool”, say without any rewrite rule. Then the untyped normal form of “f
(λu::bool. u)” would be “f (λu. u)” with most general type annotations “f
(λu::’a. u)”. To avoid such widening of types only those equations will be
considered as being proved by normalisation where the typing of the result is
completely determined, i.e. those equations where the most general type for the
result does not introduce any new type variables. It should be noted that this,
in particular, is always the case if an expression evaluates to True.

Looking at Kevin's example (where unit becomes 'a), maybe we didn't implement
the check at the end correctly. Florian?

Tobias

On 20/02/2025 19:01, Manuel Eberl wrote:

Just for the benefit of readers on this mailing list who are not familiar with
the NBE method: this is not a proof method that goes through the Isabelle
kernel. It is an oracle, and one with a significant trusted code base. Moreover,
I think it is used much less than the code generator oracle (the "eval" method),
so issues like this one can remain unnoticed for a long time.

So, although I do not think it was previously known that you could prove "False"
with NBE, it is not very surprising either. And it does not mean that Isabelle
as a system is broken.

Unfortunately, I don't know enough about the internals of NBE to really comment
on this issue otherwise. I just wanted to make sure this was stated somewhere
clearly on the mailing list.

Cheers,

Manuel

On 20/02/2025 18:45, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

Oops. Here is a direct illustration that "normalization" is contradictory:

lemma "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

Stepan

On 20-Feb-25 6:10 PM, Kevin Kappelmann wrote:

Dear list,

I just proved False using the normalization method in Isabelle/HOL, see
below. It seems like TYPE(T) arguments are generalized(?) to TYPE(?'a)
arguments when running the method (even when dropping the code equation for
the constant to be normalized).

Is this a known problem and is there anything I can do to stop the normalizer
from tinkering with TYPE(T) arguments?

Background: I used normalization by evaluation to work on a large data
structure in an Isabelle/HOL proof for which the performance of the
simplifier is insufficient.

Best wishes,

Kevin

```isabelle

theory Scratch
imports
Main
begin

definition [code del]: "test ≡ card (UNIV :: 'a set) = 1"

(TYPE(unit) changes to TYPE('a) even though [code del] is used for test)
value [nbe] "test TYPE(unit)"

lemma oh_oh: "test TYPE(unit) = test TYPE(bool)"
apply normalization
done

theorem "False"
proof -
have "test TYPE(unit) = True" unfolding test_def by simp
moreover have "test TYPE(unit) = False" unfolding oh_oh by (simp add:
test_def)
ultimately show False by simp
qed

end

```

smime.p7s

Email Gateway (Feb 20 2025 at 19:32):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>
Noting that UNIV is normalized to " List.coset []", one is tempted to
suspect that the typing is careless in the most natural place to go
wrong, namely with Nil.
(sorry for meddling, out of curiosity, with something I know nothing about).

Stepan

On 20-Feb-25 8:21 PM, Tobias Nipkow wrote:

I went back to the paper we wrote about NBE 15 years ago
https://www21.in.tum.de/~nipkow/pubs/jfp12.html. This is what it says
about types:

In most cases, these type reconstructions are unique, as follows from the structure of normal terms in the simply-typed lambda calculus. However, in the presence of polymorphic constants, the most general type could be more general than intended. For example, let f be a polymorphic constant of type “(’a => ’a) => bool”, say without any rewrite rule. Then the untyped normal form of “f (λu::bool. u)” would be “f (λu. u)” with most general type annotations “f (λu::’a. u)”. To avoid such widening of types only those equations will be considered as being proved by normalisation where the typing of the result is completely determined, i.e. those equations where the most general type for the result does not introduce any new type variables. It should be noted that this, in particular, is always the case if an expression evaluates to True.

Looking at Kevin's example (where unit becomes 'a), maybe we didn't
implement the check at the end correctly. Florian?

Tobias

On 20/02/2025 19:01, Manuel Eberl wrote:

Just for the benefit of readers on this mailing list who are not
familiar with the NBE method: this is not a proof method that goes
through the Isabelle kernel. It is an oracle, and one with a
significant trusted code base. Moreover, I think it is used much less
than the code generator oracle (the "eval" method), so issues like
this one can remain unnoticed for a long time.

So, although I do not think it was previously known that you could
prove "False" with NBE, it is not very surprising either. And it does
not mean that Isabelle as a system is broken.

Unfortunately, I don't know enough about the internals of NBE to
really comment on this issue otherwise. I just wanted to make sure
this was stated somewhere clearly on the mailing list.

Cheers,

Manuel

On 20/02/2025 18:45, Stepan Holub (via cl-isabelle-users Mailing
List) wrote:

Oops. Here is a direct illustration that "normalization" is
contradictory:

lemma "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

Stepan

On 20-Feb-25 6:10 PM, Kevin Kappelmann wrote:

Dear list,

I just proved False using the normalization method in Isabelle/HOL,
see below. It seems like TYPE(T) arguments are generalized(?) to
TYPE(?'a) arguments when running the method (even when dropping the
code equation for the constant to be normalized).

Is this a known problem and is there anything I can do to stop the
normalizer from tinkering with TYPE(T) arguments?

Background: I used normalization by evaluation to work on a large
data structure in an Isabelle/HOL proof for which the performance
of the simplifier is insufficient.

Best wishes,

Kevin

```isabelle

theory Scratch
imports
Main
begin

definition [code del]: "test ≡ card (UNIV :: 'a set) = 1"

(*TYPE(unit) changes to TYPE('a) even though [code del] is used for
test*)
value [nbe] "test TYPE(unit)"

lemma oh_oh: "test TYPE(unit) = test TYPE(bool)"
apply normalization
done

theorem "False"
proof -
have "test TYPE(unit) = True" unfolding test_def by simp
moreover have "test TYPE(unit) = False" unfolding oh_oh by (simp
add: test_def)
ultimately show False by simp
qed

end

```

Email Gateway (Feb 20 2025 at 20:31):

From: Makarius <makarius@sketis.net>
On 20/02/2025 19:01, Manuel Eberl wrote:

Just for the benefit of readers on this mailing list who are not familiar with
the NBE method: this is not a proof method that goes through the Isabelle
kernel. It is an oracle, and one with a significant trusted code base.
Moreover, I think it is used much less than the code generator oracle (the
"eval" method), so issues like this one can remain unnoticed for a long time.

So, although I do not think it was previously known that you could prove
"False" with NBE, it is not very surprising either. And it does not mean
that Isabelle as a system is broken.

Unfortunately, I don't know enough about the internals of NBE to really
comment on this issue otherwise. I just wanted to make sure this was stated
somewhere clearly on the mailing list.

Yes, it is very important to be careful about wording of statements,
especially in informal language.

The subject of this thread is wrong, because there is nothing formally proven
here.

Side-remark: We did not get a report of genuine problems in the Isabelle
inference kernel for so many years, that I begin to feel uneasy. In olden
times, there used to be reports approx. every 2 years.

Makarius

Email Gateway (Feb 21 2025 at 08:26):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>

The subject of this thread is wrong, because there is nothing formally
proven here.

Strictly speaking, then, even the title "AFP" is wrong, as it contains
some (very few) instances of normalization method (at least one
<https://www.isa-afp.org/sessions/registers/#Teleport.html#Teleport.teleport_locale.teleport|fact>
followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual reader
can be excused for not being able to make these fine distinctions.

Stepan

On 20-Feb-25 9:31 PM, Makarius wrote:

On 20/02/2025 19:01, Manuel Eberl wrote:

Just for the benefit of readers on this mailing list who are not
familiar with the NBE method: this is not a proof method that goes
through the Isabelle kernel. It is an oracle, and one with a
significant trusted code base. Moreover, I think it is used much less
than the code generator oracle (the "eval" method), so issues like
this one can remain unnoticed for a long time.

So, although I do not think it was previously known that you could
prove "False" with NBE, it is not very surprising either. And it does
not mean that Isabelle as a system is broken.

Unfortunately, I don't know enough about the internals of NBE to
really comment on this issue otherwise. I just wanted to make sure
this was stated somewhere clearly on the mailing list.

Yes, it is very important to be careful about wording of statements,
especially in informal language.

The subject of this thread is wrong, because there is nothing formally
proven here.

Side-remark: We did not get a report of genuine problems in the
Isabelle inference kernel for so many years, that I begin to feel
uneasy. In olden times, there used to be reports approx. every 2 years.

Makarius

Email Gateway (Feb 21 2025 at 09:34):

From: Makarius <makarius@sketis.net>
On 21/02/2025 09:25, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

The subject of this thread is wrong, because there is nothing formally
proven here.

Strictly speaking, then, even the title "AFP" is wrong, as it contains some
(very few) instances of normalization method (at least one <https://www.isa-
afp.org/sessions/registers/#Teleport.html#Teleport.teleport_locale.teleport|
fact> followed by "qed").

The AFP title is correct, because almost everything is formally proven. There
can be exceptions, when people are doing odd experiments. Or there can be
situations where it is appropriate to use tools without proper proofs.

One day we could render the formal status of results in HTML somehow, to make
this more clear.

And note that "qed" in Isar should not be read as "this is formally proven".

Therefore, indeed, Manuel's warning is very important. A casual reader can be
excused for not being able to make these fine distinctions.

Indeed. That was his attempt to undo the error in the original subject by Kevin.

Makarius

Email Gateway (Feb 21 2025 at 10:03):

From: Makarius <makarius@sketis.net>
On 20/02/2025 18:45, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

Here is a direct illustration that "normalization" is contradictory:

lemma "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

That is a good example. This is how to expose its implicit assumptions, in
terms of the inference kernel:

lemma a: "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

ML ‹Thm_Deps.all_oracles @{thms a}›

(*
val it = [(("Code_Generator.triv_of_class", {}), NONE),
(("Code_Generator.normalization_by_evaluation", {}), NONE)]:
Proofterm.oracle list
*)

Makarius

Email Gateway (Feb 21 2025 at 10:11):

From: Peter Lammich <lammich@in.tum.de>

That is a good example. This is how to expose its implicit
assumptions, in terms of the inference kernel:

lemma a: "card (UNIV :: 'a set) = card (UNIV :: 'b set)"
apply normalization ..

ML ‹Thm_Deps.all_oracles @{thms a}›

(*
val it = [(("Code_Generator.triv_of_class", {}), NONE),
(("Code_Generator.normalization_by_evaluation", {}), NONE)]:
Proofterm.oracle list
*)

There actually is the top-level command thm_oracles for that, too:

thm_oracles a

Prints:

oracles:
triv_of_class
normalization_by_evaluation

As oracles are tracked transitively, you can always do that to your
final theorems, if you want to know on what they depend.

Peter

Makarius

Email Gateway (Feb 21 2025 at 10:23):

From: Manuel Eberl <manuel@pruvisto.org>
On 21/02/2025 10:33, Makarius wrote:

Therefore, indeed, Manuel's warning is very important. A casual
reader can be excused for not being able to make these fine
distinctions.

Indeed. That was his attempt to undo the error in the original subject
by Kevin.

I would not go that far. I'm sure I have referred to things that I
"proved" with eval, for example, as "proven". Without quotation marks. A
reasonable argument can be made that oracles are evil and should not be
used in formal proof at all (or otherwise one should not call it "formal
proof" anymore). But I think some more nuance is appropriate and one has
to look at the oracle in question.

I for one have sometimes wondered whether something like a "numeral
addition and multiplication" oracle that can only prove theorems of the
form "(a :: num) + b = c" and "(a :: num) * b = c" would be an
interesting thing to have to speed up numerical computations without
resorting to full-on reflection. The trusted code base would be tiny,
and the soundness of Isabelle depends on trusting Poly/ML to be able to
do arithmetic on integers already anyway. And I think it would be
throwing out the baby with the bath water to say a proof using this
oracle is no longer a proof.

Something like "eval" or "nbe" is of course much more involved and there
are much greater dangers for introducing inconsistency (e.g. one can use
"code_printing" to make eval "prove" just about anything).

Email Gateway (Feb 21 2025 at 10:59):

From: Roland Lutz <rlutz@hedmen.org>
On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it contains
some (very few) instances of normalization method (at least one followed
by "qed").

Therefore, indeed, Manuel's warning is very important. A casual reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from, e.g., an
SMT solver. But the fact that Isabelle is used for formal proofs and
wording like "qed" should mean that Isabelle "takes responsibility" for
oracle results. So this really is an Isabelle bug, whether for the
actual bug or for the inclusion of the oracle, right?

Ideally, the result from an oracle should be a proofterm. How unrealistic
is it to implement this? What places would need to be changed?

Roland

Email Gateway (Feb 21 2025 at 11:47):

From: Fabian Huch <huch@in.tum.de>
You seem to be confusing terminology here.

The concept of an oracle is that it must be trusted. SMT solvers are
normally not trusted as oracles in Isabelle (and especially not in the
AFP) -- e.g., sledgehammer reconstructs all its proofs unless you
explicitly ask it not to.

It is not a surprise that using additional oracles in arbitrary ways
will introduce inconsistencies -- you can also axiomatize "False", if
you want.

Also Isabelle is an LCF-style system which eliminates the need for proof
terms.

Fabian

On 2/21/25 11:38, Roland Lutz wrote:

On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it
contains some (very few) instances of normalization method (at least
one followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual
reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from, e.g.,
an SMT solver. But the fact that Isabelle is used for formal proofs
and wording like "qed" should mean that Isabelle "takes
responsibility" for oracle results. So this really is an Isabelle
bug, whether for the actual bug or for the inclusion of the oracle,
right?

Ideally, the result from an oracle should be a proofterm. How
unrealistic is it to implement this? What places would need to be
changed?

Roland

Email Gateway (Feb 21 2025 at 12:33):

From: Makarius <makarius@sketis.net>
On 21/02/2025 11:38, Roland Lutz wrote:

I get that it isn't straightforward to reconstruct a proof from, e.g., an SMT
solver.

People have actually done that, and by default you get a proper proof for
Isabelle "smt" invocations.

So this really is an Isabelle bug, whether for the actual bug or for the
inclusion of the oracle, right?

The term "bug" is undefined in the Isabelle culture. I normally say
"unexpected behaviour". In the majority of cases this is then a
misunderstanding of how things work, rather than a problem in the implementation.

Ideally, the result from an oracle should be a proofterm. How unrealistic is
it to implement this? What places would need to be changed?

Not necessarily a proofterm, but proper inferences in terms of the kernel.
This is the normal way in the greater HOL community, and done properly most of
the time.

An alternative is CakeML: these guys have managed to to bootstrap a formally
proven stack of prover technology, such that things are fast and correct at
the same time.

Makarius

Email Gateway (Feb 21 2025 at 13:48):

From: Dominic Mulligan <cl-isabelle-users@lists.cam.ac.uk>
Following Peter's post about thm_oracles, is there a way of invoking
isabelle build for a session so that it fails if any proof of a lemma
appearing within that session invokes a method that does not pass through
the kernel?

Thanks,
Dom

On Fri, 21 Feb 2025 at 12:33, Makarius <makarius@sketis.net> wrote:

On 21/02/2025 11:38, Roland Lutz wrote:

I get that it isn't straightforward to reconstruct a proof from, e.g.,
an SMT
solver.

People have actually done that, and by default you get a proper proof for
Isabelle "smt" invocations.

So this really is an Isabelle bug, whether for the actual bug or for
the
inclusion of the oracle, right?

The term "bug" is undefined in the Isabelle culture. I normally say
"unexpected behaviour". In the majority of cases this is then a
misunderstanding of how things work, rather than a problem in the
implementation.

Ideally, the result from an oracle should be a proofterm. How
unrealistic is
it to implement this? What places would need to be changed?

Not necessarily a proofterm, but proper inferences in terms of the kernel.
This is the normal way in the greater HOL community, and done properly
most of
the time.

An alternative is CakeML: these guys have managed to to bootstrap a
formally
proven stack of prover technology, such that things are fast and correct
at
the same time.

Makarius

Email Gateway (Feb 21 2025 at 13:51):

From: Manuel Eberl <manuel@pruvisto.org>
Others have written good responses to this already, but let me just add
this: Proof terms are not really something that is used very much in
Isabelle. When you have an "oracle" in Isabelle that can easily be made
to return a proof term (be it an Isabelle proof term or something more
domain-specific, such as a SAT or SMT proof), the usual thing to do is
to just "import" the proof term by plugging together Isabelle theorems
(such that everything goes through the kernel). Or, better still, to
just have the "oracle" plug Isabelle theorems together directly.

The "generate some proof object and then push it through the kernel"
approach is taken e.g. by Isabelle's "smt" tactic, by various tools to
import SAT proofs in e.g. the DRUP format, and I think there was a proof
method for linear orders that does this as well (not sure whether it is
in the AFP or the distribution).

The "plug theorems together directly" approach is taken e.g. by my
real_asymp method for proving asymptotic properties like limits and "Big
O" statements. This is in the Isabelle distribution. The advantage here
is that you don't have to first compute a (possibly huge) proof object
and then process it (a bit like deforestation). The disadvantage might
be that plugging together Isabelle theorems tends to be expensive, so if
you're doing a lot of exploration where it's not clear whether a branch
of computation will actually lead to a result, this might not be the
right approach.

But once you do any of that, it's not an oracle anymore (at least not in
Isabelle terminology), but a proof method. An oracle is by definition
trusted code that need not justify what it does to the Isabelle kernel.
Consequently, one has to be very careful with these. Morally, I think a
good argument can be made that oracles make sense for statements that,
if you were to see a proof for them on paper or in a proof term, it
would feel like a waste of space: straightforward computations that take
a lot of time but where it is completely clear how to do them. I think
that is how people in systems such as Coq would often use reflection.

Cheers,

Manuel

On 21/02/2025 11:38, Roland Lutz wrote:

On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it
contains some (very few) instances of normalization method (at least
one followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual
reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from, e.g.,
an SMT solver. But the fact that Isabelle is used for formal proofs
and wording like "qed" should mean that Isabelle "takes
responsibility" for oracle results. So this really is an Isabelle
bug, whether for the actual bug or for the inclusion of the oracle,
right?

Ideally, the result from an oracle should be a proofterm. How
unrealistic is it to implement this? What places would need to be
changed?

Roland

Email Gateway (Feb 21 2025 at 14:20):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>
Let me dare and ask a different, but related, and possibly naively basic
question.

Is there a clear answer to a logician who wants to know what is the
axiomatic system in which the theorem is proved if it "passes through
the kernel"? When advertising Isabelle/HOL, I am often asked this
question, and I confess I am never sure I have a satisfactory answer.
The answer should be Pure + HOL. But I am afraid that to explain what
that means /exactly/, one should either refer to the actual ML code
(which is not satisfactory to the logician asking the question) or to
provide quite some background literature, which will give only an
approximate answer anyway.

Stepan

On 21-Feb-25 2:50 PM, Manuel Eberl wrote:

Others have written good responses to this already, but let me just
add this: Proof terms are not really something that is used very much
in Isabelle. When you have an "oracle" in Isabelle that can easily be
made to return a proof term (be it an Isabelle proof term or something
more domain-specific, such as a SAT or SMT proof), the usual thing to
do is to just "import" the proof term by plugging together Isabelle
theorems (such that everything goes through the kernel). Or, better
still, to just have the "oracle" plug Isabelle theorems together
directly.

The "generate some proof object and then push it through the kernel"
approach is taken e.g. by Isabelle's "smt" tactic, by various tools to
import SAT proofs in e.g. the DRUP format, and I think there was a
proof method for linear orders that does this as well (not sure
whether it is in the AFP or the distribution).

The "plug theorems together directly" approach is taken e.g. by my
real_asymp method for proving asymptotic properties like limits and
"Big O" statements. This is in the Isabelle distribution. The
advantage here is that you don't have to first compute a (possibly
huge) proof object and then process it (a bit like deforestation). The
disadvantage might be that plugging together Isabelle theorems tends
to be expensive, so if you're doing a lot of exploration where it's
not clear whether a branch of computation will actually lead to a
result, this might not be the right approach.

But once you do any of that, it's not an oracle anymore (at least not
in Isabelle terminology), but a proof method. An oracle is by
definition trusted code that need not justify what it does to the
Isabelle kernel. Consequently, one has to be very careful with these.
Morally, I think a good argument can be made that oracles make sense
for statements that, if you were to see a proof for them on paper or
in a proof term, it would feel like a waste of space: straightforward
computations that take a lot of time but where it is completely clear
how to do them. I think that is how people in systems such as Coq
would often use reflection.

Cheers,

Manuel

On 21/02/2025 11:38, Roland Lutz wrote:

On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it
contains some (very few) instances of normalization method (at least
one followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual
reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from,
e.g., an SMT solver. But the fact that Isabelle is used for formal
proofs and wording like "qed" should mean that Isabelle "takes
responsibility" for oracle results. So this really is an Isabelle
bug, whether for the actual bug or for the inclusion of the oracle,
right?

Ideally, the result from an oracle should be a proofterm. How
unrealistic is it to implement this? What places would need to be
changed?

Roland

Email Gateway (Feb 21 2025 at 15:38):

From: Fabian Huch <huch@in.tum.de>
Yes, you can look at the transitive dependencies of your theorem that
are axioms (doable in a bit of Isabelle/ML programming). Your axiomatic
system consists of those + the metalogic itself, which has been
formalized e.g. by Simon Roßkopf and Tobias Nipkow [1] (their proof
checker even verifies whether the extracted term is a proof for the
theorem).

The trouble is that you are going to find a lot of axioms, e.g., every
constant or type definition will add some. Those axioms are also added
directly, without using the oracle mechanism. I'm not quite sure why
(possibly performance reasons), because the latter would mean that it'd
be sufficient to only look at a few oracles instead of a linearly
growing set of axioms -- and we even have helpful tools such as the
thm_oracles command.

Fabian

[1]: Roßkopf, S., Nipkow, T. A Formalization and Proof Checker for
Isabelle’s Metalogic. /J Autom Reasoning/ 67, 1 (2023).
https://doi.org/10.1007/s10817-022-09648-w

On 2/21/25 15:20, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

Let me dare and ask a different, but related, and possibly naively
basic question.

Is there a clear answer to a logician who wants to know what is the
axiomatic system in which the theorem is proved if it "passes through
the kernel"? When advertising Isabelle/HOL, I am often asked this
question, and I confess I am never sure I have a satisfactory answer.
The answer should be Pure + HOL. But I am afraid that to explain what
that means /exactly/, one should either refer to the actual ML code
(which is not satisfactory to the logician asking the question) or to
provide quite some background literature, which will give only an
approximate answer anyway.

Stepan

On 21-Feb-25 2:50 PM, Manuel Eberl wrote:

Others have written good responses to this already, but let me just
add this: Proof terms are not really something that is used very much
in Isabelle. When you have an "oracle" in Isabelle that can easily be
made to return a proof term (be it an Isabelle proof term or
something more domain-specific, such as a SAT or SMT proof), the
usual thing to do is to just "import" the proof term by plugging
together Isabelle theorems (such that everything goes through the
kernel). Or, better still, to just have the "oracle" plug Isabelle
theorems together directly.

The "generate some proof object and then push it through the kernel"
approach is taken e.g. by Isabelle's "smt" tactic, by various tools
to import SAT proofs in e.g. the DRUP format, and I think there was a
proof method for linear orders that does this as well (not sure
whether it is in the AFP or the distribution).

The "plug theorems together directly" approach is taken e.g. by my
real_asymp method for proving asymptotic properties like limits and
"Big O" statements. This is in the Isabelle distribution. The
advantage here is that you don't have to first compute a (possibly
huge) proof object and then process it (a bit like deforestation).
The disadvantage might be that plugging together Isabelle theorems
tends to be expensive, so if you're doing a lot of exploration where
it's not clear whether a branch of computation will actually lead to
a result, this might not be the right approach.

But once you do any of that, it's not an oracle anymore (at least not
in Isabelle terminology), but a proof method. An oracle is by
definition trusted code that need not justify what it does to the
Isabelle kernel. Consequently, one has to be very careful with these.
Morally, I think a good argument can be made that oracles make sense
for statements that, if you were to see a proof for them on paper or
in a proof term, it would feel like a waste of space: straightforward
computations that take a lot of time but where it is completely clear
how to do them. I think that is how people in systems such as Coq
would often use reflection.

Cheers,

Manuel

On 21/02/2025 11:38, Roland Lutz wrote:

On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it
contains some (very few) instances of normalization method (at
least one followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual
reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from,
e.g., an SMT solver. But the fact that Isabelle is used for formal
proofs and wording like "qed" should mean that Isabelle "takes
responsibility" for oracle results. So this really is an Isabelle
bug, whether for the actual bug or for the inclusion of the oracle,
right?

Ideally, the result from an oracle should be a proofterm. How
unrealistic is it to implement this? What places would need to be
changed?

Roland

Email Gateway (Feb 21 2025 at 15:53):

From: Tobias Nipkow <nipkow@in.tum.de>
Stepan,

There is no simple precise answer.

Isabelle's meta-logic is clearly defined here:
https://link.springer.com/article/10.1007/s10817-022-09648-w
It explains how propositions in Isabelle's term langage are combined (which is
how you would expect an inference system to behave), includiung well-formedness,
type class and overloading.

The above paper presents a proof checker that takes an Isabelle proof term and
checks that the axioms of whatever theory you are in (which may be
implicational, i.e. rules) have been plugged together correctly. The proof
checker was verified (in HOL itself).

The Isabelle theory HOL is the result you obtain when you add a bunch of axioms
that you can find in HOL.thy. It is meant to formalize
A. Church, A formulation of the simple theory of types, Journal of Symbolic
Logic 5 (1940) 56–68.
but deviates in some aspects, esp the types. Same for all HOL-based systems.

In the end there is always some source could you would need to read or trust to
make sure that what you see (as a user) is what you get (internally).

Tobias

On 21/02/2025 15:20, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

Let me dare and ask a different, but related, and possibly naively basic question.

Is there a clear answer to a logician who wants to know what is the axiomatic
system in which the theorem is proved if it "passes through the kernel"? When
advertising Isabelle/HOL, I am often asked this question, and I confess I am
never sure I have a satisfactory answer. The answer should be Pure + HOL. But I
am afraid that to explain what that means /exactly/, one should either refer to
the actual ML code (which is not satisfactory to the logician asking the
question) or to provide quite some background literature, which will give only
an approximate answer anyway.

Stepan

On 21-Feb-25 2:50 PM, Manuel Eberl wrote:

Others have written good responses to this already, but let me just add this:
Proof terms are not really something that is used very much in Isabelle. When
you have an "oracle" in Isabelle that can easily be made to return a proof
term (be it an Isabelle proof term or something more domain-specific, such as
a SAT or SMT proof), the usual thing to do is to just "import" the proof term
by plugging together Isabelle theorems (such that everything goes through the
kernel). Or, better still, to just have the "oracle" plug Isabelle theorems
together directly.

The "generate some proof object and then push it through the kernel" approach
is taken e.g. by Isabelle's "smt" tactic, by various tools to import SAT
proofs in e.g. the DRUP format, and I think there was a proof method for
linear orders that does this as well (not sure whether it is in the AFP or the
distribution).

The "plug theorems together directly" approach is taken e.g. by my real_asymp
method for proving asymptotic properties like limits and "Big O" statements.
This is in the Isabelle distribution. The advantage here is that you don't
have to first compute a (possibly huge) proof object and then process it (a
bit like deforestation). The disadvantage might be that plugging together
Isabelle theorems tends to be expensive, so if you're doing a lot of
exploration where it's not clear whether a branch of computation will actually
lead to a result, this might not be the right approach.

But once you do any of that, it's not an oracle anymore (at least not in
Isabelle terminology), but a proof method. An oracle is by definition trusted
code that need not justify what it does to the Isabelle kernel. Consequently,
one has to be very careful with these. Morally, I think a good argument can be
made that oracles make sense for statements that, if you were to see a proof
for them on paper or in a proof term, it would feel like a waste of space:
straightforward computations that take a lot of time but where it is
completely clear how to do them. I think that is how people in systems such
as Coq would often use reflection.

Cheers,

Manuel

On 21/02/2025 11:38, Roland Lutz wrote:

On Fri, 21 Feb 2025, Stepan Holub wrote:

Strictly speaking, then, even the title "AFP" is wrong, as it contains some
(very few) instances of normalization method (at least one followed by "qed").

Therefore, indeed, Manuel's warning is very important. A casual reader can
be excused for not being able to make these fine distinctions.

Isn't the actual underlying problem here the trust in oracles' results?

I get that it isn't straightforward to reconstruct a proof from, e.g., an SMT
solver. But the fact that Isabelle is used for formal proofs and wording
like "qed" should mean that Isabelle "takes responsibility" for oracle
results. So this really is an Isabelle bug, whether for the actual bug or
for the inclusion of the oracle, right?

Ideally, the result from an oracle should be a proofterm. How unrealistic is
it to implement this? What places would need to be changed?

Roland

smime.p7s

Email Gateway (Feb 21 2025 at 16:24):

From: Makarius <makarius@sketis.net>
On 21/02/2025 14:50, Manuel Eberl wrote:

But once you do any of that, it's not an oracle anymore (at least not in
Isabelle terminology), but a proof method.

Your explanation was all fine, only this bit needs refinement:

* "proof method" is goal refinement operation of the Isar proof language.
Proof methods may appear as initial method in 'proof' or terminal method in
'qed'; the Isar language is parametric over the method language, these may do
whatever is technically possible, even non-sense.

* Above you could say "derived inference rule (of the kernel)".

Makarius

Email Gateway (Feb 21 2025 at 16:33):

From: Makarius <makarius@sketis.net>
On 21/02/2025 15:20, Stepan Holub (via cl-isabelle-users Mailing List) wrote:

Is there a clear answer to a logician who wants to know what is the axiomatic
system in which the theorem is proved if it "passes through the kernel"? When
advertising Isabelle/HOL, I am often asked this question, and I confess I am
never sure I have a satisfactory answer. The answer should be Pure + HOL. But
I am afraid that to explain what that means /exactly/, one should either refer
to the actual ML code (which is not satisfactory to the logician asking the
question) or to provide quite some background literature, which will give only
an approximate answer anyway.

There are many clear answers, but with some complexity involved in the
explanations. I am not a aware of a single canonical answer that could be
obtained without an effort.

When introducing Isabelle, I sometimes start with Pure and the Isar proof
language, and eventually bootstrap HOL from that. Results of such exercises
are available in src/Pure/Examples. Quite a lot can be done from basic
principles, without a huge stack of addon proof tools getting involved. Thus
the audience gets some sense how the formal game works.

An alternatively approach is to export everything as a huge collection of
lambda terms. Despite many efforts in recent years, this is still not ready
for prime time, but I would like to see this working eventually. For example,
the Dedukti guys have volunteered to absorb whatever we throw at them, and
make an independent integrity check of the theory content and the proof terms.

Makarius

Email Gateway (Feb 21 2025 at 16:43):

From: Makarius <makarius@sketis.net>
On 21/02/2025 14:48, Dominic Mulligan (via cl-isabelle-users Mailing List) wrote:

Following Peter's post about thm_oracles, is there a way of invoking
isabelle build for a session so that it fails if any proof of a lemma
appearing within that session invokes a method that does not pass through the
kernel?

There are various possibilities to do that. The Isabelle/ML operation
Build.add_hook allows to install a function that gets invoked at the end of
the ML session build. E.g. Mirabelle is using that to inspect all intermediate
goal states, to see how automatic tools like them.

You could use that to inspect the regular fact name space, to see if "bad" thm
values are stored. (Tools could still hide their own evil thms in pockets that
are invisible, and conjure up such bad results later.)

To "harden" regular Isabelle builds, I would use a slightly different
approach: There is already "isabelle build -o export_theory". It could provide
sufficient information to analyze the overall thm dependency graph later on,
in Isabelle/Scala instead of Isabelle/ML. Isabelle/Scala has the advantage
that it "oversees" the overall Isabelle/ML session. The Scala tool could cause
an error, or display results nicely as a report in the HTML presentation, for
example, to say clearly "how correct" which theorem is, meaning what are its
extra dependencies on oracles, axioms, etc.

Makarius

Email Gateway (Feb 21 2025 at 18:15):

From: Frédéric Blanqui <frederic.blanqui@inria.fr>

Le 21/02/2025 à 17:32, Makarius a écrit :

An alternatively approach is to export everything as a huge collection
of lambda terms. Despite many efforts in recent years, this is still
not ready for prime time, but I would like to see this working
eventually. For example, the Dedukti guys have volunteered to absorb
whatever we throw at them, and make an independent integrity check of
the theory content and the proof terms.

Hi. Indeed, https://github.com/Deducteam/isabelle_dedukti is an Isabelle
component that can convert Isabelle proof terms into Dedukti or Lambdapi
files. Current Dedukti and Lambdapi checkers (dkcheck, kontroli and
lambdapi) are quite fast, especially the first two. The main limitation
currently is rather on the Isabelle side: the current proof export is
quite slow. In the last Isabelle workshop (Tbilisi, 2024), Makarius gave
a nice talk on proof export and his long-term project to improve it (see
https://files.sketis.net/Isabelle_Workshop_2024/Isabelle_2024_paper_6.pdf).
Note that isabelle_dedukti does not handle oracles. The Dedukti or
Lambdapi representation provides another interesting way to see what
exactly is the logic implemented by Isabelle, and what is assumed by the
kernel.

Email Gateway (Apr 15 2025 at 16:19):

From: Florian Haftmann <florian.haftmann@cit.tum.de>
See now

https://isabelle.in.tum.de/repos/isabelle/rev/5601f5cce4c6

for the final piece of a series of improvements in nbe.ML.

There were basically two things to put into shape:

Low-level type inference during term reconstruction was restored using
proof-context level type inference.
Constants carry explicit type arguments, hence the underapproximation
for equality used for non-left-linear patterns can distinguish different
type instances. As a side effect, type reconstruction of terms is
complete now (although this never had been an issue an practice AFAIR).

Florian

OpenPGP_0xA707172232CFA4E9.asc
OpenPGP_signature.asc

Last updated: Jul 26 2025 at 12:43 UTC