[isabelle] Isabelle Foundation & Certification · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle Foundation & Certification

Email Gateway (Aug 22 2022 at 11:12):

From: Makarius <makarius@sketis.net>
Back to this thread after the initial round of discussion ...

First of all, I don't see much of a technical problem here, but mainly a
social one. A bit too many things not explicitly spelled out; a bit too
many rumors and private mailing threads about "Isabelle/HOL is
inconsistent", already since summer 2014.

For years we have tried to establish isabelle-dev (also isabelle-users) as
a professional channel for open discussion about presumed problems,
potential solutions, proposals for changes etc. That works out half way,
and needs more efforts.

On other mailing lists (e.g. Coq-club) I see routinely the discussion of
genuine logical breakdowns without much excitement about it. The Coq/HoTT
implementation even seems to require patching Coq to disable critical
checks, for the elite of users who really know what there are doing at the
bleeding edge ...

The present thread is essentially about the paper "A Consistent Foundation
for Isabelle/HOL" by O. Kuncar and A. Popescu, ITP 2015.

Email Gateway (Aug 22 2022 at 11:13):

From: Tjark Weber <tjark.weber@it.uu.se>
Beyond the specific def/overloading issue, there is a more fundamental
problem. Isabelle (and other proof assistants) have evolved in a social
context where users typically act in good faith.

It is a fact that is neither widely advertised nor, as far as I can
tell, well-understood outside a small community that Isabelle theories
are not actually machine-checked proofs. (Sure, theories may contain
proofs that can be checked by Isabelle. But if someone gives you an
Isabelle theory and claims that it proves, e.g, FLT, there is no
machine that could decide this claim for you.)

This can be addressed in various ways, of course. You give several
pages of "methodological recommendations" in your white paper,
essentially attempting to identify a safe subset of Isabelle/Isar. HOL
Zero and proof terms are other approaches, with different drawbacks.

Surely satisfactory solutions will be developed eventually, when people
perceive this problem as sufficiently important.

Best,
Tjark

Email Gateway (Aug 22 2022 at 11:14):

From: Tobias Nipkow <nipkow@in.tum.de>
The length of this dicussion puzzles me. Typedef can introduce inconsistencies,
nobody wants this behaviour and nobody needs this behaviour. There is a
perfectly clear way to rule out this whole class of unwanted behaviours: do not
allow certain cycles. This solution, including code, has been on the table for a
year now. By now not just mere users but also certification agencies are
worried. This is a pressing issue and it is not rocket science.

Everything else is secondary. It is nice that Andrei's and Ondrej's paper
establishes certain logical properties of their improved system, and it is
interesting to discuss these properties, but the most important thing is the
actual addition of a cycle check to the code.

Tobias
smime.p7s

Email Gateway (Aug 22 2022 at 11:14):

From: Makarius <makarius@sketis.net>
To get beyond the black-and-white of the Inconsistency Club vs. Consistency
Club, here are some more shades and dimensions.

The subsequent classification of reliability and robustness of proof
assistants follows the tradition of the "LCF-approach", which has two main
ingredients:

(1) a programming language with strong type-safety properties (the
"meta-language" ML)

(2) an implementation of the logical core as abstract datatype thm
within ML

The key ML modules should be reasonable small, to make them obviously
correct (whatever that means precisely). Arbitrary derived tools may be put
on top of that, without breaking things: all inferences going through the
thm module are "correct-by-construction".

Both ingredients of the LCF approach are equally important. E.g. a
fully-formalized and proven implementation of the thm module that is exposed
to an unsafe ML environment still leads to an unreliable system, if the user
is able to poke around arbitrarily.

To better capture the "True LCF-nature" of proof assistants 35 years after
LCF, lets consider qualities in following categories:

[small] overall size of critical code modules.

[ML] The implementation/extension language of the prover.

[thm] The actual inference kernel, to derive theorems within a given
theory.

[defs] additional guarantees for "definitional theories", whatever that
means precisely.

[context] treatment of formal context (theory or proof context with formal
declarations, additional tool data etc.).

[logic] well-understood (by experts) and easy to understand (by users).

[formalized] key aspects are formally specified or even proven (the logical
calculus, models of the prover, the actual prover implementation etc.).

The rating uses small natural numbers in unary notation: 0, +, ++, +++. More
pluses mean somehow better. The scale is open-ended, but I will not go
beyond +++ right now. 0 means neither good nor bad, just zero.

The results below are based on my own understanding of these systems -- I've
studied sources of all of them. The architects and engineers of the
mentioned systems are invited to improve on these estimates.

LCF

I only know the Cambridge LCF code base, by L. Paulson from around 1985.
Original LCF is hard to get by -- M. Gordon once showed a stack of printed
LISP + ML listings of it.

[small] +++
[ML] +++
[thm] +++
[defs] 0
[context] 0
[logic] ++
[formalized] 0

[small] + [ML] + [thm] define the game of "LCF-style proof assistant".
Original ML is embedded into LISP, and there is no system access, no
non-sense like mutable strings or funny holes in the type-system.

[defs] No definitions; the LCF logic is used axiomatically.

[context] management is implicit in ML; there is no way to switch contexts
nor go back to older contexts.

[logic] LCF as a logic is theoretically nice and clean, but not so easy to
understand for regular users.

HOL88

HOL88 is like LCF, with the addition of checked definitional principles as
part of the kernel. Thus it goes beyond the original "LCF-approach" in a
significant way. Everything else is similar to LCF.

[small] +++
[ML] +++
[thm] +++
[defs] +++
[context] 0
[logic] ++
[formalized] 0

[logic] Note that I don't give +++ for the logic here. The original "Simple
Type Theory" is really nice and simple, especially in the exposition by W.
Farmer, "The seven virtues of STT"
http://www.sciencedirect.com/science/article/pii/S157086830700081X

In contrast, Gordon-HOL adds the important concepts of parametric
polymorphism and typedefs, together with considerable complexity. A proof of
degraded simplicity is the historic misunderstanding about extra type
variables in the body of definitions that don't occur in the type of the
defined term. (In Isabelle-jargon this is called "hidden polymorphism", and
is causing horror and shudder to many veterans.)

HOL4

HOL4 is the current incarnation of the HOL-line after HOL88, starting with
HOL90. It uses a self-hosting SML platform, instead of ML inside LISP.

[small] ++
[ML] ++
[thm] +++
[defs] +++
[context] 0
[logic] +++
[formalized] +

[small] less small than the pioneering implementations, but still within
reason.

[ML] I am giving only ++, since HOL4 resides in the raw toplevel of a
standalone SML implementation. This allows to do various system-level
non-sense, e.g. use interrupt signals (CTRL-C on TTY) or threads to disrupt
the internal state of some modules (which expect strictly sequentional
execution).

[thm] + [defs] at its full height, especially after the formal treatment in
http://www.cl.cam.ac.uk/~rk436/itp14.pdf

[context] is implicit in the ML environment (hidden mutable references).
Implementation not thread-safe; global assumption of sequentialism.
Otherwise as in LCF.

[logic] I am giving +++ instead of ++ for HOL88, since the formal treatment
has lead to various clarification of obscure details in the original book
chapters by A. Pitts in the HOL88 book.

[formalized] See again http://www.cl.cam.ac.uk/~rk436/itp14.pdf and more
recent ongoing work. Looking forward to see results applied to actual HOL4,
or rather HOL5?

HOL-Light

This is the famous fork from the HOL family by John Harrison.

[small] +++
[ML] +
[thm] +++
[defs] +++
[context] 0
[logic] ++
[formalized] +

[small] very small size of key modules.

[ML] HOL-Light uses OCaml instead of SML. OCaml is much less safe than SML,
e.g. strings are mutable, ints overflow at unspecified word size without any
exception, various other oddities and deviations from original ML make it
hard to understand mathematically (e.g possibility for circular datatypes).
Signals allow to inject arbitrary ML exceptions into user code, not just
interrupts. (I am not taking Obj.magic non-sense into account here, since
its absence can be easily detected by looking at the source.)

There is still one + instead of 0, since OCaml is safer than C/C++.

[thm], [defs], [context], [logic], [formalized] similar to HOL4.

Isabelle/HOL

Here "Isabelle/HOL" refers to the main product that end-users experience
when downloading Isabelle. Its sophisticated architecture based on Isabelle
as a framework for building logic-based tools needs to be kept in mind, when
trying understand it thoroughly.

[small] +
[ML] +++
[thm] ++
[defs] +
[context] +++
[logic] +
[formalized] 0

[small] The assembly of Isabelle/ML modules is quite substantial just for
Isabelle/Pure, and more is added in the Isabelle/HOL library. Nonetheless,
there are certain principles of building up in stages to manage complexity.

[ML] extra-safe SML, since we do our own management of the ML compiler and
ML environments, including proper treatment of signals and threads in the
run-time system. Isabelle/ML could in principle do more to isolate user
code from raw system access and other potential non-sense, but that would
mean ++++.

[thm] roughly like in the HOL family, but extra features are hardwired into
the inference kernel, such as higher-order unification or type-class
inferences. It is also not immediately clear, which modules contribute to
the thm implementation and which not.

[defs] constant definitions are fully checked (in Isabelle2014 and
Isabelle2015). HOL typedefs are not fully unchecked and thus "axiomatic" in
the terminology of Isabelle: users are responsible to refrain from non-sense
with overloading involving type constructors.

[context] Type theory and Proof.context serve as immutable containers for
logical reasoning, independently of implicit assumptions about time and
space. This provides structural integrity that is not present in other
LCF-style proof assistant. E.g. it is possible to reason in parallel in
different contexts undisturbed. Going back ("undo") to older states is
trivial: in fact the user is working simultaneous in many different
contexts, when editing a theory document.

[logic] only one +, since type-classes add considerable complexity. The
traditional idea is that this is just a front-end to plain HOL, with a
hypotheticla dictionary-construction, or something else still to be
discussed.

[formalized] Nothing substantial yet. I don't expect a verified Isabelle
implementation anytime soon, due to the sheer size and complexity of
critical modules. It would be nice to transfer results from other HOL
groups, e.g. by exporting to an independent checker in CakeML, OpenTheory
etc.

Coq

Coq is strictly speaking not an LCF-style nor HOL-style system, but G. Huet
and L. Paulson were working together on Cambridge-LCF around 1985, and there
are undeniable family traces.

[small] +
[ML] +++
[thm] ++
[defs] ++
[context] +
[logic] +
[formalized] +

[small] critical Coq modules are about as big as in Isabelle -- I usually
estimate it to be slightly bigger than Isabelle, but this is open to
discussion.

[ML] OCaml is used, but in a more sophisticated manner than in HOL-Light, to
make up for the inherent weaknesses of the language. E.g. there are proper
integers (not machine words) and proper strings (not mutable byte arrays).
Coq also manages its own byte-code interpreter, and more.

[thm] Not an LCF-style abstract datatype, but separate checking, potentially
by an external program. At the level of abstraction of these notes, it is
difficult to relate this to the other systems.

[defs] Serious checking of definitions, with serious complexity built-in.

[context] Just one context for theory and proofs. More explicit than in
classic HOL family, but less versatile than in Isabelle. Tool context is
swapped in-and-out on a single global instance. Not thread-safe.

[logic] True
[message truncated]

Email Gateway (Aug 22 2022 at 11:14):

From: Freek Wiedijk <freek@cs.ru.nl>
Makarius:

I think the newest OCaml support immutable strings?
And if in the kernel you would use nums instead of ints,
then the second problems would be gone too? (It would be
interesting to know how much of a hassle this last change
would be. And maybe you only would need to go through
nums when you actually calculate with ints in the kernel?
Does this even happen, and if so, in which functions?)

Freek

Email Gateway (Aug 22 2022 at 11:14):

From: Makarius <makarius@sketis.net>
On Sun, 20 Sep 2015, Freek Wiedijk wrote:

OCaml is much less safe than SML, e.g. strings are mutable, ints
overflow at unspecified word size without any exception,

I think the newest OCaml support immutable strings?

This would be very good news indeed.

And if in the kernel you would use nums instead of ints, then the second
problems would be gone too?

The Coq guys are doing that.

In Poly/ML we have int = num by default -- at the bottom there is either
the GMP library or an old custom-made implementation. Both can fail in
their own right, but it is better to have the idea of proper integers
somehow implemented than unspecified machine word arithmetic.

In a discussion with J. Harrison on the HOL mailing list some years ago
about proper integers in LCF-style kernels, we eventually agreed that the
most robust approach would be to have just 64bit integers with explicit
checking (overflow exception), and not bignums.

Note that ML code using infamous catch-all patterns "e1 handle _ => e2"
would become erratic in situations of spurious overflow! The Isabelle
code base is clear of that (thanks to static checking and warnings in
Poly/ML), but I still see it routinely in other provers.

And maybe you only would need to go through nums when you actually
calculate with ints in the kernel? Does this even happen, and if so, in
which functions?

SML allows in principle to work with fixed-size integers (with overflow
exception) or unbounded integers side-by-side. An ultra-safe kernel would
use the former, and applications the latter.

We actually used to have such a situation in the past, not on our own
choice, but due to SML/NJ imposing it on us. It was very annoying to have
limited int here, and bigint there (e.g. for computer-algebra
applications). Eventually, I made some efforts to collapse the zoo of int
types just to one big int, and we have lived happily ever after.

Makarius

Email Gateway (Aug 22 2022 at 11:14):

From: Makarius <makarius@sketis.net>
On Thu, 17 Sep 2015, Ramana Kumar wrote:

Yesterday, I showed my colleagues that it is possible to prove False in
Isabelle2015 without any warnings or errors and without any use of
axiomatization or oracles.

This is a confusion produced by the paper: example 2 is in opposition to
the axiomatic nature of "typedef" in all published versions of
Isabelle/HOL. See the documentation in the isar-ref manual -- it is
usually up-to-date and authoritative, although sometimes a bit terse.

Of course, one could argue if it is a good idea to treat typedef like
that. This would lead to the very start of a serious discussion -- one
without rumors and unnecessary confusion caused by worrying about
publicity or "bad press".

The trick was, of course, Ondrej's example, but updated to use
"overloading" rather than the deprecated "defs" keyword.

Both are mostly the same. The old "defs" command is declared legacy,
since it complicates critical system interfaces. It is a bit like MS-DOS,
where you still have old CP/M system calls for no good reason, other than
old customs.

Many of them were surprised, and wanted to know why, when I said that a
patch did exist, it has not been incorporated. I think rumours about
"Isabelle developers" are unavoidable at that point.

In the last 1.5 years, Ondrej has produced various interesting border
cases. I incorporated changes quickly that where valid counter-examples of
expected and documented behaviour -- he also helped in working out missing
details.

The proposed changes of the present paper are genuine feature additions
that go beyond the known and documented model of Isabelle/HOL. This is
why I reduced the priority to a reasonable level. And in fact, we are
talking here about a few months latency of TODO-list pipeline -- things
routinely take much longer than that. Isabelle is not a research
prototype that can be changed arbitrarily at a high rate.

The key question is how users, even power users, can get a more realistic
feeling what the system can do and what not. Maybe we should make a
systematic collection of odd situations, like
https://github.com/clarus/falso for Coq.

Many years ago, system failure happened occasionally in everyday use, so
power users knew what to expect. Actual breakdown is now so rare in real
work that users think the system is unbreakable, and start spreading false
claims to the outside world.

Makarius

Email Gateway (Aug 22 2022 at 11:14):

From: Tobias Nipkow <nipkow@in.tum.de>
On 20/09/2015 16:02, Makarius wrote:

On Thu, 17 Sep 2015, Ramana Kumar wrote:

Yesterday, I showed my colleagues that it is possible to prove False in
Isabelle2015 without any warnings or errors and without any use of
axiomatization or oracles.

This is a confusion produced by the paper: example 2 is in opposition to the
axiomatic nature of "typedef" in all published versions of Isabelle/HOL. See
the documentation in the isar-ref manual -- it is usually up-to-date and
authoritative, although sometimes a bit terse.

Of course, one could argue if it is a good idea to treat typedef like that. This
would lead to the very start of a serious discussion

I'm afraid that most of the people who have spoken up have looked at the issue
in the light of example 2 and have already decided that this is a no-brainer.
Their contributions were based on that conclusion. And in some cases also on
Ondrej's experiments that showed that neither the distribution nor the AFP
contains any such circularity, neither intentional nor accidental. Hence we have
not heard any views asking for such circularities to be accepted. But maybe you
want to convince us otherwise.

Tobias

-- one without rumors and
unnecessary confusion caused by worrying about publicity or "bad press".

The trick was, of course, Ondrej's example, but updated to use "overloading"
rather than the deprecated "defs" keyword.

Both are mostly the same. The old "defs" command is declared legacy, since it
complicates critical system interfaces. It is a bit like MS-DOS, where you
still have old CP/M system calls for no good reason, other than old customs.

Many of them were surprised, and wanted to know why, when I said that a patch
did exist, it has not been incorporated. I think rumours about "Isabelle
developers" are unavoidable at that point.

In the last 1.5 years, Ondrej has produced various interesting border cases. I
incorporated changes quickly that where valid counter-examples of expected and
documented behaviour -- he also helped in working out missing details.

The proposed changes of the present paper are genuine feature additions that go
beyond the known and documented model of Isabelle/HOL. This is why I reduced
the priority to a reasonable level. And in fact, we are talking here about a few
months latency of TODO-list pipeline -- things routinely take much longer than
that. Isabelle is not a research prototype that can be changed arbitrarily at a
high rate.

The key question is how users, even power users, can get a more realistic
feeling what the system can do and what not. Maybe we should make a systematic
collection of odd situations, like https://github.com/clarus/falso for Coq.

Many years ago, system failure happened occasionally in everyday use, so power
users knew what to expect. Actual breakdown is now so rare in real work that
users think the system is unbreakable, and start spreading false
claims to the outside world.

Makarius

smime.p7s

Email Gateway (Aug 22 2022 at 11:15):

From: "\"Mark Adams\"" <mark@proof-technologies.com>
Hi Makarius,

Wouldn't it be highly relevant to include HOL Zero in the systems you
consider in your list?

Shouldn't you be considering pretty printing? In most systems, you can
create all sorts of confusing stuff by exploiting flaws, as we've already
discussed in the past.

And the risk of a trojan horse inference kernel or pretty printer hasn't
been considered.

But then I don't think you were attempting to be complete.

Makarius:

OCaml is much less safe than SML, e.g. strings are
mutable

Freek:
I think the newest OCaml support immutable strings?

Yes OCaml 4.02 does support immutable strings, but it's not yet considered
to fully incorporated into the language proper. Unless you do something
specific, you're still using mutable strings.

Note that HOL Zero manages to avoid the problem by making copies of strings
at the interface to the kernel. There are 21 instances of this in the
kernel.

Makarius:
ints overflow at unspecified word size without
any exception,

Freek:
And if in the kernel you would use nums instead of ints,
then the second problems would be gone too? (It would
be interesting to know how much of a hassle this last
change would be. And maybe you only would need to
go through nums when you actually calculate with ints in
the kernel? Does this even happen, and if so, in which
functions?)

HOL Light 'new_basic_type_definition' uses 'length' which returns a possibly
overflowing int. So if the predicate in the supplied theorem had enough
type variables (2^30 in a 32-bit OS) then the returned theorem would be
wrong. This is of course almost impossible to do in practice.

Note that HOL Zero avoids the risk entirely by using 'big_int' (which can be
arbitrarily big). This requires two utilities used in the kernel, that are
implemented in 14 lines of code.

Makarius:
.. various other oddities and deviations from original ML
make it hard to understand mathematically (e.g possibility for
circular datatypes). Signals allow to inject arbitrary ML
exceptions into user code, not just interrupts. (I am not taking
Obj.magic non-sense into account here, since its absence
can be easily detected by looking at the source.)

HOL Zero avoids these risks as far as I know. But if you could find a way
and demonstrate it, perhaps exploiting exception injection, you could earn
yourself the $100 bounty.. HOL Zero avoids the Obj.magic problem by
overwriting the module.

Note that it's not good enough to grep the source to look for Obj.magic -
you can use obfuscated OCaml code to create an OCaml source code file
containing an Obj.magic, and then read this into the OCaml session.

Mark.

Email Gateway (Aug 22 2022 at 11:15):

From: "\"Mark Adams\"" <mark@proof-technologies.com>
I should mention that a lot of the issues and ideas mentioned in this thread
are discussed in my recent paper:

www.proof-technologies.com/flyspeck/qed_paper.pdf

Mark.

on 19/9/15 10:39 AM, Tjark Weber <tjark.weber@it.uu.se> wrote:

Email Gateway (Aug 22 2022 at 11:15):

From: Chantal Keller <chantal.keller@wanadoo.fr>
Dear Ramana,

As far as I am aware, the more convincing attempt at checking proofs
coming from provers of the HOL community is Holide
<https://who.rocq.inria.fr/Ali.Assaf/research/translating-hollight-dedukti-pxtp-2015.pdf>.
It is still costly though, both to produce proofs and check them, but
this cost has been reduced a lot these last years compared to previous
work. It is based on OpenTheory.

I do not know how it compares to the work David mentioned on seL4.

Best,
Chantal.

Email Gateway (Aug 22 2022 at 11:15):

From: Tjark Weber <tjark.weber@it.uu.se>
To elaborate, and to answer questions that I received via private
email: The main difficulty here is that Isabelle theories are written
in a rich and powerful language (that may contain, e.g., embedded ML
code), to the point that the observable behavior of Isabelle when it
checks (or more accurately, processes) the theory cannot be trusted.

Of course, other theorem provers with powerful proof languages also
suffer from this problem.

Best,
Tjark

Email Gateway (Aug 22 2022 at 11:16):

From: Larry Paulson <lp15@cam.ac.uk>
What we usually do to check a proof development is run “Isabelle build” and look at the output. Of course, Isabelle (and any system based on the LCF paradigm) allows arbitrary code execution. This should be borne in mind in connection with any idea that the use of a proof assistant eliminates the need to trust the person making the proof. None of us are developing software that has the remotest claim to satisfying any sort of security requirement.

Larry Paulson

Email Gateway (Aug 22 2022 at 11:16):

From: Buday Gergely <gbuday@karolyrobert.hu>
Larry Paulson wrote:

Proof Carrying Code promised that the delivered code was accompanied by a proof that shows some security properties of the code. Its home page is outdated, does anybody know how far they went achieving this aim?

Gergely

Email Gateway (Aug 22 2022 at 11:20):

From: Makarius <makarius@sketis.net>
(Back to this still open thread.)

Last Monday I started to look at the actual sources and the patch that was
mentioned in the paper, and learned a few more things that I will explain
later, when writing up a full overview of my present understanding of it.

Since reading ML sources and changesets is always a constructive process for
me, I've produced various changesets over several days (being off-list),
leading up to 64a5bce1f498 with the following NEWS entry:

The 'typedef' command has been upgraded from a partially checked
"axiomatization", to a full definitional specification that takes the
global collection of overloaded constant / type definitions into
account. Type definitions with open dependencies on overloaded
definitions need to be specified as "typedef (overloaded)". This
provides extra robustness in theory construction. Rare INCOMPATIBILITY.

As well as the following updated section in the isar-ref manual (formerly
"Typedef axiomatization"):

section ‹Semantic subtype definitions \label{sec:hol-typedef}›

text ‹
\begin{matharray}{rcl}
@{command_def (HOL) "typedef"} & : & @{text "local_theory →
proof(prove)"} \\
\end{matharray}

A type definition identifies a new type with a non-empty subset of an
existing type. More precisely, the new type is defined by exhibiting an
existing type @{text τ}, a set @{text "A :: τ set"}, and proving @{prop
"∃x. x ∈ A"}. Thus @{text A} is a non-empty subset of @{text τ}, and
the new type denotes this subset. New functions are postulated that
establish an isomorphism between the new type and the subset. In
general, the type @{text τ} may involve type variables @{text "α⇩1, …,
α⇩n"} which means that the type definition produces a type constructor
@{text "(α⇩1, …, α⇩n) t"} depending on those type arguments.

@{rail ‹
@@{command (HOL) typedef} @{syntax "overloaded"}? abs_type '=' rep_set
;
@{syntax_def "overloaded"}: ('(' @'overloaded' ')')
;
abs_type: @{syntax typespec_sorts} @{syntax mixfix}?
;
rep_set: @{syntax term} (@'morphisms' @{syntax name} @{syntax name})?
›}

To understand the concept of type definition better, we need to recount
its somewhat complex history. The HOL logic goes back to the Simple Theory of Types'' (STT) of A. Church @{cite "church40"}, which is further explained in the book by P. Andrews @{cite "andrews86"}. The overview article by W. Farmer @{cite "Farmer:2008"} points out theseven virtues'' of this relatively simple family of logics. STT has
only ground types, without polymorphism and without type definitions.

\medskip M. Gordon @{cite "Gordon:1985:HOL"} augmented Church's STT by
adding schematic polymorphism (type variables and type constructors) and
a facility to introduce new types as semantic subtypes from existing
types. This genuine extension of the logic was explained semantically by
A. Pitts in the book of the original Cambridge HOL88 system @{cite
"pitts93"}. Type definitions work in this setting, because the general
model-theory of STT is restricted to models that ensure that the
universe of type interpretations is closed by forming subsets (via
predicates taken from the logic).

\medskip Isabelle/HOL goes beyond Gordon-style HOL by admitting
overloaded constant definitions @{cite "Wenzel:1997:TPHOL" and
"Haftmann-Wenzel:2006:classes"}, which are actually a concept of
Isabelle/Pure and do not depend on particular set-theoretic semantics of
HOL. Over many years, there was no formal checking of semantic type
definitions in Isabelle/HOL versus syntactic constant definitions in
Isabelle/Pure. So the @{command typedef} command was described as
``axiomatic'' in the sense of \secref{sec:axiomatizations}, only with
some local checks of the given type and its representing set.

Recent clarification of overloading in the HOL logic proper @{cite
"Kuncar-Popescu:2015"} demonstrate how the dissimilar concepts of
constant definitions versus type definitions may be understood
uniformly. This requires an interpretation of Isabelle/HOL that
substantially reforms the set-theoretic model of A. Pitts @{cite
"pitts93"}, by taking a schematic view on polymorphism and interpreting
only ground types in the set-theoretic sense of HOL88. Moreover,
type-constructors may be explicitly overloaded, e.g.\ by making the
subset depend on type-class parameters (cf.\ \secref{sec:class}). This
is semantically like a dependent type: the meaning relies on the
operations provided by different type-class instances.

\begin{description}

\item @{command (HOL) "typedef"}~@{text "(α⇩1, …, α⇩n) t = A"} defines
a new type @{text "(α⇩1, …, α⇩n) t"} from the set @{text A} over an
existing type. The set @{text A} may contain type variables @{text
"α⇩1, …,α⇩n"}as specified on the LHS, but no term variables.
Non-emptiness of @{text A} needs to be proven on the spot, in order to
turn the internal conditional characterization into usable theorems.

The ``@{text "(overloaded)"}'' option allows the @{command "typedef"}
specification to depend on constants that are not (yet) specified and
thus left open as parameters, e.g.\ type-class parameters.

Within a local theory specification, the newly introduced type
constructor cannot depend on parameters or assumptions of the context:
this is syntactically impossible in HOL. The non-emptiness proof may
formally depend on local assumptions, but this has little practical
relevance.

For @{command (HOL) "typedef"}~@{text "t = A"} the newly introduced type
@{text t} is accompanied by a pair of morphisms to relate it to the
representing set over the old type. By default, the injection from type
to set is called @{text Rep_t} and its inverse @{text Abs_t}: An
explicit @{keyword (HOL) "morphisms"} specification allows to provide
alternative names.

The logical characterization of @{command typedef} uses the predicate of
locale @{const type_definition} that is defined in Isabelle/HOL. Various
basic consequences of that are instantiated accordingly, re-using the
locale facts with names derived from the new type constructor. Thus the
generic theorem @{thm type_definition.Rep} is turned into the specific
@{text "Rep_t"}, for example.

Theorems @{thm type_definition.Rep}, @{thm type_definition.Rep_inverse},
and @{thm type_definition.Abs_inverse} provide the most basic
characterization as a corresponding injection/surjection pair (in both
directions). The derived rules @{thm type_definition.Rep_inject} and
@{thm type_definition.Abs_inject} provide a more convenient version of
injectivity, suitable for automated proof tools (e.g.\ in declarations
involving @{attribute simp} or @{attribute iff}). Furthermore, the rules
@{thm type_definition.Rep_cases}~/ @{thm type_definition.Rep_induct},
and @{thm type_definition.Abs_cases}~/ @{thm type_definition.Abs_induct}
provide alternative views on surjectivity. These rules are already
declared as set or type rules for the generic @{method cases} and
@{method induct} methods, respectively.

\end{description}

This may still require more polishing before the winter release of
Isabelle2016, but there is time for that.

My impression is that part of the discussion so far was encumbered by anxiety
that this add-on feature won't make it into the next release. Now that there is
no more pressure in this respect, we can sort out remaining low-level and
high-level problems in a relaxed manner.

In the next round I will comment more on the ITP paper and the notes on the
original patch ...

Makarius

Email Gateway (Aug 22 2022 at 11:20):

From: Peter Lammich <lammich@in.tum.de>
Now Isabelle has a feature, which is (IMHO) essential for a theorem
prover, and which many users (including me) deemed already present in
older Isabelle's, and were surprised that it wasn't.

A leap ahead for Isabelle!

Email Gateway (Aug 22 2022 at 11:20):

From: Makarius <makarius@sketis.net>
Now we are back to the actual social problems: Why do Isabelle power users
have an inadequate idea what the system does and what not, despite the
official documentation.

We need to work harder on this thread, to move forward.

Makarius

Email Gateway (Aug 22 2022 at 11:20):

From: Andrei Popescu <A.Popescu@mdx.ac.uk>
Hi Makarius,

Many thanks for this great news! Declaring the Isabelle/HOL typedef to be definitional is well-justified and surely welcomed by the users.
This definitional status, achieved on a tricky foundational terrain in the presence of constant overloading, benefits from the previous work
on taming overloading by yourself, Steven Obua and Florian Haftmann (among others).

I am looking forward to the relaxed discussion in front of us. :-)

All the best,
Andrei

Email Gateway (Aug 22 2022 at 11:20):

From: Makarius <makarius@sketis.net>
Here is now my virtual review of the ITP paper behind this thread. It came
out a bit longer than anticipated, although I included only half of the
thoughts that came to me, while reading it 3 times.

Makarius

Title: A Consistent Foundation for Isabelle/HOL
Authors: Ondřej Kunčar, Andrei Popescu
http://www21.in.tum.de/~kuncar/documents/kuncar-popescu-itp2015.pdf

Overall evaluation: accept
Reviewer's confidence: high

Review:

The paper revisits the question how overloaded definitions (of constants and
type constructores) may be understood within the HOL logic. This work
consists of three parts:

(1) Technical treatment of the HOL logic, with a notion of "definitional
theory" and its well-formedness, interpretation of this extended HOL logic
in a substantially reformed version of the original HOL88 interpretation
due to A. Pitts.

(2) A proposed change to the Isabelle2014 code base, to upgrade its
'typedef' mechanism to perform the additional well-formedness checks
described in the paper. The text refers to
http://www21.in.tum.de/~kuncar/documents/patch.html which provides some
changesets and one page of explanations how this fits into the setup of
the paper.

(3) General motivation and discussion of perceived problems of
Isabelle/HOL, which the paper sets out to resolve. This part is apt to
cause confusion in various ways, both to users and non-users of Isabelle.
The style of presentation considerably subtracts from this otherwise fine
paper.

Part (1)

These are sections 3, 4, 5 in the paper. Section 5 is the main contribution.
The classic HOL logic of HOL88 (due to M. Gordon and semantics by A. Pitts)
is taken as starting point. There is a nice explanation in 4.3, why
overloaded definitions of constants or types does not work out in the
interpretation of A. Pitts (HOL88 book from 1993). Thus a new interpretation
is introduced as follows:

* Types are not immediately interpreted away into sets, but preserved as
syntactic entities to keep sufficient structure that allows well-founded
recursion. The paper calls this "a natural syntactic-semantic blend, which
avoids stunt performance in set theory".

* Type interpretation starts out on ground types. Polymorphism is
interpreted only in the final stage, treating it as strictly schematic
over all ground interpretations.

* Models for ground interpretation are constructed in stages, following
the well-founded dependency relation of the given collection of
definitions. Here types and constants are interpreted simultaneously for
each stage, while in the HOL88 semantics by A. Pitts all types are
interpreted before all constants.

This model construction is fit together with other recent work by the first
author [19], opening the perspective of fully checked definitional theories
in Isabelle/HOL. More about this in part (2).

Taking the little space of a paper in LNCS proceedings into account, the
technical treatment of the problem is worked out very carefully. Numerous
fine points and additional side-conditions are included in the
considerations.

Nonetheless, there is a potential for confusion due to slight deviation of
terminology and concepts, compared to papers on the subject e.g. by
Wenzel/Haftmann and the Isabelle documentation (isar-ref manual).

In particular: Section 3.4 defines a "definitional theory" as a set of
overloaded definitions (for constants or types) such that certain local
syntactic conditions hold. Section 4.2 defines a "well-formed definitional
theory" as a definitional theory with well-founded dependency relation as
global condition. Section 4.5 establishes the main theorem that a
well-formed definitional theory is consistent -- by demonstrating that it
has a model in the above sense, and concluding that False cannot be derived
in the inference system -- due to soundness of the interpretation.

In contrast, the specifications of a "definitional theory" of the paper
would be called in the Isabelle documentation "unchecked definitions",
"definitional axioms", or "axiomatizations", to emphasize the missing aspect
of global well-formedness that this paper is mainly about. A "well-formed
definitional theory" of the paper would be just called "definitional theory"
in Isabelle terminology, and much stronger properties intended than just
consistency. Instead of merely ensuring the existence of a model, the
requirement is to preserve derivability of old propositions precisely, and
allow to reduce new derivations of new propositions into old derivations by
"expanding" definitions in some way. If and how this works depends on
fine-points of the underlying definitional mechanisms: for overloaded
constant definitions in Isabelle/Pure this works [13, 35], but for type
definitions the situation is a-priori quite different, e.g. see [13] section
4.3. The deeper reason for this are the distinctive categories of terms and
types in HOL, and the lack of equational reasoning on types.

A suitable semantic treatment is required to re-unify the view on terms and
types, as is done in the present paper. But this leads to weaker results.
The paper does briefly mention "a suitable notion of conservativeness" as
future work, but due to lack of further explanations, many readers will
probably miss key point behind this.

There is another aspect of "consistency" versus "conservativity" that is
easily missed in the paper. Section 3.5 states "Isabelle/HOL theory
development proceeds by: 1 ... 2 ... 3 ...". The three kinds of operations
ensure that every stage of theory development is "definitional" in the sense
of the paper -- and "well-formedness" can be established later. But what
happens when the user starts adding non-definitional axioms in between?
Strictly speaking, the main result does not apply and the whole theory
looses its good definitional properties.

In contrast, the traditional explanation of (constant) definitions in
Isabelle is more modular in that respect: given an arbitrary base theory ---
one that somehow makes sense to the user in a particular application, even
after making unchecked axiomatizations --- definitions merely introduce
names for existing entities and thus preserve the key properties of the base
theory. This agnostic approach of "definitional theory extensions"
(occording to Isabelle terminolgy) is particularly important from the
perspective of the Isabelle/Pure framework, where the interpretation of
object-logics is not yet known.

For HOL applications one might argue that genuine axiomatic extensions are
not done in practice these days. Nonetheless some exotic applications like
HOLZF (by Obua) do exist. Without stronger results under which conditions
typedefs "make sense" or "are OK" (using words from the paper), such
ambitious users would have to revisit the whole model theory of HOL again.

Part (2)

The patch to Isabelle2014 is rather minimal: the existing infrastructure to
check overloaded constant definitions (going back to Isabelle2007, and
slightly improved by O. Kuncar [19] is generalized to cover type definitions
as well. So Isabelle/Pure is upgraded to provide a general service for
"definitional specification items" that are identified in the name space for
constants or types, but without looking at actual content.

While the explanations for the patch states "the situation is from a
technical (implementation) point view a little bit more complicated", the
outcome is actually simpler than in the paper. Being forced to strip away
accidental aspects of the HOL object-logic does occasionally have
advantages.

For example, the delicate notion of "build-in types" vs. "non-built-in
types" is absent in the implementation. The notes on the patch provide some
explanations, why it works out nonetheless. It would be nice to see this
refinement applied to the main work of part (1), to trim it further down to
the very essence of symbolic specifications with schematic polymorphism and
overloading.

The proposed change to Isabelle2014 is called "correction patch", as if
something would be broken that is fixed by the change. This misunderstanding
leads directly into the general discussion of this work below.

Part (3)

These are sections 1 "Introduction", section 2 "Related Work", section 6
"Conclusion", i.e. the important pieces that put the technical contribution
into proper perspective. This is what most people read, and what attendants
of a conference presentation who are busy with their e-mails or smart-phone
usually take home. Unfortunately, the story being told here do not quite fit
to Isabelle.

The misunderstanding already starts in the first paragraph, where
Isabelle/HOL is included into the "umbrella term" of "HOL-based provers".
HOL4, HOL-Light, ProofPower, HOL Zero are fine systems, but Isabelle/HOL is
not as closely related to them as the "HOL" name might suggest. In many
respects Isabelle is actually closer to Coq.

Technically, the key misunderstanding is the role of "the kernel" (in the
words of the paper). HOL88 as the predecessor of all the other HOL systems
pioneered an add-on to the original LCF kernel design to have checked
definitions as primitive rules. In contrast, Isabelle is closer to LCF in
separating logical inferences and a-priori unchecked (axiomatic) theory
specifications: there is no notion of definitions in the inference kernel of
Isabelle. Since this is a bit impractical for big applications, more and
more checks on theory content have been added over the years (e.g. 2004,
2005, 2007, 2010, 2014), but that process was never formally closed. And
this is what the paper is actually about: complete emulation of HOL88-style
definitions in Isabelle/HOL (by providing additional services in the
Isabelle/Pure framework).

We can now revisit the critical examples in t
[message truncated]

Email Gateway (Aug 22 2022 at 11:21):

From: Andrei Popescu <a.popescu@mdx.ac.uk>
Hi Makarius,

Many thanks for your comments, and for the acceptance decision which will make the rebuttal phase very relaxed.

I'll send the rebuttal comments tomorrow.

All the best,
Andrei

Email Gateway (Aug 22 2022 at 11:22):

From: Andrei Popescu <a.popescu@mdx.ac.uk>
Hi Makarius,

I thank you very much for looking at the technical contribution of the paper beyond matters of terminology and "attitude."
I find your summary of our technical contribution to be excellent.

In order to discuss your comments concerning our "attitude," I would first like to state my own view
on the situation, which is not entirely divergent from your view.

Isabelle/HOL is not a standard object logic of Isabelle/Pure, in the usual sense of object logics represented in a logical framework.
The culprit is typedef, which is implemented in an ad hoc manner, and not as an object-logic judgment. This is necessary
due to the nature of typedef. But this is already a warning that whatever guarantees one provides at the generic level of
Isabelle/Pure may simply not be good enough for Isabelle/HOL.

By contrast, the constant definition mechanism from Isabelle/Pure is imported to Isabelle/HOL following a uniform
"object-logic inside meta-logic" shallow embedding scheme. This scheme also applies, for example, to the representation of
bindings and the function space.

In your ‘97 paper you prove the following very nice result:
In Isabelle/Pure, under suitable orthogonality and well-foundedness conditions, the addition of overloaded
constant definitions to any theory T is "meta-safe," in that it is syntactically conservative and
anything provable in the extended theory can also be represented and proved in the original T in
a canonical way, by “realizing” the new constants as terms in the old signature. This indeed is a very
strong conservativity result, which also implies preservation of consistency. It applies to the object logics
of Isabelle/Pure, which can be regarded as theories. As you also remind us in your review, another pleasant
feature of this result is T not needing to be a definitional theory.

But: Isabelle/HOL of course only partly benefits from this result. In fact, compared to Gordon-HOL,
which offers model-theoretic safety
for both constant and type definitions, I would say that in these conditions Isabelle/HOL would be
only half-way safe, and quite non-democratically so. On the one hand, there is this luxurious (meta-)safety
guarantee for constant definitions, and on the other hand there is nothing for typedef. Instead of stating this problem
as a pressing open problem (already in ‘97), you had decided to introduce this oxymoron: "axiomatic typedef."
Unfortunately, nobody (except perhaps for Florian who used to “joke” about typedef being axiomatic) has noticed this
ideological move away from the tradition of the HOL systems.

(Here, I have a confession. When reading your paper, it simply did not cross my mind that you could have
just abandoned typedef :-( and instead I came up with the following explanation: Your approach is to prove as
much as you can in Isabelle/Pure for constants,
namely, meta-safety, and in particular syntactic conservativity. Then you would move to the object level of Isabelle/HOL where
typedef is model-theoretically safe, in particular syntactically conservative for definitional theories. Putting together these two
properties, you would get consistency of definitional theories of Isabelle/HOL. As we know now,
this almost works but not quite, since typedefs are allowed to depend on unresolved overloading.)

Next I comment on your review:

Technically, the key misunderstanding is the role of "the kernel" (in the
words of the paper). HOL88 as the predecessor of all the other HOL systems
pioneered an add-on to the original LCF kernel design to have checked
definitions as primitive rules. In contrast, Isabelle is closer to LCF in
separating logical inferences and a-priori unchecked (axiomatic) theory
specifications: there is no notion of definitions in the inference kernel of Isabelle.

In the paper, we use the word "kernel" to mean "logic kernel" (as I hoped would be clear from the paper's
general context and from the usage of the phrase
“logic kernel” for one of the two occurrences of the word).
The logic kernel consists of the inference rules and the definitional mechanisms, regardless of where they
are located in the implementation. A small logic kernel is an implementation-independent virtue of
Isabelle/HOL as well as of all the Gordon-HOL systems.

Nonetheless, there is a potential for confusion due to slight deviation of
terminology and concepts, compared to papers on the subject e.g. by
Wenzel/Haftmann and the Isabelle documentation (isar-ref manual).
...
In contrast, the specifications of a "definitional theory" of the paper
would be called in the Isabelle documentation "unchecked definitions",
"definitional axioms", or "axiomatizations", to emphasize the missing aspect
of global well-formedness that this paper is mainly about.

For the journal version, we will switch to the terminology from the Isabelle documentation.
I believe "definitional axioms" is the most suggestive term here. And "axiomatizations" is too wide.

A "well-formed definitional theory" of the paper would be just called "definitional theory"
in Isabelle terminology, and much stronger properties intended than just consistency.
Instead of merely ensuring the existence of a model, the requirement is to preserve derivability of old propositions precisely, and
allow to reduce new derivations of new propositions into old derivations by
"expanding" definitions in some way. If and how this works depends on
fine-points of the underlying definitional mechanisms: for overloaded
constant definitions in Isabelle/Pure this works [13, 35], but for type
definitions the situation is a-priori quite different, e.g. see [13] section 4.3.
The deeper reason for this are the distinctive categories of terms and
types in HOL, and the lack of equational reasoning on types.

Indeed, consistency is a crucial, but rather weak property. With a bit of extra care, we could have stated our result
as more than consistency, namely, something similar to the preservation of standard models in the sense
of Gordon-HOL (which you revisit in your '97 paper).

A suitable semantic treatment is required to re-unify the view on terms and
types, as is done in the present paper. But this leads to weaker results.
The paper does briefly mention "a suitable notion of conservativeness" as
future work, but due to lack of further explanations, many readers will
probably miss key point behind this.

We do cite your paper there, so the reader can look up the relevant notion. However, I think that for most of the interesting theories,
typedef cannot be safe in any sense
even remotely similar to what you prove for constant definitions. In fact, typedef cannot be proved to preserve consistency
for non-definitional theories. This is shown by an example in your '97 paper, which I slightly generalize below:
Given any finite consistent definitional theory T, we should always be able find a number N such that there exists a
set-theoretic model of T where no type has precisely N elements. So T + "no type 'a has precisely N elements" is consistent,
but defining a type with N elements makes it inconsistent.

There is another aspect of "consistency" versus "conservativity" that is
easily missed in the paper. Section 3.5 states "Isabelle/HOL theory
development proceeds by: 1 ... 2 ... 3 ...". The three kinds of operations
ensure that every stage of theory development is "definitional" in the sense
of the paper -- and "well-formedness" can be established later. But what
happens when the user starts adding non-definitional axioms in between?
Strictly speaking, the main result does not apply and the whole theory
looses its good definitional properties.
...
For HOL applications one might argue that genuine axiomatic extensions are
not done in practice these days.

You are right, our result is restricted to definitional theories.
But typedef seems brittle in the face of amendments to the standard HOL model theory.
And such amendments are easily possible if one steps outside definitional theories.

Nonetheless some exotic applications like
HOLZF (by Obua) do exist. Without stronger results under which conditions
typedefs "make sense" or "are OK" (using words from the paper), such
ambitious users would have to revisit the whole model theory of HOL again.

IMO, Obua and other initiators of interesting experiments, besides forming an absolute minority of the users, are logical
grown-ups who typically know what they are doing. And they are in plain wilderness in the first place, when proving consistency.

In contrast, the traditional explanation of (constant) definitions in
Isabelle is more modular in that respect: given an arbitrary base theory ---
one that somehow makes sense to the user in a particular application, even
after making unchecked axiomatizations --- definitions merely introduce
names for existing entities and thus preserve the key properties of the base
theory. This agnostic approach of "definitional theory extensions"
(occording to Isabelle terminolgy) is particularly important from the
perspective of the Isabelle/Pure framework, where the interpretation of
object-logics is not yet known.

Agreed. But I am happy that the less generic Isabelle/HOL typedef feature now receives some consideration as well.

For example, the delicate notion of "build-in types" vs. "non-built-in
types" is absent in the implementation. The notes on the patch provide some
explanations, why it works out nonetheless. It would be nice to see this
refinement applied to the main work of part (1), to trim it further down to
the very essence of symbolic specifications with schematic polymorphism and
overload
[message truncated]

Email Gateway (Aug 22 2022 at 11:27):

From: Burkhart Wolff <Burkhart.Wolff@lri.fr>
Dear all,

with respect to the debate around the paper by Ondřej Kunčar and Andrei Popescu: A Consistent Foundation for Isabelle/HOL (ITP 2015)
and the debate it created, I have the following remarks and informations to add:

1) To be not misunderstood: I find this publication helpful and, after the quite nonchalant
reactions of key members of the Isabelle Community, strictly speaking necessary.

2) This paper creates outside the Isabelle community more echo than people might think.
At the moment, I am as part of the EUROMILS project part of the team that attempts
to get a common criteria (CC EAL5) evaluation for PikeOS through, where the models
and proofs were done with Isabelle. I can tell that I had a lengthy debate with
Evaluators and (indirectly) BSI representatives which became aware about this paper.

And of course, there is the effect of a children's telephone game which distorts the
story hopelessly.

3) As part of the project, we wrote early a Recommandations-Whitepaper explaining the importance
of conservative extensions and trying to define something like a “safe subset” of Isabelle.
It is called:

"Using Isabelle/HOL in Certification Processes: A System Description and Mandatory Recommendations"

and is part one of the EUROMILS Deliverable http://www.euromils.eu/downloads/Deliverables/Y2/2015-EM-UsedFormalMethods-WhitePaper.pdf <http://www.euromils.eu/downloads/Deliverables/Y2/2015-EM-UsedFormalMethods-WhitePaper.pdf>,
(pp. 1 .. 39)
a paper that is submitted to the ANSI and the BSI as part of the Common Criteria Evaluation of the PikeOS operating system.
It may be that these Mandatory recommendations were reused in future projects of this kind.

In this paper, we ruled out the critical consts - defs combination as unsafe, and made sure that we did not use these constructs in
our entire theories (as well as axioms, etc. Restraining strictly to conservative extension and avoiding obfuscation).

4) I welcome to see more formally proved meta-theory of Isabelle’s specification constructs; the HOL4 community shows at the
moment impressive progresses in this direction. May be that other open issues could be addressed as well.

Best regards,

Burkhart

Email Gateway (Aug 22 2022 at 11:27):

From: Peter Lammich <lammich@in.tum.de>
In the replied-to mail, I see more arguments that we should add proper
dependency checking to Isabelle/HOL as soon as possible.

As far as I know, there is a patch of Kunčar, which exists for more than
half a year now, which

* ensures that consts/defs only produce conservative extensions
* Does not slow down Isabelle significantly
* Works with whole AFP and Isabelle-Library, i.e., is does not break
existing formalizations by being too restrictive.

However, it seems to be rejected or not given high priority by some main
Isabelle developers.

If we would have integrated this patch earlier, we could have said:
There was an issue, but now it is fixed, and we even have a
(pen-and-paper) proof that it is sound now. So, anyone who reads the
paper would probably be happy, and the rumours spread would be somewhat
like: "The old Isabelle versions are unsound, you should update to
Isabelle-2015, this is provably sound now"

However, now we have: If you use overloading, you are basically on your
own, and have to ensure consistency yourself. Rules how to ensure
consistent definitions are not included in the Isabelle documentation.
And the rumours about Isabelle unsoundness spread as described in the
replied-to mail.

So if we want to use Isabelle as a device to get very high confidence in
our proofs, any mechanism that allows you to prove False in some
intransparent ways should be considered a severe malfunction of the
system, and fixed as soon as possible. And tainting an essential
mechanism as axiomatic (as the documentation of defs does) is not a
solution, but makes the system essentially unusable for getting
high-confidence theorems.

In my opinion, we should even think of a mode of operation that forbids
to add any axioms beyond a certain default set of axioms (e.g. HOL),
such that we can establish the guarantee: "Sound wrt. HOL", without
manually inspecting all theory files for axiomatic declarations (see
HOL-zero how to drive this idea to the extreme).

Email Gateway (Aug 22 2022 at 11:27):

From: Makarius <makarius@sketis.net>
On Wed, 16 Sep 2015, Peter Lammich wrote:

As far as I know, there is a patch of Kunčar, which exists for more than
half a year now, which
* ensures that consts/defs only produce conservative extensions
* Does not slow down Isabelle significantly
* Works with whole AFP and Isabelle-Library, i.e., is does not break
existing formalizations by being too restrictive.

However, it seems to be rejected or not given high priority by some main
Isabelle developers.

Can you quote precisely who said what? Otherwise we get into a situation
of vague rumors and implicit accusations of unnamed people.

If we would have integrated this patch earlier, we could have said:
There was an issue, but now it is fixed, and we even have a
(pen-and-paper) proof that it is sound now. So, anyone who reads the
paper would probably be happy, and the rumours spread would be somewhat
like: "The old Isabelle versions are unsound, you should update to
Isabelle-2015, this is provably sound now"

The last sentence is the opposite of what I am trying to point out on
isabelle-users and on isabelle-dev for countless years. When there is an
incident of some sort, and change might improve the situation or make it
worse. There is never a state where one could claim it to be "fixed".

Makarius

Email Gateway (Aug 22 2022 at 11:27):

From: Larry Paulson <lp15@cam.ac.uk>
I was under the impression that this patch had been adopted. I don’t believe that I saw any arguments against it.
Larry

Email Gateway (Aug 22 2022 at 11:28):

From: Lars Noschinski <noschinl@in.tum.de>
We know that the current situation is bad -- at least, as long as one
wants to treat definitions as conservative extensions, which many of us
want to do. I understand there is a patch with a sound theory behind it,
which prohibits unsound definitions, without breaking current applications.

We may never achieve perfectness, there may be other issues, there may
be an even deeper change which would be even better. But I would expect
a critical (under the above assumptions) hole to be closed, as soon as
the problem is understood -- and not blocked on the vague notion that
every change may break something.

-- Lars

Email Gateway (Aug 22 2022 at 11:28):

From: Peter Lammich <lammich@in.tum.de>

However, it seems to be rejected or not given high priority by some main
Isabelle developers.

Can you quote precisely who said what? Otherwise we get into a situation
of vague rumors and implicit accusations of unnamed people.

Unfortunately, there seems to be a lot of rumours, and these issues have
never been discussed openly. Also my knowledge is partially based on
rumours. Clear facts are the following:

In their paper [2], Popescu and Kuncar refer to a patch
http://www21.in.tum.de/~kuncar/documents/patch.html
and say: "it is currently evaluated at the Isabelle headquarters"

Since the paper was written about 6 month ago, and nothing has yet
happened in this direction [in my opinion, the patch should be
integrated as soon as possible, and Isabelle2015-1 released immediately!
], I concluded what I said above.

If we would have integrated this patch earlier, we could have said:
There was an issue, but now it is fixed, and we even have a
(pen-and-paper) proof that it is sound now. So, anyone who reads the
paper would probably be happy, and the rumours spread would be somewhat
like: "The old Isabelle versions are unsound, you should update to
Isabelle-2015, this is provably sound now"

The last sentence is the opposite of what I am trying to point out on
isabelle-users and on isabelle-dev for countless years. When there is an
incident of some sort, and change might improve the situation or make it
worse. There is never a state where one could claim it to be "fixed".

This generic and vague statement can be used as an argument against any
change, and against any argumentation why the change is necessary.

Let me try anyway: Most users of Isabelle expect that they cannot prove
False, unless they use some well-known unsafe methods, such as
axiomatization and oracles. This is a primary design goal of Isabelle.
This is supported, for example in [1], by the following statements:
"The logical core is implemented according to the well-known “LCF
approach”"

"object-logics are specified by
stating their characteristic rules as new axioms. Any later additions in
application the-
ories are usually restricted to definitional specifications, and the
desired properties are
being proven explicitly"

The fact that one actually can prove false by using "defs", which is
commonly believed to be "definitional specification", is in strong
contrast to this goal. So, any state in which
1) all of Isabelle and AFP still works
2) there are less possibilities to prove False in unexpected ways
is arguably an improvement, as it conforms more to a primary design
goal. Even if this improvement makes some aspects worse, one has to
weigh up those disadvantages with the advantage of making the system
more conforming to one of its primary design goals.

References:
[1] Makarius Wenzel, Lawrence C. Paulson, Tobias Nipkow:
The Isabelle Framework. TPHOLs 2008: 33-38

[2] Ondrej Kuncar, Andrei Popescu:
A Consistent Foundation for Isabelle/HOL. ITP 2015: 234-252

Email Gateway (Aug 22 2022 at 11:28):

From: Peter Lammich <lammich@in.tum.de>
On Mi, 2015-09-16 at 16:10 +0100, Larry Paulson wrote:

I was under the impression that this patch had been adopted. I don’t believe that I saw any arguments against it.
Larry

If this should be true, we should make it as public as possible, to stop
any rumours about Isabelle unsoundness. The best way would be to release
Isabelle2015-1 immediately, even if it would only be (Isabelle2015 +
patch), and not based on the current state of the repository.

Email Gateway (Aug 22 2022 at 11:28):

From: Andrei Popescu <A.Popescu@mdx.ac.uk>
Dear All,

As far as I know, the integration of Ondra's solution patch is in Makarius's TODO list. Before doing the integration, Makarius wanted to see a mathematical justification for why our solution is sound.
Now such a justification is provided in our ITP paper^[*], in conjunction with Ondra's CPP paper^[**] -- it is not formal, but quite rigorous. So I see no major impediment to adopting our solution. Of course, it would be great if this were given a higher priority.

Just for the record, please also note that inconsistencies are not so infrequent in proof assistants: some of the prominent cases of such "incidents" are collected in our ITP paper, under the heading "Inconsistency Club."

[*] http://www.eis.mdx.ac.uk/staffpages/andreipopescu/pdf/ITP2015.pdf

[**] http://www4.in.tum.de/~kuncar/documents/kuncar-cpp2015.pdf

All the best,
Andrei

Please note that Middlesex University's preferred way of receiving all correspondence is via email in line with our Environmental Policy. All incoming post to Middlesex University is opened and scanned by our digital document handler, CDS, and then emailed to the recipient.

If you do not want your correspondence to Middlesex University processed in this way please email the recipient directly. Parcels, couriered items and recorded delivery items will not be opened or scanned by CDS. There are items which are "exceptions" which will be opened by CDS but will not be scanned a full list of these can be obtained by contacting the University.

Email Gateway (Aug 22 2022 at 11:28):

From: Lars Noschinski <noschinl@in.tum.de>
As I understand it, there are/were two patches: One addressing a problem
in the implementation of the cyclicity check and a second one adding
dependencies on types as described in the ITP 2015 paper.

To my knowledge, only the first one has been adopted.

-- Lars

Email Gateway (Aug 22 2022 at 11:28):

From: Larry Paulson <lp15@cam.ac.uk>
I’d like to point out again that users need to take responsibility for their own definitions. I regularly see proofs that are valueless because they are based on incorrect definitions.

To my mind, a soundness bug is when a valid expression or proof state is transformed into something wrong. The problem identified here are that Isabelle is failing to prohibit certain definitions that don’t make sense. There is no claim that Isabelle is doing anything wrong with these definitions. It’s hard to believe that a user could make such definitions accidentally.

It would be interesting to find out how these problems were identified: whether they were looked for out of curiosity, or whether on the other hand they manifested themselves in the course of an actual proof.

Larry Paulson

Email Gateway (Aug 22 2022 at 11:28):

From: Andrei Popescu <a.popescu@mdx.ac.uk>
Dear Larry,

You wrote:

I'd like to point out again that users need to take responsibility for their own definitions. I regularly see proofs that are valueless because they are based on incorrect definitions.
To my mind, a soundness bug is when a valid expression or proof state is transformed into something wrong. The problem identified here are that Isabelle is failing to prohibit certain definitions that don't
make sense. There is no claim that Isabelle is doing anything wrong with these definitions. It's hard to believe that a user could make such definitions accidentally.

Your position here puzzles me now as much as it did the last time we talked about this. Let's forget about Isabelle/HOL for a second, and think of a logic L with axioms and deduction rules, but no definitions.
Further, assume that L is known, or strongly believed to be consistent, in that it does not prove False. Now consider L_D, the logic L augmented with definitional mechanisms. This augmented logic should of course not prove False either! Writing meaningful definitions is the user's responsibility, but having the definitions consistent is the logic L_D's responsibility. Guaranteed consistency distinguishes definitions from arbitrary new axioms -- I learned this years ago from your papers and books.

You wrote:

It would be interesting to find out how these problems were identified: whether they were looked for out of curiosity, or whether on the other hand they manifested themselves
in the course of an actual proof.

Ondra discovered the typedef inconsistency, so he is the best person to answer this.

All the best,
Andrei

Email Gateway (Aug 22 2022 at 11:29):

From: "\"Mark Adams\"" <mark@proof-technologies.com>
I think there is an important distinction here between getting a definition
wrong and making the logic inconsistent. Of course a user can always
accidentally make a wrong definition, and of course this is a big problem is
usage of theorem provers, but this is a red herring. Users rightly expect
that it should not be possible for definitions to make the logic
inconsistent. This, surely, is one of the big selling points of using
definitional facilities as opposed to just adding axioms - they are (or
should be) fundamentally conservative. And this is why it was so important
to fix the bug in HOL's primitive constant definition facility in the late
1980s, where type variables occurring in the RHS were allowed not to occur
in the LHS.

Mark Adams.

on 16/9/15 5:25 PM, Larry Paulson <lp15@cam.ac.uk> wrote:

I’d like to point out again that users need to take responsibility for
their
own definitions. I regularly see proofs that are valueless because they
are
based on incorrect definitions.

To my mind, a soundness bug is when a valid expression or proof state is
transformed into something wrong. The problem identified here are that
Isabelle is failing to prohibit certain definitions that don’t make sense.
There is no claim that Isabelle is doing anything wrong with these
definitions. It’s hard to believe that a user could make such definitions
accidentally.

It would be interesting to find out how these problems were identified:
whether they were looked for out of curiosity, or whether on the other
hand
they manifested themselves in the course of an actual proof.

Larry Paulson

On 16 Sep 2015, at 16:39, Peter Lammich <lammich@in.tum.de> wrote:

On Mi, 2015-09-16 at 16:10 +0100, Larry Paulson wrote:

I was under the impression that this patch had been adopted. I don’t
believe that I saw any arguments against it.
Larry

If this should be true, we should make it as public as possible, to stop
any rumours about Isabelle unsoundness. The best way would be to release
Isabelle2015-1 immediately, even if it would only be (Isabelle2015 +
patch), and not based on the current state of the repository.

--
Peter

Email Gateway (Aug 22 2022 at 11:29):

From: Ramana Kumar <rk436@cam.ac.uk>
Let me add to this discussion, just because I happen to have been
discussing it at Data61 yesterday too.

Yesterday, I showed my colleagues that it is possible to prove False in
Isabelle2015 without any warnings or errors and without any use of
axiomatization or oracles. The trick was, of course, Ondrej's example, but
updated to use "overloading" rather than the deprecated "defs" keyword.
Many of them were surprised, and wanted to know why, when I said that a
patch did exist, it has not been incorporated. I think rumours about
"Isabelle developers" are unavoidable at that point. We are also interested
in the opinion of qualifiers and regulators of the tools we use, and
Burkhart's original message is very interesting from that perspective.

My own opinion now: I would like the patch intended to make it impossible
to prove False in Isabelle/HOL no matter what definitions one uses
incorporated as soon as possible. I'm surprised that that is a contentious
issue. I believe we have found the last of those problems now, because of
Ondrej and Andrei's (on paper) consistency proof.

In the longer term, I am very interested in mechanising the consistency
proof (as we have done for the basic HOL logic as used in HOL Light).
Furthermore, if there is enough community interest, I would be interested
in building a verified proof-checker for Isabelle/HOL. One crucial
ingredient required from the Isabelle community here is the ability to
export proofs in a low-level format (ideally OpenTheory, rather than some
other new format), but I believe you're already quite close to that with
the option to produce proof terms.

I think this story involving explicit proofs and small verified checkers is
the ultimate one to sell to certification authorities, even while
day-to-day we also want our big tools to be sound.

Email Gateway (Aug 22 2022 at 11:29):

From: Jasmin Blanchette <jasmin.blanchette@inria.fr>
Dear Larry, Andrei,

Your position here puzzles me now as much as it did the last time we talked about this. Let's forget about Isabelle/HOL for a second, and think of a logic L with axioms and deduction rules, but no definitions.
Further, assume that L is known, or strongly believed to be consistent, in that it does not prove False. Now consider L_D, the logic L augmented with definitional mechanisms. This augmented logic should of course not prove False either! Writing meaningful definitions is the user's responsibility, but having the definitions consistent is the logic L_D's responsibility. Guaranteed consistency distinguishes definitions from arbitrary new axioms -- I learned this years ago from your papers and books.

I can only second this. After reading books like the Isabelle tutorial, which has Larry as a coauthor, I developed a certain understanding for what "definitional" and "foundational" means, and was for many years under the impression that there was a strong consensus in the proof assistant communities. In this context, I find Larry's comments rather puzzling. In fact, I agree with almost every single sentence he wrote, but

To my mind, a soundness bug is when a valid expression or proof state is transformed into something wrong.

violates the very notion of "definitional". At some point, we will have to make up our minds as to whether our definitions are definitions or just arbitrary axioms (and whether "typedef"s count as definition).

Mark's comments, which I just read, also neatly summarizes what I thought until recently was a consensus also shared by Larry.

Jasmin

Email Gateway (Aug 22 2022 at 11:29):

From: Holger Blasum <hbl@sysgo.com>
Hello Larry, list,

I can give more detail for the case of EURO-MILS: We were made aware of
"Isabelle Isar-Reference Manual (Version 2013-2, pp. 103): "It is at
the discretion of the user to avoid malformed theory specifications!"[1]
Hence first attempt was to rule out use of "consts"[2].
This resulted in people even becoming more curious and playing
with Kuncar/Popescu.
Moreover, it has been pointed out that even if we rule out "consts" and "defs" in our theories then it is still used e.g. in HOL.thy[3] (probably without
overloading, but it is hard to judge for me).

The task at hand is not only convincing ourselves (arguably when proving some
property oneself one usually/sometimes gets a feeling for what is correct
and what not) but also others (who have limited resources).

To give an example: in attached System_Is_Secure.thy the first
derivation of "System_Is_Secure" (theorem System_Is_Secure_1)
forces to make the assumptions of "A" and "~A" obvious. This makes
it harder to cheat ourselves or others.

The second derivation of "System_Is_Secure" (theorem System_Is_Secure_2)
hides the assumptions and could be more easy be overlooked.

My working understanding is (correct if that is wrong!) that fixing
Isabelle would rule out hidden derivations such as System_Is_Secure_2.

[1] http://isabelle.in.tum.de/doc/isar-ref.pdf p 121
The (unchecked) option disables global dependency checks for this def-
inition, which is occasionally useful for exotic overloading. It is at
the discretion of the user to avoid malformed theory specifications!
[2] http://www.euromils.eu/downloads/Deliverables/Y2/2015-EM-UsedFormalMethods-WhitePaper.pdf Section 3.1 "Constants. The Isabelle consts command is not used."
[3] http://afp.sourceforge.net/browser_info/current/HOL/HOL/HOL.html
search for "defs"

---------(inlined: System_Is_Secure.thy, also attached)
theory System_Is_Secure
imports Main
begin

consts A :: bool
consts System_Is_Secure :: bool

(* formulation with explicit assumptions, easy for reviewer to spot that user made too strong assumptions *)
theorem System_Is_Secure_1:
assumes "A"
and "~ A"
shows System_Is_Secure
proof-
from assms show ?thesis by simp
qed

(* Kuncar/Popescu: A Consistent Foundation for Isabelle/HOL, ITP 2015 *)
consts c :: bool
typedef T = "{True, c}" by blast
defs c_bool_def: "c::bool == ~ (ALL(x::T) y. x = y)"
lemma L: "(ALL(x::T) y. x = y) <-> c"
using Rep_T Rep_T_inject Abs_T_inject by blast
lemma MyFalse: False
using L unfolding c_bool_def by auto

theorem MySystem_Is_Secure [simp]: System_Is_Secure
using MyFalse by simp

(* formulation with implicit assumptions, not that easy for reviewer to spot that user made too strong assumptions *)
theorem System_Is_Secure_2: System_Is_Secure by simp

end

best,

Email Gateway (Aug 22 2022 at 11:29):

From: Holger Blasum <hbl@sysgo.com>
Same posting as parent, this time with attachment attached.
System_Is_Secure.thy

Email Gateway (Aug 22 2022 at 11:29):

From: Tobias Nipkow <nipkow@in.tum.de>
On 16/09/2015 17:48, Andrei Popescu wrote:

Dear All,

As far as I know, the integration of Ondra's solution patch is in Makarius's TODO list. Before doing the integration, Makarius wanted to see a mathematical justification for why our solution is sound.
Now such a justification is provided in our ITP paper^[*], in conjunction with Ondra's CPP paper^[**] -- it is not formal, but quite rigorous. So I see no major impediment to adopting our solution.>

Let us hope you are right, although Makarius was dismissive of what you proved
in earlier discussions and wanted a conservativity result.

Moreover, the check that Ondrej implemented is clearly necessary to ban these
cicularities.

Of course, it would be great if this were given a higher priority.

That is the whole point. This is not a minor issue with fonts but concerns and
endangers the very core of what we are doing.

Tobias

Just for the record, please also note that inconsistencies are not so infrequent in proof assistants: some of the prominent cases of such "incidents" are collected in our ITP paper, under the heading "Inconsistency Club."

[*] http://www.eis.mdx.ac.uk/staffpages/andreipopescu/pdf/ITP2015.pdf

[**] http://www4.in.tum.de/~kuncar/documents/kuncar-cpp2015.pdf

All the best,
Andrei

Please note that Middlesex University's preferred way of receiving all correspondence is via email in line with our Environmental Policy. All incoming post to Middlesex University is opened and scanned by our digital document handler, CDS, and then emailed to the recipient.

If you do not want your correspondence to Middlesex University processed in this way please email the recipient directly. Parcels, couriered items and recorded delivery items will not be opened or scanned by CDS. There are items which are "exceptions" which will be opened by CDS but will not be scanned a full list of these can be obtained by contacting the University.

smime.p7s

Email Gateway (Aug 22 2022 at 11:30):

From: Tjark Weber <tjark.weber@it.uu.se>
Ramana,

Take a look at Stefan Berghofer's thesis. He already implemented a
proof checker for his proof terms (albeit not verified).

The main issue, if I recall correctly, was that these proof terms could
become huge.

Best,
Tjark

Email Gateway (Aug 22 2022 at 11:30):

From: Ramana Kumar <rk436@cam.ac.uk>
Thanks Tjark.

I am aware of Stefan's excellent work, and I'm sorry if I gave the
impression that my proposal was novel in any sense. (Perhaps the part about
verifying the checker is new, but that's just icing for the purposes of
this discussion.)

My concern is whether the proof checker for proof terms is actually used or
not. In an ideal world, I would think that for any substantial
formalisation, the proof terms would be the bulk of the evidence provided
to the certification authority or any other party interested to know why
the claims being made are true, and those proof terms would be checked
regularly (say twice a year) by a small checker if the formalisation is
under continued development. Does anyone actually do that? Is scalability
the only issue?

Apologies if this is veering too far from the original topic. I'm happy to
continue discussion on another thread if desired.

Email Gateway (Aug 22 2022 at 11:30):

From: Peter Lammich <lammich@in.tum.de>
I feel like I should give a real example of what I think is consensus,
and why the current state in Isabelle/HOL is not satisfactory:

If I manage to prove in Isabelle, e.g.,

theorem fermats_theorem:
"∀n::nat>2. ¬(∃a::nat>0. ∃b::nat>0. ∃c::nat>0. a^n + b^n = c^n)"

then, to believe that I really proved Fermat, one only has to check that
the definitions of the natural number library really match the expected
operations on naturals, and that my statement of the theorem really
matches Fermat's theorem.

However, in current Isabelle, one has to check every single definition,
even the unrelated ones, for consistency, to rule out, e.g., the
following proof. Note that this proof, assuming Fermat really holds,
does not even introduce inconsistency into the logic ... it's just a
hidden axiomatization of Fermat:

(* Hide and obfuscate this stuff well, such that it is not too
easy to find for a human, but ensure that ATPs still find it
*)
definition "P ≡
∀n::nat>2. ¬(∃a::nat>0. ∃b::nat>0. ∃c::nat>0. a^n + b^n = c^n)"

consts c :: bool
typedef T = "{True,c}" by blast

lemma L: "(∀(x::T) y. x = y) ⟷ c"
using Rep_T Rep_T_inject Abs_T_inject by (cases c) force+

defs c_def: "c ≡ if (∀x::T. ∀y. x=y) then P else ¬P"

(* Place this main statement prominently in your theories, and hail
sledgehammer for being really powerful *)
theorem fermats_theorem:
"∀n::nat>2. ¬(∃a::nat>0. ∃b::nat>0. ∃c::nat>0. a^n + b^n = c^n)"
using L P_def c_def by presburger (* This proof was found by
sledgehammer! *)

So, with sledgehammer becoming more powerful, and having the possibility
of making inconsistent definitions, it's only a matter of time when
sledgehammer finds some nice proof for you, which exploits, in some
non-obvious ways, the inconsistency of completely unrelated definitions.

Email Gateway (Aug 22 2022 at 11:30):

From: Larry Paulson <lp15@cam.ac.uk>
Thanks for giving this example.

As people seem to have misunderstood my position, let me try again, because important points are being overlooked.

Of course I think that this circularity check ought to be made. I was under the impression that this patch had been made already. Nevertheless, I do not believe that it poses any immediate problem for users.

Over the whole of the 1990s, it was possible to introduce cyclic definitions in Isabelle, and this possibility was specifically mentioned in the documentation. To do this required making three separate steps: you had to introduce some A, then define some B in terms of A, and finally define A in terms of B. Anybody who thinks they are in danger of doing this accidentally really should be working in another field. Certainly none of our existing users were affected when we tightened up our checks. Nobody had made this mistake.

The reason I keep stressing this point is that I regularly see work where the definitions don’t make any sense, even though they are noncircular. It is very easy to do. You verify some mechanism and you include some well-definedness predicate on states that can never be satisfied. Then you prove “WD(x) ==> P(x)". Ph.D. supervisors and referees do to overlook such things, even when they are blatant. And then it is necessary to argue with the referees because “the proof has been checked by machine”.

I am sure that this type definition problem will be fixed in time for the next release. I’m not sure whether the other problem will be fixed.

Larry Paulson

Email Gateway (Aug 22 2022 at 11:30):

From: Ramana Kumar <rk436@cam.ac.uk>
Dear Larry,

Thank you for the clarification. I agree that you are making an important
point.

I’m not sure whether the other problem will be fixed.
>

One might be tempted to throw up hands and proclaim that the theorem prover
cannot be expected to check that your definitions actually make sense; at
best it can check that they are consistent. However, thinking about this
problem just now, I realised that something similar to the existing
AutoQuickcheck could be helpful: something that says "you seem to be trying
to prove this the hard way, but it's actually very simple because your
assumptions imply False!".

Ramana

Email Gateway (Aug 22 2022 at 11:30):

From: Tobias Nipkow <nipkow@in.tum.de>
On 17/09/2015 13:56, Larry Paulson wrote:

Thanks for giving this example.

As people seem to have misunderstood my position, let me try again, because important points are being overlooked.

Of course I think that this circularity check ought to be made. I was under the impression that this patch had been made already. Nevertheless, I do not believe that it poses any immediate problem for users.

Over the whole of the 1990s, it was possible to introduce cyclic definitions in Isabelle, and this possibility was specifically mentioned in the documentation. To do this required making three separate steps: you had to introduce some A, then define some B in terms of A, and finally define A in terms of B. Anybody who thinks they are in danger of doing this accidentally really should be working in another field. Certainly none of our existing users were affected when we tightened up our checks. Nobody had made this mistake.

The reason I keep stressing this point is that I regularly see work where the definitions don’t make any sense, even though they are noncircular. It is very easy to do. You verify some mechanism and you include some well-definedness predicate on states that can never be satisfied. Then you prove “WD(x) ==> P(x)". Ph.D. supervisors and referees do to overlook such things, even when they are blatant. And then it is necessary to argue with the referees because “the proof has been checked by machine”.

In the model checking community this is known as the vacuity problem and it has
received a certain amount of attention.

Tobias

I am sure that this type definition problem will be fixed in time for the next release. I’m not sure whether the other problem will be fixed.

Larry Paulson

On 17 Sep 2015, at 07:14, Holger Blasum <hbl@sysgo.com> wrote:

Hello Larry, list,

On 09-16, Larry Paulson wrote:

It would be interesting to find out how these problems were identified: whether they were looked for out of curiosity, or whether on the other hand they manifested themselves in the course of an actual proof.

I can give more detail for the case of EURO-MILS: We were made aware of
"Isabelle Isar-Reference Manual (Version 2013-2, pp. 103): "It is at
the discretion of the user to avoid malformed theory specifications!"[1]
Hence first attempt was to rule out use of "consts"[2].
This resulted in people even becoming more curious and playing
with Kuncar/Popescu.

smime.p7s

Email Gateway (Aug 22 2022 at 11:30):

From: Johannes Hölzl <hoelzl@in.tum.de>
Am Donnerstag, den 17.09.2015, 12:56 +0100 schrieb Larry Paulson:
[..]

Over the whole of the 1990s, it was possible to introduce cyclic
definitions in Isabelle, and this possibility was specifically
mentioned in the documentation. To do this required making three
separate steps: you had to introduce some A, then define some B in
terms of A, and finally define A in terms of B. Anybody who thinks
they are in danger of doing this accidentally really should be working
in another field. Certainly none of our existing users were affected
when we tightened up our checks. Nobody had made this mistake.

If such a circle happens in the _one_ theory file and is as obvious as
this one, maybe the developer should be working in another field. But we
get a bigger and bigger type class hierarchy and we started a couple of
years ago to use type classes in type definitions. I work a lot with the
type class hierarchy but some parts are still blurry to me. The last
time Florian printed it out, it was a huge poster!

For large developments in Isabelle, one imports a lot of theories and
uses a lot of automatic proof methods. What happens if my definitions
are completely fine, but a freak combination of theories I import allows
Sledgehammer to prove false?

Johannes

The reason I keep stressing this point is that I regularly see work
where the definitions don’t make any sense, even though they are
noncircular. It is very easy to do. You verify some mechanism and you
include some well-definedness predicate on states that can never be
satisfied. Then you prove “WD(x) ==> P(x)". Ph.D. supervisors and
referees do to overlook such things, even when they are blatant. And
then it is necessary to argue with the referees because “the proof has
been checked by machine”.

I am sure that this type definition problem will be fixed in time for
the next release. I’m not sure whether the other problem will be
fixed.

Larry Paulson

On 17 Sep 2015, at 07:14, Holger Blasum <hbl@sysgo.com> wrote:

Hello Larry, list,

On 09-16, Larry Paulson wrote:

It would be interesting to find out how these problems were
identified: whether they were looked for out of curiosity, or whether
on the other hand they manifested themselves in the course of an
actual proof.

I can give more detail for the case of EURO-MILS: We were made aware
of
"Isabelle Isar-Reference Manual (Version 2013-2, pp. 103): "It is at
the discretion of the user to avoid malformed theory
specifications!"[1]
Hence first attempt was to rule out use of "consts"[2].
This resulted in people even becoming more curious and playing
with Kuncar/Popescu.

Email Gateway (Aug 22 2022 at 11:30):

From: Burkhart Wolff <Burkhart.Wolff@lri.fr>
Dear Larry,

I respectfully disagree.
It poses an immediate problem for users who have to argue in front of
certification authorities why using Isabelle may guarantee absolute certainty
(within a given frame of an underlying logic and model). And this might be an
important business case for the entire community.

Why Isabelle, if I could also apply Bachblütentherapie to improve software quality ?

If a definition makes SENSE (that is, in my view, indeed users responsibility) is just
another issue than that it makes an underlying theory inconsistent, which should
be Isabelle’s responsibility if methodologically correctly used.

By the way, I support the proposal earlier made in this thread to have a
“safe_use_flag” which restricts a session to constructs that we have reasons
to believe that they are conservative.

Best

Email Gateway (Aug 22 2022 at 11:30):

From: Larry Paulson <lp15@cam.ac.uk>
Sledgehammer is also useful here, because generally you get to see exactly which facts have been used to prove the theorem. Sometimes you may see a fact being used there doesn’t appear to be relevant. It’s always worth checking to see what is going on.

Larry

Email Gateway (Aug 22 2022 at 11:31):

From: Lars Noschinski <noschinl@in.tum.de>
I am not aware of anyone doing that. As far as I can tell, we only ever
build Main with proof terms, and even that breaks frequently (by hitting
some resource limit).

-- Lars

Email Gateway (Aug 22 2022 at 11:31):

From: David Cock <david.cock@inf.ethz.ch>
We tried it once on seL4. We ran out of memory (192GB, from memory)
very quickly. It's currently impractical. A streaming approach might
be better i.e. have the proof checker run simultaneously with Isabelle,
and check every step taken by the kernel as it's made, without ever
having to actually construct the whole proof term.

David

Email Gateway (Aug 22 2022 at 11:31):

From: Larry Paulson <lp15@cam.ac.uk>
Dear Burkhart,

I sympathise with you, but you go too far.

We do want tools such as Isabelle to be trusted by regulators and authorities. and so we do need to address every issue of this sort that comes along. Nevertheless, we should not be guaranteeing absolute certainty to anybody. If we oversell what is achievable, we risk a backlash.

It is true that it is now realistic to imagine fully verified proof checkers, the compilation to machine code also verified, running on a verified operating system and on verified hardware. Nevertheless, all of this involves using verification technology to verify itself. And there are innumerable other ways in which errors can creep in. Your “within a given model” is a huge qualification on "absolute certainty".

Avra Cohn’s 1989 essay on the subject remains topical:

http://www.cl.cam.ac.uk/~mjcg/papers/AvraProofPaper.pdf
http://link.springer.com/article/10.1007%2FBF00243000

There are any number of papers that report some of the gains that can be realised using verification tools, without making unrealistic promises.

Larry

Email Gateway (Aug 22 2022 at 11:31):

From: Burkhart Wolff <Burkhart.Wolff@lri.fr>
Dear Larry,

I was talking here about

(relative) consistency of the logic
and it's safe extension schemes.
Short term goal: an agreement on the safe core (I e : kernel PLUS extension schemes) and on the methodological issues.

Formal proofs for that are challenging, but nowadays perfectly feasible.

I was NOT talking about implementation correctness of Isabelle "as is", I am perfectly aware
Of the quite monstrous proof task and the principle limits of such an approach (it's always going to be based on models ... Of the machine,
The compiler, etc.)

Still, On the long run, relative
Solutions along this line of research
will and should come even for implementation correctness of Isabelle.

Von meinem iPhone gesendet

Email Gateway (Aug 22 2022 at 11:42):

From: Makarius <makarius@sketis.net>
This is an important point that is rarely explored systematically. From
my own experience in the past 10-15 years:

* "synthetic problems" of the first kind are encountered about 2 times
per year; typical changes are ff75ed08b3fb (2001) or ccbf9379e355
(2015)

* "practical problems", i.e. those encountered in actual applications
are encountered every 2-3 years

What happens all the time are misunderstandings by users about concrete
syntax or concrete syntax that is actually wrong. Nobody is worried about
that as a custom.

Makarius

Email Gateway (Aug 22 2022 at 11:42):

From: Makarius <makarius@sketis.net>
Here is the missing entry:

HOL Zero

Minimalistic re-implementation of HOL, designed with trustworthiness as
its top priority.

[small] +++
[ML] ++
[thm] +++
[defs] +++
[context] 0
[logic] ++
[formalized] 0

[ML] Unlike HOL-Light there are explicit provisions to avoid unsafe
aspects of OCaml. Architectural weaknesses remain, since user ML scripts
operate directly on the OCaml toplevel.

[small], [thm], [defs], [context], [logic] similar to HOL Light.

I.e. it is mostly like HOL Light, but [ML] gets ++ (like HOL4 with its SML
basis), because extra care is taken to avoid weaknesses of OCaml.

A fully managed ML environment as in Isabelle would get +++.

Makarius

Email Gateway (Aug 22 2022 at 11:42):

From: Makarius <makarius@sketis.net>
Indeed. All of this mostly talks about the "logical core", whatever its
internal structure is precisely.

Around the core there are many more aspects, where users can do non-sense,
or the system could be wrong.

Makarius

Email Gateway (Aug 22 2022 at 11:42):

From: Makarius <makarius@sketis.net>
That is definitely interesting work. I had some discussions about it with
Ali Assaf in October 2013, IIRC.

What is also remarkable in the above paper: the "HOL family" only covers
the actual HOL systems, while Isabelle/HOL falls into the category of
"other systems" (like Coq, Nuprl).

Thus the HOL family proper is defined as the systems with full
import/export wrt. OpenTheory.

Makarius

Email Gateway (Aug 22 2022 at 11:42):

From: Makarius <makarius@sketis.net>
Interesting paper. Here are some notes, after skimming through it.

The term "LCF approach" is used in the tradition of HOL88 to include
definitional principles, but LCF (and Isabelle) did not have that from
the outset. For papers that talk about trustworthyness of provers, the
distinction of [thm] vs. [theory] -- according to my earlier
categorization -- to avoid great confusion.
Concerning ML (SML or OCaml) as "system implementation language" and
weaknesses coming from exposing that to the user: LCF did not have this
problem, the LISP core was clearly separated from ML. The HOL family
after HOL88 collapsed everything to one ML toplevel. Isabelle is closer
to original LCF by running user-code in a managed Isabelle/ML
environment that is not the same as the underlying SML.
Isabelle does not have "proof scripts". The proper terminology is "proof
texts" or "proof documents". The point is that there is more structural
integrity enforced by the system, on some controlled language that is
not just the implementation language of the prover.
"Allow uncontrolled adaption to the way formulae are displayed": cf.
check/uncheck phases in Isabelle, which admit arbitrary non-sense on
input and output of terms, without any formal barriers. These are
official programming interfaces, not back-doors.
3.2.1 interesting notes about HOL Light, including a special
(non-)treatment of typedefs as implicit pre-requisites to proven
theorems. Such a formal record of dependencies (or "proof digest" in
Isabelle terminology) was excluded from the discussion so far. The
question is how "holes" in the reasoning are tracked (non-definitional
axioms, oracles). I think that HOL4 is doing this most thoroughly.
Isabelle only half-thoroughly.

I liked the general approach to port formalizations to other systems for
independent checking. I think the main weakness of the bigger systems
(Coq, Isabelle) is that they have grown into large islands of their own,
islands that could be mistaken as continents. It would be exceedingly
nice to see full OpenTheory connectivity for Isabelle/HOL, although the
addition of overloaded type constructors to the standard portfolio has
moved us one more step away from it.

Makarius

Last updated: Nov 15 2025 at 20:21 UTC