Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] lexical matters


view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: mark@proof-technologies.com
on 19/1/11 3:25 AM, Michael Norrish <Michael.Norrish@nicta.com.au> wrote:

On 19/01/11 14:04, mark@proof-technologies.com wrote:

I never thought that advocating a trustworthy pretty printer would be so
controversial!!!

It's just not an interesting problem. We can solve it trivially by writing everything out in sexp-like notation. ACL2 is demonstrably an acceptable format for security-critical validation, and this is all the pretty-printing they have.

Not sure what you mean here. You seem to be saying that low-tech printers
are the way forward, which again seems to be suggesting that good concrete
syntax printers do not help human readability.

Moreover, with proof terms (in Isabelle) and things like OpenTheory for the 3 HOLs, we can avoid having to look at clients' ML code.

I am also advocating proof porting. With proof porting, however, we have
the same problem - in the target system, we still need to know that the
ported theorem is the right one and that the ported definitions are right.

All of the "3 HOLs" (presumably you are not including HOL Zero in the 3 HOLs
:) have problems with printing irregular names, and all of them have
problems coping with overloaded variables.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: Michael Norrish <Michael.Norrish@nicta.com.au>
On 19/01/11 14:36, mark@proof-technologies.com wrote:

on 19/1/11 3:25 AM, Michael Norrish<Michael.Norrish@nicta.com.au> wrote:

It's just not an interesting problem. We can solve it trivially by writing
everything out in sexp-like notation. ACL2 is demonstrably an acceptable
format for security-critical validation, and this is all the pretty-printing
they have.

Not sure what you mean here. You seem to be saying that low-tech
printers are the way forward, which again seems to be suggesting
that good concrete syntax printers do not help human readability.

Of course they do. We also know that in our day-to-day use of our systems, the pretty-printer is not a cause of problems, so we can use the nice readability-enhancing features of these systems. On the other hand, if there is a worry about malicious attacks (as in the validated work scenario) then people concerned about these things can print the terms in question using sexp-syntax, which we know to be adequate.

Moreover, with proof terms (in Isabelle) and things like OpenTheory for the
3 HOLs, we can avoid having to look at clients' ML code.

I am also advocating proof porting. With proof porting, however, we have
the same problem - in the target system, we still need to know that the
ported theorem is the right one and that the ported definitions are right.

Right, which is why we have the pretty-printing solution of my paragraph above.

All of the "3 HOLs" (presumably you are not including HOL Zero in the 3 HOLs
:) have problems with printing irregular names, and all of them have
problems coping with overloaded variables.

I guess you mean overloaded constants. My thought-experiment sexp-syntax would raise an exception if it encountered variable or constant names that were lexically bad. Overloads would not be printed at all. Instead, you'd get things like

(= (integer$+ (integer$int_of_num x) (integer$int_of_num y))
(integer$int_of_num (arithmetic$+ x y)))
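
A minimal SML sketch of such a strict printer (the term datatype, the lexical rule and the helper names here are illustrative assumptions, not any prover's actual API):

datatype term = Var of string
              | Const of string
              | Comb of term * term

(* illustrative rule only: a letter followed by letters, digits or _;
   a real printer would also allow qualified names like integer$+ *)
fun lexically_ok s =
  s <> "" andalso Char.isAlpha (String.sub (s, 0)) andalso
  List.all (fn c => Char.isAlphaNum c orelse c = #"_") (String.explode s)

(* raise an exception rather than print a lexically bad name *)
fun name s = if lexically_ok s then s else raise Fail ("bad name: " ^ s)

fun sexp (Var s) = name s
  | sexp (Const s) = name s
  | sexp (Comb (f, x)) = "(" ^ sexp f ^ " " ^ sexp x ^ ")"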

Michael

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: Gerwin Klein <gerwin.klein@nicta.com.au>
On 19/01/2011, at 1:50 PM, <mark@proof-technologies.com> wrote:

on 18/1/11 10:22 PM, Gerwin Klein <gerwin.klein@nicta.com.au> wrote:

We should remember that at the moment it is very easy for someone to
maliciously produce what appears to be a formal proof of some statement
...

I don't think this is a problem in practice. Theorem provers are already used in big certification projects. The evaluators in such projects usually know what they are doing,

How can you be so confident about this?

Talking to some of them. I'm sure this is not the case for all evaluators, but there are people in that business that have used Isabelle for longer than most of the current developers.

In the large, safety-critical certification projects I have been involved with, the evaluators do the best they can with the tools they have available. Proof checkers with readable and highly trustworthy output are a luxury not available to them.

Depends on the project, of course, and what level of certification. Currently not that many certifications with interactive formal proof exist, but some do (ACL2 has been used quite a bit). As you say, evaluators need to take the best that is available. LCF provers are a long way ahead of what they usually have to deal with. That's why I don't think it's a problem in practice.

at least up to the level where they would easily be able to spot attempts to make syntactically misleading proof statements. It's easy enough to scan for ML blocks or similar in theory files, spot duplicate identifiers, etc, and demand good explanations for their presence (or just forbid them). And the threat of just using a proof checker is always there, so why try to cheat on syntax?

The input in this process is the full power of an ML program. We all know
that it is possible to hide incredibly subtle things in huge programs. Are
we to ban non-trivial ML in the process of producing a proof?

If you're really concerned about malicious users, why not? Usual Isabelle interaction makes no use of ML. Very few of the AFP entries, many of them substantial, do anything remotely like that.

This would be forcing contractors to do their work with one hand tied behind their back. The contracted, safety-critical proof work I have been involved with would certainly have been completely infeasible without writing large amounts of ML program to support my proof work.

In Isabelle? HOL4 or HOL-light, sure, that's their main interaction mode, but Isabelle hasn't had the ML-as-user-interface paradigm for quite a few years.

I'm not saying that you'll never want to use any ML for bigger advanced projects, but these ML bits are a long way from complex ML programs. Compared to the effort that should go into validating models and statements, checking small amounts of proof method ML code for pretty printing cheat attempts would be trivial (there's no reason to have any pretty printing code there at all). There's no reason evaluators should have to accept huge amounts of complex ML code.

It is far better for the auditor if they can treat the ML code that
establishes the final theorem as some sort of black box. To do this the
theorem prover needs certain basic properties, including a trustworthy
printer.

Well, there is one, you just don't seem to like it: inspecting the term on the ML level. As Michael says, it's at least as good as Lisp s-expressions.

There's a difference between accidentally creating something misleading (reasonably hard to do if you use the normal interface) and a malicious user with motivation and criminal energy. If you want to guard against the latter, low-tech plain terms and external proof checking surely bring more assurance than further infrastructure.

I'm with Larry on this: the much bigger problem is to convince yourself that the model and final theorem are in any way useful or right. This is where cheating will happen much earlier.

I did not say that this was not a problem! This is the big problem!

I guess we're all mostly of the same opinion fundamentally anyway, just different views on where to put resources and emphasis.

And this is the problem that I am advocating needs a tool that properly supports the process - the process of determining that the model and final theorem are right (as well as that the final theorem has actually been proved). This process involves using the pretty printer.

Checking that the proof shows what you think it does involves the pretty printer only at a very shallow level. It's not the case that we can't trust the Isabelle pretty printer to support development and that it will magically show us wrong terms.

I've often seen and written theorems that didn't mean what I thought they meant. It's so far never been the fault of the pretty printer. It can be, because you'd be able to trick it with some effort, but getting rid of that possibility is about assurance, not about finding the right proof statement. It can have different solutions.

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: mark@proof-technologies.com
on 19/1/11 3:51 AM, Michael Norrish <Michael.Norrish@nicta.com.au> wrote:

...

Not sure what you mean here. You seem to be saying that low-tech
printers are the way forward, which again seems to be suggesting
that good concrete syntax printers do not help human readability.

Of course they do. We also know that in our day-to-day use of our systems, the pretty-printer is not a cause of problems, so we can use the nice readability-enhancing features of these systems. On the other hand, if there is a worry about malicious attacks (as in the validated work scenario) then people concerned about these things can print the terms in question using sexp-syntax, which we know to be adequate.

So if you recognise that good concrete-syntax pretty printers are generally
good for human readability, then this is also true for the proof auditor.
If we could have a trusted system that supports proof auditing AND has a
good trustworthy concrete-syntax printer, then isn't that the best of both
worlds?

All of the "3 HOLs" (presumably you are not including HOL Zero in the 3
HOLs :) have problems with printing irregular names, and all of them
have problems coping with overloaded variables.

I guess you mean overloaded constants....

No, I mean variables that are overloaded with other variables. The sort of
thing that basic Hindley-Milner type inference cannot deal with (but this is
for parsing so let's not get distracted by this). Apart from HOL Zero, no
other system is capable of printing (or parsing, as it happens) an
expression with overloaded variables.

I suppose your low-tech s-exp printer would throw these out too. It's
starting to be less trivial than originally envisaged. Isn't it just best
to have a good trustworthy concrete-syntax printer?

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: Konrad Slind <konrad.slind@gmail.com>
If a trusted prettyprinter is deemed to be an important tool for the auditor, then the auditor can code one up, or look over your shoulder while you code it. It's a fairly simple task. Of course, for the paranoid, this just means that something else will emerge to be anxious about.

Konrad.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: mark@proof-technologies.com
on 19/1/11 4:02 AM, Gerwin Klein <gerwin.klein@nicta.com.au> wrote:

In the large, safety-critical certification projects I have been involved with, the evaluators do the best they can with the tools they have available. Proof checkers with readable and highly trustworthy output are a luxury not available to them.

Depends on the project, of course, and what level of certification. Currently not that many certifications with interactive formal proof exist, but some do (ACL2 has been used quite a bit). As you say, evaluators need to take the best that is available. LCF provers are a long way ahead of what they usually have to deal with. That's why I don't think it's a problem in practice.

So it's ok because LCF systems don't tend to get used! I'm trying to enable
LCF systems to get used in practice. This can potentially greatly increase
the assurance in the project.

Are we to ban non-trivial ML in the process of producing a proof?

This reduces what can be done in the project.

If you're really concerned about malicious users, why not? Usual Isabelle
interaction makes no use of ML. Very few of the AFP entries, many of them
substantial, do anything remotely like that.

Yes, but I've been involved in large safety-critical projects where use of
ML has been a practical necessity. Formal methods is currently extremely expensive if it involves formal proof (but not in the projects I have been involved with).

The contracted, safety-critical proof work I have been involved with would certainly have been completely infeasible without writing large amounts of ML program to support my proof work.

In Isabelle? HOL4 or HOL-light, sure, that's their main interaction mode,
but Isabelle hasn't had the ML-as-user-interface paradigm for quite a few
years.

Yes, so Isabelle would be safer in this respect because the proof scripts
aren't even in ML. But to enable cost-effective large formal verification
projects involving formal proof, I think allowing the contractor to use the
power of ML to overcome practical problems, specific to the particular
project, would be an extremely useful thing. I'm talking about bespoke
automation and bespoke semi-automation. Now it's always possible to do a
bit of bespoke pre-processing and still use a restricted, non-ML paradigm,
but this will often have its own integrity risks and is in any case
fundamentally limited. We need to give contractors the full power of ML to
get the costs of full formal verification down. But we shouldn't trust them
not to be malicious!

I'm not saying that you'll never want to use any ML for bigger advanced projects, but these ML bits are a long way from complex ML programs. Compared to the effort that should go into validating models and statements, checking small amounts of proof method ML code for pretty printing cheat attempts would be trivial (there's no reason to have any pretty printing code there at all). There's no reason evaluators should have to accept huge amounts of complex ML code.

I've been involved with large safety-critical projects that use 20,000 lines
of bespoke ML.

It is far better for the auditor if they can treat the ML code that
establishes the final theorem as some sort of black box. To do this the
theorem prover needs certain basic properties, including a trustworthy
printer.

Well, there is one, you just don't seem to like it: inspecting the term on the ML level. As Michael says, it's at least as good as Lisp s-expressions.

Good concrete-syntax printers make it much easier to read. When hundreds of
lines of specification need to be reviewed, concrete syntax is a practical
necessity.

There's a difference between accidentally creating something misleading (reasonably hard to do if you use the normal interface) and a malicious user with motivation and criminal energy. If you want to guard against the latter, low-tech plain terms and external proof checking surely bring more assurance than further infrastructure.

I find that analysts just make mistakes, especially when things are being
done on the industrial scale, and especially when there is a team of
analysts. Now occasionally these mistakes will surface in really bizarre
ways, and things that one would think "this would never happen in practice"
actually do happen. I regret not keeping a careful log of strange problems.
But problems with the pretty printer have certainly caused all sorts of
problems, and, I think, some connected with the soundness of the analysis
being undertaken. These problems have often been quite subtle.

I guess we're all mostly of the same opinion fundamentally anyway, just
different views on where to put resources and emphasis.

Absolutely.

And this is the problem that I am advocating needs a tool that properly supports the process - the process of determining that the model and final theorem are right (as well as that the final theorem has actually been proved). This process involves using the pretty printer.

Checking that the proof shows what you think it does involves the pretty printer only at a very shallow level. It's not the case that we can't trust the Isabelle pretty printer to support development and that it will magically show us wrong terms.

The complex Isabelle derived code, outside the relatively small core of code
implementing crucial trusted components such as the inference kernel and the
pretty printer, may be susceptible to constructing expressions that end up
exploiting problems with Isabelle's pretty printer. I have found that such
things are easy to subtly create maliciously (but of course if the user
cannot use ML, I think we can assume the developers won't do this). But I
have also found that in practice on large projects, subtle problems with
tools, including theorem provers, just happen accidentally.

For example (forgive me if my lack of deep understanding of the Isabelle core means I am wrong here), it is presumably quite feasible, in principle, that some complex bit of derived Isabelle code occasionally constructs terms that involve overloaded variables. This is certainly true in the "3 HOL" systems (i.e. the 4 HOL systems minus HOL Zero :). So nasty terms that involve overloaded variables could "magically" get innocently constructed by well-meaning source code developed by well-meaning Isabelle developers.

I've often seen and written theorems that didn't mean what I thought they meant. It's so far never been the fault of the pretty printer. It can be, because you'd be able to trick it with some effort, but getting rid of that possibility is about assurance, not about finding the right proof statement. It can have different solutions.

Yes, the risk is that the wrong theorems are being proved or that things are
being given the wrong definitions, and this is why we need proof auditing,
and this is why we need good concrete-syntax pretty printers, to enable the
auditor to effectively review things. But I find I'm repeating myself
here...

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:46):

From: mark@proof-technologies.com
on 19/1/11 5:22 AM, Konrad Slind <konrad.slind@gmail.com> wrote:

Not so simple, apparently, given that no existing theorem prover (other than HOL Zero) has a trusted pretty printer....

I think it would be an excellent use of formal methods to formally verify a
trusted pretty printer (such as HOL Zero's). And, no, I wouldn't get
paranoid about how we trust the theorem prover used to do this.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:47):

From: Alexander Krauss <krauss@in.tum.de>
Hi Mark,

This discussion is a bit strange. It seems this is mostly due to a
conflation of at least three very different functionalities that you
expect from a "theorem prover":

(a) Checking Proofs -- This requires a trusted inference kernel only,
based on abstract syntax. No interesting parsing or pretty printing
here. Challenges: Logical complexities, Performance, Keeping code simple
and understandable.

(b) Developing Proofs -- This requires flexibility and automation,
support for arbitrary user-level code, and a way to reduce everything to
kernel-level inferences. Trusted pretty printing is not an issue here
either. Instead I want the most powerful tools possible.

(c) Auditing Definitions and Theorem Statements -- This is mainly an
interface problem of the auditor trying to understand what has been
proved. Pretty printing plays a role here, but there are many other
things. But note that proofs play no role here. This is just about
definitions and theorem statements.

I hereby make the bold claim that any system trying to do any two of
these things at the same time is bound to make some lousy compromises
that aren't acceptable if you are really serious.

It happens that LCF-style provers traditionally try to do both (a) and
(b), and they do make compromises. Bringing (c) into the picture and
trying to add it into one of those systems would probably make the
situation worse, not better. This is just an argument for a separate
tool, which could be HOL0. I have the impression though that you are
trying to address both (a) and (c) at the same time, which may lead to
new lousy compromises.

The issues about someone maliciously attacking the kernel etc. are
orthogonal and probably have standard solutions. Operating systems and
security people can solve this as soon as it is important enough for
someone to pay the price. But logicians are not the right people to do this.

If a trusted prettyprinter is deemed to be an important tool for the auditor, then the auditor can code one up, or look over your shoulder while you code it. It's a fairly simple task.

Mark Adams wrote:
Not so simple, apparently, given that no existing theorem prover (other than HOL Zero) has a trusted pretty printer....

previously, Mark Adams wrote:

but just a healthy degree of concern, especially when the problems can be easily addressed.

So are you claiming that it is easily addressed or that it isn't???
Maybe your reasoning needs to be checked by a trusted kernel?

Anyway, what properties must a pretty printer have in order to be
trustworthy? Is there anything formal to be said here apart from Freek
Wiedijk's Pollack-consistency? Or is it just defined in terms of the
attacks you happen to imagine right now? It would be interesting to read
some details here instead of just a claim of how important it is.

Alex

view this post on Zulip Email Gateway (Aug 18 2022 at 16:48):

From: mark@proof-technologies.com
Hi everyone, shall we make this the last e-mail on this subject open to the
list? I understand that this thread is perhaps not of the question/answer
type that most Isabelle users would be interested in. Why don't people
e-mail me if they want to see or participate in any of the continuations and
we'll keep it private. Unless there are lots of objections to keeping it
private that is...

on 19/1/11 9:56 AM, Alexander Krauss <krauss@in.tum.de> wrote:

Hi Mark,

This discussion is a bit strange. It seems this is mostly due to a
conflation of at least three very different functionalities that you
expect from a "theorem prover":

(a) Checking Proofs -- ...

(b) Developing Proofs -- ...

(c) Auditing Definitions and Theorem Statements -- ...

I hereby make the bold claim that any system trying to do any two of
these things at the same time is bound to make some lousy compromises
that aren't acceptable if you are really serious.

...

I think it's helpful to the discussion to break it down into different
activities like you have. You are right to differentiate between checking
proofs and developing proofs. I have always viewed auditing definitions and
theorem statements as part of the process of checking proofs, but I suppose
you could split it off as a different activity.

But I would say that there is absolutely no conflict in practice in catering
for (a) and (c) together in the same system, and furthermore that they
naturally go hand-in-hand. The inference kernel does (a), and is
architecturally separate from the pretty printer which supports (c). Both
have trustworthiness as their top priority.

There is some conflict between (a)/(c) and (b). So both (a) and (c) ideally
involve using a system where trustworthiness is the top priority, together
with sufficient efficiency to enable (a) and with certain aspects of
usability to support doing (c) effectively (such as good pretty-printing of
definitions and theorems, and good user feedback for certain basic
operations). To achieve the trustworthiness, the source code for the
trusted parts is ideally written with the utmost simplicity and clarity.
But for (b), its top priorities are general usability and efficiency, with
trustworthiness having less importance. There is a conflict between general
usability and clarity of the inference kernel, because things like
hierarchical theories, for example, can help usability but (at least
traditionally) involve greatly complicating the inference kernel. Also, to
pursue efficiency for (b), the inference kernel is typically made
significantly bigger and more complicated.

So yes, HOL Zero was written to support (a)/(c) and avoid the conflicts with
(b). But no, I don't see there being any significant compromises in the HOL
Zero implementation. Take a look for yourself and see! And then compare it
with HOL4, ProofPower HOL and Isabelle. (I can point out the equivalent
parts of each system if you are really interested in doing this.)

HOL Light is an interesting case. Its simplicity (and the simplicity of its embryonic forerunner - the GTT system implemented by Konrad and John)
is the inspiration for HOL Zero. I would suggest that HOL Light sits
between (a) and (b) with some slightly uncomfortable compromises (its lack
of theory hierarchy, for example, arguably makes it less usable), but that at the same time it does a surprisingly good job. The inference kernel has
only 10 inference rules and is very simple, but it is not at all unusably
slow. And Flyspeck is showing that HOL Light is up to the task of being
used on big projects. But anyway, I saw HOL Light as still unsatisfactory,
if only due to its pretty printer making it inappropriate for the (c) role,
and reluctantly decided that a new system was required to fill the gap.

I think HOL4, ProofPower and Isabelle are aiming for (b) and get the balance
about right. Their use of an LCF-style architecture gives good assurance
but without conflicting with (b) in my opinion. I think the balance is a
bit wrong as regards their pretty printers (because the problems can be
relatively easily addressed without compromising on (b), and furthermore
this would actually help (b)), and this is what this thread has become
about.

The issues about someone maliciously attacking the kernel etc. are
orthogonal and probably have standard solutions. Operating systems and
security people can solve this as soon as it is important enough for
someone to pay the price. But logicians are not the right people to do
this.

They are not orthogonal in my opinion, because the LCF-style architecture
itself delivers trustworthiness to protect against both the malicious and
the accidental. I think the orthogonality comes in when differentiating
between trustworthiness within the scope of the ML program implementing the
theorem prover and trustworthiness outside its scope (e.g. bugs in the ML
interpreter, operating system interrupts, etc, etc). But I agree that we
should get the security experts in to help with the latter.

Mark Adams wrote:

Not so simple, apparently, given that no existing theorem prover (other than HOL Zero) has a trusted pretty printer....

previously, Mark Adams wrote:

but just a healthy degree of concern, especially when the problems
can be easily addressed.

So are you claiming that it is easily addressed or that it isn't???
Maybe your reasoning needs to be checked by a trusted kernel?

:-)

I'm claiming that it's easy. My suggestion that it was "not so simple" was
just to emphasise that nothing much has been done about these things until
now.

Anyway, what properties must a pretty printer have in order to be
trustworthy? Is there anything formal to be said here apart from Freek
Wiedijk's Pollack-consistency? Or is it just defined in terms of the
attacks you happen to imagine right now? It would be interesting to read
some details here instead of just a claim of how important it is.

I've spent a long time thinking about it and independently came up with the
tightest of Freek's "Pollack consistency" concepts. I called it
"parser/printer completeness". But there's more to this than Freek's paper
because he doesn't cover "printing soundness" - i.e. printing what the user
expects (it's misleading to print an "and" when internally it's an "or").
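
For concreteness, a two-line SML sketch of the round-trip property behind "parser/printer completeness" (parse and print here are hypothetical stand-ins for a prover's actual parser and printer, not real API):

(* a term survives being printed and parsed back unchanged *)
fun parser_printer_complete parse print t =
  parse (print t) = t

Printing soundness is the further, informal requirement on top of this: the concrete syntax produced must mean to the reader what the term means internally.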

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:56):

From: Makarius <makarius@sketis.net>
It depends what you want to do in the end. Generating sources that are
fed back into the system is inherently hard. I have occasionally seen
people doing it empirically, e.g. trying Syntax.read_term on some
identifier and checking if they get a "Free" term -- but this does not
really take the scoping rules of const vs. free vs. bound into account.

Unless your output needs to be inspected directly by users it is easier to
avoid generating sources altogether, and use the Isabelle/ML interfaces to
get logical content into the system. This is the normal way in LCF-style
provers. Concrete syntax is just a superficial add-on for end-users.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:56):

From: mark@proof-technologies.com
On a related topic, what does Isabelle do for parsing/printing irregular
variable/constant names? E.g. a variable with a space in its name. Is it
possible to parse such a variable name, and, whether it is or not, what does
the printer output?

Mark

on 16/1/11 12:45 PM, Makarius <makarius@sketis.net> wrote:

view this post on Zulip Email Gateway (Aug 18 2022 at 16:56):

From: Matthias Schmalz <Matthias.Schmalz@inf.ethz.ch>

Mark,

I think I can answer that; other readers should feel free to correct me.

mark@proof-technologies.com schrieb:

On a related topic, what does Isabelle do for parsing/printing irregular
variable/constant names? E.g. a variable with a space in its name. Is it
possible to parse such a variable name,

No, as
term "a a"
is rejected.

and, whether it is or not, what does
the printer output?

If you create variables with irregular names using the ML infrastructure, the printer just prints them:

ML_val {*
cterm_of @{theory} (Free ("a a", @{typ bool}))
*}

... prints
val it = "a a" : cterm


view this post on Zulip Email Gateway (Aug 18 2022 at 16:57):

From: Matthias Schmalz <Matthias.Schmalz@inf.ethz.ch>

It depends what you want to do in the end. Generating sources that are
fed back into the system is inherently hard. I have occasionally seen
people doing it empirically, e.g. trying Syntax.read_term on some
identifier and checking if they get a "Free" term -- but this does not
really take the scoping rules of const vs. free vs. bound into account.

For the moment, I need to generate sources. Heuristic solutions are
fine. Are you aware of other pitfalls than hitting reserved keywords,
predefined constants, and capturing?

Unless your output needs to be inspected directly by users it is easier
to avoid generating sources altogether, and use the Isabelle/ML
interfaces to get logical content into the system. This is the normal
way in LCF-style provers. Concrete syntax is just a superficial add-on
for end-users.

In the long run, I indeed intend to bypass the parser. I implement my tool in Scala and would like to avoid writing my own Scala/ML linkup. Is there a way of accessing the term datatype via the Isabelle/Scala interface? Can I use Isabelle/Scala to access the Syntax.read_term function?

Thanks,
Matthias


view this post on Zulip Email Gateway (Aug 18 2022 at 16:57):

From: Makarius <makarius@sketis.net>
On Sun, 16 Jan 2011, Matthias Schmalz wrote:


It depends what you want to do in the end. Generating sources that are
fed back into the system is inherently hard. I have occasionally seen
people doing it empirically, e.g. trying Syntax.read_term on some
identifier and checking if they get a "Free" term -- but this does not
really take the scoping rules of const vs. free vs. bound into account.

For the moment, I need to generate sources. Heuristic solutions are
fine. Are you aware of other pitfalls than hitting reserved keywords,
predefined constants, and capturing?

This depends on various fine points of what you really need, i.e. just
detect lexical identifiers or ensure that certain generated names are
resolved as free variables, not constants.

The following example does the lexical check only, quite reliably in the
sense that I've studied the relevant Isabelle/Pure sources for 15min.

ML {*

(* true if s is a lexical identifier, not an internal name, and not a
   keyword of the context's concrete syntax *)
fun is_syntax_ident ctxt s =
  Syntax.is_identifier s andalso
  not (can Name.dest_internal s) andalso
  not (Syntax.is_keyword (ProofContext.syn_of ctxt) s);

*}

locale test =
  fixes xxx ("FOO")
begin

lemma True
proof

  fix yyy ("BAR")

  ML_val {* is_syntax_ident @{context} "FOO" *}
  ML_val {* is_syntax_ident @{context} "BAR" *}
  ML_val {* is_syntax_ident @{context} "True" *}
  ML_val {* is_syntax_ident @{context} "x" *}
  ML_val {* is_syntax_ident @{context} "x_" *}

qed

end

It is important to work with the proper local context. In particular, a
background @{theory} is generally not sufficient.

Unless your output needs to be inspected directly by users it is easier
to avoid generating sources altogether, and use the Isabelle/ML
interfaces to get logical content into the system. This is the normal
way in LCF-style provers. Concrete syntax is just a superficial add-on
for end-users.

In the long run, I indeed intend to bypass the parser. I implement my tool in Scala and would like to avoid writing my own Scala/ML linkup. Is there a way of accessing the term datatype via the Isabelle/Scala interface? Can I use Isabelle/Scala to access the Syntax.read_term function?

Not yet, but this will come at some point. There are some raw technical
issues, and some more profound questions concerning the formal context on
the Isabelle/Scala side. Some concrete applications will make this emerge
more quickly :-)

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:57):

From: Makarius <makarius@sketis.net>
On Sun, 16 Jan 2011, Matthias Schmalz wrote:

mark@proof-technologies.com schrieb:

On a related topic, what does Isabelle do for parsing/printing
irregular variable/constant names? E.g. a variable with a space in its
name. Is it possible to parse such a variable name,

No, as
term "a a"
is rejected.

By extending the syntax in user space, you can easily inject odd strings
into the term language, e.g. via something like

FREE ''foo bar''

with a grammar production for the "FREE" literal token and a suitable
parse translation. See the existing CONST notation, although I do not really recommend trying this at home.

If you create variables with irregular names using the ML
infrastructure, the printer just prints them:

ML_val {*
cterm_of @{theory} (Free ("a a", @{typ bool}))
*}

... prints
val it = "a a" : cterm

This use of cterm_of reminds me of an old trick that fell out of use some years ago, because printing with the background theory certificate
lacks local syntax of the foreground context (local theory or proof body).

Did you come up with the idea yourself, or do we still have it in some old
manual?

Here is the localized version:

ML_val {*
writeln (Syntax.string_of_term @{context} (Free ("a a", @{typ bool})))
*}

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:57):

From: Matthias Schmalz <Matthias.Schmalz@inf.ethz.ch>

I made it up by myself, being unaware of Syntax.string_of_term. Don't
worry about the manuals.


view this post on Zulip Email Gateway (Aug 18 2022 at 16:57):

From: mark@proof-technologies.com
Thanks for your answers, Matthias/Makarius.

I have a follow-up question... What about variables overloaded with other
variables (i.e. same name but different type) or overloaded with constants
(same name, any type) - can terms with any such overloading be parsed, and
what happens when they are printed?

Mark.

on 16/1/11 7:21 PM, Makarius <makarius@sketis.net> wrote:

view this post on Zulip Email Gateway (Aug 18 2022 at 16:58):

From: Makarius <makarius@sketis.net>
As far as the kernel is concerned, the identity of atoms (variables,
consts) consists of both the name and type, although "overloading" means
something different in Isabelle.

In contrast, in the syntax layer (the so-called "term check/uncheck"
phase, which generalizes the idea of type-inference) variables are
expected to be consistently typed: equal name implies equal type, and
scopes for variables vs. constants are resolved consistently.

If you print a term that violates this, it might look "funny", or outright
misleading. This effect can already happen due to the customary omission
of type information for variables.

Anyway, it also depends what is meant by "printed" exactly. In Isabelle
there is quite a bit of extra formal markup in the output, that is not
shown in plain text. In Proof General you already get the typical
Isabelle color-code of blue/green/brown variables and black constants.
In a Prover IDE like Isabelle/jEdit you could make even more out of that,
e.g. tell the user the full truth about term atoms via tooltips or popups.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:58):

From: mark@proof-technologies.com
I see.

Is there a printing mode for type-annotating constants and/or variables with
their types (like in HOL4)? I'm concerned about the risk of statements
being displayed in a misleading way.

Mark

on 17/1/11 7:40 PM, Makarius <makarius@sketis.net> wrote:

view this post on Zulip Email Gateway (Aug 18 2022 at 16:58):

From: Makarius <makarius@sketis.net>
You can implement your own in user space. These things are not part of
the kernel. Even the ML toplevel pretty printer can be modified in user
space.

Anyway, do you know the funny paper about by Freek Wiedijk, UITP 2010?

I've known this "issue" ever since I got exposed to Isabelle for the very
first time, in summer 1993.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:58):

From: Makarius <makarius@sketis.net>
This was a copy-paste accident: Freek's paper is about
"Pollack-inconsistency".

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:58):

From: mark@proof-technologies.com

Is there a printing mode for type-annotating constants and/or variables
with their types (like in HOL4)? I'm concerned about the risk of
statements being displayed in a misleading way.

...
Anyway, do you know the funny paper about "Pollack-inconsistency" by Freek
Wiedijk, UITP 2010?

Yes thanks. In fact my HOL system, HOL Zero
(http://proof-technologies.com/holzero.html), deals with all these issues
(or so I claim - there is a $100 bounty for anyone that spots a problem). I
was wondering how well Isabelle fared.

I've known this "issue" ever since I got exposed to Isabelle for the very
first time, in summer 1993.

It puzzles me how little has been done about these things until now. Are
not people concerned that what is being displayed could be misleading?

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 16:59):

From: Makarius <makarius@sketis.net>
There are other, more pressing problems. Sitting at a bare-bones ML
toplevel merely gives you an illusion of "direct" access to a system, but
there are still many things that can go wrong.

An LCF-style prover is reasonably correct by construction, independently
of concrete syntax input and output. Errors in the
internalization/externalization do not multiply as quickly as genuine
logical errors.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:59):

From: Michael Norrish <Michael.Norrish@nicta.com.au>
No.

If one has reason to suspect that something funny might be going on, the terms in question can be examined using the primitive ML API.
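
For instance (a hedged sketch from memory of HOL4-style destructor names such as is_comb and dest_abs; adapt to the system at hand), one can collect every atom of a term and inspect the raw names directly:

(* collect all variable and constant atoms of a HOL term *)
fun atoms tm =
  if is_var tm orelse is_const tm then [tm]
  else if is_comb tm then
    let val (f, x) = dest_comb tm in atoms f @ atoms x end
  else (* abstraction *)
    let val (v, b) = dest_abs tm in v :: atoms b end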

Alternatively, one can print the term out using a primitive pretty-printer that doesn't attempt to do anything "fancy".

For example, in HOL4 one can use

Parse.print_term_by_grammar min_grammars

Michael

view this post on Zulip Email Gateway (Aug 18 2022 at 16:59):

From: Tjark Weber <webertj@in.tum.de>
Well, I for one think it's an unsatisfying design flaw in (some)
existing provers that one cannot trust their output.

In practice, however, this becomes a potential problem only when you
want to check a proof from an untrustworthy source. (Proof objects
provide a partial solution here.)

As long as you assume a non-malicious user who wants to create a valid
machine-checked proof, it is not much of an issue.

Kind regards,
Tjark

view this post on Zulip Email Gateway (Aug 18 2022 at 16:59):

From: Makarius <makarius@sketis.net>
This does not sound like you are speaking here about Isabelle at all. We
do not assume a "non-malicious" user. You can always inspect internal
certificates, even before embarking on full proof terms.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 16:59):

From: mark@proof-technologies.com
on 17/1/11 11:14 PM, Makarius <makarius@sketis.net> wrote:

There are other, more pressing problems.

I think trustworthiness in theorem provers should always be a top priority.
After all, this is why we are using them in the first place. So long as it
doesn't have terrible consequences on other aspects of the theorem prover,
such as efficiency or usability, that is.

Sitting at a bare-bones ML
toplevel merely gives you an illusion of "direct" access to a system, but
there are still many things that can go wrong.

Yes, but I think it's helpful to divide up possible things that can go wrong
into those that are due to the design or implementation of the ML program,
and those that are due to things largely outside the scope of the program
(e.g. bugs in the ML interpreter, OS events interfering with normal
operation, etc, etc). The former is where all the problems arise in
reality, and is very much within our capability to solve. I may be
wrong, but you're presumably talking about the latter?

An LCF-style prover is reasonably correct by construction, independently
of concrete syntax input and output. Errors in the internalization/
externalization do not multiply as quickly as genuine logical errors.

Yes, I agree that the LCF style largely solves problems of trustworthiness,
and that the sort of problems it solves are the most serious. But given
that this has been solved (a long time ago), shouldn't we tackle the
remaining problems?

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: mark@proof-technologies.com
on 17/1/11 11:34 PM, Michael Norrish <Michael.Norrish@nicta.com.au> wrote:

On 18/01/11 09:34, mark@proof-technologies.com wrote:

It puzzles me how little has been done about these things until now. Are
not people concerned that what is being displayed could be misleading?

No.

If one has reason to suspect that something funny might be going on, the
terms in question can be examined using the primitive ML API.

This relies on having a sharp-sighted user who can spot subtle strange
behaviour (in addition to all the other demands on the attention of the poor
theorem prover user). And in any case, it will sometimes not be possible to
spot that something funny has happened, because it may appear to be behaving
precisely as expected (even though it actually isn't). Isn't it better to
have tools that we can solidly rely on?

But yes, I suppose terms can be examined in terms of their primitive syntax,
but surely this is completely impractical for large industrial formal
methods projects involving large formal specifications. In practice, anyone
reviewing a large industrial proof will be using the normal pretty printer.
That is, if anyone is actually bothering to do a reviewing exercise in the
first place...

Alternatively, one can print the term out using a primitive pretty-printer
that doesn't attempt to do anything "fancy".

For example, in HOL4 one can use

Parse.print_term_by_grammar min_grammars

I don't think this particular command solves the irregular names problem, or
the overloaded variables problem. But I suppose if used in conjunction with
HOL4's optional type annotated output and the new facility for coloured
printing of variables then it could help.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: Makarius <makarius@sketis.net>
Trustworthiness is just one of many problems, and we have been able to
increase it in Isabelle gradually over the years.

Apart from that, efficiency, accessibility, usability etc. are of general
importance to make our arcane provers acceptable to more people out there.
I often find myself in the situation of a formalistic fundamentalist when
speaking to people outside our community.

E.g. when you start moving towards serious Prover IDEs, which I've been involved in for quite some time now, you will find many more issues than the
rather trivial parse/print problem of the low-level term language. Of
course, I always try to keep as much immediate reliability as possible.
But I also feel one should separate concerns here: if you need extreme
trustworthiness, then you should export full theory and proof content to
some tiny external checker. The latter does not even need concrete syntax
in the traditional sense.

HOL0 could be one such checker, if it were able to absorb huge
theories and proofs (traces of inferences). Then one could implement some
"HOL0" button in Isabelle.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: Lawrence Paulson <lp15@cam.ac.uk>
Obviously trustworthiness is important. Nevertheless, a fixation on this one issue can be counter-productive. I'm not aware of a single piece of research that was invalidated due to bugs in theorem provers.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: mark@proof-technologies.com
on 18/1/11 12:12 PM, Makarius <makarius@sketis.net> wrote:

On Tue, 18 Jan 2011, mark@proof-technologies.com wrote:

Yes, I agree that the LCF style largely solves problems of
trustworthiness, and that the sort of problems it solves are the
most serious. But given that this has been solved (a long time
ago), shouldn't we tackle the remaining problems?

Trustworthiness is just one of many problems, and we have been
able to increase it in Isabelle gradually over the years.

Apart from that efficiency, accessibility, usability etc. are of general
importance to make our arcane provers acceptable to more people
out there. I often find myself in the situation of a formalistic
fundamentalist when speaking to people outside our community.

Yes, and I similarly. There are people out there who simply don't want any
involvement with formal proof. My fear is that, as we are trying to push
theorem proving more towards mainstream usage, sceptics will emerge citing
"well you can't even trust theorem provers anyway". There was that debate
in the 1970s...

E.g. when you start moving towards serious Prover IDEs, which I've been involved in for quite some time now, you will find many more issues than the rather trivial parse/print problem of the low-level term language.

Yes, adding layers can present more problems, and I'm sure you have more
experience in this than me. But surely we should be addressing these
problems as they arise, instead of brushing many of them under the carpet?
Shouldn't there be some sort of research effort concentrating on this?

Of course, I always try to keep as much immediate reliability as possible.
But I also feel one should separate concerns here: if you need extreme
trustworthiness, then you should export full theory and proof content to
some tiny external checker. The latter does not even need concrete syntax
in the traditional sense.

Unfortunately, the external checker needs to have a trustworthy printer
(even though this has not traditionally been expressed as a priority). It
needs to be able to print syntax in a highly readable way, so that a proof
auditor can effectively review exactly what the checker has checked. A
large project involving formal proof might have hundreds of definitions that
need reviewing, as well as the end-result theorems that use these
definitions. Wading through primitive syntax makes this an almost
impossible task.

HOL0 could be one such checker, if it would be able to absorb huge
theories and proofs (traces of inferences). Then one could implement some
"HOL0" button in Isabelle.

Well that's what I designed it for! Being able to absorb huge theories is
of course crucial. But so is being able to trust the inference kernel and
the pretty printer. The HOL Zero button already exists in a prototype HOL
Light variant I developed. Not so easy for Isabelle HOL....

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: mark@proof-technologies.com
on 18/1/11 12:38 PM, Lawrence Paulson <lp15@cam.ac.uk> wrote:

Obviously trustworthiness is important. Nevertheless, a fixation on this one issue can be counter-productive. I'm not aware of a single piece of research that was invalidated due to bugs in theorem provers. On the other hand, I have seen a great many papers where the work was essentially worthless because of inadequate modelling of the problem. Naturally, the theorem prover cannot check that a model is realistic, but I frequently hear the suggestion that nothing can go wrong if you use a theorem prover. The community needs to focus more on ways of making models trustworthy. There are probably quite a few interesting research topics along those lines.

I agree that there are more important issues that require more attention.
I'm not advocating endless fixation with cosmic rays and the like, but just
a healthy degree of concern, especially when the problems can be easily
addressed.

We should remember that at the moment it is very easy for someone to
maliciously produce what appears to be a formal proof of some statement but
is in fact something that takes advantage of the sorts of pretty printer
problems I am talking about. Now if theorem provers are ever going to make
it big, we are then in the situation where commercial pressures, etc, are
pushing people to claim they have proved something when they have in fact
not. One of the important roles of a theorem prover should be as a highly
trustworthy tool to support a human proof auditor checking that some huge
10,000 line proof of a theorem does indeed establish the theorem.

Mark.

Larry Paulson

On 18 Jan 2011, at 11:55, <mark@proof-technologies.com> wrote:

on 17/1/11 11:14 PM, Makarius <makarius@sketis.net> wrote:

There are other, more pressing problems.

I think trustworthiness in theorem provers should always be a top priority. After all, this is why we are using them in the first place. So long as it doesn't have terrible consequences on other aspects of the theorem prover, such as efficiency or usability, that is.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: mark@proof-technologies.com
on 18/1/11 2:01 AM, Tjark Weber <webertj@in.tum.de> wrote:

As long as you assume a non-malicious user who wants to create a valid
machine-checked proof, it is not much of an issue.

Yes, but I don't think we can assume a non-malicious user! What if someone
wants to claim they have proved something in order to get paid?
Subcontracting out proofs could be a serious business in the future.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: Tjark Weber <webertj@in.tum.de>
And a malicious user, by misusing ML in Isabelle theory files, couldn't
replace the functions that you would use to inspect internal
certificates -- or the entire inference kernel; or even the entire
Isabelle process -- with his own implementation?

Kind regards,
Tjark

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: Makarius <makarius@sketis.net>
Yes, depending on how much criminal energy you want to invest.

Actually, this is my favourite theoretical attack on LCF prover integrity.
After "The Matrix" it also has a name, see
http://en.wikipedia.org/wiki/Blue_Pill_%28malware%29

We could harden Isabelle/ML against this, since the ML environment is
under our control. Since this has no practical relevance, though, it has
not been done so far. It is important to keep focused on real problems.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 17:00):

From: Lawrence Paulson <lp15@cam.ac.uk>
Once you assume malicious users, you are turning theorem-proving software into security software. The time for this may come, but it will require a significant investment in security engineering. The first step would be to develop a realistic threat model. And this isn't really possible in the absence of actual attacks. It will be interesting to see if there are any over the coming years.

Larry

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: mark@proof-technologies.com
on 18/1/11 2:28 PM, Makarius <makarius@sketis.net> wrote:

On Tue, 18 Jan 2011, Tjark Weber wrote:

...

And a malicious user, by misusing ML in Isabelle theory files, couldn't
replace the functions that you would use to inspect internal
certificates -- or the entire inference kernel; or even the entire
Isabelle process -- with his own implementation?

Yes, depending how much criminal energy you want to invest.

Like the printer problems I mention, I also see these ML trojan horse issues as important, easy to address, and yet unaddressed. The ML interpreter should support being able to block
overwriting in some way or other. E.g. supply the interpreter with a
datatype or a function name that cannot be overwritten. It isn't exactly
rocket science, and it removes a risk. I just don't understand why nothing
has been done about this.
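
A string-level SML illustration of the kind of blocking meant here (a sketch only, with made-up names; a real solution would hook the compiler's name space, as Makarius describes below for Poly/ML, rather than scan source text):

(* names that user ML must not rebind; the list is hypothetical *)
val protected = ["Thm", "pp_term"]

fun rebinds src name =
  String.isSubstring ("structure " ^ name) src orelse
  String.isSubstring ("fun " ^ name) src orelse
  String.isSubstring ("val " ^ name) src

(* refuse to pass on user ML source that rebinds a protected name *)
fun vet src =
  case List.find (rebinds src) protected of
      SOME name => raise Fail ("attempt to rebind protected name: " ^ name)
    | NONE => src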

Actually, this is my favourite theoretical attack on LCF prover integrity.
After "The Matrix" it also has a name, see
http://en.wikipedia.org/wiki/Blue_Pill_%28malware%29

We could harden Isabelle/ML against this, since the ML environment is
under our control. Since this has no practical relevance, though, it has
not been done so far. It is important to keep focused on real problems.

How could this be done?

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: Makarius <makarius@sketis.net>
David Matthews kindly provides ways in Poly/ML to manage the whole compilation process in user space, including printing and binding of the results. One would merely have to protect against shadowing of certain
structure names, say.

When you showed me an early version of HOL0 in Cambridge ITP 2009, I
have already pointed out that OCaml is not the best platform for that (due
to its mutable strings and silently overflowing machine words called
"int").

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: mark@proof-technologies.com
on 18/1/11 7:10 PM, Makarius <makarius@sketis.net> wrote:

....

David Matthews kindly provides ways in Poly/ML to manage the whole compilation process in user space, including printing and binding of the results. One would merely have to protect against shadowing of certain structure names, say.

Oh good. This is excellent news. How long has this been available for?

A related problem is overwriting a datatype's pretty printer. Do you know
if Poly/ML can protect against this?

When you showed me an early version of HOL0 in Cambridge ITP 2009, I
have already pointed out that OCaml is not the best platform for that (due
to its mutable strings and silently overflowing machine words called
"int").

Yes you did (was it Montreal ITP 2008?). HOL Zero uses a technique to avoid
problems with mutable strings (unlike HOL Light) and doesn't use machine
words for its representation of natural number numerals (like HOL Light).

But you are right - OCaml is not the best language. Larry convinced me a
few months ago that SML is simply more well-defined. I plan to port it to
SML as a priority as soon as version 1.0 is out. But this port might not be
until next year.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: Makarius <makarius@sketis.net>
Maybe 2 years. It was one of the many improvements that David Matthews
did for parallel ML and ML IDE support.

This is where the Isabelle/ML compiler invocation happens:
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2009-2/src/Pure/ML/ml_compiler_polyml-5.3.ML
It is basically the Isabelle/Isar toplevel running ML, with toplevel name
space management and pretty printing under our control. Without that
parallel Isabelle would hardly work in practice.

Makarius

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: Gerwin Klein <gerwin.klein@nicta.com.au>
I don't think this is a problem in practice. Theorem provers are already used in big certification projects. The evaluators in such projects usually know what they are doing, at least up to the level where they would easily be able to spot attempts to make syntactically misleading proof statements. It's easy enough to scan for ML blocks or similar in theory files, spot duplicate identifiers, etc, and demand good explanations for their presence (or just forbid them). And the threat of just using a proof checker is always there, so why try to cheat on syntax?

I'm with Larry on this: the much bigger problem is to convince yourself that the model and final theorem are in any way useful or right. This is where cheating will happen much earlier.

Cheers,
Gerwin

view this post on Zulip Email Gateway (Aug 18 2022 at 17:01):

From: Michael Norrish <Michael.Norrish@nicta.com.au>
If the contractor's product is being audited sufficiently carefully, the auditor will notice the dubious calls to the pretty-printing/bare-ML infrastructure that were required to get the fake result looking acceptable.

I really do not believe that this is an interesting issue.

If the terms are really so large that printing them in "raw" form is likely to be a problem, then they will be just as incomprehensible in pretty form, and just as unreliable when checked by eye-balling. In that situation, the client will presumably check their supplier's work by doing something like

if aconv myterm (concl suppliers_thm) then print "OK"
else print "FAILURE"

And voilà, the pretty-printing "issue" disappears.

Michael

view this post on Zulip Email Gateway (Aug 18 2022 at 17:02):

From: Michael Norrish <Michael.Norrish@nicta.com.au>
And the auditor failing to spot such malfeasance would be barely worth the name.

Michael

view this post on Zulip Email Gateway (Aug 18 2022 at 17:02):

From: mark@proof-technologies.com
on 18/1/11 10:22 PM, Gerwin Klein <gerwin.klein@nicta.com.au> wrote:

We should remember that at the moment it is very easy for someone to
maliciously produce what appears to be a formal proof of some statement
...

I don't think this is a problem in practice. Theorem provers are already used in big certification projects. The evaluators in such projects usually know what they are doing,

How can you be so confident about this? In the large, safety-critical certification projects I have been involved with, the evaluators do the best they can with the tools they have available. Proof checkers with readable and highly trustworthy output are a luxury not available to them.

at least up to the level where they would easily be able to spot attempts to make syntactically misleading proof statements. It's easy enough to scan for ML blocks or similar in theory files, spot duplicate identifiers, etc, and demand good explanations for their presence (or just forbid them). And the threat of just using a proof checker is always there, so why try to cheat on syntax?

The input in this process is the full power of an ML program. We all know
that it is possible to hide incredibly subtle things in huge programs. Are
we to ban non-trivial ML in the process of producing a proof? This would be
forcing contractors to do their work with one hand tied behind their back.
The contracted, safety-critical proof work I have been involved with would
certainly have been completely infeasible without writing large amounts of
ML program to support my proof work.

It is far better for the auditor if they can treat the ML code that
establishes the final theorem as some sort of black box. To do this the
theorem prover needs certain basic properties, including a trustworthy
printer.

I'm with Larry on this: the much bigger problem is to convince yourself that the model and final theorem are in any way useful or right. This is where cheating will happen much earlier.

I did not say that this was not a problem! This is the big problem! And
this is the problem that I am advocating needs a tool that properly supports
the process - the process of determining that the model and final theorem
are right (as well as that the final theorem has actually been proved).
This process involves using the pretty printer.

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:02):

From: mark@proof-technologies.com
on 18/1/11 10:52 PM, Michael Norrish <Michael.Norrish@nicta.com.au> wrote:

On 19/01/11 00:11, mark@proof-technologies.com wrote:
...

Yes, but I don't think we can assume a non-malicious user! What if
someone wants to claim they have proved something in order to get
paid? Subcontracting out proofs could be a serious business in the
future.

If the contractor's product is being audited sufficiently carefully, the auditor will notice the dubious calls to the pretty-printing/bare-ML infrastructure that were required to get the fake result looking acceptable.

The input to the process is raw ML code, and huge amounts of it. Almost
anything can be hidden in this, unless very strict coding standards are
enforced. It is a far nicer situation for the auditor to be able to treat
the ML code more-or-less as a black box, and review the state of the theorem
prover after it has supposedly established the end-result theorem. I am
advocating a tool that makes life for the auditor easier. A tool with a
pretty printer that can be trusted.

If the terms are really so large that printing them in "raw" form is likely to be a problem, then they will be just as incomprehensible in pretty form,

You appear to be claiming that concrete syntax printers are of no practical
use to the human reader.

and just as unreliable when checked by eye-balling. In that situation, the client will presumably check their supplier's work by doing something like

if aconv myterm (concl suppliers_thm) then print "OK"
else print "FAILURE"

And voilà, the pretty-printing "issue" disappears.

Let's assume we don't trust the parser. How do we know that 'myterm' is
correct? We need to review definitions and terms, and this is best done
using a trustworthy pretty printer.

I never thought that advocating a trustworthy pretty printer would be so
controversial!!!

Mark.

view this post on Zulip Email Gateway (Aug 18 2022 at 17:02):

From: Michael Norrish <Michael.Norrish@nicta.com.au>
It's just not an interesting problem. We can solve it trivially by writing everything out in sexp-like notation. ACL2 is demonstrably an acceptable format for security-critical validation, and this is all the pretty-printing they have.

Moreover, with proof terms (in Isabelle) and things like OpenTheory for the 3 HOLs, we can avoid having to look at clients' ML code.

Michael

