Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Problems with cong-rules for parser


view this post on Zulip Email Gateway (Aug 18 2022 at 15:32):

From: René Thiemann <rene.thiemann@uibk.ac.at>
Dear all,

we currently have problems setting up the right cong-rules for our
parser such termination is provable.

Here, the major problems are parsers which call themselves twice in a
row.

Consider the following theory:

theory Test imports Main
begin

types
('t, 'a) parser = "'t list => string + ('a x 't list)"

take a token list and produce a string error-message or a pair of
result and remaining token list

(state-monad with errors)

now we can define the standard return and bind operations

abbreviation
return :: "'a => ('t, 'a)parser"
where
"return x = (%ts. Inr (x, ts))"

definition
bind :: "('t, 'a) parser => ('a => ('t, 'b) parser) => ('t, 'b)
parser"
where
"bind m f ts = (case m ts of
Inl e => Inl e
| Inr (x,ts') => f x ts')"

consuming one character

fun consume :: "('t,'t)parser"
where "consume [] = Inl ''error''"
| "consume (t # ts) = Inr (t, ts)"

now, here is the difficult function:

function foo :: "('t, 't list)parser"
where "foo ts = (bind consume
(% t. bind foo
(% us. bind foo
(% vs. return (t # us @ vs))))) ts"
by pat_completeness auto
termination
proof
(* impossible, since there is no link between input ts and the list
of tokens that is used for recursive foo-calls *)
oops

so, let us add some congruence rule:

lemma bind_cong[fundef_cong]:
fixes m1 :: "('t,'a)parser"
assumes "m1 ts1 = m2 ts1" and "!! y ts. m1 ts1 = Inr (y,ts) ==> f1
y ts = f2 y ts" and "ts1 = ts2"
shows "((bind m1 f1) ts1) = ((bind m2 f2) ts2)"
using assms unfolding bind_def by (cases "m1 ts1", auto)

now, indeed a simplied version of foo can be proven to be terminating
(if there is only one recursive call)

function foo' :: "('t, 't list)parser"
where "foo' ts = (bind consume
(% t. bind foo'
(% vs. return (t # vs)))) ts"
by pat_completeness auto
termination by ... tedious reasoning

but, if we want to have the full version which two recursive calls,
then not even the function
definition is accepted:

function foo'' :: "('t, 't list)parser"
where "foo'' ts = (bind consume
(% t. bind foo''
(% us. bind foo''
(% vs. return (t # us @ vs))))) ts"
(*
exception THM 0 raised (line 709 of "thm.ML"): implies_elim: major
premise
*)

Does someone have an idea, what went wrong in this case?

Best regards,
René

PS: The small theory file is attached.
Test.thy

view this post on Zulip Email Gateway (Aug 18 2022 at 15:32):

From: Tobias Nipkow <nipkow@in.tum.de>
Not an answer but a thought: maybe result checking can simplify your
life? Maybe you can write your parser directly in ML and verify each run
of the parser by unparsing and comparison with the input?

Tobias

René Thiemann schrieb:

view this post on Zulip Email Gateway (Aug 18 2022 at 15:32):

From: Tobias Nipkow <nipkow@in.tum.de>
René Thiemann schrieb:

Not an answer but a thought: maybe result checking can simplify your
life? Maybe you can write your parser directly in ML and verify each run
of the parser by unparsing and comparison with the input?

Yes and no: We do unparsing and comparison in any way, however, we
cannot write
our parser in ML as we want to have the parser also in the
target-language from
the code-generator (as the generated verifier has to parse), or can the
code-generator
generate (Haskell)-code from ML (in the new release)?

I see. No, Haskell code from ML code is not possible.

Tobias

Hence, we need a parser in HOL which does not have to satisfy any
properties since
comparison against the input is performed. However, for the
code-generator we need
termination of all functions we would like to export (including the
parser).
And since termination of the parser is currently the only axiom we use,
we would
like to get rid of this axiom by actually proving termination.

Best,
René

Dear all,

we currently have problems setting up the right cong-rules for our
parser such termination is provable.

Here, the major problems are parsers which call themselves twice in a
row.

Consider the following theory:

theory Test imports Main
begin

types
('t, 'a) parser = "'t list => string + ('a x 't list)"

take a token list and produce a string error-message or a pair of result
and remaining token list

(state-monad with errors)

now we can define the standard return and bind operations

abbreviation
return :: "'a => ('t, 'a)parser"
where
"return x = (%ts. Inr (x, ts))"

definition
bind :: "('t, 'a) parser => ('a => ('t, 'b) parser) => ('t, 'b) parser"
where
"bind m f ts = (case m ts of
Inl e => Inl e
| Inr (x,ts') => f x ts')"

consuming one character

fun consume :: "('t,'t)parser"
where "consume [] = Inl ''error''"
| "consume (t # ts) = Inr (t, ts)"

now, here is the difficult function:

function foo :: "('t, 't list)parser"
where "foo ts = (bind consume
(% t. bind foo
(% us. bind foo
(% vs. return (t # us @ vs))))) ts"
by pat_completeness auto
termination
proof
(* impossible, since there is no link between input ts and the list
of tokens that is used for recursive foo-calls *)
oops

so, let us add some congruence rule:

lemma bind_cong[fundef_cong]:
fixes m1 :: "('t,'a)parser"
assumes "m1 ts1 = m2 ts1" and "!! y ts. m1 ts1 = Inr (y,ts) ==> f1 y
ts = f2 y ts" and "ts1 = ts2"
shows "((bind m1 f1) ts1) = ((bind m2 f2) ts2)"
using assms unfolding bind_def by (cases "m1 ts1", auto)

now, indeed a simplied version of foo can be proven to be terminating
(if there is only one recursive call)

function foo' :: "('t, 't list)parser"
where "foo' ts = (bind consume
(% t. bind foo'
(% vs. return (t # vs)))) ts"
by pat_completeness auto
termination by ... tedious reasoning

but, if we want to have the full version which two recursive calls, then
not even the function
definition is accepted:

function foo'' :: "('t, 't list)parser"
where "foo'' ts = (bind consume
(% t. bind foo''
(% us. bind foo''
(% vs. return (t # us @ vs))))) ts"
(*
exception THM 0 raised (line 709 of "thm.ML"): implies_elim: major
premise
*)

Does someone have an idea, what went wrong in this case?

Best regards,
René

PS: The small theory file is attached.

view this post on Zulip Email Gateway (Aug 18 2022 at 15:32):

From: Alexander Krauss <krauss@in.tum.de>
Hi René,

we currently have problems setting up the right cong-rules for our
parser such termination is provable.
[...]

Does someone have an idea, what went wrong in this case?

I think your cong rule is slightly wrong. When I change it to

lemma bind_cong[fundef_cong]:
fixes m1 :: "('t,'a)parser"
assumes "m1 ts2 = m2 ts2"
and "\<And> y ts. m2 ts2 = Inr (y,ts) ==> f1 y ts = f2 y ts"
and "ts1 = ts2"
shows "((bind m1 f1) ts1) = ((bind m2 f2) ts2)"

then the function definition works for me. (I haven't attempted a
termination proof, but the proof obligation looks reasonable.)

The difference is that you have to refer to the "2"-variables in the
assumptions of the branches, cf. eg. if_cong.

It would be nice if there was a helpful error at the time where you
declare the rule, but I do not know what the general check would have to
be... The format of cong-rules isn't really well-defined, except for the
way they are used internally.

Since you are working with a state-exception monad, the termination
proof isn't actually crucial: I recently prototyped a function
definition facility for monadic functions of this kind which needs no
termination proof at all. You may be interested in this, since it fits
perfectly to your problem (the Ceta-Parser is actually mentioned in the
paper as an application):
http://www4.in.tum.de/~krauss/mrec/
However, the tool is not quite ready for integration yet, but if you are
not in a hurry, the problem might solve itself in a few months :-)

Alex

view this post on Zulip Email Gateway (Aug 18 2022 at 15:33):

From: René Thiemann <rene.thiemann@uibk.ac.at>

Not an answer but a thought: maybe result checking can simplify your
life? Maybe you can write your parser directly in ML and verify each
run
of the parser by unparsing and comparison with the input?

Yes and no: We do unparsing and comparison in any way, however, we
cannot write
our parser in ML as we want to have the parser also in the target-
language from
the code-generator (as the generated verifier has to parse), or can
the code-generator
generate (Haskell)-code from ML (in the new release)?

Hence, we need a parser in HOL which does not have to satisfy any
properties since
comparison against the input is performed. However, for the code-
generator we need
termination of all functions we would like to export (including the
parser).
And since termination of the parser is currently the only axiom we
use, we would
like to get rid of this axiom by actually proving termination.

Best,
René

Dear all,

we currently have problems setting up the right cong-rules for our
parser such termination is provable.

Here, the major problems are parsers which call themselves twice in
a row.

Consider the following theory:

theory Test imports Main
begin

types
('t, 'a) parser = "'t list => string + ('a x 't list)"

take a token list and produce a string error-message or a pair of
result
and remaining token list

(state-monad with errors)

now we can define the standard return and bind operations

abbreviation
return :: "'a => ('t, 'a)parser"
where
"return x = (%ts. Inr (x, ts))"

definition
bind :: "('t, 'a) parser => ('a => ('t, 'b) parser) => ('t, 'b)
parser"
where
"bind m f ts = (case m ts of
Inl e => Inl e
| Inr (x,ts') => f x ts')"

consuming one character

fun consume :: "('t,'t)parser"
where "consume [] = Inl ''error''"
| "consume (t # ts) = Inr (t, ts)"

now, here is the difficult function:

function foo :: "('t, 't list)parser"
where "foo ts = (bind consume
(% t. bind foo
(% us. bind foo
(% vs. return (t # us @ vs))))) ts"
by pat_completeness auto
termination
proof
(* impossible, since there is no link between input ts and the list
of tokens that is used for recursive foo-calls *)
oops

so, let us add some congruence rule:

lemma bind_cong[fundef_cong]:
fixes m1 :: "('t,'a)parser"
assumes "m1 ts1 = m2 ts1" and "!! y ts. m1 ts1 = Inr (y,ts) ==> f1 y
ts = f2 y ts" and "ts1 = ts2"
shows "((bind m1 f1) ts1) = ((bind m2 f2) ts2)"
using assms unfolding bind_def by (cases "m1 ts1", auto)

now, indeed a simplied version of foo can be proven to be terminating
(if there is only one recursive call)

function foo' :: "('t, 't list)parser"
where "foo' ts = (bind consume
(% t. bind foo'
(% vs. return (t # vs)))) ts"
by pat_completeness auto
termination by ... tedious reasoning

but, if we want to have the full version which two recursive calls,
then
not even the function
definition is accepted:

function foo'' :: "('t, 't list)parser"
where "foo'' ts = (bind consume
(% t. bind foo''
(% us. bind foo''
(% vs. return (t # us @ vs))))) ts"
(*
exception THM 0 raised (line 709 of "thm.ML"): implies_elim: major
premise
*)

Does someone have an idea, what went wrong in this case?

Best regards,
René

PS: The small theory file is attached.

view this post on Zulip Email Gateway (Aug 18 2022 at 15:33):

From: René Thiemann <rene.thiemann@uibk.ac.at>
Hi Alex,

I think your cong rule is slightly wrong. When I change it to

lemma bind_cong[fundef_cong]:
fixes m1 :: "('t,'a)parser"
assumes "m1 ts2 = m2 ts2"
and "\<And> y ts. m2 ts2 = Inr (y,ts) ==> f1 y ts = f2 y ts"
and "ts1 = ts2"
shows "((bind m1 f1) ts1) = ((bind m2 f2) ts2)"

then the function definition works for me. (I haven't attempted a
termination proof, but the proof obligation looks reasonable.)

The difference is that you have to refer to the "2"-variables in the
assumptions of the branches, cf. eg. if_cong.

thanks, that was it. Now termination of the XML-parser is proven.

It would be nice if there was a helpful error at the time where you
declare the rule, but I do not know what the general check would
have to be... The format of cong-rules isn't really well-defined,
except for the way they are used internally.

:-) I never would have seen from the error message, that I just would
have to swap the 1's and 2's.

Since you are working with a state-exception monad, the termination
proof isn't actually crucial: I recently prototyped a function
definition facility for monadic functions of this kind which needs
no termination proof at all. You may be interested in this, since it
fits perfectly to your problem (the Ceta-Parser is actually
mentioned in the paper as an application):
http://www4.in.tum.de/~krauss/mrec/
However, the tool is not quite ready for integration yet, but if you
are not in a hurry, the problem might solve itself in a few months :-)

okay, nice to know. Perhaps we'll wait a bit before proving
termination of the full CPF-parser.

René


Last updated: May 06 2024 at 16:21 UTC