[isabelle] Monadic do notation for Isabelle · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Monadic do notation for Isabelle

Email Gateway (Aug 18 2022 at 15:18):

From: Christian Sternagel <christian.sternagel@uibk.ac.at>
Hi there,

I wrote a theory (plus accompanying ML file) that allows to use
do-notation (as in Haskell and where braces and semicolons are
mandatory) for arbitrary monads (in this case, arbitrary bind- and
then-functions), after a setup via:

setup {*
add_monad "List" {
monad_bind = @{term "%xs f. concat (map f xs)"},
monad_then = NONE
}
*}

(Here "monad_then = NONE" means that "bind m (%_. n)" is used instead of
"then m n".)

setup {* use_monad "List" *}

Now, we can write, e.g.,

do {
let xs = [0..2];
x <- xs;
y <- xs;
[(x, y)]
}

to obtain the list [(0, 0), (0, 1), (0, 2), ..., (2, 2)].

So far so good :). In this setup it is not possible to combine different
monads. Consider for example the following option monad:

setup {*
add_monad "Option" {
monad_bind = @{term "%m f. Option.map f m"},
monad_then = NONE
}
*}

I would like to be able to write, e.g.,

do {
let xs = [Some 0, Some 1, None];
x <- xs;
y <- xs;
let r = do {
m <- x;
n <- y;
Some (x + y)
};
return [r]
}

which is an admittedly contrived example. My first thought, was to give
"do" an optional argument (the name of the monad that should be used), as in

do List { ... do Option { ... } ... }

This brings me to my actual question ;)

If I would use Theory_Data to store the "current monad", my
parse_translations would have to modify the current theory. I assume,
this is bad (furthermore, I have no clue how to do this :)). Would it be
okay to use a reference for this? What about potential problems?

cheers

chris

Email Gateway (Aug 18 2022 at 15:18):

From: Brian Huffman <brianh@cs.pdx.edu>
Hi Christian,

I can see that your parse_translation might need to read information
from the current theory, but I don't see why it would have to modify
the current theory. The parse_translation for case syntax (function
case_tr in HOL/Tools/Datatype/datatype_case.ML) probably uses all the
relevant techniques that you would need to implement your generic
monadic syntax: it reads information about constructors from theory
data, and translates the case branches (built with the "_case1" and
"_case2" syntactic constructors) into the appropriate case
combinators. It does not modify any theory data during parsing.

I'm not sure how your current parse_translation is implemented, but
here's the approach I would take: First, define some syntactic
constructors for the do-syntax, similar to how the case syntax is
defined in HOL.thy:

nonterminals dobinds

syntax
"_do1" :: "'a => dobinds" ("_")
"_do2" :: "[pttrn, 'a, dobinds] => dobinds" ("_ <- _;/ _")
"_do_syntax" :: "[string, dobinds] => 'a" ("do _ {_}")

Then define a parse_translation for "_do_syntax" that would parse the
list of "_do2" and "_do1" constructors, then choose an appropriate
monad instance based on the value of the string argument (obtaining
the table of monad instances from the theory data).

Alternatively you could omit the "string" argument to "_do_syntax" and
look up the monad instance based on the type of the last statement of
the do-block (except I'm not sure whether the type information would
be available yet). Or you could do a lookup based on what the
top-level constant of the last statement is, e.g. a term of the form
[_] indicates the list monad, and Some _ indicates the option monad.

Brian

Email Gateway (Aug 18 2022 at 15:19):

From: Makarius <makarius@sketis.net>
This is going in the right direction.

Also note that Scala "for" comprehensions are officially explained as a
certain syntax translation on untyped expressions, which are later run
through the regular type-inference.

If it turns out that you do need type information for your tanslation you
can organize it like that:

* The translation functions (advanced parse translations etc.) only take
care of producing proper binders.

* Further transformations within the type-checking phase are done by
your own "check" operation, which is installed via Syntax.add_term_check

The latter gives you an opportunity to participate in a simultaneous
process to infer full type information. The regular Hindley-Milner type
inference is only one part of that. This relatively new and advanced
abstract syntax technology is still not use very much, but could open a
chance to get things really right that did not work so well with plain old
macros (aka translations).

Makarius

Email Gateway (Aug 18 2022 at 15:19):

From: Alexander Krauss <krauss@in.tum.de>
Hi Chris,

I am not a syntax wizard, so I cannot help much with the actual task,
but note that there are two monads in the library already, each with its
own syntax setup for do notation:

HOL/Library/State_Monad.thy
HOL/Imperative_HOL/Heap_Monad.thy

You might want to look at them if you haven't already. In the AFP there
is also MiniML/Maybe.thy, but with a rather basic syntax.

If you manage to unify all this somehow, this would be a really cool thing.

Alex

Christian Sternagel wrote:

Email Gateway (Aug 18 2022 at 15:19):

From: Christian Sternagel <christian.sternagel@uibk.ac.at>
On 05/19/2010 10:22 AM, Makarius wrote:

The translation functions (advanced parse translations etc.) only take
care of producing proper binders.
I am not sure what you mean by that... currently I first introduce:

nonterminals
do_expr

syntax
...
"_do_bind" :: "pttrn => 'a => do_expr => do_expr"
("_ <- _;//_") [1000, 13, 12] 12)
...

consts
...
bindM :: "'a => ('b => 'c) => 'c"
...

translations
...
"_do_bind x m n" => "CONST bindM m (%x n)"
...

and then use Syntax.add_term_check to register a transformation from
this general constants to specific ones (that have to be registered
first by e.g.,
setup {* Monad.add_monad "option" "option_bind" "Some" *}), depending on
the type.

Further transformations within the type-checking phase are done by
your own "check" operation, which is installed via Syntax.add_term_check

1) Are those term_checks applied bottom-up to all subterms or do I have
to use a recursive function that will be applied at the root?

2) Further, I saw the "stage" parameter. Do I assume correctly that the
stage gives the order of the checks and that type checking is done at
stage 0?

3) Is add_term_uncheck the symmetric operation before printing?

4) I find this framework very convenient :D. With few lines of code I
seem to have a working solution (for the parsing direction and modulo
the fact that it is currently necessary to provide typing contraints at
the "right" places, but that is due to my lack of knowledge about how to
use types properly in Isabelle programming).

cheers

chris

Email Gateway (Aug 18 2022 at 15:19):

From: Makarius <makarius@sketis.net>
On Wed, 19 May 2010, Christian Sternagel wrote:

Further transformations within the type-checking phase are done by
your own "check" operation, which is installed via Syntax.add_term_check

1) Are those term_checks applied bottom-up to all subterms or do I have to
use a recursive function that will be applied at the root?

2) Further, I saw the "stage" parameter. Do I assume correctly that the stage
gives the order of the checks and that type checking is done at stage 0?

3) Is add_term_uncheck the symmetric operation before printing?

Each check or uncheck operation operates on the whole textual situation,
i.e. a context and a collection of terms to be treated simultaneously.
Thus you do not only get the root of a single term, but the full chunk of
text to be processed, i.e. the super-root. You can also keep information
in the result context for later -- in the very end it is discarded,
though.

Stages progress from smaller to larger integer values for check, and
reverse for uncheck. At stage 0 there are the "standard" check/uncheck
phases, there is also a final "fixate" check at stage 100 to finalize
type-inference parameters that are left over. Anything else can be defined
in the user context, and arbitrary things can be done here, i.e. this
powerful mechanism needs to be used with some care.

Since the whole process iterates repeatedly over all stages, your part
might get invoked multiple times, until a global fixed-point is reached
(if it ever happens). This is a bit like a simplifier on abstract syntax,
but without any builtin strategy to traverse the terms.

4) I find this framework very convenient :D. With few lines of code I
seem to have a working solution (for the parsing direction and modulo
the fact that it is currently necessary to provide typing contraints at
the "right" places, but that is due to my lack of knowledge about how to
use types properly in Isabelle programming).

Somehow the standard type checking at phase 0 is special: it takes raw
parser output (where all names and binder scopes are already resolved, but
lacks type information apart from constraints that happen to be in the
input), and produces fully types terms according to the usual
Hindley-Milner scheme. If you act before, you won't get proper type
information, but afterwards most of it is already finished. Typically,
later check phases merely "improve" inferred types by specializing them,
as is usually done in advanced Haskell-style systems.

Case 1: You operate after stage 0. This means that your concrete syntax
needs to be mapped to auxiliary combinators etc. that will be accepted by
the standard type check. That result can then be expanded to suitable
logical constants by your function. In other words you need to be able to
model your monad language using simply-typed higher-order abstract syntax.

Case 2: You operate before stage 0. This means you merely rearrange the
user input schematically such that the later stages will synthesize just
the right types accidentally. This is a bit like "for" in Scala, where
the compiler expands nested comprehensions to certain combinators in an
untyped fashion. (Maybe they now have some papers on the details, I did
not find any 1-2 years ago. There are some basic explanations in the
Scala book by Odersky-et-al.)

Makarius

Email Gateway (Aug 18 2022 at 15:20):

From: Makarius <makarius@sketis.net>
On Fri, 21 May 2010, Christian Sternagel wrote:

Hence here is just my check function:

fun check_monad ts ctxt =
let

| subst_monad_app ctxt (Const (@{const_name "thenM"}, T), [m, n]) =
(case add_monad_type ctxt n (add_monad_type ctxt m []) of
typ :: _ =>
let
val used = Term.add_free_names n [];
val v = Free (Name.variant used "x", dummyT);

At this point it is important to use the proper context to invent fresh
names, not just the terms you hold in your hand. Term.add_free_names and
Name.variant belong to a very old layer of the system. Use
Variable.variant_frees or similar -- its implementation may even serve as
an example how it works with contexts:

fun variant_frees ctxt ts frees =
let
val names = Variable.names_of (fold Variable.declare_names ts ctxt);
val xs = fst (Name.variants (map #1 frees) names);
in xs ~~ map snd frees end;

I) Currently I'm going for Case 2 (of makarius suggestion) and
registered the check function at stage 0.

I recommend to use a stage number <> 0. While there are standard policies
about precedence of plugins at the same stage, it is better to make your
intention fully clear.

I've heard that you have made some progress off-list. If there is a nice
result emerging here, it is worth posting it again.

Makarius

Email Gateway (Aug 18 2022 at 15:20):

From: Christian Sternagel <christian.sternagel@uibk.ac.at>
Hi again!

I am still tinkering with the monad notation stuff, and since I have now
spent quite some time, I am not willing to give up anymore :D

I thought that some code would be good this time, but I'm not sure
whether I am allowed to send attachments to the mailing list? Hence here
is just my check function:

fun check_monad ts ctxt =
let
fun map_default f xs =
let val ys = map f xs in
if exists is_some ys
then SOME (map2 the_default xs ys)
else NONE
end;

fun subst_monad_app ctxt (Const (@{const_name "returnM"}, T), [x]) =
let val typ = range_type T
in if is_monad_type ctxt typ
then SOME (Const (get_return typ ctxt, dummyT) $ x)
else more_err ()
end
| subst_monad_app ctxt (Const (@{const_name "returnM"}, T), []) =
let val typ = range_type T
in if is_monad_type ctxt typ
then SOME (Const (get_return typ ctxt, dummyT))
else more_err ()
end
| subst_monad_app ctxt (Const (@{const_name "bindM"}, _), [m, f]) =
(case add_monad_fun_type ctxt f (add_monad_type ctxt m []) of
typ :: _ =>
SOME (Const (get_bind typ ctxt, dummyT) $ m $ f)
| [] => more_err ())
| subst_monad_app ctxt (Const (@{const_name "thenM"}, T), [m, n]) =
(case add_monad_type ctxt n (add_monad_type ctxt m []) of
typ :: _ =>
let
val used = Term.add_free_names n [];
val v = Free (Name.variant used "x", dummyT);
in
SOME (Const (get_bind typ ctxt, dummyT) $ m $ lambda v n)
end
| [] => more_err ())
| subst_monad_app ctxt (t, ts) =
(case map_default (subst_monad ctxt) ts of
SOME ts' => SOME (list_comb (t, ts'))
| NONE => NONE)
and subst_monad ctxt (Abs (v, T, t)) =
(case subst_monad ctxt t of
SOME t' => SOME (Abs (v, T, t'))
| NONE => NONE)
| subst_monad ctxt t = subst_monad_app ctxt (strip_comb t);
in
map_default (subst_monad ctxt) ts
|> Option.map (rpair ctxt)
end;

(Here, more_err () just tells the user that more type constraints are
needed in order to infer the current monad type.)

I) Currently I'm going for Case 2 (of makarius suggestion) and
registered the check function at stage 0. When I add a type contraint
around a do block, this does always work, however I am not pleased with
the few examples that work without any type constraints.

The tricky part is "bindM" (and "thenM" is analogeous). Here I know that
the arguments m and f (which are of type "'a m" and "'a => 'b m" for
some monad type "m") are already checked. Now, I want to find out, which
actual bind method I should use instead of "bindM" (the function
get_bind returns the corresponding constant for a given type in a given
context).

The choice is based on the functions "add_monad_type" (for m) and
"add_monad_fun_type" (for f).

I have mainly tried 2 versions (which do both not work in all cases):

1) My first try was to collect all types occurring somewhere in a given
term (by folding a function that collects all Type's from a given typ,
over all typs of a term) that are registered as monadic. But then I get
typing problems when more than one monad type occurs inside a term, e.g.,

do {
x <- map Some [0..2];
return x
}

assuming that "list" and "option" are registered as monadic, my function
finds "'a option" before "'a list" and hence substitutes "option_bind",
when "list_bind" was needed. It is not immediately clear to me which
strategy I should use for finding monad types.

2) In the other extreme I do just check for the fastype of m and the
range type of the fastype of f to be monadic... but then I cannot handle
cases like, e.g.,

do {
x <- xs;
y <- [0..2];
return (x, y)
}

although, the list monad is the only possibility.

So I would need a better way to check for the current monad type, based
on m and f.

A further problem are cases like

do {
let xs = [0 .. 2];
x <- xs;
y <- ys;
return (x, y)
}

which is parsed to (using the infix >>= instead of "bindM"):

(%xs. xs >>= (%x. ys >>= (%y. returnM (x, y)))) [0 .. 2]

So at the actual occurrences of >>= I would have to know the type of the
bound variable xs. But I do not know how to handle this in Isabelle
(since bound variables do not contain types).

I do appreciate any comments and/or suggestions.

II) I also have some design questions concerning printing for you :)

I would like to allow both, the combinator notation using >>=, >>, and
return; and the do-notation. As input this is easy: just provide
notation for bindM, thenM, and returnM

notation
bindM (infixl ">>=" 54) and ...

For printing it would be nice to let the user decide (or do you have
better suggestions?). Can this be handled with 'modes'? Also, return
should only be used when "monad printing" is on, since, seeing "return
x" everywhere instead of "Some x" (for the option monad) or "Inr x" (for
the error monad) does not seem to be nice.

cheers

chris

Email Gateway (Aug 18 2022 at 15:20):

From: Christian Sternagel <christian.sternagel@uibk.ac.at>
On 05/21/2010 09:11 PM, Makarius wrote:

On Fri, 21 May 2010, Christian Sternagel wrote:

Hence here is just my check function:

fun check_monad ts ctxt =
let

| subst_monad_app ctxt (Const (@{const_name "thenM"}, T), [m, n]) =
(case add_monad_type ctxt n (add_monad_type ctxt m []) of
typ :: _ =>
let
val used = Term.add_free_names n [];
val v = Free (Name.variant used "x", dummyT);

At this point it is important to use the proper context to invent fresh
names, not just the terms you hold in your hand. Term.add_free_names and
Name.variant belong to a very old layer of the system. Use
Variable.variant_frees or similar -- its implementation may even serve
as an example how it works with contexts:

fun variant_frees ctxt ts frees =
let
val names = Variable.names_of (fold Variable.declare_names ts ctxt);
val xs = fst (Name.variants (map #1 frees) names);
in xs ~~ map snd frees end;

I) Currently I'm going for Case 2 (of makarius suggestion) and
registered the check function at stage 0.

I recommend to use a stage number <> 0. While there are standard
policies about precedence of plugins at the same stage, it is better to
make your intention fully clear.
The problem is that I have a solution that works when using stage 0. But
not on any other stage :). On stage ~1 I get a MATCH exception for a
call to range_type (to find out the specific "return" to use) and on
stage 1 I can not use dummyT without getting a TYPE exception.

Here is my new check function (as I said, when registered at stage 0,
every example I have tried, did also work). Thanks to alex krauss, I am
now using Pattern.rewrite_term, which makes everthing much more
convenient. If I understand correctly, since I registered "check_monad"
at stage 0, it is interleaved with type inference... and that is exactly
what I need here.

fun check_monad ts ctxt =
let
val thy = ProofContext.theory_of ctxt;

fun map_default f xs =
let val ys = map f xs in
if exists is_some ys
then SOME (map2 the_default xs ys)
else NONE
end;

fun subst_monad ctxt (Const (@{const_name "returnM"}, T)) =
let val typ = range_type T
in if is_monad_type ctxt typ
then SOME (Const (get_return typ ctxt, dummyT))
else NONE
end
| subst_monad ctxt (Const (@{const_name "bindM"}, T) $ m $ f) =
let val mty :: fty :: rty = binder_types T in
(case filter (is_monad_type ctxt) (mty :: range_type fty :: rty) of
typ :: _ =>
SOME (Const (get_bind typ ctxt, dummyT) $ m $ f)
| [] => NONE)
end
| subst_monad ctxt (Const (@{const_name "thenM"}, T) $ m $ n) =
(case filter (is_monad_type ctxt) (binder_types T) of
typ :: _ =>
let
val used = Term.add_free_names n [];
val v = Free (Name.variant used "x", dummyT);
in
SOME (Const (get_bind typ ctxt, dummyT) $ m $ lambda v n)
end
| [] => NONE)
| subst_monad ctxt _ = NONE

fun no_abstract_consts t =
let
val cs = Term.add_const_names t [];
val taboos = [@{const_name "returnM"}, @{const_name "bindM"},
@{const_name "thenM"}];
in
(case inter (op =) cs taboos of
[] => true
| _ => more_err ())
end;

fun check t =
let val t' = Pattern.rewrite_term thy [] [subst_monad ctxt] t
in if t = t' andalso no_abstract_consts t then NONE else SOME
t' end;
in
map_default check ts
|> Option.map (rpair ctxt)
end;

I've heard that you have made some progress off-list. If there is a nice
result emerging here, it is worth posting it again.
I will :)

Makarius

chris

PS: I will change the variable stuff according to your comment... it is
sometimes difficult to see from the sources, which functions are
supposed to be used and which not :D

Email Gateway (Aug 18 2022 at 15:21):

From: Makarius <makarius@sketis.net>
On Fri, 21 May 2010, Christian Sternagel wrote:

I) Currently I'm going for Case 2 (of makarius suggestion) and
registered the check function at stage 0.

I recommend to use a stage number <> 0. While there are standard
policies about precedence of plugins at the same stage, it is better to
make your intention fully clear.
The problem is that I have a solution that works when using stage 0. But
not on any other stage :). On stage ~1 I get a MATCH exception for a
call to range_type (to find out the specific "return" to use) and on
stage 1 I can not use dummyT without getting a TYPE exception.

Stage 0 is perfectly OK, if you really mean to be a peer of the standard
typecheck mechnism, which seems to be the case here.

BTW, the exception raised by Term.range_type is the builtin Match of SML,
not MATCH (which some other modules define internally). It is important to
be hypercritical concerning the naming of any exception with unqualified
name and without argument, because mispelling will result in a dreaded
catch-all pattern. This would make the meaning of the programm depend on
the weather, since various external events are injected as Interrupt
exception into the regular program executation.

| subst_monad ctxt (Const (@{const_name "thenM"}, T) $ m $ n) =
(case filter (is_monad_type ctxt) (binder_types T) of
typ :: _ =>
let
val used = Term.add_free_names n [];
val v = Free (Name.variant used "x", dummyT);
in
SOME (Const (get_bind typ ctxt, dummyT) $ m $ lambda v n)
end
| [] => NONE)
| subst_monad ctxt _ = NONE

Back again to this part of inventing locally fresh names, which is
suprisingly hard to get right in many practical situations.

In fact, the above is technically right, because you merely feed the name
to the Term.lambda function, which will then abstract over v that is not
in the body n. It is also a bit confusing, because lambda terms work via
de-Bruijn indices, so you do not need to invent names in most situations
-- the pretty printer will take care of the renaming if required.

Above you merely do a dummy abstraction, which usually works by
incrementing "loose bounds" in the body and adding an Abs constructor on
top, with a "comment" for the bound variable as you like, e.g. "x". The
incrementing can be skipped if the term is "closed", which is actually a
precondition for the lambda function. Your subterms seem to be indeed
closed like that, because you operate as a plugin to Pattern.rewrite_term
which strips off abstractions before entering the body.

Interestingly, Pattern.rewrite_term invents auxiliary free variables only
by looking at the visible term material. This means there is an implicit
assumption that the rewrite process will not introduce other fixed
variables from the surrounding context.

Anyway, to cut a long story short, you can just use Term.abs_dummy above
without your own inventing of variables. Like lambda it assumes a closed
body.

Makarius

Last updated: Oct 08 2025 at 20:22 UTC