Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Non-ASCII characters and unsupported text stru...


view this post on Zulip Email Gateway (Aug 22 2022 at 19:58):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Hello!

What is the recommended way of embedding non-ASCII characters in
ordinary text (headlines and paragraph contents)? In actual Isabelle
code, I have access to the “infinite number of non-ASCII symbols” but
not in ordinary text as far as I can see.

I know that I can embed LaTeX source code that produces the desired
symbols, but that seems too low-level to me. Is there a way to enter
characters like no-break spaces and proper quotation marks (“ and ”)
directly?

Another thing are structures that aren’t currently supported by
Isabelle’s markup language. In particular I wonder how to properly embed
citations. Sure, I can use LaTeX source code again, but I’m wondering
whether it’s also possible to have an Isabelle construct, perhaps user-
defined, like \<cite>‹…›.

Regarding LaTeX source code, I discovered that it is possible to embed
it directly into the contents of text, section, and such. Is this
intentional or is it just that it happens to work with the current
Isabelle implementation? On the other hand, there is the \<latex>‹…›
construct, which doesn’t seem to be documented (I found it via the IDE’s
completion mechanism). Is this construct preferred over just embedding
LaTeX source code? Does it differ from plain embedding?

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 22 2022 at 20:07):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Hi!

I’d be glad if anyone could answer at least some of the questions I
raised in my e-mail below. I’d like to write the documentation of my
Isabelle code properly, for which feedback on these matters would be
necessary. Thanks in advance.

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 22 2022 at 20:11):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Am Montag, den 01.07.2019, 10:48 +1000 schrieb Peter Gammie:

Wolfgang,

I don’t know what the state of the art is, or where documentation
things are headed, but I’ll try to give you some pointers anyway.

Dear Peter,

thanks a lot for your explanations. Below are my responses to them.

What is the recommended way of embedding non-ASCII characters in
ordinary text (headlines and paragraph contents)? In actual Isabelle
code, I have access to the “infinite number of non-ASCII symbols”
but not in ordinary text as far as I can see.

The Isabelle docs suggest that UTF-8 is OK in these contexts. See (I
think) the Isabelle/jEdit manual.

I know that I can embed LaTeX source code that produces the desired
symbols, but that seems too low-level to me. Is there a way to enter
characters like no-break spaces and proper quotation marks (“ and ”)
directly?

I don’t know anything about entering these characters. I’d suggest you
use your favourite editor on the thy file directly. I get the
impression that modern LaTeX can eat UTF8 OK.

Hmm, when I enter a non-ASCII character into an Isabelle/jEdit buffer, I
cannot be sure that it will make it into the file in UTF-8-encoded form
when saving. I think it works in practice for symbols like “ and ”.
However, if I enter the Greek letter γ, for example, it will be saved to
the file as \<gamma>. If it appears in documentation text, outside of
actual Isabelle code, this \<gamma> will end up verbatim in the
generated LaTeX source code, causing an error.

So apparently it would work to place a symbol like “ or ” into the
documentation part of an Isabelle file and having it transferred to the
generated LaTeX file and finally processed correctly by LaTeX using some
sort of LaTeX UTF-8 support. However, this would work only because
symbols like “ and ” don’t have an Isabelle encoding in the form \<…>.
As soon as they’d get one, we would run into the same problem as with
the γ above.

Another thing are structures that aren’t currently supported by
Isabelle’s markup language. In particular I wonder how to properly
embed citations. Sure, I can use LaTeX source code again, but I’m
wondering whether it’s also possible to have an Isabelle construct,
perhaps user-defined, like \<cite>‹…›.

There is a cite antiquotation. Try completing on print_ in
Isabelle/jEdit. One of those will give you a list of antiquotations.

Amazing! I had looked for antiquotations in the Isabelle/Isar Reference
Manual but somehow overlooked the part that talks about \<^cite>.
Strangely enough, \<^cite> doesn’t appear in the auto-completion list
(in Isabelle2018 at least).

To figure out their arguments you can trawl the docs, grep the
Isabelle source code, or grep the AFP. Once you get to some steady-
state the NEWS file is handy.

Hmm, so far I had assumed that all antiquotations are documented in the
Isabelle/Isar Reference Manual, but apparently there are some that
aren’t.

Regarding LaTeX source code, I discovered that it is possible to
embed it directly into the contents of text, section, and such.
Is this intentional or is it just that it happens to work with the
current Isabelle implementation?

AFAIK this is what everyone has always done and will be doing for a
while yet. Makarius appears to be steadily adding more semi-formal
markup (? - I might be misusing that term) — \<^item> being one thing
that sticks in my mind, and the absence so far of a \label{..}
replacement.

I really like this trend of having more and more markup for
documentation text.

On the other hand, there is the \<latex>‹…› construct, which
doesn’t seem to be documented (I found it via the IDE’s
completion mechanism). Is this construct preferred over just
embedding LaTeX source code? Does it differ from plain embedding?

I don’t know. Perhaps try to find uses of \<latex> in the AFP or
HOL.

There seems to be absolutely no occurrence of \<latex> in the Isabelle
standard library.

I wonder why you found this construct but not the cite
antiquotation. Perhaps that would be useful feedback on feature
discovery / the utility of the documentation.

I found \<latex> only by accident in the completion pop-up window when
looking for something else. As I mentioned above, \<^cite> doesn’t
appear in the completion list, and the list of antiquotations in the
Isabelle/Isar Reference Manual is incomplete.

cheers,
peter

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 22 2022 at 20:11):

From: Peter Gammie <peteg42@gmail.com>
Wolfgang,

I don’t know what the state of the art is, or where documentation things are headed, but I’ll try to give you some pointers anyway.

On 12 Jun 2019, at 05:13, Wolfgang Jeltsch <wolfgang-it@jeltsch.info> wrote:

What is the recommended way of embedding non-ASCII characters in
ordinary text (headlines and paragraph contents)? In actual Isabelle
code, I have access to the “infinite number of non-ASCII symbols” but
not in ordinary text as far as I can see.

The Isabelle docs suggest that UTF-8 is OK in these contexts. See (I think) the Isabelle/jEdit manual.

I know that I can embed LaTeX source code that produces the desired
symbols, but that seems too low-level to me. Is there a way to enter
characters like no-break spaces and proper quotation marks (“ and ”)
directly?

I don’t know anything about entering these characters. I’d suggest you use your favourite editor on the thy file directly. I get the impression that modern LaTeX can eat UTF8 OK.

Another thing are structures that aren’t currently supported by
Isabelle’s markup language. In particular I wonder how to properly embed
citations. Sure, I can use LaTeX source code again, but I’m wondering
whether it’s also possible to have an Isabelle construct, perhaps user-
defined, like \<cite>‹…›.

There is a cite antiquotation. Try completing on print_ in Isabelle/jEdit. One of those will give you a list of antiquotations. To figure out their arguments you can trawl the docs, grep the Isabelle source code, or grep the AFP. Once you get to some steady-state the NEWS file is handy.

Here’s an example: @{cite [cite_macro=citet] "Winskel:1993”}. Note that citet is a LaTeX macro — in this case a natbib one.

It seems that loading the corresponding BibTeX file into Isabelle/jEdit enables hyperlinks.

Regarding LaTeX source code, I discovered that it is possible to embed
it directly into the contents of text, section, and such. Is this
intentional or is it just that it happens to work with the current
Isabelle implementation?

AFAIK this is what everyone has always done and will be doing for a while yet. Makarius appears to be steadily adding more semi-formal markup (? - I might be misusing that term) — \<^item> being one thing that sticks in my mind, and the absence so far of a \label{..} replacement.

On the other hand, there is the \<latex>‹…›
construct, which doesn’t seem to be documented (I found it via the IDE’s
completion mechanism). Is this construct preferred over just embedding
LaTeX source code? Does it differ from plain embedding?

I don’t know. Perhaps try to find uses of \<latex> in the AFP or HOL. Historically there were things like text_raw that suppressed the default Isabelle armour. See e.g. the possibly-obsolete https://isabelle.in.tum.de/community/Generate_TeX_Snippets

I wonder why you found this construct but not the cite antiquotation. Perhaps that would be useful feedback on feature discovery / the utility of the documentation.

cheers,
peter

view this post on Zulip Email Gateway (Aug 22 2022 at 20:12):

From: Peter Gammie <peteg42@gmail.com>
Wolfgang,

On 5 Jul 2019, at 03:07, Wolfgang Jeltsch <wolfgang-it@jeltsch.info> wrote:

Hmm, when I enter a non-ASCII character into an Isabelle/jEdit buffer, I
cannot be sure that it will make it into the file in UTF-8-encoded form
when saving. I think it works in practice for symbols like “ and ”.
However, if I enter the Greek letter γ, for example, it will be saved to
the file as \<gamma>. If it appears in documentation text, outside of
actual Isabelle code, this \<gamma> will end up verbatim in the
generated LaTeX source code, causing an error.

Err, right. So I think you’re supposed to say @{text “\<gamma>”} or even better, use @{const “…”} or @{term “…”} etc. for symbols that Isabelle knows about. There’s a list in the Isabelle repo/distribution etc/symbols.

Yes, you’d probably struggle to write Greek words in a text block.

So apparently it would work to place a symbol like “ or ” into the
documentation part of an Isabelle file and having it transferred to the
generated LaTeX file and finally processed correctly by LaTeX using some
sort of LaTeX UTF-8 support. However, this would work only because
symbols like “ and ” don’t have an Isabelle encoding in the form \<…>.
As soon as they’d get one, we would run into the same problem as with
the γ above.

Indeed. You could define your own Isabelle symbols for these, but I’m not sure it’ll look great. How about the standard LaTeX `` ‘'?

I think it’s clear what needs doing from the symbols file and the LaTeX errors you’ll get.

Hmm, so far I had assumed that all antiquotations are documented in the
Isabelle/Isar Reference Manual, but apparently there are some that
aren’t.

I’d get used to spelunking the source code. You can try in Isabelle/jEdit (try CMD/Ctrl-click on promising things) and follow it up with grep…

cheers,
peter

view this post on Zulip Email Gateway (Aug 22 2022 at 20:12):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
Am Samstag, den 06.07.2019, 20:32 +1000 schrieb Peter Gammie:

Hmm, when I enter a non-ASCII character into an Isabelle/jEdit
buffer, I cannot be sure that it will make it into the file in
UTF-8-encoded form when saving. I think it works in practice for
symbols like “ and ”. However, if I enter the Greek letter γ, for
example, it will be saved to the file as \<gamma>. If it appears
in documentation text, outside of actual Isabelle code, this
\<gamma> will end up verbatim in the generated LaTeX source code,
causing an error.

Err, right. So I think you’re supposed to say @{text “\<gamma>”} or
even better, use @{const “…”} or @{term “…”} etc. for symbols that
Isabelle knows about.

That would be correct only when these symbols are Isabelle identifiers.
For writing something like “π-calculus” or “η-expansion”, it would be
semantically incorrect (although it would produce the correct output).

So apparently it would work to place a symbol like “ or ” into the
documentation part of an Isabelle file and having it transferred to
the generated LaTeX file and finally processed correctly by LaTeX
using some sort of LaTeX UTF-8 support. However, this would work
only because symbols like “ and ” don’t have an Isabelle encoding in
the form \<…>. As soon as they’d get one, we would run into the
same problem as with the γ above.

Indeed. You could define your own Isabelle symbols for these, but I’m
not sure it’ll look great. How about the standard LaTeX `` ''?

I think that’s the way to go for now.

So my conclusion is:

* Use antiquotations whenever you embed Isabelle code into the
documentation text.

* Use antiquotations for markup (for example, for lists and URLs).

* Use LaTeX source code otherwise, particularly for non-ASCII symbols
in text (for example, ~ for no-break spaces and '' for ”).

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 22 2022 at 20:16):

From: Makarius <makarius@sketis.net>
On 04/07/2019 19:07, Wolfgang Jeltsch wrote:

Hmm, when I enter a non-ASCII character into an Isabelle/jEdit buffer, I
cannot be sure that it will make it into the file in UTF-8-encoded form
when saving. I think it works in practice for symbols like “ and ”.
However, if I enter the Greek letter γ, for example, it will be saved to
the file as \<gamma>. If it appears in documentation text, outside of
actual Isabelle code, this \<gamma> will end up verbatim in the
generated LaTeX source code, causing an error.

So apparently it would work to place a symbol like “ or ” into the
documentation part of an Isabelle file and having it transferred to the
generated LaTeX file and finally processed correctly by LaTeX using some
sort of LaTeX UTF-8 support. However, this would work only because
symbols like “ and ” don’t have an Isabelle encoding in the form \<…>.
As soon as they’d get one, we would run into the same problem as with
the γ above.

(I am rather late on this thread, and most things have been answered
already. Here are just a few side-remarks.)

A quick hypersearch (not grep) over $ISABELLE_HOME/src/Doc for
"utf|unicode" as insensitive regexp reveals the following explanation in
the Isabelle/jEdit manual:

The appendix of @{cite "isabelle-isar-ref"} gives an overview of the
standard interpretation of finitely many symbols from the infinite
collection. Uninterpreted symbols are displayed literally, e.g.\
``▩‹\<foobar>›''. Overlap of Unicode characters used in symbol
interpretation with informal ones (which might appear e.g.\ in comments)
needs to be avoided. Raw Unicode characters within prover source files
should be restricted to informal parts, e.g.\ to write text in non-latin
alphabets in comments.

The correct conclusion has already been drawn on this thread: do not
lean out of the window too far. Non-latin characters for Eurepean
languages (excluding Greek) are fine. Odd quotation or punctuation marks
are better avoided: they are somehow fragile and non-portable, even
without Isabelle symbol interpretation getting in the way. (Recall that
"Unicode" is not as universal and stable as its name might suggest.)

There is a cite antiquotation. Try completing on print_ in
Isabelle/jEdit. One of those will give you a list of antiquotations.

Amazing! I had looked for antiquotations in the Isabelle/Isar Reference
Manual but somehow overlooked the part that talks about \<^cite>.
Strangely enough, \<^cite> doesn’t appear in the auto-completion list
(in Isabelle2018 at least).

I see "cite" in the index of isar-ref, and documentation on \<^latex> in
section "3.3.5 Document comments" -- both in current Isabelle2019 and
old Isabelle2018.

Regarding LaTeX source code, I discovered that it is possible to
embed it directly into the contents of text, section, and such.
Is this intentional or is it just that it happens to work with the
current Isabelle implementation?

AFAIK this is what everyone has always done and will be doing for a
while yet. Makarius appears to be steadily adding more semi-formal
markup (? - I might be misusing that term) — \<^item> being one thing
that sticks in my mind, and the absence so far of a \label{..}
replacement.

I really like this trend of having more and more markup for
documentation text.

Yes, the document source is becoming more an more formal. (Everything is
somehow "source text" for different languages of Isabelle. There is no
"code", unless you want to call the input program text for LaTeX like that.)

At some point there will be also an HTML backend to the document
language, not just LaTeX. This means that hardcore LaTeX hacking might
no longer work uniformly, unless restricted to small snippets in
\<^latex>‹…› (which could be rendered via KaTeX in HTML/CSS/JS). This is
still a bit speculative, though: we are not there yet.

I found \<latex> only by accident in the completion pop-up window when
looking for something else. As I mentioned above, \<^cite> doesn’t
appear in the completion list, and the list of antiquotations in the
Isabelle/Isar Reference Manual is incomplete.

Generally not that \<^NAME> (control symbol) with a cartouche often
works as replacement for old @{NAME ...}, but only if there is a single
argument that is a cartouche.

The control symbol notation has turned out so convenient and popular,
that I am tempted to discontinue the old form at some point in the
future. This requires to change the syntax of various older
antiquotations, notably @{cite ...}, @{thm ...}, @{lemma ...}. Almost
everything else is already in proper shape: @{typ ...}, @{term ...},
@{const_name ...}.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 20:16):

From: Makarius <makarius@sketis.net>
On 06/07/2019 12:32, Peter Gammie wrote:

Err, right. So I think you’re supposed to say @{text “\<gamma>”} or even better, use @{const “…”} or @{term “…”} etc. for symbols that Isabelle knows about. There’s a list in the Isabelle repo/distribution etc/symbols.

The standard way to typeset formal text (without checking) inside
informal document source is just a cartouche, e.g. ‹α›. That is a short
form for \<^cartouche> with a single cartouche argument, it abbreviates
@{cartouche ...}, which in turn is an alias for the older @{text ...}.

As a general principle, each embedded language in Isabelle that uses
antiquotations as "one shot free" for such an implicit cartouche
operator. For example, the generated source in
$ISABELLE_HOME/src/Tools/Haskell uses that to refer to an ML expression
(of type string) that is meant to be inlined into the generated Haskell
source. e.g. ‹Markup.nameN›.

Yes, you’d probably struggle to write Greek words in a text block.

Yes, this is because Unicode is a bit fragile: there are official math
greek slots, but using them is too ambitious. It is outside the 16-bit
range of UTF-16 (requiring surrogates that often cause problems). It
also means you would hardly ever see the correct glyphs outside the
Isabelle font context (e.g. in a mail client or on Stackoverflow).

It is a concession to quasi-portability to trade greek text for greek as
mathematical symbols.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 20:16):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
This snippet of an Isabelle comment reminds me of another question I
wanted to ask: Is my assumption correct that a backslash followed by
either a space or a newline with optionally some spaces afterwards is
Isabelle’s representation of a no-break space and thus translated to a
tilde when producing a LaTeX file?

All the best,
Wolfgang

view this post on Zulip Email Gateway (Aug 22 2022 at 20:16):

From: Makarius <makarius@sketis.net>
The Isabelle document source language merely uses plain (La)TeX
conventions here: backslash-space and tilde occur like that in the
generated .tex output, their meaning is defined by Knuth and Lamport.
(You can use option isabelle build -o document_output=output to see the
result in the "output" directory relative to the session.)

Whenever the update of the Isabelle document preparation system to
HTML/CSS/JS happens, it will interpret these special sequences in the
same manner, i.e. as some HTML non-breaking space.

In source text you should refrain from using Unicode space variations:
it is just too confusing / fragile / non-portable.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 20:16):

From: Wolfgang Jeltsch <wolfgang-it@jeltsch.info>
So the backslash–tilde combination is not that Markdown extension that
means a no-break space but the good old LaTeX command. Well, as a LaTeX
command it only means an ordinary space; so for getting a no-break space
I should use the tilde then.

All the best,
Wolfgang


Last updated: Mar 29 2024 at 08:18 UTC