[isabelle] Document processing with XeLaTeX and unicode s... · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Document processing with XeLaTeX and unicode s...

Email Gateway (Aug 19 2022 at 08:58):

From: Lars Hupel <hupel@in.tum.de>
I have some text in my theory using quotation marks, e.g.:

text{* In the proof of the main theorem from the lecture notes, the
concept of a »fixed copy« of a graph is fundamental. *}

On disk, this is translated to:

text{* In the proof of the main theorem from the lecture notes, the
concept of a \<guillemotright>fixed copy\<guillemotleft> of a graph is
fundamental. *}

After running

$ isabelle usedir -b -D document Girth_Chromatic Ugraph_Properties

the TeX sources still contain the token '\<guillemotright>' and xelatex
complains about an undefined control sequence. (*)

Interestingly enough, such a translation doesn't happen for the symbols
'›' and '‹'.

Is there any way I could keep using '»' and '«' as quotation marks?

(*) side note: xelatex is able to process Unicode tokens, so it would be
fine if the .thy or .tex file simply contained '»' and '«'.

Email Gateway (Aug 19 2022 at 08:58):

From: Lars Hupel <hupel@in.tum.de>

text{* In the proof of the main theorem from the lecture notes, the
concept of a »fixed copy« of a graph is fundamental. *}

I should add that I produce these symbols by typing AltGr+Y (or +X,
respectively) in jEdit's text area.

Email Gateway (Aug 19 2022 at 08:59):

From: Makarius <makarius@sketis.net>
On Sat, 13 Oct 2012, Lars Hupel wrote:

text{* In the proof of the main theorem from the lecture notes, the
concept of a »fixed copy« of a graph is fundamental. *}

On disk, this is translated to:

text{* In the proof of the main theorem from the lecture notes, the
concept of a \<guillemotright>fixed copy\<guillemotleft> of a graph is
fundamental. *}

This is how the the encoding UTF-8-Isabelle of Isabelle/jEdit works.
UTF-16 material in the text buffer is turned into plain Isabelle symbols
according to the cumulative etc/symbols specifications found in the
Isabelle installation.

The basic idea is to use "Unicode" as poor man's rendering for
mathematical text, where non-ASCII symbols mainly occur in the formal
part, or in the informal part in a way that is not in conflict with the
formal symbol interpretation. Since guillemots have been found useful for
formal notation at some point, they are part of this default table of
interpreted Isabelle symbols. Thus it becomes difficult to use them in
informal text, as you have observed.

the TeX sources still contain the token '\<guillemotright>' and xelatex
complains about an undefined control sequence. (*)

Interestingly enough, such a translation doesn't happen for the symbols
'›' and '‹'.

This is because the single guillemots are not interpreted by default
etc/symbols.

Is there any way I could keep using '»' and '«' as quotation marks?

You could use some perl script over the generated .tex sources to replace
remaining \<guillemotright> by some other text; the formal ones are
already expanded to {\isasymguillemotright} at this point.

This works by having some document/IsaMakefile to do the main processing,
and injecting the perl filter before the standard "isabelle latex" phases
of your nested IsaMakefile (see also "isabelle document" in the Isabelle
System manual).

(*) side note: xelatex is able to process Unicode tokens, so it would be
fine if the .thy or .tex file simply contained '»' and '«'.

This sounds like you are still optimistic about Unicode in general. For
me a sentence with "able to process" and "Unicode" is an Oxymoron, after
several years of struggling with this constantly changing standard.
There are so many things that can go wrong with Unicode.

In France, there are jokes like
http://www.apprendre-en-ligne.net/bloginfo/index.php/2009/01/21/151-martine-ecrit-en-utf-8
which stem from everyday life. Just last week I got a printed bon at the
supermarket with a simular unicode accident. So Unicode is something that
does some job sometimes, but not to be counted on by default if there is
another way.

Makarius

Email Gateway (Aug 19 2022 at 08:59):

From: Makarius <makarius@sketis.net>
Just for fun another unicode joke: see
http://isabelle.in.tum.de/dist/library/HOL/ex/Hebrew.html

The example mixes formal and informal use of Unicode in an optimistic way
-- I made this when Emacs22 started to support it some years ago.

If you look at the last lemma with correct rendering of Unicode
according to the standard (e.g. in Firefox), you will be surprised about
the swap of the digits "135" with the "he" letter wrt. to the "=" sign.
Luckily neither Emacs nor jEdit conform to the Unicode standard of
rendering the mix of left-to-right and right-to-left text.

How does xelatex handle this example? Does it render Unicode according to
the standard here and thus produce logical non-sense?

Makarius

Last updated: Mar 07 2025 at 20:20 UTC