[isabelle] Supporting native Unicode input in Isabelle/jEdit · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Supporting native Unicode input in Isabelle/jEdit

Email Gateway (Aug 19 2022 at 10:21):

From: Christoph LANGE <math.semantic.web@gmail.com>
Dear Isabelle community,

in a private conversation with Makarius I had asked the following
question, which I should rather share on this list:

I wonder why Isabelle/jEdit does not support native Unicode input, but
only uses Unicode for display? Are there historical reasons, or is it a
deliberate design choice? Nowadays, where Unicode is widely supported,
there are ways of directly entering a wide range of characters. I
mapped, e.g., “ ” … – on keys of my keyboard, which is why I'm also
using them in LaTeX. Some people go further; consider e.g. the Neo
keyboard layout http://www.neo-layout.org/ (for English see
http://en.wikipedia.org/wiki/Neo_keyboard_layout#Neo), which even has a
plane for mathematical symbols.

Cheers,

Christoph

Email Gateway (Aug 19 2022 at 10:33):

From: Makarius <makarius@sketis.net>
On Thu, 28 Feb 2013, Christoph LANGE wrote:

I wonder why Isabelle/jEdit does not support native Unicode input, but
only uses Unicode for display?

Taking this question in isolation, it actually should work both for input
and display. The prover sources are stored in a certain text format as
Isabelle "symbols", but that is treated as "encoding" by jEdit, so you
will edit the text in UTF-16 of the JVM. This is not fundamentally
different from having external files in UTF-8, UTF-16, UTF-32 etc. all
recoded to the UTF-16 of Java before editing it.

It is up to the JVM, jEdit, or other input methods to produce such unicode
glyphs. Copy-and-paste should also work as long as you stay within the
Unicode image of the prover sources.

Just yesterday, I used this myself two times:

* In one of the private mails to you, I made some Isar proofs that were
copied from Isabelle/jEdit into the terminal where my mail client was
running. This worked sometimes but not always: the combination of
terminal + alpine had sometimes problems coping with the multibyte UTF-8
version of the text, so in the end I was not 100% sure if you actually
received the proof in a way I had in mind.

* Posting a small Isar proof on stackoverflow, via the same copy-paste
technique JVM -> Firefox, it worked without problems.

Nowadays, where Unicode is widely supported

There is a long story behind that, some of it is discussed in my MKM 2011
paper, see section 2.1 in
http://www4.in.tum.de/~wenzelm/papers/isabelle-doc.pdf

In short, I do not subscribe to the "universality" of Unicode, there are
just a bit too many problems with it. The approach in Isabelle/Scala and
Isabelle/jEdit is to use it as "poor man's mathematical rendering" in the
front-end -- so your funny keyboard should work with it after some
convincing. The prover itself is free from Unicode worries, and the
ongoing evolution of the Unicode standards in the past decades. Theory
sources are stored persistenly in a unicode-free manner, but it tolerates
native UTF-8 to some extent.

After the experience of struggling with the many faces of Unicode in the
past 3 years, I would go as far to say that an LCF-style prover should
not allow Unicode at its kernel. There are just too many dangers of
getting things utterly wrong (e.g. code-points being swapped or normalized
according to strict rules of the standard, which are sometimes implemented
in this way or that way.)

Makarius

Email Gateway (Aug 19 2022 at 10:39):

From: Christoph LANGE <math.semantic.web@gmail.com>
2013-03-01 12:30 Makarius:

On Thu, 28 Feb 2013, Christoph LANGE wrote:

I wonder why Isabelle/jEdit does not support native Unicode input, but
only uses Unicode for display?

Taking this question in isolation, it actually should work both for
input and display.

Indeed it does. I'm no longer sure how I came up with this question a
few months back. Maybe it didn't work in the 2012 version? I don't
remember.

The prover sources are stored in a certain text
format as Isabelle "symbols", but that is treated as "encoding" by
jEdit, so you will edit the text in UTF-16 of the JVM. This is not
fundamentally different from having external files in UTF-8, UTF-16,
UTF-32 etc. all recoded to the UTF-16 of Java before editing it.

Given all these different encodings I agree that it does not make sense
to arbitrarily choose one of them as the preferred one for *.thy files,
as this does not ensure that there won't be some text editor messing it
up, e.g. an editor that only supports UTF-8 and that assumes any file to
be in UTF-8.

Cheers,

Christoph

Last updated: Jul 18 2025 at 20:25 UTC