[isabelle] Unicode (Was: Update on I3P) · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Unicode (Was: Update on I3P)

Email Gateway (Aug 18 2022 at 15:55):

From: Joachim Breitner <mail@joachim-breitner.de>
Hi,

thanks. Unfortunately, the auto-updater does not work for me (no idea
why, "strace -e network" does not show anything obvious and I have no
proxy...). Anyways, I prefer installing updating software manually :-)

I have another issue with i3p to report that is more severe. In my
current project, I sometimes get quite large goals. When that happens,
i3p becomes very slow when I move backwards and forwards in my theory.
My ProofGeneral using colleges do not have this problem.

You can check yourself if you clone git://git.nomeata.de/funcCF.git,
checkout revision b625bc51b6a4372f73d807dcd4e79ccedcf2a530 and run
CFGraph/HOLCFExSV.thy (the final lemma fails in this revision, don’t
worry about that). This took >15 minutes on my machine. I then reduced
the size of the goals by replacing
proof (cases "(d,ds,ve,b)" rule:fstate_case)
with
proof (cases "(d,ds,ve,b)" rule:fstate_case,
auto split:d.split prim.split simp del:Un_insert_left Un_insert_right)

which is probably bad style but helped me reduce the time to run the
theory to a bearable 1min15s. This is of course a rather big issue, as I
might not always be able (or willing) to find a work-around like this.

Is there anything that can be done about this, or is this an inherent
consequence of the i3p design?

Nevertheless I’m a happy i3p user,
Joachim
signature.asc

Email Gateway (Aug 18 2022 at 16:03):

From: Joachim Breitner <mail@joachim-breitner.de>
Dear Holger,

I’m not sure what happened, but it seems it broke somehow: I tried to
edit this file:
http://isabelle.in.tum.de/repos/AFP/file/tip/thys/Free-Groups/Isomorphisms.thy
and i3p would fail at \<Z> and \<one> (but not at \<F>). The error
message at \<Z> is:

Inner lexical error at: ðµ, mult = op +, one = 0::int ⦈
Failed to parse proposition
At command "abbreviation".

so there is clearly some encoding problem. Notice that \<F> and \<Z>
have quite different code points:
\<F> code: 0x002131 font: Isabelle
\<Z> code: 0x01d4b5 font: Isabelle

All this is on Debian Linux, using a UTF8 locale and with Isabelle
2009-2.

A similar, but not i3p specific, problem is that Isabelle views \<cdot>
as U+22C5 DOT OPERATOR. I have · (U+00B7 MIDDLE DOT) on my keyboard¹,
which causes a lexical error in Isabelle as well. As I hope that never
ever people would want to define · and ⋅ differently in Isabelle, would
it be possible to accept both U+00B7 and U+22C5 for \<cdot>?

Thanks,
Joachim

¹ using this xmodmap line:
keycode 60 = period colon period colon periodcentered division periodcentered
signature.asc

Email Gateway (Aug 18 2022 at 16:03):

From: Joachim Breitner <mail@joachim-breitner.de>
Hi again,

I should add that this sequence of four characters is the utf8
representation of 𝒵 (\<Z>), re-encoded as utf8:
$ echo -n ðµ|iconv -futf8 -tlatin1
𝒵
$ echo -n 𝒵 | iconv -flatin1 -tutf8
ðµ

Maybe i3p has problems with unicode character code points that do not
fit in one 16-bit number?

Greetings,
Joachim
signature.asc

Email Gateway (Aug 18 2022 at 16:03):

From: Makarius <makarius@sketis.net>
This is a deeper problem of the way Unicode works on the JVM, see also
http://download.oracle.com/javase/6/docs/api/java/lang/Character.html

Few Java applications get this specification of "surrogate code points"
right, because this situation occurs rarely. UTF-8 works generally better
because the multi-character sequences are the rule, not the exception.

It is unlikely that this will get fundamentally better on the JVM,
although some people have already started joking that Oracle will soon
identify "" and null for type String ...

The easyiest fix is to remove these non-16bit codes from the table of
Isabelle symbol interpretation.

Makarius

Email Gateway (Aug 18 2022 at 16:08):

From: Joachim Breitner <mail@joachim-breitner.de>
Hi,

Yes, this is a work-around (although I had to edit the copy of the
symbols file in i3p/modules/org-i3p-fontsupport.jar). A proper fix would
be nice, though – Unicode is just too shiny :-)

Greetings,
Joachim
signature.asc

Email Gateway (Aug 18 2022 at 16:09):

From: Holger Gast <gast@informatik.uni-tuebingen.de>
Hi Joachim,

Not in general; when adding the support for true code points in August,
I already wrote a number of tests to ensure that theories do not get broken
by editing them. Unfortunately, this did not catch a remaining bug
in the on-the-fly encoding in the driver ;(

This is fixed now (check for updates) and a corresponding test is added
to make sure it will work more smoothly in the future.

Thanks for pointing this out,

Holger

Last updated: Jul 05 2025 at 20:22 UTC