[isabelle] UTF-8 versus UTF-8-isabelle · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] UTF-8 versus UTF-8-isabelle

Email Gateway (Oct 20 2022 at 09:27):

From: Paolo Crisafulli <paolo.crisafulli@irt-systemx.fr>
Hi everyone,

I was wondering if there are any drawbacks to saving Isabelle theory
files as UTF-8 instead of the default UTF-8-isabelle format.

I would like to use UTF-8, because it really makes reading a (git) diff
so much easier, but it seems that UTF-8-isabelle is the widely
(universally?) adopted format.

My questions:

are there solid reasons to save one's Isabelle-HOL theories as
UTF-8-isabelle?
is UTF-8 not advisable in some contexts (e.g. AFP)?
is there an efficient way to convert a set of UTF-8 theories
to-and-fro UTF-8-isabelle? For the moment, when I need to do it
(UTF-8-isabelle to UTF-8), I create a new file, change its format to
UTF-8 and copy paste into it from the UTF-8-isabelle; which does not scale.

Thank you very much in advance for your help.

Cheers,

Paolo

Email Gateway (Oct 20 2022 at 09:47):

From: Dominique Unruh <cl-isabelle-users@lists.cam.ac.uk>
Hi,

specifically for this question:

is there an efficient way to convert a set of UTF-8 theories
to-and-fro UTF-8-isabelle? For the moment, when I need to do it
(UTF-8-isabelle to UTF-8), I create a new file, change its format to
UTF-8 and copy paste into it from the UTF-8-isabelle; which does not scale.

I do not have a command-line tool to offer, but there is a class in the
scala-isabelle library that converts between the two formats:

https://javadoc.io/doc/de.unruh/scala-isabelle_2.13/latest/de/unruh/isabelle/misc/Symbols.html

This class can be used independently of the rest of scala-isabelle
(i.e., you do not need to use scala-isabelle to control an
Isabelle-process).

It should be just a few lines of Scala (or Java) code to make a
command-line script. E.g. (untested):

val input = ... read text from input file ...
val symbols = new Symbols() // in Java: something like new Symbols(Symbols.init$default$1, Symbols.init$default$2)
val output = symbols.symbolsToUnicode(symbols, false)
... write output to file ...

Best wishes,
Dominique.

Email Gateway (Oct 20 2022 at 09:58):

From: Makarius <makarius@sketis.net>
On 20/10/2022 11:26, Paolo Crisafulli wrote:

I was wondering if there are any drawbacks to saving Isabelle theory files as
UTF-8 instead of the default UTF-8-isabelle format.

This terminology is from Isabelle/jEdit, which provides an approximative view
on source text with Isabelle symbols via the UTF-8-Isabelle encoding; it goes
along with Isabelle fonts to make certain symbols appears as they should (e.g.
from the "document" group).

The Isabelle documentation has further explanations on Isabelle symbols, e.g.
see "implementation" section 0.6 on "Strings of symbols".

In short: Isabelle symbols are closer to "LaTeX" than to "Unicode". I need to
put both into quotes, because LaTeX and Unicode are wide open to variation and
unexpected breakdowns.

Since you are from France, here is a well-known (bad) joke about UTF-8 vs.
former ISO-latin-1/15:
https://www.apprendre-en-ligne.net/bloginfo/index.php/2009/01/21/151-martine-ecrit-en-utf-8

I would like to use UTF-8, because it really makes reading a (git) diff so
much easier, but it seems that UTF-8-isabelle is the widely (universally?)
adopted format.

The official Isabelle source format is plain text with named symbols, e.g.
\<alpha>. This has many motivations accumulated over the years. In particular,
long-term persistent data is better stored in a format that is independent
from short-term side conditions.

To view Isabelle (or AFP) history with nice rendering of symbols, it should be
easy to make a wrapper to the repository client tools. A future version of
Isabelle/VSCode will probably have that, both for git and Mercurial, which is
the preferred VCS for Isabelle applications.

My questions:

are there solid reasons to save one's Isabelle-HOL theories as UTF-8-isabelle?

Please rephrase this question, after you have studied the documentation and
got an impression how Isabelle symbols really work.

I can then tell more stories from ancient times that have lead to this
representation of formal sources. (Including advantages and shortcomings.)

is UTF-8 not advisable in some contexts (e.g. AFP)?

Raw Unicode symbols are not admissible in Isabelle sources: they will cause
subtle problems, sometimes on the spot, sometimes later.

There are administrative tools to detect "Suspicious Unicode characters",
which then need to be removed by hand: proper interpretation of what was
actually intended requires manual intervention.

In practice, these are usually strange quotes produced by macOS, which often
don't work on other platforms (or other versions of macOS).

is there an efficient way to convert a set of UTF-8 theories to-and-fro
UTF-8-isabelle? For the moment, when I need to do it (UTF-8-isabelle to
UTF-8), I create a new file, change its format to UTF-8 and copy paste into it
from the UTF-8-isabelle; which does not scale.

Isabelle/Scala provides operations Symbol.decode and Symbol.encode. The idea
is to have all persistent data (files, repository content etc.) in
named-symbol form like \<alpha> and decode that for display into a Unicode
point to be used with the Isabelle fonts.

Makarius

Email Gateway (Oct 20 2022 at 10:05):

From: Makarius <makarius@sketis.net>
You know my attitude towards tha scala-isabelle library already: I don't
understand its purpose. Regular Isabelle/Scala should do the job.

For the above purpose, there are functions isabelle.Symbol.decode and
isabelle.Symbol.encode, without anything special to care about.

You can e.g. use them in "isabelle scala", or "isabelle jedit" with the
Console/Scala plugin.

Makarius

Email Gateway (Oct 20 2022 at 11:04):

From: Makarius <makarius@sketis.net>
After spending 15min with the sources
https://github.com/dominique-unruh/scala-isabelle/blob/44d1881784a/src/main/scala/de/unruh/isabelle/misc/Symbols.scala
I need to warn against using this approach.

Why have a fork/clone of regular Isabelle/Scala operations that work slightly
differently in subtle manners? This is bad software engineering. Using data
definition formats of Isabelle (etc/symbols) directly and interpreting the
content separately will cause problems that are hard to detect.

Even within your own codebase, you already have a diversion from the default
symbols from Isabelle2020 stated in the docstring vs. the Isabelle2021-RC3
version in the commit cd98f7393d05.

A proper implementation would use isabelle.Symbol interfaces instead of
file-formats. Any special functionality can be put on top, e.g. your treatment
of sub/sup symbols.

Another side-remark on the implementation: regex matching has disappeared from
the official version in March 2022, see
https://isabelle-dev.sketis.net/rISABELLE90eaac98b3fa

By not relying on that, it generally becomes more robust and efficient, and
clear about its exact functionality. (This follows Isabelle/ML, which lacks
regex matching in the first place.)

Makarius

Email Gateway (Oct 21 2022 at 14:46):

From: Paolo Crisafulli <paolo.crisafulli@irt-systemx.fr>
Thank you Dominique and Makarius for your answers (and the joke),
writing a specific git difftool (chaining the usage of Symbol and the
regular diff) seems the best way to go.

If anyone has already done that, I'd be happy to reuse.

Cheers,

Paolo

Email Gateway (Oct 21 2022 at 16:50):

From: Makarius <makarius@sketis.net>
Here is an example command-tool for Isabelle2022-RC4, written in Isabelle/Scala:
https://makarius.sketis.net/repos/narration/file/7ff489b866ee

The README.md explains how to wire-up the whole thing with Isabelle. It should
be then rather easy to implement a different "payload" for the tool. (The
"narration" tool re-interprets PIDE markup as an alternative to the HTML
presentation of Isabelle2022.)

Makarius

Email Gateway (Oct 22 2022 at 12:20):

From: Makarius <makarius@sketis.net>
A few more notes on this example:

* It coordinates 3 repositories: the project itself, Isabelle, AFP. The
special tool isabelle/Admin/init takes care of most details: its documentation
is "isabelle/Admin/init -?" or the text of the shell script itself. The
"system" manual has proper documentation of underlying tools, like "isabelle
components".

* The README.md hints at IntelliJ IDEA for Isabelle/Scala development:
sometimes this works great, sometimes it doesn't (e.g. due to problems with
Maven or Gradle, or IntelliJ IDEA itself and the Scala3 plugin). Recently,
I've often found myself editing Scala files with jEdit instead (not
Isabelle/jEdit).

* The narration tool provides etc/build.props with "services" to make its
Isabelle_Tool implementation available for the "isabelle ..." command-line:
this is the most interesting / relevant aspect here. The "wiring" happens via
"isabelle component -u ..." to register the project as separate Isabelle
component.

The latter can be done without Isabelle repository clones, directly from a
release version.

In summary: this is probably the top-end of "Isabelle tooling" technology and
complexity at the moment.

The bottom end is this: simply use Isabelle/ML to invoke a shell script on the
spot, and let the regular PIDE front-end display the symbols as usual. For
example (assuming the running Isabelle is a repository clone):

ML ‹
Isabelle_System.bash ▩‹hg -R "$ISABELLE_HOME" diff --color=never -c
14dd8b46307f›
›

The funny notation ▩‹...› is an antiquotation in Isabelle/ML: it inlines a
multiline string, similar to """...""" in some other languages, but better.

Here is an alternative with repository server:

ML_command ‹
Isabelle_System.download
"https://isabelle.sketis.net/repos/isabelle-release/raw-diff/14dd8b46307f/src/HOL/Computational_Algebra/Polynomial.thy"
|> Bytes.content
|> writeln
›

It is also possible to turn such Isabelle/ML snippets into Isabelle/Isar
commands. But this is really just an example: I think the proper way is to use
Isabelle/Scala to make a PIDE GUI component for version control (in
Isabelle/jEdit and/or Isabelle/VSCode).

Makarius

Email Gateway (Oct 28 2022 at 15:07):

From: Paolo Crisafulli <paolo.crisafulli@irt-systemx.fr>
Hi Makarius,

Thank you very much, I'll probably need some time to process this.

If I successfully turn this into a nice way to have an improved git diff
over .thy files, I'll be happy to share it.

Paolo

Last updated: Jul 03 2025 at 20:23 UTC