From: "\"Becker, Hanno\"" <cl-isabelle-users@lists.cam.ac.uk>
Hi,
We added to AutoCorrode (https://github.com/awslabs/autocorrode) yet another tool, Isabelle/REPL (I/R). It provides a REPL as an interactive and programmatic interface to Isabelle/ML.
I/R is rather simple. It manages a set of REPLs, each basically a list of Isar texts; you push/pop Isar to/from a REPL to explore proofs. REPLs can be rooted in theories, locations within theories (assuming a suitably augmented heap build), or at points within other REPLs (for sub-proof exploration).
An MCP wrapper is provided exposing I/R to agents like Kiro CLI or Claude Code. For a quick experiment, use the Docker container in https://github.com/awslabs/AutoCorrode/tree/main/ir which runs I/R and exposes an MCP server which your agent can connect to once the following is added to your agent config:
"mcpServers": {
"i/r": {
"type": "http",
"url": http://localhost:9148/mcp,
"description": "Isabelle Isar REPL"
}
}
I ran some initial experiments seeing Opus 4.6 quickly develop and prove bubble sort and heap sort atop I/R, which is promising. However, more experimentation is needed to tune the I/R interface.
Taking a step back, we announced multiple tools over the past weeks and months. Here is how we see things come together, in a nutshell:
I/Q: Mechanism for Agent <-> Isabelle/Scala interaction
I/R: Mechanism for Agent <-> Isabelle/ML interaction
I/P: Proxy for offloading (agentic) proof development to remotes.
Isabelle Assistant: Orchestration, prompts, GUI
More work is needed to refine these tools and bring them together effectively. For example, I/R will be integrated into I/Q + Isabelle Assistant to allow the assistant to develop proofs in the background, based off arbitrary states in the interactive session, without interfering with the user.
Kind regards,
Hanno
From: Makarius <makarius@sketis.net>
On 23/02/2026 18:32, "Becker, Hanno" (via cl-isabelle-users Mailing List) wrote:
We added to AutoCorrode (https://github.com/awslabs/autocorrode <https://
github.com/awslabs/autocorrode>) yet another tool, Isabelle/REPL (I/R). It
provides a REPL as an interactive and programmatic interface to Isabelle/ML.
Can you explain the purpose of it --- after so many decades of a proper
interactive document model of Isabelle/PIDE?
I've worked myself with REPL interaction since Isabelle Proof General 2.0
1998, and it always caused problems that can be avoided by removing the thing.
"One happy day in the past", we actually made it. Contemporary Isabelle is
free from old-fashioned REPL interaction. The paper to upgrade your mindset to
the document model of Isabelle is here:
https://arxiv.org/abs/1307.1944v1
READ-EVAL-PRINT in Parallel and Asynchronous Proof-checking
Makarius Wenzel, 2013
That was 2013, and so many years later, the implementation mostly conforms to
the concepts that were introduced back then.
The Isabelle2025-2 release has a few more things, specifically requested from
some guys doing data mining and machine learning with Isabelle. See the NEWS
on "isabelle process_theories" and Thy_Info.get_theory_elements
I/R is rather simple. It manages a set of REPLs, each basically a list of Isar
texts; you push/pop Isar to/from a REPL to explore proofs. REPLs can be rooted
in theories, locations within theories (assuming a suitably augmented heap
build), or at points within other REPLs (for sub-proof exploration).
This reminds me of Isabelle until approx. 1998, with the stacked toplevel by
Paulson, from ancient LCF. Larry can explain the history better.
Of course, anybody is free to do with Isabelle anything, even emulating
retro-provers. I won't spend time with that, though, like reviewing sources or
answering questions how to retrofit it into current Isabelle.
The REPL era is long over, and it won't come back.
Makarius
From: "\"Becker, Hanno\"" <cl-isabelle-users@lists.cam.ac.uk>
Hi Makarius,
Can you explain the purpose of it --- after so many decades of a proper interactive document model of Isabelle/PIDE?
Sure. Let me explain the problem we're trying to solve. Please chime in if you see better approaches - Isabelle is a complex system and I don't claim to oversee it as well as you do.
To be clear upfront: I/R is not meant as a tool for human proof development. I agree that REPL interaction is inferior to the document model for that purpose. The target user of I/R is an AI agent. There are two use cases:
The first is AI-assisted interactive development. Consider an AI assistant integrated into Isabelle/jEdit or VSCode. The user is working on a theory, leaves a sorry, and asks the agent to fill it in. The agent needs to explore a proof -- potentially trying multiple approaches, backtracking, running sledgehammer -- all without disturbing the user's experience. The user should keep editing freely while the agent works in the background, and only see a suggestion once the agent has found and validated a proof. This requires the agent to do multi-step proof exploration at a fixed theory state, decoupled from the live document. I/R provides this: given a Toplevel.state (e.g. extracted from the document at the sorry's location), the agent can iteratively develop and validate Isar text without touching the document (accepting that the result may become irrelevant if the user makes conflicting edits). With I/Q (our Isabelle/Scala integration), we already use PIDE overlays for one-shot proof checking at a given location, but this doesn't extend to multi-step exploration where each step depends on the result of the previous one.
The second is offline improvement: agents working on a code base to e.g. rewrite apply-style proofs to Isar, or fill in sorrys. I've looked at process_theories and record_theories + Thy_Info.get_theory_elements in 2025-2. Thank you for the pointer - get_theory_elements is very close to something we implemented independently (storing intermediate Toplevel.states during heap builds via a custom session option); I expect we will be able to switch to record_theories for this, which is a welcome replacement for our custom approach. What remains is the feedback loop: an agent developing a proof will take multiple steps, backtrack, and try different strategies -- rebuilding a full theory file at each iteration would be too slow for our use case. I/R lets the agent work at sub-theory granularity -- from individual commands to sequences of definitions and lemmas -- without rebuilding the surrounding theory. I see it as a natural continuation of the path you took with process_theories: from session-level to theory-level to sub-theory-level exploration. And once a proof is found, validation goes the other way: the agent checks at command level via I/R, validates the resulting theory file via process_theories, and ultimately rebuilds the full session via isabelle build.
I'd welcome your thoughts and suggestions on any of the above.
Best,
Hanno
On 23/02/2026, 17:58, "cl-isabelle-users-request@lists.cam.ac.uk <mailto:cl-isabelle-users-request@lists.cam.ac.uk> on behalf of Makarius" <cl-isabelle-users-request@lists.cam.ac.uk <mailto:cl-isabelle-users-request@lists.cam.ac.uk> on behalf of makarius@sketis.net <mailto:makarius@sketis.net>> wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On 23/02/2026 18:32, "Becker, Hanno" (via cl-isabelle-users Mailing List) wrote:
We added to AutoCorrode (https://github.com/awslabs/autocorrode <https://github.com/awslabs/autocorrode> <https://
github.com/awslabs/autocorrode>) yet another tool, Isabelle/REPL (I/R). It
provides a REPL as an interactive and programmatic interface to Isabelle/ML.
Can you explain the purpose of it --- after so many decades of a proper
interactive document model of Isabelle/PIDE?
I've worked myself with REPL interaction since Isabelle Proof General 2.0
1998, and it always caused problems that can be avoided by removing the thing.
"One happy day in the past", we actually made it. Contemporary Isabelle is
free from old-fashioned REPL interaction. The paper to upgrade your mindset to
the document model of Isabelle is here:
https://arxiv.org/abs/1307.1944v1 <https://arxiv.org/abs/1307.1944v1>
READ-EVAL-PRINT in Parallel and Asynchronous Proof-checking
Makarius Wenzel, 2013
That was 2013, and so many years later, the implementation mostly conforms to
the concepts that were introduced back then.
The Isabelle2025-2 release has a few more things, specifically requested from
some guys doing data mining and machine learning with Isabelle. See the NEWS
on "isabelle process_theories" and Thy_Info.get_theory_elements
I/R is rather simple. It manages a set of REPLs, each basically a list of Isar
texts; you push/pop Isar to/from a REPL to explore proofs. REPLs can be rooted
in theories, locations within theories (assuming a suitably augmented heap
build), or at points within other REPLs (for sub-proof exploration).
This reminds me of Isabelle until approx. 1998, with the stacked toplevel by
Paulson, from ancient LCF. Larry can explain the history better.
Of course, anybody is free to do with Isabelle anything, even emulating
retro-provers. I won't spend time with that, though, like reviewing sources or
answering questions how to retrofit it into current Isabelle.
The REPL era is long over, and it won't come back.
Makarius
From: Makarius <makarius@sketis.net>
On 23/02/2026 23:29, Becker, Hanno wrote:
To be clear upfront: I/R is not meant as a tool for human proof development. I agree that REPL interaction is inferior to the document model for that purpose. The target user of I/R is an AI agent.
There are two use cases:
The first is AI-assisted interactive development.
The second is offline improvement:
I'd welcome your thoughts and suggestions on any of the above.
Reading through your text twice, I get the impression that nothing really
depends on the old REPL approach. Rather it is an attempt to get something
working quickly, without mastering the subtle details of PIDE interaction.
(Which might also require some refinements of PIDE on my side, not your side.)
From the past posts, I did already harvest a few hints of how the PIDE model
could improve eventually --- also from conversations with other people that
have with similar applications. This will take its time on my side, and might
break your approaches so far --- which are mostly outside normal Isabelle user
space anyway, so there are no guarantees being violated.
A few general remarks how Isabelle developments works (at least in the past
2-3 decades):
- There are no "pull requests" for Isabelle, or semantic equivalents of it.
It means that people cannot "force features onto the system" by starting their
own experiments and hoping them to be adopted. Instead, everything needs to be
grown steadily from the bottom up, with 1-3 guys who really understand how
things work.
- Significant changes or add-ons to Isabelle subsystems need to be
developed together with the person who is mainly responsible for it in the
long term (rarely are there 2 or even more persons --- this does not work
well). The names can be quickly found out by inspecting the changes in the
repository history. For Isabelle/PIDE, I am responsible in all respects, and I
will not delegate anything --- that is too risky. I used to be responsible for
the old REPL, too. We even had (very bad) times with REPL + PIDE side-by-side.
- When people show me small changes for the parts where I am responsible, I
usually put them on a stack and study them eventually. Once I've understood
the 2-3 key ideas, I sit down to rewrite them in "canonical form". Thus I
often discover a few mistakes in the approach, and I won't copy them. It can
also happen that I introduce new mistakes myself: the system has become very
complex indeed, and has left the state of "research experiment" long ago.
Rather, it has become a fine art to grow it further, without breaking it.
So far, I don't see how this fits to your project at its scale. There is a
type error here.
Concerning REPLs in particular: I've already spent several days in 2024/2025
with someone from a notable AI group, and he could not explain the purpose of
it --- that REPL was even a clone of my ancient implementation that was
already gone. This explains my current attitude of not to wasting any more
time with that footnote of ITP history.
Makarius
From: "\"Becker, Hanno\"" <cl-isabelle-users@lists.cam.ac.uk>
Hi Makarius,
Thank you for your reply.
The REPL approach is meant as a pragmatic solution given the current architecture, not a necessity in itself, I agree. Native PIDE support for passive multi-step proof exploration is likely preferable -- I'd be curious to hear how you envision that, whether with current PIDE or future changes. We'll adopt record_theories in the meantime; thanks again for the pointers!
Best regards,
Hanno
On 23/02/2026, 23:19, "cl-isabelle-users-request@lists.cam.ac.uk <mailto:cl-isabelle-users-request@lists.cam.ac.uk> on behalf of Makarius" <cl-isabelle-users-request@lists.cam.ac.uk <mailto:cl-isabelle-users-request@lists.cam.ac.uk> on behalf of makarius@sketis.net <mailto:makarius@sketis.net>> wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On 23/02/2026 23:29, Becker, Hanno wrote:
To be clear upfront: I/R is not meant as a tool for human proof development. I agree that REPL interaction is inferior to the document model for that purpose. The target user of I/R is an AI agent.
There are two use cases:
The first is AI-assisted interactive development.
The second is offline improvement:
I'd welcome your thoughts and suggestions on any of the above.
Reading through your text twice, I get the impression that nothing really
depends on the old REPL approach. Rather it is an attempt to get something
working quickly, without mastering the subtle details of PIDE interaction.
(Which might also require some refinements of PIDE on my side, not your side.)
From the past posts, I did already harvest a few hints of how the PIDE model
could improve eventually --- also from conversations with other people that
have with similar applications. This will take its time on my side, and might
break your approaches so far --- which are mostly outside normal Isabelle user
space anyway, so there are no guarantees being violated.
A few general remarks how Isabelle developments works (at least in the past
2-3 decades):
There are no "pull requests" for Isabelle, or semantic equivalents of it.
It means that people cannot "force features onto the system" by starting their
own experiments and hoping them to be adopted. Instead, everything needs to be
grown steadily from the bottom up, with 1-3 guys who really understand how
things work.
Significant changes or add-ons to Isabelle subsystems need to be
developed together with the person who is mainly responsible for it in the
long term (rarely are there 2 or even more persons --- this does not work
well). The names can be quickly found out by inspecting the changes in the
repository history. For Isabelle/PIDE, I am responsible in all respects, and I
will not delegate anything --- that is too risky. I used to be responsible for
the old REPL, too. We even had (very bad) times with REPL + PIDE side-by-side.
When people show me small changes for the parts where I am responsible, I
usually put them on a stack and study them eventually. Once I've understood
the 2-3 key ideas, I sit down to rewrite them in "canonical form". Thus I
often discover a few mistakes in the approach, and I won't copy them. It can
also happen that I introduce new mistakes myself: the system has become very
complex indeed, and has left the state of "research experiment" long ago.
Rather, it has become a fine art to grow it further, without breaking it.
So far, I don't see how this fits to your project at its scale. There is a
type error here.
Concerning REPLs in particular: I've already spent several days in 2024/2025
with someone from a notable AI group, and he could not explain the purpose of
it --- that REPL was even a clone of my ancient implementation that was
already gone. This explains my current attitude of not to wasting any more
time with that footnote of ITP history.
Makarius
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
Hi Makarius,
Is it a nice excuse to move to VSCode as main frontend? If the modern tech
is there why not?
I know it is a hard choice given the flexibility provided by jEdit for java
devs!
Yakoub.
On Mon, Feb 23, 2026 at 9:51 PM Becker, Hanno <
cl-isabelle-users@lists.cam.ac.uk> wrote:
Hi Makarius,
Thank you for your reply.
The REPL approach is meant as a pragmatic solution given the current
architecture, not a necessity in itself, I agree. Native PIDE support for
passive multi-step proof exploration is likely preferable -- I'd be curious
to hear how you envision that, whether with current PIDE or future changes.
We'll adopt record_theories in the meantime; thanks again for the pointers!Best regards,
HannoOn 23/02/2026, 23:19, "cl-isabelle-users-request@lists.cam.ac.uk <mailto:
cl-isabelle-users-request@lists.cam.ac.uk> on behalf of Makarius" <
cl-isabelle-users-request@lists.cam.ac.uk <mailto:
cl-isabelle-users-request@lists.cam.ac.uk> on behalf of
makarius@sketis.net <mailto:makarius@sketis.net>> wrote:CAUTION: This email originated from outside of the organization. Do not
click links or open attachments unless you can confirm the sender and know
the content is safe.On 23/02/2026 23:29, Becker, Hanno wrote:
To be clear upfront: I/R is not meant as a tool for human proof
development. I agree that REPL interaction is inferior to the document
model for that purpose. The target user of I/R is an AI agent.There are two use cases:
The first is AI-assisted interactive development.
The second is offline improvement:
I'd welcome your thoughts and suggestions on any of the above.
Reading through your text twice, I get the impression that nothing really
depends on the old REPL approach. Rather it is an attempt to get something
working quickly, without mastering the subtle details of PIDE interaction.
(Which might also require some refinements of PIDE on my side, not your
side.)From the past posts, I did already harvest a few hints of how the PIDE
model
could improve eventually --- also from conversations with other people that
have with similar applications. This will take its time on my side, and
might
break your approaches so far --- which are mostly outside normal Isabelle
user
space anyway, so there are no guarantees being violated.A few general remarks how Isabelle developments works (at least in the past
2-3 decades):
There are no "pull requests" for Isabelle, or semantic equivalents of it.
It means that people cannot "force features onto the system" by starting
their
own experiments and hoping them to be adopted. Instead, everything needs
to be
grown steadily from the bottom up, with 1-3 guys who really understand how
things work.Significant changes or add-ons to Isabelle subsystems need to be
developed together with the person who is mainly responsible for it in the
long term (rarely are there 2 or even more persons --- this does not work
well). The names can be quickly found out by inspecting the changes in the
repository history. For Isabelle/PIDE, I am responsible in all respects,
and I
will not delegate anything --- that is too risky. I used to be responsible
for
the old REPL, too. We even had (very bad) times with REPL + PIDE
side-by-side.When people show me small changes for the parts where I am responsible, I
usually put them on a stack and study them eventually. Once I've understood
the 2-3 key ideas, I sit down to rewrite them in "canonical form". Thus I
often discover a few mistakes in the approach, and I won't copy them. It
can
also happen that I introduce new mistakes myself: the system has become
very
complex indeed, and has left the state of "research experiment" long ago.
Rather, it has become a fine art to grow it further, without breaking it.So far, I don't see how this fits to your project at its scale. There is a
type error here.Concerning REPLs in particular: I've already spent several days in
2024/2025
with someone from a notable AI group, and he could not explain the purpose
of
it --- that REPL was even a clone of my ancient implementation that was
already gone. This explains my current attitude of not to wasting any more
time with that footnote of ITP history.Makarius
From: Makarius <makarius@sketis.net>
On 25/02/2026 03:35, Yakoub Nemouchi wrote:
Is it a nice excuse to move to VSCode as main frontend? If the modern tech is
there why not?
I know it is a hard choice given the flexibility provided by jEdit for java devs!
Isabelle/VSCode is certainly something to be reconsidered seriously again at
some point, but it has been a continuously "emerging" as new Isabelle
technology for almost 1 decade already.
I was initially much more enthusiastic about the VSCode platform than now,
after spending so much time on it in 2025.
This paper from the Isabelle Workshop 2022
https://files.sketis.net/Isabelle_Workshop_2022/Isabelle_2022_paper_7.pdf
marks the peak of my enthusiasm, afterwards it only went downhill.
The Future work section from that paper says:
"""
Isabelle/VSCode with Electron, Chromium, Node.js opens a whole new world to
Isabelle applications. Only the future will show what comes from it, but a
few immediate project can already be anticipated now:
▪ Isabelle/VSCode as a basis for ‹accessible formal mathematics›, as
sketched in the paper by Neuper/Stöger/Wenzel in the same Isabelle workshop
2022.
▪ PDF.js with Isabelle-specific URLs (within the PDF) that are dynamically
routed to commands of the running Prover IDE. Thus browsing and editing of
form proof documents could be connected, e.g. in an interactive tutorial on
formalized mathematics.
▪ PDF.js with some GUI elements (via HTML/CSS/JS) to view Isabelle PDF
documents that are dynamically generated from the running PIDE session. This
will be the basis for a proper document-oriented Prover IDE, without
requiring ▩‹isabelle build› or ▩‹isabelle document› from the command-line.
Both Isabelle/VSCode and Isabelle/jEdit can benefit from that. Within
Isabelle/VSCode, HTML browser views are directly available as side-windows.
Within Isabelle/jEdit, we can open the browser as separate Electron window.
An alternative is the standard browser of the OS platform, but that opens
the gates to ``browser hell'' and should not be done for sophisticated
applications.
┉ Another open question: JavaScript and/or TypeScript eventually be elevated
to ``official'' Isabelle programming languages, just like ML and Scala? Or
will Scala.js⁋‹https://www.scala-js.org› do a sufficiently good job, to
allow staying with Scala most of the time? That needs to be tried out soon,
especially in the context of the new Scala3 eco-system (a new compiler and a
new language).⁋‹https://dotty.epfl.ch›
"""
Most of the open problems and questions are still open until today, e.g. the
suitability of Scala.js (Fabian Huch made some experiments in Dec-2025 that I
still need to study).
The project of "accessible formal mathematics" mentioned above now has some
funding and some tangible results, but without VSCode! It turned out a bad
idea to rely on MicroSoft and Google to do their homework on accessibility: it
often does not work at all, sometimes hardly at all, sometimes reasonably well
--- depending on the weather conditions of the underlying VSCode release that
happens so frequently.
Thus we resolved to do the homework on accessibility ourselves in
Isabelle/jEdit. As a consequence, its ancient Java/Swing GUI has already been
brushed up for the Isabelle2025-1 release, and more will be coming for the
Isabelle2026 release (see its time table
https://sketis.net/2025/plan-for-isabelle2026-october-2026). This will help
everybody, not just blind and visually impaired people.
The PDF.js idea above can be tried again using JCEF (Java Chromium Embedded
Framework) in Isabelle/jEdit.
Conclusion: Isabelle/jEdit is "not quite dead yet", and actually prospering.
So Isabelle/VSCode will have more and more to catch up --- on inferior
technological basis: Node.js/Electron and the rather closed VSCode codebase
that is very difficult to adapt to the needs (and ambitions) of Isabelle.
Makarius
Last updated: Mar 14 2026 at 08:38 UTC