[isabelle] Sandboxed evaluation of Isabelle theories? · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Sandboxed evaluation of Isabelle theories?

Email Gateway (Aug 19 2022 at 10:07):

From: Joachim Breitner <breitner@kit.edu>
Dear List,

I am in the process of overhauling the Isabelle tutorial¹ at the
University of Karlsruhe. Since we already have a somewhat nice system
for submitting programming solutions (the Praktomat²) I am considering
to use that for the submitted Isabelle solutions as well. For just
submitting and commenting the files this would work well.

The next step would be to automatically check the files at submission,
e.g. run them through isabelle-process or isabelle build. But is that
safe? I believe not, as inside ML {* ... *}, arbitrary missiles could be
launched.

Is there an option that disables all such unsafe features, that would
allow running a theory from an untrusted source?

(I’m ignoring resources issues here; these could be enforced by other
means.)

Thanks,
Joachim

¹ http://pp.ipd.kit.edu/lehre/SS2013/tba/
¹ http://pp.ipd.kit.edu/project.php?id=34 and
https://github.com/danielkleinert/Praktomat
signature.asc

Email Gateway (Aug 19 2022 at 10:07):

From: Lawrence Paulson <lp15@cam.ac.uk>
Given the combination of an LCF architecture and ML's static scoping rules, I don't see what harm could be done by the insertion of arbitrary ML code. In any event, anybody who can master Isabelle's APIs sufficiently to do anything interesting by inserting ML code deserves top marks for your course :-)

Larry Paulson

Email Gateway (Aug 19 2022 at 10:07):

From: Makarius <makarius@sketis.net>
That's a cource where top marks are very easy to get:

ML {* Isabelle_System.bash "echo 666" *}

It is left as an exercise to put something really dangerous into the above
shell script.

To tighten up the system a little bit, there is isabelle-process option -S
to disallow such gross insecurities.

If one wanted to take that issue very seriously, one could sandbox the ML
compiler invocation that is embedded into Isabelle/Isar. The
infrastructure is in principle all there, but we don't have concrete
applications for that, and nobody paying large sums to make it work and
keep it working over time.

Makarius

Email Gateway (Aug 19 2022 at 10:07):

From: Lawrence Paulson <lp15@cam.ac.uk>
It depends on the definition of "harm". The student may be able to erase all of the professor's files, and that's certainly bad. If your students are really that crazy, you could run the assessments on a dedicated machine. But it would take a pretty clever series of UNIX calls to trick the system into thinking you had finished a proof.

Of course, for a tutorial course, it would be okay to prohibit access to ML altogether. That shouldn't be difficult to achieve.

Larry Paulson

Email Gateway (Aug 19 2022 at 10:07):

From: Peter Lammich <lammich@in.tum.de>
On Di, 2013-02-12 at 11:42 +0000, Lawrence Paulson wrote:

Given the combination of an LCF architecture and ML's static scoping rules, I don't see what harm could be done by the insertion of arbitrary ML code.

Well, you have acceess to the PolyML standard libraries, like File-IO,
etc., which you can abuse for sending SPAM, manipulating/reading out
other student's solutions, just mess up the config, etc ...

Email Gateway (Aug 19 2022 at 10:08):

From: Tjark Weber <webertj@in.tum.de>
"You take the blue pill - the story ends, you wake up in your bed and
believe whatever you want to believe. You take the red pill - you stay
in Wonderland, and I show you how deep the rabbit hole goes."

Arbitrary ML code can hijack any communication between the LCF kernel
and the user, thus creating a perfect illusion of unsoundness even when
the kernel itself is unaffected.

Perhaps more interestingly, arbitrary ML code can spawn external
processes, which can potentially modify the system state (including the
memory of the running Isabelle process) in arbitrary ways.

Anyway, I am guessing that Joachim is more concerned about features
like unrestricted file system or network access than about soundness of
Isabelle.

Best regards,
Tjark

Email Gateway (Aug 19 2022 at 10:08):

From: Joachim Breitner <breitner@kit.edu>
Hi,

exactly. Should -S be sufficient for that?

Thanks,
Joachim
signature.asc

Email Gateway (Aug 19 2022 at 10:08):

From: Makarius <makarius@sketis.net>
To some extent, yes. It makes doing bad things harder.

That option was originally motivated by semi-public prover webservices,
notably the Proofweb project from Nijmegen, around 2006/2007. What they
also did of course, was the usual sandboxing of the Unix process on the
server side.

It is in principle possible to make this all much more secure, and
restrict ML inside Isar to a precisely specified subset of the language
and its libraries.

Makarius

Email Gateway (Aug 19 2022 at 10:08):

From: Lawrence Paulson <lp15@cam.ac.uk>
Isn't this what virtual machines are for?

But a student who injected malicious code in such circumstances would be lucky to avoid expulsion, and I can't see it happening here.

Larry Paulson

Email Gateway (Aug 19 2022 at 10:08):

From: Makarius <makarius@sketis.net>
The Nijmegen guys also had a course with assignments and solutions managed
by the web server.

What they did back then was plain Linux chroot. Current Linux variants
probably offer a bit more, similar to classic BSD "jail". Variants of
virtualization are also possible, of course. I think that certain "cloud
computing" infrastructure works like that.

The question of "LCF prover as secure web service" has many aspects to be
explored more seriously. One could easily make a research project out of
that, with everybody still speaking about clouds until the next buzzword
is emerging.

Makarius

Email Gateway (Aug 19 2022 at 10:08):

From: Joachim Breitner <breitner@kit.edu>
Hi,

thanks. Next question:

/opt/isabelle/Isabelle2013-RC2/bin/isabelle-process -S -r -I HOL < Foo.thy

will tell me about wrong proofs (*** Failed to finish proof), but not about

uses of sorry
unexpected end of the file.

For the first one it seems I have to somehow disable the quick_and_dirty mode, but passing
"-e 'quick_and_dirty := false'" did not have the desired effect.

Any suggestions on how to proceed here?

Greetings,
Joachim
signature.asc

Email Gateway (Aug 19 2022 at 10:08):

From: Makarius <makarius@sketis.net>
Piping directly into the Isar toplevel gives indeed very little static
consistency checking. You would have to emulate Proof General in
imitating some of that, but it never really managed > 10 years.

It is better to work in batch mode, by telling the raw ML toplevel to
invoke use_thys like this:

isabelle-process -S -r -q -e 'use_thys ["~/tmp/A"];'

Batch mode then also says something like this by default:

Loading theory "A"
*** Cheating requires quick_and_dirty mode!
*** At command "sorry" (line 4 of "/Volumes/Macintosh_HD/Users/makarius/tmp/A.thy")

Makarius

Email Gateway (Aug 19 2022 at 10:09):

From: "C. Diekmann" <diekmann@in.tum.de>
Does export_code still allow directory traversal and arbitrary file overwriting?

Email Gateway (Aug 19 2022 at 10:09):

From: Joachim Breitner <breitner@kit.edu>
Hi,

indeed it does; this is one problem.

Maybe export_code should be disabled with -S on, or at least export_code
with an file argument different from "-"...

From browsing the code (which I’m unfortunately not fluent in) it seems
that such safeguards could be added centrally
in ./src/Pure/General/file.ML, e.g. disabling open_output completely if
"-S" (or a stronger flag, say "-S -S") is on.

Another, lesser, problem is that the import statement can access
arbitrary .thy files if the path is known.

Greetings,
Joachim
signature.asc

Email Gateway (Aug 19 2022 at 10:09):

From: Joachim Breitner <breitner@kit.edu>
Dear Makarius,

thanks for the hint. Now it does detect sorry and partial theories well,
and this invocation is also easier for out system as the filename can be
passed in a parameter.

I noticed that -S had no effect any more (due to how the MLTEXT variable
in isabelle-process is assembled), but passing
-e 'Secure.set_secure (); use_thys ["/tmp/Deduction"];'
works.

Greetings,
Joachim
signature.asc

Email Gateway (Aug 19 2022 at 10:09):

From: Joachim Breitner <breitner@kit.edu>
Dear ML- and Isabelle-programmers,

would this be a good idea:

$ diff -u file.ML-orig file.ML
--- file.ML-orig 2013-02-13 10:35:41.886000990 +0100
+++ file.ML 2013-02-13 10:43:58.573982673 +0100
@@ -104,7 +104,9 @@

fun open_dir f = with_file OS.FileSys.openDir OS.FileSys.closeDir f o platform_path;
fun open_input f = with_file TextIO.openIn TextIO.closeIn f o platform_path;
-fun open_output f = with_file TextIO.openOut TextIO.closeOut f o platform_path;
+fun open_output f path =
+ (Secure.deny_secure "Cannot open file for writing in secure mode.";
+ with_file TextIO.openOut TextIO.closeOut f (platform_path path));
fun open_append f = with_file TextIO.openAppend TextIO.closeOut f o platform_path;

end;

It seems to work here.

For some reason I don’t see the error message, but rather

Exception trace for exception - ERROR raised in library.ML line 264

End of trace

Exception- Interrupt raised

while using ML {* .. *}, which also causes a call to deny_secure, gives a nice error message.

*** Cannot evaluate ML source in secure mode
*** At command "ML" (line 3 of "/tmp/Deduction.thy")
Exception- TOPLEVEL_ERROR raised

Would it be possible to include this or an equivalent feature in a
future Isabelle release, so that there is no risk of forgetting to patch
Isabelle when we upgrade the submission server to a new release?

Thanks,
Joachim
signature.asc

Email Gateway (Aug 19 2022 at 10:09):

From: Makarius <makarius@sketis.net>
This is where Unix chroot (or better) comes into play. It is futile to
try isolating file system access of the process in the implementation.

For the funny Secure.set_secure mode, I merely pinned down relatively
well-defined things that can extend the code base dynamically.

Makarius

Email Gateway (Aug 19 2022 at 10:10):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>

This is where Unix chroot (or better) comes into play.

What seems reasonably robust to me is:

Setup one virtualized reference machine
- no access to net from the machine itself
- read-only access to one dedicated directory on your host system
  where the theories to be checked are placed
- Isabelle2013
- sshd running to issue batch jobs there etc.

Safe this machine such that you can restore it on problems of any kind.
Then use it!

Cheers,
Florian
signature.asc

Last updated: Feb 01 2025 at 20:19 UTC