[isabelle] automatically grade Isabelle homework? · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] automatically grade Isabelle homework?

Email Gateway (Aug 22 2022 at 13:44):

From: Johannes Waldmann <johannes.waldmann@htwk-leipzig.de>
Dear all,

I wonder about the feasibility of the following.
Perhaps something similar has already been done.

I want to automatically check solutions for homework assignments.
The problem statement could be a text with some "sorry",
and the student has to replace these with actual proofs,
and is forbidden to change anything else.

I am already doing this for Haskell homework -
students have to replace "undefined",
resulting code is type-checked and then small/quick-checked.
See URL at end of this mail for implementation.

what is the best way to pattern-match
student's Isabelle code against problem statement?
For Haskell, I actually compare abstract syntax trees.
how safe is it to run Isabelle? - For Haskell,
anything dangerous would be in IO, which is prohibited
by putting type annotations in the problem statement
(which the student cannot change).
And "unsafePerformIO" is prohibited because it would need
some "import" statement - which again would not survive
the pattern matching. So, the worst that can happen
is that student's code runs forever,
and for that I just limit execution time (to a few seconds).
Is there any way to sneak an IO action into a proof?

3.1 How would I run Isabelle? Of course I don't want the GUI.
3.2 Can I avoid putting student's submission in a file,
and pipe it instead?
3.3 How would I detect that Isabelle actually verified the proof?
Ideally, just by checking the return code?

Any input appreciated.

Johannes.

https://gitlab.imn.htwk-leipzig.de/autotool/all/tree/master/collection/src/Haskell/Blueprint

NB: I once saw an (unrelated) project
that used "maxima" to check elementary algebra homework,
and this had serious security holes -
you could easily do IO because there is no static checking.

Email Gateway (Aug 22 2022 at 13:44):

From: Lars Hupel <hupel@in.tum.de>
Dear Johannes,

what is the best way to pattern-match
student's Isabelle code against problem statement?
For Haskell, I actually compare abstract syntax trees.

is there any reason you require students to exactly fill out holes? For
our Haskell course we reckoned a "matching interface" is good enough.

In Isabelle you could get something similar by using three theories:

Interface.thy – definitions would go in there
Student.thy – lemma statements + sorry would go there, student has to
fill them
Checker.thy – contains the exact same lemma statements, but using a
proof "by rule".

Example:

Interface.thy

theory Interface imports Main begin

definition prime ...

Student.thy

theory Student imports Interface begin

lemma no_biggest_prime: obtains p where "prime p" "p > n"
sorry

Checker.thy

theory Checker imports Interface Student begin

lemma check_no_biggest_prime:
obtains p where "Interface.prime p" "p > n"
by (rule Student.no_biggest_prime)

Depending on where exactly a failure occurs (you can put those into
multiple sessions) you know where the student did something wrong.

how safe is it to run Isabelle? - For Haskell,
anything dangerous would be in IO, which is prohibited
by putting type annotations in the problem statement
(which the student cannot change).
And "unsafePerformIO" is prohibited because it would need
some "import" statement - which again would not survive
the pattern matching. So, the worst that can happen
is that student's code runs forever,
and for that I just limit execution time (to a few seconds).
Is there any way to sneak an IO action into a proof?

Isabelle allows arbitrary code execution. There is currently a "safe"
flag but:

with that many legitimate things don't work
it will be gone in the next release.

I strongly recommend containerizing Isabelle for running untrusted code.
(We use unprivileged containers without network access in a VM in our
AFP submission checker.)

On the plus side, setting timeouts is possible and robust.

3.1 How would I run Isabelle? Of course I don't want the GUI.

You have to prepare a ROOT file and put all the files into appropriate
folders. Then you can run

$ bin/isabelle build ...

See also §2 in the Isabelle system manual:
<https://isabelle.in.tum.de/dist/Isabelle2016/doc/system.pdf>

3.2 Can I avoid putting student's submission in a file,
and pipe it instead?

You could, but that would involve you writing some nontrivial amount of
custom Scala code to drive the build process. I wouldn't recommend it.

3.3 How would I detect that Isabelle actually verified the proof?
Ideally, just by checking the return code?

Yes, you get that for free from "isabelle build". Exit code 0 means
session built successfully, other exit codes mean timeout/failure/...

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:45):

From: Joachim Breitner <breitner@kit.edu>
Hi Johannes,

also see this thread, where I asked similar questions (but without much
more answers :-)
https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-April/msg00087.html

as Lars says, there is a safe mode, and we use in (in addition to
containerization using Docker) Karlsruhe where we automatically process
Isabelle submissions using the “Praktomat” checker. We currently do not
check what the students actually proved, for that we still manually
check their submission, but this way any sorries or other problems are
detected.

Maybe
https://github.com/KITPraktomatTeam/Praktomat/blob/master/src/checker/checker/IsabelleChecker.py
provides some inspiration.
signature.asc

Email Gateway (Aug 22 2022 at 13:45):

From: Lars Hupel <hupel@in.tum.de>
The point I was trying to make is: there is a "safe" mode, but don't
ever use it! :-)

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:45):

From: Joachim Breitner <breitner@kit.edu>
Hi,

besides adding security (which is better handled by containerization),
the safe mode also prevents the student from adding ML code that allows
them to implement a bad oracle, i.e. their own "sorry", right? Will
there be another way of avoiding that, once the safe mode is gone?

Greetings,
Joachim
signature.asc

Email Gateway (Aug 22 2022 at 13:45):

From: Lars Hupel <hupel@in.tum.de>
Hi Joachim,

besides adding security (which is better handled by containerization),

... a dubious amount at best. Using "code_printing" and "export_code"
you can already write arbitrary things to the file system, e.g.
".bashrc" or the checker theory (if those are writeable). Also let me
reiterate that the "secure" mode is gone after Isabelle2016.

the safe mode also prevents the student from adding ML code that allows
them to implement a bad oracle, i.e. their own "sorry", right? Will
there be another way of avoiding that, once the safe mode is gone?

There are more ways to cheat than that, e.g. with "axiomatization" or
introducing custom or changing existing input/output syntax. However, ML
oracles can be easily detected using "Thm.peek_status" (the "oracle"
field is set to "true").

Given that the whole system is so complex there is no bullet-proof way
you can reasonably assign grades without reserving the right to deduct
points if cheating is detected by a human. (Human review of Isabelle
sources is a necessity.)

Cheers
Lars

Email Gateway (Aug 22 2022 at 13:45):

From: Ondřej Kunčar <kuncar@in.tum.de>
This is not reliable either. You can hide usage of an oracle in the
instance proof when you instantiate a type class. For example, create a
type class that assumes False and use sorry in the instance proof.

Probably the most reliable way that we can amend this is to remove the
axiomatic type classes from the kernel.

Bests,
Ondrej

Email Gateway (Aug 22 2022 at 13:48):

From: Johannes Waldmann <johannes.waldmann@htwk-leipzig.de>
Well, you can always give a name to a "hole" inside an expression.
Then student has to add a top-level definition for that name.

But then, being top-level (module-level)
means that it cannot refer to local names
from the scope that the hole was in,
but sometimes that's the point of an exercise,
if it's about "local" programming techniques.

Also, there are other kinds of holes (than expressions),
I want to use this also for patterns and types.
Again, these could be named, but then I lose the scope information,
or have to make it explicit (by adding arguments).

I imagine the same could be useful for Isabelle homework:
"fill the gap(s) in this proof (but don't touch the non-gaps)"
the point being that the student has to respect the (local) structure.

Johannes.

Last updated: Jul 01 2025 at 20:24 UTC