[isabelle] Adapting the Turing test to a proof assistant · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Adapting the Turing test to a proof assistant

Email Gateway (Aug 23 2022 at 08:34):

From: Jose Manuel Rodriguez Caballero <jose.manuel.rodriguez.caballero@ut.ee>
Hello,
This post is about the applications of proofs assistants to real-life situations (outside pure mathematics), in particular of Isabelle/HOL, not about concrete details of the language. I would like to propose the following adaptation of the Turing test for proof assistants. In particular, I will focus on the concrete case of medicine, but I think that the same idea holds in other domains.

Let's imagine that there is a library in Isabelle/HOL about an illness X, modeled in the language of probability theory. With this library, it is possible to write theorems about the probability that the patient has the illness X and the probability that to use medication M is the right treatment for the patient. How to test such a library for real-life situations?

My idea is to do the following variation of the Turing test. We need three persons, two medical doctors (named Alice and Bob) and an Isabelle's user without knowledge of medicine (named Eve). Alice generates fictitious data about several cases, where one-half of the cases have illness X and the other half do not. Then Bob gives a medical diagnosis and a treatment to each case. Parallelly, Eve uses the library and the proof assistant in order to prove theorems about the probabilities for diagnosis and treatment. Finally, Alice needs to decide which of the diagnosis and treatments come from the medical doctor Bob and which come from the Isabelle's user Eve. Considering a large sample, e.g., 1000 fictitious cases generated by Alice, if the frequency of times that Alice is right is near one-half, then the library can be used for real-life applications because it is statistically indistinguishable from a medical doctor.

Kind regards,
José M.

Last updated: Jul 15 2025 at 16:29 UTC