Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Quickcheck narrowing surprising failure


view this post on Zulip Email Gateway (Aug 22 2022 at 19:55):

From: Thomas Genet <thomas.genet@irisa.fr>
Dear Isabelle users,

I finally came up with a very concise version of my theory leading to a
failure of quickcheck that I do not understand.

Here is the theory:


datatype statement= Inc | Nop | Seq statement statement

fun eval:: "statement ⇒ nat ⇒ nat"
where
"eval Inc n = n+1" |
"eval Nop n = n" |
"eval (Seq s1 s2) n = (eval s1 (eval s2 n))"

lemma A: "(∀ s. (eval s n)>0) ⟶ (n>1)"
nitpick
quickcheck [tester=narrowing,timeout=300] (* no couterex! *)
quickcheck [tester=narrowing,size=1,timeout=300]
quickcheck [tester=narrowing,size=2,timeout=300]
quickcheck [tester=narrowing,size=3,timeout=300]
quickcheck [tester=narrowing,size=4,timeout=300]
oops

lemma B: "∀ s. eval s 0 < 10"
quickcheck[tester=narrowing] (* one counterex! *)
oops


And some questions:

1) With Isabelle (2018), all quickcheck calls fail to find a
counter-example on lemma A. However, n=1 is a counter-example:
evaluating all statements (sequences of Nop or Inc) on n=1 results into
values greater to 0. However, n=1 is not greater to 1.

I tried to increase the timeout and to limit the maximal size (since
"narrowing" first generate and then test).

Note that quickcheck narrowing succeeds in generating statements for
finding a counter-example on lemma B.

2) Nitpick finds a counter-example but claims that it may be spurious
because it "trivially holds for the given scopes or lies outside
Nitpick's supported fragment".

Since it does not hold, what makes this example to be "outside" of
Nitpick's fragment?

Best regards,

Thomas

view this post on Zulip Email Gateway (Aug 22 2022 at 19:56):

From: Thomas Genet <thomas.genet@irisa.fr>
Dear all,

I may have some guesses but I still wonder if this is correct...

Counterexample generation (for nitpick and quickcheck) is easier on
lemma B because to have a counter example it is enough to produce
one value for s.

To contradict lemma A, we need one value for n that falsify n>1 for all
possible s. Nitpick takes a chance to propose n=1 but claims that it may
be wrong whereas quickcheck does not even take this risk?

"lies outside Nitpick's supported fragment" is the universal quantifier
on the left-hand side of --> ?

Are these correct guesses?

Best regards,

Thomas

view this post on Zulip Email Gateway (Aug 22 2022 at 19:56):

From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Yes, I believe you are guessing correct. There are some limited cases
where quickcheck can (magically) derive (with narrowing or the
predicate compiler) that it is sufficient to only check for certain
terms or certain partial terms to actually evaluate an universal
quantified formula, i.e., the formula to be true or to be false for
all possible candidates, but that magic only works with various
limitations. Already considerably minor syntactic changes (and even
more so semantic changes) can lead that quickcheck is not able to do
that and hence simply is doomed to fail.

Nitpick tries similar magic based on a three-valued logic encoding.

Some of those limitations are ultimately due to the fact that
quickcheck relies on Isabelle's code generator.

Lukas

view this post on Zulip Email Gateway (Aug 22 2022 at 20:05):

From: Thomas Genet <thomas.genet@irisa.fr>
Dear Isabelle users,

with the following lemma:

lemma "∀ l3. (∀ l1 l2. l1@l2 = l3) ⟶ (length l3>1)"

quickcheck [tester=narrowing] fails to find a counter-example, though
nitpick finds one.

Is there a way to overcome this? If I understand well, the use of
quantifiers forbids to use random/exhaustive on this lemma?

Thanks in advance,

Thomas

view this post on Zulip Email Gateway (Aug 22 2022 at 20:06):

From: Andreas Lochbihler <mail@andreas-lochbihler.de>
Dear Thomas,

If you look closely at nitpick's output, then you will see the hint that the
counterexample is "potentially spurious", and it is in this case. Because for any l3, the
antecedent of the implication can simply be falsified, which makes the implication true.
So quickcheck is actually right in not finding anything.

Andreas

view this post on Zulip Email Gateway (Aug 22 2022 at 20:07):

From: Thomas Genet <thomas.genet@irisa.fr>
Oops, Andreas, you are totally right.

I tried to reproduce the problem I have (on a big theory) using a
smaller example but it is clearly broken (even provable with auto!).
Sorry for this.

I will try to isolate the problem I have on a small theory and correct
counter example.

Best regards,

Thomas


Last updated: Nov 21 2024 at 12:39 UTC