[isabelle] Is there a collection of basic counting proper... · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Is there a collection of basic counting proper...

Email Gateway (Jul 08 2022 at 09:35):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
For an investigation I was trying to pursue, I just found myself proving the following:

lemma hypergeometric_combinatorial_property:
assumes "finite (U ∪ S)" and "U ∩ S = {}"
shows "card {T. T ⊆ U ∪ S ∧ card T = n ∧ card (S ∩ T) = k} =
(if n > card (U ∪ S) ∨ k > n then 0 else (card S choose k) * (card U choose (n - k)))"

As my proof is kind of long and ugly (it even took a while just to figure out the way of formulating
the boundary cases so that the induction would go through), I wonder if there is perhaps a collection
of already-distilled basic facts like this, concerning counting properties related to common discrete
probability distributions. I had a look in what seemed obvious places in the HOL-Library, but only
came up with a few facts of this type.

Thanks for any hints.

- Gene Stark

Email Gateway (Jul 08 2022 at 13:15):

From: Lawrence Paulson <lp15@cam.ac.uk>
I’m not aware of such a thing. It’s surely useful to collect combinatorial identities however.

Larry

Email Gateway (Jul 08 2022 at 13:16):

From: Manuel Eberl <manuel@pruvisto.org>
There is Lukas Bulwahn's AFP entry on the Twelvefold Way. is that the
sort of thing you have in mind?

https://www.isa-afp.org/entries/Twelvefold_Way.html

Manuel

Email Gateway (Jul 08 2022 at 16:11):

From: Tobias Nipkow <nipkow@in.tum.de>
As others have already stated, there is currently no specific library for such
lemmas. However, the lemma looks interesting enough that I would suggest to move
it to the finite set theory in HOL:
https://isabelle.in.tum.de/dist/library/HOL/HOL/Finite_Set.html (see, for
example, dvd_partition)

Unless somebody has better suggestion, I'll be happy to do it.

Tobias
smime.p7s

Email Gateway (Jul 08 2022 at 16:49):

From: Manuel Eberl <manuel@pruvisto.org>
I don't know how "long and ugly" Gene's proof is, but here's a fairly
straightforward one in 72 lines based on proving a bijection. I did
modify the statement a bit since one of the case distinctions was
redundant. It could be even shorter, but the automation is not good with
cardinalities it seems…

In combinatorics, I would generally always prefer a bijection-based
proof over a recursive one if at all possible. Or even better, a
generating-function proof (although that introduces a dependency to
HOL-Computational_Algebra).

Manuel

lemma hypergeometric_combinatorial_property:
assumes "finite (U ∪ S)" and "U ∩ S = {}"
shows "card {T. T ⊆ U ∪ S ∧ card T = n ∧ card (S ∩ T) = k} =
(if k > n then 0 else (card S choose k) * (card U choose (n

k)))"
(is "card ?A = ?rhs")
proof -
define f where "f = (λT. (T ∩ S, T ∩ U))"
define g where "g = (λ(X::'a set, Y). X ∪ Y)"
let ?B = "{X. X ⊆ S ∧ card X = k} × {Y. Y ⊆ U ∧ card Y + k = n}"

have "bij_betw f ?A ?B"
proof (rule bij_betwI[of _ _ _ g], goal_cases)
    case 1
    show ?case
    proof
      fix T assume T: "T ∈ ?A"
      have "n = card T"
        using T by auto
      also have "T = (T ∩ U) ∪ (T ∩ S)"
        using T by auto
      also have "card … = card (T ∩ U) + k"
        using T assms by (subst card_Un_disjoint) (auto simp: Int_commute)
      finally have "card (T ∩ U) + k = n" ..
      thus "f T ∈ ?B" using T
        by (auto simp: f_def g_def Int_commute)
    qed
next
    case 2
    show ?case
    proof
      fix XY assume XY: "XY ∈ ?B"
      then obtain X Y where [simp]: "XY = (X, Y)"
        by (cases XY)
      have XY: "X ⊆ S" "Y ⊆ U" "card X = k" "card Y + k = n"
        using XY by auto
      have 1: "g (X, Y) ⊆ U ∪ S"
        using XY by (auto simp: g_def)
      have "card (g (X, Y)) = card (X ∪ Y)"
        by (auto simp: g_def)
      also have "… = card X + card Y"
        by (rule card_Un_disjoint) (use XY assms in ‹auto dest:
finite_subset›)
      finally have 2: "card (g (X, Y)) = n"
        using XY by simp
      have "S ∩ g (X, Y) = X"
        using XY assms by (auto simp: g_def)
      hence 3: "card (S ∩ g (X, Y)) = k"
        using XY by auto
      from 1 2 3 show "g XY ∈ ?A"
        by simp
    qed
qed (use assms in ‹auto simp: f_def g_def›)
hence "card ?A = card ?B"
    by (rule bij_betw_same_card)

also have "card ?B = ?rhs"
proof (cases "k > n")
    case False
    hence "card Y + k = n ⟷ card Y = n - k" for Y :: "'a set"
      by linarith
    hence "?B = {X. X ⊆ S ∧ card X = k} × {Y. Y ⊆ U ∧ card Y = n - k}"
      by simp
    also have "card … = (card S choose k) * (card U choose (n - k))"
      unfolding card_cartesian_product using assms by (subst (1 2)
n_subsets) auto
    finally show ?thesis
      using False by simp
next
    case True
    hence "card Y + k ≠ n" for Y :: "'a set"
      by linarith
    hence "?B = {}"
      by simp
    with True show ?thesis
      by simp
qed

finally show ?thesis .
qed

Email Gateway (Jul 08 2022 at 16:50):

From: "Thiemann, René" <Rene.Thiemann@uibk.ac.at>
Hi Gene,

here is another counting lemma which is hidden in some AFP entry.

https://www.isa-afp.org/theories/clique_and_monotone_circuits/#Preliminaries.html#Preliminaries.card_inj_on_subset_funcset|fact

Perhaps it is of interest.

Best,
René

Email Gateway (Jul 08 2022 at 17:45):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
Well, Manuel, you truly are the master ;-) I have already learned a few new tricks from your proof.

Mine was 324 lines for a direct brute-force induction on the cardinality of U ∪ S, and I had not
made any attempt to replace uses of smt, which I allowed liberally while banging it out.
So I am obviously not going to publish that version. The redundant case that you removed was
probably a vestige of the inductive version internal to the proof that I used.

The "higher-level" bijection idea is clearly an improvement, and I will keep that in mind.

So, basically, I guess all I have to contribute to this is the (imperfect) statement of the Lemma :-(

My original thoughts were along the following lines: Basic probability books have a bunch of
'ball and urn' examples, to illustrate the origins of various important probability distributions.
The examples are counting problems, but then the probability mass functions are then expressed
as formulas using things like binomial coefficients. If, as happened in my case, the counting problem
pops out of something else you are trying to do (such as manipulate a summation), then you end up
wanting/needing the lemma that relates the counting problem to the associated formula. Although my
combinatorics skills are embarrassingly poor, I did manage to figure out before too long that
what I had got was related to the hypergeometric distribution. I figured I might be lucky and
there would be a formal probability development somewhere that would have a list of facts about basic
distributions like this. (But I also didn't mind working out a proof myself, because I was
using some parts of Isabelle that I hadn't tried much before.)

Anyway, I was imagining some kind of table of basic distributions, together with facts like the
following:

relation between the underlying counting problem and the probability mass function

generating functions

formulas for moments

(probably other things that I am not thinking of right now)

It might be that a fair bit of what I was looking for could already be derived from some of the
more abstract developments pointed out in this thread (e.g. the "Twelve Ways"). But I doubt
that someone like me would be able to spot it if it were.

I would think that trying to start some kind of collection like this would make a relatively accessible
project for a student who wanted to develop some facility with formal theorem proving.

- Gene Stark

Email Gateway (Jul 08 2022 at 18:05):

From: Manuel Eberl <manuel@pruvisto.org>
On 08/07/2022 19:45, Eugene W. Stark wrote:

My original thoughts were along the following lines: Basic
probability books have a bunch of
'ball and urn' examples

I think most of the "ball and urn" things are contained in the
Twelvefold Way.

Anyway, I was imagining some kind of table of basic distributions,
together with facts like the
following:

relation between the underlying counting problem and the
probability mass function

Well, in HOL-Probability there is "pmf_of_set" (the uniform distribution
for discrete sets), where the probability of any one element is simply
one over the cardinality of the set that you're choosing from. There are
also a number of other basic distributions, such as Bernoulli, binomial,
negative binomial, geometric, Poisson.

generating functions

Generating functions are simply the formal power series in
HOL-Computational_Algebra. There are a few example applications in the
distribution and a few more in the AFP (see e.g. Catalan Numbers and
Bernoulli numbers).

formulas for moments

What exactly are you looking for?

In any case, as with most of our material, it is perhaps a bit difficult
to find. Having good documentation for the libraries is a long-standing
problem that we have not solved yet.

Manuel

Email Gateway (Jul 08 2022 at 19:31):

From: Tobias Nipkow <nipkow@in.tum.de>
Now that we have a short proof, could we also have a better name than
hypergeometric_combinatorial_property? Eg card_subsets_disj_Un, but maybe there
is a more abstract name.

Tobias
smime.p7s

Email Gateway (Jul 09 2022 at 19:44):

From: Chelsea Edmonds <cle47@cam.ac.uk>
I've also come across a similar problem in my own work formalising proofs for hypergraphs/combinatorial design theory. Coming up with some tricks/common lemmas/proof patterns to help simplify these kinds of proofs is something I've been thinking about doing at some point, so would be great to hear if anyone else has started on something similar already (or does so following this thread!). I've found there tend to be two main challenges: firstly, translating the intuition of a counting proof into a more formal representation, and secondly actually working with the resulting sets (as Manuel identified - the automation for reasoning on cardinality of sets can be rather poor, and this is even more so the case for multiset sizes from experience).

As a note, there are several examples of counting proofs (often involving multisets) in the library here: https://www.isa-afp.org/entries/Design_Theory.html

Also an example of a generating proof approach as an alternative for combinatorial arguments here: https://www.isa-afp.org/entries/Lucas_Theorem.html As Manuel pointed out this does introduce a dependency on the Computational Algebra library, but is relatively easy to work with once you get familiar with the fps representation (and the generating function approach was certainly more of a direct translation to Isabelle then the intuitive combinatorial proof on paper).

Best,
Chelsea

Email Gateway (Jul 10 2022 at 09:33):

From: Tobias Nipkow <nipkow@in.tum.de>
Dear Chelsea,

Thanks for these further examples. They make me wonder if I should really stick
the other counting lemmas that was posted into Finite_Set theory or if there
should be a dedicated library?

Tobias
smime.p7s

Email Gateway (Jul 10 2022 at 10:45):

From: Manuel Eberl <manuel@pruvisto.org>
Note that there is a HOL-Combinatorics session now.

Manuel

Email Gateway (Jul 10 2022 at 11:30):

From: Tobias Nipkow <nipkow@in.tum.de>
Good point. But I am not sure which of the counting lemmas, e.g. the one you
posted, should go where - but I trust you will put yours in the right place? It
also seems that there cannot be one bucket for all of them because some of them
talk about sets, some about multisets and some about functions, and some use
additional libraries like FPS...

Tobias
smime.p7s

Email Gateway (Jul 11 2022 at 09:11):

From: Tobias Nipkow <nipkow@in.tum.de>
On 08/07/2022 18:50, Thiemann, René wrote:

Hi Gene,

here is another counting lemma which is hidden in some AFP entry.

https://www.isa-afp.org/theories/clique_and_monotone_circuits/#Preliminaries.html#Preliminaries.card_inj_on_subset_funcset|fact

Perhaps it is of interest.

It is certainly of interest. I have moved it to theory FuncSet.

There are further interesting theorems in your theory ... I would really love to
have a tool that goes through AFP entries and suggests lemmas that could be
moved to the distribution and suggests where they might go.

Tobias

Best,
René

Am 08.07.2022 um 11:35 schrieb Eugene W. Stark <isabelle-users@starkeffect.com>:

For an investigation I was trying to pursue, I just found myself proving the following:

lemma hypergeometric_combinatorial_property:
assumes "finite (U ∪ S)" and "U ∩ S = {}"
shows "card {T. T ⊆ U ∪ S ∧ card T = n ∧ card (S ∩ T) = k} =
(if n > card (U ∪ S) ∨ k > n then 0 else (card S choose k) * (card U choose (n - k)))"

As my proof is kind of long and ugly (it even took a while just to figure out the way of formulating
the boundary cases so that the induction would go through), I wonder if there is perhaps a collection
of already-distilled basic facts like this, concerning counting properties related to common discrete
probability distributions. I had a look in what seemed obvious places in the HOL-Library, but only
came up with a few facts of this type.

Thanks for any hints.

- Gene Stark

smime.p7s

Last updated: Jul 15 2022 at 23:21 UTC