[isabelle] On random variables, dependencies and their ex... · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] On random variables, dependencies and their ex...

Email Gateway (Aug 22 2022 at 17:24):

From: Andreas Lochbihler <mail@andreas-lochbihler.de>
Dear Kawin,

Probability Mass Functions are indeed discrete distributions, not random
variables. But you can fairly easily build complex distributions out of
simple ones, using the monadic sequencing operator bind_pmf. The
function map_pmf is not as powerful.

For example, to get a distribution over lists of booleans that are all
sampled independently, you could write the following probabilistic program:

fun w_list_pmf :: "nat => bool list pmf" where
"w_list_pmf 0 = return_pmf []"
| "w_list_pmf n =
bind_pmf (w_list_pmf (n - 1)) (%xs.
bind_pmf (bernoulli_pmf *(1 - e) / 2)) (%x.
return_pmf (x # xs)))"

You can also define the function Δ_n using monadic programming:

definition Δ_n where "Δ_n n =
bind_pmf (w_list_pmf (n + 1)) (%xs.
return_pmf (f xs - f (tl xs)))"

However, you cannot express Δ_n in terms of Φ_i, because as you have
rightly observed, once you project from the list of Bernoulli choices to
a single index, then all dependencies between the choices are lost.

The expectation function always takes a function to transform the
elementary events into real numbers. In case they are already real
numbers, then the identity function id is indeed the right choice. This
additional parameter is important to obtain nice laws for bind_pmf.

The Archive of Formal Proofs contains a few entries which build on 'a
pmf, but AFAIK none of them solves the issue with random variables.
Essentially, you can view a discrete random variable X as a function
from a probability distribution p to some other space. With 'a pmf,
you'd model this by the term map_pmf f p.

Hope this helps,
Andreas

Email Gateway (Aug 22 2022 at 17:40):

From: Kawin Worrasangasilpa <kw448@cam.ac.uk>
Hi,

I have been trying to deal with random variables in Isabelle/HOL for some
time and I still can't find out how to deal with them exactly. I found in
"HOL/Probability/Probability_Mass_Function.thy" on how to define an " 'a
pmf " value briefly. For example, I could use map_pmf to change to
different space:

context
fixes ε :: real
begin

definition w_n :: "nat pmf" where
"w_n = map_pmf (λb. (if b then 1 else 0)) (bernoulli_pmf ((1-ε)/2))"

(this is to change from {True, False} to {1, 0})

definition W_n :: "int pmf" where
"W_n = map_pmf (λb. (-1) ^ (if b then 0 else 1)) (bernoulli_pmf ((1-ε)/2))"
(this is to change from {True, False} to {1, -1})

or use lift_definition:

lift_definition w_list_pmf :: "nat ⇒ bool list pmf" is
"λn (l::bool list).
if size l = n then (((1-ε)/2)^(count (mset l) True))*((1+ε/2)^(count
(mset l) False)) else 0"
(*this is to have a bool list random variable of fixed length and all
members are independent Bernoulli random variables with p = (1-ε)/2*)

But when it becomes more complicated and the random variables are dependent
I have no idea how to phrase it. For example, Let us say I have a sequence
of random variables Φ_i = f (w_1w_2...w_i) (all w_i are independent
Bernoulli random variables),
and then construct Δ_i = Φ_i - Φ_(i-1). The complication, in this case, is
Φ_i = f (w_1w_2...w_i) and Φ_(i-1) f (w_1w_2...w_(i-1)) use the same w_t
sequence which are w_1,w_2,...,w_(i-1), but the first Φ also uses w_i for
the last element in the list. So, I tried,

lift_definition w_list_pmf :: "nat ⇒ bool list pmf" is
"λn (l::bool list).
if size l = n then (((1-ε)/2)^(count (mset l) True))*((1+ε/2)^(count
(mset l) False)) else 0"
sorry

definition Φ_n :: "nat ⇒ real pmf" where
"Φ_n n = map_pmf (λl. f l) (w_list_pmf n)"
(*with f being any function in "bool list => real" and "n" is the lenth of
the input list*)

definition Δ_n :: "nat ⇒ real pmf" where
"Δ_n n = map_pmf
(λp. fst p - snd p) (pair_pmf (Φ_n n) (Φ_n (n -1)))",

But I guess this makes (Φ_n n) and (Φ_n(n-1)) completely independent
which is not what I want
since they use common random variables in their generating process:
w_1,w_2,...,w_(i-1). So I realised that these constants of " 'a pmf" type
are not supposed to be used as random variables but distribution (please
correct me on this).

Also, I would like to calculate these random variables' expectation, for
which I tried

lemma "n > 0 ⟶
prob_space.expectation (Δ_n n) (id) < -ε"
sorry

,and felt really doubtful of the existence of the function "id" since if "Δ_n
n" could be used as a random variable, it wouldn't need any function to be
involved in calculating the expectation.
Could I ask whether there are any better ways or repositories I should
check to sort out this random variables issue and their expectations?

Thanks,
Kawin

Last updated: Feb 01 2025 at 20:19 UTC