Stream: Beginner Questions

Topic: ✔ Real uniform distribution


view this post on Zulip Christoph Madlener (Jun 11 2022 at 09:26):

I'm trying to define a uniform distribution over an interval of reals [a,b][a,b] (where a<ba < b). (If there is an existing definition for this I wasn't able to find I'd also be glad if that is pointed out to me). There are two approaches I've tried (unsuccessfully) so far:

  1. define the density and lift that to a real pmf:
context
  fixes a :: real and b :: real
  assumes a_less_b: "a < b"
begin

lift_definition pmf_of_real_interval :: "real pmf" is "λx. indicator {a..b} x / (b - a)"
proof
   ...

In that proof I didn't manage to prove the goal that the density integrates to 1.

  1. define the cdf and use real_distribution_interval_measure to get a distribution that way. I'm also not quite sure how easy it is from there to get the real pmf I want in the end. I'm also open for suggestions on how to formulate the cdf more elegantly:
definition real_uniform_cdf :: "real ⇒ real ⇒ real ⇒ real" where
  "real_uniform_cdf a b ≡
    λx. if x < Min {a,b}
        then 0
        else if x ∈ {Min{a,b}..Max{a,b}}
        then (x - Min{a,b}) / (Max{a,b} - Min{a,b})
        else 1"

lemma real_uniform_cdf_altdef:
  "a ≤ b ⟹ real_uniform_cdf a b = (λx. if x < a then 0 else if x ∈ {a..b} then (x - a) / (b - a) else 1)"
  unfolding real_uniform_cdf_def
  by auto

lemma "a < b ⟹ real_distribution (interval_measure (real_uniform_cdf a b))"
proof(intro real_distribution_interval_measure, goal_cases)
...

Here, I'm struggling to prove the continuity of the cdf. I have to admit I haven't really wrapped my head around the whole filter thing. If there is any introductory reading (or illustrative examples) someone can recommend, I would also be thankful.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:27):

The thing you are trying to define does not exist. A PMF is a probability mass function, i.e. it describes a distribution by telling you what the probability of each possible result x is. For continuous distributions like the one you are talking about this does not work, since that probability will be 0 for all results.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:28):

You can of course talk about densities (which is presumable what you are trying to do here), but that is not what PMFs are.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:28):

If you want non-discrete distributions, you have to use the measure type, and it will be somewhat more painful than using PMFs.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:30):

Also note that many distributions do not have a probability density, such as e.g. the Dirac distribution (Giry_Monad.return or return_pmf) and more generally any discrete distribution.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:35):

The thing you are looking for is uniform_measure :: 'a measure ⇒ 'a set ⇒ 'a measure.

The second parameter is the set over which you want to randomise, i.e. in your case {a..b}. The first parameter is just providing the underlying measurable space (i.e. a Sigma algebra); in your care you can just use borel.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:36):

No, wait, that last bit was me being stupid. The first parameter is the underlying measure space; it's just just for the sigma algebra.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:36):

The natural one to use for real numbers is lborel (The Lebesgue–Borel measure). There's also lebesgue, which is a beefed-up version of lborel, but you don't need it in most cases.

view this post on Zulip Manuel Eberl (Jun 11 2022 at 10:37):

It helps to know some basic measure theory to use all this stuff, but it's not absolutely necessary. If you have any problems feel free to ask me.

view this post on Zulip Christoph Madlener (Jun 11 2022 at 11:24):

I see. Maybe I can give some more context of what I'm trying to do. I'm trying to formalize a randomized algorithm that draws nn numbers from [0,1][0,1], which are then used to define a linear order (so I actually want to rely on the fact that the probability of drawing the same number twice is 0).
Eventually I want to reason about the expectation of some random variable over the (randomized) output of this algorithm. I have done something similar before, where the distributions were discrete and I could actually use 'a pmf to nicely formulate the randomized algorithm with the provided monad syntax.
To my understanding the first part of reasoning about expectations (and probabilities, etc.) shouldn't be a problem, since they are defined using measure anyways. I'm not sure how to define the randomized algorithm itself though (without the monad of pmf).

view this post on Zulip Maximilian Schaeffeler (Jun 11 2022 at 17:02):

I don't know if I understand your last remark correctly, but the Giry Monad is defined for general (sub-)probability spaces of type 'a measure, not just for 'a pmf.

view this post on Zulip Christoph Madlener (Jun 11 2022 at 17:36):

Yes, that was indeed a misconception on my part then. So I will just use uniform_measure and proceed (more or less) like before when I dealt with discrete distributions. Thanks a lot to both of you for your guidance!

view this post on Zulip Notification Bot (Jun 11 2022 at 17:36):

Christoph Madlener has marked this topic as resolved.

view this post on Zulip Manuel Eberl (Jun 12 2022 at 19:12):

Note that there is already some material about this "draw n numbers between 0 and 1 and look at the linear order defined by them" in the AFP entry on randomised treaps

view this post on Zulip Manuel Eberl (Jun 12 2022 at 19:13):

An alternative would be to just draw a random linear order instead. That has the advantage that it can be done fully inside the pmf monad

view this post on Zulip Manuel Eberl (Jun 12 2022 at 19:13):

Seriously you do not want to deal with the general giry monad if you can avoid it

view this post on Zulip Christoph Madlener (Jun 12 2022 at 19:37):

I will keep that in mind, at the moment I'm not sure if I can adapt the proof I'm basing the formalization on to fully avoid it.
Also, thanks for pointing out the material in the Treap AFP entry, I think I want to use parts of that at some point (especially the part of the equivalence between drawing the priorities and drawing a linear order directly).

view this post on Zulip Manuel Eberl (Jun 12 2022 at 20:45):

There's also a paper that talks about this

view this post on Zulip Manuel Eberl (Jun 12 2022 at 20:45):

Random binary tree structures or something like that, on my website


Last updated: Aug 13 2022 at 06:26 UTC