# Designing Probability Functions (as Relative Frequency)

In this article, we look at a very basic probability function that we have all naturally encountered: probability as relative frequency. Our job is to understand how this function is designed and how it actually satisfies the axioms of probability.

The first function we look at defines the probability of an event A, written P(A) (where P is the function we are interested in), as the relative frequency of A:

P(A) = (number of trials in which A occurs) / (total number of trials)

The catch here is that the experiment must be repeated a large number of times; strictly, the number of repetitions must tend to infinity for the relative frequency to be a good estimate of the probability. In practice, we settle for a number we consider reasonably large. For example, Karl Pearson tossed a coin 24000 times, observed the number of heads, and computed the probability from it.

Pearson obtained a probability of 0.5005. We know that when an unbiased coin is tossed, the probability of getting heads is 0.5; in Pearson's case it was 0.5005 rather than exactly 0.5, because 24000 trials, while large, is not large enough.
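As a sketch of the idea (a simulation, not Pearson's actual experiment), we can toss a simulated fair coin and watch the relative frequency settle near 0.5 as the number of tosses grows:

```python
import random

def relative_frequency_of_heads(num_tosses, seed=0):
    """Toss a simulated fair coin num_tosses times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# Like Pearson's experiment, the estimate hovers near 0.5 but is
# rarely exactly 0.5; it improves as the number of tosses grows.
for n in (100, 24000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```

The exact values depend on the random seed; the point is that the estimate approaches, but rarely equals, 0.5.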

The moment we say that this is the probability function, the question that arises is: does this function satisfy the axioms of probability?

The first axiom says that the probability of any event must be greater than or equal to 0.

This function computes the probability as a ratio of a non-negative number (the number of times the event occurs, which is at least 0) to a positive number (the total number of trials), so the ratio is always greater than or equal to 0, and we can be sure the function satisfies the first axiom.

The second axiom is that the probability of the sample space is equal to 1.

Under this function, the probability of the sample space equals the number of trials whose outcome lies in the sample space divided by the total number of trials. By the very definition of the sample space, the outcome of every trial lies in it, so no matter how many times we repeat the experiment, the numerator equals the denominator, and the probability of the sample space is 1.
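Both axioms can be checked on a small simulated experiment. The die-rolling setup below is an illustrative sketch, not something from the article:

```python
from collections import Counter
import random

# Simulate rolling a fair six-sided die and record the outcomes.
rng = random.Random(42)
outcomes = [rng.randint(1, 6) for _ in range(10_000)]
counts = Counter(outcomes)
k = len(outcomes)  # total number of trials

def prob(event):
    """Relative frequency of an event, given as a set of outcomes."""
    return sum(counts[o] for o in event) / k

sample_space = {1, 2, 3, 4, 5, 6}

# Axiom 1: counts are non-negative, so every probability is >= 0.
print(all(prob({o}) >= 0 for o in sample_space))  # True

# Axiom 2: every trial's outcome lies in the sample space,
# so the numerator equals the denominator.
print(prob(sample_space))  # 1.0
```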

The third axiom states that the probability of the union of two disjoint events is equal to the sum of their probabilities.

Let’s say event A₁ occurs in k₁ of the trials and event A₂ occurs in k₂ of them, and these are two disjoint events: none of the outcomes counted in k₁ belong to A₂, and vice versa.

The union of these two events then occurs in k₁ + k₂ of the trials, so the probability of the union is given as:

P(A₁ ∪ A₂) = (k₁ + k₂) / k

where k is the total number of times the experiment is conducted.

And if we rearrange the terms in this formula, (k₁ + k₂)/k = k₁/k + k₂/k, and these two fractions are exactly the probabilities of A₁ and A₂. So P(A₁ ∪ A₂) = P(A₁) + P(A₂).
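This rearrangement can be sketched in code. Exact fractions are used below so the equality holds without floating-point error; the die-rolling experiment is assumed purely for illustration:

```python
from collections import Counter
from fractions import Fraction
import random

# Roll a die k times and compute relative frequencies as exact fractions.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(10_000)]
counts = Counter(rolls)
k = len(rolls)

def prob(event):
    """Relative frequency of an event (a set of outcomes) as a Fraction."""
    return Fraction(sum(counts[o] for o in event), k)

A1 = {1, 2}  # k1 = counts[1] + counts[2]
A2 = {5, 6}  # k2 = counts[5] + counts[6]; disjoint from A1

# (k1 + k2) / k  ==  k1/k + k2/k
print(prob(A1 | A2) == prob(A1) + prob(A2))  # True
```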

So it turns out that this function indeed satisfies the axioms of probability.

There is a subtle point here: the axioms are about all possible events, not just the individual outcomes (every outcome, viewed as a singleton set, is an event, but in general the axioms cover all possible events). When there are n possible outcomes, there are 2^n possible events.

We can count the total number of possible events using the multiplication principle: for every outcome, there are two possibilities: either it is part of the event or it is not.

This is essentially the same as creating a sequence of n elements from the two-element set {‘Y’, ‘N’}, where repetition is allowed.
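A minimal sketch of this counting argument: each outcome gets a 'Y' or 'N' flag, and each flag sequence yields exactly one event, so a sample space of n outcomes produces 2^n events:

```python
from itertools import product

def all_events(sample_space):
    """Build every event by assigning each outcome a 'Y' (in) or 'N' (out) flag."""
    outcomes = sorted(sample_space)
    events = []
    for flags in product('YN', repeat=len(outcomes)):
        events.append({o for o, f in zip(outcomes, flags) if f == 'Y'})
    return events

events = all_events({'a', 'b', 'c'})
print(len(events))  # 8, i.e. 2**3, including the empty event and the sample space
```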

Under this particular way of computing probability as relative frequency, if there are some n outcomes, we can think of them as n singleton events (only 1 element in each), and these n events are disjoint by the definition of outcomes and the sample space. Now if we know the probabilities of these n events, that is enough, because every possible event is a union of some of these singletons, and by the third axiom its probability is the sum of the corresponding singleton probabilities.
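As an illustration (the singleton probabilities below are made-up values, not from the article), once the singleton probabilities are known, any event's probability follows by summing:

```python
# Hypothetical singleton probabilities for a four-outcome experiment,
# e.g. estimated as relative frequencies from many trials.
p_singleton = {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}

def prob(event):
    """Every event is a disjoint union of singletons, so its
    probability is the sum of the singleton probabilities."""
    return sum(p_singleton[o] for o in event)

print(prob({'a', 'c'}))        # 0.1 + 0.3
print(prob(set(p_singleton)))  # the whole sample space, which sums to 1
```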

Let’s see some examples of using relative frequency as the probability function.

Say in an image classification problem, we have images belonging to 3 classes.

We are interested in the probability that a randomly picked image belongs to a particular class, say forest.

If picking an image is the trial, then the trial is conducted 100000 (60000 + 25000 + 15000) times, and in 15000 of those trials the outcome is a forest image, so the probability that a randomly picked image is of a forest is 15000/100000 = 0.15.
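In code (the class names other than forest are hypothetical; only the counts come from the example):

```python
# Class counts from the example; 'building' and 'glacier' are
# made-up names for the two unnamed classes.
class_counts = {'building': 60_000, 'glacier': 25_000, 'forest': 15_000}
total = sum(class_counts.values())  # 100000 trials in all

# Relative frequency of picking a forest image.
p_forest = class_counts['forest'] / total
print(p_forest)  # 0.15
```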

Let’s look at one more example

The sample from which we estimate the probability must be drawn from the same population about which we want to make inferences. Let’s take an example to understand this point: suppose COVID-19 tests were administered only to people with flu-like symptoms, and 4% of those tested were positive. Can we conclude that a randomly picked person from the entire population has a 4% chance of having COVID-19?

The answer to this question is no.

The testing was limited to people with flu-like symptoms, a subset of the entire population. So the only conclusion we can draw from the given data is that, among people with flu-like symptoms, there is a 4% chance that a person has COVID-19. We cannot make the same conclusion for the entire population, because the sample was taken from a subset of the population and was not representative of it.
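A small simulation can illustrate why the biased sample misleads. All the rates below are made-up numbers, chosen only so that the tested group resembles the 4% figure in the example:

```python
import random

rng = random.Random(7)

# Hypothetical population: the true infection rate differs between
# people with flu-like symptoms and everyone else (all rates invented).
population = []
for _ in range(200_000):
    symptomatic = rng.random() < 0.10
    infected = rng.random() < (0.04 if symptomatic else 0.005)
    population.append((symptomatic, infected))

# Testing only symptomatic people gives a biased sample...
tested = [infected for symptomatic, infected in population if symptomatic]
rate_tested = sum(tested) / len(tested)

# ...which overestimates the rate in the whole population.
rate_population = sum(infected for _, infected in population) / len(population)

print(round(rate_tested, 3))      # close to 0.04
print(round(rate_population, 3))  # noticeably lower
```

The relative frequency computed from the tested group is a fine estimate for the symptomatic subpopulation, but not for the population as a whole.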

In this article, we discussed a probability function that computes the probability of an event as its relative frequency.