1. Introduction
A statistical distribution is a mathematical function that describes how the values of a variable are spread across observations [1-17]. Simply put, a distribution is a collection of data about a variable; these data can be depicted visually, typically grouped in size order from smallest to largest. Statistical distributions are easiest to understand in terms of functional relationships: one data variable is related to another (or to several others) through a function, and most such relationships can be plotted on coordinate axes. Once the distribution function is established, it can be used to describe and compute quantities of interest, including the probability of an observation, and to depict the relationships among observations in the domain. Different statistical distributions correspond to different scenarios. The most familiar and widely used is the normal distribution, which accurately describes the values of many natural phenomena. For example, the height of a population is normally distributed: most people in a given population are of roughly average height, those taller and shorter than average occur in roughly equal numbers, and very few are extremely tall or extremely short. If the heights of a surveyed population are plotted from low to high, the graph eventually shows a bell-shaped curve. Because distributions are ultimately what allow researchers to analyze data in terms of probability, let us start by understanding what statistics is.
2. Probabilistic programming
Probabilistic programming is a technique for developing systems that assist humans in making decisions in the face of uncertainty. Many daily choices involve using judgment to identify pertinent, non-observable aspects of a situation. Probabilistic programming identifies those unseen aspects that are crucial to a decision by fusing our understanding of a situation with the rules of probability. Thanks to this technique, probabilistic reasoning systems are becoming simpler to design and more widely used.
Suppose, for example, you are introducing a new product and want to know whether it will sell well. You may think it will succeed because you believe it is well designed and because your market research shows there is a market for it, but you cannot be certain [1-17]. Probability gives you the language to make such decisions. You can gauge the probability of a product's success by drawing on prior experience with related products, and then use that probability to decide whether to introduce it. You may be concerned not only with whether the product will succeed, but also with how much money it will make, or how much it will cost you if it fails. Using the probabilities of these various outcomes lets you decide more wisely.
FACT: Knowledge and logic are used to make decisions.
The main goal of probabilistic programming is to offer techniques to express the knowledge and reasoning needed to respond to inquiries.
2.1. Critical definitions
General knowledge is what you know to be true about your domain in general, without considering the particulars of any specific situation.
A probabilistic model is a way of encoding general knowledge about a domain in quantitative terms.
Evidence is the specific information you have about a particular situation.
A query is a feature of the situation you want to know about.
Inference is the process of using a probabilistic model to answer a query, given evidence.
In probabilistic reasoning, you develop a model that quantifies and probabilistically expresses all the pertinent general knowledge in your domain.
The rules of probability give a precise mathematical definition of the relationship between the model, the evidence you supply, and the answers to queries. Probabilistic inference, or simply inference, is the process of using the model to answer queries based on the evidence. Fortunately, the necessary mathematical computations have been automated by computer programs, called inference algorithms, that do the arithmetic for you.
Three methods of reasoning are available to probabilistic systems:
1. Predict future occurrences, which refers to thinking forward in time and making predictions about the future based on what you know about the present.
2. Infer the causes of events, which refers to reasoning backward from present outcomes to the past circumstances that produced them.
3. Use historical data to improve your ability to anticipate the future.
The three modes of reasoning provide ways of reasoning about specific situations given the evidence. A probabilistic reasoning system may also be used to learn from the past and improve its general knowledge. The third mode of reasoning shows how lessons from particular past events can improve predictions about particular future situations. Enhancing the model itself is another way to learn from the past. The purpose of a learning algorithm, which differs somewhat from an inference algorithm, is to produce a new model rather than to answer queries. The learning algorithm starts from the old model and builds a new one by updating the old model in response to experience. The new model can then be used to answer queries, and its answers should be more accurate than those of the previous model.
2.1.1. Systems of probabilistic inference and reliable predictions. Like any machine learning system, a probabilistic inference system becomes more accurate as more data become available. The quality of a prediction depends both on how accurately the initial model reflects reality and on how much data is provided. In general, the initial model matters less as more data are presented, because the new model strikes a balance between the information in the data and the prior model. With very little data, the original model predominates, so it had better be accurate. With abundant data, the data carry more weight, and the new model will often largely disregard the previous one.
Every system of probabilistic inference employs a representation language to express its probabilistic models. Systems whose models are expressed in a dedicated modeling formalism, such as Bayesian networks, are referred to as probabilistic reasoning systems; systems whose models are written in a programming language are referred to as probabilistic programming systems.
The main distinction between probabilistic reasoning systems and probabilistic programming systems is that in the latter, models are written as programs in a programming language rather than as mathematical constructs like Bayesian networks. This changes how evidence, queries, and answers relate to the program: evidence specifies values of program variables, queries ask about values of program variables, and answers are probabilities of the various values a query variable can take. A probabilistic programming system also typically provides a collection of inference algorithms that can be applied to programs written in the language.
2.1.2. Key definitions. A representation language is a language you can use to model your knowledge of a domain. Expressive power is a representation language's capacity to represent a variety of types of knowledge.
A language that can express every computation a machine can perform is said to be Turing-complete.
Utilizing a Turing-complete programming language to express knowledge in a probabilistic representation language is known as probabilistic programming.
2.1.3. Probabilistic models represented as programs. Execution is a fundamental concept in programming languages: you run a program to produce output. A probabilistic program is similar, except that instead of a single execution path it may have many, each of which produces a distinct result. Random choices made throughout the program determine which execution path is taken, and the program encodes the probability of each possible outcome of each random choice. Consequently, a probabilistic program may be conceptualized as a program that is run at random to produce a result.
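To make this idea concrete, here is a minimal Python sketch (the paper presents no code; the scenario, names, and probabilities are invented for illustration) of a probabilistic program for the product-launch example above. Each run follows one random execution path, and repeated runs answer queries by counting:

```python
import random

def product_launch():
    """One random execution path: sample the hidden factors, return the outcome."""
    well_designed = random.random() < 0.7   # prior belief that the design is good
    market_exists = random.random() < 0.6   # prior belief from market research
    # The chance of success depends on the unobserved factors.
    p_success = 0.9 if (well_designed and market_exists) else 0.2
    success = random.random() < p_success
    return well_designed, success

# Run the program many times; each run follows one random execution path.
runs = [product_launch() for _ in range(100_000)]

# Query: P(product succeeds)
print(sum(s for _, s in runs) / len(runs))

# Inference with evidence: P(well designed | success), estimated by keeping
# only the runs consistent with the evidence (rejection sampling).
consistent = [wd for wd, s in runs if s]
print(sum(consistent) / len(consistent))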
Why probabilistic programming?
Probabilistic reasoning can be used to predict the future, infer the past, and learn from the past to improve future predictions. Probabilistic programming entails using a Turing-complete programming language to express probabilistic reasoning.
FACT: Probabilistic reasoning + Turing-complete = probabilistic programming
3. Statistics
What is statistics? Statistics is the analysis of data. Traditionally, it consists of formulating hypotheses about questions of interest and then analyzing data to judge those hypotheses. What is a stochastic process? A stochastic, or random, process is a mathematical object studied in probability theory and related fields, typically defined as a family of random variables.
The idea of a random variable arises when, in a survey or an experiment, you are not concerned with every detail of the outcome but only with some numerical aspect of it. Formally, a random variable is defined as a function from the sample space to the real numbers. In some situations a random variable can take only a distinct (finite or countable) range of values. Defining a random variable effectively creates a "new" sample space based on the random variable's possible values.
As a result, the probability measure defined on the original sample space can be used to compute the probabilities associated with the different values of the random variable. Having mentioned random variables, we must also mention distributions: the probability distribution of a random variable describes how probability is spread over the random variable's values.
4. Statistical distributions
4.1. The Bernoulli distribution
The Bernoulli distribution is closely connected with logical operators. A Bernoulli random variable \( X \) can take only two values, 0 or 1. With parameter \( p \), the probability that \( X=1 \) is \( p \) and the probability that \( X=0 \) is \( 1-p \); the two probabilities sum to one. We can let \( X = 1 \) if an event occurs and \( X = 0 \) if it does not; this is the Bernoulli distribution.
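A minimal Python sketch (illustrative, not from the source) of sampling a Bernoulli variable and checking that the empirical frequencies match \( p \) and \( 1-p \):

```python
import random

def bernoulli(p):
    """Return 1 with probability p and 0 with probability 1 - p."""
    return 1 if random.random() < p else 0

p = 0.3
samples = [bernoulli(p) for _ in range(100_000)]
print(sum(samples) / len(samples))        # fraction of 1s, close to p
print(1 - sum(samples) / len(samples))    # fraction of 0s, close to 1 - p
```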
4.2. The binomial distribution
A Binomial experiment is defined by the following conditions:
• There are \( n \) identical trials total in the experiment.
• Either success \( (S) \) or failure \( (F) \) is the outcome of each experiment.
• A constant number \( p \), the probability of success on a single trial, is the same for every trial; the probability of failure is \( q = 1-p \).
• Each trial is independent.
• The random variable of interest, \( X \), is defined as the total number of successes observed in the \( n \) trials.
Observe that a Binomial experiment consists of a sequence of \( n \) independent repetitions of a Bernoulli trial with success probability \( p \).
The total number of successes is then a binomial random variable with parameters \( n \) and \( p \), where \( X \) can take the values \( 0,1,\ldots,n \) and
\( P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k} \)
Each configuration with \( k \) successes has the same probability \( p^{k}(1-p)^{n-k} \), and there are \( \binom{n}{k} \) such configurations.
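As a quick illustration (a Python sketch, not part of the original paper), the pmf above can be computed directly with math.comb:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): comb(n, k) equally likely
    configurations, each with probability p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.4
# Sanity check: the probabilities over k = 0, ..., n sum to 1.
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))
```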
Next, we turn to continuous random variables.
A random variable that can take a continuum of values is called a continuous random variable.
A random variable is a mapping \( X: Ω→R \), \( ω→X(ω) \).
• Cumulative distribution function (CDF): \( F(x)=P(X≤x) \)
• \( X \) is said to be a continuous random variable if there exists a density function \( f \) satisfying \( P(X∈A)=\int _{A}f(x)dx \) for any "reasonable" set \( A \).
In statistics, the binomial distribution is the typical distribution for discrete processes in which each independently generated value occurs with a fixed probability.
Originally studied in games of pure chance, the binomial distribution is now routinely used in data analysis across practically all areas of human research. It applies to any fixed number \( n \) of repetitions of an independent process, each of which produces a particular outcome with the same probability \( p \). For instance, it gives a formula for the probability of obtaining \( 10 \) sixes in \( 50 \) rolls of a die. In a paper published posthumously in 1713, the Swiss mathematician Jakob Bernoulli proved that the probability of \( k \) such occurrences in \( n \) repetitions is equal to the \( k \)th term (starting from \( k=0 \)) of the expansion of the binomial expression \( (p + q)^{n} \), where \( q = 1-p \) (hence the name binomial distribution). In the dice example, the probability that a given number appears on any single roll is \( \frac{1}{6} \) (one of the six faces of the die). The probability of \( 10 \) sixes in 50 rolls is then the term with \( k=10 \) in the expansion of \( (5/6 + 1/6)^{50} \), which equals approximately \( 0.115586 \).
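To check the figure quoted above, a few lines of Python (illustrative, not part of the source) reproduce the value 0.115586:

```python
from math import comb

# P(exactly 10 sixes in 50 rolls of a fair die)
n, k, p = 50, 10, 1 / 6
print(comb(n, k) * p**k * (1 - p)**(n - k))  # ~0.115586
```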
4.3. The normal distribution
The most significant distribution in statistics is the normal distribution.
Definition: A random variable \( Y \) is said to have the normal distribution with parameters \( μ \) and \( σ^{2} \) if and only if its probability density function has the form \( f(y)=\frac{1}{\sqrt{2πσ^{2}}}e^{-\frac{(y-μ)^{2}}{2σ^{2}}} \), \( -∞ \lt y \lt ∞ \), where \( -∞ \lt μ \lt ∞ \) and \( σ^{2} \gt 0 \).
From the definition, we observe that the normal distribution is symmetric around \( μ \) and that \( σ^{2} \) determines the spread of the distribution.
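A small Python sketch (illustrative; the paper gives no code) evaluates this density and confirms the symmetry around \( μ \):

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma2):
    """Density of the normal distribution with mean mu and variance sigma2."""
    return exp(-(y - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

mu, sigma2, d = 5.0, 2.0, 1.3
# Symmetry around mu: the density is the same at mu + d and mu - d.
print(normal_pdf(mu + d, mu, sigma2), normal_pdf(mu - d, mu, sigma2))
```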
The normal distribution is credited to the French mathematician Abraham de Moivre, who took a scientific interest in games of chance and frequently helped gamblers calculate probabilities. De Moivre worked on the probability distribution of coin tosses; he sought a formula describing, for example, the probability of getting 60 or more heads in a set of 100 coin flips. In answering this question he arrived at a bell-shaped distribution now known as the "normal curve." This is an important finding, since many phenomena exhibit an approximately normal distribution. For instance, variables such as height, weight, and strength follow a normal distribution; because of this, a Z-score table can be used to compare a person's weight or height against others'. The first person to observe the relationship between the distributions of weight and height and the normal curve was the Belgian astronomer Lambert Quetelet. Initially, normal curves were used to examine measurement errors in astronomical data, errors which arose from observers' inherent bias and inaccurate equipment.
Galileo had earlier noticed that these errors were symmetrical and that small errors occurred more frequently than large ones, observations which led to early inferences about the distribution of errors. But it was not until the 19th century that their distribution was properly understood, when the mathematicians Adrain and Gauss independently derived a formula for the normal distribution and showed that a normal curve provided a good approximation of the errors.
It is important to note that Laplace, who developed the central limit theorem, also identified the same distribution near the end of the 18th century. According to this theorem, the distribution of the sample mean conforms to the normal distribution even when the samples are drawn from a non-normal distribution, and the approximation improves as the sample size increases. Numerous disciplines, from the social sciences to medicine, use normal distributions and the accompanying Z-score computation. Z-scores are a helpful standardized quantity for comparing populations on traits such as height, weight, test scores, wealth, and many other factors.
4.4. The uniform distribution
Definition: A random variable \( Y \) is said to have a uniform distribution on the interval \( ({θ_{1}},{θ_{2}}) \) if and only if its probability density function has the form \( f(y)=\begin{cases} \frac{1}{{θ_{2}}-{θ_{1}}}, & {θ_{1}} \lt y \lt {θ_{2}} \\ 0, & \text{otherwise} \end{cases} \). We write \( Y∼Uniform({θ_{1}},{θ_{2}}) \).
The standard uniform distribution, the special case in which \( {θ_{1}} \) is 0 and \( {θ_{2}} \) is 1, has the shape of a square with height one. In any probability distribution, the area under the curve over an interval gives the probability that the value falls in that interval. For the standard uniform distribution, the probability of seeing a value in \( (0, 0.5) \) and the probability of seeing a value in \( (0.5, 1) \) are both 50%. In the discrete version of the uniform distribution, all values in a finite set are equally likely to occur. If the set of possible values is {0, 1}, for instance, there is a 50% probability of obtaining 0 and a 50% probability of obtaining 1, just as a fair coin has a 50% chance of coming up heads and a 50% chance of coming up tails.
Because uniform probabilities imply pure unpredictability, whereas society is characterized by regularity, that is, a shortage of randomness, uniform distributions are uncommon in social data. The uniform distribution is still useful to social scientists, though; indeed, after the normal distribution, it is the most frequently employed distribution. Its best-known use is in generating random variables from other probability distributions, such as the normal distribution. Here a random variable means a number picked at random according to the distribution's probabilities. Random variables from any distribution can be created by feeding a uniform random variable through that distribution's inverse cumulative distribution function.
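The following Python sketch (illustrative) shows this inverse-CDF idea. Since the normal's inverse CDF has no closed form, the sketch uses the exponential distribution, whose inverse CDF does; the function name and parameter are assumptions for illustration:

```python
import random
from math import log

def exponential_via_inverse_cdf(lam):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then
    F^{-1}(U) = -ln(1 - U) / lam has an Exponential(lam) distribution."""
    u = random.random()          # uniform draw on (0, 1)
    return -log(1 - u) / lam     # push it through the inverse CDF

lam = 2.0
samples = [exponential_via_inverse_cdf(lam) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean, close to 1 / lam = 0.5
```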
4.5. Expected values
For a discrete random variable \( X \), the expectation or expected value \( E(X) \) is defined as
\( E(X)= \sum _{x}x∙p(x) \), where the sum runs over all possible values \( x \) of \( X \).
When the expected value exists (i.e., the sum is well defined), it is also called the population mean, since the expectation is just a weighted average in which more probable values of \( x \) receive more weight \( p(x) \).
Note that the expected value \( E(X) \) of a discrete random variable can be equal to a value that is not a possible result.
The expected value represents what we anticipate seeing on average over a long series of observations, not what we will see in any single observation.
For example, for the binomial distribution \( X∼Binomial(n,p) \),
\( E(X)=np \), i.e., the mean value of the random variable is \( np \).
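This can be verified numerically; the following Python sketch (illustrative) computes \( E(X)=\sum _{x}x∙p(x) \) exactly for a small binomial and recovers \( np \):

```python
from math import comb

n, p = 20, 0.3

def pmf(k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# E(X) computed directly from the definition: the sum of x * p(x).
print(sum(k * pmf(k) for k in range(n + 1)))  # 6.0, which equals n * p
```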
Expectations of functions of the random variable
Theorem: Let \( X \) be a discrete random variable with probability mass function \( p(x) \), and let \( g(X) \) be a real-valued function of \( X \). Then the expected value of \( g(X) \) is given by \( E[g(X)]= \sum _{x}g(x)∙p(x) \). Observe that if \( X \) is a random variable, then \( Y=g(X) \) is also a random variable.
Therefore, this theorem allows us to obtain the expected value of the random variable \( Y \) without having to obtain the probability mass function of \( Y \) .
Linearity of the expectation \( E(X) \)
Let \( X_{1}, \ldots, X_{n} \) be random variables with expectations \( E(X_{i}) \), and let \( Y = c_{1}X_{1}+ \ldots +c_{n}X_{n} \). Then \( E(Y) = c_{1}E(X_{1})+ \ldots +c_{n}E(X_{n}) \).
Joint distribution
Consider two random variables \( X \) and \( Y \) whose joint outcome is of interest: \( (X, Y) = (x, y) ⇔ X = x \), \( Y = y \).
The joint frequency function
Let the discrete random variables \( X, Y \) take on the values \( x_{1}, x_{2},… \) and \( y_{1}, y_{2},… \) respectively.
Joint frequency function: \( p(x_{i}, y_{j}) = P(X = x_{i}, Y = y_{j}) \)
Marginal probabilities: \( p(X=x_{i}) =\sum _{j} P(X = x_{i}, Y = y_{j}) \)
Several random variables:
The same story holds for random variables \( X_{1}, \ldots, X_{n} \) defined on the same sample space: \( p(x_{1}, \ldots, x_{n}) = P(X_{1}= x_{1}, \ldots, X_{n} = x_{n}) \)
\( p_{X_{1}}(x_{1})=\sum _{x_{2}, \ldots, x_{n}} p(x_{1},x_{2}, \ldots, x_{n}) \), \( p_{X_{1},X_{2}}(x_{1},x_{2})=\sum _{x_{3}, \ldots, x_{n}} p(x_{1},x_{2},x_{3}, \ldots, x_{n}) \)
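A small Python sketch (illustrative, with a made-up joint table) shows how a marginal is obtained by summing the joint pmf over the other variable:

```python
# A toy joint pmf p(x, y), stored as a dictionary; the values sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginal of X: p_X(x) = sum over j of P(X = x, Y = y_j).
marginal_x = {}
for (x, y), pr in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + pr

print(marginal_x)  # {0: 0.30..., 1: 0.70...}
```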
4.6. The geometric distribution
Suppose an experiment consists of repeating independent, identical Bernoulli trials until the first success is observed; for example, flipping a coin until the first heads appears. Since the first success may occur on the first trial or on any later trial, the sample space \( S \) of this experiment consists of an infinite but countable number of sample points (outcomes) \( s_{i} \) of the form
\( s_{1} : S \) (a success on the first trial), …, \( s_{k} : FFF \ldots FS \) (the first success on the \( k \)th trial). If we define the random variable \( X \) as the number of the trial on which the first success is observed, then the events \( (X = 1) \), \( (X = 2) \), …, \( (X = k) \) each contain only one sample point, \( s_{1} \), \( s_{2} \), …, \( s_{k} \), respectively. In general, the event \( (X = k) \) contains only the sample point \( s_{k} \), and the possible values of \( X \) are \( 1,2,3,\ldots \) Now, since the trials are independent, if the probability of success on each trial equals \( p \), then for any \( x \) we have \( p(x)=P(X=x)=P(FF \ldots FS)=qq \ldots qp=q^{x-1}p \), where \( q = 1-p \). This observation leads to the following definition.
Definition: A random variable \( X \) is said to have a geometric distribution with success probability \( p \) if and only if \( p(x) = q^{x-1}p \), where \( x = 1, 2, 3, \ldots \) and \( 0 ≤ p ≤ 1 \). We write \( X∼Geometric(p) \).
The mean of the geometric distribution
Suppose that \( X∼Geometric(p) \). From the definition of the expected value we obtain \( E(X) = \sum _{x=1}^{∞}xq^{x-1}p=p \sum _{x=1}^{∞}xq^{x-1} \), where \( q=1-p \). Now, for each term in the sum above we have \( \frac{d}{dq} (q^{x})= xq^{x-1} \),
and therefore, interchanging the order of the derivative and the sum, we obtain \( \frac{d}{dq} (\sum _{x=1}^{∞}q^{x}) = \sum _{x=1}^{∞}\frac{d}{dq}(q^{x})=\sum _{x=1}^{∞}xq^{x-1}. \)
Next, observe that \( \sum _{x=1}^{∞}q^{x}= q+ q^{2}+ q^{3}+… \)
is a geometric series, and from calculus we know that \( \sum _{x=1}^{∞}q^{x}= q+ q^{2}+ q^{3}+…= \frac{q}{1-q} \). Then we obtain \( E(X)= p \sum _{x=1}^{∞}xq^{x-1} =p\frac{d}{dq} (\sum _{x=1}^{∞}q^{x})=p\frac{d}{dq}(\frac{q}{1-q}) =p[\frac{1}{(1-q)^{2}}] = \frac{p}{p^{2}} =\frac{1}{p} \).
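A quick simulation check (Python, illustrative) of the result \( E(X)=1/p \):

```python
import random

def geometric_draw(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    x = 1
    while random.random() >= p:  # failure: try again
        x += 1
    return x

p = 0.25
draws = [geometric_draw(p) for _ in range(100_000)]
print(sum(draws) / len(draws))  # sample mean, close to 1 / p = 4.0
```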
4.7. The negative binomial distribution
Consider the same setup as for the geometric distribution, but suppose we are now interested not in the first success but in the second, the fifth, or, in general, the \( r \)th success. If we denote by \( X \) the number of the trial on which the \( r \)th success occurs, the first thing to notice is that the previous \( r - 1 \) successes may occur in any order among the earlier trials, so we must count their possible arrangements. Accordingly, we can derive the probability mass function of \( X \), which yields \( p(x)= \binom{x-1}{r-1} p^{r} q^{x-r}, x = r, r + 1, r + 2, \ldots, \) and which leads to the following definition. A random variable \( X \) is said to have a negative binomial distribution if and only if \( p(x)= \binom{x-1}{r-1} p^{r} q^{x-r}, x = r, r + 1, r + 2, \ldots, \) for a given integer \( r \) and \( 0 ≤ p ≤ 1 \). We write \( X∼NegBin(r, p) \).
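A Python sketch (illustrative) of this pmf, with a numerical check that the probabilities sum to one:

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x) for X = trial number of the r-th success: the first r - 1
    successes can sit anywhere among the first x - 1 trials."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.4
# The probabilities over x = r, r + 1, ... sum to 1 (series truncated here).
print(sum(negbin_pmf(x, r, p) for x in range(r, 200)))
```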
4.8. The hypergeometric distribution
One way to motivate the hypergeometric distribution is to imagine a large urn filled with \( N \) balls that are identical in every respect except that \( r \) are red and \( N - r \) are blue. If we draw \( n \) balls from the urn at random, what is the probability that exactly \( x \) of them are red? Observe that the total number of samples of size \( n \) that can be drawn from the \( N \) balls in the urn is \( \binom{N}{n} \).
On the other hand, we can choose the \( x \) red balls in \( \binom{r}{x} \) different ways, and the remaining \( n - x \) blue balls in \( \binom{N-r}{n-x} \) different ways.
Combining these results, we obtain \( P(x \text{ balls are red})= \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} \)
which leads to the following definition.
Definition: A random variable \( X \) is said to have a hypergeometric distribution if and only if \( P(x \text{ balls are red})= \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} \), where \( x = 0, 1, 2, \ldots, r \), \( n ≤ N \), and \( n - x ≤ N - r \).
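The urn probability can be computed directly; the following Python sketch (illustrative) also checks that the pmf sums to one:

```python
from math import comb

def hypergeom_pmf(x, N, r, n):
    """P(x red balls when drawing n balls from an urn of N balls, r of them red)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

N, r, n = 20, 7, 5
# The probabilities over the possible values of x sum to 1.
print(sum(hypergeom_pmf(x, N, r, n) for x in range(0, min(r, n) + 1)))
```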
4.9. The Poisson distribution
Let's say we want to determine the probability of an accident occurring at a specific intersection over the course of a week. One technique is to divide the time period (in this example, a week) into \( n \) subintervals so tiny that we can see at most one accident in each with non-zero probability. Denoting by \( p \) the probability of observing an accident in any given subinterval, we then have
\( P(\text{no accident in the subinterval}) = 1-p \)
\( P(\text{exactly one accident in the subinterval}) = p \)
\( P(\text{more than one accident in the subinterval}) = 0 \)
In this instance, the total number of accidents that occur in the week equals the number of subintervals in which an accident is recorded. If we assume that accidents occur independently across intervals, the total number of accidents follows the binomial distribution introduced earlier. The problem is that we do not know how to choose the subintervals, and therefore we do not know the values of \( n \) and \( p \) for the binomial distribution. Intuitively, however, as \( n \) rises and there are more subintervals, the probability \( p \) that an accident occurs in one of these shorter intervals falls. We can formalize this idea by assuming that \( λ = np \) remains constant as \( n \) increases. Then, writing the binomial probability mass function in terms of \( λ \), we obtain \( \binom{n}{x}p^{x}(1-p)^{n-x}=\binom{n}{x}(\frac{λ}{n})^{x}(1-\frac{λ}{n})^{n-x}, \) where \( λ = np \) is the expected number of accidents for the entire week. Now, taking the limit as \( n →∞ \), we have \( \underset{n→∞}{\lim}\binom{n}{x}p^{x}(1-p)^{n-x}= \underset{n→∞}{\lim}\frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}(\frac{λ}{n})^{x}(1-\frac{λ}{n})^{n-x}=\frac{λ^{x}}{x!} \underset{n→∞}{\lim}(1-\frac{λ}{n})^{n}(1-\frac{λ}{n})^{-x}\frac{(n-1)(n-2)\cdots(n-x+1)}{n^{x-1}} =\frac{λ^{x}}{x!} \underset{n→∞}{\lim}(1-\frac{λ}{n})^{n}(1-\frac{λ}{n})^{-x}(1-\frac{1}{n})(1-\frac{2}{n})\cdots(1-\frac{x-1}{n})=\frac{λ^{x}}{x!}e^{-λ}, \) because \( \underset{n→∞}{\lim}(1-\frac{λ}{n})^{n}= e^{-λ} \) and all the other factors in the product tend to \( 1 \). This result leads to the following definition. Definition: A random variable \( X \) is said to have a Poisson distribution if and only if \( p(x)=\frac{λ^{x}}{x!}e^{-λ}, \)
where \( x = 0,1,2,... \) and \( λ \gt 0 \). We write \( X∼Poisson(λ) \).
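The limiting argument above can also be observed numerically; this Python sketch (illustrative) shows binomial probabilities with \( p = λ/n \) approaching the Poisson pmf as \( n \) grows:

```python
from math import comb, exp, factorial

lam, x = 3.0, 2

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

# Binomial(n, lam / n) probabilities approach the Poisson(lam) pmf as n grows.
for n in (10, 100, 10_000):
    p = lam / n
    print(n, comb(n, x) * p**x * (1 - p)**(n - x))
print("Poisson:", poisson_pmf(x, lam))
```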
The Poisson distribution is likely one of the most useful statistical distributions today for answering many kinds of questions. It has been in use for over a century, with use cases covering a wide variety of problems in insurance, business, medicine, banking, risk management, and science.
Siméon-Denis Poisson, a French mathematician, created his function in 1830 to quantify the number of times a player succeeds over successive bets in a game of chance with few winners. If \( p \) denotes the probability of winning any particular trial, the average number of wins in \( n \) trials is \( λ = np \). Using the binomial distribution of the Swiss mathematician Jakob Bernoulli, Poisson proved that the probability of obtaining \( k \) wins is approximately \( \frac{λ^{k}}{k!}e^{-λ} \).
Nowadays, the Poisson distribution is regarded as a highly significant distribution and has had notable applications throughout history.
The British statistician R. D. Clarke presented his analysis of the distribution of flying-bomb strikes on London during World War II in his 1946 publication, "An Application of the Poisson Distribution." Some areas were hit more often than others, and the British Army wanted to know whether the Germans were deliberately targeting particular regions, whether the strikes reflected great technical precision, or whether the distribution was random. If the missiles were in fact randomly targeted, the British could simply disperse critical infrastructure to reduce the probability that it would be struck.
Clarke split the region into thousands of plots of equal size. Most plots experienced no strike at all, let alone more than one. Moreover, if the missiles fell at random, the probability of any plot being struck would be the same for all plots. The total number of hits would then behave like the total number of wins in a long run of a game of chance with a very small probability of winning. This reasoning led Clarke to formalize the model's derivation using the Poisson distribution. The observed hit rates and the expected Poisson rates turned out to be quite similar; according to Clarke, the observed variation appears to have arisen purely by chance.
5. Conclusion
When a statistical distribution is correctly matched to a scenario, we can obtain the relevant probabilities and laws through calculation and analysis, which helps us quickly analyze and handle similar scenarios in the future. All distributions involve the same kinds of quantities (the probability, the probability density function, the cumulative distribution function, and so on), yet different distributions suit different settings according to their own characteristics. We need to combine statistics with different fields and parts of society, analyze the accumulated data, and then find patterns and model future data.
References
[1]. Anderson, W. J. (1991). Continuous-Time Markov Chains. Springer-Verlag, New York.
[2]. Adler, R. J. (1981). The Geometry of Random Fields. Wiley, New York.
[3]. Baldi, P., Mazliak, L. and P. Priouret. (2002). Martingales and Markov Chains: Solved Exercises and Elements of Theory. Chapman-Hall/CRC, Boca Raton.
[4]. Basawa, I. V. and B. L. S. Prakasa Rao. Statistical Inference for Stochastic Processes. Academic Press, London.
[5]. Billingsley, P. (2000). Convergence of Probability Measures, 2nd Ed., Wiley, New York.
[6]. Chung, K. L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd Ed. SpringerVerlag, New York.
[7]. Chung, K. L. and R. J. Williams (1990). Introduction to Stochastic Integration, 2nd Ed. Birkhauser, Boston.
[8]. Embrechts, P. and Maejima, M. (2002). Selfsimilar Processes. Princeton University Press, Princeton.
[9]. Hida, T. and M. Hitsuda. (1991). Gaussian Processes. American Mathematical Society, Providence, R.I.
[10]. Kallenberg, O. (1976). Random Measures. Academic Press, London.
[11]. Karatzas, I. and S. E. Shreve (1991). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
[12]. Lukacs, E. (1970). Characteristic Functions, 2nd Ed. Griffin.
[13]. Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, New York.
[14]. Petrov, V. V. (1995). Limit Theorems of Probability Theory. Oxford University Press, Oxford.
[15]. Samorodnitsky, G. and M. S. Taqqu. (1994). Stable Non-Gaussian Random Processes. Chapman-Hall/CRC, Boca Raton.
[16]. Shorack, G. R. (2000). Probability for Statisticians. Springer-Verlag, New York.
[17]. Durrett, R. (2019). Probability: Theory and Examples (Vol. 49). Cambridge University Press, Cambridge.