Probabilistic Perspective of Weierstrass Approximation Theorem via Bernstein Polynomial

Boyi Xie

doi:10.54254/2753-8818/2025.25318

1. Introduction

The synergy between probability theory and mathematical analysis has profoundly enriched both fields, offering unified frameworks for tackling complex problems. A striking example lies in the probabilistic interpretation of Bernstein polynomials, where binomial distributions and expectation operators $E[f(S_{n}/n)]$ translate function approximation into probabilistic convergence analysis. By leveraging the variance $\mathrm{Var}(S_{n}/n) = x(1-x)/n$ and the law of large numbers, discrete probabilistic models converge uniformly to continuous functions, directly proving the Weierstrass theorem. Such methods bypass traditional analytical complexities while highlighting how probability theory, via moment calculations and distributional limits, provides intuitive tools for approximating function spaces like $C[0,1]$. Conversely, mathematical analysis underpins foundational probability concepts, such as measure-theoretic integration and density functions, ensuring rigorous continuity and convergence proofs. This cross-disciplinary interplay not only simplifies computational workflows but also deepens insights into stochastic processes and deterministic analysis alike.

Probability theory is so closely related to mathematical analysis that applying it to problems in analysis has its own importance. Probability theory is based on measure theory, which is also a core tool of modern mathematical analysis, for example in Lebesgue integration and function space theory. The introduction of probability measures provides more rigorous mathematical descriptions for problems in analysis such as non-measurable sets and generalized integration, while expanding the research scope of function spaces. Mathematical analysis mainly studies deterministic phenomena, while probability theory incorporates randomness into mathematical frameworks through tools such as random variables and distribution functions. The combination of the two can handle complex problems such as stochastic differential equations and stochastic integrals, enriching the research objects of analysis. Techniques from probability theory, such as randomization and Monte Carlo methods, are also used in proofs of theorems in mathematical analysis. For example, the asymptotic relation $\frac{1}{\sqrt{2π}} \int_{x}^{\infty} e^{-t^{2}/2}\, dt \sim \frac{1}{\sqrt{2π}} \frac{e^{-x^{2}/2}}{x}$ as $x \to +\infty$ has already been proved analytically [1], but it can also be proved by probability theory.
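As a quick numerical sanity check of the Gaussian tail relation cited from [1], the following sketch compares the upper tail $1 - Φ(x)$ with $φ(x)/x$ for a few positive $x$. It reads the relation as an asymptotic statement for $x \to +\infty$, which is an assumption about the intended form, and the function names are illustrative only.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(x: float) -> float:
    """Upper tail 1 - Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_ratio(x: float) -> float:
    """Ratio of the exact upper tail to the approximation phi(x) / x, for x > 0."""
    return normal_upper_tail(x) / (normal_pdf(x) / x)


if __name__ == "__main__":
    # The ratio should approach 1 from below as x grows.
    for x in (1.0, 2.0, 5.0, 10.0):
        print(x, tail_ratio(x))
```

The ratio stays below 1 and increases toward 1, which is consistent with the asymptotic equivalence but is of course not a proof of it.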

The development of approximation theory has promoted the development of functional analysis and other branches of mathematics. Bernstein polynomials play a very important role in approximation theory, and the functions they approximate inherit good properties from them. For instance, if the Bernstein polynomials belong to a class of Lipschitz functions, then the function they approximate also belongs to that class [2]. This article mainly discusses how to use Bernstein polynomials to prove the Weierstrass approximation theorem.

2. Method and theory

2.1. Weierstrass approximation theorem

Theorem: If $f$ is a continuous complex function on $[a, b]$, there exists a sequence of polynomials $P_{n}$ such that

$\lim_{n \to \infty} P_{n}(x) = f(x)$ (1)

uniformly on $[a, b]$, and $P_{n}$ can be taken real if $f$ is real. The theorem was first established by Weierstrass. The classical proof relies on techniques from mathematical analysis that are quite involved [3]. It is reproduced below.

Proof: Without loss of generality, assume $[a, b] = [0,1]$, since $[a, b]$ is an arbitrary interval. We may also assume $f(0) = f(1) = 0$. Indeed, if the theorem is proven in this case, set $g(x) = f(x) - f(0) - x[f(1) - f(0)]$ for $0 \leq x \leq 1$. Then $g(0) = g(1) = 0$, and if $g$ can be obtained as the uniform limit of a sequence of polynomials, the same is true for $f$, since $f - g$ is a polynomial.

Furthermore, define $f(x)$ to be zero for $x$ outside $[0,1]$. Then $f$ is uniformly continuous on the whole real line. Put

$Q_{n} (x) = c_{n} (1 - x^{2})^{n} (n = 1, 2, 3, \dots)$ (2)

where $c_{n}$ is chosen so that $\int_{-1}^{1} Q_{n}(x)\, dx = 1$. Some information about the order of magnitude of $c_{n}$ is needed. Observe that

$\int_{-1}^{1} (1 - x^{2})^{n}\, dx = 2 \int_{0}^{1} (1 - x^{2})^{n}\, dx \geq 2 \int_{0}^{1/\sqrt{n}} (1 - x^{2})^{n}\, dx$

$\geq 2 \int_{0}^{1/\sqrt{n}} (1 - n x^{2})\, dx = \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}}$ (3)

It follows from Eq. (3) that $c_{n} < \sqrt{n}$ .
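The bound $c_{n} < \sqrt{n}$ can be checked numerically. The sketch below assumes the closed form $\int_{-1}^{1} (1 - x^{2})^{n}\, dx = 2 \cdot 4^{n} (n!)^{2} / (2n+1)!$, which is not stated in the text, and the function name is hypothetical.

```python
import math


def landau_constant(n: int) -> float:
    """Return c_n with c_n * integral_{-1}^{1} (1 - x^2)^n dx = 1.

    Assumes the closed form of the integral, 2 * 4^n * (n!)^2 / (2n + 1)!,
    evaluated with exact integers before the final division.
    """
    integral_numerator = 2 * 4**n * math.factorial(n) ** 2
    integral_denominator = math.factorial(2 * n + 1)
    return integral_denominator / integral_numerator


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, landau_constant(n), math.sqrt(n))
```

For $n = 1$ the integral equals $4/3$, so $c_{1} = 3/4 < 1$, in agreement with the bound.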

The inequality $(1 - x^{2})^{n} \geq 1 - n x^{2}$ used above can be proved by considering the function $(1 - x^{2})^{n} - 1 + n x^{2}$, which vanishes at $x = 0$ and whose derivative $2 n x [1 - (1 - x^{2})^{n-1}]$ is non-negative on the interval $(0,1)$. For any $δ > 0$, the bound $c_{n} < \sqrt{n}$ obtained from Eq. (3) implies

$Q_{n}(x) \leq \sqrt{n} (1 - δ^{2})^{n} \quad (δ \leq |x| \leq 1)$ (4)

so that $Q_{n} \to 0$ uniformly in $δ \leq | x | \leq 1$ . Set

$P_{n} (x) = \int_{- 1}^{1} f (x + t) Q_{n} (t) d t (0 \leq x \leq 1)$ (5)

The assumptions listed above about $f$ show that

$P_{n} (x) = \int_{- x}^{1 - x} f (x + t) Q_{n} (t) d t = \int_{0}^{1} f (t) Q_{n} (t - x) d t$ (6)

Moreover, the last integral is clearly a polynomial in $x$. Therefore $\{P_{n}\}$ is a sequence of polynomials, and they are real if $f$ is real.

Given $ε > 0$, choose $δ > 0$ such that $|y - x| < δ$ implies $|f(y) - f(x)| < \frac{ε}{2}$. Let $M = \sup |f(x)|$. Using the normalization $\int_{-1}^{1} Q_{n}(x)\, dx = 1$, Eq. (4), and the fact that $Q_{n}(x) \geq 0$, for $0 \leq x \leq 1$,

$|P_{n}(x) - f(x)| = \left| \int_{-1}^{1} [f(x + t) - f(x)] Q_{n}(t)\, dt \right|$

$\leq \int_{-1}^{1} |f(x + t) - f(x)| Q_{n}(t)\, dt$

$\leq 2M \int_{-1}^{-δ} Q_{n}(t)\, dt + \frac{ε}{2} \int_{-δ}^{δ} Q_{n}(t)\, dt + 2M \int_{δ}^{1} Q_{n}(t)\, dt$

$\leq 4M \sqrt{n} (1 - δ^{2})^{n} + \frac{ε}{2} < ε$ (7)

for all sufficiently large $n$, which proves the theorem.
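To make the construction in Eqs. (2) to (7) concrete, the following sketch evaluates $P_{n}(x) = \int_{0}^{1} f(t) Q_{n}(t - x)\, dt$ with a simple midpoint rule and reports the largest error on a grid. The test function $\sin(π t)$, the grid sizes, the closed form used for $c_{n}$, and all function names are assumptions chosen for illustration; they are not part of the proof.

```python
import math


def landau_constant(n: int) -> float:
    """c_n from the assumed closed form of integral_{-1}^{1} (1 - x^2)^n dx."""
    return math.factorial(2 * n + 1) / (2 * 4**n * math.factorial(n) ** 2)


def landau_approx(f, n: int, x: float, steps: int = 2000) -> float:
    """Midpoint-rule value of P_n(x) = integral_0^1 f(t) Q_n(t - x) dt."""
    c_n = landau_constant(n)
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += f(t) * (1.0 - (t - x) ** 2) ** n
    return c_n * h * total


def sup_error(f, n: int, grid: int = 20) -> float:
    """Largest |P_n(x) - f(x)| over grid + 1 equally spaced points in [0, 1]."""
    return max(
        abs(landau_approx(f, n, i / grid) - f(i / grid)) for i in range(grid + 1)
    )


def f_example(t: float) -> float:
    """Continuous on [0, 1] with f(0) = f(1) = 0, as the proof assumes."""
    return math.sin(math.pi * t)


if __name__ == "__main__":
    for n in (20, 200):
        print(n, sup_error(f_example, n))
```

The reported error shrinks as $n$ grows, in line with Eq. (7), although rather slowly near the endpoints of the interval.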

This method is rather intricate. Can the polynomial approximation of continuous functions instead be transformed into a problem about probabilistic expectations, so that the theorem can be proved with methods of probability theory? This can be achieved using Bernstein polynomials and probabilistic methods.

2.2. Bernstein polynomial

Definition: For a continuous function $f \in C[0,1]$, the Bernstein polynomial of degree $n$ is defined as $B_{n}(f)(x) = \sum_{k=0}^{n} f(\frac{k}{n}) \binom{n}{k} x^{k} (1 - x)^{n-k}$ [4].
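The definition translates directly into code. The sketch below evaluates $B_{n}(f)(x)$ as a sum of sampled values $f(k/n)$ weighted by the $\mathrm{Binomial}(n, x)$ probabilities $\binom{n}{k} x^{k} (1 - x)^{n-k}$; the function names are hypothetical.

```python
import math


def binomial_pmf(n: int, k: int, x: float) -> float:
    """P(S_n = k) for S_n ~ Binomial(n, x)."""
    return math.comb(n, k) * x**k * (1.0 - x) ** (n - k)


def bernstein(f, n: int, x: float) -> float:
    """B_n(f)(x): the sampled values f(k/n) weighted by the binomial probabilities."""
    return sum(f(k / n) * binomial_pmf(n, k, x) for k in range(n + 1))


if __name__ == "__main__":
    n, x = 10, 0.3
    print(bernstein(lambda t: 1.0, n, x))    # weights sum to 1
    print(bernstein(lambda t: t, n, x))      # mean of S_n / n
    print(bernstein(lambda t: t * t, n, x))  # second moment of S_n / n
```

For $f(t) = t$ the result is the mean $x$, and for $f(t) = t^{2}$ it is $x^{2} + x(1 - x)/n$, the second moment of a binomial proportion.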

The Bernstein polynomial can be reinterpreted as the expectation of a function of a binomial random variable: $B_{n}(f)(x) = E[f(\frac{X}{n})]$, where $X \sim \mathrm{Binomial}(n, x)$. Let $S_{n} \sim \mathrm{Binomial}(n, x)$ and consider the normalized random variable $\frac{S_{n}}{n}$; its expectation is $E[\frac{S_{n}}{n}] = x$ and its variance is $\mathrm{Var}(\frac{S_{n}}{n}) = \frac{x(1 - x)}{n}$. Next, to build the relation between the Bernstein polynomial and the expectation, the polynomial weights $\binom{n}{k} x^{k} (1 - x)^{n-k}$ can be interpreted as the probability mass function of the binomial distribution: $B_{n}(f)(x) = \sum_{k=0}^{n} f(\frac{k}{n}) P(S_{n} = k) = E[f(\frac{S_{n}}{n})]$. So for any $x \in [0,1]$,

$|B_{n}(f)(x) - f(x)| = |E[f(\frac{S_{n}}{n}) - f(x)]| \leq E[|f(\frac{S_{n}}{n}) - f(x)|]$ (8)

This step follows from Jensen's inequality applied to the convex function $|\cdot|$, which controls the absolute value of the expectation. Since $f$ is continuous on the compact interval $[0,1]$, it is uniformly continuous: $\forall ε > 0, \exists δ > 0$, independent of $x$,

$\text{s.t. } |\frac{S_{n}}{n} - x| < δ \implies |f(\frac{S_{n}}{n}) - f(x)| < ε$ (9)

Next, the expectation can be split into two parts, according to whether $|\frac{S_{n}}{n} - x| < δ$ or not:

$E[|f(\frac{S_{n}}{n}) - f(x)|] \leq ε \cdot P(|\frac{S_{n}}{n} - x| < δ) + 2M \cdot P(|\frac{S_{n}}{n} - x| \geq δ) \quad (M = \sup |f|)$ (10)

By Chebyshev's inequality, the second part has an upper bound:

$P(|\frac{S_{n}}{n} - x| \geq δ) \leq \frac{\mathrm{Var}(\frac{S_{n}}{n})}{δ^{2}} = \frac{x(1 - x)}{n δ^{2}} \leq \frac{1}{4 n δ^{2}}$ (11)
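The tail estimate in Eq. (11) can be compared with the exact binomial tail. The sketch below is only a numerical illustration; the values of $n$, $x$ and $δ$ and the function names are arbitrary choices.

```python
import math


def binomial_pmf(n: int, k: int, x: float) -> float:
    """P(S_n = k) for S_n ~ Binomial(n, x)."""
    return math.comb(n, k) * x**k * (1.0 - x) ** (n - k)


def tail_probability(n: int, x: float, delta: float) -> float:
    """Exact P(|S_n / n - x| >= delta), summed over the binomial distribution."""
    return sum(
        binomial_pmf(n, k, x) for k in range(n + 1) if abs(k / n - x) >= delta
    )


def chebyshev_bound(n: int, x: float, delta: float) -> float:
    """Var(S_n / n) / delta^2 = x (1 - x) / (n delta^2), as in Eq. (11)."""
    return x * (1.0 - x) / (n * delta**2)


if __name__ == "__main__":
    delta = 0.1
    for n in (10, 50, 200):
        exact = tail_probability(n, 0.5, delta)
        print(n, exact, chebyshev_bound(n, 0.5, delta), 1.0 / (4 * n * delta**2))
```

The exact tail is typically far smaller than the Chebyshev bound; the proof only needs a bound that tends to zero uniformly in $x$, and $\frac{1}{4 n δ^{2}}$ is enough for that.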

Finally,

$|B_{n}(f)(x) - f(x)| \leq E[|f(\frac{S_{n}}{n}) - f(x)|] \leq ε + \frac{2M}{4 n δ^{2}} = ε + \frac{M}{2 n δ^{2}}$ (12)

For any $ε > 0$, take the $δ$ given by uniform continuity and then take $n > \frac{M}{2 ε δ^{2}}$, so that for all $x \in [0,1]$,

$|B_{n}(f)(x) - f(x)| \leq 2ε$ (13)
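The choice of $n$ in the last step can be tried on a concrete function. The sketch below uses $f(t) = |t - 1/2|$, which is Lipschitz with constant 1, so that $δ = ε$ satisfies the continuity condition and $M = 1/2$. The function, the value of $ε$, the evaluation grid and the helper names are illustrative assumptions.

```python
import math


def bernstein(f, n: int, x: float) -> float:
    """B_n(f)(x) as a binomially weighted sum of the samples f(k/n)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f, n: int, grid: int = 200) -> float:
    """Largest |B_n(f)(x) - f(x)| over grid + 1 equally spaced points in [0, 1]."""
    return max(abs(bernstein(f, n, i / grid) - f(i / grid)) for i in range(grid + 1))


def required_degree(m_sup: float, eps: float, delta: float) -> int:
    """Smallest integer n with n > M / (2 eps delta^2) (non-integer thresholds)."""
    return math.floor(m_sup / (2.0 * eps * delta**2)) + 1


def kink(t: float) -> float:
    return abs(t - 0.5)


if __name__ == "__main__":
    eps = 0.15
    n = required_degree(0.5, eps, eps)  # delta = eps for a 1-Lipschitz function
    print(n, sup_error(kink, n), 2 * eps)
```

The observed error is well below $2ε$, which reflects that the Chebyshev-based estimate is conservative rather than sharp.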

The bound does not depend on $x$, so the convergence is uniform, and the theorem is proved by means of Bernstein polynomials and probabilistic methods. It is instructive to compare the traditional analytic proof with the probabilistic one.

In traditional analytic proofs, explicit computation of Bernstein polynomial coefficients is required to approximate continuous functions. This involves cumbersome algebraic manipulations and is error-prone. Moreover, the proof relies heavily on intricate combinatorial summations. These steps obscure the intuition behind the Weierstrass approximation theorem and require advanced analysis skills to grasp. The complexity of coefficient manipulation and combinatorial arguments makes the proof inaccessible to beginners, as it prioritizes algebraic rigor over conceptual clarity.

In contrast, probabilistic methods circumvent the explicit construction of polynomial coefficients. By leveraging the linearity of expectation and applying Jensen's inequality, the proof becomes more natural. The error estimation is simplified using the variance, avoiding advanced techniques. This approach not only enhances clarity for novices but also establishes a profound connection between probability theory and mathematical analysis.

3. Results and applications

3.1. Core methodologies and technical features

The author has already used the Bernstein polynomial to prove Weierstrass approximation theorem. To derive more general and detailed conclusions, it is necessary to summarize the core ideas in the procedure of proof.

The first step is the construction. The Bernstein polynomial approximates a function through discrete sampling points; its standard form is:

$B_{n}(f)(x) = \sum_{k=0}^{n} f(\frac{k}{n}) \binom{n}{k} x^{k} (1 - x)^{n-k}$ (14)

where $n$ is the degree of the polynomial and $f(\frac{k}{n})$ are the sampled values at the equally spaced nodes $k/n$. The Bernstein polynomial can be regarded as a weighted average of function values, with weights $\binom{n}{k} x^{k} (1 - x)^{n-k}$ determined by the binomial distribution. Essentially, it represents an expectation approximation in a probabilistic sense.

The second issue is the convergence and error analysis of the Bernstein polynomial. In line with the Weierstrass theorem, Eq. (1), when $n \to \infty$, $B_{n}(f)(x)$ converges uniformly to the continuous function $f$ on the interval $[0,1]$. The speed of convergence depends on the smoothness of the function. Recall the Lipschitz condition: a function $f : D \subseteq \mathbb{R} \to \mathbb{R}$ defined on a subset of the real numbers is Lipschitz if there exists a constant $K$ such that $|f(a) - f(b)| \leq K |a - b|$ for all $a, b \in D$; the smallest such constant $K$ is called the Lipschitz constant of $f$. If $f$ satisfies the Lipschitz condition, the error bound is $O(\frac{1}{\sqrt{n}})$; if $f$ is twice continuously differentiable, the error bound improves to $O(\frac{1}{n})$. Finally, the third step concerns the characteristics of the method and its optimization.
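The two convergence rates above can be observed numerically. In the sketch below, $f(t) = t^{2}$ stands in for a twice continuously differentiable function and $f(t) = |t - 1/2|$ for a Lipschitz function that is not differentiable; both choices, the grid and the function names are assumptions made for illustration.

```python
import math


def bernstein(f, n: int, x: float) -> float:
    """B_n(f)(x) as a binomially weighted sum of the samples f(k/n)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )


def sup_error(f, n: int, grid: int = 100) -> float:
    """Largest |B_n(f)(x) - f(x)| over grid + 1 equally spaced points in [0, 1]."""
    return max(abs(bernstein(f, n, i / grid) - f(i / grid)) for i in range(grid + 1))


def square(t: float) -> float:
    return t * t


def kink(t: float) -> float:
    return abs(t - 0.5)


if __name__ == "__main__":
    # If the rates are 1/n and 1/sqrt(n), the scaled errors should stay bounded.
    for n in (10, 40, 160):
        print(n, n * sup_error(square, n), math.sqrt(n) * sup_error(kink, n))
```

For $f(t) = t^{2}$ the error is exactly $x(1 - x)/n$, so $n$ times the largest error equals $1/4$, while for the kink the error scaled by $\sqrt{n}$ settles at a constant below $1/2$.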

Before closing, some advantages deserve a remark. Bernstein polynomials preserve non-negativity and monotonicity, and they do not exhibit the Runge phenomenon that affects high-degree interpolation at equally spaced nodes. Moreover, the values of a Bernstein polynomial always lie within the convex hull of the function's sampled values, making them suitable for geometric modeling, such as Bézier curves.

Furthermore, some limitations must be mentioned. When $n$ is large, the computation of binomial coefficients becomes expensive and needs to be combined with recursive or numerically optimized methods. In addition, for the approximation of functions with non-uniform nodes or high oscillations, it is necessary to combine Bernstein polynomials with other basis functions (such as Chebyshev polynomials) to improve efficiency.
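One recursive scheme that avoids binomial coefficients altogether is de Casteljau's algorithm, which evaluates a polynomial in Bernstein form by repeated convex combinations. It is named here as one possible choice, not as the method the text has in mind, and the function name is hypothetical.

```python
def bernstein_de_casteljau(f, n: int, x: float) -> float:
    """Evaluate B_n(f)(x) by repeated convex combinations of the samples f(k/n).

    No binomial coefficient is formed, so no huge intermediate numbers appear;
    the cost is O(n^2) arithmetic operations.
    """
    values = [f(k / n) for k in range(n + 1)]
    for level in range(n, 0, -1):
        for k in range(level):
            # values[k + 1] still holds the previous level when it is read here.
            values[k] = (1.0 - x) * values[k] + x * values[k + 1]
    return values[0]


if __name__ == "__main__":
    print(bernstein_de_casteljau(lambda t: t * t, 10, 0.3))
    print(bernstein_de_casteljau(lambda t: t, 1500, 0.25))
```

Every intermediate value is a convex combination of the samples and therefore stays within their range, whereas forming $\binom{n}{k}$ explicitly and multiplying it by floating-point powers can overflow once $n$ is of the order of a thousand or more.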

In conclusion, Bernstein polynomial achieves function approximation in the form of probability weighting, combining both theoretical completeness and practical engineering applicability. However, balancing its efficiency and accuracy requires adjusting parameters or combining with other approximation methods according to specific problems.

3.2. Some applications of approximation theory

The previous sections showed that the Weierstrass approximation theorem can be proved by means of Bernstein polynomials. Approximation theory also holds a significant position in other fields.

First, stable numerical interpolation. In numerical analysis, high-degree polynomial interpolation often suffers from the Runge phenomenon, where oscillations amplify near interval endpoints. Bernstein polynomials mitigate this issue through their shape-preserving and smoothing properties. By design, they avoid overshooting and maintain non-negativity, ensuring stable approximations even for noisy data. This stability is critical in engineering simulations, where robustness outweighs the need for rapid convergence [5].

Second, geometric modeling with Bézier curves. Bernstein polynomials form the mathematical backbone of Bézier curves, the cornerstone of computer-aided design (CAD) systems. A Bézier curve of degree $n$ is expressed as $C(t) = \sum_{i=0}^{n} B_{i,n}(t) P_{i}, \ t \in [0,1]$, where $B_{i,n}(t)$ are the Bernstein basis functions and $P_{i}$ are the control points. The convex hull property (guaranteed by the non-negativity and partition of unity of Bernstein polynomials) ensures that curves remain within the convex hull of the control points, enabling intuitive shape manipulation in graphic design and animation [6].
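A Bézier curve can be evaluated directly from this formula. The sketch below works with planar control points; the sample control polygon and the function names are made up for illustration.

```python
import math


def bernstein_basis(i: int, n: int, t: float) -> float:
    """Bernstein basis function B_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return math.comb(n, i) * t**i * (1.0 - t) ** (n - i)


def bezier_point(control_points, t: float):
    """Point C(t) of the Bezier curve defined by a list of (x, y) control points."""
    n = len(control_points) - 1
    weights = [bernstein_basis(i, n, t) for i in range(n + 1)]
    x = sum(w * px for w, (px, _) in zip(weights, control_points))
    y = sum(w * py for w, (_, py) in zip(weights, control_points))
    return (x, y)


if __name__ == "__main__":
    control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]  # hypothetical cubic
    for t in (0.0, 0.5, 1.0):
        print(t, bezier_point(control, t))
```

The curve starts at the first control point and ends at the last one, and because the weights are non-negative and sum to 1, every point on it is a convex combination of the control points.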

Third, image processing and feature extraction. In image segmentation, Bernstein polynomials are employed to approximate object boundaries. By fitting polynomial curves to pixel data, they enable efficient representation of complex shapes with minimal control points. This technique is particularly valuable in medical imaging for reconstructing organ contours or tumor margins from discrete voxel data. Additionally, their smoothness reduces aliasing artifacts in image resizing algorithms [7].

Fourth, emerging applications in machine learning. Recent advances explore Bernstein polynomials as activation functions or basis functions in neural networks. Their bounded derivatives and smoothness enhance training stability for regression tasks, especially when modeling physical systems governed by continuous but non-analytic laws.

All in all, Bernstein polynomials bridge theoretical mathematics and practical engineering [8], offering a robust framework for approximating continuous functions. From foundational proofs in analysis to real-world applications in CAD and image processing, their unique blend of simplicity, stability, and geometric interpretability ensures enduring relevance. As computational demands grow, these polynomials will likely inspire new algorithms at the intersection of approximation theory and data-driven modeling [9]. Their legacy exemplifies how abstract mathematical constructs can evolve into indispensable tools for technological innovation.

4. Conclusion

This paper presents a novel probabilistic approach to establishing the Weierstrass approximation theorem via Bernstein polynomials, highlighting the synergy between probability theory and mathematical analysis. By interpreting Bernstein polynomials through the lens of Bernoulli trials and leveraging the law of large numbers, the author demonstrates the uniform convergence of polynomials to continuous functions. The proposed methodology simplifies computational procedures while maintaining rigorous convergence guarantees, offering an instructive example of cross-domain synergy. The technical contributions involve bounding the approximation error through variance analysis and employing piecewise estimation to address non-uniform continuity. The results reveal that Bernstein polynomials achieve function approximation in the form of probabilistic weighting, combining theoretical completeness with practical engineering applicability. However, balancing efficiency and accuracy requires adjusting parameters or combining with other approximation methods according to specific problems. Overall, this work deepens insights into stochastic processes and deterministic analysis, enriching the research scope of function spaces.

References

[1]. Mao Weibing. (2017). Simple Approximate Relationships Between the Normal Distribution Function and Its Density. Yinshan Academic Journal, 31(04), 8-9.

[2]. Guo Cundi. (2000). Bernstein polynomials and the continuous functions they approximate. Journal of Xi'an University of Engineering and Technology, (03), 325-326.

[3]. Walter Rudin. (2013). Principles of Mathematical Analysis. McGraw-Hill Education.

[4]. A.N. Shiryaev. (2004). Probability. World Publishing Corporation.

[5]. Kaur, J., et al. (2024). A generalization of modified α-Bernstein operators and its related estimations and errors. Arabian Journal of Mathematics, 13(3), 521–531.

[6]. Zhang Jiuting. (2018). A class of inequalities based on Bernstein polynomials and their applications. Journal of Inner Mongolia Normal University, 47 (03), 199-202.

[7]. Wei Yanjun & Feng Bojin & Wu Weiguo. (2016). Multi threshold image segmentation algorithm based on polynomial consistent approximation. Journal of Communications, 37 (10), 56-64.

[8]. Liu Yong & Wang Changqin. (2005). The approximation degree of Bernstein polynomial on the convergence interval. Journal of Dalian Railway Institute, (04), 1-3.

[9]. Bustamante, J., & Muñoz-Delgado, F. J. (2014). Bernstein polynomial and discontinuous functions. Journal of Mathematical Analysis and Applications, 411(2), 829–837.

Cite this article

Xie, B. (2025). Probabilistic Perspective of Weierstrass Approximation Theorem via Bernstein Polynomial. Theoretical and Natural Science, 104, 19-25.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-MPCS 2025 Symposium: Mastering Optimization: Strategies for Maximum Efficiency

ISBN: 978-1-80590-165-5 (Print) / 978-1-80590-166-2 (Online)

Editor: Marwan Omar

Conference website: https://2025.confmpcs.org/workshop_chicago.html

Conference date: 21 March 2025

Series: Theoretical and Natural Science

Volume number: Vol.104

ISSN: 2753-8818 (Print) / 2753-8826 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.