A to Z
- Absolute value
- Accuracy and precision in statistics
- Alpha level
- Alternative hypothesis
- Autocorrelation
- Autocorrelation function (ACF)
- Bayesian Statistics
- Bessel functions
- Beta function
- Bias in statistics
- Boxplot
- Center of a distribution
- Central limit theorem
- Central tendency
- Coefficient
- Characteristic function
- Class width
- Closed interval / open interval
- Coefficient of variation
- Combinations and permutations
- Complementary in statistics
- Complementary cumulative distribution function (CCDF)
- Concordant and discordant pairs
- Conditional probability
- Confidence interval
- Conjunction rule
- Consistency in statistics
- Control charts
- Convergence (in probability, in distribution)
- Convolution in probability
- Correlation definition
- Correlation coefficient
- Correlation analysis
- Covariance
- Critical value
- Cumulants
- Cumulative distribution function
- Degrees of Freedom
- Dependent variable
- Dichotomous variable
- Discrete vs continuous variables
- Disjoint events
- Dispersion in statistics
- Dispersion parameter
- Domain in statistics
- Dummy variables
- Efficiency in statistics
- Empirical cumulative distribution function (ECDF)
- Empirical rule (68-95-99.7 rule)
- Error function
- Estimation in statistics
- Expected value
- Exponential dispersion models
- Factorials
- Failure rate
- Fisher information
- Fixed effects
- Frequency curve
- Frequency distribution & table
- Frequentist statistics
- Hazard function
- Hazard rate
- Hermite polynomial
- Histogram
- Homogeneity and heterogeneity
- Hypothesis Test
- Independent and identically distributed (i.i.d)
- Independence of Events
- Independent variable
- Interaction effects
- Interquartile range (IQR)
- Interval estimate
- Inverse survival function
- Jackknife estimator
- Kendall’s Tau
- Kurtosis
- Large enough sample condition
- Law of large numbers
- Least squares
- Likelihood function
- Likelihood-ratio statistic
- Linear relationship
- Linear Transformation
- Link function
- Location parameter
- Log odds
- Mahalanobis distance
- Mallows’ Cp
- Marginal probability function
- Maximum likelihood
- Mean (average) deviation
- Measures of spread
- Moments
- Moment Generating Function (MGF)
- Multicollinearity
- Mutually exclusive
- Mutually independent
- Neyman Structure
- Statistical noise
- Nonlinearity
- Non-probability sampling
- Normalizing constant
- Null hypothesis
- Outliers in data
- P-value
- Pairwise independence
- Parameter definition in statistics
- Parameterize a probability distribution
- Percentiles
- Percent point function
- Plackett-Luce model
- Point estimate
- Pooled standard deviation
- Population in statistics
- Population mean
- Population variance
- Power
- Probability density function (PDF)
- Generalized beta distributions
- Probability generating function (PGF)
- Probability mass function (PMF)
- Probability sampling
- Qualitative vs. quantitative data
- Quantile function
- Quartiles in statistics
- Random effects
- Random Variable
- Range in statistics
- Rate parameter
- Rejection region
- Reliability in statistics
- Representative sample
- The Reproductive Property of Distributions
- Robust statistics
- Run chart / run sequence plot
- Sample in statistics
- Sample mean: definition & Examples
- Sampling with and without replacement
- Sample space
- Sample variance
- Sampling theory
- Scale parameter
- Seasonality
- Shapes of probability distributions
- Shape parameter
- Simple random sample
- Skewness
- Smooth distribution
- Spearman footrule distance
- Standard deviation
- Standard error
- Statistic
- Statistical inference
- Statistical regularity
- Statistical significance
- Stratified random sampling
- Sufficient statistic
- Summation notation
- Sum of squares
- Support of a probability distribution
- Support or reject the null hypothesis
- Survival function
- Symmetric distribution
- Tails of a distribution
- T-Score
- Threshold parameter
- Thurstone model
- Transformations
- Truncation
- Upper tail and lower tail
- Validity in statistics
- Variability in statistics
- Variable in statistics
- Variance
- Weighted mean
- Z-score

Statistics and Probability: Two Sides of the Same Coin?
Everybody knows that math can be divided into many different branches. There’s algebra, geometry, trigonometry, and calculus, to name a few. But did you know that there’s a branch of math specifically devoted to the study of random events? It’s called probability, and it’s closely related to another branch called statistics. In this blog post, we’ll explore the similarities and differences between these two important areas of mathematics.
Probability vs. Statistics: What’s the Difference?
At first glance, probability and statistics may seem like two sides of the same coin. After all, they both deal with the collection and analysis of numerical data. However, there are some important distinctions between the two fields. Probability is concerned with the likelihood of something happening, while statistics is focused on actual numerical data. Put another way, probability deals with hypothetical situations, while statistics deals with actual observations.
For example, let’s say you’re trying to determine the probability that it will rain tomorrow. To do this, you would look at past weather patterns and make a prediction based on that data. On the other hand, if you were collecting statistical data on rainfall in your area, you would simply go outside and measure how much rain fell over a given period of time. As you can see, probability is concerned with predicting future events, while statistics is concerned with describing past events.
A Few Statistics and Probability Definitions
Factorials
The ! symbol after a number indicates it’s a factorial:
- 6! is “six factorial.”
- 3! is “three factorial.”
To solve, multiply “n” by every whole number below it. For example, 3! means that n is 3, so
3! is 3 x 2 x 1 = 6.
Factorials are a shorthand way of writing long multiplications. For example, instead of writing out 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 (which equals 479001600), you could simply write 12!. A few more examples:
- 1! = 1
- 2! = 2
- 3! = 6
- 4! = 24
- 5! = 120
The formal definition is that a factorial is the product of every whole number (counting numbers 1, 2, 3…) from 1 to n.
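To make the definition concrete, here is a minimal Python sketch (standard library only; the helper name is just for this example) that computes n! by direct multiplication and checks the result against math.factorial:

```python
import math


def factorial(n: int) -> int:
    """Multiply every whole number from 1 to n together (0! is defined as 1)."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# Check against the values listed above, plus the 12! example.
for n in (1, 2, 3, 4, 5, 12):
    print(n, factorial(n), math.factorial(n))  # e.g. 12 -> 479001600
```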
Hermite Polynomial
Hermite polynomials are widely used polynomials defined over the interval (-∞, ∞), with a weight function proportional to w(x) = e^(-x²) or e^(-x²/2), depending on the convention.
Multiple definitions exist for “Hermite polynomials,” which can be confusing. There are two distinct sets, the “physicists'” and the “probabilists'” polynomials, which differ only in the weight function taken as the starting point. Most authors assume the reader is working in either physics or probability and simply write “Hermite polynomials” without clarification.
The “probabilists'” polynomials, built with the weight w(x) = e^(-x²/2) and sometimes called Chebyshev–Hermite polynomials, are the ones usually seen in statistics. The first few are [1]:
- He0(x) = 1
- He1(x) = x
- He2(x) = x² – 1
- He3(x) = x³ – 3x
- He4(x) = x⁴ – 6x² + 3
- He5(x) = x⁵ – 10x³ + 15x
- He6(x) = x⁶ – 15x⁴ + 45x² – 15
Another definition, with w(x) = e^(-x²), gives the “physicists'” Hermite polynomials, which are the ones commonly used in calculus and physics. The first few are:
- H0(x) = 1
- H1(x) = 2x
- H2(x) = 4x² – 2
- H3(x) = 8x³ – 12x
- H4(x) = 16x⁴ – 48x² + 12
- H5(x) = 32x⁵ – 160x³ + 120x
- H6(x) = 64x⁶ – 480x⁴ + 720x² – 120
Hermite polynomials are useful as interpolation functions because their values, as well as their derivative values up to order n, equal unity at the endpoints of the closed interval [0, 1] [2]. They provide an alternative way to represent cubic curves, allowing a curve to be defined by its endpoints and the derivatives at those endpoints [3].
Hermite polynomials arise in various areas of physics, including the solution of the quantum harmonic oscillator Hamiltonian. They also appear in numerical analysis, in Gauss–Hermite quadrature.
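To make the two conventions concrete, here is a short Python sketch (standard library only; the function names are just for this example) that evaluates both families with their standard three-term recurrences, He_{n+1}(x) = x·He_n(x) − n·He_{n−1}(x) and H_{n+1}(x) = 2x·H_n(x) − 2n·H_{n−1}(x):

```python
def hermite_prob(n: int, x: float) -> float:
    """Probabilists' Hermite polynomial He_n(x), built from the recurrence
    He_{n+1}(x) = x*He_n(x) - n*He_{n-1}(x), with He_0 = 1 and He_1 = x."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x  # He_0, He_1
    for k in range(1, n):
        prev, curr = curr, x * curr - k * prev
    return curr


def hermite_phys(n: int, x: float) -> float:
    """Physicists' Hermite polynomial H_n(x), built from the recurrence
    H_{n+1}(x) = 2x*H_n(x) - 2n*H_{n-1}(x), with H_0 = 1 and H_1 = 2x."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * x  # H_0, H_1
    for k in range(1, n):
        prev, curr = curr, 2.0 * x * curr - 2.0 * k * prev
    return curr


# Spot checks against the lists above: He_3(2) = 2³ - 3·2 = 2 and
# H_3(2) = 8·2³ - 12·2 = 40.
print(hermite_prob(3, 2.0), hermite_phys(3, 2.0))
```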
References
[1] Sawitzki, G. (2009). Computational Statistics: An Introduction to R, CRC Press.
[2] Huebner, K. et al. (2001). The Finite Element Method for Engineers. Wiley.
[3] Buss, S. (2003). 3D Computer Graphics. A Mathematical Introduction with OpenGL. Cambridge University Press.
Linear Transformation

What is a linear transformation?
A linear transformation is a special case of a vector transformation that satisfies two additional properties:
- Addition must be preserved (additivity): T(u + v) = T(u) + T(v) for any vectors u and v in the domain of the transformation. To check that addition is preserved, add u and v and transform the sum; then transform each vector individually and add the results. If the two results are the same, addition is preserved and the transformation passes the first test.
- Scalar multiplication must be preserved (homogeneity): T(ku) = k·T(u) for any vector u in the transformation’s domain and any scalar k. To check that scalar multiplication is preserved, multiply u by a scalar c and transform the result; then transform u and multiply the transformed vector by c. If the two results are the same, scalar multiplication is preserved. If both properties hold, the transformation is linear.
What is the role of linear transformations?
Linear transformations are used in many different areas of mathematics and computer science. For example, they are used in linear algebra, differential geometry, and machine learning:
- In linear algebra, linear transformations are used to represent geometric transformations, such as rotations, reflections, and scalings. They are also used to solve systems of linear equations.
- In differential geometry, linear transformations are used to represent smooth mappings between manifolds. They are also used to study the properties of differential operators.
- In machine learning, linear transformations are used to represent features in data. They are also used to train machine learning models.
Here are some of the roles of linear transformations:
- Representing a geometric transformation. For example, a rotation can be represented by a linear transformation that rotates vectors by a certain angle.
- Solving systems of linear equations. A system of linear equations can be written in matrix form, and when the coefficient matrix is invertible, the solution can be found by inverting that matrix.
- Representing features in data. For example, a linear transformation can be used to represent the color of an image as a vector.
- Training machine learning models. Many machine learning models (linear regression, for example) are built from linear transformations, and the parameters of the model are learned by minimizing a loss function.
Overall, linear transformations are a powerful tool that can be used in many different areas of mathematics and computer science. They are used to represent geometric transformations, solve systems of linear equations, represent features in data, and train machine learning models.
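As a concrete illustration of the first role listed above, here is a small Python sketch (standard library only; the rotate helper and the sample vectors are just for this example) that represents a 2-D rotation as a matrix and numerically checks the two defining properties:

```python
import math


def rotate(theta: float):
    """Return the linear map that rotates 2-D vectors by angle theta,
    i.e. multiplication by the matrix [[cos θ, -sin θ], [sin θ, cos θ]]."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda v: (c * v[0] - s * v[1], s * v[0] + c * v[1])


T = rotate(math.pi / 4)  # rotation by 45 degrees
u, v, k = (1.0, 2.0), (3.0, -1.0), 2.5

# Additivity: T(u + v) should equal T(u) + T(v).
lhs = T((u[0] + v[0], u[1] + v[1]))
rhs = tuple(a + b for a, b in zip(T(u), T(v)))
print(all(math.isclose(a, b) for a, b in zip(lhs, rhs)))  # True

# Homogeneity: T(k*u) should equal k*T(u).
lhs = T((k * u[0], k * u[1]))
rhs = tuple(k * a for a in T(u))
print(all(math.isclose(a, b) for a, b in zip(lhs, rhs)))  # True
```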
Linear Transformation Example
Example Question: Is the following transformation a linear transformation?
T(x, y)→ (x – y, x + y, 9x)
Step 1: Give the vectors u and v (from rule 1) some components. The choice of a and b here is arbitrary:
- u = (a1, a2)
- v = (b1, b2)
Step 2: Find an expression for the addition part of the left side of the addition preservation equation T(u + v) = T(u) + T(v):
(u + v) = (a1, a2) + (b1, b2)
Add these two vectors together to get:
((a1 + b1), (a2 + b2))
In matrix form (writing the vectors as columns), the addition is:
[a1]   [b1]   [a1 + b1]
[a2] + [b2] = [a2 + b2]
Step 3: Apply the transformation. We’re given the rule T(x,y)→ (x – y, x + y, 9x), so transforming our additive vector from Step 2, we get:
- T ((a1 + b1), (a2+ b2)) =
- ((a1 + b1) – (a2 + b2),
- (a1 + b1) + (a2 + b2),
- 9(a1 + b1)).
Simplifying/Distributing using algebra:
(a1 + b1 – a2 – b2,
a1 + b1 + a2 + b2,
9a1 + 9b1).
Set this aside for a moment: we’re going to compare this result to the result from the right hand side of the equation in a later step.
Step 4: Find an expression for the right side of the Rule 1 equation, T(u) + T(v). Using the same a/b variables we used in Steps 1 to 3, we get:
T(a1, a2) + T(b1, b2)
Step 5: Transform the vector u, (a1,a2). We’re given the rule T(x,y)→ (x – y, x + y, 9x), so transforming vector u, we get:
- (a1 – a2,
- a1 + a2,
- 9a1)
Step 6: Transform the vector v. We’re given the rule T(x, y)→ (x – y, x + y, 9x), so transforming vector v, (b1, b2), we get:
- (b1 – b2,
- b1 + b2,
- 9b1)
Step 7: Add the two vectors from Steps 5 and 6:
(a1 – a2, a1 + a2, 9a1) + (b1 – b2, b1 + b2, 9b1) =
(a1 – a2 + b1 – b2,
a1 + a2 + b1 + b2,
9a1 + 9b1)

Step 8: Compare Step 3 to Step 7. They are the same, so condition 1 (the additive condition) is satisfied.
Part Two: Is Scalar Multiplication Preserved?
In other words, in this part we want to know if T(cu)=cT(u) is true for T(x,y)→ (x-y,x+y,9x). We’re going to use the same vector from Part 1, which is u = (a1, a2).
Step 1: Work the left side of the equation, T(cu). First, multiply the vector by a scalar, c.
c * (a1, a2) = (c(a1), c(a2))
Step 2: Transform Step 1, using the rule T(x,y)→ (x-y,x+y,9x):
(ca1 – ca2,
ca1 + ca2,
9ca1)
Put this aside for a moment. We’ll be comparing it to the right side in a later step.
Step 3: Transform the vector u using the rule T(x,y)→ (x-y,x+y,9x). We’re working the right side of the rule 2 equation here:
T(a1, a2) =
(a1 – a2,
a1 + a2,
9a1)
Step 4: Multiply Step 3 by the scalar, c.
(c(a1 – a2),
c(a1 + a2),
c(9a1))
Distributing c using algebra, we get:
(ca1 – ca2,
ca1 + ca2,
9ca1)
Step 5: Compare Steps 2 and 4. They are the same, so the second rule holds. This function is a linear transformation.
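The symbolic check above can also be verified with concrete numbers. Here is a minimal Python sketch (the particular vectors and scalar are arbitrary choices for this example) that applies both rules to the same transformation:

```python
def T(x, y):
    """The transformation from the example: T(x, y) = (x - y, x + y, 9x)."""
    return (x - y, x + y, 9 * x)


u, v, c = (2, 5), (-1, 3), 4

# Rule 1 (addition preserved): T(u + v) should equal T(u) + T(v).
left = T(u[0] + v[0], u[1] + v[1])
right = tuple(a + b for a, b in zip(T(*u), T(*v)))
print(left == right)  # True

# Rule 2 (scalar multiplication preserved): T(c*u) should equal c*T(u).
left = T(c * u[0], c * u[1])
right = tuple(c * a for a in T(*u))
print(left == right)  # True
```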