
**Hotelling’s T-squared distribution** [1] is a multivariate statistical tool closely related to the F distribution (a scaled T^{2} statistic follows an F distribution); it is used to compare sample mean vectors and identify any differences between them. Thus, it is the multivariate counterpart of the t-test.

Hotelling’s T^{2} forms the basis for various multivariate control charts and can describe the Mahalanobis distance between two populations.

## What Is Hotelling’s T^{2}?

Hotelling’s test statistic measures the difference between a sample mean vector and a hypothesized mean vector (or between the mean vectors of two samples), and can also be used to identify outliers or nonconformities in a data set.

Formally, Hotelling’s T-squared distribution is defined as follows [2]: Suppose that a vector **d** is normally distributed with a mean of zero and unit covariance matrix, N_{p}(0, I), and that M is a *p* × *p* matrix with a Wishart distribution with unit scale matrix and *m* degrees of freedom, W_{p}(I, m). Then m**d**^{T}M^{-1}**d** has the two-parameter Hotelling distribution T^{2}(p, m).

- A **covariance matrix** shows the correlation between each pair of variables in a multivariate dataset. Covariance is a measure of how much two variables vary together.
- A **unit scale matrix** is a covariance matrix where all variances are equal to 1, which means that all the variables are equally spread out around their means.
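As a sanity check on this definition, the T^{2}(p, m) distribution can be simulated directly: draw **d** from N_{p}(0, I), draw M from W_{p}(I, m), and compute m**d**^{T}M^{-1}**d**. The sketch below (not from the source; variable names are my own) does this with NumPy and compares the simulated mean to the theoretical mean implied by the F relationship T^{2}(p, m) = (pm/(m − p + 1)) F(p, m − p + 1).

```python
import numpy as np

rng = np.random.default_rng(42)
p, m = 3, 20            # dimension and Wishart degrees of freedom
n_draws = 20_000

t2 = np.empty(n_draws)
for i in range(n_draws):
    d = rng.standard_normal(p)             # d ~ N_p(0, I)
    X = rng.standard_normal((m, p))
    M = X.T @ X                            # M ~ W_p(I, m): sum of m outer products
    t2[i] = m * d @ np.linalg.solve(M, d)  # m * d^T M^{-1} d ~ T^2(p, m)

# T^2(p, m) equals (p*m / (m - p + 1)) * F(p, m - p + 1);
# the mean of F(d1, d2) is d2 / (d2 - 2), so:
f_scale = p * m / (m - p + 1)
theoretical_mean = f_scale * (m - p + 1) / (m - p - 1)
print(f"simulated mean {t2.mean():.3f} vs theoretical {theoretical_mean:.3f}")
```

With p = 3 and m = 20, both values should land near 3.75, confirming the scaled-F characterization.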

The Hotelling’s T^{2} *test* uses linear combinations of variables to detect significant differences in means between populations. The resulting T^{2} statistic is transformed into an F statistic to judge significance: if the F statistic exceeds the critical value, the difference in means is considered statistically significant.

Two versions of the test exist with the following null hypotheses:

- **One sample**: the multivariate vector of means for a group equals a hypothetical vector of means.
- **Two sample**: the multivariate vectors of means for two groups are equal.

For more than two samples, run a MANOVA instead; MANOVA is more powerful than Hotelling’s T-squared when there are more than two groups.
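The one-sample version can be sketched in a few lines of Python. This is a minimal illustration, not from the source: the function name `hotelling_one_sample` and the simulated data are my own, and the T^{2}-to-F conversion uses the standard one-sample degrees of freedom (p, n − p).

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(X, mu0):
    """One-sample Hotelling's T^2 test of H0: population mean vector == mu0."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                 # sample covariance (n-1 divisor)
    diff = xbar - mu0
    t2 = n * diff @ np.linalg.solve(S, diff)    # T^2 = n (xbar-mu0)' S^{-1} (xbar-mu0)
    # Convert T^2 to an F statistic with (p, n - p) degrees of freedom
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value

# Simulated data whose true mean actually equals mu0
rng = np.random.default_rng(0)
X = rng.multivariate_normal([5.0, 10.0], [[1.0, 0.3], [0.3, 2.0]], size=40)
t2, f_stat, p_value = hotelling_one_sample(X, np.array([5.0, 10.0]))
print(f"T2 = {t2:.3f}, F = {f_stat:.3f}, p = {p_value:.3f}")
```

Testing the same data against a far-away hypothetical mean (say, the zero vector) produces a tiny p-value, illustrating how the test picks up departures from H_{0}.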

## Two-sample Hotelling’s T-Squared

The steps to run Hotelling’s T-squared are roughly the same as running a two-sample t-test, except that there are differences in formulas and tables (Hotelling’s uses the F-table instead of the t-table). Hotelling’s has several advantages over the t-test [3]:

- A well-controlled Type I error rate.
- The relationships between multiple variables are taken into account.
- It can generate an overall conclusion even if multiple (single) t-tests are inconsistent. While a t-test will tell you which variables differ between groups, Hotelling’s summarizes the between-group differences.

**Assumptions**:

- Samples are normally distributed.
- Independent samples.
- Samples have equal variance-covariance matrices. Run Box’s M test (a multivariate extension of Bartlett’s test) to check this assumption.

**Test hypotheses**:

- **Null hypothesis** (H_{0}): the two samples are from populations with the same multivariate mean.
- **Alternate hypothesis** (H_{1}): the two samples are from populations with different multivariate means.

Similarly to the t-test, find a value for T-squared, then compare it to a table value; if the calculated value is greater than the table value, you can reject the null hypothesis. For ease of calculation, Hotelling’s T^{2} is first transformed into an F-statistic:

F = [(n_{1} + n_{2} – p – 1) / ((n_{1} + n_{2} – 2)p)] T^{2} ~ F(p, n_{1} + n_{2} – p – 1)

where

- n_{1} and n_{2} = sample sizes,
- p = number of variables measured,
- n_{1} + n_{2} – p – 1 = denominator degrees of freedom.

Reject the null hypothesis (at a chosen significance level) if the calculated value is greater than the F-table critical value. Rejecting the null hypothesis means that at least one of the parameters, or a combination of one or more parameters working together, is significantly different.
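The two-sample procedure above can be sketched as follows. This is an illustrative implementation, not from the source: the function name `hotelling_two_sample` and the simulated samples are my own, the pooled covariance reflects the equal-covariance assumption, and the F conversion uses the degrees of freedom listed above.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling's T^2 test of H0: equal population mean vectors."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled covariance matrix (assumes equal variance-covariance matrices)
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S_pool, d)
    # F = [(n1 + n2 - p - 1) / ((n1 + n2 - 2) p)] T^2 ~ F(p, n1 + n2 - p - 1)
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, f_stat, p_value

# Two simulated groups with a genuine shift in mean vectors
rng = np.random.default_rng(1)
cov = [[1.0, 0.4], [0.4, 1.5]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=60)
X2 = rng.multivariate_normal([1.0, 0.5], cov, size=60)
t2, f_stat, p_value = hotelling_two_sample(X1, X2)
print(f"T2 = {t2:.3f}, F = {f_stat:.3f}, p = {p_value:.4f}")
```

Because the two groups were simulated with different mean vectors, the p-value is small and the null hypothesis of equal means is rejected.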

## Why Is Hotelling’s T-Squared Distribution Important?

Hotelling’s T^{2} is an important tool for identifying changes in means between populations. By using linear combinations of variables, it allows us to compare all measured variables at once, instead of having to run a separate test for each variable. This makes it much easier and faster to identify any meaningful changes in means between populations over time or across different groups of people. Additionally, because a scaled version of the statistic follows an F distribution, it provides a single test of overall significance – something that running separate univariate t-tests cannot do on their own.

## References

[1] Hotelling H (1931) The generalization of Student’s ratio. Ann Math Stat. 2(3):360–378.

[2] Weisstein, E. (2002). CRC Concise Encyclopedia of Mathematics. CRC Press.

[3] Fang, J. (Ed.) (2017). Handbook of Medical Statistics. World Scientific.