
The **Balding-Nichols distribution** [1] is a reparametrized beta distribution developed by David Balding and Richard Nichols to describe allele frequencies at biallelic loci. The model is used extensively in forensic DNA profile analysis and in population models for genetic epidemiology, under both Bayesian and likelihood-based approaches.

Suppose the gene at a particular locus has a dominant allele *G* and a recessive allele *g*, with genotypes GG, Gg, and gg. According to the Balding-Nichols model, the frequency *x* of allele *G* at the locus follows a beta distribution with parameters A = µ(1 − λ)/λ and B = (1 − µ)(1 − λ)/λ, where

- **µ** = the mean of the distribution, that is, the expected frequency of the dominant allele *G* in the population.
- **λ** = a measure of the genetic differentiation between populations or sub-populations. It is calculated by dividing the variance in allele frequency between populations by its limiting value, µ(1 − µ).
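The reparametrization can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `bn_shape_params` and `beta_mean_variance` and the example values µ = 0.3, λ = 0.1 are illustrative assumptions, and only the formulas for A and B come from the text above.

```python
def bn_shape_params(mu: float, lam: float) -> tuple[float, float]:
    """Convert (mu, lam) into the beta shape parameters (A, B).

    A = mu * (1 - lam) / lam
    B = (1 - mu) * (1 - lam) / lam
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    scale = (1.0 - lam) / lam
    return mu * scale, (1.0 - mu) * scale


def beta_mean_variance(a: float, b: float) -> tuple[float, float]:
    """Standard closed-form mean and variance of a Beta(a, b) distribution."""
    total = a + b
    mean = a / total
    variance = a * b / (total ** 2 * (total + 1.0))
    return mean, variance


# Hypothetical example values, chosen only for illustration.
a, b = bn_shape_params(mu=0.3, lam=0.1)
mean, variance = beta_mean_variance(a, b)
print(a, b)            # shape parameters A and B
print(mean, variance)  # should agree with mu and lam * mu * (1 - mu)
```

Under these assumptions the beta mean reduces to µ and the beta variance to λµ(1 − µ), which is why λ can be read as a normalized variance.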

The measurements are the numbers of individuals (N) of each genotype observed in a sample from the *k*th population: N_{k} = N_{k,GG} + N_{k,Gg} + N_{k,gg} [3].

When populations are exchangeable, a Dirichlet distribution can be used instead [2]. The allele frequency for single nucleotide polymorphisms (SNPs) can be modeled with a truncated normal model [4].

## Balding-Nichols distribution properties

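With background allele frequency *p*, the allele frequencies in sub-populations separated by Wright’s *F*_{ST} (written *F* below) are distributed according to independent draws from [5]

$$\mathrm{Beta}\!\left(\frac{1-F}{F}\,p,\ \frac{1-F}{F}\,(1-p)\right).$$

Here *p* plays the role of µ and *F* the role of λ in the parametrization given earlier.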

Wright defined the inbreeding coefficient, *F*_{IT}, as the correlation between genes on uniting gametes relative to the total array of those in random derivatives of foundation stock. Similarly, *F*_{ST} is the correlation between uniting gametes relative to those across all subdivisions [6].

“Most importantly, *F*_{ST} is the ratio of the actual variance of gene frequencies of subdivisions to its limiting value, irrespective of their own structure.” (Sewall Wright [6])
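Wright's variance-ratio reading of *F*_{ST} can be illustrated by simulation. The sketch below assumes sub-population frequencies are drawn from a beta distribution with shape parameters *p*(1 − *F*)/*F* and (1 − *p*)(1 − *F*)/*F*. The helper names, the seed, and the values *p* = 0.3 and *F* = 0.05 are hypothetical choices for illustration, and the estimator shown is a naive moment ratio rather than an established *F*_{ST} estimator.

```python
import random
from statistics import fmean, pvariance


def sample_subpop_frequencies(p, f_st, n_pops, rng):
    """Draw one allele frequency per sub-population from the beta model."""
    scale = (1.0 - f_st) / f_st
    alpha, beta = p * scale, (1.0 - p) * scale
    return [rng.betavariate(alpha, beta) for _ in range(n_pops)]


def variance_ratio(freqs, p):
    """Variance of frequencies around p, divided by its limiting value p(1 - p)."""
    return pvariance(freqs, mu=p) / (p * (1.0 - p))


rng = random.Random(42)  # fixed seed so the run is repeatable
freqs = sample_subpop_frequencies(p=0.3, f_st=0.05, n_pops=20000, rng=rng)

# The sample mean should sit near p and the variance ratio near f_st.
print(round(fmean(freqs), 3), round(variance_ratio(freqs, 0.3), 3))
```

With many simulated sub-populations the ratio settles close to the *F* used to generate the draws, which matches the interpretation in the quotation above.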

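The probability density function (PDF) is that of a beta distribution with shape parameters α = *p*(1 − *F*)/*F* and β = (1 − *p*)(1 − *F*)/*F*:

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 < x < 1,$$

where *B*(α, β) is the beta function.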

The distribution has mean *p* and variance *Fp*(1 – *p*).
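These moments follow from the standard beta formulas. As a short worked derivation, writing α = *p*(1 − *F*)/*F* and β = (1 − *p*)(1 − *F*)/*F* for the two shape parameters (the symbols α and β are labels introduced here for convenience):

```latex
\begin{aligned}
\alpha + \beta &= \frac{1-F}{F}, \qquad \alpha + \beta + 1 = \frac{1}{F}, \\[4pt]
\mathbb{E}[x] &= \frac{\alpha}{\alpha+\beta} = p, \\[4pt]
\operatorname{Var}(x) &= \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
  = \frac{p(1-p)\left(\frac{1-F}{F}\right)^{2}}{\left(\frac{1-F}{F}\right)^{2}\cdot\frac{1}{F}}
  = F\,p\,(1-p).
\end{aligned}
```

The variance therefore equals *F* times its limiting value *p*(1 − *p*), consistent with reading *F* as a ratio of variances.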

## Uses of the Balding-Nichols Distribution

Balding and Nichols proposed the Balding-Nichols (BN) allele frequency model to allow for the effects of population differentiation and other factors in forensic inference based on DNA profiles. Match probabilities widely used in forensic applications rely on the related Dirichlet model, which is motivated by evolutionary theory and an analysis of moments [8].

The Balding-Nichols distribution is commonly regarded as a reasonable approximation to the distribution of short tandem repeat (STR) allele fractions when calculating match probabilities for criminal trials [9].

## References

[1] Balding, D. J., Nichols, R. A., 1995. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. *Genetica* **96**: 3–12. doi:10.1007/BF01441146

[2] Balding, D. J., 2003. Likelihood-based inference for genetic correlation coefficients. *Theoret. Popul. Biol.* **63**: 221–230.

[3] Applications of the Beta Distribution Part 1: Transformation Group Approach

[4] Nicholson, G., Smith, A. V., Jónsson, F., Gústafsson, Ó., Stefánsson, K., Donnelly, P., 2002. Assessing population differentiation and isolation from single-nucleotide polymorphism data. *J. R. Stat. Soc. Ser. B* **64**: 695–716.

[5] O’Brien, J.D., Amenga-Etego, L. & Li, R. Approaches to estimating inbreeding coefficients in clinical isolates of *Plasmodium falciparum* from genomic sequence data. *Malar J* **15**, 473 (2016). https://doi.org/10.1186/s12936-016-1531-z

[6] Wright, S., 1965. The interpretation of population structure by F-statistics with special regard to systems of mating. *Evolution* **19**: 395–420.

[7] Image: IkamusumeFan, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[8] Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A., Reich, D., 2006. Principal components analysis corrects for stratification in genome-wide association studies. *Nature Genetics* **38** (8): 904–909. doi:10.1038/ng1847. PMID 16862161. S2CID 8127858.

[9] Weir, B. S., Cockerham, C. C., 1984. Estimating F-statistics for the analysis of population structure. *Evolution* **38**: 1358–1370.
