A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.[1] The transformation is called "whitening" because it changes the input vector into a white noise vector.
Several other transformations are closely related to whitening:
- the decorrelation transform removes only the correlations but leaves variances intact,
- the standardization transform sets variances to 1 but leaves correlations intact,
- a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.[2]
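The difference between these three related transformations can be illustrated numerically. The following is a minimal sketch using NumPy; the covariance matrix `sigma`, the helper `inv_sqrt_sym`, and the particular decorrelating and coloring matrices are illustrative assumptions (each is one possible choice among many), not taken from the cited sources.

```python
import numpy as np

def inv_sqrt_sym(m):
    """Inverse square root of a symmetric positive-definite matrix (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Illustrative covariance matrix (an assumption for this example).
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

v_sqrt = np.diag(np.sqrt(np.diag(sigma)))           # V^{1/2}, standard deviations
v_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma)))  # V^{-1/2}
corr = v_inv_sqrt @ sigma @ v_inv_sqrt               # correlation matrix P

# Standardization: variances become 1, correlations are unchanged.
cov_standardized = v_inv_sqrt @ sigma @ v_inv_sqrt.T

# Decorrelation (one possible choice): correlations vanish, variances are kept.
d = v_sqrt @ inv_sqrt_sym(corr) @ v_inv_sqrt
cov_decorrelated = d @ sigma @ d.T

# Coloring: white noise with covariance I is mapped to covariance sigma.
a = np.linalg.cholesky(sigma)  # sigma = a @ a.T
cov_colored = a @ np.eye(2) @ a.T

print(cov_standardized)
print(cov_decorrelated)
print(cov_colored)
```

In this sketch the standardized covariance equals the correlation matrix, the decorrelated covariance is diagonal with the original variances, and the colored covariance reproduces `sigma`.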
Definition
Suppose $X$ is a random (column) vector with non-singular covariance matrix $\Sigma$ and mean $0$. Then the transformation $Y = WX$ with a whitening matrix $W$ satisfying the condition $W^{\mathrm{T}} W = \Sigma^{-1}$ yields the whitened random vector $Y$, whose covariance matrix is the identity.
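That the stated condition is sufficient can be checked directly. The short derivation below is a sketch that uses only the symbols defined above, together with the standard fact that a linear map $Y = WX$ has covariance $W \Sigma W^{\mathrm{T}}$; it assumes $W$ is square, so that $W^{\mathrm{T}} W = \Sigma^{-1}$ being non-singular implies that $W$ is invertible.

```latex
\begin{aligned}
\operatorname{Cov}(Y) &= W \Sigma W^{\mathrm{T}}, \\
\left( W \Sigma W^{\mathrm{T}} \right) W
  &= W \Sigma \left( W^{\mathrm{T}} W \right)
   = W \Sigma \Sigma^{-1}
   = W, \\
\Rightarrow \quad W \Sigma W^{\mathrm{T}} &= I
  \quad \text{(multiplying on the right by } W^{-1} \text{)}.
\end{aligned}
```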
If $X$ has non-zero mean $\mu$, then whitening can be performed by $Y = W(X - \mu)$.
There are infinitely many possible whitening matrices $W$ that all satisfy the above condition. Commonly used choices are $W = \Sigma^{-1/2}$ (Mahalanobis or ZCA whitening), $W = L^{\mathrm{T}}$ where $L$ is the Cholesky factor of $\Sigma^{-1}$, so that $L L^{\mathrm{T}} = \Sigma^{-1}$ (Cholesky whitening),[3] or $W = \Lambda^{-1/2} U^{\mathrm{T}}$ built from the eigen-system $\Sigma = U \Lambda U^{\mathrm{T}}$ of $\Sigma$ (PCA whitening).[4]
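The three common choices can be compared numerically. The following is a minimal sketch using NumPy; the covariance matrix `sigma`, the helper `is_whitening`, and the random orthogonal matrix `q` are illustrative assumptions. The last step illustrates why there are infinitely many whitening matrices: multiplying any whitening matrix on the left by an orthogonal matrix gives another one.

```python
import numpy as np

# Illustrative positive-definite covariance matrix (an assumption for this example).
sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 1.0, 0.3],
                  [0.5, 0.3, 2.0]])

# Eigen-system of sigma: sigma = U diag(eigvals) U^T.
eigvals, eigvecs = np.linalg.eigh(sigma)

# ZCA (Mahalanobis) whitening: W = sigma^{-1/2}.
w_zca = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# PCA whitening: W = Lambda^{-1/2} U^T.
w_pca = np.diag(eigvals ** -0.5) @ eigvecs.T

# Cholesky whitening: L L^T = sigma^{-1}, W = L^T.
chol_factor = np.linalg.cholesky(np.linalg.inv(sigma))
w_chol = chol_factor.T

def is_whitening(w, cov):
    """Check W^T W = cov^{-1} and W cov W^T = I (hypothetical helper)."""
    identity = np.eye(cov.shape[0])
    return (np.allclose(w.T @ w, np.linalg.inv(cov))
            and np.allclose(w @ cov @ w.T, identity))

# Any orthogonal q gives another whitening matrix q @ W.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
w_rotated = q @ w_zca

for name, w in [("ZCA", w_zca), ("PCA", w_pca),
                ("Cholesky", w_chol), ("rotated ZCA", w_rotated)]:
    print(name, is_whitening(w, sigma))
```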
Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of $X$ and $Y$.[3] For example, the unique optimal whitening transformation achieving maximal component-wise correlation between original $X$ and whitened $Y$ is produced by the whitening matrix $W = P^{-1/2} V^{-1/2}$, where $P$ is the correlation matrix and $V$ the diagonal variance matrix.
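This correlation-based whitening matrix can be computed as follows. The code is a minimal sketch using NumPy; the covariance matrix `sigma` and the helper `inv_sqrt_sym` are illustrative assumptions. It uses the standard identities $\operatorname{Cov}(X, Y) = \Sigma W^{\mathrm{T}}$ and, because $Y$ has unit variances, $\operatorname{Cor}(X, Y) = V^{-1/2} \Sigma W^{\mathrm{T}}$.

```python
import numpy as np

def inv_sqrt_sym(m):
    """Inverse square root of a symmetric positive-definite matrix (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Illustrative covariance matrix (an assumption for this example).
sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 1.0, 0.3],
                  [0.5, 0.3, 2.0]])

v_inv_sqrt = np.diag(np.diag(sigma) ** -0.5)  # V^{-1/2}
p = v_inv_sqrt @ sigma @ v_inv_sqrt           # correlation matrix P

# Whitening matrix W = P^{-1/2} V^{-1/2}.
w_cor = inv_sqrt_sym(p) @ v_inv_sqrt

# Covariance of the whitened vector Y = W X.
cov_y = w_cor @ sigma @ w_cor.T

# Cross-correlation between X and Y; its diagonal holds the
# component-wise correlations between X_i and Y_i.
cross_cor = v_inv_sqrt @ sigma @ w_cor.T

print(np.round(cov_y, 6))
print(np.round(np.diag(cross_cor), 4))
```

For comparison, the same cross-correlation can be computed for another whitening matrix, such as $\Sigma^{-1/2}$; in this sketch the sum of the component-wise correlations is not larger than for $W = P^{-1/2} V^{-1/2}$.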
Whitening a data matrix
Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
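An empirical whitening transform of this kind might look as follows. This is a minimal sketch using NumPy on simulated data; the sample size, the `mixing` matrix, and the mean offset are illustrative assumptions. Rows of the data matrix are treated as observations, the covariance is estimated by maximum likelihood under a Gaussian model (division by $n$), and the whitening matrix is obtained by Cholesky decomposition.

```python
import numpy as np

# Simulated data matrix: n observations (rows) of d correlated variables.
rng = np.random.default_rng(0)
n, d = 5000, 3
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.6, 1.0, 0.0],
                   [0.3, -0.4, 0.5]])
data = rng.standard_normal((n, d)) @ mixing.T + np.array([1.0, -2.0, 0.5])

# Estimate mean and covariance (maximum likelihood: divide by n).
mu_hat = data.mean(axis=0)
centered = data - mu_hat
sigma_hat = centered.T @ centered / n

# Estimated Cholesky whitening matrix: L L^T = sigma_hat^{-1}, W = L^T.
chol_factor = np.linalg.cholesky(np.linalg.inv(sigma_hat))
w_hat = chol_factor.T

# Whiten each observation: y_i = W (x_i - mu_hat), applied row-wise.
whitened = centered @ w_hat.T

# Empirical covariance of the whitened data.
emp_cov = whitened.T @ whitened / n
print(np.round(emp_cov, 6))
```

Because the whitening matrix is built from the same estimate that is used to evaluate the result, the empirical covariance of the whitened data equals the identity up to floating-point error; on new data drawn from the same distribution it would only be approximately the identity.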
High-dimensional whitening
High-dimensional whitening generalizes the pre-whitening procedure to more general spaces, where $X$ is usually assumed to be a random function or another random object in a Hilbert space $H$. One of the main issues in extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in $H$; therefore, only partial standardization is possible in infinite dimensions. A whitening operator can then be defined from the factorization of a degenerate covariance operator. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.[5]
R implementation
An implementation of several whitening procedures in R, including ZCA whitening, PCA whitening, and CCA whitening, is available in the "whitening" R package[6] published on CRAN. The R package "pfica"[7] allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).
See also
- Decorrelation
- Principal component analysis
- Weighted least squares
- Canonical correlation
- Mahalanobis distance (becomes the Euclidean distance after a whitening transformation)
References
- Koivunen, A.C.; Kostinski, A.B. (1999). "The Feasibility of Data Whitening to Improve Performance of Weather Radar". Journal of Applied Meteorology. 38 (6): 741–749. Bibcode:1999JApMe..38..741K. doi:10.1175/1520-0450(1999)038<0741:TFODWT>2.0.CO;2. ISSN 1520-0450.
- Hossain, Miliha. "Whitening and Coloring Transforms for Multivariate Gaussian Random Variables". Project Rhea. Retrieved 21 March 2016.
- Kessy, A.; Lewin, A.; Strimmer, K. (2018). "Optimal whitening and decorrelation". The American Statistician. 72 (4): 309–314. arXiv:1512.00809. doi:10.1080/00031305.2016.1277159. S2CID 55075085.
- Friedman, J. (1987). "Exploratory Projection Pursuit" (PDF). Journal of the American Statistical Association. 82 (397): 249–266. doi:10.1080/01621459.1987.10478427. ISSN 0162-1459. JSTOR 2289161. OSTI 1447861.
- Ramsay, J.O.; Silverman, B.W. (2005). Functional Data Analysis. Springer New York, NY. doi:10.1007/b98888. ISBN 978-0-387-40080-8.
- "whitening R package". Retrieved 2018-11-25.
- "pfica R package". 6 January 2023. Retrieved 2023-02-11.
External links
- https://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
- The ZCA whitening transformation. Appendix A of Learning Multiple Layers of Features from Tiny Images by A. Krizhevsky.