In probability theory, in the context of light-tailed distributions, one definition describes a subexponential distribution as a probability distribution whose tails decay at an exponential rate or faster: a real-valued distribution $\mathcal{D}$ is called subexponential if, for a random variable $X \sim \mathcal{D}$,

$$\mathbb{P}(|X| \geq x) = O(e^{-Kx})$$

for large $x$ and some constant $K > 0$.
Note that this is almost the opposite of the more established meaning of subexponential in the context of heavy-tailed distributions, where "sub" means that the rate of decay is slower than exponential, rather than that the tail is lighter than exponential.
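The tail condition above can be probed numerically. The Python sketch below evaluates closed-form tail probabilities $\mathbb{P}(|X| \geq x)$ for three illustrative distributions (exponential with rate 1, standard normal, and a Pareto law with shape 3 and scale 1) and reports the largest ratio $\mathbb{P}(|X| \geq x)/e^{-Kx}$ over a finite grid with $K = 1$. A ratio that stays small is consistent with the $O(e^{-Kx})$ condition, while one that explodes, as for the Pareto tail, signals a tail heavier than exponential. The helper names, the grid, and the choice of distributions are assumptions made for illustration, and a finite grid cannot prove an asymptotic statement.

```python
import math

# Closed-form tails P(|X| >= x) for x >= 0 (illustrative choices).
def tail_exponential(x):
    # X ~ Exp(1): P(X >= x) = e^{-x}
    return math.exp(-x)

def tail_std_normal(x):
    # Z ~ N(0, 1): P(|Z| >= x) = erfc(x / sqrt(2))
    return math.erfc(x / math.sqrt(2.0))

def tail_pareto(x):
    # Pareto with shape 3 and scale 1: P(X >= x) = x^{-3} for x >= 1
    return 1.0 if x < 1.0 else x ** -3

def max_ratio(tail, K=1.0, xs=None):
    """Largest value of tail(x) / e^{-K x} over a grid of x values."""
    if xs is None:
        xs = [0.5 * i for i in range(1, 81)]  # x = 0.5, 1.0, ..., 40.0
    return max(tail(x) * math.exp(K * x) for x in xs)

for name, tail in [("Exp(1)", tail_exponential),
                   ("N(0,1)", tail_std_normal),
                   ("Pareto(3)", tail_pareto)]:
    print(f"{name:10s} max ratio = {max_ratio(tail):.3g}")
```

The exponential ratio is identically 1 and the normal ratio stays near 1, whereas the Pareto ratio grows without bound as the grid is extended.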
The subexponential norm $\|\cdot\|_{\psi_1}$ of a random variable is defined by

$$\|X\|_{\psi_1} := \inf\{K > 0 \mid \mathbb{E}(e^{|X|/K}) \leq 2\},$$

where the infimum is taken to be $+\infty$ if no such $K$ exists.
This is an example of an Orlicz norm. An equivalent condition for a distribution $\mathcal{D}$ to be subexponential is then that $\|X\|_{\psi_1} < \infty$.[1]: §2.7
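As a worked example (not taken from [1]), let $X$ follow the exponential distribution with rate 1. For $K > 1$,

$$\mathbb{E}(e^{|X|/K}) = \int_0^\infty e^{x/K} e^{-x}\,dx = \frac{K}{K-1},$$

while the expectation is infinite for $K \leq 1$. Since $K/(K-1) \leq 2$ exactly when $K \geq 2$, this gives $\|X\|_{\psi_1} = 2$. The Python sketch below recovers that value by bisection, relying on the fact that $K \mapsto \mathbb{E}(e^{|X|/K})$ is non-increasing, so the set of admissible $K$ is an interval. It assumes a closed form for $\mathbb{E}(e^{|X|/K})$ is supplied by the caller; the function names and the search bracket are hypothetical choices. For a standard normal $Z$ it uses $\mathbb{E}(e^{|Z|/K}) = 2e^{1/(2K^2)}\Phi(1/K)$, where $\Phi$ is the standard normal CDF.

```python
import math

def mgf_abs_exponential(K):
    """E[e^{|X|/K}] for X ~ Exp(1): K / (K - 1) if K > 1, infinite otherwise."""
    return K / (K - 1.0) if K > 1.0 else math.inf

def mgf_abs_std_normal(K):
    """E[e^{|Z|/K}] = 2 e^{s^2/2} Phi(s) with s = 1/K, for Z ~ N(0, 1)."""
    s = 1.0 / K
    if s * s / 2.0 > 700.0:  # e^{s^2/2} would overflow; the value is far above 2
        return math.inf
    phi = 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    return 2.0 * math.exp(s * s / 2.0) * phi

def psi1_norm(mgf_abs, lo=1e-6, hi=1e6, tol=1e-12, max_iter=200):
    """Bisection for inf{K > 0 : E[e^{|X|/K}] <= 2}.

    Assumes mgf_abs is non-increasing in K, so the feasible K form [norm, inf).
    Returns math.inf if even K = hi is infeasible.
    """
    if mgf_abs(hi) > 2.0:
        return math.inf
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mgf_abs(mid) <= 2.0:
            hi = mid  # mid is feasible: the infimum is at most mid
        else:
            lo = mid  # mid is infeasible: the infimum lies above mid
        if hi - lo < tol:
            break
    return hi

print(psi1_norm(mgf_abs_exponential))  # close to 2.0
print(psi1_norm(mgf_abs_std_normal))
```

Bisection is used here only because it needs nothing beyond monotonicity; any root finder applied to $\mathbb{E}(e^{|X|/K}) - 2$ would serve.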
Subexponentiality can also be expressed in any of the following equivalent ways, where the constant $K$ need not be the same from one condition to the next:[1]: §2.7
- $\mathbb{P}(|X| \geq x) \leq 2e^{-Kx}$ for all $x \geq 0$ and some constant $K > 0$.
- $\mathbb{E}(|X|^p)^{1/p} \leq Kp$ for all $p \geq 1$ and some constant $K > 0$.
- For some constant $K > 0$, $\mathbb{E}(e^{\lambda |X|}) \leq e^{K\lambda}$ for all $0 \leq \lambda \leq 1/K$.
- $\mathbb{E}(X)$ exists and, for some constant $K > 0$, $\mathbb{E}(e^{\lambda(X - \mathbb{E}(X))}) \leq e^{K^2\lambda^2}$ for all $-1/K \leq \lambda \leq 1/K$.
- $\sqrt{|X|}$ is sub-Gaussian.
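To see these conditions side by side, the Python sketch below checks each of them on a grid for $X \sim \mathrm{Exp}(1)$, where every quantity has a closed form: $\mathbb{P}(|X| \geq x) = e^{-x}$, $\mathbb{E}(|X|^p) = \Gamma(p+1)$, $\mathbb{E}(e^{\lambda |X|}) = 1/(1-\lambda)$ for $\lambda < 1$, $\mathbb{E}(X) = 1$, and $\mathbb{P}(\sqrt{|X|} \geq t) = e^{-t^2}$. For the last condition it uses the tail form of sub-Gaussianity, $\mathbb{P}(Y \geq t) \leq 2e^{-t^2/K^2}$ for nonnegative $Y$. The constants ($K = 1$ for the tail, moment, and square-root checks, $K = 2$ for the two moment-generating-function checks) are hand-picked for this example and are not the general constants relating the conditions in [1]; the function names are hypothetical, and a grid check is a sanity test rather than a proof.

```python
import math

# All checks are for X ~ Exp(1), using closed-form expressions.

def check_tail(K=1.0, xs=None):
    # P(|X| >= x) = e^{-x}; want <= 2 e^{-K x} for x >= 0.
    xs = xs or [0.1 * i for i in range(0, 501)]
    return all(math.exp(-x) <= 2.0 * math.exp(-K * x) for x in xs)

def check_moments(K=1.0, ps=None):
    # (E|X|^p)^{1/p} = Gamma(p + 1)^{1/p}; want <= K p for p >= 1.
    ps = ps or [1.0 + 0.05 * i for i in range(0, 1001)]
    return all(math.exp(math.lgamma(p + 1.0) / p) <= K * p * (1.0 + 1e-12)
               for p in ps)

def check_mgf_abs(K=2.0, n=1000):
    # E e^{lam |X|} = 1 / (1 - lam) for lam < 1; want <= e^{K lam} on [0, 1/K].
    lams = [(1.0 / K) * i / n for i in range(n + 1)]
    return all(1.0 / (1.0 - lam) <= math.exp(K * lam) for lam in lams)

def check_centered_mgf(K=2.0, n=1000):
    # E X = 1, so E e^{lam (X - 1)} = e^{-lam} / (1 - lam);
    # want <= e^{K^2 lam^2} on [-1/K, 1/K].
    lams = [(1.0 / K) * (2.0 * i / n - 1.0) for i in range(n + 1)]
    return all(math.exp(-lam) / (1.0 - lam) <= math.exp(K * K * lam * lam)
               for lam in lams)

def check_sqrt_subgaussian(K=1.0, ts=None):
    # P(sqrt|X| >= t) = P(|X| >= t^2) = e^{-t^2}; want <= 2 e^{-t^2 / K^2}.
    ts = ts or [0.05 * i for i in range(0, 201)]
    return all(math.exp(-t * t) <= 2.0 * math.exp(-t * t / (K * K)) for t in ts)

print(check_tail(), check_moments(), check_mgf_abs(),
      check_centered_mgf(), check_sqrt_subgaussian())
# prints: True True True True True
```

Shrinking a constant too far makes a check fail; for example, `check_moments(K=0.5)` returns `False` already at $p = 1$.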
References
- High-Dimensional Probability: An Introduction with Applications in Data Science, Roman Vershynin, University of California, Irvine, June 9, 2020
Further reading
- High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019, ISBN 9781108498029.