Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of
n
{\displaystyle n}
m
{\displaystyle m}
-dimensional data items, a configuration
X
{\displaystyle X}
of
n
{\displaystyle n}
points in
r
{\displaystyle r}
(
≪
m
)
{\displaystyle (\ll m)}
-dimensional space is sought that minimizes the so-called stress function
σ
(
X
)
{\displaystyle \sigma (X)}
. Usually
r
{\displaystyle r}
is
2
{\displaystyle 2}
or
3
{\displaystyle 3}
, i.e. the
(
n
×
r
)
{\displaystyle (n\times r)}
matrix
X
{\displaystyle X}
lists points in
2
−
{\displaystyle 2-}
or
3
−
{\displaystyle 3-}
dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function
σ
{\displaystyle \sigma }
is a cost or loss function that measures the squared differences between ideal (
m
{\displaystyle m}
-dimensional) distances and actual distances in r-dimensional space. It is defined as:
-
σ
(
X
)
=
∑
i
<
j
≤
n
w
i
j
(
d
i
j
(
X
)
−
δ
i
j
)
2
{\displaystyle \sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}}
where
w
i
j
≥
0
{\displaystyle w_{ij}\geq 0}
is a weight for the measurement between a pair of points
(
i
,
j
)
{\displaystyle (i,j)}
,
d
i
j
(
X
)
{\displaystyle d_{ij}(X)}
is the euclidean distance between
i
{\displaystyle i}
and
j
{\displaystyle j}
and
δ
i
j
{\displaystyle \delta _{ij}}
is the ideal distance between the points (their separation) in the
m
{\displaystyle m}
-dimensional data space. Note that
w
i
j
{\displaystyle w_{ij}}
can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).
A configuration
X
{\displaystyle X}
which minimizes
σ
(
X
)
{\displaystyle \sigma (X)}
gives a plot in which points that are close together correspond to points that are also close together in the original
m
{\displaystyle m}
-dimensional data space.
There are many ways that
σ
(
X
)
{\displaystyle \sigma (X)}
could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds
σ
{\displaystyle \sigma }
from above and touches the surface of
σ
{\displaystyle \sigma }
at a point
Z
{\displaystyle Z}
, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").
The SMACOF algorithm
The stress function
σ
{\displaystyle \sigma }
can be expanded as follows:
-
σ
(
X
)
=
∑
i
<
j
≤
n
w
i
j
(
d
i
j
(
X
)
−
δ
i
j
)
2
=
∑
i
<
j
w
i
j
δ
i
j
2
+
∑
i
<
j
w
i
j
d
i
j
2
(
X
)
−
2
∑
i
<
j
w
i
j
δ
i
j
d
i
j
(
X
)
{\displaystyle \sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}=\sum _{i<j}w_{ij}\delta _{ij}^{2}+\sum _{i<j}w_{ij}d_{ij}^{2}(X)-2\sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)}
Note that the first term is a constant
C
{\displaystyle C}
and the second term is quadratic in
X
{\displaystyle X}
(i.e. for the Hessian matrix
V
{\displaystyle V}
the second term is equivalent to tr
X
′
V
X
{\displaystyle X'VX}
) and therefore relatively easily solved. The third term is bounded by:
-
∑
i
<
j
w
i
j
δ
i
j
d
i
j
(
X
)
=
tr
X
′
B
(
X
)
X
≥
tr
X
′
B
(
Z
)
Z
{\displaystyle \sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)=\,\operatorname {tr} \,X'B(X)X\geq \,\operatorname {tr} \,X'B(Z)Z}
where
B
(
Z
)
{\displaystyle B(Z)}
has:
-
b
i
j
=
−
w
i
j
δ
i
j
d
i
j
(
Z
)
{\displaystyle b_{ij}=-{\frac {w_{ij}\delta _{ij}}{d_{ij}(Z)}}}
for d i j ( Z ) ≠ 0 , i ≠ j {\displaystyle d_{ij}(Z)\neq 0,i\neq j}
and
b
i
j
=
0
{\displaystyle b_{ij}=0}
for
d
i
j
(
Z
)
=
0
,
i
≠
j
{\displaystyle d_{ij}(Z)=0,i\neq j}
and
b
i
i
=
−
∑
j
=
1
,
j
≠
i
n
b
i
j
{\displaystyle b_{ii}=-\sum _{j=1,j\neq i}^{n}b_{ij}}
.
Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).
Thus, we have a simple quadratic function
τ
(
X
,
Z
)
{\displaystyle \tau (X,Z)}
that majorizes stress:
-
σ
(
X
)
=
C
+
tr
X
′
V
X
−
2
tr
X
′
B
(
X
)
X
{\displaystyle \sigma (X)=C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(X)X}
-
≤
C
+
tr
X
′
V
X
−
2
tr
X
′
B
(
Z
)
Z
=
τ
(
X
,
Z
)
{\displaystyle \leq C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(Z)Z=\tau (X,Z)}
The iterative minimization procedure is then:
- at the
k
t
h
{\displaystyle k^{th}}
step we set Z ← X k − 1 {\displaystyle Z\leftarrow X^{k-1}}
-
X
k
←
min
X
τ
(
X
,
Z
)
{\displaystyle X^{k}\leftarrow \min _{X}\tau (X,Z)}
- stop if
σ
(
X
k
−
1
)
−
σ
(
X
k
)
<
ϵ
{\displaystyle \sigma (X^{k-1})-\sigma (X^{k})<\epsilon }
otherwise repeat.
This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
Use in graph drawing
Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the
δ
i
j
{\displaystyle \delta _{ij}}
are usually set to the graph-theoretic distances between nodes
i
{\displaystyle i}
and
j
{\displaystyle j}
and the weights
w
i
j
{\displaystyle w_{ij}}
are taken to be
δ
i
j
−
α
{\displaystyle \delta _{ij}^{-\alpha }}
. Here,
α
{\displaystyle \alpha }
is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for
α
=
2
{\displaystyle \alpha =2}
.[6]
References
- Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.
- de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al. (eds.), Recent developments in statistics, pp. 133–145.
- Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.
- Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat., 16 (3): 435–450, CiteSeerX 10.1.1.28.9372, doi:10.1007/s001800100077.
- Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, vol. 3383, Springer-Verlag, pp. 239–250.
- Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.