The Curse of Dimensionality

September 20, 2018
statistics

The current Wikipedia entry on the curse of dimensionality describes the problem well with respect to having enough data for a problem, and also with respect to distance functions. I’m copying these entries here, in case they change or disappear later.

Sampling

There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, \(10^2=100\) evenly spaced sample points suffice to sample a unit interval (a “1-dimensional cube”) with no more than \(10^{−2}=0.01\) distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of \(10^{−2}=0.01\) between adjacent points would require \(10^{20}=(10^2)^{10}\) sample points. In general, with a spacing distance of \(10^{−n}\) the 10-dimensional hypercube appears to be a factor of \(10^{n(10-1)}=(10^n)^{10}/(10^n)\) “larger” than the 1-dimensional hypercube, which is the unit interval. In the above example \(n=2\): when using a sampling distance of 0.01 the 10-dimensional hypercube appears to be \(10^{18}\) “larger” than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below.

Distance Functions

When a measure such as a Euclidean distance is defined using many coordinates, there is little difference in the distances between different pairs of samples.

One way to illustrate the “vastness” of high-dimensional Euclidean space is to compare the proportion of an inscribed hypersphere with radius \(r\) and dimension \(d\), to that of a hypercube with edges of length \(2r\). The volume of such a sphere is \(\dfrac{2r^{d}\pi ^{d/2}}{d\Gamma (d/2)}\), where \(\Gamma\) is the gamma function, while the volume of the cube is \((2r)^d\). As the dimension \(d\) of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube. This can clearly be seen by comparing the proportions as the dimension \(d\) goes to infinity:

\[\dfrac{V_{hypersphere}}{V_{hypercube}}=\dfrac{\pi ^{d/2}}{d2^{d-1}\Gamma (d/2)}\rightarrow 0 \ \ \text{as}\ \ d \rightarrow \infty\]

Furthermore, the distance between the center and the corners is \(r{\sqrt {d}}\), which increases without bound for fixed \(r\). In this sense, nearly all of the high-dimensional space is “far away” from the centre. To put it another way, the high-dimensional unit hypercube can be said to consist almost entirely of the “corners” of the hypercube, with almost no “middle”.