1.4 The Curse of Dimensionality

Jan 2021

Wufei Ma
Purdue University

For practical applications of pattern recognition, we will have to deal with spaces of high dimensionality. This poses some serious challenges and is an important factor influencing the design of pattern recognition techniques.

We consider a synthetically generated data set representing measurements taken from a pipeline containing a mixture of oil, water, and gas. Each data point comprises a 12-dimensional input vector and is labeled as one of three classes. The figure below shows 100 points from this data set, plotted against two of the twelve measurements, $x_6$ and $x_7$.

[Figure: 100 points from the oil flow data set in the $(x_6, x_7)$ plane, with a new test point marked by a cross]

The goal is to classify the new data point denoted by the cross. The intuition here is that the identity of the cross should be determined more strongly by nearby points from the training set. One very simple approach would be to divide the input space into regular cells. When we are given a test point, we first decide which cell it belongs to, and we then make the prediction by taking the majority vote from training data points in the same cell.

[Figure: the same plot with the input space divided into regular cells]
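
As a concrete illustration, here is a minimal sketch of the cell-based majority-vote classifier in Python. The function name `cell_majority_vote`, the unit-cube bounds, and the grid resolution are all illustrative choices, not part of the original discussion.

```python
import numpy as np

def cell_majority_vote(X_train, y_train, x_test, low=0.0, high=1.0, n_cells=4):
    """Classify x_test by majority vote over training points that fall
    into the same regular grid cell (illustrative sketch)."""
    def cell_index(x):
        # Map each coordinate to a cell index in [0, n_cells - 1].
        idx = np.floor((x - low) / (high - low) * n_cells).astype(int)
        return tuple(np.clip(idx, 0, n_cells - 1))

    target = cell_index(x_test)
    votes = [y for x, y in zip(X_train, y_train) if cell_index(x) == target]
    if not votes:
        return None  # empty cell: the scheme makes no prediction
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage on 2 of the 12 dimensions:
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 2))
y = (X[:, 0] > 0.5).astype(int)
print(cell_majority_vote(X, y, np.array([0.7, 0.3])))
```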

One problem with this approach is that the number of such cells grows exponentially with the dimensionality of the space. With just 3 cells per dimension, for example, a 12-dimensional input space already contains $3^{12} = 531441$ cells, most of which would be empty for any training set of realistic size, and so we need to find a more sophisticated approach.

Consider a sphere of radius $r = 1$ in a space of $D$ dimensions, and ask what fraction of the volume of the sphere lies between radius $r = 1 - \epsilon$ and $r = 1$. Since the volume of a sphere in $D$ dimensions must scale as $r^D$, we have \[ V_D(r) = K_D r^D \] where $K_D$ is a constant depending only on $D$. The required fraction is given by \[ \frac{V_D(1) - V_D(1 - \epsilon)}{V_D(1)} = 1 - (1-\epsilon)^D \] which is plotted as a function of $\epsilon$ in the figure below. We see that for large $D$, this fraction tends to $1$ even for small values of $\epsilon$.

[Figure: the fraction $1 - (1-\epsilon)^D$ plotted as a function of $\epsilon$ for various values of $D$]
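
The formula $1 - (1-\epsilon)^D$ is easy to check numerically; the snippet below evaluates it for a thin shell with $\epsilon = 0.05$ at several dimensionalities (the printed values are approximate).

```python
def shell_fraction(epsilon, D):
    # Fraction of a unit D-ball's volume lying within epsilon of its surface.
    return 1.0 - (1.0 - epsilon) ** D

for D in (1, 2, 20, 200):
    print(D, shell_fraction(0.05, D))
# 1    0.05
# 2    0.0975
# 20   ~0.64
# 200  ~0.99996
```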

As a further example, consider the behaviour of a Gaussian distribution in a high-dimensional space. If we transform from Cartesian to polar coordinates and integrate out the directional variables, we obtain the density as a function of the radius $r$ from the origin. For large $D$, this density concentrates in a thin shell at a radius proportional to $\sqrt{D}$, even though the density of the Gaussian itself is highest at the origin (Exercise 1.20).
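
This concentration effect is straightforward to verify with a quick Monte Carlo check: sampling from a standard $D$-dimensional Gaussian and inspecting the distribution of radii shows the mean radius growing like $\sqrt{D}$ while its spread stays roughly constant. The sample size and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
for D in (1, 10, 100, 1000):
    # Radii of 10,000 samples from a standard D-dimensional Gaussian.
    r = np.linalg.norm(rng.standard_normal((10_000, D)), axis=1)
    print(f"D={D:4d}  mean radius={r.mean():6.2f}  std={r.std():.2f}"
          f"  sqrt(D)={np.sqrt(D):.2f}")
```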

The severe difficulty that can arise in spaces of many dimensions is sometimes called the curse of dimensionality. However, it does not prevent us from finding effective techniques applicable to high-dimensional spaces. First, real data will often be confined to a region of the space having lower effective dimensionality; in particular, the directions over which important variations in the target variables occur may be similarly confined. Second, real data will typically exhibit some smoothness properties, at least locally, so that small changes in the input variables produce small changes in the target variables, and we can exploit local interpolation-like techniques to make predictions for new inputs.
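
The first point can be illustrated with a small synthetic example: data generated near a 2-dimensional subspace of a 12-dimensional space (the sizes here are hypothetical, chosen to echo the oil flow data) has only two large singular values, revealing its lower effective dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 observations in 12 dimensions that actually lie near a 2-D subspace.
latent = rng.standard_normal((500, 2))
embedding = rng.standard_normal((2, 12))
X = latent @ embedding + 0.01 * rng.standard_normal((500, 12))

# Singular values of the centered data: only two are large.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(np.round(s[:4], 2))
```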

Copyright © 2017-21 Wufei Ma