Classification#

A simple algorithm for classification#

In binary classification using kernel methods, a simple rule classifies a test point \( x \) by comparing its distance in feature space to the centers of mass (centroids) of the positive and negative examples, assigning \( x \) to the nearer class. Here is a step-by-step derivation of this kernelized distance rule:

Step-by-Step Derivation#

  1. Define the Center of Mass:

    Let \( S^+ \) and \( S^- \) be the sets of positive and negative training examples, respectively. The centers of mass (centroids) of the positive and negative examples in feature space, \( \phi_S^+ \) and \( \phi_S^- \), are:

    \[ \phi_S^+ = \frac{1}{|S^+|} \sum_{x_i \in S^+} \phi(x_i) \]
    \[ \phi_S^- = \frac{1}{|S^-|} \sum_{x_i \in S^-} \phi(x_i) \]
  2. Distance from the Centroids:

    The squared distance from a test point \( x \) to the positive centroid \( \phi_S^+ \) is:

    \[ d^+(x) = \|\phi(x) - \phi_S^+\|^2 \]

    Similarly, the squared distance from \( x \) to the negative centroid \( \phi_S^- \) is:

    \[ d^-(x) = \|\phi(x) - \phi_S^-\|^2 \]
  3. Calculate the Distances:

    Expanding the squared norms via the inner product, we get:

    \[ d^+(x) = \|\phi(x) - \phi_S^+\|^2 = \|\phi(x)\|^2 + \|\phi_S^+\|^2 - 2 \langle \phi(x), \phi_S^+ \rangle \]
    \[ d^-(x) = \|\phi(x) - \phi_S^-\|^2 = \|\phi(x)\|^2 + \|\phi_S^-\|^2 - 2 \langle \phi(x), \phi_S^- \rangle \]

    Substitute \( \|\phi(x)\|^2 = \kappa(x, x) \) and the inner products \( \langle \phi(x), \phi_S^+ \rangle \) and \( \langle \phi(x), \phi_S^- \rangle \):

    \[ \langle \phi(x), \phi_S^+ \rangle = \frac{1}{|S^+|} \sum_{x_i \in S^+} \kappa(x, x_i) \]
    \[ \langle \phi(x), \phi_S^- \rangle = \frac{1}{|S^-|} \sum_{x_i \in S^-} \kappa(x, x_i) \]

    So:

    \[ d^+(x) = \kappa(x, x) + \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) - \frac{2}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) \]
    \[ d^-(x) = \kappa(x, x) + \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) - \frac{2}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) \]
  4. Simplify the Classification Rule:

    To classify \( x \), compare \( d^+(x) \) and \( d^-(x) \). The classification rule is:

    \[\begin{split} h(x) = \begin{cases} +1 & \text{if } d^-(x) > d^+(x) \\ -1 & \text{otherwise} \end{cases} \end{split}\]

    We can write this compactly with the sign function as \( h(x) = \text{sgn}\left( d^-(x) - d^+(x) \right) \). Since the \( \kappa(x, x) \) term appears in both distances, it cancels, leaving:

    \[ h(x) = \text{sgn} \left( \frac{2}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) - \frac{2}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) - \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) + \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) \right) \]
  5. Final Classification Function:

    To simplify, let:

    \[ b = \frac{1}{2} \left( \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) - \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) \right) \]

    Therefore:

    \[ h(x) = \text{sgn} \left( \frac{1}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) - \frac{1}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) - b \right) \]
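The final classification function above can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation: the Gaussian RBF kernel is one illustrative choice of \( \kappa \) (any positive-definite kernel works), and the function names and toy data below are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel: kappa(a, b) = exp(-gamma * ||a - b||^2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def centroid_classify(x, S_pos, S_neg, kernel=rbf_kernel):
    """Return +1 if x is closer in feature space to the positive
    centroid than to the negative centroid, else -1."""
    # Offset b: half the difference of the mean within-class kernel sums,
    # i.e. (1/|S+|^2) sum_ij kappa(x_i, x_j) minus the same for S-.
    K_pp = np.mean([[kernel(xi, xj) for xj in S_pos] for xi in S_pos])
    K_nn = np.mean([[kernel(xi, xj) for xj in S_neg] for xi in S_neg])
    b = 0.5 * (K_pp - K_nn)
    # Mean kernel evaluations between the test point and each class.
    k_pos = np.mean([kernel(x, xi) for xi in S_pos])
    k_neg = np.mean([kernel(x, xi) for xi in S_neg])
    # h(x) = sgn(k_pos - k_neg - b)
    return 1 if k_pos - k_neg - b > 0 else -1

# Two well-separated toy clusters (hypothetical data).
S_pos = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0]]
S_neg = [[-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
print(centroid_classify([2.4, 2.4], S_pos, S_neg))    # near the positive cluster
print(centroid_classify([-2.4, -2.4], S_pos, S_neg))  # near the negative cluster
```

Note that only kernel evaluations \( \kappa(\cdot, \cdot) \) are needed; the feature map \( \phi \) is never computed explicitly, which is the point of the kernel trick.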