Classification#

A simple algorithm for classification#

In binary classification using kernel methods, a simple rule classifies a test point \( x \) by comparing its distance in feature space to the centers of mass (centroids) of the positive and negative examples, assigning \( x \) to the nearer class. Here is a step-by-step derivation of this kernelized distance rule:

Step-by-Step Derivation#

  1. Define the Center of Mass:

    Let \( S^+ \) and \( S^- \) be the sets of positive and negative training examples, respectively. The centers of mass (centroids) of the positive and negative examples in feature space, \( \phi_S^+ \) and \( \phi_S^- \), are:

    \[ \phi_S^+ = \frac{1}{|S^+|} \sum_{x_i \in S^+} \phi(x_i) \]
    \[ \phi_S^- = \frac{1}{|S^-|} \sum_{x_i \in S^-} \phi(x_i) \]
  2. Distance from the Centroids:

    The squared distance from a test point \( x \) to the positive centroid \( \phi_S^+ \) is:

    \[ d^+(x) = \|\phi(x) - \phi_S^+\|^2 \]

    Similarly, the squared distance from \( x \) to the negative centroid \( \phi_S^- \) is:

    \[ d^-(x) = \|\phi(x) - \phi_S^-\|^2 \]
  3. Calculate the Distances:

    Expanding the squared norms via the inner product, we get:

    \[ d^+(x) = \|\phi(x) - \phi_S^+\|^2 = \|\phi(x)\|^2 + \|\phi_S^+\|^2 - 2 \langle \phi(x), \phi_S^+ \rangle \]
    \[ d^-(x) = \|\phi(x) - \phi_S^-\|^2 = \|\phi(x)\|^2 + \|\phi_S^-\|^2 - 2 \langle \phi(x), \phi_S^- \rangle \]

    Substitute \( \|\phi(x)\|^2 = \kappa(x, x) \) and the inner products \( \langle \phi(x), \phi_S^+ \rangle \) and \( \langle \phi(x), \phi_S^- \rangle \):

    \[ \langle \phi(x), \phi_S^+ \rangle = \frac{1}{|S^+|} \sum_{x_i \in S^+} \kappa(x, x_i) \]
    \[ \langle \phi(x), \phi_S^- \rangle = \frac{1}{|S^-|} \sum_{x_i \in S^-} \kappa(x, x_i) \]

    So:

    \[ d^+(x) = \kappa(x, x) + \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) - \frac{2}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) \]
    \[ d^-(x) = \kappa(x, x) + \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) - \frac{2}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) \]
  4. Simplify the Classification Rule:

    To classify \( x \), compare \( d^+(x) \) and \( d^-(x) \). The classification rule is:

    \[\begin{split} h(x) = \begin{cases} +1 & \text{if } d^-(x) > d^+(x) \\ -1 & \text{otherwise} \end{cases} \end{split}\]

    We can write this compactly with the sign function as \( h(x) = \text{sgn}\left( d^-(x) - d^+(x) \right) \). Since the \( \kappa(x, x) \) term appears in both distances, it cancels, leaving:

    \[ h(x) = \text{sgn} \left( \frac{2}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) - \frac{2}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) - \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) + \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) \right) \]
  5. Final Classification Function:

    To simplify, let:

    \[ b = \frac{1}{2} \left( \frac{1}{|S^+|^2} \sum_{i=1}^{|S^+|} \sum_{j=1}^{|S^+|} \kappa(x_i, x_j) - \frac{1}{|S^-|^2} \sum_{i=1}^{|S^-|} \sum_{j=1}^{|S^-|} \kappa(x_i, x_j) \right) \]

    Therefore:

    \[ h(x) = \text{sgn} \left( \frac{1}{|S^+|} \sum_{i=1}^{|S^+|} \kappa(x, x_i) - \frac{1}{|S^-|} \sum_{i=1}^{|S^-|} \kappa(x, x_i) - b \right) \]
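The final classification function above can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation: the Gaussian RBF kernel is one illustrative choice of \( \kappa \) (any positive-definite kernel works), and the function names and toy data below are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel: kappa(a, b) = exp(-gamma * ||a - b||^2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def centroid_classify(x, S_pos, S_neg, kernel=rbf_kernel):
    """Return +1 if x is closer in feature space to the positive
    centroid than to the negative centroid, else -1."""
    # Offset b: half the difference of the mean within-class kernel sums,
    # i.e. (1/|S+|^2) sum_ij kappa(x_i, x_j) minus the same for S-.
    K_pp = np.mean([[kernel(xi, xj) for xj in S_pos] for xi in S_pos])
    K_nn = np.mean([[kernel(xi, xj) for xj in S_neg] for xi in S_neg])
    b = 0.5 * (K_pp - K_nn)
    # Mean kernel evaluations between the test point and each class.
    k_pos = np.mean([kernel(x, xi) for xi in S_pos])
    k_neg = np.mean([kernel(x, xi) for xi in S_neg])
    # h(x) = sgn(k_pos - k_neg - b)
    return 1 if k_pos - k_neg - b > 0 else -1

# Two well-separated toy clusters (hypothetical data).
S_pos = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0]]
S_neg = [[-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]
print(centroid_classify([2.4, 2.4], S_pos, S_neg))    # near the positive cluster
print(centroid_classify([-2.4, -2.4], S_pos, S_neg))  # near the negative cluster
```

Note that only kernel evaluations \( \kappa(\cdot, \cdot) \) are needed; the feature map \( \phi \) is never computed explicitly, which is the point of the kernel trick.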