
Loss Functions for Ordinal Regression


Note: this post contains a fair amount of LaTeX; if the math doesn't display correctly, read the post at its original location.

In machine learning it is common to formulate the classification task as a minimization problem over a given loss function. Given input data $(x_1, ..., x_n)$ and associated labels $(y_1, ..., y_n)$, $y_i \in \lbrace-1, 1\rbrace$, the problem becomes to find a function $f$ that minimizes

$$L(x, y) = \sum_{i=1}^n \text{loss}(f(x_i), y_i)$$

where loss is any loss function. These are usually functions that are close to zero when $f(x_i)$ agrees in sign with $y_i$ and take a positive value when $f(x_i)$ and $y_i$ have opposite signs. Common choices of loss functions, sketched in code after the list, are:

  • Zero-one loss, $I(\text{sign}(f(x_i)) \neq y_i)$, where $I$ is the indicator function.
  • Hinge loss, $\max(0, 1 - f(x_i) y_i)$.
  • Logistic loss, $\log(1 + \exp(-f(x_i) y_i))$.
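
For concreteness, here is a minimal sketch of these three losses in Python. It assumes NumPy and vector-valued predictions; the function names are illustrative, not taken from any reference implementation.

```python
import numpy as np

def zero_one_loss(pred, y):
    # 1 when the sign of the prediction disagrees with the label, 0 otherwise.
    return (np.sign(pred) != y).astype(float)

def hinge_loss(pred, y):
    # Zero once the prediction is on the correct side with margin >= 1.
    return np.maximum(0.0, 1.0 - pred * y)

def logistic_loss(pred, y):
    # Smooth surrogate: close to zero when pred agrees in sign with y.
    return np.log1p(np.exp(-pred * y))
```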

[Figure: loss functions]¹

In the paper Loss functions for preference levels: Regression with discrete ordered labels, the setting described above, commonly used for classification and regression, is extended to the ordinal regression problem. In ordinal regression, labels can take one of several discrete but ordered values. Think for example of movie ratings that go from zero to ten stars. Here there is an inherent order, in the sense that (unlike in the multiclass classification setting) not all errors are equally bad: for instance, it is worse to mistake a 1-star movie for a 10-star one than a 4-star movie for a 5-star one.

To extend the binary loss to the case of ordinal regression, the authors introduce $K-1$ thresholds $\theta_1 < \theta_2 < ... < \theta_{K-1}$ that split the real line into $K$ segments; setting $\theta_0 = -\infty$ and $\theta_K = \infty$, each of the $K$ segments corresponds to one of the $K$ labels, and a predicted value between $\theta_d$ and $\theta_{d+1}$ corresponds to the prediction of class $d$ (supposing that classes go from zero to $K-1$). This generalizes the binary case, which is recovered by taking zero as the unique threshold.
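
As an illustrative sketch of this decision rule (assuming NumPy; the threshold values are made up), the predicted class is just the number of finite thresholds lying below the real-valued prediction:

```python
import numpy as np

def predict_class(f_x, theta):
    # theta holds the K-1 finite thresholds theta_1 < ... < theta_{K-1}.
    # The predicted class is the number of thresholds strictly below f_x,
    # i.e. the index d of the segment (theta_d, theta_{d+1}) containing f_x.
    return np.searchsorted(theta, f_x)

theta = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])     # K = 7, made-up values
print(predict_class(np.array([-3.0, 0.1, 9.0]), theta))  # -> [0 3 6]
```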

Suppose $K=7$ and that the correct label $y_i$ is 2, that is, $f(x_i)$ must lie between $\theta_2$ and $\theta_3$. In that case, it must be verified that $f(x_i) > \theta_1$ and $f(x_i) > \theta_2$, as well as $f(x_i) < \theta_3 < \theta_4 < \theta_5 < \theta_6$. Treating this as $K-1=6$ independent classification problems, one per threshold, produces the following family of loss functions (for a hinge loss):

[Figure: ordinal loss functions]

The idea behind Rennie's paper is to sum all these loss functions to produce a single optimization problem whose objective is the total amount of threshold violation. Here, not only does the loss increase when thresholds are violated, but its slope also increases each time an additional threshold is crossed (a code sketch follows the figure below).

[Figure: ordinal loss functions]
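
Below is a minimal sketch of this summed ("all-threshold") hinge loss, under the zero-indexed convention used above; it is an illustration of the idea, not the authors' code, and the threshold values are made up.

```python
import numpy as np

def all_threshold_hinge(f_x, y, theta):
    # theta holds the K-1 finite thresholds theta_1 < ... < theta_{K-1}.
    # For thresholds with index j <= y the prediction must lie above theta_j,
    # for the others below; each violated margin adds one hinge term.
    j = np.arange(1, len(theta) + 1)
    signs = np.where(j <= y, 1.0, -1.0)
    return np.sum(np.maximum(0.0, 1.0 - signs * (f_x - theta)))

theta = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])  # K = 7, made-up values
print(all_threshold_hinge(3.0, 2, theta))  # violates four upper thresholds -> 12.0
```

Because every crossed threshold contributes its own hinge term, the slope of the summed loss grows with each additional violation, which is exactly the behaviour shown in the figure.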


  1. Code for generating all figures can be found here 

