SVM and Polytope Distance

Given two point sets $P=\{\bm{u}_{1},\dots,\bm{u}_{n_{1}}\}$ and $Q=\{\bm{v}_{1},\dots,\bm{v}_{n_{2}}\}$ in $\bm{R}^{d}$, the polytope distance problem is to find $\bm{u}\in CH(P)$ and $\bm{v}\in CH(Q)$ such that $\|\bm{u}-\bm{v}\|$ is minimized, where $CH(\cdot)$ denotes the convex hull. The problem can be formulated as an optimization problem:

$$
\begin{aligned}
\min_{\bm{\mu},\bm{\lambda}}\quad & \frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}\\
\operatorname*{s.t.}\quad & \bm{1}^{T}\bm{\mu}=1,\quad\bm{1}^{T}\bm{\lambda}=1,\quad\bm{\mu},\bm{\lambda}\geq 0,
\end{aligned}
\tag{1}
$$

where $\bm{P}=(\bm{u}_{1},\dots,\bm{u}_{n_{1}})$ and $\bm{Q}=(\bm{v}_{1},\dots,\bm{v}_{n_{2}})$ are the matrices whose columns are the points of $P$ and $Q$.
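As a rough illustration of Problem (1), the following sketch runs a Frank-Wolfe iteration with exact line search over the two simplices. It is not taken from this note: the names `dot` and `polytope_distance`, the starting point, the stopping tolerance, and the toy data are all assumptions made for the example.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def polytope_distance(P, Q, iters=1000, tol=1e-12):
    """Approximately solve Problem (1) by Frank-Wolfe (illustrative sketch).

    P and Q are lists of points (tuples of floats). Returns (mu, lam, w),
    where w = P mu - Q lam is the difference of the two hull points.
    """
    mu = [1.0] + [0.0] * (len(P) - 1)    # start at the first point of P
    lam = [1.0] + [0.0] * (len(Q) - 1)   # start at the first point of Q
    w = [a - b for a, b in zip(P[0], Q[0])]
    for _ in range(iters):
        # Linear minimization: the gradient is P^T w in mu and -Q^T w in lam.
        i = min(range(len(P)), key=lambda k: dot(w, P[k]))
        j = max(range(len(Q)), key=lambda k: dot(w, Q[k]))
        s = [a - b for a, b in zip(P[i], Q[j])]
        d = [a - b for a, b in zip(w, s)]      # w - s
        gap = dot(w, d)                        # Frank-Wolfe duality gap
        if gap <= tol:
            break
        t = min(1.0, gap / dot(d, d))          # exact line search on [0, 1]
        mu = [(1.0 - t) * m for m in mu]
        mu[i] += t
        lam = [(1.0 - t) * l for l in lam]
        lam[j] += t
        w = [(1.0 - t) * a + t * b for a, b in zip(w, s)]
    return mu, lam, w


# Toy data (assumed): the closest pair is (2, 0) and (0, 0), at distance 2.
P = [(3.0, 1.0), (2.0, 0.0)]
Q = [(-1.0, 1.0), (0.0, 0.0)]
mu, lam, w = polytope_distance(P, Q)
objective = 0.5 * dot(w, w)   # value of Problem (1) at the returned point
```

On this toy instance the iteration stops after one step, because the optimum is attained at a pair of vertices; in general Frank-Wolfe only converges in the limit, so `tol` controls the accuracy.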

Suppose $P$ and $Q$ are linearly separable, i.e. there exists a hyperplane that separates the two point sets. Then there exist $\bm{w}\in\bm{R}^{d}$ and $\alpha,\beta\in\bm{R}$ with $\alpha>\beta$ such that $\bm{w}^{T}\bm{u}_{i}\geq\alpha$ for all $\bm{u}_{i}\in P$ and $\bm{w}^{T}\bm{v}_{i}\leq\beta$ for all $\bm{v}_{i}\in Q$. This is the setting of the standard support vector machine (SVM). The distance between the two supporting hyperplanes $\bm{w}^{T}\bm{x}=\alpha$ and $\bm{w}^{T}\bm{x}=\beta$ is $\frac{\alpha-\beta}{\|\bm{w}\|}$. Therefore, the distance between the two hyperplanes can be increased by minimizing $\|\bm{w}\|$ and maximizing $(\alpha-\beta)$. This can be written as an optimization problem:

$$
\begin{aligned}
\min_{\bm{w},\alpha,\beta}\quad & \frac{1}{2}\|\bm{w}\|^{2}-(\alpha-\beta)\\
\operatorname*{s.t.}\quad & \bm{P}^{T}\bm{w}\geq\alpha\bm{1}\\
& \bm{Q}^{T}\bm{w}\leq\beta\bm{1}.
\end{aligned}
\tag{2}
$$
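To make the margin formula concrete, the sketch below takes a fixed direction $\bm{w}$, picks the largest $\alpha$ and the smallest $\beta$ that keep the constraints of Problem (2) satisfied, and reports the margin $\frac{\alpha-\beta}{\|\bm{w}\|}$ together with the objective value. The helper name `margin_for_direction` and the toy points are assumptions made for illustration.

```python
import math


def margin_for_direction(P, Q, w):
    """For a fixed w, return (alpha, beta, margin, objective) of Problem (2).

    alpha is the largest value with w^T u_i >= alpha for all u_i in P;
    beta is the smallest value with w^T v_j <= beta for all v_j in Q.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    alpha = min(dot(w, u) for u in P)
    beta = max(dot(w, v) for v in Q)
    margin = (alpha - beta) / math.sqrt(dot(w, w))
    objective = 0.5 * dot(w, w) - (alpha - beta)
    return alpha, beta, margin, objective


# Toy data (assumed for this example).
P = [(3.0, 1.0), (2.0, 0.0)]
Q = [(-1.0, 1.0), (0.0, 0.0)]
print(margin_for_direction(P, Q, (1.0, 0.0)))  # (2.0, 0.0, 2.0, -1.5)
print(margin_for_direction(P, Q, (2.0, 0.0)))  # (4.0, 0.0, 2.0, -2.0)
```

Scaling $\bm{w}$ leaves the margin unchanged but changes the objective of Problem (2), so the objective also selects a particular scale of $\bm{w}$ along each direction.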

Theorem 1.

Problem (1) and Problem (2) are duals of each other.

Proof.

The Lagrangian of Problem (2) is:

$$
L(\bm{w},\alpha,\beta,\bm{\mu},\bm{\lambda})=\frac{1}{2}\|\bm{w}\|^{2}-(\alpha-\beta)-\bm{\mu}^{T}(\bm{P}^{T}\bm{w}-\alpha\bm{1})-\bm{\lambda}^{T}(\beta\bm{1}-\bm{Q}^{T}\bm{w}).
$$

The problem is equivalent to:

$$
\min_{\bm{w},\alpha,\beta}\left(\max_{\bm{\mu}\geq 0,\bm{\lambda}\geq 0}L(\bm{w},\alpha,\beta,\bm{\mu},\bm{\lambda})\right).
$$

By the minimax theorem, the dual problem is:

$$
\max_{\bm{\mu}\geq 0,\bm{\lambda}\geq 0}\left(\min_{\bm{w},\alpha,\beta}L(\bm{w},\alpha,\beta,\bm{\mu},\bm{\lambda})\right).
$$

Setting the partial derivatives of $L$ with respect to $\bm{w}$, $\alpha$, and $\beta$ to zero, we have:

$$
\begin{aligned}
\frac{\partial L}{\partial\bm{w}} &= \bm{w}-\bm{P}\bm{\mu}+\bm{Q}\bm{\lambda}=0\\
\frac{\partial L}{\partial\alpha} &= -1+\bm{1}^{T}\bm{\mu}=0\\
\frac{\partial L}{\partial\beta} &= 1-\bm{1}^{T}\bm{\lambda}=0.
\end{aligned}
$$

Substituting $\bm{w}=\bm{P\mu}-\bm{Q\lambda}$, the problem becomes:

$$
\begin{aligned}
\max\quad & \frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}-(\alpha-\beta)-\bm{\mu}^{T}(\bm{P}^{T}(\bm{P\mu}-\bm{Q\lambda})-\alpha\bm{1})-\bm{\lambda}^{T}(\beta\bm{1}-\bm{Q}^{T}(\bm{P\mu}-\bm{Q\lambda}))\\
\operatorname*{s.t.}\quad & \bm{1}^{T}\bm{\mu}=1,\quad\bm{1}^{T}\bm{\lambda}=1,\quad\bm{\mu},\bm{\lambda}\geq 0,
\end{aligned}
$$

which, using $\bm{1}^{T}\bm{\mu}=\bm{1}^{T}\bm{\lambda}=1$, simplifies to

$$
\begin{aligned}
\max\quad & \frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}-(\alpha-\beta)+\alpha-\beta+(\bm{Q\lambda}-\bm{P\mu})^{T}(\bm{P\mu}-\bm{Q\lambda})\\
\operatorname*{s.t.}\quad & \bm{1}^{T}\bm{\mu}=1,\quad\bm{1}^{T}\bm{\lambda}=1,\quad\bm{\mu},\bm{\lambda}\geq 0,
\end{aligned}
$$

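and, since the $\alpha-\beta$ terms cancel and $(\bm{Q\lambda}-\bm{P\mu})^{T}(\bm{P\mu}-\bm{Q\lambda})=-\|\bm{P\mu}-\bm{Q\lambda}\|^{2}$, can be further simplified to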

$$
\begin{aligned}
\max\quad & -\frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}\\
\operatorname*{s.t.}\quad & \bm{1}^{T}\bm{\mu}=1,\quad\bm{1}^{T}\bm{\lambda}=1,\quad\bm{\mu},\bm{\lambda}\geq 0,
\end{aligned}
$$

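which is equivalent to Problem (1), since maximizing $-\frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}$ is the same as minimizing $\frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}$. This completes the proof. ∎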
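Theorem 1 can be sanity-checked on a small instance. The sketch below assumes that optimal multipliers $\bm{\mu}$ and $\bm{\lambda}$ are already known for hypothetical toy data, recovers $\bm{w}=\bm{P\mu}-\bm{Q\lambda}$ from the stationarity condition, chooses the tightest feasible $\alpha$ and $\beta$, and compares the value of Problem (2) with $-\frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}$. The function name `check_strong_duality` and the data are illustrative assumptions.

```python
def check_strong_duality(P, Q, mu, lam):
    """Return (primal_value, dual_value, slack) for given multipliers.

    primal_value is the objective of Problem (2) at (w, alpha, beta),
    dual_value is -1/2 ||P mu - Q lam||^2, and slack is the total
    complementary-slackness residual, which is zero at a KKT point.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    dim = len(P[0])
    # Stationarity in w: w = P mu - Q lam.
    w = [sum(m * u[k] for m, u in zip(mu, P))
         - sum(l * v[k] for l, v in zip(lam, Q)) for k in range(dim)]
    alpha = min(dot(w, u) for u in P)   # tightest feasible alpha
    beta = max(dot(w, v) for v in Q)    # tightest feasible beta
    primal_value = 0.5 * dot(w, w) - (alpha - beta)
    dual_value = -0.5 * dot(w, w)
    slack = (sum(m * (dot(w, u) - alpha) for m, u in zip(mu, P))
             + sum(l * (beta - dot(w, v)) for l, v in zip(lam, Q)))
    return primal_value, dual_value, slack


# Toy data (assumed); the multipliers select the closest pair (2, 0), (0, 0).
P = [(3.0, 1.0), (2.0, 0.0)]
Q = [(-1.0, 1.0), (0.0, 0.0)]
primal_value, dual_value, slack = check_strong_duality(P, Q, [0.0, 1.0], [0.0, 1.0])
```

Here both values equal $-2$ and the slackness residual is zero, as the theorem predicts for this instance; with non-optimal multipliers the two values generally differ.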

If we fix $\alpha-\beta=2$ by defining $\alpha=\gamma+1$ and $\beta=\gamma-1$, the term $-(\alpha-\beta)$ in the objective becomes the constant $-2$, which can be dropped, and Problem (2) becomes:

$$
\begin{aligned}
\min_{\bm{w},\gamma}\quad & \frac{1}{2}\|\bm{w}\|^{2}\\
\operatorname*{s.t.}\quad & \bm{P}^{T}\bm{w}\geq(1+\gamma)\bm{1}\\
& \bm{Q}^{T}\bm{w}\leq(\gamma-1)\bm{1}.
\end{aligned}
\tag{3}
$$

Theorem 2.

If $P$ and $Q$ are linearly separable, i.e. the optimal value of Problem (1) satisfies $\frac{1}{2}\|\bm{P\mu}-\bm{Q\lambda}\|^{2}>0$, then Problem (2) and Problem (3) are equivalent.

Proof.

Recall the Lagrangian of Problem (2) is:

$$
L(\bm{w},\alpha,\beta,\bm{\mu},\bm{\lambda})=\frac{1}{2}\|\bm{w}\|^{2}-(\alpha-\beta)-\bm{\mu}^{T}(\bm{P}^{T}\bm{w}-\alpha\bm{1})-\bm{\lambda}^{T}(\beta\bm{1}-\bm{Q}^{T}\bm{w}).
$$

Each KKT point $(\bar{\bm{w}},\bar{\alpha},\bar{\beta},\bar{\bm{\mu}},\bar{\bm{\lambda}})$ of Problem (2) satisfies

$$
\begin{aligned}
\bm{P}\bar{\bm{\mu}}-\bm{Q}\bar{\bm{\lambda}}&=\bar{\bm{w}}\\
\bm{1}^{T}\bar{\bm{\mu}}&=1\\
\bm{1}^{T}\bar{\bm{\lambda}}&=1
\end{aligned}
\qquad
\begin{aligned}
\bar{\bm{\mu}}^{T}(\bm{P}^{T}\bar{\bm{w}}-\bar{\alpha}\bm{1})&=0\\
\bar{\bm{\lambda}}^{T}(\bar{\beta}\bm{1}-\bm{Q}^{T}\bar{\bm{w}})&=0
\end{aligned}
\qquad
\begin{aligned}
\bm{P}^{T}\bar{\bm{w}}&\geq\bar{\alpha}\bm{1}\\
\bm{Q}^{T}\bar{\bm{w}}&\leq\bar{\beta}\bm{1}
\end{aligned}
\qquad
\begin{aligned}
\bar{\bm{\mu}}&\geq 0\\
\bar{\bm{\lambda}}&\geq 0
\end{aligned}
\tag{4}
$$

The Lagrangian of Problem (3) is:

$$
L(\bm{w},\gamma,\bm{\mu},\bm{\lambda})=\frac{1}{2}\|\bm{w}\|^{2}-\bm{\mu}^{T}(\bm{P}^{T}\bm{w}-(1+\gamma)\bm{1})-\bm{\lambda}^{T}((\gamma-1)\bm{1}-\bm{Q}^{T}\bm{w}).
$$

Each KKT point $(\hat{\bm{w}},\hat{\gamma},\hat{\bm{\mu}},\hat{\bm{\lambda}})$ of Problem (3) satisfies:

$$
\begin{aligned}
\bm{P}\hat{\bm{\mu}}-\bm{Q}\hat{\bm{\lambda}}&=\hat{\bm{w}}\\
\bm{1}^{T}\hat{\bm{\mu}}&=\bm{1}^{T}\hat{\bm{\lambda}}
\end{aligned}
\qquad
\begin{aligned}
\hat{\bm{\mu}}^{T}(\bm{P}^{T}\hat{\bm{w}}-(1+\hat{\gamma})\bm{1})&=0\\
\hat{\bm{\lambda}}^{T}((\hat{\gamma}-1)\bm{1}-\bm{Q}^{T}\hat{\bm{w}})&=0
\end{aligned}
\qquad
\begin{aligned}
\bm{P}^{T}\hat{\bm{w}}&\geq(1+\hat{\gamma})\bm{1}\\
\bm{Q}^{T}\hat{\bm{w}}&\leq(\hat{\gamma}-1)\bm{1}
\end{aligned}
\qquad
\begin{aligned}
\hat{\bm{\mu}}&\geq 0\\
\hat{\bm{\lambda}}&\geq 0
\end{aligned}
\tag{5}
$$

Let $\delta=\bm{1}^{T}\hat{\bm{\mu}}=\bm{1}^{T}\hat{\bm{\lambda}}$, where the two sums agree by the second condition in (5), and assume $\delta>0$. Define $\tilde{\alpha}=\frac{\hat{\gamma}+1}{\delta}$, $\tilde{\beta}=\frac{\hat{\gamma}-1}{\delta}$, $\tilde{\bm{w}}=\frac{\hat{\bm{w}}}{\delta}$, $\tilde{\bm{\mu}}=\frac{\hat{\bm{\mu}}}{\delta}$, and $\tilde{\bm{\lambda}}=\frac{\hat{\bm{\lambda}}}{\delta}$, so that $\delta=\frac{2}{\tilde{\alpha}-\tilde{\beta}}$ and the assumption $\delta>0$ is the same as assuming $\tilde{\alpha}-\tilde{\beta}>0$. Dividing each KKT condition in (5) by $\delta$ or $\delta^{2}$, we have

$$
\begin{aligned}
\bm{P}\tilde{\bm{\mu}}-\bm{Q}\tilde{\bm{\lambda}}&=\tilde{\bm{w}}\\
\bm{1}^{T}\tilde{\bm{\mu}}&=1\\
\bm{1}^{T}\tilde{\bm{\lambda}}&=1
\end{aligned}
\qquad
\begin{aligned}
\tilde{\bm{\mu}}^{T}(\bm{P}^{T}\tilde{\bm{w}}-\tilde{\alpha}\bm{1})&=0\\
\tilde{\bm{\lambda}}^{T}(\tilde{\beta}\bm{1}-\bm{Q}^{T}\tilde{\bm{w}})&=0
\end{aligned}
\qquad
\begin{aligned}
\bm{P}^{T}\tilde{\bm{w}}&\geq\tilde{\alpha}\bm{1}\\
\bm{Q}^{T}\tilde{\bm{w}}&\leq\tilde{\beta}\bm{1}
\end{aligned}
\qquad
\begin{aligned}
\tilde{\bm{\mu}}&\geq 0\\
\tilde{\bm{\lambda}}&\geq 0
\end{aligned}
$$

which coincides with the KKT conditions (4) and implies that $(\tilde{\bm{w}},\tilde{\alpha},\tilde{\beta},\tilde{\bm{\mu}},\tilde{\bm{\lambda}})$ is a KKT point of Problem (2). Since $P$ and $Q$ are linearly separable, by strong duality, we have

$$
\frac{1}{2}\|\tilde{\bm{w}}\|^{2}-(\tilde{\alpha}-\tilde{\beta})=-\frac{1}{2}\|\bm{P}\tilde{\bm{\mu}}-\bm{Q}\tilde{\bm{\lambda}}\|^{2}<0,
$$

which verifies our assumption $\tilde{\alpha}-\tilde{\beta}>0$ . ∎
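The rescaling used in the proof can be written out directly. The sketch below maps a point $(\tilde{\bm{w}},\tilde{\alpha},\tilde{\beta})$ of Problem (2) to the corresponding point $(\hat{\bm{w}},\hat{\gamma})$ of Problem (3) using $\delta=\frac{2}{\tilde{\alpha}-\tilde{\beta}}$, and then checks the constraints of Problem (3). The function names `rescale_to_problem3` and `feasible_for_problem3`, the tolerance, and the toy values are assumptions made for illustration.

```python
def rescale_to_problem3(w, alpha, beta):
    """Map (w, alpha, beta) of Problem (2) to (w_hat, gamma_hat, delta)."""
    if alpha - beta <= 0:
        raise ValueError("requires alpha - beta > 0 (separable case)")
    delta = 2.0 / (alpha - beta)
    w_hat = [delta * x for x in w]      # w_hat = delta * w
    gamma_hat = delta * alpha - 1.0     # from alpha = (gamma_hat + 1) / delta
    return w_hat, gamma_hat, delta


def feasible_for_problem3(P, Q, w_hat, gamma_hat, eps=1e-12):
    """Check P^T w >= (1 + gamma) 1 and Q^T w <= (gamma - 1) 1."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return (all(dot(w_hat, u) >= 1.0 + gamma_hat - eps for u in P)
            and all(dot(w_hat, v) <= gamma_hat - 1.0 + eps for v in Q))


# Toy data (assumed); (w, alpha, beta) = ((2, 0), 4, 0) is optimal for Problem (2).
P = [(3.0, 1.0), (2.0, 0.0)]
Q = [(-1.0, 1.0), (0.0, 0.0)]
w_hat, gamma_hat, delta = rescale_to_problem3([2.0, 0.0], 4.0, 0.0)
is_feasible = feasible_for_problem3(P, Q, w_hat, gamma_hat)
```

For this instance $\delta=\frac{1}{2}$, $\hat{\bm{w}}=(1,0)$ and $\hat{\gamma}=1$, so the margin $\frac{2}{\|\hat{\bm{w}}\|}=2$ matches the polytope distance between the two toy sets.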
