Given two point sets P={𝒖1,…,𝒖n1} and Q={𝒗1,…,𝒗n2} in 𝑹d, the polytope distance problem is to find π’–βˆˆC⁒H⁒(P),π’—βˆˆC⁒H⁒(Q) such that βˆ₯𝒖-𝒗βˆ₯ is minimized, where C⁒H⁒(β‹…) is the convex hull. The problem can be formatted as an optimization problem:

min𝝁,𝝀  12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2 (1)
s.t.  𝟏T⁒𝝁=1β€ƒπŸT⁒𝝀=1 𝝁,𝝀β‰₯0,

where 𝑷=(𝒖1,…,𝒖n1) and 𝑸=(𝒗1,…,𝒗n2).

Suppose P and Q are linearly separable, i.e.Β there exists a hyperplane that separates the two points sets, then there exists π’˜βˆˆπ‘Ήd and Ξ±,Ξ²βˆˆπ‘Ή, such that π’˜T⁒𝒖iβ‰₯Ξ±β’βˆ€π’–i∈P and π’˜T⁒𝒗iβ‰€Ξ²β’βˆ€π’—i∈Q. This is known as the standard support vector machine (SVM). The distance between the two supporting hyperplanes is Ξ±-Ξ²βˆ₯π’˜βˆ₯. Therefore, the distance between the two planes can be maximized by minimizing βˆ₯π’˜βˆ₯ and maximizing (Ξ±-Ξ²). This can be written as an optimization problem:

minπ’˜,Ξ±,β  12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²) (2)
s.t.  𝑷Tβ’π’˜β‰₯α⁒𝟏
𝑸Tβ’π’˜β‰€Ξ²β’πŸ.
Theorem 1.

Problem (1) and Problem (2) are dual of each other.

Proof.

The Lagrangian of ProblemΒ (2) is:

L⁒(π’˜,Ξ±,Ξ²,𝝁,𝝀)=12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²)-𝝁T⁒(𝑷Tβ’π’˜-α⁒𝟏)-𝝀T⁒(β⁒𝟏-𝑸Tβ’π’˜).

The problem is equivalent to:

minπ’˜,Ξ±,β⁑(max𝝁β‰₯0,𝝀β‰₯0⁑L⁒(π’˜,Ξ±,Ξ²,𝝁,𝝀)).

By the minimax theorem, the dual problem is:

max𝝁β‰₯0,𝝀β‰₯0⁑(minπ’˜,Ξ±,β⁑L⁒(π’˜,Ξ±,Ξ²,𝝁,𝝀)).

Set βˆ‚β‘Lβˆ‚β‘π’˜=0,βˆ‚β‘Lβˆ‚β‘Ξ±=0,βˆ‚β‘Lβˆ‚β‘Ξ²=0, we have:

βˆ‚β‘Lβˆ‚β‘π’˜ =π’˜-𝑷⁒𝝁+𝑸⁒𝝀=0
βˆ‚β‘Lβˆ‚β‘Ξ± =-1+𝟏T⁒𝝁=0
βˆ‚β‘Lβˆ‚β‘Ξ² =1-𝟏T⁒𝝀=0.

Substitute π’˜=𝑷⁒𝝁-𝑸⁒𝝀, the problem becomes:

max  12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2-(Ξ±-Ξ²)-𝝁T⁒(𝑷T⁒(𝑷⁒𝝁-𝑸⁒𝝀)-α⁒𝟏)-𝝀T⁒(β⁒𝟏-𝑸T⁒(𝑷⁒𝝁-𝑸⁒𝝀))
s.t.  𝟏T⁒𝝁=1β€ƒπŸT⁒𝝀=1 𝝁,𝝀β‰₯0,

which simplifies to

max  12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2-(Ξ±-Ξ²)+Ξ±-Ξ²+(𝑸⁒𝝀-𝑷⁒𝝁)T⁒(𝑷⁒𝝁-𝑸⁒𝝀)
s.t.  𝟏T⁒𝝁=1β€ƒπŸT⁒𝝀=1 𝝁,𝝀β‰₯0,

and can be further simplified to

max  -12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2
s.t.  𝟏T⁒𝝁=1β€ƒπŸT⁒𝝀=1 𝝁,𝝀β‰₯0,

which is equivalent to Problem (1) and completes the proof. ∎

If we fix Ξ±-Ξ²=2 by defining Ξ±=Ξ³+1 and Ξ²=Ξ³-1, Problem (2) becomes:

minπ’˜,γ  12⁒βˆ₯π’˜βˆ₯2 (3)
s.t.  𝑷Tβ’π’˜β‰₯(1+Ξ³)⁒𝟏
-𝑸Tβ’π’˜β‰€(1-Ξ³)⁒𝟏.
Theorem 2.

If P and Q are linearly separable, i.e.Β 12⁒βˆ₯𝐏⁒𝛍-πβ’π›Œβˆ₯2>0, ProblemΒ (2) and ProblemΒ (3) are equivalent.

Proof.

Recall the Lagrangian of ProblemΒ (2) is:

L⁒(π’˜,Ξ±,Ξ²,𝝁,𝝀)=12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²)-𝝁T⁒(𝑷Tβ’π’˜-α⁒𝟏)-𝝀T⁒(β⁒𝟏-𝑸Tβ’π’˜).

Each KKT point (π’˜Β―,Ξ±Β―,Ξ²Β―,𝝁¯,𝝀¯) of ProblemΒ (2) satisfies

𝑷⁒𝝁¯-𝑸⁒𝝀¯=π’˜Β―πŸT⁒𝝁¯=1𝟏T⁒𝝀¯=1  𝝁¯T⁒(𝑷Tβ’π’˜Β―-α¯⁒𝟏)=0𝝀¯T⁒(β¯⁒𝟏-𝑸Tβ’π’˜Β―)=0  𝑷Tβ’π’˜Β―β‰₯Ξ±Β―β’πŸπ‘ΈTβ’π’˜Β―β‰€Ξ²Β―β’πŸβ€ƒβ€ƒπΒ―β‰₯0𝝀¯β‰₯0 (4)

The Lagrangian of ProblemΒ (3) is:

L⁒(π’˜,Ξ³,𝝁,𝝀)=12⁒βˆ₯π’˜βˆ₯2-𝝁T⁒(𝑷Tβ’π’˜-(1+Ξ³)⁒𝟏)-𝝀T⁒((Ξ³-1)-𝑸Tβ’π’˜).

Each KKT point (π’˜^,Ξ³^,𝝁^,𝝀^) of ProblemΒ (3) satisfies:

𝑷⁒𝝁^-𝑸⁒𝝀^=π’˜^𝟏T⁒𝝁^=𝟏T⁒𝝀^  𝝁^T⁒(𝑷Tβ’π’˜^-(1+Ξ³^)⁒𝟏)=0𝝀^T⁒((Ξ³^-1)⁒𝟏-𝑸Tβ’π’˜^)=0  𝑷Tβ’π’˜Β―β‰₯(1+Ξ³^)β’πŸπ‘ΈTβ’π’˜Β―β‰€(Ξ³^-1)β’πŸβ€ƒβ€ƒπ^β‰₯0𝝀^β‰₯0 (5)

Assuming Ξ±~-Ξ²~>0, set Ξ΄=2Ξ±~-Ξ²~, Ξ±~=Ξ³^+1Ξ΄, Ξ²~=Ξ³^-1Ξ΄, π’˜~=π’˜^Ξ΄, 𝝁~=𝝁^Ξ΄, 𝝀~=𝝀^Ξ΄ and 𝟏T⁒𝝁^=𝟏T⁒𝝀^=Ξ΄, and divide each KKT condition in (5) by Ξ΄ or Ξ΄2, we have

𝑷⁒𝝁~-𝑸⁒𝝀~=π’˜~𝟏T⁒𝝁~=1𝟏T⁒𝝀~=1  𝝁~T⁒(𝑷Tβ’π’˜~-Ξ±~⁒𝟏)=0𝝀~T⁒(Ξ²~⁒𝟏-𝑸Tβ’π’˜~)=0  𝑷Tβ’π’˜~β‰₯Ξ±~β’πŸπ‘ΈTβ’π’˜~≀β~β’πŸβ€ƒβ€ƒπ~β‰₯0𝝀~β‰₯0

which coincides with the KKT condition (4) and implies that (π’˜~,Ξ±~,Ξ²~,𝝁~,𝝀~) is a KKT point to ProblemΒ (2). Since P and Q are linearly separable, by strong duality, we have

12⁒βˆ₯π’˜~βˆ₯2-(Ξ±~-Ξ²~)=-12⁒βˆ₯𝑷⁒𝝁~-𝑸⁒𝝀~βˆ₯2<0,

which verifies our assumption α~-β~>0. ∎

References

  • [1] K. P. Bennett and E. J. Bredensteiner (2000) Duality and geometry in svm classifiers. In ICML, Vol. 2000, pp.Β 57–64.
  • [2] B. GΓ€rtner and M. Jaggi (2009) Coresets for polytope distance. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pp.Β 33–42.