Given two point sets P={𝒖1,…,𝒖n1} and Q={𝒗1,…,𝒗n2} in 𝑹d, the reduced polytope distance problem (RPD) is to find two density distributions 𝝁,𝝀 over the input points, such that βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯ is minimized, here 𝟏T⁒𝝁=𝟏T⁒𝝀=1 and 0≀𝝁,𝝀≀D for some constant D≀1. The idea of having the upperbound D is to prevent excessive influence of an outlier point. The problem can be formatted as the following optimization problem:

min𝝁,𝝀  12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2 (RPD)
s.t.  𝟏T⁒𝝁=1β€ƒπŸT⁒𝝀=1
0≀𝝁,𝝀≀D

The RPD problem is equivalent to C-SVM, but before we prove the equivalence of RPD and C-SVM, we introduce an intermittent problem, the soft-margin maximization problem (S-Margin), which can be written as the following problem:

minπ’˜,𝝃,𝜼,Ξ±,β  12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²)+D⁒(𝟏T⁒𝝃+𝟏T⁒𝜼) (S-Margin)
s.t.  𝑷Tβ’π’˜β‰₯α⁒𝟏-𝝃,𝝃β‰₯0
𝑸Tβ’π’˜β‰€Ξ²β’πŸ+𝜼,𝜼β‰₯0
Theorem 1.

(RPD) and (S-Margin) are dual of each other.

Proof.

Consider the Lagrangian of (S-Margin):

L⁒(π’˜,𝝃,𝜼,Ξ±,Ξ²,𝝁,𝝀,𝒓,𝒔) =12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²)+D⁒(𝟏T⁒𝝃+𝟏T⁒𝜼)
-𝝁T⁒(𝑷Tβ’π’˜-α⁒𝟏+𝝃)-𝒓T⁒𝝃
+𝝀T⁒(𝑸Tβ’π’˜-β⁒𝟏-𝜼)-𝒔T⁒𝜼

The KKT condition for this problem is:

Stationary: βˆ‚β‘L/βˆ‚β‘π’˜=π’˜-𝑷⁒𝝁+𝑸⁒𝝀=0
βˆ‚β‘L/βˆ‚β‘πƒ=D⁒𝟏-𝝁-𝒓=0
βˆ‚β‘L/βˆ‚β‘π€=D⁒𝟏-𝝀-𝒔=0
βˆ‚β‘L/βˆ‚β‘Ξ±=𝟏T⁒𝝁-1=0
βˆ‚β‘L/βˆ‚β‘Ξ²=𝟏T⁒𝝀-1=0
Complementary: 𝝁T⁒(𝑷Tβ’π’˜-α⁒𝟏+𝝃)=0,𝒓T⁒𝝃=0
𝝀T⁒(𝑸Tβ’π’˜-β⁒𝟏-𝜼)=0,𝒔T⁒𝜼=0
Primal feasibility: 𝑷Tβ’π’˜β‰₯α⁒𝟏-𝝃,𝝃β‰₯0
𝑸Tβ’π’˜β‰€Ξ²β’πŸ+𝜼,𝜼β‰₯0
Dual feasibility: 𝝁,𝝀,𝒓,𝒔β‰₯0

Take π’˜=𝑷⁒𝝁-𝑸⁒𝝀,𝒓=D⁒𝟏-𝝁,𝒔=D⁒𝟏-𝝀, the dual of (S-Margin) becomes:

max𝝁,𝝀  L⁒(𝝃,𝜼,𝝁,𝝀)
s.t.  𝟏T⁒𝝁=𝟏T⁒𝝀=1
0≀𝝁,𝝀≀D

Furthermore,

L⁒(𝝃,𝜼,𝝁,𝝀)= 12⁒βˆ₯π’˜βˆ₯2-(Ξ±-Ξ²)+D⁒(𝟏T⁒𝝃+𝟏T⁒𝜼)
-𝝁T⁒(𝑷Tβ’π’˜-α⁒𝟏+𝝃)-𝒓T⁒𝝃
+𝝀T⁒(𝑸Tβ’π’˜-β⁒𝟏-𝜼)-𝒔T⁒𝜼
= 12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2-(Ξ±-Ξ²)+D⁒(𝟏T⁒𝝃+𝟏T⁒𝜼)
-𝝁T⁒𝑷T⁒(𝑷⁒𝝁-𝑸⁒𝝀)+α⁒𝟏T⁒𝝁-𝝃T⁒𝝁-(D⁒𝟏-𝝁)T⁒𝝃
+𝝀T⁒𝑸T⁒(𝑷⁒𝝁-𝑸⁒𝝀)-β⁒𝟏T⁒𝝀-𝜼T⁒𝝀-(D⁒𝟏-𝝀)T⁒𝜼
= 12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2-(Ξ±-Ξ²)+D⁒𝟏T⁒𝝃+D⁒𝟏T⁒𝜼
-𝝁T⁒𝑷T⁒(𝑷⁒𝝁-𝑸⁒𝝀)+Ξ±-𝝃T⁒𝝁-D⁒𝟏T⁒𝝃+𝝁T⁒𝝃
+𝝀T⁒𝑸T⁒(𝑷⁒𝝁-𝑸⁒𝝀)-Ξ²-𝜼T⁒𝝀-D⁒𝟏T⁒𝜼+𝝀T⁒𝜼
= 12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2-(𝝁T⁒𝑷T-𝝀T⁒𝑸T)⁒(𝑷⁒𝝁-𝑸⁒𝝀)
= -12⁒βˆ₯𝑷⁒𝝁-𝑸⁒𝝀βˆ₯2,

which is equivalent to (RPD) and completes the proof. ∎

Now we turn to the C-SVM problem. The C-SVM problem can be formatted as

minπ’˜,𝝃,𝜼,γ  12⁒βˆ₯π’˜βˆ₯2+C⁒(𝟏T⁒𝝃+𝟏T⁒𝜼) (C-SVM)
s.t.  𝑷Tβ’π’˜β‰₯(Ξ³+1)⁒𝟏-𝝃
𝑸Tβ’π’˜β‰€(Ξ³-1)⁒𝟏+𝜼
𝝃β‰₯0β€ƒπœΌβ‰₯0

Here C>0 is a parameter that is chosen by the user to control the slackness. Intuitively, this problem is equivalent to (S-Margin) by fixing (Ξ±-Ξ²)=2 and set Ξ±=Ξ³+1,Ξ²=Ξ³-1, since maximizing the soft-margin is the same as fixing the offset and minimizing the norm of the normal vector π’˜. The formal proof is given in TheoremΒ 2.

Theorem 2.

With an appropriate choice of the parameters C and D, the problems (S-Margin) and (C-SVM) are equivalent if the optimal objective is strictly greater than 0.

Proof.

Recall that (π’˜Β―,𝝃¯,𝜼¯,Ξ±Β―,Ξ²Β―,𝝁¯,𝝀¯) is a KKT point of (S-Margin) if it satisfies:

Stationary: π’˜Β―-𝑷⁒𝝁¯+𝑸⁒𝝀¯=0 (1)
D⁒𝟏-𝝁¯β‰₯0
D⁒𝟏-𝝀¯β‰₯0
𝟏T⁒𝝁¯-1=0
𝟏T⁒𝝀¯-1=0
Complementary: 𝝁¯T⁒(𝑷Tβ’π’˜Β―-α⁒𝟏+𝝃¯)=0,(D⁒𝟏-𝝁¯)T⁒𝝃¯=0
𝝀¯T⁒(𝑸Tβ’π’˜Β―-β⁒𝟏-𝜼¯)=0,(D⁒𝟏-𝝀¯)T⁒𝜼¯=0
Primal feasibility: 𝑷Tβ’π’˜Β―β‰₯α⁒𝟏-𝝃¯,𝝃¯β‰₯0
𝑸Tβ’π’˜Β―β‰€Ξ²β’πŸ+𝜼¯,𝜼¯β‰₯0
Dual feasibility: 𝝁¯,𝝀¯β‰₯0

Each KKT point (π’˜^,𝝃^,𝜼^,Ξ³^,𝝁^,𝝀^) of (C-SVM) satisfies:

Stationary: π’˜^-𝑷⁒𝝁^+𝑸⁒𝝀^=0 (2)
C⁒𝟏-𝝁^β‰₯0
C⁒𝟏-𝝀^β‰₯0
𝟏T⁒𝝁^-𝟏T⁒𝝀^=0
Complementary: 𝝁^T⁒(𝑷Tβ’π’˜^-(Ξ³^+1)⁒𝟏+𝝃^)=0,(C⁒𝟏-𝝁^)T⁒𝝃^=0
𝝀^T⁒(𝑸Tβ’π’˜^-(Ξ³^-1)⁒𝟏-𝜼^)=0,(C⁒𝟏-𝝀^)T⁒𝜼^=0
Primal feasibility: 𝑷Tβ’π’˜^≀(Ξ³^+1)⁒𝟏-𝝃^,𝝃^β‰₯0
𝑸Tβ’π’˜^β‰₯(Ξ³^-1)⁒𝟏+𝜼^,𝜼^β‰₯0
Dual feasibility: 𝝁^,𝝀^β‰₯0

Assuming Ξ±~-Ξ²~>0, set Ξ΄=2Ξ±~-Ξ²~,Ξ±~=Ξ³^+1Ξ΄,Ξ²~=Ξ³^-1Ξ΄,π’˜~=π’˜^Ξ΄,𝝃~=𝝃^Ξ΄,𝜼~=𝜼^Ξ΄,𝝁~=𝝁^Ξ΄,𝝀~=𝝀^Ξ΄, and 𝟏T⁒𝝁^=𝟏T⁒𝝀^=Ξ΄, and take D=CΞ΄, we have:

Stationary: π’˜~-𝑷⁒𝝁~+𝑸⁒𝝀~=0 (3)
D⁒𝟏-𝝁~β‰₯0
D⁒𝟏-𝝀~β‰₯0
𝟏T⁒𝝁~=𝟏T⁒𝝀~=1
Complementary: 𝝁~T⁒(𝑷Tβ’π’˜~-Ξ±~⁒𝟏+𝝃~)=0,(D⁒𝟏-𝝁~)T⁒𝝃~=0
𝝀~T⁒(𝑸Tβ’π’˜~-Ξ²~⁒𝟏-𝜼~)=0,(D⁒𝟏-𝝀~)T⁒𝜼~=0
Primal feasibility: 𝑷Tβ’π’˜~≀α~⁒𝟏-𝝃~,𝝃~β‰₯0
𝑸Tβ’π’˜~β‰₯Ξ²~⁒𝟏+𝜼~,𝜼~β‰₯0
Dual feasibility: 𝝁~,𝝀~β‰₯0

which coincides with (1), and implies that (π’˜~,𝝃~,𝜼~,Ξ±~,Ξ²~,𝝁~,𝝀~) is a KKT point to (S-Margin). The assumption Ξ±~-Ξ²~>0 is from the strong duality of (S-Margin):

12⁒βˆ₯π’˜~βˆ₯2+D⁒(𝟏T⁒𝝃~+𝟏T⁒𝜼~)-(Ξ±~-Ξ²~)=-12⁒βˆ₯𝑷⁒𝝁~-𝑸⁒𝝀~βˆ₯2<0,

where we assume the objective value is strictly greater than 0, i.e. the reduced convex hulls are linear separable. ∎

The figure below illustrates how the reduced polytope works. The left picture shows the original problem where the polytopes are not linearly separable, which corresponds to the parameter choice D=1. The right picture shows an example of reduced polytope, where the reduced polytopes are linearly separable, and it corresponds to the case when the parameter D<1.

References

  • [1] K. P. Bennett and E. J. Bredensteiner (2000) Duality and geometry in svm classifiers. In ICML, Vol. 2000, pp.Β 57–64.
  • [2] B. GΓ€rtner and M. Jaggi (2009) Coresets for polytope distance. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pp.Β 33–42.