do some machine learning support vector machine (support vector machine-SVM) is quite familiar, because before learning in depth, SVM has been occupying the seat of the big brother of machine learning. His theory is very beautiful, and various kinds of variant versions are also improved, such as latent-SVM, structural-SVM and so on. Let's take a look at the theory of SVM first. In Figure 1, there are two kinds of data sets in A diagram. B, C and D provide a linear classifier to classify data. But which is better?

(Figure 1)

on this data set, three classifiers are the same good enough, but in fact, this is just the training set, practical test sample the distribution may be relatively scattered, there are a variety of possible, in order to cope with this situation, we have to do is try to make the linear classifier from two data sets are as far as possible far, because this will reduce the risk of the real test samples across the classifier, improve the accuracy of detection. This idea of maximizing the distance between data sets to classifiers (margin) is the core idea of support vector machine, and the nearest sample from classifier is the support vector. Now that we know our goal is to find the maximum margin, how can we find the support vector? How do you realize it? The following is shown in (Figure two) to illustrate how to do the work.

(Figure two)

hypothesis (Figure two) in the linear representation of a super surface, in order to watch the show into one dimension linear features are super dimension, add a dimension, map you can also see, is two-dimensional, and the classifier is one-dimensional. If the feature is three dimensional, the classifier is a plane. The assumption of analytic formula for the alt= ", then point A to the distance of src=, this distance is given below:

(figure three) in

(Figure three), the green color diamond ultra, Xn data point, W is super weight, and W is perpendicular to the surface of the super. It is proved that the vertical is very simple. Suppose X'and X'' are all points on the hyperplane,


", so W is perpendicular to the hyperplane. Know the W perpendicular to the hyperplane, then Xn to the distance is actually a line Xn and super surface of an arbitrary point on the X on the W projection, as shown in (Figure four):


set into the Lagrange multiplier method such as formula (formula five) shown:

(formula five)

in (formula five) by Lagrange the sub function method respectively for W and B derivative, in order to get the extremum point that derivative is 0,


, then put them into the Lagrange multiplier method in formula (formula six) form:


(formula six)

(formula six) after the two line is at present we require optimization function solutions, now only need to make a plan for two times can be obtained by alpha, the two programming optimization such as (equation seven) shown:

" P style= text-align: "center" > (formula seven


by (formula seven) for alpha, you can use (formula six) in the first line for W. So far, the formula derivation of SVM has basically been completed. It can be seen that mathematical theory is very strict and beautiful. Although some colleagues think it's boring, it's not difficult to finish it down from scratch, but the difficulty is optimization. The two way planning solution is very large. In practice, SMO (Sequential minimal optimization) algorithm is commonly used, and SMO algorithm is intended to be combined with code in the next section.


[1]machine learning in action. Peter Harrington

above [2] Learning From Data. Yaser S.Abu-Mostafa

is the whole content of this paper, hope to help everyone to learn, I hope you will support a script.

This paper fixed link: | Script Home | +Copy Link

Article reprint please specify:Python machine learning theory and real battle (five) support vector machine | Script Home

You may also be interested in these articles!