SVM: why normalization?
So normalization of the feature vectors prior to feeding them to the SVM is very important. This is often called whitening, although there are different types of whitening. You want to make sure that, for each dimension, the values are scaled to lie within roughly the same range.
Otherwise, dimensions whose values span a much larger numeric range will dominate the others. In more detail, you have to normalize all of your feature vectors by dimension, not by instance, prior to sending them to your SVM library. Some libraries recommend doing a 'hard' normalization, mapping the min and max values of a given dimension to 0 and 1. However, in our experience, we found that it is better to do a 'soft' normalization, which subtracts the mean of the values and divides by twice the standard deviation (again, by dimension).
Thus, if your input has d dimensions, then you will have d means and d standard deviations, no matter how many training examples you have. Note that if you're implementing this yourself, standard deviations for some dimensions might be zero, so the division could give you a divide-by-zero error.
So you should add a very small epsilon to the denominator to prevent this. These normalized vectors are then sent to your SVM library for training. During testing, it is important to construct the test feature vectors in exactly the same way, except that you use the means and standard deviations saved from the training data, rather than computing them from the test data.
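A minimal NumPy sketch of this procedure (the function names, the epsilon value, and the `X_train` / `X_test` arrays are illustrative assumptions, not part of the original answer):

```python
import numpy as np

def fit_soft_scaler(X_train, eps=1e-8):
    # One mean and one standard deviation per dimension (column).
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    # Divide by twice the std; epsilon guards against zero-variance dimensions.
    return mu, 2.0 * sigma + eps

def apply_soft_scaler(X, mu, scale):
    # Subtract the saved means and divide by the saved (2 * std + eps) values.
    return (X - mu) / scale

# Fit the statistics on the training features only...
mu, scale = fit_soft_scaler(X_train)
X_train_scaled = apply_soft_scaler(X_train, mu, scale)
# ...and reuse exactly the same saved statistics on the test features.
X_test_scaled = apply_soft_scaler(X_test, mu, scale)
```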
In other words, scale your test inputs using the saved means and standard deviations before sending them to your SVM library for classification. If you're using scikit-learn, this kind of per-dimension scaling is provided in the preprocessing module. SVMs can be quite sensitive to training parameters, but fortunately there are relatively few of them to tune: for a standard kernel SVM, mainly the regularization constant C and the kernel parameters (e.g., gamma for an RBF kernel).
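As a concrete sketch (not from the original answer): scikit-learn's `StandardScaler` performs the mean/standard-deviation scaling per dimension (dividing by one standard deviation rather than two), and the few SVM parameters can be tuned with a grid search. The grid values below are illustrative, and `X_train`, `y_train`, `X_test`, `y_test` are assumed to exist:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# The scaler learns means/stds on the training folds only and reuses them on
# the held-out fold, which is exactly the train/test discipline described above.
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': [1e-3, 1e-2, 1e-1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```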
Also note that the raw output of an SVM is a decision value (roughly, a signed distance from the separating hyperplane), not a probability, so if you're dealing with multiple classifiers, these raw values are not directly comparable! Similarly, if you need a proper probability (i.e., a value between 0 and 1), you have to convert the decision values to that range. The standard way of doing this is Platt scaling, where a logistic regression model (usually a sigmoid curve) is fit to the outputs on some labeled inputs to obtain probabilities. If you have enough labeled data, this is probably the right thing to do, but in many cases there isn't enough labeled data available to properly estimate the parameters of the sigmoid.
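When you do have enough labeled data, one way to obtain Platt-scaled probabilities in scikit-learn is `CalibratedClassifierCV` with `method='sigmoid'` (note that `SVC(probability=True)` applies the same idea internally). A rough sketch, reusing the scaled arrays and `y_train` assumed above:

```python
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# The sigmoid is fit on internal cross-validation folds, so the data used to
# estimate its parameters is not the same data each underlying SVM was trained on.
calibrated = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=5)
calibrated.fit(X_train_scaled, y_train)

# Proper probabilities in [0, 1], comparable across classifiers.
probs = calibrated.predict_proba(X_test_scaled)
```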
In that case, an alternative normalization of the decision values can be used instead. It does not require any labeled data to estimate and has a few other benefits as well: it doesn't assume the balanced sampling or symmetric error models that are common in many Platt implementations. Code for this normalization is provided here.