Xu Cui » SVM (support vector machine) with libsvm
I have been learning SVM lately and tried libsvm. It's a good package.
[Figure: linear kernel example (support vectors are circled)]
[Figure: nonlinear example (radial basis kernel)]
[Figure: 3-class example]
Basic procedure to use libsvm:
- Preprocess your data. This includes normalization (scaling all values to between 0 and 1) and transforming non-numeric values into numeric ones. You can use the following code to normalize (from the libsvm webpage):
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
- Find the optimal parameter values. A linear kernel has one parameter, C (the penalty parameter). The commonly used radial basis kernel has two (C and gamma). Different parameter values yield different accuracy rates. To avoid overfitting, use n-fold cross-validation. For example, 5-fold cross-validation trains the SVM model on 4/5 of the data and tests on the remaining 1/5, repeating so that each fifth serves once as the test set. The options -c, -g, and -v control C, gamma, and the number of folds. A piece of code from the libsvm website:
bestcv = 0;
for log2c = -1:3,
  for log2g = -4:1,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(heart_scale_label, heart_scale_inst, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
  end
end
- You may have to run the above code several times with different ranges of parameter values to find the optimal values. For example, you might start with a wide range at coarse resolution, then fine-tune within smaller regions at higher resolution.
- After finding the optimal parameter values, use all data to train your model with your optimal parameter values.
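cmd = ['-t 2 -c ', num2str(bestc), ' -g ', num2str(bestg)];
model = svmtrain(l, d, cmd); % l: label vector, d: data matrix
- If you have new data, you can use this model to classify it. If the true labels are unknown, pass a dummy label vector (zeros below); the reported accuracy is then meaningless.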
[predicted_label, accuracy, decision_values] = svmpredict(zeros(size(dd,1),1), dd, model);
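One point the steps above leave implicit: new data should be scaled with the same minimum and maximum as the training data, not with its own. Here is a minimal sketch, assuming made-up toy matrices `train` and `new` (hypothetical names and values, not from the libsvm page):

```matlab
% Toy training data and new data (made-up numbers for illustration only).
train = [1 10; 3 20; 5 40];
new   = [2 25; 6 10];

% Column-wise minimum and span of the TRAINING data.
lo   = min(train, [], 1);
span = max(train, [], 1) - lo;
span(span == 0) = 1;   % guard: a constant column would otherwise divide by zero

% Scale both sets with the training statistics.
train_scaled = (train - repmat(lo, size(train, 1), 1)) ./ repmat(span, size(train, 1), 1);
new_scaled   = (new   - repmat(lo, size(new, 1),   1)) ./ repmat(span, size(new, 1),   1);

disp(train_scaled)
disp(new_scaled)
```

In this sketch `new_scaled` plays the role of `dd` in the svmpredict call. Values outside the 0 to 1 range are expected whenever new data fall outside the range seen in training.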
Commonly used options
- -v n: n-fold cross validation
- -t 0: linear kernel
- -t 2: radial basis (default)
- -s 0: SVC type = C-SVC
- -c: C parameter value, default 1
- -g: gamma parameter value, default 1/(number of features)
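The options above are combined into a single string passed as the last argument of svmtrain. A small sketch of building such strings, assuming hypothetical values for `bestc` and `bestg` in place of real grid-search results:

```matlab
% Hypothetical values, standing in for the result of a grid search.
bestc = 2;
bestg = 0.125;

% Final training: C-SVC (-s 0) with a radial basis kernel (-t 2).
train_opts = sprintf('-s 0 -t 2 -c %g -g %g', bestc, bestg);

% Same settings plus 5-fold cross-validation. With -v, libsvm's svmtrain
% returns a scalar cross-validation accuracy rather than a model.
cv_opts = [train_opts, ' -v 5'];

disp(train_opts)
disp(cv_opts)
```

Since -s 0 and -t 2 are the defaults, leaving them out should give the same behavior; spelling them out just makes the setup explicit.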
libsvm performance
I tested different data sizes and recorded the time spent (in seconds).
Computer: processor 2×2.66 GHz, memory 12 GB, OS: Windows XP running in VMware on Mac OS X 10.5
data size   # features   svmtrain (s)   svmpredict (s)
100         2            0.00           0.00
100         6            0.00           0.00
100         10           0.00           0.00
100         20           0.00           0.00
100         50           0.01           0.00
100         100          0.02           0.01
500         2            0.02           0.01
500         6            0.03           0.02
500         10           0.05           0.03
500         20           0.08           0.03
500         50           0.46           0.07
500         100          0.56           0.12
1000        2            0.07           0.04
1000        6            0.10           0.06
1000        10           0.15           0.10
1000        20           0.36           0.14
1000        50           1.09           0.30
1000        100          3.07           0.50
It’s fairly fast.
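Timings like those in the table can be collected with tic and toc. The sketch below is not the code used for the measurements above. The random data, the random labels, and the stand-in `train_fn` are all assumptions, chosen so that it runs without libsvm installed:

```matlab
% Sketch: time a training function on random data of several sizes.
sizes    = [100 500 1000];
features = [2 10 100];

% Stand-in so the sketch runs without libsvm. With libsvm on the path,
% one could use instead:
%   train_fn = @(y, X) svmtrain(y, X, '-t 2');
train_fn = @(y, X) X' * y;

elapsed = zeros(numel(sizes), numel(features));
for i = 1:numel(sizes)
    for j = 1:numel(features)
        X = rand(sizes(i), features(j));      % random features in [0, 1]
        y = sign(rand(sizes(i), 1) - 0.5);    % random labels in {-1, +1}
        y(y == 0) = 1;                        % sign() can return 0
        t = tic;
        train_fn(y, X);
        elapsed(i, j) = toc(t);
        fprintf('%5d %4d %.2f\n', sizes(i), features(j), elapsed(i, j));
    end
end
```

Random labels give the classifier nothing to learn, so timings on structured data may differ.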
Resources:
MATLAB code to generate the plots above: cuixu_test_svm1
SVM basics: http://en.wikipedia.org/wiki/Support_vector_machine
Download libsvm for matlab at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab
The meaning of libsvm output is at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f804