Large Scale Multiple Kernel Learning
This page contains information regarding our JMLR paper "Large Scale Multiple Kernel Learning" by Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer and Bernhard Schölkopf, published in the Journal of Machine Learning Research 7 (2006).
Abstract
While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works on hundreds of thousands of examples or hundreds of kernels to be combined, and helps with automatic model selection, improving the interpretability of the learning result. In the second part we discuss general speed-up mechanisms for SVMs, especially when used with sparse feature maps as they appear for string kernels, allowing us to train a string kernel SVM on a 10 million example real-world splice data set from computational biology. We integrated multiple kernel learning into our machine learning toolbox SHOGUN, for which the source code is publicly available at http://raetschlab.org/suppl/shogun.
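For orientation, the semi-infinite linear program (SILP) at the heart of the paper can be sketched as follows (a condensed restatement in the paper's notation, for K kernels and N training examples; see the paper for the full derivation):

\[
\max_{\beta \in \mathbb{R}^K,\; \theta \in \mathbb{R}} \theta
\quad \text{s.t.} \quad \beta_k \ge 0,\;\; \sum_{k=1}^{K} \beta_k = 1,\;\;
\sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta
\;\; \text{for all } \alpha \text{ with } 0 \le \alpha \le C\mathbf{1},\; \sum_i \alpha_i y_i = 0,
\]

with \( S_k(\alpha) = \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k_k(x_i, x_j) - \sum_i \alpha_i \). The constraint must hold for infinitely many \(\alpha\) (hence "semi-infinite"); the algorithm alternates between solving an LP in \((\beta, \theta)\) over the constraints collected so far and a standard SVM call that, for the current fixed \(\beta\), produces the most violated constraint.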
Multiple Kernel Learning Examples
These are Matlab examples for classification and regression. They make use of our machine learning toolbox SHOGUN, which therefore needs to be installed.
MKL for classifying christmas stars
% This script should enable you to rerun the experiment in the
% paper that we labeled with "christmas star".
%
% The task is to classify two star-shaped classes that share the
% midpoint. The difficulty of the learning problem depends on the
% distance between the classes, which is varied.
%
% Our model selection leads to a choice of C = 0.5. The model
% selection is not repeated inside this script.

% Preliminary settings:
C = 0.5;          % SVM parameter
cache_size = 50;  % cache per kernel in MB
svm_eps = 1e-3;   % svm epsilon
mkl_eps = 1e-3;   % mkl epsilon

no_obs = 2000;    % observations per class and split (4*no_obs points are generated in total)
k_star = 20;      % number of "leaves" of the stars
alpha = 0.3;      % noise level of the data

radius_star(:,1) = [4.1:0.2:10]';                       % increasing radius of the first class
radius_star(:,2) = 4*ones(length(radius_star(:,1)),1);  % fixed radius of the second class
% distance between the classes: diff(radius_star(:,1)-radius_star(:,2))

rbf_width = [0.01 0.1 1 10 100];  % different widths for the five RBF kernels

%%%%
%%%% Main loop: train MKL for every data set (the different distances between the stars)
%%%%
sg('send_command','loglevel ERROR');
sg('send_command','echo OFF');

for kk = 1:size(radius_star,1)
  % data generation
  fprintf('MKL for radius %+02.2f \n', radius_star(kk,1))
  dummy(1,:) = rand(1,4*no_obs);
  noise = alpha*randn(1,4*no_obs);
  dummy(2,:) = sin(k_star*pi*dummy(1,:)) + noise;                             % sine
  dummy(2,1:2*no_obs) = dummy(2,1:2*no_obs) + radius_star(kk,1);              % distance shift: first class
  dummy(2,(2*no_obs+1):end) = dummy(2,(2*no_obs+1):end) + radius_star(kk,2);  % distance shift: second class
  dummy(1,:) = 2*pi*dummy(1,:);
  x(1,:) = dummy(2,:).*sin(dummy(1,:));
  x(2,:) = dummy(2,:).*cos(dummy(1,:));
  train_y = [-ones(1,no_obs) ones(1,no_obs)];
  test_y  = [-ones(1,no_obs) ones(1,no_obs)];
  train_x = x(:,1:2:end);
  test_x  = x(:,2:2:end);
  clear dummy x;

  % train MKL
  sg('send_command','clean_kernels');
  sg('send_command','clean_features TRAIN');
  sg('add_features','TRAIN', train_x);   % set a training set for every kernel
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('set_labels','TRAIN', train_y);     % set the labels
  sg('send_command', 'new_svm LIGHT');
  sg('send_command', 'use_linadd 0');
  sg('send_command', 'use_mkl 1');
  sg('send_command', 'use_precompute 0');
  sg('send_command', sprintf('mkl_parameters %f 0', mkl_eps));
  sg('send_command', sprintf('svm_epsilon %f', svm_eps));
  sg('send_command', 'set_kernel COMBINED 0');
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(1)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(2)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(3)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(4)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(5)));
  sg('send_command', sprintf('c %1.2e', C));
  sg('send_command', 'init_kernel TRAIN');
  sg('send_command', 'svm_train');
  [b,alphas] = sg('get_svm');
  w(kk,:) = sg('get_subkernel_weights');

  % calculate train error
  sg('send_command','clean_features TEST');
  sg('add_features','TEST', train_x);
  sg('add_features','TEST', train_x);
  sg('add_features','TEST', train_x);
  sg('add_features','TEST', train_x);
  sg('add_features','TEST', train_x);
  sg('set_labels','TEST', train_y);
  sg('send_command', 'init_kernel TEST');
  sg('send_command', 'set_threshold 0');
  result.trainout(kk,:) = sg('svm_classify');
  result.trainerr(kk)   = mean(train_y~=sign(result.trainout(kk,:)));

  % calculate test error
  sg('send_command', 'clean_features TEST');
  sg('add_features','TEST', test_x);
  sg('add_features','TEST', test_x);
  sg('add_features','TEST', test_x);
  sg('add_features','TEST', test_x);
  sg('add_features','TEST', test_x);
  sg('set_labels','TEST', test_y);
  sg('send_command', 'init_kernel TEST');
  sg('send_command', 'set_threshold 0');
  result.testout(kk,:) = sg('svm_classify');
  result.testerr(kk)   = mean(test_y~=sign(result.testout(kk,:)));
end

disp('done. now w contains the kernel weightings and result the test/train outputs and errors')
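Since the script stores the learned kernel weightings of each run in w, a quick way to see how the weighting shifts among the five RBF widths as the class distance grows is to plot w over the distance. A minimal sketch, assuming the workspace left behind by the script above:

% plot the five learned kernel weights against the distance between the classes
dist = radius_star(:,1) - radius_star(:,2);   % class distance per run
plot(dist, w);
legend('width 0.01', 'width 0.1', 'width 1', 'width 10', 'width 100');
xlabel('distance between the classes');
ylabel('kernel weighting');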
MKL for regression
- sine wave:
% This script should enable you to rerun the experiment in the
% paper that we labeled "sine".
%
% In this regression task a sine wave is to be learned.
% We vary the frequency of the wave.

% Preliminary settings:

% Parameters for the SVMs.
C = 10;               % obtained via model selection (not included in the script)
cache_size = 10;
mkl_eps = 1e-3;       % threshold for precision
svm_eps = 1e-3;
svr_tube_eps = 1e-2;
debug = 0;

% Kernel widths for the 5 "basic" SVMs
rbf_width(1) = 0.005;
rbf_width(2) = 0.05;
rbf_width(3) = 0.5;
rbf_width(4) = 1;
rbf_width(5) = 10;

% data
f = [0.1:0.2:5];   % values for the different frequencies
no_obs = 1000;     % number of observations

if debug
  sg('send_command', 'loglevel ALL');
  sg('send_command', 'echo ON');
else
  sg('send_command', 'loglevel ERROR');
  sg('send_command', 'echo OFF');
end

for kk = 1:length(f)   % big loop over the different learning problems
  % data generation
  train_x = 1:(((10*2*pi)-1)/(no_obs-1)):10*2*pi;
  train_y = sin(f(kk)*train_x);
  kernels = {};

  % initialize MKL-SVR
  sg('send_command', 'new_svm SVRLIGHT');
  sg('send_command', 'use_mkl 1');
  sg('send_command', 'use_precompute 3');
  sg('send_command', sprintf('mkl_parameters %f 0', mkl_eps));
  sg('send_command', sprintf('c %f', C));
  sg('send_command', sprintf('svm_epsilon %f', svm_eps));
  sg('send_command', sprintf('svr_tube_epsilon %f', svr_tube_eps));
  sg('send_command', 'clean_features TRAIN');
  sg('send_command', 'clean_kernels');
  sg('set_labels', 'TRAIN', train_y);    % set labels
  sg('add_features','TRAIN', train_x);   % add features for every SVR
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('send_command', 'set_kernel COMBINED 0');
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(1)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(2)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(3)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(4)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(5)));
  sg('send_command', 'init_kernel TRAIN');
  sg('send_command', 'svm_train');
  weights(kk,:) = sg('get_subkernel_weights');
  fprintf('frequency: %02.2f   rbf-kernel-weights: %02.2f %02.2f %02.2f %02.2f %02.2f \n', f(kk), weights(kk,:))
end
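The printed weightings should indicate how the dominant RBF width changes with the frequency of the target sine. To extract that relationship directly, one can pick the largest-weight kernel per run; a small post-processing sketch operating on the weights matrix from the loop above:

% for every frequency, report the RBF width that received the largest weight
[maxw, idx] = max(weights, [], 2);
for kk = 1:length(f)
  fprintf('frequency %02.2f -> dominant kernel width %g\n', f(kk), rbf_width(idx(kk)));
end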
- linear and sine mixture:
% This script should enable you to rerun the experiment in the
% paper that we labeled "mixture linear and sine".
%
% The task is to learn a regression function where the true function
% is given by a mixture of 2 sine waves in addition to a linear trend.
% We vary the frequency of the second, higher-frequency sine wave.
%
% Setup: MKL on 10 RBF kernels of different widths on 1000 examples

% Preliminary settings

% kernel widths for the 10 basic SVMs
rbf_width(1)  = 0.001;
rbf_width(2)  = 0.005;
rbf_width(3)  = 0.01;
rbf_width(4)  = 0.05;
rbf_width(5)  = 0.1;
rbf_width(6)  = 1;
rbf_width(7)  = 10;
rbf_width(8)  = 50;
rbf_width(9)  = 100;
rbf_width(10) = 1000;

% SVM parameters
C = 1;
cache_size = 50;
mkl_eps = 1e-4;
svm_eps = 1e-4;
svm_tube = 0.01;
debug = 0;

% data
f = [0:20];      % parameter that varies the frequency of the second sine wave
no_obs = 1000;   % number of observations

if debug
  sg('send_command', 'loglevel ALL');
  sg('send_command', 'echo ON');
else
  sg('send_command', 'loglevel ERROR');
  sg('send_command', 'echo OFF');
end

for kk = 1:length(f)   % big loop
  % data generation
  train_x = 0:((4*pi)/(no_obs-1)):4*pi;
  trend = 2 * train_x * ((pi)/(max(train_x)-min(train_x)));
  wave1 = sin(train_x);
  wave2 = sin(f(kk)*train_x);
  train_y = trend + wave1 + wave2;

  % MKL learning
  kernels = {};
  sg('send_command', 'new_svm SVRLIGHT');
  sg('send_command', 'use_mkl 1');
  sg('send_command', 'use_precompute 0');   % no precomputation of the single kernels
  sg('send_command', sprintf('mkl_parameters %f 0', mkl_eps));
  sg('send_command', sprintf('c %f', C));
  sg('send_command', sprintf('svm_epsilon %f', svm_eps));
  sg('send_command', sprintf('svr_tube_epsilon %f', svm_tube));
  sg('send_command', 'clean_features TRAIN');
  sg('send_command', 'clean_kernels');
  sg('set_labels', 'TRAIN', train_y);    % set labels
  sg('add_features','TRAIN', train_x);   % add features for every basic SVM
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('add_features','TRAIN', train_x);
  sg('send_command', 'set_kernel COMBINED 0');
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(1)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(2)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(3)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(4)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(5)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(6)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(7)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(8)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(9)));
  sg('send_command', sprintf('add_kernel 1 GAUSSIAN REAL %d %f', cache_size, rbf_width(10)));
  sg('send_command', 'init_kernel TRAIN');
  sg('send_command', 'svm_train');
  weights(kk,:) = sg('get_subkernel_weights');
  fprintf('frequency: %02.2f   rbf-kernel-weights: %02.2f %02.2f %02.2f %02.2f %02.2f %02.2f %02.2f %02.2f %02.2f %02.2f \n', f(kk), weights(kk,:))
end
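Both regression scripts only inspect the learned sub-kernel weightings and never apply the trained machine. Predicting on new inputs follows the same TEST pattern as in the classification example above; a sketch under the assumption that svm_classify returns the real-valued SVR outputs for SVRLIGHT (the test grid here is chosen purely for illustration):

% evaluate the trained MKL-SVR; one TEST feature set per sub-kernel,
% exactly as was done for TRAIN
test_x = 0:((4*pi)/(no_obs-1)):4*pi;    % illustration: same grid as for training
sg('send_command', 'clean_features TEST');
for j = 1:10                            % 10 sub-kernels -> add the features 10 times
  sg('add_features','TEST', test_x);
end
sg('send_command', 'init_kernel TEST');
test_out = sg('svm_classify');          % real-valued predictions of the regressor
plot(train_x, train_y, test_x, test_out);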
15 Million Splice Dataset
The Splice dataset has the following format:
-1 TTCCAAACCCAAATAGTCAGAGTGCAAACCCTCACAGTAAACACAAGACTCTAAGCTCCCAGTGTGCCTCCAGCCATCTCCCCTGTTCATGTGGAGCTTTTCTCCTTTGCCAGCGGGGATCTGCAGCTATCTGGGAGTGCC
-1 TTGTTTATTGATTCTCTTTATCCTGGTGATATATTTGCAGGTTGCAGATATTTGTGAAGAAGAAGTGATATGGTTGGCTGTGTCCTCACCCAAGTCTCATCTTGAATTACAGCTCCCATAATCTCCATGTGTTGTGGGAAG
-1 AAAACAGGTACTAGAATTATATCTGTCATTGACCTAAAAAGGATAAAGAGAGTTGGCAGAAGATACAACTGCATGTAGGGGAATATGCTTTTCATTAACTCTGTAAAGTCGGGTTTTATCTGTTTGAAGGCTTATATAAGT
-1 CACCAGTGAACGGCCAAGTGACACGAGTGACACCATGAGCTTGGTGCCCTCTCCATCCCAAGCCAGAGGCGGAAGCCAGGCCCTTCCTCCCAGCCCAGACTCCTACATCCCAAACTTGAGCCATGGCACACATGCTGGGCA
-1 TCCACCCGCCCCGGCCTCCCGAAGTGCTGGGATTACCATGCCCAGCCCATCCAAATCTTTAGTGTTTTCCATCCATTTATCCCTTCCTCCATCTTGGAAGGACCCTAGAGCCAGACTTCCTGGGTTTTAAATCCTAATTCC
-1 TTCGTCAAGATGACTAATGATAAACAGCAAGCCAGGTGCTGAGATTTTTGGGGGGAATGAAGGGGGTATGAAAAGAAGAGGAAATACAGCGCAGGTCTGGGGGCCCGTCACAGCCCTTGCACTTGGCCTTGTGCTTCCGCT
-1 GGTTTGTGTGTACTTGCATACCCTGTAGTCTAGTACATTTTATATGGCTATGCTTTATAGAGCTTTAGAAAGTGAGGTCAAGCTAAATTTCTTGACTTTAAGGGTGGCCTGAATAGTTCACCATAATCTCATTATTGAAAC
-1 GTGAGAATCTGTTCTTGGAGGTTTCAGGGAAGTGTTTACAGGGAGATGTTGTTTGAGCTGAGACTTGAAGAGTAGGTGTATACCAGGCTGACAAGGTGACAAAATGGCCTTCTGTGGAGGAGGAAATAATCTGTGCAAAGT
-1 GTCCTCTCAACCAGGAAGGGAGCAGGGAGGGTGGCTGCAGGGCCGCAGGTGGGGAGGTGCAGGTGGGAGAGAGGCCCTCTGGTCTGGTCTGGTCTGGGCTGGGTGGTGCAGGGCAGATGGTCAGGCCCCAGCACATGCCAC
-1 GTAGCTGGGACTACAGATGCGTGCCACCACGCCCAGCTAATTTTTTGTATTTTTTTAAGTAGAGATGGGGTTTCACCGTGTTAGCTAGGATGGTCTCGGTCTCCTGACCTTGTGGTTTGCCCACCTCGACCTCCCAAAGTG
...
Here the first column is the label (+1 or -1), separated by a tab from a 141-character string consisting only of the characters A, C, G and T. Since the uncompressed file is about 2GB in size, we provide it as bzip2-compressed splits 0-10 (each about 50MB); a small example for reading the data follows the list:
- human_acceptor_splice_data.txt_00.bz2
- human_acceptor_splice_data.txt_01.bz2
- human_acceptor_splice_data.txt_02.bz2
- human_acceptor_splice_data.txt_03.bz2
- human_acceptor_splice_data.txt_04.bz2
- human_acceptor_splice_data.txt_05.bz2
- human_acceptor_splice_data.txt_06.bz2
- human_acceptor_splice_data.txt_07.bz2
- human_acceptor_splice_data.txt_08.bz2
- human_acceptor_splice_data.txt_09.bz2
- human_acceptor_splice_data.txt_10.bz2
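The following sketch reads the decompressed data into Matlab; the shell commands and the use of textscan are illustrative, and assume the splits have been concatenated back into a single file named as above:

% in a shell, first decompress and concatenate the splits, e.g.:
%   bunzip2 human_acceptor_splice_data.txt_*.bz2
%   cat human_acceptor_splice_data.txt_* > human_acceptor_splice_data.txt

fid = fopen('human_acceptor_splice_data.txt');
data = textscan(fid, '%d %s');   % per line: label, then the 141-character string
fclose(fid);
labels = double(data{1});        % column vector of +1 / -1 labels
sequences = char(data{2});       % N x 141 character matrix over A,C,G,T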