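For two classes, the per-variable score in sortVariablesANOVA is the two-sided Welch (unequal-variance) two-sample t-test that `ttest2` runs when variances are not assumed equal. The notation below is introduced here for illustration and does not appear in the code: x̄_c, s_c² and n_c are the mean, sample variance and number of samples of class c ∈ {0, 1}. In standard form, the statistic, its Welch-Satterthwaite degrees of freedom ν, and the two-sided p-value are:

```latex
t = \frac{\bar{x}_0 - \bar{x}_1}{\sqrt{s_0^2/n_0 + s_1^2/n_1}}, \qquad
\nu = \frac{\left(s_0^2/n_0 + s_1^2/n_1\right)^2}
           {\dfrac{(s_0^2/n_0)^2}{n_0 - 1} + \dfrac{(s_1^2/n_1)^2}{n_1 - 1}}, \qquad
p = I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right)
```

Here I_x(a, b) is the regularized incomplete beta function. Because ν differs from one variable to the next, ranking by ascending p is not always the same as ranking by descending |t|.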
% sortVariablesANOVA: ANOVA-style ranking of variables
%
% syntax: [bestVariables, bestToWorst, p] = sortVariablesANOVA( featureVect, ...
%             classLabels, topVarsToKeep )
%
% Each variable (row of featureVect) is scored with a two-sided, two-sample
% t-test that does not assume equal variances (Welch's test, which for two
% classes plays the role of a one-way ANOVA). Variables are ranked by
% ascending p-value, so the first ones separate the classes best.
%
% Inputs:
%   featureVect:   all data samples, arranged as (dim x numSamples)
%   classLabels:   class label of each sample (0 for not learned, 1 for
%                  learned, 2 for unsure). Samples labeled 2 are not used.
%   topVarsToKeep: number of best variables to return (default 10)
%
% Outputs:
%   bestVariables: indices of the topVarsToKeep variables that best separate
%                  the classes
%   bestToWorst:   indices of all non-redundant variables, ordered from best
%                  to worst
%   p:             p-values associated with the indices in bestToWorst

function [bestVariables, bestToWorst, p] = sortVariablesANOVA( featureVect, ...
    classLabels, topVarsToKeep )

% default number of variables to return
if nargin < 3 || isempty( topVarsToKeep )
    topVarsToKeep = 10;
end

dim = size( featureVect, 1 );   % number of variables
p = zeros( dim, 1 );

% discard samples labeled 2 (unsure)
featureVect(:, classLabels == 2) = [];
classLabels( classLabels == 2 ) = [];

% Two-sided Welch t-test on each variable at the 5% significance level.
% The first output (h = 1 rejects the null hypothesis of equal class means)
% is not needed for ranking. p is the probability of observing a difference
% in means at least this large if the null hypothesis were true, so a small
% p marks a variable whose class means differ.
for i2 = 1:dim
    [~, p(i2,1)] = ttest2( featureVect( i2, classLabels == 0 )', ...
        featureVect( i2, classLabels == 1 )', ...
        'Alpha', 0.05, 'Tail', 'both', 'Vartype', 'unequal' );
end

% sort variables from best (smallest p) to worst
[p, bestToWorst] = sort( p, 1, 'ascend' );

% remove redundant variables; findRedundancies (defined elsewhere) returns
% the positions, in this sorted order, of the variables to keep
unqIdx = findRedundancies( featureVect( bestToWorst, : ) );
bestToWorst = bestToWorst( unqIdx );
p = p( unqIdx );   % keep p aligned with bestToWorst

% return at most topVarsToKeep variables
bestVariables = bestToWorst( 1:min( topVarsToKeep, numel( bestToWorst ) ) );
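The following is a minimal sketch of the same ranking. It uses only core MATLAB (it should also run in GNU Octave), which helps if the Statistics Toolbox function ttest2 is not available. For each row it computes the Welch statistic and degrees of freedom, converts them to a two-sided p-value with betainc, and then sorts ascending. The data, the labels and the topVarsToKeep value are made up for illustration. The redundancy-removal step is omitted because findRedundancies is not shown here.

```matlab
% Illustrative, deterministic data (an assumption, not from the original):
% 5 variables x 40 samples; classes 0/1; the last two samples are "unsure" (2).
X = sin((1:5)' * (1:40));                   % bounded pseudo-noise in [-1, 1]
labels = [zeros(1, 20), ones(1, 18), 2, 2];
X(3, labels == 1) = X(3, labels == 1) + 3;  % make variable 3 separate the classes

% Drop the "unsure" samples, as sortVariablesANOVA does
X(:, labels == 2) = [];
labels(labels == 2) = [];

A = X(:, labels == 0);                      % class 0 samples, one row per variable
B = X(:, labels == 1);                      % class 1 samples
nA = size(A, 2);
nB = size(B, 2);
seA2 = var(A, 0, 2) / nA;                   % squared standard error of each class mean
seB2 = var(B, 0, 2) / nB;

t  = (mean(A, 2) - mean(B, 2)) ./ sqrt(seA2 + seB2);                 % Welch t per variable
nu = (seA2 + seB2).^2 ./ (seA2.^2 / (nA - 1) + seB2.^2 / (nB - 1));  % Welch-Satterthwaite df

% Two-sided p-value: P(|T| >= |t|) = I_{nu/(nu+t^2)}(nu/2, 1/2)
p = betainc(nu ./ (nu + t.^2), nu / 2, 0.5);

[pSorted, bestToWorst] = sort(p, 1, 'ascend');   % best (smallest p) first
topVarsToKeep = 2;                               % illustrative value
bestVariables = bestToWorst(1:min(topVarsToKeep, numel(bestToWorst)));
```

Since ttest2 with unequal variances uses the same statistic and Satterthwaite degrees of freedom, these p-values should agree with its output up to rounding. This makes the sketch useful for sanity-checking the ranking.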