function [cluster,mu,kseed] = kmeans(data,k,kseed) % KMEANS: [cluster,mu,kseed] = kmeans(data,k,kseed) % % Cluster the N x d data into k clusters using the % k means algorithm. Return the identities of the % clusters for each data point and the cluster means % in cluster and mu respectively. % % NOTE: its possible for a cluster to end up with 0 % data points after a "reassignment to nearest mean" % step - this is handled by restarting the process by % randomly picking k new means and hoping that the % problem does not continue to occur (potential for % an infinite loop here !!) % t0 = clock; [N d] = size(data); % initialize the vectors containing the cluster assignments oldcluster = zeros(N,1); cluster = ones(N,1); if(k==1) mu = mean(data); end if k>1 % select k initial random seeds if nargin<3 % if no seed is supplied kseed = rand('seed'); index = randperm(N); else % use the seed that is supplied rand('seed',kseed); index = randperm(N); end index = index(1:k); mu = data(index,:); %end % continue to loop until the the cluster assigments do not change while sum(oldcluster - cluster) ~= 0 dist = []; % calculate the Euclidean distance between each point and each cluster mean for i=1:k x = euclid(data,mu(i,:)); dist = [dist,x]; end % find the closest cluster mean for each point and assign it to that cluster oldcluster = cluster; [mindist cluster] = min(dist'); sum(mindist); % Based on the new assignment, recalculate the cluster means. for i=1:k if sum(cluster==i)>1 data(cluster==i,:); mu(i,:) = mean(data(cluster==i,:)); % this "else" is to handle a problem in MATLAB when it takes the % the mean of a single vector. elseif sum(cluster==i)>0 mu(i,:) = data(cluster==i,:); end end % If we have cluster(s) with no points then randomly start % over and regenerate new means. % flag=0; for i=1:k if(sum(cluster==i)==0) flag = 1; end end if flag==1 kseed = rand('seed'); index = randperm(N); index = index(1:k); mu = data(index,:); fprintf('Cluster of size 0: k = %d\n',k); cluster = zeros(N,1); else cluster = cluster'; end end end %etime(clock,t0)