In the following code, I fit a Gaussian mixture model (GMM) to some randomly sampled data. I do this twice. Each time, the data represent two well-separated Gaussians; the only difference is the seed I use for the random number generator.
N = 100000;
EFFECT_SIZE = 5;
seedList = [1 6];
for s = seedList
    rng(s)
    X = [randn(N,1); randn(N,1)+EFFECT_SIZE];
    figure
    hist(X,101)
    GMModel = fitgmdist(X,2)
end
NOTE:-
Hi cyclist,
the default starting values are selected from the data at random, and you have discovered that this sometimes does not work well. There's an argument 'replicates' that you can use to have it try this multiple times and deliver the best fit. However, in more recent releases including R2014b there is a new argument 'Start','plus' that uses a better starting method, one based on the kmeans++ algorithm for clustering. Here's a variant of your code that shows the relative performance of the default random start and the 'plus' start:
N = 100000;
EFFECT_SIZE = 5;
seedList = 1:20;
means1 = zeros(length(seedList),2);
means2 = zeros(length(seedList),2);
for s = seedList
    s
    rng(s)
    X = [randn(N,1); randn(N,1)+EFFECT_SIZE];
    rng(s) % randomness is used in the fit also
    GMModel = fitgmdist(X,2);
    means1(s,:) = GMModel.mu';
    rng(s)
    GMModel = fitgmdist(X,2,'start','
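For reference, the two options mentioned above ('replicates' and 'Start','plus') could be tried on a single seed roughly as follows. This is only a minimal sketch, not the original answer's code: the seed (6, taken from the question), the replicate count of 5, and the variable names gmRep and gmPlus are illustrative assumptions.

```matlab
% Sketch: fit the same two-Gaussian data with 'Replicates' and with 'Start','plus'.
N = 100000;
EFFECT_SIZE = 5;

rng(6)                                   % one of the seeds from the question
X = [randn(N,1); randn(N,1)+EFFECT_SIZE];

% Several random starts; fitgmdist keeps the fit with the largest likelihood.
% Assumption: 5 replicates is an arbitrary illustrative choice.
rng(6)
gmRep = fitgmdist(X,2,'Replicates',5);

% A single fit initialised with the kmeans++-based starting method.
rng(6)
gmPlus = fitgmdist(X,2,'Start','plus');

% Sort the component means so the two fits can be compared side by side;
% the true means of the simulated data are 0 and EFFECT_SIZE.
meansRep  = sort(gmRep.mu)';
meansPlus = sort(gmPlus.mu)';
disp(meansRep)
disp(meansPlus)
```

If both fits behave well, each row should be close to [0 5]; a fit that got stuck in a poor solution shows up as means far from those values.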