Two component mixture model, strange result from stan

I have been struggling with the following code for a while and decided to seek help from experts here.

I am fitting a two component mixture model, one component normal and one component laplace. When I fit data that is generated from normal distribution with a large sample size of N=1000 or N=5000, I would expect the weight on the normal component to be close to 1. However, stan result always gives a weight close to 0.65. The same is true if the data is generated from a laplace distribution.

The code and result are

liao1 = "data {
int N; // number of observations
real theta[N];
}

parameters {
real<lower=0.0, upper= 20/sqrt(2)> tau1; //sqrt(2)*tau1 is the standard deviation of laplace distribution
real<lower=0.0, upper=20.0> tau2; //tau2 is the standard deviation of normal distribution
real<lower=0.0, upper=1.0> lambda;
}

model {

real term1;
real term2;

tau1 ~ uniform(0.0, 20/sqrt(2.));
tau2 ~ uniform(0.0, 20.0);

lambda ~ uniform(0.0, 1); //lambda<0.0 and lambda>=0 represents binary choice of laplace distribution and normal distribution

target += log_mix(lambda, double_exponential_lpdf(theta |0.0, tau1), normal_lpdf(theta | 0.0, tau2));
}
"

N = 1000
theta = rnorm(N, 0, 2)

my.data = list(N=N, theta=theta)

stan(model_code = liao1, data=my.data, iter=10000, chains=1)

Inference for Stan model: ae78845bd99245d1341f7cb022b70ce0.
1 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=5000.

       mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat

tau1 7.00 0.06 4.04 0.36 3.52 6.95 10.48 13.75 5000 1
tau2 2.01 0.00 0.05 1.93 1.98 2.01 2.04 2.10 4437 1
lambda 0.33 0.00 0.24 0.01 0.12 0.28 0.50 0.85 4709 1
lp__ -2118.42 0.03 1.38 -2121.92 -2119.08 -2118.05 -2117.39 -2116.82 2027 1

Samples were drawn using NUTS(diag_e) at Sat May 04 20:25:49 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Thanks in advance.

Jason Liao

I’m fairly certain log_mix does not do what you think it is doing. Iirc, in the stan manual in the section on mixtures/clustering models, it talks about this.

You need to loop over the thetas.
Currently, you are estimating:

p(\theta) = (1-\lambda)\prod_{i=1}^Np(\theta|\tau_1) + \lambda\prod_{i=1}^Np(\theta|\tau_2)

What you probably want is:

p(\theta) = \prod_{i=1}^N \left[(1-\lambda)p(\theta_i|\tau_1) + \lambda p(\theta_i|\tau_2)\right]

Which means you want to loop over each theta and call log_mix. E.g.,

for(n in 1:N){
  target += log_mix(lambda, double_exponential_lpdf(theta[n]|0, tau1), normal_lpdf(theta[n]|0, tau2));
}

Someone correct me if I’m wrong.

Stephen, this is amazing. I changed the code to what you suggested and got the answer I desperately wanted.

liao1 = ’

data {
int N; // number of observations
real theta[N];
}

parameters {
real<lower=0.0, upper= 20/sqrt(2)> tau1; //sqrt(2)*tau1 is the standard deviation of laplace distribution
real<lower=0.0, upper=20.0> tau2; //tau2 is the standard deviation of normal distribution
real<lower=0.0, upper=1.0> lambda;
}

model {

real term1;
real term2;

tau1 ~ uniform(0.0, 20/sqrt(2.));
tau2 ~ uniform(0.0, 20.0);

lambda ~ uniform(0.0, 1); //lambda<0.0 and lambda>=0 represents binary choice of laplace distribution and normal distribution

for(n in 1:N){
target += log_mix(lambda, double_exponential_lpdf(theta[n]|0, tau1), normal_lpdf(theta[n]|0, tau2));
}
}

N = 1000
theta = rnorm(N, 0, 2)

#theta = rexp(N)
#theta = ifelse(rnorm(N)>0, theta, -theta)

my.data = list(N=N, theta=theta)

stan(model_code = liao1, data=my.data, iter=10000, chains=1)

     mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff

tau1 1.84 0.09 2.39 0.13 0.70 1.15 1.69 10.57 655
tau2 1.95 0.00 0.05 1.86 1.92 1.95 1.99 2.06 1236
lambda 0.05 0.00 0.05 0.00 0.01 0.04 0.08 0.18 767
lp__ -2082.41 0.06 1.58 -2086.27 -2083.28 -2082.10 -2081.16 -2080.35 678

One million thanks.

Jason