Parameter recovery issues with normal mixture model

Hello everyone,

I’m trying to fit my model (a class of mixture model) to simulated data, and I have trouble recovering the model parameters. To this end, I followed the 9th suggestion provided by the Stan team and moved all of the true model parameters to the data block, except for the phi parameter. Here is my Stan code:

Stan code

data {
  int<lower = 1> D;     // number of dimensions of person parameters
  int<lower = 1> N;     // number of examinees
  int<lower = 1> K;     // number of items
  int<lower = 1> Ntot;  // number of data points
  int<lower = 1> jj[Ntot];  // item id
  int<lower = 1> ii[Ntot];  // person id
  real logRT[Ntot];  // log response times
  real gamma[K];
  real beta[K];
  real sigma;
  real beta_c;
  real sigma_c;
  real tua[N];
}
parameters {
  vector[N] phi;
}
transformed parameters {
  real<lower=0,upper=1> pDelta[Ntot];
  for (n in 1:Ntot) {
    pDelta[n] = 1 / (1 + exp(-phi[ii[n]] + gamma[jj[n]]));
  }
}
model {
  // prior on the person parameter
  phi ~ normal(0, 1);
  // mixture target distribution
  for (n in 1:Ntot) {
    target += log_sum_exp(
      log(pDelta[n])   + normal_lpdf(logRT[n] | beta_c, sigma_c),
      log1m(pDelta[n]) + normal_lpdf(logRT[n] | beta[jj[n]] - tua[ii[n]], sigma));
  }
}
Strangely, however, the phi parameter still could not be recovered well, which can be seen visually in the graphical posterior predictive checks (from the ShinyStan package).

Below is my model as well as the code to simulate the data in R.


P\left( \mathbf{t} \mid \boldsymbol{\varphi}, \boldsymbol{\tau}, \boldsymbol{\beta}, \sigma, \beta_c, \sigma_c \right) = \prod_{i=1}^{N}{\prod_{j=1}^{K}{\left[ P\left( \varDelta_{ij}=1 \right) \times \mathrm{normal}\left( t_{ij} \mid \beta_c, \sigma_{c}^{2} \right) + \left( 1 - P\left( \varDelta_{ij}=1 \right) \right) \times \mathrm{normal}\left( t_{ij} \mid \beta_j - \tau_i, \sigma^2 \right) \right]}}


P\left( \varDelta _{ij}=1 \right) =\frac{1}{1+\exp \left( -\varphi _i+\gamma _j \right)}
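For intuition, the two equations above can be written directly as a small R sketch (the function names `pDelta` and `mix_density` are illustrative, not from the original post):

```r
# mixing probability P(Delta_ij = 1), as in the second equation
pDelta <- function(phi_i, gamma_j) 1 / (1 + exp(-phi_i + gamma_j))

# mixture density of one log response time t_ij, as in the first equation
mix_density <- function(t_ij, phi_i, tau_i, beta_j, gamma_j,
                        sigma, beta_c, sigma_c) {
  p <- pDelta(phi_i, gamma_j)
  p * dnorm(t_ij, mean = beta_c, sd = sigma_c) +
    (1 - p) * dnorm(t_ij, mean = beta_j - tau_i, sd = sigma)
}
```

The full likelihood is then just the product of `mix_density` over all i and j.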

Simulation data



K = 30   #  number of items
N = 500  #  number of examinees    

### 1. simulate data
### 1.1 person parameters
Sigma_p <- matrix(c(1,0,0,0.25),2,2)                 # covariance of (phi, tau)
person <- MASS::mvrnorm(N, mu = c(0,0), Sigma = Sigma_p)  # bivariate normal draw
phi <- person[,1]                                    
tua <- person[,2]                                    

### 1.2 item parameters
gamma <- rnorm(K,2.25,1)
beta <- runif(K,0.25,0.75)
sigma <- 0.5  
beta_c <- -1         
sigma_c <- sqrt(0.1) 

### 1.3 simulating response times for each examinee
logRT_Matrix <- matrix(data=NA,nrow=N,ncol=K) 
for(i in 1:N){
  for(j in 1:K){ 
      pDelta <- 1/(1+exp(-phi[i]+gamma[j])) 
      logRT_cheat <- rnorm(1,beta_c,sigma_c) 
      logRT_norm <- rnorm(1,beta[j]-tua[i],sigma)
      logRT_Matrix[i,j] <- pDelta*logRT_cheat + (1-pDelta)*logRT_norm  
  }
}
### 2. fit model in stan
library(rstan)

logRT_vector <- as.vector(logRT_Matrix)
ii <- rep(1,K)%x%c(1:N)
jj <- c(1:K)%x%rep(1,N)
data_list <- list(D=2,N=N,K=K,Ntot=K*N,jj=jj,ii=ii,logRT=logRT_vector,
                  gamma=gamma,beta=beta,sigma=sigma,beta_c=beta_c,
                  sigma_c=sigma_c,tua=tua)
fit <- stan(file = "model.stan", data=data_list,iter=5000,chains=3,warmup = 2500,cores = 4) 
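As a side note, the `%x%` (Kronecker product) index construction can be sanity-checked on a tiny standalone example, since `as.vector()` stacks a matrix column by column (dimensions here are made up for illustration):

```r
# check that ii/jj line up with as.vector()'s column-major stacking
N <- 3; K <- 2
M <- matrix(seq_len(N * K), nrow = N, ncol = K)
v <- as.vector(M)
ii <- rep(1, K) %x% c(1:N)  # person id of each element of v
jj <- c(1:K) %x% rep(1, N)  # item id of each element of v
all(v == M[cbind(ii, jj)])  # TRUE: every v[n] equals M[ii[n], jj[n]]
```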

Could you please give me some suggestions? I would be very grateful.

Hi Thomass,

I think it is a problem with your simulation, not your Stan code. In particular, if I am understanding your generative model correctly, you are not properly simulating values for logRT_Matrix. The generative model says that with probability pDelta the value of logRT_Matrix is drawn from one normal distribution, and with probability 1-pDelta it is drawn from the other normal distribution. So the simulation of logRT_Matrix should look more like this:

### 1.3 simulating response times for each examinee
logRT_Matrix <- matrix(data=NA,nrow=N,ncol=K) 
library(purrr) # load for rbernoulli function

for(i in 1:N){
  for(j in 1:K){ 
      pDelta <- 1/(1+exp(-phi[i]+gamma[j])) 
      if(rbernoulli(n = 1, p = pDelta)) {
              logRT_Matrix[i,j] <- rnorm(1,beta_c,sigma_c) 
      } else {
              logRT_Matrix[i,j] <- rnorm(1,beta[j]-tua[i],sigma)
      }
  }
}

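The difference between the two simulation schemes is easy to see numerically: averaging the two component draws shrinks the variance, while a Bernoulli-indexed draw preserves the mixture. A standalone sketch with made-up component parameters:

```r
set.seed(1)
p <- 0.5; m1 <- -1; s1 <- 0.3; m2 <- 1; s2 <- 0.5
n <- 1e5

# original scheme: weighted average of draws from both components
avg <- p * rnorm(n, m1, s1) + (1 - p) * rnorm(n, m2, s2)

# corrected scheme: pick exactly one component per observation
z   <- rbinom(n, 1, p)
mix <- ifelse(z == 1, rnorm(n, m1, s1), rnorm(n, m2, s2))

var(avg)  # much smaller than the true mixture variance
var(mix)  # near p*s1^2 + (1-p)*s2^2 + p*(1-p)*(m1-m2)^2
```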
When I make this change and fit the simulated data with your model, the phi parameter is well estimated. I decreased the size of the simulation to make it run faster, but here is the result for K = 15, N = 100:

Let me know if I misunderstood anything about your generative model and if this helped,


Also confirmed that estimates got better with more data; here are results with K = 50

And posterior predictive checks look good


Awesome response, thank you!