Parameter recovery issues with normal mixture model

Thomass · June 15, 2022, 1:35pm

Hello everyone,

I’m trying to fit my model (a class of mixture model) to simulated data, and I have trouble in recovering the model parameters. To this end, I followed the 9th suggestion provided by the Stan Team and moved all model parameters (i.e., true parameters) to the data block except for the phi parameter. Here is my Stan code:

Stan code

data{
  int<lower = 1> D;  // number of dimensions of person parameters
  int<lower = 1> N;  // number of examinees
  int<lower = 1> K;  // number of items
  int<lower = 1> Ntot; // number of data points
  int<lower = 1> jj[Ntot];  // item id
  int<lower = 1> ii[Ntot];  // person id
  real logRT[Ntot];  // log response times
  real gamma[K];  
  real beta[K];
  real simga;
  real beta_c;
  real sigma_c;
  real tua[N];
}

parameters{
 vector[N] phi;
}

transformed parameters{
  real<lower=0,upper=1> pDelta[Ntot];
  for(n in 1:Ntot){
    pDelta[n] = 1/(1+exp(-phi[ii[n]]+gamma[jj[n]])); 
  }
}

model{
  // prior person parameter
  phi ~ normal(0,1);
  //  target distribution
  for (n in 1:Ntot) {
    target += log_sum_exp(
        log(pDelta[n])+
           normal_lpdf(logRT[n]|beta_c,sigma_c),
        log1m(pDelta[n])+
            normal_lpdf(logRT[n]|(beta[jj[n]]-tua[ii[n]]),simga));
    }
}

What was strange, however, was that the phi parameter still cannot be recovered well, which can be seen visually in the graphical posterior predictive checks (from ShinyStan package).

Below is my model as well as the code to simulate the data in R.

Model

P\left( \boldsymbol{t|\varphi ,\tau ,\beta ,}\sigma ,\beta _c,\sigma _c \right) =\prod_{i=1}^N{\prod_{j=1}^K{\left[ P\left( \varDelta _{ij}=1 \right) \times normal\left( t_{ij}|\beta _c,\sigma _{c}^{2} \right) +\left( 1-P\left( \varDelta _{ij}=1 \right) \right) \times normal\left( t_{ij}|\beta _j-\tau _i,\sigma ^2 \right) \right]}}

and

P\left( \varDelta _{ij}=1 \right) =\frac{1}{1+\exp \left( -\varphi _i+\gamma _j \right)}

Simulation data

set.seed(1)

library("MASS")
library("rstan")     

K = 30   #  number of items
N = 500  #  number of examinees    

### 1. simulate data
### 1.1 person parameters
sigma <- matrix(c(1,0,0,0.25),2,2)
person<-mvrnorm(N,rep(0,2),sigma)       
phi <- person[,1]                                    
tua <- person[,2]                                    

### 1.2 item parameters
gamma <- rnorm(K,2.25,1)
beta <- runif(K,0.25,0.75)
sigma <- 0.5  
  
beta_c <- -1         
sigma_c <- sqrt(0.1) 

### 1.3 simulating responses time for each examinee
logRT_Matrix <- matrix(data=NA,nrow=N,ncol=K) 
  
for(i in 1:N){
  for(j in 1:K){ 
      pDelta<- 1/(1+exp(-phi[i]+gamma[j])) 
      
      logRT_cheat <- rnorm(1,beta_c,sigma_c) 
      logRT_norm <- rnorm(1,beta[j]-tua[i],sigma)
        
      logRT_Matrix[i,j] <- pDelta*logRT_cheat + (1-pDelta)*logRT_norm  
  } 
}

### 2. fit model in stan

logRT_vector <- as.vector(logRT_Matrix)
ii <- rep(1,K)%x%c(1:N)
jj <- c(1:K)%x%rep(1,N)
  
data_list <- list(D=2,N=N,K=K,Ntot=K*N,jj=jj,ii=ii,logRT=logRT_vector,
                   gamma=gamma,beta=beta,sigma=sigma,
                   beta_c=beta_c,sigma_c=sigma_c,tua=tua) 
  
fit <- stan(file = "model.stan", data=data_list,iter=5000,chains=3,warmup = 2500,cores = 4)

Could you please give me some suggestions, I am very grateful.

iwv_rna · June 15, 2022, 2:32pm

Hi Thomass,

I think it is a problem with your simulation, not your Stan code. In particular, if I am understanding your generative model correctly, then you are not properly simulating values for logRT_Matrix . The generative model is that there is probability pDelta that the value for logRT_Matrix is drawn from one normal distribution, and probability 1-pDelta that is drawn from the other normal distribution. So the simulation of logRT_Matrix should look more like this:

### 1.3 simulating responses time for each examinee
logRT_Matrix <- matrix(data=NA,nrow=N,ncol=K) 
  
library(purrr) # load for rbernoulli function

for(i in 1:N){
  for(j in 1:K){ 
      pDelta<- 1/(1+exp(-phi[i]+gamma[j])) 
      
      if(rbernoulli(n = 1, p = pDelta)) {
              logRT_Matrix[i,j] <- rnorm(1,beta_c,sigma_c) 
      }else{
             logRT_Matrix[i,j] <- rnorm(1,beta[j]-tua[i],sigma)
      }

  } 
}

When I make this change and fit the simulated data with your model, the phi parameter is well estimated. I decreased the size of the simulation to make it run faster, but here is the result for K = 15, N = 100:

Let me know if I misunderstood anything about your generative model and if this helped,
Isaac

iwv_rna · June 15, 2022, 2:51pm

Also confirmed that estimates got better with more data; here are results with K = 50

And posterior predictive checks look good

Thomass · June 16, 2022, 6:51am

Awesome response, thank you!

Topic		Replies	Views
Having trouble recovering some model parameters when fitting the simulated data Modeling techniques , fitting-issues , irt	4	1114	June 11, 2022
Estimating parameters of two normal distribution, mixed at proportions Modeling	5	1414	May 10, 2019
Recovering parameters of fake model Modeling	1	658	April 27, 2022
Problem recovering parameters from mixture Rasch with Dirichlet prior Modeling	1	382	April 14, 2020
Unable to recover simulated parameters Modeling specification	10	1841	February 11, 2018

Parameter recovery issues with normal mixture model

Related topics