How to deal with NaN for the outcome variable/dependent variables

HuaiyuLiu · March 23, 2023, 7:42pm

Dear Stan community,

I am fitting a basic linear model to my data, following is the Stan model.

data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
}

However, around 50% of the y (dependent variable) are missing values (NaN). I don’t want to delete these missing values, as in my experiment the trial sequence is meaningful.

Hence I wonder if is it OK to keep these NaNs as they were in Stan when fitting the model to my data, or is there any tricky thing I need to do to deal with these NaNs?

Many thanks and very much appreciated to any suggestions/comments/solutions.

Best,

jonah · March 26, 2023, 4:25am

You can’t put NaNs into Stan but you can declare the missing data points as unknown (in the parameters block), which lets you take into account the uncertainty due to the missingness (and also estimate the missing values if you want):

https://mc-stan.org/docs/stan-users-guide/missing-data.html

HuaiyuLiu · March 26, 2023, 8:39pm

Hi @jonah ,

Thank you very much for your help!
I’ll take a look.

Best

HuaiyuLiu · March 27, 2023, 3:04pm

Hi @jonah ,

Sorry for my disturb, may I ask for your suggestion?
Following is my Stan script, which is syntactically correct, and I include random intercept and slope at the participant level.

data {
  int<lower=0> N_obs; // number of observed data points
  int<lower=0> N_mis; // number of missing data points 
  int<lower=1> N; // number of total data points (i.e., observed + missing)
  int<lower=1, upper=N> ii_obs[N_obs]; // indexes of the observed data
  int<lower=1, upper=N> ii_mis[N_mis]; // indexes of the missing data
  int<lower=1> L; // number of participants
  
  int<lower=1, upper=L> ID[N]; // participant's ID
  int<lower=1, upper=2> x[N]; // the predictor variable
  array[N_obs] real y_obs; // observed data
  
}


parameters {
  
  // missing data
  array[N_mis] real y_mis; // missing data

 // Hierarchical mu parameters (at the population-level)
  real mu_b0; // intercept
  real mu_b1; // slope
  
  // sigma parameters
  real<lower=0> sigma; // from the Normal distribution of y
  real<lower=0> sigma_b0; // from the hierarchical distribution of intercept
  real<lower=0> sigma_b1; // from the hierarchical distribution of slope
  
  // Normal distributions for non-center parameterization, i.e., Matt trick
  real z_b0[L];
  real z_b1[L]; 

  
}


transformed parameters {
  
  // data
  array[N] real y; 
  y[ii_obs] = y_obs;
  y[ii_mis] = y_mis;
  
  real b0[L];
  real b1[L];
  
  // trial-level parameters
  
  real u[N]; // mean estimate per trial
  
  for (l in 1:L) {
    b0[l] = mu_b0 + z_b0[l]*sigma_b0;
    b1[l] = mu_b1 + z_b1[l]*sigma_b1;
  }
  
  for (n in 1:N) {
    
    // the mean estimate per trial
    u[n] = b0[ID[n]] + b1[ID[n]]*x[n];
    
  }
  

}
  

model {
  
  mu_b0 ~ normal(0, 10);
  mu_b1 ~ normal(0, 10);
  
  
  sigma ~ gamma(2, 2);
  sigma_b0 ~ gamma(2, 2);
  sigma_b1 ~ gamma(2, 2);
  
  z_b0 ~ normal(0, 1);
  z_b1 ~ normal(0, 1);
  
  y ~ normal(u, sigma);
  
}
 
 
generated quantities {
  vector[N] log_lik;
  
  {for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | u[n], sigma);
  }
    
  }
  
}

I am not sure whether it is the right code, may I have your suggestions?
Thank you very much in advance.

Best

jonah · March 28, 2023, 2:49am

I haven’t had a chance to look at the whole model in detail, but the code related to the missing data points looks good! (Feel free to start a new topic if you have different questions about the model structure or problems fitting it.)

This is ok but uses deprecated syntax. You can update it to array[N_obs] int<lower=1, upper=N> ii_obs, etc.

HuaiyuLiu · March 28, 2023, 8:21am

Hi @jonah ,

I see, thank you very much for your reply!
I will start a new topic if I encounter problems fitting it.

Topic		Replies	Views
How to handle missing values in Stan Modeling	2	725	November 30, 2021
Can't understand an example for handling missing value in rstan Modeling rstan , missing-data	1	831	June 26, 2022
Can Stan handle Nans? Built in Size Constraints? Modeling	3	580	October 4, 2018
Missed trials (nan) not to be added to the end of predictions but to their actual place Developers stan	2	472	March 15, 2020
Missing response model (section 10.3 of Stan manual) Modeling	11	2449	May 24, 2017

How to deal with NaN for the outcome variable/dependent variables

Related topics