Missing data

Hi all
How can I solve the problem of missing data (NA) in this example?

data <- read.table('data.txt', sep="")
mle <- read.table('mle.txt', sep="")
ns <- ncol(data)
nt <- nrow(data)

# pass prior standard devs as data

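# y is declared as vector[nt] y[ns] in Stan, i.e. an ns x nt array, so transpose the nt x ns table
gev_data = list(ns=ns, nt=nt, y=t(as.matrix(data)))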

initf_manual = list(list(mu=mle[,1], sigma=mle[,2], xi=rep(0.01,ns)))

# run stan

model_file = "model_loc_0.stan"
stan_model = stan(model_file, data = gev_data, init = initf_manual, iter = 3500,
                  chains = 1, control = list(adapt_delta = 0.85))

The Stan model:

functions{
  vector gev_v_log_v(vector y, real mu, real sigma, real xi) {
    real inv_xi;
    real neg_inv_xi; 
    real inv_xi_p1; 
    vector[rows(y)] z;
    vector[rows(y)] lp;
    int N;
    N = rows(y);
    z = 1 + (y - mu) * xi / sigma;
    inv_xi = 1 / xi;
    neg_inv_xi = -inv_xi; 
    inv_xi_p1 = 1 + inv_xi; 
    
    for(n in 1:N){
        lp[n] = log(sigma) + inv_xi_p1*log(z[n]) + pow(z[n],neg_inv_xi);
      }
    
    return -lp;
  }
}

data {
  int<lower=1> ns; // number of sites
  int<lower=1> nt; // number of times
  vector [nt] y[ns]; //data
}

parameters {
  real<lower=0> mu[ns];
  real<lower=0> sigma[ns];
  real<lower=-1,upper=1> xi[ns];
}

model {
  vector[nt] lp[ns];

  for (i in 1:ns){
       lp[i] = gev_v_log_v(y[i], mu[i], sigma[i], xi[i]);
  }
}

Attachments: data.txt (46.2 KB), mle.txt (2.5 KB)

Hi! Have a look here for some general advice on how to approach this: https://mc-stan.org/docs/2_20/stan-users-guide/missing-data.html

I think the typical route is to declare your missing data separately from your observed data. The observed data are declared in the data block; the missing data are declared in the parameters block and treated as parameters with the same distribution as the observed data. The example at the very top of the link above should give you a good sense of what to do.

Note also that, as far as I can see, your model is incomplete: the model block computes lp but never adds it to the log density with a sampling statement or target +=.
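
A minimal sketch of both points for a single site: the observed values go in the data block, the missing ones become parameters, and the log density is added with target +=. It reuses the gev_v_log_v formula from the question, written in vectorised form. N_obs, N_mis, y_obs and y_mis follow the naming of the Users' Guide example and are not part of the original model. This is an illustration under those assumptions, not a tested replacement for the model above.

```stan
functions {
  // Pointwise GEV log density, same formula as in the question.
  // Only valid for xi != 0 and 1 + xi * (y - mu) / sigma > 0.
  vector gev_v_log_v(vector y, real mu, real sigma, real xi) {
    vector[rows(y)] z = 1 + (y - mu) * xi / sigma;
    return -(log(sigma) + (1 + 1 / xi) * log(z) + exp(-log(z) / xi));
  }
}

data {
  int<lower=0> N_obs;     // number of observed values
  int<lower=0> N_mis;     // number of missing values
  vector[N_obs] y_obs;    // observed values only (NA removed before calling Stan)
}

parameters {
  real<lower=0> mu;
  real<lower=0> sigma;
  real<lower=-1, upper=1> xi;
  vector[N_mis] y_mis;    // missing values, estimated like any other parameter
}

model {
  // Observed and missing values share the same distribution.
  target += sum(gev_v_log_v(y_obs, mu, sigma, xi));
  target += sum(gev_v_log_v(y_mis, mu, sigma, xi));
}
```

Two caveats. The GEV density requires 1 + xi * (y - mu) / sigma > 0, so initial values for y_mis may have to be set by hand to satisfy it. And when there are no predictors, as here, the imputed y_mis add no information about mu, sigma and xi: the posterior for those parameters is the same as when the missing values are simply dropped.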

Thank you so much, Emir, for your clarifications.
But the problem is that all of the observed data sets contain missing data.

You could use the R package Amelia II to impute your dataset. That gives you a completed dataset to work with, and you can use it as a comparison for your imputation in Stan.

You could also have a look at Bayesian PPCA for missing value imputation in the case of multivariate data with missing values. I personally don’t know a way to do imputation for multivariate data in Stan directly.

Thank you so much, I will try it.

Thank you so much, Emir.
I will try to use it.

Hi Lazhar,

Is there missingness only in the outcome, or also in one or more predictors?

Hi Torkar,
Thank you so much for your message and help.
I solved the problem by removing the missing values and treating the remaining data as one large vector.
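
For readers who land here later, this approach can be sketched in Stan roughly as follows. It assumes the non-missing values from all sites are stacked into one vector y on the R side, together with an integer array recording which site each value came from. N, site and gev_log_density are illustrative names, not taken from the original post; the density formula is the one from the question.

```stan
functions {
  // GEV log density of a single value, same formula as in the question.
  // Only valid for xi != 0 and 1 + xi * (y - mu) / sigma > 0.
  real gev_log_density(real y, real mu, real sigma, real xi) {
    real z = 1 + (y - mu) * xi / sigma;
    return -(log(sigma) + (1 + 1 / xi) * log(z) + pow(z, -1 / xi));
  }
}

data {
  int<lower=1> N;                          // non-missing values over all sites
  int<lower=1> ns;                         // number of sites
  vector[N] y;                             // non-missing values as one long vector
  // Older Stan versions, as used in the question, write this as
  // int<lower=1, upper=ns> site[N];
  array[N] int<lower=1, upper=ns> site;    // site index of each value
}

parameters {
  vector<lower=0>[ns] mu;
  vector<lower=0>[ns] sigma;
  vector<lower=-1, upper=1>[ns] xi;
}

model {
  // Each value contributes the log density of its own site's parameters.
  for (n in 1:N) {
    target += gev_log_density(y[n], mu[site[n]], sigma[site[n]], xi[site[n]]);
  }
}
```

Because each value carries its own site index, sites can have different numbers of non-missing observations without any padding.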