Missing data

Lazhar · February 29, 2020, 7:19pm

Hi all
How can I solve the problem of missing data (NA) in this example?

data <- read.table('data.txt', sep="")
mle <- read.table('mle.txt', sep="")
ns <- ncol(data)
nt <- nrow(data)

# pass prior standard devs as data

gev_data = list(ns=ns, nt=nt, y=data)

initf_manual = list(list(mu=mle[,1], sigma=mle[,2], xi=rep(0.01,ns)))

# run stan

model_file = "model_loc_0.stan"
stan_model = stan(model_file, data = gev_data, ini = initf_manual,iter = 3500,
chains = 1, control=list(adapt_delta=0.85))

The Stan model:

functions{
  vector gev_v_log_v(vector y, real mu, real sigma, real xi) {
    real inv_xi;
    real neg_inv_xi; 
    real inv_xi_p1; 
    vector[rows(y)] z;
    vector[rows(y)] lp;
    int N;
    N = rows(y);
    z = 1 + (y - mu) * xi / sigma;
    inv_xi = 1 / xi;
    neg_inv_xi = -inv_xi; 
    inv_xi_p1 = 1 + inv_xi; 
    
    for(n in 1:N){
        lp[n] = log(sigma) + inv_xi_p1*log(z[n]) + pow(z[n],neg_inv_xi);
      }
    
    return -lp;
  }
}

data {
  int<lower=1> ns; // number of sites
  int<lower=1> nt; // number of times
  vector [nt] y[ns]; //data
}

parameters {
  real<lower=0> mu[ns];
  real<lower=0> sigma[ns];
  real<lower=-1,upper=1> xi[ns];
}

model {
  vector[nt] lp[ns];

  for (i in 1:ns){
       lp[i] = gev_v_log_v(y[i], mu[i], sigma[i], xi[i]);
  }
}

Attachments: data.txt (46.2 KB), mle.txt (2.5 KB)

emiruz · March 1, 2020, 12:45pm

Hi! Have a look here for some general advice on how to approach this: https://mc-stan.org/docs/2_20/stan-users-guide/missing-data.html

I think the typical route is to declare your missing data separately from your observed data. The observed data are declared in the data block; the missing data are declared in the parameters block and treated as parameters with the same distribution as the observed data. The example at the very top of the link above should give you a good sense of what to do.
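
A minimal R-side sketch of the split described above, assuming the data arrive as an nt x ns matrix with NA marking the gaps. The toy values and the list element names (n_obs, t_obs, s_obs and so on) are made up for illustration and are not from the original post. A matching Stan program would have to declare y_obs in the data block and a y_mis vector of length n_mis in the parameters block.

```r
# Toy nt x ns matrix standing in for data.txt (values are invented).
# matrix() fills column by column, so each column is one site.
y <- matrix(c(10.2, NA, 11.5,
              9.8, 10.1, NA),
            nrow = 3, ncol = 2)

# For data read with read.table(), convert first: y <- as.matrix(data)

# (row, col) positions of observed and missing cells
obs <- which(!is.na(y), arr.ind = TRUE)
mis <- which(is.na(y), arr.ind = TRUE)

# Hypothetical data list for a Stan program that treats the
# missing cells as parameters
split_data <- list(
  nt    = nrow(y),
  ns    = ncol(y),
  n_obs = nrow(obs),
  n_mis = nrow(mis),
  t_obs = unname(obs[, "row"]),  # time index of each observed value
  s_obs = unname(obs[, "col"]),  # site index of each observed value
  t_mis = unname(mis[, "row"]),  # time index of each missing value
  s_mis = unname(mis[, "col"]),  # site index of each missing value
  y_obs = y[obs]                 # observed values, column-major order
)
```

Inside the Stan model, the site index of each value then selects which mu, sigma and xi apply to it.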

Note also that, so far as I can see, your model is incomplete, since it doesn’t use a sampling statement or target +=.

Lazhar · March 1, 2020, 1:14pm

Thank you so much Emir for your clarifications
But the problem is that all of the observed datasets contain missing data.

andre.pfeuffer · March 1, 2020, 1:32pm

You could use the R package Amelia II to impute your dataset. That gives you a well-defined dataset to work with, and a baseline to compare your Stan imputation against.

emiruz · March 1, 2020, 2:31pm

You could also have a look at Bayesian PPCA for missing value imputation in the case of multivariate data with missing values. I personally don’t know a way to do imputation for multivariate data in Stan directly.

Lazhar · March 1, 2020, 7:44pm

Thank you so much, I will try it.

Lazhar · March 1, 2020, 7:45pm

Thank you so much Emir
I will try to use it

torkar · March 2, 2020, 7:14am

Hi Lazhar,

Is there missingness only in the outcome, or is there also missingness in one or more predictors?

Lazhar · May 8, 2020, 3:55am

Hi Torkar
Thank you so much for your message and help.
I solved the problem by removing the missing values and treating the remaining observations as one large vector.
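
The exact code used was not posted. One way to do this flattening in R might look like the sketch below, assuming an nt x ns matrix with NA for the gaps. The toy values, the parameter values, the site index vector and the helper gev_loglik are all hypothetical. The helper mirrors the formula in gev_v_log_v above, so the last lines can check that flattening leaves the total log-likelihood unchanged.

```r
# Toy nt x ns matrix standing in for data.txt (values are invented)
y <- matrix(c(10.2, NA, 11.5,
              9.8, 10.1, NA),
            nrow = 3, ncol = 2)

keep <- !is.na(y)

# One long vector of observed values plus the site each one belongs to
long_data <- list(
  n    = sum(keep),
  ns   = ncol(y),
  y    = y[keep],
  site = col(y)[keep]
)

# GEV log-likelihood, same formula as the Stan function above
# (valid only where z > 0 and xi != 0)
gev_loglik <- function(y, mu, sigma, xi) {
  z <- 1 + xi * (y - mu) / sigma
  -sum(log(sigma) + (1 + 1 / xi) * log(z) + z^(-1 / xi))
}

# Made-up per-site parameter values, only for the check
mu    <- c(10, 10.5)
sigma <- c(1, 1.2)
xi    <- c(0.1, 0.1)

# Long-vector version: index the parameters by site
ll_long <- gev_loglik(long_data$y,
                      mu[long_data$site],
                      sigma[long_data$site],
                      xi[long_data$site])

# Site-by-site version on the original matrix, skipping NA
ll_by_site <- sum(sapply(seq_len(ncol(y)), function(s) {
  gev_loglik(y[!is.na(y[, s]), s], mu[s], sigma[s], xi[s])
}))
```

In a Stan program the same site vector would be passed as data and used to index mu, sigma and xi, so sites with different numbers of observations need no padding.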
