SUR model with censoring and data augmentation

Antti_Simola · April 24, 2018, 1:52pm

Hi,

I am trying to model a seemingly unrelated regression (SUR) model with censored dependent variables. I also need to include data augmentation in the model. I have started with a model, which is a combination of the example SUR model in Stan manual (chapter 9.15.) and this censored probit example I found in SO:

The answer to that post advises not to use data augmentation for a binary probit as it is unnecessary. I wonder whether the same applies to continuous censored model as well? Here is the data augmentation mapping I need:

if (ystar[K, N] > 0) {
	y[K, N] = ystar[K, N]/sum(ystar[N])
}
if (ystar[K, N] <= 0)
	y[K, N] = 0
}

Would that be equivalent to that probit solution with Phi() function or do I need to model that explicitly? Thus far I have tried my model with that Phi() version only. However, I am not sure whether I manage to get that right. Here is my Stan model code thus far:

data {
  int<lower=1> K;
  int<lower=1> J;
  int<lower=0> N;
  vector[J] x[N];
  vector<lower=0, upper=1>[K] y[N];
}
parameters {
  matrix[K, J] beta;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0>[K] L_sigma;
}
model {
  vector[K] mu[N];
  matrix[K, K] L_Sigma;

  for (n in 1:N) {
    mu[n] = beta*x[n];
    /* mu[n] = Phi(mu[n]); */           // 1. attempt
    /* for (k in 1:K) */                // 2. attempt
    /*   mu[k, n] = Phi(mu[k, n]); */
  }

  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

  to_vector(beta) ~ normal(0,5);
  L_Omega ~ lkj_corr_cholesky(4);
  L_sigma ~ cauchy(0,.25);

  y ~ multi_normal_cholesky(mu, L_Sigma);
}

I have had two attempts thus far (see the comments in the code). The first takes way too much time even with a small subset of data, and the second, which my intuition would favor, gives an error of nans found. Do you have any suggestions of how to proceed? I could provide some code to construct fake data if needed.

Bob_Carpenter · May 14, 2018, 2:36pm

Sorry this took so long to get to. We’ve been getting a bit overwhelmed with posts here.

The first attempt is the better way to implement this in Stan. If you wanted to loop, it’d be the other way, with the array index first:

for (n in 1:N) for (k in 1:K) mu[n, k] = Phi(mu, k);

We have an approximation function Phi_approx, which is very close numerically to Phi, but based on an inverse-logit implementation.

For efficiency, you can also try something with thinner tails than cauchy(0, 0.25).

Have you tried fitting with simulated data to make sure everything fits?

Topic		Replies	Views
Censored data is not required in modelling Modeling	6	572	February 4, 2019
Seemingly Unrelated Regression with Tobit regression equations Modeling specification	8	2260	February 6, 2021
Censored regression via rstan Modeling	4	1037	May 12, 2020
Censored Data: Modeling vs Integrating Out Modeling	6	533	October 2, 2018
Integrating out censored data in negative binomial model Modeling specification	8	1399	August 1, 2018

SUR model with censoring and data augmentation

Related topics