Censored regression

saja · July 25, 2022, 11:52pm

[edit: escaped code]

I ran a censored regression with the following stan code:

data {
   int<lower=0> K;
   int<lower=0> N;
   int<lower=0> cens[N];  // censoring flag=1 or 0
   matrix[N,K] X ;
   vector[N] y;
   vector[N] C;    //vector of left censoring pt=0
  }
  
  parameters {
    vector[K] beta;  
    real alpha;   
    real lsigma;   //log(sigma)
  }
 
  transformed parameters {
  real<lower=0> sigma;
  sigma = exp(lsigma);
}
  model {
    vector[N] mu= X*beta + alpha;  
    
    //prior model
 
     beta ~ normal(0,10) ;
     alpha ~ normal(0,10);  
     lsigma ~ normal(0,10);  

     //observational model
      for (i in 1:N)
        target += cens[i]*normal_lcdf(C[i] | mu[i],sigma)  + (1-cens[i])*normal_lpdf(y[i] |  mu[i],sigma);
   }

The model ran successfully with the following results:

           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
beta[1]   -0.07    0.00 0.05   -0.16   -0.10   -0.07   -0.04    0.02  2450
beta[2]    0.12    0.00 0.04    0.04    0.09    0.12    0.15    0.20  2530
beta[3]    0.52    0.00 0.05    0.43    0.49    0.52    0.55    0.61  1979
beta[4]   -0.37    0.00 0.05   -0.47   -0.40   -0.37   -0.34   -0.28  2012
beta[5]   -0.38    0.00 0.05   -0.48   -0.42   -0.38   -0.35   -0.29  2501
alpha     -0.06    0.00 0.04   -0.14   -0.09   -0.06   -0.03    0.02  2120
lsigma    -0.04    0.00 0.03   -0.11   -0.07   -0.04   -0.02    0.03  2281

Now, I reran the same model with the vectorized version of the stan code given below where I only changed a) cesoring flag (cens) from “integer array” to a “vector” and b) used vectorized log likelihood specification instead of loop. I expected same results, but there were some non-trivial differences between the two and I am wondering why? Anybody has any insights?

data {
   int<lower=0> K;
   int<lower=0> N;
   vector[N] cens;  // int<lower=0> cens[N];
   matrix[N,K] X ;
   vector[N] y;
   vector[N] C;    
  }
  
  parameters {
    vector[K] beta; 
    real alpha;     
    real lsigma;   
  }
 
  transformed parameters {
  real<lower=0> sigma;
  sigma = exp(lsigma);
}
  model {
    vector[N] mu= X*beta + alpha;  
    
    //prior model
 
     beta ~ normal(0,10) ;
     alpha ~ normal(0,10);  
     lsigma ~ normal(0,10);   //mean=1 and variance=10

     //observational model
       target += cens*normal_lcdf(C | mu,sigma) + (1-cens)*normal_lpdf(y | mu,sigma);
    }

              mean se_mean   sd       2.5%        25%        50%        75%
beta[1]      -0.03    0.00 0.00      -0.04      -0.04      -0.03      -0.03
beta[2]       0.06    0.00 0.00       0.06       0.06       0.06       0.06
beta[3]       0.31    0.00 0.00       0.30       0.31       0.31       0.31
beta[4]      -0.20    0.00 0.00      -0.21      -0.20      -0.20      -0.20
beta[5]      -0.20    0.00 0.00      -0.20      -0.20      -0.20      -0.19
alpha        -0.39    0.00 0.00      -0.39      -0.39      -0.39      -0.39

Bob_Carpenter · August 1, 2022, 10:03pm

The problem you’re running into is that normal_lcdf(...) returns a single real value that is the sum of the vectorization. So the these two are very different:

target += cens*normal_lcdf(C | mu,sigma) + (1-cens)*normal_lpdf(y | mu,sigma);

for (i in 1:N)
  target += cens[i]*normal_lcdf(C[i] | mu[i],sigma)  + (1-cens[i])*normal_lpdf(y[i] |  mu[i],sigma);

The latter one using a loop is the mathematically correct form assuming you are censoring observation i below at C[I].

A much more efficient way to do this is to create two integer indices, one for censored values and one for uncensored values and then use slicing to vectorize the uncensored observations.

saja · August 8, 2022, 3:28am

Ah!. Thank you.

Topic		Replies	Views
Censored regression via rstan Modeling	4	1037	May 12, 2020
Censored data loglikelihood trouble Modeling	1	369	April 21, 2023
Censored data is not required in modelling Modeling	6	572	February 4, 2019
Clarification about definitions associated with "censored values" Modeling specification	2	365	July 18, 2023
Failed initialization–Stan - Can't figure out - Please help Modeling	2	496	August 22, 2018

Censored regression

Related topics