Hi, I have a model in Stan for measurements of chemical concentrations in water samples across counties (actually several more hierarchical levels, but I’m looking to simplify for this question). Each measurement also has an indicator for whether it was gathered from a treated system (i.e., to reduce the concentration) or not. The measurements are both rounded to the nearest 1 ppb (integer) and are left censored at 2 ppb. A Stan model I’ve been working on, which seems to be working well, is shown at the bottom of the post. Hopefully it will help clarify the brms model I’m looking for.
The reason I’m posting is that I’d really like to use brms to model this data because, if possible, I think it would be much more efficient for checking and comparing various different parameterizations of covariates and levels and such. It’d also be a nice way to double-check my own code, which has a lot of moving parts once I start adding and removing additional hierarchical levels and predictors.
The obstacle I’ve run into, however, is that I’m not sure how to deal with some specific aspects of this data via brms, mostly related to rounding. I’ve seen some examples of measurement error for the response in brms, for example, but I’m unsure of how to deal with rounding, specifically. Or if it is even possible, actually. Second, because of the left censoring, I believe there is also a catch to the rounding at the censoring boundary ( i.e., measures reported at y = 2 ppb could only be y + 0.5 instead of y +/- 0.5) that I’m also unsure how to deal with in brms. For maybe more clarity, if the issue were only censoring, I think my brms model would be something like:
brm( conc | cens( det ) ~ obs_trt + county_trt + ( 1 | county ), data = dat, family = lognormal() )
And ‘dat’ would contain the indicator vector ‘det’ with -1 at every observation where ‘conc’ is left censored and 0 for non-censored observations.
Where I’m stuck, again, is (1) finding a brms specification for rounding of the response and (2) dealing with rounding at the left-bounded observations at 2 ppb.
Thanks a bunch for any insight, and sorry if the question is not yet clear.
The Stan model:
data{
int prior_only; // should likelihood be ignored?
int< lower = 1 > N;
int< lower = 1 > N_obs;
int< lower = 1 > N_cens;
int< lower = 0 > N_counties;
vector[ N ] county_trt;
int< lower = -1 , upper = 1 > trt[ N ];
int< lower = 0 > county[ N ];
vector [ N ] y; //lower is 2 in Stan datalist
int<lower = 0 , upper = 1 > Det[ N ]; // 1 = detected, 0 = not det, MDL = 2ppb
int< lower = 0 , upper = 1 > bound[ N ]; // indicator (0 , 1) for obs at exactly 2ppb
}
transformed data {
//for observed measurements ( > MDL (2 ppb) )
vector[ N_obs ] y_obs;
vector[ N_obs ] county_trt_o;
int< lower = -1 , upper = 1 > trt_o[ N_obs ];
int< lower = 0 > county_o[ N_obs ];
int< lower = 0 , upper = 1 > bound_o[ N_obs ]; // indicator that obs reported at MDL;
//for censored measurements ( < 2ppb )
vector[ N_cens ] county_trt_c;
int< lower = -1 , upper = 1 > trt_c[ N_cens ];
int< lower = 0 > county_c[ N_cens ];
{
int i = 1;
int j = 1;
for ( n in 1:N ){
if( Det[ n ] == 0 ){ //group indices for censored obs
trt_c[ i ] = trt[ n ];
county_trt_c[ i ] = county_trt[ n ];
county_c[ i ] = county[ n ];
i = i + 1;
} else { //group indices for obs > MDL
bound_o[ j ] = bound[ n ];
y_obs[ j ] = y[ n ];
trt_o[ j ] = trt[ n ];
county_trt_o[ j ] = county_trt[ n ];
county_o[ j ] = county[ n ];
j = j + 1;
}
}
}
}
parameters{
real a0;
real b_trt;
real b_county_trt;
vector[ N_counties ] z_county;
real< lower = 0 > sd_county;
real< lower = 0 > sigma;
}
transformed parameters{
vector[ N_counties ] a_county;
a_county = sd_county * z_county;
}
model{
vector[ N_obs ] mu_obs;
vector[ N_cens ] mu_cens;
//priors
target += normal_lpdf( sd_county | 0 , 0.7 ) - 1 * normal_lccdf( 0 | 0 , 0.7 );
target += normal_lpdf( sigma | 0 , 0.7 ) - 1 * normal_lccdf( 0 | 0 , 0.7 );
target += normal_lpdf( a0 | 0 , 3 );
target += normal_lpdf( b_trt | 0 , 2 );
target += normal_lpdf( b_county_trt | 0 , 2 );
target += normal_lpdf( z_county | 0 , 1 );
// linear predictor
for ( o in 1:N_obs )
mu_obs[ o ] = a0 + b_trt * trt_o[ o ] + b_county_trt * county_trt_o[ o ] + a_county[ county_o[ o ] ];
for ( c in 1:N_cens )
mu_cens[ c ] = a0 + b_trt * trt_c[ c ] + b_county_trt * county_trt_c[ c ] + a_county[ county_c[ c ] ];
//likelihood
if( prior_only == 0 ){ //sample from prior only?
for ( o in 1:N_obs ){
if( bound_o[ o ] == 1 ){ // measurement observed and exactly 2 ppb
target += log( lognormal_cdf( y_obs[ o ] + 0.5 , mu_obs[ o ] , sigma )
- lognormal_cdf( y_obs[ o ] , mu_obs[ o ], sigma ) );
} else {
target += log( lognormal_cdf( y_obs[ o ] + 0.5 , mu_obs[ o ] , sigma )
- lognormal_cdf( y_obs[ o ] - 0.5 , mu_obs[ o ], sigma ) );
}
}
target += lognormal_lcdf( 2 | mu_cens , sigma ); // censored observations
}
}
- Operating System: Windows 10 x64
- brms Version: 2.10.0