I’m looking for some advice on dealing with discrete latent variables. I’m trying to fit a regression discontinuity model, but rather than fix the discontinuity at some value a-priori, I’d like to allow the model to estimate it from the data.
Here’s some simulated data in R
:
n <- 1000
dta <-
tibble(id = 1:n) %>%
mutate(x = rnorm(n),
c = 0,
y = 0 + .5*(x-c)*(c>x) + 1*(x-c)*(x>c) + 2*(x>c) + rnorm(n, 0, .1))
That is, the slope before the discontinuity is .5, the slope after is 1, and the whole slope jumps by 2 units at the discontinuity itself. For simplicity’s sake, I let the discontinuity occur at zero here.
I fit the following model to the data, which returned the correct values but with only a very low number of effective samples:
data {
int<lower=0> N; // Number of cases
vector[N] x; // Running variable
vector[N] y; // Outcome variable
}
parameters {
real a;
real b1;
real b2;
real b3;
real c;
real<lower=0> sigma;
}
model {
// Model
vector[N] mu;
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = a + b1*(x[n]-c)*step(c-x[n]) + b2*(x[n]-c)*step(x[n]-c) + b3*step(x[n]-c);
}
// Priors
target += normal_lpdf( a | 0, 1 );
target += normal_lpdf( b1 | 0, 1 );
target += normal_lpdf( b2 | 0, 1 );
target += normal_lpdf( b3 | 0, 1 );
target += normal_lpdf( c | 0, 1 );
target += normal_lpdf( sigma | 0, 1 );
// Likelihood
target += normal_lpdf( y | mu, sigma );
}
I’m pretty sure that the issue is the b3*step(x[n]-c)
part and arises because Stan does not like discrete latent variables. Is this correct? If so, does anyone have any advice on how I might deal with this problem?
I’m still a relative newbie and found the documentation on latent discrete variables pretty obtuse.
Any other general advice on making the model more efficient is also appreciated.