I’m looking for some advice on dealing with discrete latent variables. I’m trying to fit a regression discontinuity model, but rather than fix the discontinuity at some value a-priori, I’d like to allow the model to estimate it from the data.

Here’s some simulated data in R:

n <- 1000

dta <-
tibble(id = 1:n) %>%
mutate(x = rnorm(n),
c = 0,
y = 0 + .5*(x-c)*(c>x) + 1*(x-c)*(x>c) + 2*(x>c) + rnorm(n, 0, .1))


That is, the slope before the discontinuity is .5, the slope after is 1, and the whole slope jumps by 2 units at the discontinuity itself. For simplicity’s sake, I let the discontinuity occur at zero here.

I fit the following model to the data, which returned the correct values but with only a very low number of effective samples:

data {
int<lower=0> N; // Number of cases
vector[N] x;    // Running variable
vector[N] y;    // Outcome variable
}
parameters {
real a;
real b1;
real b2;
real b3;
real c;
real<lower=0> sigma;
}
model {

// Model
vector[N] mu;
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = a + b1*(x[n]-c)*step(c-x[n]) + b2*(x[n]-c)*step(x[n]-c) + b3*step(x[n]-c);
}

// Priors
target += normal_lpdf( a | 0, 1 );
target += normal_lpdf( b1 | 0, 1 );
target += normal_lpdf( b2 | 0, 1 );
target += normal_lpdf( b3 | 0, 1 );
target += normal_lpdf( c | 0, 1 );
target += normal_lpdf( sigma | 0, 1 );

// Likelihood
target += normal_lpdf( y | mu, sigma );

}


I’m pretty sure that the issue is the b3*step(x[n]-c) part and arises because Stan does not like discrete latent variables. Is this correct? If so, does anyone have any advice on how I might deal with this problem?

I’m still a relative newbie and found the documentation on latent discrete variables pretty obtuse.

Any other general advice on making the model more efficient is also appreciated.

Your simulated data has a discontinuity that is about 10 times bigger than the random scatter. You don’t need to fit a model to see the where the jump happens. But that’s just a simulation; real data is rarely so clean.
A common hack to deal with discontinuities is to smooth them out. The following model is continuous for nonzero \lambda but in the limit \lambda\to0 is the same as yours (except that b2 is the change in slope).

data {
int<lower=0> N; // Number of cases
vector[N] x;    // Running variable
vector[N] y;    // Outcome variable
}
parameters {
real a;
real b1;
real b2;
real b3;
real c;
real<lower=0> sigma;
real<lower=0> lambda;
}
model {

// Model
vector[N] mu;
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = a + b1*x[n]
+ b2*lambda*log1p_exp((x[n]-c)/lambda)
+ b3*inv_logit((x[n]-c)/lambda);
}

// Priors
target += normal_lpdf( a | 0, 1 );
target += normal_lpdf( b1 | 0, 1 );
target += normal_lpdf( b2 | 0, 1 );
target += normal_lpdf( b3 | 0, 1 );
target += normal_lpdf( c | 0, 1 );
target += normal_lpdf( sigma | 0, 1 );
target += normal_lpdf( lambda | 0, 1 );

// Likelihood
target += normal_lpdf( y | mu, sigma );

}


The model fits with decent effective sample size, although there are some divergent transitions. The posterior estimate for lambda is around 0.00005 which is about the same as distance between last x before the discontinuity and the first x after it.

2 Likes