Hi!
I have a model that I am struggling to implement well in Stan, and would be grateful for any advice.
Theoretically, this is a case where it could make sense to condition part of the model on a function of some parameters (which actually seems to work quite well in JAGS). However, this does not work with Hamiltonian Monte Carlo (and I know it may generally be a bad idea). I think your general advice in such circumstances is to aim for marginalization, but that does not seem feasible here, as I would typically have about 5000 binary conditions/parameters to marginalize over. Thus, I am wondering if there is a better alternative: is there an efficient way of achieving something akin to marginalizing out a large number of latent binary parameters?
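To clarify what I mean, here is a toy sketch (not my model; all names are made up) of marginalizing a single latent binary indicator l with Pr(l = 1) = psi out of a two-branch normal likelihood. My question is essentially whether something like this can be made to work with ~5000 such indicators:

// Toy sketch only: one latent binary indicator l per observation,
// marginalized out of the likelihood with log_mix.
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu1;                    // mean of y if l = 1
  real mu0;                    // mean of y if l = 0
  real<lower=0> sigma;
  real<lower=0,upper=1> psi;   // Pr(l = 1)
}
model {
  for (n in 1:N)
    // log_mix(psi, lp1, lp0) = log_sum_exp(log(psi) + lp1, log1m(psi) + lp0)
    target += log_mix(psi,
                      normal_lpdf(y[n] | mu1, sigma),
                      normal_lpdf(y[n] | mu0, sigma));
}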
My model (see code below) is a mixture model with two components, and my issue arises in the component p[i,j]. p[i,j] should itself be computed from a different function depending on a binary condition l[i,j], which could be operationalized and modeled in several ways. l[i,j] can essentially be observed as V[i] > Y[i,j], but the data will inevitably contain error, so setting l[i,j] = V[i] > Y[i,j] results in endogeneity/post-treatment bias. l[i,j] could plausibly be estimated as the conditional statement V[i] > (alpha[i] + beta[i] * theta[j]), but that would not work with HMC. I have been experimenting with a logistic transformation, l[i,j] = 1 / (1 + exp(-(V[i] - (alpha[i] + beta[i] * theta[j])) * 10)), which works but is theoretically unattractive and tends to fit certain patterns of noise in real data.
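One variant I can think of (a sketch only, not in the attached model) would be to estimate the sharpness of the logistic rather than hardcoding the factor of 10, with a new parameter tau:

// Hypothetical variant: soft threshold with estimated sharpness.
// Assumes a new parameter declared as real<lower=0> tau;
l[i,j] = inv_logit((V[i] - (alpha[i] + beta[i] * theta[j])) / tau);

As tau approaches 0 this approaches the hard condition (tau = 0.1 reproduces the factor of 10 above), but I suspect it would inherit the same noise-fitting issue.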
I am attaching some downsized fake data and an example model. The true gamma parameters in the attached data are all zero. (The model does some very explicit transformations of the parameters to clarify the issues at hand; it could probably also be sped up by vectorization, for which I include a sketch after the model code.)
To sum up: Is there a way to treat l as a vector of latent binary parameters in Stan?
Any help would be greatly appreciated!
data {
  int<lower=0> J;                  // n of clusters
  int<lower=0> N;                  // n of individuals
  int B;                           // scale bound
  int<lower=-B,upper=B> Y[N,J];    // outcome of interest
  real<lower=-B,upper=B> U[N,J];   // predictive info
  int<lower=-B,upper=B> V[N];      // predictive info
}
parameters {
  real theta[J];                   // latent factor
  real alpha[N];                   // individual intercept (in the first mixture component)
  real beta[N];                    // individual coef (in the first mixture component)
  real<lower=0> phi;               // sd of alpha
  real<lower=0> bhi;               // sd of beta
  real<lower=0> sigma;             // sd of errors in y
  real<lower=0,upper=1> gamma[N];  // weight between the two mixture components
  real<lower=0> gammaa;            // hyperparameter
  real<lower=0> gammab;            // hyperparameter
}
transformed parameters {
  real p[N,J];
  real<lower=0,upper=1> l[N,J];    // This should be an integer condition
  for (i in 1:N) {
    for (j in 1:J) {
      // It could make sense with:
      // l[i,j] = V[i] > (alpha[i] + beta[i] * theta[j]);
      // Or even some version of this (though it would give post-treatment bias):
      // l[i,j] = V[i] > Y[i,j];
      // This works, but it is theoretically unattractive (and fits noise in real data):
      l[i,j] = 1 / (1 + exp(-(V[i] - (alpha[i] + beta[i] * theta[j])) * 10));
      p[i,j] = l[i,j] * (U[i,j]*V[i] + B*U[i,j] - B)
               + (1 - l[i,j]) * (U[i,j]*V[i] - B*U[i,j] + B);
    }
  }
}
model {
  alpha ~ normal(0, phi);
  phi ~ cauchy(0, B);
  beta ~ normal(1, bhi);
  bhi ~ cauchy(0, 1);
  gamma ~ beta(gammaa, gammab);
  gammaa ~ gamma(1.5, 0.5);
  gammab ~ gamma(1.5, 0.5);
  theta ~ normal(0, 10);
  sigma ~ cauchy(0, 10);
  for (i in 1:N)
    for (j in 1:J)
      Y[i,j] ~ normal((1 - gamma[i]) * (alpha[i] + beta[i] * theta[j])
                      + gamma[i] * p[i,j], sigma);
}
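(Regarding the vectorization question above: I believe the likelihood loop could be rewritten row by row along these lines, though this is an untested sketch:)

// Untested sketch: row-wise vectorization of the likelihood loop.
for (i in 1:N) {
  row_vector[J] mu_i = (1 - gamma[i]) * (alpha[i] + beta[i] * to_row_vector(theta))
                       + gamma[i] * to_row_vector(p[i]);
  Y[i] ~ normal(mu_i, sigma);
}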
model.stan (1.7 KB)
data.R (1.8 KB)
fit model.R (445 Bytes)