I tried different priors on e, but anything other than std_normal() seemed to poison the model. I’m also wondering if it’s OK just to discard e as I’m doing. Should it be added back in somewhere?
Is this the approach you intended? Just wondering if I understood correctly.
Edit: I added a group-level intercept and it seems to improve things a bit:
parameters {
  ...
  real Intcpt;
}
transformed parameters {
  vector[n_obs] ystar = Intcpt + X*beta + G + e;
}
@jroon Just want to throw out there that the code below doesn’t give me crazy levels of convergence issues, although sigma is still pretty unidentified:
data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] z;  // standardized noise; sigma * z plays the role of e
}
transformed parameters {
  vector[N] y_star;
  y_star = alpha + beta*x + sigma*z;  // non-centered latent predictor
}
model {
  y ~ bernoulli_logit(y_star);
  alpha ~ std_normal();
  beta ~ std_normal();
  sigma ~ std_normal();  // half-normal, given the <lower=0> constraint
  z ~ std_normal();
}
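As an aside, this is also why sigma stays weakly identified: for a logit link there is a standard logistic-normal approximation (not specific to this model),

$$\mathbb{E}_{z \sim \mathcal{N}(0,1)}\!\left[\operatorname{logit}^{-1}(\mu + \sigma z)\right] \;\approx\; \operatorname{logit}^{-1}\!\left(\frac{\mu}{\sqrt{1 + c^2 \sigma^2}}\right), \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588,$$

so the Bernoulli likelihood sees alpha and beta only through the attenuated ratio, and sigma mostly trades off against their scale; the prior ends up doing much of the work.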
Some example simulation code:
set.seed(1234)
N <- 100
alpha <- 1
beta <- 0.5
sigma <- 0.1
x <- rnorm(N)
y_star <- rnorm(N, alpha + beta * x, sigma)
y <- rbinom(N, size=1, prob=plogis(y_star))
m <- cmdstanr::cmdstan_model(stan_file="bernoulli-gaussian.stan")
fit <- m$sample(
  data=list(N=N, x=x, y=y),
  iter_warmup=1000,
  iter_sampling=1000,
  parallel_chains=4
)
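To see how weakly identified sigma is in the fit above, you can pull the posterior summary directly (fit$summary() is the standard cmdstanr accessor):

fit$summary(variables = c("alpha", "beta", "sigma"))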
This model still includes the e term, but it should be tractable because you are not trying to estimate the variance of that term; you have fixed the variance to 1 by assumption. Is that what you intended?
More broadly, my question for you is (why) do you need the e term?
You suggest a prior on sigma? It simply didn’t occur to me, to be honest. I will add it and see what happens.
Well, I don’t, as such. Decomposing the GP into X*beta + G + e wasn’t something I knew I could do until you suggested it. What I’m interested in estimating is G (well, really V) and, to a lesser extent, beta. As such, e is just something to help me fit the model to the binomial outcome data.
OK, I am confused! Do you mean remove it as a parameter completely (in which case I’m confused, because I only put it in at your suggestion), or do you mean do this:
Sorry, I think we’ve been talking past each other for a while.
Your model from post #3 didn’t have an e term in it at all. However, your original model from post #1 did have an e term, as did your brms model from post #8. In post #11, I suggested an alternative brms model that doesn’t have the e term, but in post #12 you suggested that this model isn’t the one you want. I never understood why that would be the case.
We then spent several posts discussing how to isolate the e term from the Gaussian process term, but my goal in stepping through this was to show why e would almost always cause problems when its variance is a parameter to be estimated. My advice would be to remove it.
In post #23, you show a model that includes the e term but does not attempt to use the data to estimate the variance of that term; you fix the variance to 1 by construction with e ~ std_normal(). That should remove the non-identifiability present in your models from posts #1 and #8, but I still see no reason to include this term.
Hi. Yes, you’re right, we were talking past each other. Sorry, in my efforts to simplify the model I introduced things I didn’t intend. Let’s do a soft restart :)
So the model now is:
functions {
  // Weighted squared-distance matrix for the exposures:
  // K[i, j] = sum_k r[k] * square(Z[i, k] - Z[j, k])
  matrix cov_Kzr(matrix Z, vector r) {
    int n_obs = dims(Z)[1];
    int n_exp = dims(Z)[2];
    matrix[n_obs, n_exp] Zr = Z .* rep_matrix(sqrt(r'), n_obs);
    matrix[n_obs, n_obs] K = rep_matrix(0.0, n_obs, n_obs);
    for (i in 1:n_obs) {
      for (j in 1:n_obs) {
        if (i < j) {
          for (k in 1:n_exp) {
            K[i, j] += square(Zr[i, k] - Zr[j, k]);
          }
          K[j, i] = K[i, j];
        }
      }
    }
    return K;
  }
  // Calculate distance/covariance matrix for exposures at new values
  matrix cov_Kz2r(matrix Z1, matrix Z2, vector r) {
    // Define variables
    int n_obs = dims(Z1)[1];
    int n_exp = dims(Z1)[2];
    int nobs2 = dims(Z2)[1];
    matrix[n_obs, n_exp] Z1r = Z1 .* rep_matrix(sqrt(r'), n_obs);
    matrix[nobs2, n_exp] Z2r = Z2 .* rep_matrix(sqrt(r'), nobs2);
    matrix[n_obs, nobs2] K = rep_matrix(0.0, n_obs, nobs2);
    // cycle through all (i, j) pairs because K may not be square here
    for (i in 1:n_obs) {
      for (j in 1:nobs2) {
        for (k in 1:n_exp) {
          K[i, j] += square(Z1r[i, k] - Z2r[j, k]);
        }
      }
    }
    return K;
  }
}
data {
  int n_obs;
  int n_exp;
  int n_cov;
  array[n_obs] int<lower=0, upper=1> y;  // binary outcome
  matrix[n_obs, n_exp] Z;
  matrix[n_obs, n_cov] X;
}
parameters {
  real<lower=0.0> sigma;
  real<lower=0.0> lambda;
  vector[n_cov] beta;
  vector[n_obs] G;
  //vector[n_obs] e;
  vector<lower=0.0>[n_exp] r;
}
transformed parameters {
  vector[n_obs] ystar = X*beta + G; // + e;
}
model {
  // local params
  matrix[n_obs, n_obs] V;
  V = add_diag(lambda * exp(-cov_Kzr(Z, r)), 1.0);
  // Priors
  sigma ~ inv_gamma(1, 0.25);
  lambda ~ inv_gamma(10, 10);
  r ~ cauchy(0, 1.0);
  beta ~ std_normal();
  //e ~ std_normal();
  // Likelihood
  //ystar ~ multi_normal_cholesky(X*beta, sigma * cholesky_decompose(V));
  G ~ multi_normal_cholesky(rep_vector(0, n_obs), sigma * cholesky_decompose(V));
  y ~ bernoulli_logit(ystar);
}
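(Side note: cov_Kzr is just the squared Euclidean distance between rows of Z after scaling column k by sqrt(r[k]), so it can be sanity-checked against dist() in R; here Z and r are assumed to be an ordinary R matrix and vector matching the Stan inputs:)

Zr <- sweep(Z, 2, sqrt(r), `*`)   # scale column k by sqrt(r[k])
K_check <- as.matrix(dist(Zr))^2  # squared distances; should match cov_Kzr(Z, r)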
This version has resolved the divergences I had to start with, but now I get warnings about chain mixing (R-hat), ESS, and E-BFMI.
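For what it’s worth, here is roughly how I’m driving it and pulling those diagnostics (a sketch: gp-bernoulli.stan is an assumed filename, and dat is a list matching the data block above):

m <- cmdstanr::cmdstan_model(stan_file="gp-bernoulli.stan")
fit <- m$sample(
  data=dat,  # list(n_obs=..., n_exp=..., n_cov=..., y=..., Z=..., X=...)
  iter_warmup=1000,
  iter_sampling=1000,
  parallel_chains=4
)
fit$diagnostic_summary()                            # divergences, treedepth, E-BFMI per chain
fit$summary(variables = c("sigma", "lambda", "r"))  # Rhat and ESS for the problem parameters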