I’m trying to recover simulated (“injected”) parameters, and for some reason, when I put a bernoulli_logit likelihood on top of multi_normal_cholesky, I can’t recover my standard deviation (tau). Without the bernoulli_logit the model fits just fine. I’m also getting divergent transitions. Any clues as to what I might be doing wrong? This seems really straightforward, and I’ve spent days trying everything to no avail.
Logit model that doesn’t work:
data {
  int<lower=1> M;           // number of observations
  int<lower=1> N_features;  // number of predictor variables
  int<lower=0, upper=1> y_p_baseline[M, N_features];  // Boolean observations
}
transformed data {
  cholesky_factor_corr[N_features] L_Omega
    = to_matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]);
  vector[N_features] mu = to_vector([0.1, 0.4, -0.3, -0.9]);
}
parameters {
  vector<lower=0>[N_features] tau;
  vector[N_features] logit_p_baseline;
}
model {
  matrix[N_features, N_features] L_Sigma = diag_pre_multiply(tau, L_Omega);
  tau ~ normal(0, 0.5);
  logit_p_baseline ~ multi_normal_cholesky(mu, L_Sigma);
  for (m in 1:M) {
    y_p_baseline[m] ~ bernoulli_logit(logit_p_baseline);
  }
}
The recovered tau values are wrong (the true values are 0.1, 0.2, 0.3, 0.4):
In addition: Warning messages:
1: There were 12 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
mean se_mean sd 10% 90% n_eff Rhat
tau[1] 0.2074085 0.01745183 0.2115113 0.02606833 0.4987222 146.8875 0.9962213
tau[2] 0.2519658 0.01535765 0.2236631 0.03904499 0.5640318 212.0991 1.0017057
tau[3] 0.2553860 0.02014701 0.2468789 0.03511752 0.5762193 150.1574 1.0055040
tau[4] 0.2543119 0.01650095 0.2171723 0.03971491 0.5733396 173.2170 1.0171034
R code for injecting the values:
M = 1000        # observations
N_features = 4  # columns
mu = c(0.1, 0.4, -0.3, -0.9)
Omega <- rbind(
  c(1, 0, 0, 0),
  c(0, 1, 0, 0),
  c(0, 0, 1, 0),
  c(0, 0, 0, 1)
)
tau = c(0.1, 0.2, 0.3, 0.4)
Sigma = diag(tau) %*% Omega %*% diag(tau)
# one latent logit vector per row, mapped to probabilities
y_p_baseline = plogis(mvtnorm::rmvnorm(M, mu, Sigma))
# then turned into 0/1 draws
for (i in 1:N_features) {
  for (j in 1:M) {
    y_p_baseline[j, i] = rbinom(1, 1, y_p_baseline[j, i])
  }
}
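For what it’s worth, here is the same generative scheme as a compact Python sketch (numpy/scipy assumed; the seed is arbitrary), which can be handy for eyeballing what the simulated Bernoulli data look like:

```python
import numpy as np
from scipy.special import expit  # inverse logit, same role as plogis in R

rng = np.random.default_rng(1)

M = 1000                                     # observations
mu = np.array([0.1, 0.4, -0.3, -0.9])        # latent means
tau = np.array([0.1, 0.2, 0.3, 0.4])         # latent standard deviations
Sigma = np.diag(tau) @ np.eye(4) @ np.diag(tau)  # identity correlation

# one latent logit vector per row, as in the R code
logit_p = rng.multivariate_normal(mu, Sigma, size=M)
p = expit(logit_p)
y = rng.binomial(1, p)                       # M x 4 matrix of 0/1 draws

# empirical column means sit near expit(mu) when tau is small
print(y.shape, y.mean(axis=0).round(2))
```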
Non-logit model that works just fine:
data {
  int<lower=1> M;           // number of observations
  int<lower=1> N_features;  // number of predictor variables
  vector[N_features] y_p_baseline[M];  // observations
}
transformed data {
  cholesky_factor_corr[N_features] L_Omega
    = to_matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]);
  vector[N_features] mu = to_vector([0.1, 0.4, -0.3, -0.9]);
}
parameters {
  vector<lower=0>[N_features] tau;
}
model {
  matrix[N_features, N_features] L_Sigma = diag_pre_multiply(tau, L_Omega);
  tau ~ normal(0, 0.5);
  y_p_baseline ~ multi_normal_cholesky(mu, L_Sigma);
}
summary(fit, pars = "tau", probs = c(0.1, 0.9))$summary
mean se_mean sd 10% 90% n_eff Rhat
tau[1] 0.1006119 0.0001386487 0.002224633 0.09789271 0.1035016 257.4455 0.9963595
tau[2] 0.2005033 0.0002549114 0.004368046 0.19502377 0.2057183 293.6270 0.9960373
tau[3] 0.2863369 0.0004980778 0.006869206 0.27825528 0.2946214 190.2036 1.0009162
tau[4] 0.3894062 0.0004740003 0.008429819 0.37932946 0.4006192 316.2854 0.9976708