Hello Stan pros,

I’m trying to model outcomes by weighting different aspects. The model I’m using is essentially

$$
y_d = \sum_{i=1}^{K} w_{di} \sum_{j=1}^{3} \alpha_j\, a_{dij} + \varepsilon_d, \qquad \varepsilon_d \sim \mathcal{N}(0, \sigma^2).
$$
I’ve taken the model from the literature, where it was fit using EM algorithms, but I wanted to use Stan in order to learn about both the model and Stan.

There are outcomes y_d with 1 \leq d \leq N, K properties and 3 ratings per property. The ratings are the a_{dij}, the \alpha_j are the coefficients mixing the ratings, and the w_{di} are individual weights for each property, which are what I’m actually interested in. For simplicity and a possible performance gain, the outcome is standardised to zero sample mean and unit standard deviation.

Since the properties are correlated, the individual weightings will be too, so there is a multivariate normal prior on the w's:

$$
w_d \sim \mathcal{N}(\mu_w, \Sigma).
$$
This resulted in the following Stan code:

```
data {
  int<lower=1> N;
  int<lower=1> K;
  vector[N] y;
  real<lower=0, upper=1> input[N, K, 3]; // ratings are in [0, 1]
}
parameters {
  cholesky_factor_corr[K] Lcorr;
  vector<lower=0>[K] sigma;
  vector[K] mu_weight;
  vector[K] weight[N];
  vector[3] alpha;
  real<lower=0> sigma_sq;
}
transformed parameters {
  vector[N] mu_rating;
  cholesky_factor_cov[K] L_Sigma;
  // Mean for the likelihood: dot product of the weights with the
  // alpha-mixed ratings. (Note: vector * row_vector is an outer
  // product, whose sum is NOT the intended dot product.)
  for (i in 1:N)
    mu_rating[i] = dot_product(weight[i], to_matrix(input[i]) * alpha) * sigma_sq;
  // Precompute the Cholesky factor of the multivariate normal covariance
  L_Sigma = diag_pre_multiply(sigma, Lcorr);
}
model {
  // Priors
  sigma ~ cauchy(0, 2.5);
  Lcorr ~ lkj_corr_cholesky(1);
  mu_weight ~ normal(0, 5);
  alpha ~ normal(0, 10);
  sigma_sq ~ normal(0, 5);
  // Model for weights (vectorized over N)
  weight ~ multi_normal_cholesky(mu_weight, L_Sigma);
  // Model for outcome
  y ~ normal(mu_rating, 1);
}
```
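For context on the reparameterization question: one variant I have been looking at is a non-centered parameterization of the weights, where standard-normal draws are shifted and scaled by the Cholesky factor (a sketch only - the parameter `z` is new, everything else is meant to match the model above):

```
parameters {
  matrix[K, N] z; // standard-normal innovations, one column per outcome
  // ... all other parameters as above, except weight ...
}
transformed parameters {
  vector[K] weight[N];
  cholesky_factor_cov[K] L_Sigma = diag_pre_multiply(sigma, Lcorr);
  for (i in 1:N)
    weight[i] = mu_weight + L_Sigma * col(z, i);
}
model {
  to_vector(z) ~ normal(0, 1); // implies weight[i] ~ MVN(mu_weight, Sigma)
  // ... rest of the model unchanged ...
}
```

As I understand it, this moves the correlation between `weight` and the hyperparameters out of the prior and into a deterministic transform, which is supposed to help with divergences in hierarchical models - but I haven't verified that it helps here.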

The model samples very, very slowly, with a lot of divergent transitions and low-BFMI warnings - especially since my dataset has N \approx 20000 and K \approx 150. To investigate the problems with my model, I fit it to some simulated data and looked at the resulting posterior:

```
library(mvtnorm) # for rmvnorm
library(rstan)

# Dimensions
N_SIM <- 100
K_SIM <- 10
# Model scale (half-Cauchy, matching the prior)
sigma_sq_SIM <- abs(rcauchy(1, 0, 2.5))
# Mean weightings
mu_weight_SIM <- rnorm(K_SIM, 0, 5)
# Mixing coefficients
alpha_SIM <- rnorm(3, 0, 10)
# Ratings between 0 and 1
input_SIM <- array(rbeta(N_SIM * K_SIM * 3, shape1 = 1, shape2 = 5),
                   c(N_SIM, K_SIM, 3))
# Generate a covariance matrix for the weights
A <- matrix(runif(K_SIM^2) * 2 - 1, ncol = K_SIM)
Sigma_SIM <- t(A) %*% A
# Individual weightings drawn from the prior
weight_SIM <- rmvnorm(N_SIM, mu_weight_SIM, Sigma_SIM)
# Generate outcomes
mu_rating_SIM <- rep(NA, N_SIM)
y_SIM <- rep(NA, N_SIM)
for (i in 1:N_SIM) {
  mu_rating_SIM[i] <- sum(weight_SIM[i, ] * (alpha_SIM %*% t(input_SIM[i, , ])))
  y_SIM[i] <- rnorm(1, mean = mu_rating_SIM[i], sd = sigma_sq_SIM)
}
# Running Stan
smfit_simulated <- sampling(stanmodel,
                            data = list(N = N_SIM, K = K_SIM,
                                        y = y_SIM, input = input_SIM),
                            init_r = 1, chains = 1, iter = 1000)
```
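To check recovery, I compare the posterior means of `mu_weight` against the simulated truth (a sketch, with `smfit_simulated` and `mu_weight_SIM` as above):

```
# Posterior means of mu_weight vs. the values used to simulate the data
post <- rstan::extract(smfit_simulated)
mu_weight_hat <- colMeans(post$mu_weight)
print(cbind(truth = mu_weight_SIM, estimate = mu_weight_hat))
```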

The model already fits quite slowly, but more importantly, the posterior estimates are nowhere near the simulated values. In fact, the `mu_weight` posterior estimates are bimodal - which they shouldn’t be, since the true values were drawn from a single normal distribution.

This leaves me with some questions I am not yet able to answer (trying to learn here…):

- Did I set up the model incorrectly? That is, is there a problem with the model itself that leads to these issues in Stan?
- How can I reparameterize the model to improve both fit quality and performance?
- Aside from reparameterization, how can I improve the sampling speed?
- Is it realistic to expect a model like this to work well on a dataset of the size described above? Or might a different approach, such as variational Bayes or EM, be more promising?

I’m hoping for some valuable lessons on how to set up better models. Simple regressions from textbooks did not prepare me for this… :)

Thanks!

Chris