I am trying to model interactions between nominal predictors, following Kruschke's strategy from DBDA: fit the model with unconstrained parameters, then convert them so that each predictor's deflections sum to zero and can be interpreted as deflections from the overall mean. My model code is below; b0, b1, b2, b3, b1b2, b1b3 are the unconstrained parameters and a0, a1, a2, a3, a1a2, a1a3 are the constrained parameters, with a0 the overall mean.
As my model becomes more complex I get warnings about transitions that exceed max_treedepth. However, I have noticed that while the chains of the unconstrained parameters are jagged and highly autocorrelated, the constrained parameters look fine. Their posterior estimates are virtually identical to what I obtain with a higher max_treedepth that produces no warnings.
In my example the interaction term b1b3 seems to trade off with b0, but once the sum-to-zero constraints are imposed there is no such correlation, and the posterior distributions/chains look clean.
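That kind of trade-off is a plain non-identifiability: adding a constant to b0 and subtracting it from every b1b3 cell leaves every mu, and hence the likelihood, unchanged, so the sampler has to explore a ridge. A minimal numpy sketch of the idea (illustrative values only, not the fitted model):

```python
import numpy as np

rng = np.random.default_rng(0)
b0 = 1.5
b1b3 = rng.normal(size=(5, 3))          # interaction cells
obs1 = rng.integers(0, 5, size=20)      # factor-1 level per observation
obs3 = rng.integers(0, 3, size=20)      # factor-3 level per observation

mu = b0 + b1b3[obs1, obs3]

# shift mass between the intercept and the interaction: mu is unchanged
c = 0.7
mu_shift = (b0 + c) + (b1b3 - c)[obs1, obs3]
assert np.allclose(mu, mu_shift)
```

The sum-to-zero combinations are the directions orthogonal to this ridge, which is why their posteriors look clean while the raw parameters do not.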

Can I trust the posteriors of the constrained parameters even though the unconstrained parameters look bad? My real data/model is larger and more complex, so increasing max_treedepth leads to sampling times of days.

Is there a way to identify which parameters cause the warnings other than inspecting the chains? I understand that the parameters represent jointly credible values as a set, but usually some of them are particularly problematic. Checking the pairs plot just shows me the problematic transitions in all the parameters.
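One cheap screen, besides eyeballing chains or pairs plots, is to rank the pairwise posterior correlations of the unconstrained draws; the pairs whose absolute correlation approaches 1 are the ones trading off along a ridge. A sketch with synthetic draws standing in for real posterior samples (the parameter names here are just labels):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 1000
# synthetic draws: b0 and one b1b3 cell strongly anti-correlated, b2[1] clean
shared = rng.normal(size=n_draws)
draws = {
    "b0":        1.0 + shared + 0.05 * rng.normal(size=n_draws),
    "b1b3.1.1": -1.0 - shared + 0.05 * rng.normal(size=n_draws),
    "b2.1":      0.3 + 0.1 * rng.normal(size=n_draws),
}

names = list(draws)
mat = np.column_stack([draws[k] for k in names])
corr = np.corrcoef(mat, rowvar=False)

# report pairs from most to least correlated
pairs = sorted(
    ((abs(corr[i, j]), names[i], names[j])
     for i in range(len(names)) for j in range(i + 1, len(names))),
    reverse=True,
)
for r, a, b in pairs:
    print(f"|corr({a}, {b})| = {r:.2f}")
```

With real output, the draws dictionary would come from the fit (e.g. extracted per-parameter samples); the top-ranked pairs point at the ridge even when the pairs plot highlights the bad transitions everywhere.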

My posteriors are very sharp but still a bit off from the true values. Specifically, the main effects come out a little smaller and the interactions a little bigger. I am guessing this is inherent to any interaction model, where a particular decomposition cannot be fully distinguished from the sample noise. Are there any tricks or ways to improve this?
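Part of this gap may be bookkeeping rather than inference error: the simulated interaction matrices below do not themselves sum to zero over rows and columns, so when the fit is re-expressed in deflection form, some of the simulated interaction is reassigned to the grand mean and the main effects. A numpy sketch of that double-centering applied to the simulated a1a3 matrix:

```python
import numpy as np

# the simulated a1a3 matrix: 1 in two corner cells, 0 elsewhere
a1a3 = np.zeros((5, 3))
a1a3[0, 0] = 1.0
a1a3[4, 2] = 1.0

# re-express it in deflection (sum-to-zero) form
grand = a1a3.mean()                    # 2/15: absorbed into a0
row = a1a3.mean(axis=1) - grand        # leaks into the a1 main effect
col = a1a3.mean(axis=0) - grand        # leaks into the a3 main effect
defl = a1a3 - row[:, None] - col[None, :] - grand

print(grand)             # 0.1333...
print(row)               # nonzero: shifts the recovered main effects
print(defl.sum(axis=0))  # ~0: columns now sum to zero
print(defl.sum(axis=1))  # ~0: rows now sum to zero
```

Comparing the posterior deflections against the truth re-expressed this way, rather than against the raw simulation inputs, would show whether the "mains smaller, interactions bigger" pattern is a real bias or just this reparameterization.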
This is the data I simulated to test my model.
# simulate 2 interactions
# main effects
a1 <- c(-2, -1, 0, 1, 2); a2 <- c(-1, 0, 1); a3 <- c(-3, 0, 3)
n1 <- length(a1); n2 <- length(a2); n3 <- length(a3)
# interactions
a1a2 <- rbind(c(1, 0, 0),
              c(0, 0, 0),
              c(0, 0, 0),
              c(0, 0, 0),
              c(0, 0, 1))
a1a3 <- rbind(c(1, 0, 0),
              c(0, 0, 0),
              c(0, 0, 0),
              c(0, 0, 0),
              c(0, 0, 1))
sigma <- 0.2
a1_s <- 0.2; a2_s <- 0.2; a3_s <- 0.2; a1a2_s <- 0.2; a1a3_s <- 0.2
# number of samples per cell
n <- 50
df <- list(); i <- 1
for (l1 in 1:n1) {
  for (l2 in 1:n2) {
    for (l3 in 1:n3) {
      b1 <- rnorm(n, a1[l1], a1_s)
      b2 <- rnorm(n, a2[l2], a2_s)
      b3 <- rnorm(n, a3[l3], a3_s)
      b1b2 <- rnorm(n, a1a2[l1, l2], a1a2_s)
      b1b3 <- rnorm(n, a1a3[l1, l3], a1a3_s)
      mu <- b1 + b2 + b3 + b1b2 + b1b3 + rnorm(n, 0, sigma)
      df[[i]] <- tibble(a1 = l1, a2 = l2, a3 = l3,
                        b1 = b1, b2 = b2, b3 = b3,
                        b1b2 = b1b2, b1b3 = b1b3, value = mu)
      i <- i + 1
    }
  }
}
df <- bind_rows(df)
# run model
model_data <- list(
  n_obs = nrow(df),
  n_a1 = df %>% distinct(a1) %>% nrow(),
  n_a2 = df %>% distinct(a2) %>% nrow(),
  n_a3 = df %>% distinct(a3) %>% nrow(),
  obs2a1 = df$a1,
  obs2a2 = df$a2,
  obs2a3 = df$a3,
  y = df$value,
  y_mean = df %>% pull(value) %>% mean(),
  y_sd = df %>% pull(value) %>% sd()
)
and this is the Stan model:
data{
  int<lower=1> n_obs;
  int<lower=1> n_a1;
  int<lower=1> n_a2;
  int<lower=1> n_a3;
  int<lower=1, upper=n_a1> obs2a1[n_obs];
  int<lower=1, upper=n_a2> obs2a2[n_obs];
  int<lower=1, upper=n_a3> obs2a3[n_obs];
  real y[n_obs];
  real y_mean;
  real y_sd;
}
parameters{
  real b0;
  vector[n_a1] b1;
  real<lower=0> b1_s;
  vector[n_a2] b2;
  vector[n_a3] b3;
  matrix[n_a1, n_a2] b1b2;
  matrix[n_a1, n_a3] b1b3; // was [n_a1, n_a2]; must match the a3 dimension
  real<lower=0> b1b2_s;
  real<lower=0> b1b3_s;
  real<lower=0> sigma;
}
transformed parameters{
  real mu[n_obs];
  for (i in 1:n_obs){
    mu[i] = b0 + b1[obs2a1[i]] + b2[obs2a2[i]] + b3[obs2a3[i]] +
            b1b2[obs2a1[i], obs2a2[i]] + b1b3[obs2a1[i], obs2a3[i]];
  }
}
model{
  b0 ~ normal(y_mean, y_sd * 5);
  b1_s ~ normal(0, 1);
  b1 ~ normal(0, b1_s);
  b2 ~ normal(0, 1);
  b3 ~ normal(0, 1);
  b1b2_s ~ normal(0, 1);
  for (j2 in 1:n_a2){
    for (j1 in 1:n_a1){
      b1b2[j1, j2] ~ normal(0, b1b2_s);
    }
  }
  b1b3_s ~ normal(0, 1);
  for (j3 in 1:n_a3){
    for (j1 in 1:n_a1){
      b1b3[j1, j3] ~ normal(0, b1b3_s); // was b1b2: b1b3 had no prior
    }
  }
  // likelihood
  y ~ normal(mu, sigma);
}
generated quantities{
  real a0;
  vector[n_a1] a1;
  vector[n_a2] a2;
  vector[n_a3] a3;
  matrix[n_a1, n_a2] a1a2;
  matrix[n_a1, n_a3] a1a3;
  real log_lik[n_obs];
  real m[n_a1, n_a2, n_a3]; // cell means
  real m_1x2[n_a1, n_a2];
  real m_1x3[n_a1, n_a3];
  // for loo
  for (i in 1:n_obs) {
    log_lik[i] = normal_lpdf(y[i] | mu[i], sigma);
  }
  // convert predictors to deviations from the mean
  for (j1 in 1:n_a1) {
    for (j2 in 1:n_a2) {
      for (j3 in 1:n_a3) {
        m[j1, j2, j3] = b0 + b1[j1] + b2[j2] + b3[j3] + b1b2[j1, j2] + b1b3[j1, j3]; // cell means
      }
    }
  }
  // overall mean: a0 = mean(m)
  a0 = 0.0;
  for (j1 in 1:n_a1) {
    for (j2 in 1:n_a2) {
      for (j3 in 1:n_a3) {
        a0 += m[j1, j2, j3];
      }
    }
  }
  a0 = a0 / num_elements(m);
  // main effects
  for (j1 in 1:n_a1) { a1[j1] = mean(to_array_1d(m[j1, :, :])) - a0; }
  for (j2 in 1:n_a2) { a2[j2] = mean(to_array_1d(m[:, j2, :])) - a0; }
  for (j3 in 1:n_a3) { a3[j3] = mean(to_array_1d(m[:, :, j3])) - a0; }
  // interactions
  for (j1 in 1:n_a1) {
    for (j2 in 1:n_a2) {
      m_1x2[j1, j2] = mean(to_array_1d(m[j1, j2, :]));
    }
  }
  for (j1 in 1:n_a1) {
    for (j3 in 1:n_a3) {
      m_1x3[j1, j3] = mean(to_array_1d(m[j1, :, j3]));
    }
  }
  for (j1 in 1:n_a1) {
    for (j2 in 1:n_a2) {
      a1a2[j1, j2] = m_1x2[j1, j2] - (a1[j1] + a2[j2] + a0);
    }
  }
  for (j1 in 1:n_a1) {
    for (j3 in 1:n_a3) {
      a1a3[j1, j3] = m_1x3[j1, j3] - (a1[j1] + a3[j3] + a0);
    }
  }
}
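Since the model contains only the b1b2 and b1b3 interactions, the deflection decomposition in generated quantities should be lossless: a0 + a1[j1] + a2[j2] + a3[j3] + a1a2[j1,j2] + a1a3[j1,j3] reproduces every cell mean exactly, which gives a quick correctness check on the conversion code. A numpy sketch of the same computation, with random stand-ins for the b parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 5, 3, 3
b0 = rng.normal()
b1, b2, b3 = rng.normal(size=n1), rng.normal(size=n2), rng.normal(size=n3)
b1b2, b1b3 = rng.normal(size=(n1, n2)), rng.normal(size=(n1, n3))

# cell means, exactly as in transformed parameters / generated quantities
m = (b0 + b1[:, None, None] + b2[None, :, None] + b3[None, None, :]
     + b1b2[:, :, None] + b1b3[:, None, :])

# deflection decomposition mirroring the generated quantities block
a0 = m.mean()
a1 = m.mean(axis=(1, 2)) - a0
a2 = m.mean(axis=(0, 2)) - a0
a3 = m.mean(axis=(0, 1)) - a0
a1a2 = m.mean(axis=2) - a1[:, None] - a2[None, :] - a0
a1a3 = m.mean(axis=1) - a1[:, None] - a3[None, :] - a0

recon = (a0 + a1[:, None, None] + a2[None, :, None] + a3[None, None, :]
         + a1a2[:, :, None] + a1a3[:, None, :])
assert np.allclose(m, recon)  # decomposition is exact for this model
```

If a b2b3 or three-way interaction were added to the model, the reconstruction would no longer be exact without the corresponding deflection terms.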
Any help, comment, advice is greatly appreciated!