Estimation accuracy is high but predictive accuracy is no better? Possibly overfitting?

I am trying to validate a model by generating simulated data and running it through two models: one with a covariate and one without. This seems to work well, in that the covariate model's estimated parameters (theta, beta, type) are very close to the true parameters, while the baseline model's estimates (theta and beta) are not. Since I am generating the data according to the covariate model, this result makes sense: the baseline model isn't able to account for the effect of type. However, when I compare the two models using LOO, they do not seem to differ at all in predictive accuracy.
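For reference, here is roughly how I am checking parameter recovery (a minimal sketch; fit_covariate and the true theta come from the simulation code further down):

# compare posterior means against the true simulated values
est_theta <- rstan::summary(fit_covariate, pars = "theta")$summary[, "mean"]
plot(theta, est_theta, xlab = "true theta", ylab = "posterior mean theta")
abline(0, 1)  # points should fall near this line if recovery is good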

library(rstan)

model_covariate <- stan_model(model_code = "data {
  int<lower=0> I;
  int<lower=0> J;
  int<lower=0> K;
  int<lower=1> N;
  int<lower=1,upper=I> ii[N];
  int<lower=1,upper=J> jj[N];
  int<lower=1,upper=K> kk[N];
  int<lower=0> y[N];
}
parameters {
  vector[J] theta;
  vector[I-1] b_free;
  vector[K-1] k_free;
}
transformed parameters{
  vector[I] beta = append_row(b_free, -sum(b_free));
  vector[K] type = append_row(k_free, -sum(k_free));
}
model {

  theta ~ normal(0, 1);
  // note: priors are placed on the constrained sum-to-zero vectors
  target += normal_lpdf(beta | 0, 3);
  target += normal_lpdf(type | 0, 3);
  
  y ~ poisson_log(theta[jj] - beta[ii] - type[kk]);
}
generated quantities{
  vector[N] log_lik;
  for(n in 1:N){
    // log-scale lpmf matches the model block and avoids overflow in exp()
    log_lik[n] = poisson_log_lpmf(y[n] | theta[jj[n]] - beta[ii[n]] - type[kk[n]]);
  }
}

")

The only difference between my covariate model above and the baseline is that the baseline omits type entirely. My first thought was that this is overfitting, but I am not sure. The LOO results are below, followed by my data generation code in case it is needed.
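For completeness, here is the baseline: the same program with everything involving type removed.

model_baseline <- stan_model(model_code = "data {
  int<lower=0> I;
  int<lower=0> J;
  int<lower=1> N;
  int<lower=1,upper=I> ii[N];
  int<lower=1,upper=J> jj[N];
  int<lower=0> y[N];
}
parameters {
  vector[J] theta;
  vector[I-1] b_free;
}
transformed parameters{
  vector[I] beta = append_row(b_free, -sum(b_free));
}
model {
  theta ~ normal(0, 1);
  target += normal_lpdf(beta | 0, 3);
  y ~ poisson_log(theta[jj] - beta[ii]);
}
generated quantities{
  vector[N] log_lik;
  for(n in 1:N){
    log_lik[n] = poisson_log_lpmf(y[n] | theta[jj[n]] - beta[ii[n]]);
  }
}
")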

> loo_baseline

Computed from 8000 by 8000 log-likelihood matrix

         Estimate    SE
elpd_loo -11373.2 103.6
p_loo       228.6   7.5
looic     22746.3 207.3
------
Monte Carlo SE of elpd_loo is 0.2.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     7991  99.9%   856       
 (0.5, 0.7]   (ok)          9   0.1%   234       
   (0.7, 1]   (bad)         0   0.0%   <NA>      
   (1, Inf)   (very bad)    0   0.0%   <NA>  
-------------------------------------------------
> loo_covariate

Computed from 8000 by 8000 log-likelihood matrix

         Estimate    SE
elpd_loo -11372.7 103.6
p_loo       228.2   7.5
looic     22745.3 207.2
------
Monte Carlo SE of elpd_loo is 0.2.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     7990  99.9%   644       
 (0.5, 0.7]   (ok)         10   0.1%   296       
   (0.7, 1]   (bad)         0   0.0%   <NA>      
   (1, Inf)   (very bad)    0   0.0%   <NA>   

> loo::loo_compare(loo_covariate, loo_baseline)
       elpd_diff se_diff
model1  0.0       0.0   
model2 -0.5       0.3   
  # dimensions: I items, J persons, K item types
  I <- 40
  J <- 200
  K <- I / 10

  # true parameter values
  theta <- rnorm(J, 0, 1)
  beta  <- rnorm(I, 0, 1)
  type  <- rnorm(K, 0, 1)

  # long-format indices; kk cycles in step with ii, so item i always
  # gets type ((i - 1) %% K) + 1, i.e. items are nested in types
  N <- I * J
  ii <- rep(1:I, times = J)
  jj <- rep(1:J, each = I)
  kk <- rep(1:K, times = N / K)

  # simulate counts from the covariate model
  y <- rpois(N, exp(theta[jj] - beta[ii] - type[kk]))

  fit_covariate <- sampling(model_covariate, data = list(I = I,
                                                         J = J,
                                                         K = K,
                                                         N = N,
                                                         ii = ii,
                                                         jj = jj,
                                                         kk = kk,
                                                         y = y), 
                            iter = 4000)
  fit_baseline <- sampling(model_baseline, data = list(I = I,
                                                       J = J,
                                                       N = N,
                                                       ii = ii,
                                                       jj = jj,
                                                       y = y), 
                           iter = 4000)
  
  
  # PSIS-LOO for each fit, then compare expected log predictive density
  loo_baseline <- loo::loo(fit_baseline)
  loo_covariate <- loo::loo(fit_covariate)
  loo::loo_compare(loo_covariate, loo_baseline)


When you know (because this is based on simulation) that your model recapitulates the data-generating process, and you can see that one model is doing a better job of parameter recovery, then the fact that the LOO results are near-identical makes me wonder whether there's some bug in the code such that you are accidentally comparing the same model to itself. Even more so because the p_loo estimates are essentially identical, yet the more complicated model ought to have more effective parameters.
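One quick way to rule that out (a sketch, assuming both fits and both loo objects are still in scope) is to confirm that the two log-likelihood matrices actually differ, and to look at the pointwise elpd contributions rather than just the totals:

# the pointwise log-likelihoods should differ if the fits are distinct
ll_cov  <- loo::extract_log_lik(fit_covariate)
ll_base <- loo::extract_log_lik(fit_baseline)
identical(ll_cov, ll_base)  # should be FALSE

# per-observation elpd differences; if these are all essentially zero,
# the two models really are making the same predictions
pointwise_diff <- loo_covariate$pointwise[, "elpd_loo"] -
  loo_baseline$pointwise[, "elpd_loo"]
summary(pointwise_diff)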

Oh, I see what might be happening, I think. The model without type already has the flexibility to estimate a unique value of beta for each ii, and these ii are completely nested inside the kk (every item always has the same type). So while type should in theory impose some shared structure on the item effects, the betas are estimated flexibly enough to effectively subsume all of that variability, and the model doesn't benefit much from the additional flexibility afforded by type. I find this at least a little bit surprising, since the prior should provide some regularization, but the normal(0, 3) prior on beta is quite weak relative to the generative normal(0, 1), and if your simulated values for type (of which there are only four) are all relatively near zero (and therefore don't produce values of beta + type that are inconsistent with the prior on beta), then you might see something like this.
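You can check both halves of that story directly (again a sketch, assuming the true simulated values and fits from above are still in scope):

# 1. kk is a deterministic function of ii: every item has exactly one type
all(tapply(kk, ii, function(x) length(unique(x))) == 1)  # TRUE

# 2. if the baseline beta absorbs type, its posterior means should track
#    the (centered) sum of the true item and type effects
combined <- beta + type[((1:I - 1) %% K) + 1]
est_beta <- rstan::summary(fit_baseline, pars = "beta")$summary[, "mean"]
plot(combined - mean(combined), est_beta,
     xlab = "true beta + type (centered)", ylab = "baseline posterior mean beta")
abline(0, 1)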