How to choose proper parameter ranges when different ranges lead to huge LOOIC changes

I’ve been using RStan and loo to develop a utility model with a hierarchical structure (38 subjects).

However, because the model is novel, I can’t decide on optimal ranges for my six parameters, so I tried different ones. For example, for the parameter alpha I tried [-2, 2] (model A) and [-20, 20] (model B).

# model A
transformed parameters {
  vector<lower=-2, upper=2>[N] alpha;
}

# model B
transformed parameters {
  vector<lower=-20, upper=20>[N] alpha;
}

The corresponding results differed a lot: in model B, the distribution of alpha looked more normal and seemed better.
[Figure: posterior distributions of alpha for models A and B]

However, when it comes to the diagnostics, model B has a much larger LOOIC than model A.
#model A

Computed from 6000 by 38 log-likelihood matrix

         Estimate   SE
elpd_loo  -1795.3 44.7
p_loo       106.2 19.7
looic      3590.6 89.5
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     18    47.4%   328       
 (0.5, 0.7]   (ok)       11    28.9%   68        
   (0.7, 1]   (bad)       4    10.5%   15        
   (1, Inf)   (very bad)  5    13.2%   1         
See help('pareto-k-diagnostic') for details.

#model B

Computed from 6000 by 38 log-likelihood matrix

         Estimate     SE
elpd_loo  -6813.3 1246.0
p_loo      5626.6 1161.0
looic     13626.6 2491.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)      0     0.0%   <NA>      
 (0.5, 0.7]   (ok)        0     0.0%   <NA>      
   (0.7, 1]   (bad)       2     5.3%   16        
   (1, Inf)   (very bad) 36    94.7%   0         
See help('pareto-k-diagnostic') for details.

The following is a short version of the script:

library(rstan)
library(loo)

# "model.stan" and standata stand in for the actual model file and data list
fit <- stan(file = "model.stan", data = standata)
looic <- loo(fit)  # uses the log_lik values stored in generated quantities
print(looic)

Moreover, I also calculated BIC, which showed the inverse pattern: model B had the lower BIC.
Hence, I was completely confused by these results and wanted to know (1) whether the data are too small, or whether there are problems in the model specification, that lead to such a large LOOIC (even the smallest LOOIC is still big); and (2) why the change of parameter range elicited such a huge difference.

Thanks!!

These loo estimates are not reliable as there are many very bad k diagnostic values. Also, since in both cases, p_loo is much larger than the number of observations, there is likely something wrong with your models. If you want more help, please tell more about your data and model, and post the full model code.

Thanks for your reply!!!
In my task, participants need to integrate different pieces of information to make a decision. Model-free analyses have revealed the effects of these variables. Hence, it should be plausible to build a model and analyze the data formally.

The following are the model details; the variables higher and *range are used to adjust the parameter ranges.

data {
  int T; // number of trials
  int N; // number of subjects
  real alpharange; // scaling factor for alpha's range
  real gammarange; // scaling factor for gamma's range
  real Rconstrange; // scaling factor for Rconst's range
  real taurange; // scaling factor for tau's range
  real higher; // global multiplier applied to all ranges
  int<lower=-1, upper=1> choice[T,N]; // choice on every trial for every subject
  matrix[T,6] stiset; // all the experimental parameters
}

parameters {
  // Hyper(group)-parameters
  vector[6] mu_pr;
  vector<lower = 0>[6] sigma;
  
// Subject-level raw parameters
  vector[N] alpha_pr; 
  vector[N] gamma_pr; 
  vector[N] delta_pr; 
  vector[N] tau_pr; 
  vector[N] Rconst_pr; 
  vector[N] IT_pr; // softmax inverse temperature
}

transformed parameters {
// Transform subject-level raw parameters
  vector<lower=-1*alpharange*higher, upper=1*alpharange*higher>[N] alpha;
  vector<lower=-1*gammarange*higher, upper=1*gammarange*higher>[N] gamma;
  vector<lower=-1*higher, upper=1*higher>[N] delta;
  vector<lower=-1*taurange*higher, upper=1*taurange*higher>[N] tau;
  vector<lower=-1*Rconstrange*higher, upper=1*Rconstrange*higher>[N] Rconst; 
  vector<lower=0, upper=10>[N] IT;

  for (i in 1:N) {
    alpha[i] = Phi_approx(mu_pr[1] + sigma[1] * alpha_pr[i])*alpharange*higher;
gamma[i] = Phi_approx(mu_pr[2] + sigma[2] * gamma_pr[i])*gammarange*higher;
    delta[i] = Phi_approx(mu_pr[3] + sigma[3] * delta_pr[i])*higher;
    tau[i]   = Phi_approx(mu_pr[4] + sigma[4] * tau_pr[i])*taurange*higher;
    Rconst[i]   = Phi_approx(mu_pr[5] + sigma[5] * Rconst_pr[i])*Rconstrange*higher;
    IT[i]   = Phi_approx(mu_pr[6] + sigma[6] * IT_pr[i])*10;
  }
}

model {
  
real opAvalue;
  real opBvalue;
  // Hyperparameters 
  mu_pr ~ normal(0,1); 
  sigma ~ normal(0,3); 

  // individual parameters
  gamma_pr ~ normal(0,1.0);
  tau_pr ~ normal(0,1.0);
  alpha_pr ~ normal(0,1.0); 
  delta_pr ~ normal(0,1.0);
  Rconst_pr ~ normal(0,1.0);
  IT_pr ~ normal(0,1.0);
  
for (i in 1:N){
    for (t in 1:T){
      // compute option values
      opAvalue = ((alpha[i] - stiset[t,6]*delta[i])*stiset[t,2])*(exp(-1*tau[i]*stiset[t,1])) - gamma[i]*stiset[t,4];
      opBvalue = -Rconst[i]*exp(-1*tau[i]*stiset[t,1]);

      // softmax decision-making
      choice[t,i] ~ bernoulli_logit(IT[i]*(opAvalue - opBvalue));
    }
  }
}
  
generated quantities {
  
  real opAvalue;
  real opBvalue;
  
  //for group level parameters
  real<lower=-1*alpharange*higher, upper=1*alpharange*higher> mu_alpha;
  real<lower=-1*gammarange*higher, upper=1*gammarange*higher> mu_gamma;
  real<lower=-1*higher, upper=1*higher> mu_delta;
  real<lower=-1*taurange*higher, upper=1*taurange*higher> mu_tau;
  real<lower=-1*Rconstrange*higher, upper=1*Rconstrange*higher> mu_Rconst;
  real<lower=0, upper=10> mu_IT;
  
  //for loglik calculation
  real log_lik[N];
  
  //for posterior predictive check
  real  y_pred[T,N];
  
// initialize all posterior predictions to -1 (avoids NULL values)
  for (i in 1:N){
    for (t in 1:T){
      y_pred[t,i] = -1;
    }
  }
  mu_alpha = Phi_approx(mu_pr[1])*alpharange*higher;
  mu_gamma = Phi_approx(mu_pr[2])*higher;
  mu_delta = Phi_approx(mu_pr[3])*higher;
  mu_tau = Phi_approx(mu_pr[4])*taurange*higher;
  mu_Rconst = Phi_approx(mu_pr[5])*Rconstrange*higher;
  mu_IT = Phi_approx(mu_pr[6])*10;
  
  { //local section, this saves time and space

for (i in 1:N){
    log_lik[i] = 0.0;
    for (t in 1:T){
      // compute option values
      opAvalue = ((alpha[i] - stiset[t,6]*delta[i])*stiset[t,2])*(exp(-1*tau[i]*stiset[t,1])) - gamma[i]*stiset[t,4];
      opBvalue = -Rconst[i]*exp(-1*tau[i]*stiset[t,1]);

      // softmax decision-making
      y_pred[t,i] = bernoulli_rng(inv_logit(IT[i]*(opAvalue - opBvalue)));
      log_lik[i] += bernoulli_logit_lpmf(choice[t,i] | IT[i]*(opAvalue - opBvalue)); 
    }
  }
  }
}

I think you should not sum the log-likelihood over trials for each participant, because each participant differs a lot. If you store the likelihood in a matrix[T,N] and then submit it to the loo package, the k diagnostic values will be much better.
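To illustrate, here is a minimal sketch of the per-trial generated quantities block, assuming the same opAvalue/opBvalue computation as in the model block above (only the log_lik part changes):

generated quantities {
  // one pointwise log-likelihood entry per trial and subject
  matrix[T, N] log_lik;
  {
    real opAvalue;
    real opBvalue;
    for (i in 1:N) {
      for (t in 1:T) {
        opAvalue = ((alpha[i] - stiset[t,6]*delta[i])*stiset[t,2])*(exp(-1*tau[i]*stiset[t,1])) - gamma[i]*stiset[t,4];
        opBvalue = -Rconst[i]*exp(-1*tau[i]*stiset[t,1]);
        log_lik[t,i] = bernoulli_logit_lpmf(choice[t,i] | IT[i]*(opAvalue - opBvalue));
      }
    }
  }
}

With this layout, loo::extract_log_lik(fit) returns a draws-by-(T*N) matrix, so PSIS-LOO approximates leave-one-trial-out rather than leave-one-subject-out cross-validation.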

Thanks for your answer!! I’m not an expert in using RStan, and the way I calculate the log-likelihood was copied from another package, hBayesDM.

Hence, would you please explain more about the difference between the two log-likelihood calculation protocols? Is it that the loo package is more suitable for a log-likelihood matrix[T,N]?

What hBayesDM does is called leave-one-group-out cross-validation (LOGO). Usually, PSIS is not suitable for LOGO; see the Stan forum post "Cross-validation for hierarchical models". I guess the reason hBayesDM adopted this method is that the models included in that package are mainly reinforcement learning models, which assume a time-dependent relationship between trials, so you cannot perform simple leave-one-trial-out CV for them.
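For completeness: if you really do want LOGO, the usual alternative to PSIS is exact cross-validation by refitting, once per held-out subject. Below is a minimal sketch on a toy hierarchical Bernoulli model (hypothetical names, not the model from this thread), where a holdout indicator is passed as data, held-out subjects are excluded from the likelihood, and their predictive density is accumulated in generated quantities:

data {
  int<lower=1> T;
  int<lower=1> N;
  int<lower=0, upper=1> y[T, N]; // binary choices
  int<lower=0, upper=1> holdout[N]; // 1 = subject excluded from this fit
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[N] theta_pr;
}
model {
  mu ~ normal(0, 1);
  sigma ~ normal(0, 3);
  theta_pr ~ normal(0, 1);
  for (i in 1:N) {
    if (holdout[i] == 0) {
      for (t in 1:T) {
        y[t, i] ~ bernoulli_logit(mu + sigma * theta_pr[i]);
      }
    }
  }
}
generated quantities {
  // summed log predictive density for each held-out subject;
  // theta_pr for a held-out subject is drawn from the population prior only
  vector[N] log_lik_heldout = rep_vector(0, N);
  for (i in 1:N) {
    if (holdout[i] == 1) {
      for (t in 1:T) {
        log_lik_heldout[i] += bernoulli_logit_lpmf(y[t, i] | mu + sigma * theta_pr[i]);
      }
    }
  }
}

Refitting N times is costly, but it gives the exact LOGO estimate that PSIS cannot reliably approximate when each group contributes many observations.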