Hi,
I'm fairly new to Stan, R and Bayesian statistics, so sorry if my mistake is very rudimentary.
I'm attempting to estimate a hierarchical model in rstan, but I receive the following error messages when I try to run the sampling:
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
and,
[2] "Exception: []: accessing element out of range. index 6 out of range; expecting index to be between 1 and 0; index position = 1alpha (in 'model126678353ad_Varying_Slope_Model_NoCov' at line 31)"
Here is the model:
data {
  int N; //number of participants
  real behaviour[N]; //outcome
  int respondent[N]; //vector of respondent IDs
  real prevalence[N]; //predictor
  int K; //number of respondents
}
parameters {
  real alpha[K]; //respondent intercepts
  real beta_prevalence[K]; //respondent slopes
  real sigma[K]; //respondent variance
  real alpha_top; //pooled intercept
  real<lower=0> alpha_sigma; //pooled intercept variance
  real beta_p_top; //pooled slope
  real beta_p_sigma; //pooled slope variance
}
model {
  for(i in 1:N) {
    int aRespondent;
    aRespondent = respondent[i];
    behaviour[i] ~ normal(alpha[aRespondent] + beta_prevalence[aRespondent]*prevalence[i], sigma[aRespondent]);
  }
  alpha ~ normal(alpha_top, alpha_sigma); //priors
  beta_prevalence ~ normal(beta_p_top, beta_p_sigma); //priors
  sigma ~ normal(0,2); //priors
  alpha_top ~ normal(0,2); //hyperpriors
  beta_p_top ~ normal(0,2); //hyperpriors
  alpha_sigma ~ normal(0,2); //hyperpriors
  beta_p_sigma ~ normal(0,2); //hyperpriors
}
generated quantities {
  real alpha_overall;
  real beta_p_overall;
  alpha_overall = normal_rng(alpha_top, alpha_sigma);
  beta_p_overall = normal_rng(beta_p_top, beta_p_sigma);
}
Does anyone know if I've made a silly coding error?
What values do you have for N and K and what values does the respondent vector contain?
The "expecting index to be between 1 and 0" part of the error makes me guess you have K=0.
Hi @jtimonen,
N contains values 1 to 41756
K contains values 1 to 4646
‘respondent’ is an id variable with values 1 to K
@jtimonen
You were right regarding the K=0.
The code used to create my data list in R reads:
data <- list(behaviour=behaviour, prevalence=prevalence, N=length(behaviour), respondent=respondent, K=nlevels(as.factor(PRINT_DATA_1_IMP_LONG_NoNaN$respondent_id)))
Originally the code lacked the as.factor() call, so nlevels() wasn't picking up the number of levels.
Strangely, it's now producing a similar error message, but this time it states that respondent values must be less than or equal to 4143:
Exception: model1267eb17627_Varying_Slope_Model_NoCov_namespace::model1267eb17627_Varying_Slope_Model_NoCov: respondent[i_0__] is 4144, but must be less than or equal to 4143 (in ‘model1267eb17627_Varying_Slope_Model_NoCov’ at line 15)
The first thing to do is to check the data that’s being passed to Stan. What do you get when you run in R:
length(behaviour)
length(prevalence)
length(respondent)
nlevels(as.factor(PRINT_DATA_1_IMP_LONG_NoNaN$respondent_id))
Oh and range(respondent)
would probably be good to check as well
N and K should each be a single integer, so you probably mean N=41756 and K=4646? But you want to print their values to check what you are actually passing to Stan.
I suspect your PRINT_DATA_1_IMP_LONG_NoNaN$respondent_id contains only 4143 unique values, so nlevels(as.factor()) for it is 4143. You can put
int<lower=1, upper=K> respondent[N];
in your model to make these things easier to catch.
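If that suspicion is right, the ids have gaps: the largest id is bigger than the number of distinct ids, so some respondent[i] points past the end of alpha[K]. Here is a minimal sketch of the check and one possible fix in R. The toy vector respondent_id is made up and only stands in for the real PRINT_DATA_1_IMP_LONG_NoNaN$respondent_id column; whether your data actually looks like this is an assumption.

```r
# Hypothetical stand-in for PRINT_DATA_1_IMP_LONG_NoNaN$respondent_id:
# the ids have gaps, so the largest id exceeds the number of distinct ids.
respondent_id <- c(1, 1, 2, 5, 5, 9)

# Number of distinct respondents, as computed for K in the data list.
K_levels <- nlevels(as.factor(respondent_id))

# Smallest and largest id; if the largest exceeds K_levels, indexing alpha[K] fails.
id_range <- range(respondent_id)

# Re-index to contiguous integers 1..K so every id is a valid index.
respondent <- as.integer(factor(respondent_id))
K <- length(unique(respondent))

# Sanity check before passing the data to Stan.
stopifnot(min(respondent) == 1, max(respondent) == K)
```

With the re-indexed respondent and K in the data list, the bounds int<lower=1, upper=K> should pass. If they still fail, print the values as suggested above to see what is really being passed.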
@andrjohns @jtimonen I've added <lower=1, upper=K> and that appears to have done the trick. I encountered similar problems with 'behaviour' and 'prevalence', and this solution has worked for all of them.
Thanks for your help and solutions!
Adding the bounds should only have thrown a more informative error, not solved the problem, so something weird is going on. It also makes no sense to add those bounds to prevalence and behaviour if they are continuous data. For the sigma parameter, you want to set <lower=0> if it is a variance parameter that should not be negative.
I really recommend that you check the things @andrjohns suggested.