I am trying to sample from the prior so that I can compare the posterior to the prior. However, if I just remove the distribution of the data from the model, I get bad Rhat values. Strangely, this only happens when I remove just those parts. If I go further and reduce the model to the bare minimum prior (i.e., remove all other variables), the problem disappears. So if I just use this model:
data {
  int<lower=1> M;               // Number of subjects
}
parameters {
  // Group-level parameters
  real alpha_mu;                // Boundary separation mean (parameter to log normal)
  real<lower=0> alpha_sigma;    // Boundary separation variance (parameter to log normal)
  // Individual parameters
  vector<lower=0>[M] alpha;     // Individual boundary separation
}
model {
  alpha_mu ~ std_normal();
  alpha_sigma ~ exponential(10);
  alpha ~ lognormal(alpha_mu, alpha_sigma);
}
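(As an aside: this prior is simple enough to forward-sample directly outside Stan, without any MCMC. A minimal numpy sketch of the same hierarchy, with my own variable names, for cross-checking what the prior draws should look like:)

```python
import numpy as np

rng = np.random.default_rng(0)

M = 20           # number of subjects, as in the Stan data block
n_draws = 12000  # match the total post-warmup draws

# Forward-sample the prior:
#   alpha_mu ~ Normal(0, 1)
#   alpha_sigma ~ Exponential(10)   (numpy takes scale = 1/rate)
#   alpha[m] ~ LogNormal(alpha_mu, alpha_sigma) for each subject
alpha_mu = rng.normal(0.0, 1.0, size=n_draws)
alpha_sigma = rng.exponential(1.0 / 10.0, size=n_draws)
alpha = rng.lognormal(mean=alpha_mu[:, None],
                      sigma=alpha_sigma[:, None],
                      size=(n_draws, M))

print(alpha.mean(), np.median(alpha))
```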
Then everything is fine (note that all individual alphas are about the same):
Inference for Stan model: LogNormal_Priors.
4 chains, each with iter=4000; warmup=1000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=12000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha_mu -0.01 0.11 1.03 -1.79 -0.73 -0.03 0.64 2.25 95 1.04
alpha_sigma 0.16 0.01 0.12 0.04 0.08 0.12 0.21 0.48 202 1.03
alpha[1] 1.82 0.33 2.90 0.16 0.48 0.97 1.90 9.84 76 1.06
alpha[2] 1.82 0.33 2.91 0.16 0.48 0.97 1.90 9.49 76 1.06
alpha[3] 1.82 0.33 2.92 0.16 0.48 0.97 1.90 9.57 78 1.06
alpha[4] 1.82 0.34 2.96 0.16 0.48 0.97 1.92 9.60 76 1.06
alpha[5] 1.81 0.33 2.93 0.16 0.47 0.97 1.89 9.55 77 1.06
alpha[6] 1.83 0.34 3.11 0.16 0.48 0.97 1.92 9.63 83 1.05
alpha[7] 1.83 0.34 2.96 0.16 0.48 0.97 1.93 9.81 78 1.06
alpha[8] 1.82 0.33 2.95 0.16 0.48 0.97 1.91 9.65 78 1.06
alpha[9] 1.82 0.33 2.92 0.16 0.48 0.96 1.90 9.65 78 1.06
alpha[10] 1.82 0.33 2.92 0.16 0.47 0.97 1.90 9.48 79 1.06
alpha[11] 1.82 0.33 2.89 0.16 0.48 0.97 1.92 9.48 77 1.06
alpha[12] 1.82 0.33 2.92 0.16 0.48 0.97 1.91 9.55 77 1.06
alpha[13] 1.82 0.33 2.92 0.16 0.47 0.97 1.92 9.47 76 1.06
alpha[14] 1.82 0.33 2.96 0.16 0.47 0.97 1.92 9.59 80 1.05
alpha[15] 1.82 0.33 2.88 0.16 0.47 0.97 1.90 9.60 76 1.06
alpha[16] 1.82 0.34 2.94 0.16 0.47 0.98 1.89 9.64 76 1.06
alpha[17] 1.83 0.33 2.95 0.16 0.47 0.97 1.89 9.63 79 1.06
alpha[18] 1.82 0.33 2.92 0.16 0.48 0.97 1.92 9.52 79 1.06
alpha[19] 1.83 0.33 2.99 0.16 0.48 0.97 1.91 9.65 80 1.05
alpha[20] 1.83 0.34 2.97 0.16 0.48 0.96 1.92 9.56 77 1.06
Samples were drawn using NUTS(diag_e) at Mon Sep 21 14:24:46 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
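(For reference, the split-chain Rhat mentioned in that footer can be sketched in a few lines; this is my own illustration following the Gelman et al. formula, not Stan's exact implementation:)

```python
import numpy as np

def split_rhat(draws):
    """Split-Rhat for draws of shape (chains, samples)."""
    chains, n = draws.shape
    half = n // 2
    # Split each chain in half -> 2 * chains sequences of length `half`
    split = draws[:, :2 * half].reshape(2 * chains, half)
    W = split.var(axis=1, ddof=1).mean()           # within-chain variance
    B = half * split.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (half - 1) / half * W + B / half     # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 3000))                       # well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])   # one chain off target
print(split_rhat(mixed), split_rhat(stuck))
```

A chain that explores a different region than the others inflates the between-chain variance and drives Rhat well above 1, which is the pattern in the second run below.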
However, if I include more of the original model, the Rhat values suddenly become worse (note that the parts concerning alpha are exactly the same):
data {
  int<lower=1> N;                     // Number of trials
  int<lower=1> M;                     // Number of subjects
  vector<lower=0>[N] RT;              // Reaction times
  int<lower=1> subj[N];               // Subject number for RT i
  vector<lower=0,upper=1>[N] resp_l;  // 1 if the response was on the left, 0 otherwise
  vector<lower=0,upper=1>[N] incomp;  // 1 if the trial was incompatible, 0 otherwise
  vector<lower=0,upper=1>[N] acc;     // Accuracy: correct (1) or incorrect (0) response
  real<lower=0> NDTMin;               // Minimal non-decision time
  real<lower=0> minRT[M];
}
parameters {
  // Group-level parameters
  real alpha_mu;                      // Boundary separation mean (parameter to log normal)
  real<lower=0> alpha_sigma;          // Boundary separation variance (parameter to log normal)
  real<lower=1> beta_alpha;           // alpha parameter to beta distribution for starting value
  real<lower=1> beta_beta;            // beta parameter to beta distribution for starting value
  real delta_mu;                      // Mean drift rate (group level)
  real<lower=0> delta_sigma;          // Variance
  real eta_mu;                        // Drift rate difference between compatible / incompatible trials
  real<lower=0> eta_sigma;            // Drift rate difference (variance component)
  // Individual parameters
  vector<lower=0>[M] alpha;           // Individual boundary separation
  vector<lower=0,upper=1>[M] beta;    // Individual starting value
  vector[M] delta;                    // Individual drift rate
  vector[M] eta_z;                    // Effect for this participant (z-score)
  vector<lower=NDTMin>[M] tau;        // Non-decision time (no hierarchical model)
}
transformed parameters {
  vector[N] beta_trl;                 // Beta for each trial
  vector[N] delta_trl;                // Drift rate in each trial
  vector[M] eta;                      // Individual compatibility effects
  eta = eta_mu + eta_z * eta_sigma;
  // The initial offset should mostly depend on handedness etc.,
  // i.e. a single offset towards left/right responses;
  // therefore, we reverse the beta if the response was on the left.
  beta_trl = beta[subj] + resp_l - 2 * beta[subj] .* resp_l;
  delta_trl = (delta[subj] + incomp .* eta[subj]) .* (2 * acc - 1);
}
model {
  alpha_mu ~ std_normal();
  alpha_sigma ~ exponential(10);
  tau ~ uniform(NDTMin, minRT);
  delta_mu ~ normal(0, 10);
  delta_sigma ~ cauchy(0, 10);
  beta_alpha ~ exponential(1);
  beta_beta ~ exponential(1);
  eta_mu ~ normal(0, 10);
  eta_sigma ~ cauchy(0, 100);
  alpha ~ lognormal(alpha_mu, alpha_sigma);
  beta[1] ~ beta(beta_alpha, beta_beta);
  delta ~ normal(delta_mu, delta_sigma);
  eta_z ~ std_normal();
}
Then I get the result:
Inference for Stan model: DDM_Priors.
4 chains, each with iter=4000; warmup=1000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=12000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha_mu -0.22 0.18 0.38 -0.86 -0.46 -0.27 0.01 0.56 5 1.71
alpha_sigma 0.48 0.12 0.22 0.07 0.33 0.48 0.63 0.90 4 2.27
alpha[1] 0.99 0.18 0.69 0.23 0.59 0.83 1.20 2.55 14 1.25
alpha[2] 0.96 0.12 0.61 0.12 0.56 0.82 1.24 2.45 25 1.12
alpha[3] 0.87 0.37 0.69 0.11 0.23 0.79 1.26 2.88 4 2.52
alpha[4] 1.07 0.14 0.72 0.22 0.57 0.89 1.40 3.11 25 1.18
alpha[5] 0.96 0.11 0.56 0.25 0.55 0.77 1.25 2.30 24 1.13
alpha[6] 0.80 0.29 0.61 0.10 0.35 0.66 1.12 2.30 4 1.45
alpha[7] 0.98 0.15 0.73 0.26 0.55 0.80 1.22 2.63 24 1.16
alpha[8] 1.19 0.22 0.77 0.27 0.64 0.97 1.58 3.13 12 1.29
alpha[9] 0.86 0.20 0.67 0.09 0.37 0.56 1.22 2.61 11 1.46
alpha[10] 0.89 0.15 0.57 0.15 0.52 0.71 1.13 2.32 14 1.34
alpha[11] 1.02 0.20 0.50 0.30 0.66 0.88 1.30 2.19 6 1.35
alpha[12] 0.99 0.22 0.62 0.15 0.49 0.87 1.38 2.52 8 1.31
alpha[13] 1.23 0.26 1.47 0.29 0.59 0.90 1.40 3.86 33 1.08
alpha[14] 0.94 0.13 0.57 0.26 0.52 0.78 1.25 2.28 20 1.15
alpha[15] 0.91 0.14 0.60 0.21 0.49 0.75 1.23 2.25 17 1.16
alpha[16] 1.15 0.09 0.64 0.26 0.75 1.09 1.43 2.68 48 1.05
alpha[17] 0.91 0.21 0.58 0.18 0.48 0.74 1.26 2.32 8 1.27
alpha[18] 1.20 0.18 0.54 0.40 0.76 1.13 1.53 2.42 9 1.28
alpha[19] 1.07 0.16 0.71 0.25 0.56 0.84 1.40 2.93 20 1.25
alpha[20] 0.88 0.18 0.48 0.35 0.50 0.74 1.13 2.04 7 1.35
Samples were drawn using NUTS(diag_e) at Mon Sep 21 15:01:45 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
So the individual alphas differ from each other, and the Rhat values have become worse. I don’t really understand what is going on here, as the parts of the model concerning these variables are exactly the same.
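For completeness, the trial-level algebra in the transformed parameters block does check out on its own; a small numpy sketch (my own, with illustrative values and the subj indexing collapsed to a single subject) confirms the intended behavior:

```python
import numpy as np

# beta_trl: beta + r - 2*beta*r leaves beta unchanged for a right
# response (resp_l = 0) and reverses it to 1 - beta for a left one.
beta = 0.3
resp_l = np.array([0.0, 1.0])
beta_trl = beta + resp_l - 2 * beta * resp_l

# delta_trl: drift plus compatibility effect, sign-flipped for errors.
delta, eta = 1.5, -0.4
incomp = np.array([0.0, 1.0])
acc = np.array([1.0, 0.0])
delta_trl = (delta + incomp * eta) * (2 * acc - 1)

print(beta_trl, delta_trl)
```

So whatever goes wrong in the larger model, it does not seem to be the transform itself.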