Hello everyone,
I am posting because I have two major issue with my model and one minor.
First, Stan struggle to randomly find good initial value for my parameters, especially for the transformed parameters covS100, covS101, … I try to specify initial value for these parameters (to 0 in this code) but, the sampler persist in randomly choosing initial value for these parameters. They are highly restricted parameters but it should be working when initializing them at zero.
Error message:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: model1d0861eb27c2_3kit_fem_se_model_namespace::write_array: covS100 is -0.181278, but must be greater than or equal to -0.0621561 (in ‘model1d0861eb27c2_3kit_fem_se_model’ at line 24)
Second, when Stan finally sample, it report that there were many divergent transition after warm up. I already increased the adapt_delta to 0.99 but it does not improve sampling. Should I be worried about a potential bias during sampling? I look at the pairs and they seem quite good, but I do not know exactly what to look for in the pairs.
Warning messages:
1: There were 93794 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
*Runtime warnings and convergence problems *
2: Examine the pairs() plot to diagnose sampling problems
Third and final minor problem: I would like to change the model in order to allow the pt parameter (simplex vector of probability for the multinomial distribution), but multinomial distribution seem to not accept row vector as a distribution parameters. Is it possible to use a row of a matrix or something like that as a vector probability for a multinomial distribution? I solve this problem by write as many times as I wanted the “pt” parameters (for example tenfold with pt1, pt2, …, pt10 each of them being a vector of probability for the multinomial distribution) but the code for the model is much heavier and it is not very pretty…
Thank you very much in advance for your help.
Stan model
data {
int<lower=0> J; // Number of all types of possible test results. J=8 for three tests.
int<lower=0> observed_frequency[J]; //number of each type of test result (T1,T2,T3), in order of (1,1,1), (1,1,0), ..., (0,0,0)
}
parameters {
real<lower=0, upper=1> pi; // prevalence
real<lower=0, upper=1> c1; // specificity of T1
real<lower=0, upper=1> c2; // specificity of T2
real<lower=0, upper=1> c3; // specificity of T3
real<lower=1-c1, upper=1> s1; // sensitivity of T1
real<lower=1-c2, upper=1> s2; // sensitivity of T2
real<lower=1-c3, upper=1> s3; // sensitivity of T3
// covS111 = conditional dependence term for result (T1=1, T2=1, T3=1 | D=1)
real<lower=-s1*s2*s3, upper=fmin(s1, fmin(s2, s3))-s1*s2*s3> covS111;
real<lower=-(1-s1)*s2*s3, upper=fmin(1-s1, fmin(s2, s3))-(1-s1)*s2*s3> covS011;
real<lower=-(1-s1)*(1-s2)*s3, upper=fmin(1-s1, fmin(1-s2, s3))-(1-s1)*(1-s2)*s3> covS001;
real<lower=-(1-s1)*(1-s2)*(1-s3), upper=fmin(1-s1, fmin(1-s2, 1-s3))-(1-s1)*(1-s2)*(1-s3)> covS000;
// conditional dependence term for result (T1=1, T2=1, T3=1 | D=0)
real<lower=-(1-c1)*(1-c2)*(1-c3), upper=fmin(1-c1, fmin(1-c2, 1-c3))-(1-c1)*(1-c2)*(1-c3)> covC111;
real<lower=-c1*(1-c2)*(1-c3), upper=fmin(c1, fmin(1-c2, 1-c3))-c1*(1-c2)*(1-c3)> covC011;
real<lower=-c1*c2*(1-c3), upper=fmin(c1, fmin(c2, 1-c3))-c1*c2*(1-c3)> covC001;
real<lower=-c1*c2*c3, upper=fmin(c1, fmin(c2, c3))-c1*c2*c3> covC000;
}
transformed parameters {
real<lower=-s1*(1-s2)*(1-s3), upper=fmin(s1, fmin(1-s2, 1-s3))-s1*(1-s2)*(1-s3)> covS100;
real<lower=-s1*(1-s2)*s3, upper=fmin(s1, fmin(1-s2, s3))-s1*(1-s2)*s3> covS101;
real<lower=-s1*s2*(1-s3), upper=fmin(s1, fmin(s2, 1-s3))-s1*s2*(1-s3)> covS110;
real<lower=-(1-s1)*s2*(1-s3), upper=fmin(1-s1, fmin(s2, 1-s3))-(1-s1)*s2*(1-s3)> covS010;
real<lower=-(1-c1)*c2*c3, upper=fmin(1-c1, fmin(c2, c3))-(1-c1)*c2*c3> covC100;
real<lower=-(1-c1)*c2*(1-c3), upper=fmin(1-c1, fmin(c2, 1-c3))-(1-c1)*c2*(1-c3)> covC101;
real<lower=-(1-c1)*(1-c2)*c3, upper=fmin(1-c1, fmin(1-c2, c3))-(1-c1)*(1-c2)*c3> covC110;
real<lower=-c1*(1-c2)*c3, upper=fmin(c1, fmin(1-c2, c3))-c1*(1-c2)*c3> covC010;
vector<lower=0, upper=1>[J] pt; // joint probability of each type of possible test result
real covST12; // the pairwise conditional dependence between (T1=1|D=1) and (T2=1|D=1)
real covST13; // the pairwise conditional dependence between (T1=1|D=1) and (T3=1|D=1)
real covST23; // the pairwise conditional dependence between (T2=1|D=1) and (T3=1|D=1)
real covCT12; // the pairwise correlation between (T1=1|D=0) and (T2=1|D=0)
real covCT13; // the pairwise correlation between (T1=1|D=0) and (T2=3|D=0)
real covCT23; // the pairwise correlation between (T1=2|D=0) and (T2=3|D=0)
// calculate the transformed conditional dependence terms
covS100 = covS011 + covS111 - covS000;
covS101 = -(covS001 + covS011 + covS111);
covS110 = covS000 + covS001 - covS111;
covS010 = -(covS000 + covS001 + covS011);
covC100 = covC011 + covC111 - covC000;
covC101 = -(covC001 + covC011 + covC111);
covC110 = covC000 + covC001 - covC111;
covC010 = -(covC000 + covC001 + covC011);
// calculate the pairwise conditional dependence terms
covST12 = covS111 + covS110;
covST13 = covS111 + covS101;
covST23 = covS011 + covS111;
covCT12 = covC111 + covC110;
covCT13 = covC111 + covC101;
covCT23 = covC011 + covC111;
// pt[1]: probability for having result 111, pt[2] for 110, pt[8] for 000
pt[1] = pi*((1-s1)*(1-s2)*(1-s3)+covS000)+(1-pi)*((c1)*c2*c3 + covC000);
pt[2] = pi*((1-s1)*(1-s2)*s3+covS001)+(1-pi)*(c1*(c2)*(1-c3) + covC001);
pt[3] = pi*((1-s1)*s2*(1-s3)+covS010)+(1-pi)*(c1*(1-c2)*(c3) + covC010);
pt[4] = pi*((1-s1)*s2*s3+covS011)+(1-pi)*((c1)*(1-c2)*(1-c3) + covC011);
pt[5] = pi*(s1*(1-s2)*(1-s3)+covS100)+(1-pi)*((1-c1)*c2*c3 + covC100);
pt[6] = pi*(s1*(1-s2)*s3+covS101)+(1-pi)*((1-c1)*(c2)*(1-c3) + covC101);
pt[7] = pi*(s1*s2*(1-s3)+covS110)+(1-pi)*((1-c1)*(1-c2)*(c3) + covC110);
pt[8] = pi*(s1*s2*s3+covS111)+(1-pi)*((1-c1)*(1-c2)*(1-c3) + covC111);
}
model {
// priors:
pi ~ beta(0.5,0.5);
s1 ~ beta(10,1); //
s2 ~ beta(10,1);
s3 ~ beta(10,1);
c1 ~ beta(10,1); //
c2 ~ beta(10,1); //
c3 ~ beta(10,1); //
// distribution
observed_frequency ~ multinomial(pt);
}
generated quantities {
real log_lik;
log_lik = multinomial_lpmf(observed_frequency | pt);
}
R code for the launching model and init :
library(rstan)
data4stan ← list(J = 8,
observed_frequency = c(1000,50,150,12,20,5,10,100))
Initialisation
inits ← list(list(),list(),list(),list())
for (k in 1:4) {
s1 ← rbeta(1,10,0.5)
s2 ← rbeta(1,10,0.5)
s3 ← rbeta(1,10,0.5)
c1 ← rbeta(1,10,0.5)
c2 ← rbeta(1,10,0.5)
c3 ← rbeta(1,10,0.5)
covS111 ← 0
covS011 ← 0
covS001 ← 0
covS000 ← 0
covS100 ← 0
covS101 ← 0
covS110 ← 0
covS010 ← 0
inits[[k]] ← list(s1 = s1, s2 = s2, s3 = s3,c1 = c1, c2 = c2, c3 = c3, covS111 = covS111, covS011 = covS011, covS001 = covS001, covS000 = covS000,covS100 = covS100, covS101 =covS101, covS110 = covS110, covS010 = covS010)
}
sampling
fit ← stan(file = ‘3kit_fem_se_model.stan’, data = data4stan, init = inits,control = list(adapt_delta = 0.99,max_treedepth = 15),iter = 50000)
And part of the R output
Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: model1d0861eb27c2_3kit_fem_se_model_namespace::write_array: covS100 is 0.309272, but must be less than or equal to 0.190704 (in ‘model1d0861eb27c2_3kit_fem_se_model’ at line 24)
Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: model1d0861eb27c2_3kit_fem_se_model_namespace::write_array: covS100 is -0.0675094, but must be greater than or equal to -0.0271819 (in ‘model1d0861eb27c2_3kit_fem_se_model’ at line 24)
Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: model1d0861eb27c2_3kit_fem_se_model_namespace::write_array: covS100 is -0.365994, but must be greater than or equal to -0.00929893 (in ‘model1d0861eb27c2_3kit_fem_se_model’ at line 24)
Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: model1d0861eb27c2_3kit_fem_se_model_namespace::write_array: covS100 is -0.390722, but must be greater than or equal to -0.0181616 (in ‘model1d0861eb27c2_3kit_fem_se_model’ at line 24)
Chain 4:
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 50000 [ 0%] (Warmup)
Chain 4: Iteration: 5000 / 50000 [ 10%] (Warmup)
Chain 4: Iteration: 10000 / 50000 [ 20%] (Warmup)
Chain 4: Iteration: 15000 / 50000 [ 30%] (Warmup)
Chain 4: Iteration: 20000 / 50000 [ 40%] (Warmup)
Chain 4: Iteration: 25000 / 50000 [ 50%] (Warmup)
Chain 4: Iteration: 25001 / 50000 [ 50%] (Sampling)
Chain 4: Iteration: 30000 / 50000 [ 60%] (Sampling)
Chain 4: Iteration: 35000 / 50000 [ 70%] (Sampling)
Chain 4: Iteration: 40000 / 50000 [ 80%] (Sampling)
Chain 4: Iteration: 45000 / 50000 [ 90%] (Sampling)
Chain 4: Iteration: 50000 / 50000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 114.509 seconds (Warm-up)
Chain 4: 116.137 seconds (Sampling)
Chain 4: 230.646 seconds (Total)
Chain 4:
Warning messages:
1: There were 93794 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
get_inits(fit) # only shown the first one
[[1]]
[[1]]$pi
[1] 0.3049692
[[1]]$c1
[1] 0.9998908
[[1]]$c2
[1] 0.9967834
[[1]]$c3
[1] 0.9950182
[[1]]$s1
[1] 0.8352106
[[1]]$s2
[1] 0.9600342
[[1]]$s3
[1] 0.9797707
[[1]]$covS111
[1] -8.722165e-17
[[1]]$covS011
[1] -2.380066e-17
[[1]]$covS001
[1] -8.139976e-19
[[1]]$covS000
[1] 2.271552e-18
[[1]]$covS100
[1] -1.132939e-16
[[1]]$covS101
[1] 1.118363e-16
[[1]]$covS110
[1] 8.86792e-17
[[1]]$covS010
[1] 2.23431e-17
[[1]]$pt
[1] 0.6893091004 0.0054188855 0.0032002879 0.0472823143 0.0002811954 0.0099742740 0.0049469999 0.2395869425
[[1]]$covST12
[1] 1.457555e-18
[[1]]$covST13
[1] 2.461465e-17
[[1]]$covST23
[1] -1.110223e-16
[[1]]$log_lik
[1] -661.8647