@bbbales2, @anon75146577, @wds15, @yizhang Thank you all for your advice!
@wds15: When you use a box-cox transform, do you estimate the scale and shift parameter before you load the data into your Stan model or do you estimate these in your full model?
For testing, I am using the crude method right now that wds15 suggested. It works well for the current purposes to keep the solution positive in case the ODE solution gets very close to zero. This happens only in case of parameters in low-probability regions anyways.
However, one thing bothers me still: When I run diagnose test=gradient
on my test model then I get some really large values for the finite difference approximations for some parameters. The model is so simple that I cannot really see where I should have made an error in my equations. Could this also be related to numbers getting really small in my ode solution for low-probability parameters? I guess the finite differences approximating the sensitivities of the ODE to get the gradient of the parameters going into the ODE could be very unstable once the solution gets close to zero.
For example, this is a result from a run of diagnose
Log probability=-3358.23
param idx value model finite diff error
0 1.17108 6483.47 6483.47 1.54628e-007
1 1.04201 -310.916 -242185 241874
2 0.297269 -21.5027 3461.6 -3483.1
3 0.24182 -416.358 -524502 524085
4 -0.876616 -74.6664 -74.6666 0.000220595
5 0.923798 -9.82334 3720.79 -3730.62
6 0.52609 -548.188 -8787.05 8238.86
7 1.39752 302.246 258035 -257732
8 -0.546857 4.52945 41220.4 -41215.9
9 0.831083 407.035 197269 -196862
10 1.72614 71.2488 71.2495 -0.000730462
11 -1.07626 5.26091 48026.4 -48021.2
12 1.49623 534.252 -242825 243359
13 1.47067 -3.12632 143452 -143455
14 -1.34691 -706.859 27951.8 -28658.6
15 1.13975 -3.43895 370977 -370981
16 0.832852 -0.981895 -0.981967 7.12531e-005
17 0.695118 -256.892 -14628.4 14371.6
18 0.280268 -6.0746 -93763.3 93757.3
I noticed that model derivatives via AD and finite differences are much more in line for parameter values with higher log probability. Maybe I am using diagnose
wrong?
Data and model are attached in case I am doing something obvious wrong.
exp.data.R (901 Bytes)
low_tol.stan (1.4 KB)
EDIT: Out of curiosity, I also tested @Bob_Carpenter’s model which was posted here. Running diagnose test=gradient
on that model also leads to large differences between model
and finite_diff
on my machine. It’s maybe not directly my model after all?
A typical run from diagnose looked like
TEST GRADIENT MODE
Log probability=-1334.85
param idx value model finite diff error
0 0.196752 726.335 -2873.67 3600.01
1 -1.01118 -456.268 3212.47 -3668.74
2 0.131759 -755.859 -3677.38 2921.52
3 0.363325 -437.119 2108.33 -2545.45
4 -1.71535 366.054 2707.01 -2340.96
5 0.45776 270.602 2972.9 -2702.3
6 1.96192 -16.8809 -16.8821 0.0011476
7 -1.35977 1777.21 1777.56 -0.35014