I considered a simpler model. The simpler has the following form
X_i(t_{ij}) = \eta + w_{0i} + w_{1i}log(t_{ij}) + \varepsilon_{ij}, where
\varepsilon_{ij} \sim N(0, \sigma^2), and
\Sigma_{w} =
\begin{pmatrix}
\sigma^2_1& \rho\sigma_1\sigma_2 \\
\rho\sigma_1\sigma_2 & \sigma^2_2
\end{pmatrix}.
In other words, X_i \sim N(\eta + w_{0i} + w_{1i}log(t_{ij}), \sigma_1^2 + \rho\sigma_1\sigma_2\log(t_{ij}) + \log(t_{ij})[\rho\sigma_1\sigma_2 + \sigma^2_2\log(t_{ij})] + \sigma^2).
I simulated a small dataset from this model. This dataset contains 50 units (i = 1,\dots,50), and each unit has 100 measurements (j = 1,\dots,100). The true parameter values are
\eta = 1, \sigma_1 = 0.46, \sigma_2 = 0.22, \rho = 0.20, \sigma = 0.05.
I used the following Stan code:
data {
int<lower=1> n; // Sample size
int<lower=1> N; // Total number of times considered (all times for all units)
real<lower=0> tt[N]; //All times for all units
real y_tt[N]; //All of the use-rates
int w_ind[N]; //Indices for w
}
parameters {
real <lower = 0> sig;
real <lower = 0> sig1;
real <lower = 0> sig2;
real <lower = -1, upper = 1>rho;
real eta;
matrix[2, n] w;
}
transformed parameters{
//Defining covariance matrix, and cholesky decomposition
cov_matrix[2] Sigma;
matrix[2,2] L;
matrix[2, n] w2;
Sigma[1,1] = sig1^2;
Sigma[1,2] = rho*sig1*sig2;
Sigma[2,1] = rho*sig1*sig2;
Sigma[2,2] = sig2^2;
//Cholesky decomposition
L = cholesky_decompose(Sigma);
w2 = L*w;
}
model {
//Defining mean and variance of mixed effects model
real Mu[N];
real Sig[N];
//Parameters used in mixed effects model
for(i in 1:N){
Mu[i] = eta + w2[1,w_ind[i]] + w2[2,w_ind[i]]*log(tt[i]);
Sig[i] = sqrt(sig1^2 + log(tt[i])*(2*rho*sig1*sig2 + sig2^2*log(tt[i])) + sig^2);
}
//Likelihood
target += normal_lpdf(y_tt|Mu, Sig);
// Prior:
eta ~ normal(1, 0.1);
to_vector(w) ~ normal(0, 1);
sig ~ normal(0.05, 0.1);
sig1 ~ normal(0.46, 0.1);
sig2 ~ normal(0.22, 0.1);
rho ~ normal(0.2, 0.1);
}
I also set the initial values (using one chain) to be the true values. I have posted the output below.
Inference for Stan model: covariate_history.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
sig 0.00147989 0.00004558 0.00109131 0.00008564 0.00056554 0.00131792 0.00210129 0.00407314 573 1.0050062
sig1 0.12062747 0.00025085 0.00375407 0.11334060 0.11832019 0.12053361 0.12286424 0.12849077 224 0.9980990
sig2 0.03797979 0.00005621 0.00089803 0.03612970 0.03738746 0.03803156 0.03854388 0.03971250 255 0.9980047
rho -0.64260523 0.00136347 0.01910576 -0.67802336 -0.65486925 -0.64397021 -0.62984465 -0.60499704 196 0.9986664
eta 0.94631716 0.00090889 0.01255101 0.92417353 0.93723272 0.94619426 0.95472676 0.97200233 191 0.9980701
Sigma[1,1] 0.01456505 0.00006055 0.00090766 0.01284609 0.01399967 0.01452835 0.01509562 0.01650988 225 0.9980670
Sigma[1,2] -0.00295015 0.00001621 0.00023883 -0.00345336 -0.00308949 -0.00294962 -0.00279064 -0.00249462 217 0.9981057
Sigma[2,1] -0.00295015 0.00001621 0.00023883 -0.00345336 -0.00308949 -0.00294962 -0.00279064 -0.00249462 217 0.9981057
Sigma[2,2] 0.00144327 0.00000423 0.00006811 0.00130536 0.00139782 0.00144640 0.00148563 0.00157708 259 0.9980188
L[1,1] 0.12062747 0.00025085 0.00375407 0.11334060 0.11832019 0.12053361 0.12286424 0.12849077 224 0.9980990
L[1,2] 0.00000000 NaN 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 NaN NaN
L[2,1] -0.02442087 0.00008574 0.00125753 -0.02688328 -0.02521050 -0.02444180 -0.02360880 -0.02193476 215 0.9983378
L[2,2] 0.02907222 0.00001766 0.00034375 0.02838163 0.02884897 0.02907822 0.02929589 0.02969408 379 1.0009660
lp__ 3641.15104392 0.44610895 6.74073912 3626.35592137 3637.30179038 3641.17944491 3645.82128203 3653.33882154 228 0.9990915
The samples are, once again, not near the true values (apart from eta). After putting Sigma, L and w2 in the transformed parameters block, I am able to see these parameters in the Stan output.
It appears that something is going wrong with the Cholesky decomposition. There are NaN values in the L[1,2] row of the output.
In addition, I used the parameter samples for \eta, \sigma_1, \sigma_2, \rho, and \sigma to make some predictions. Namely, I forecasted the covariate process of a new unit that may have just joined the cohort of units. I compared these forecasts to forecasts using the true values.
I have attached a plot of the forecasts:
forecasts.pdf (7.3 KB)
The forecasts from the Stan samples are much narrower compared to the forecasts from the true parameter values.
Aside: when writing
sqrt(vector)
does Stan take the square root of each element of the vector?