I run the following Stan program as a parameter estimation task for a three-factor model for market prices.

```stan
data {
  int N_days;
  int N_markets;
  int N_factors;
  int minT;
  int N_tenors;
  vector[N_markets * N_tenors] y[N_days];
  real mu;
  real nu;
}
transformed data {
  vector[N_markets * N_tenors] mu_vec;
  for (i in 1:(N_markets * N_tenors)) mu_vec[i] = mu;
}
parameters {
  real<lower=0, upper=10> a[N_markets];
  real<lower=0, upper=10> b[N_markets];
  real<lower=0, upper=10> c[N_markets];
  real<lower=0.05, upper=0.2> d[N_markets];
  // independent noise
  vector<lower=0>[N_markets] sigma;
  cholesky_factor_corr[N_factors * N_markets] L;
}
transformed parameters {
  vector[N_markets] sigma2;
  vector[N_markets * N_tenors] sigma2_vec;
  sigma2 = sigma .* sigma;
  for (i in 1:N_tenors) {
    sigma2_vec[((i - 1) * N_markets + 1):(i * N_markets)] = sigma2;
  }
}
model {
  matrix[N_markets * N_tenors, N_markets * N_factors] Z;
  matrix[N_markets * N_tenors, N_markets * N_tenors] LL;
  real z;
  Z = rep_matrix(0, N_markets * N_tenors, N_markets * N_factors);
  for (i in 1:N_tenors) {
    for (u in 1:N_markets) {
      // loadings adapted to the 3-factor model
      z = a[u] / ((minT + i - 1) / 12.0 + b[u]);
      Z[(i - 1) * N_markets + u, (u - 1) * N_factors + 1] = z;
      Z[(i - 1) * N_markets + u, (u - 1) * N_factors + 2] = sqrt(z) * c[u];
      Z[(i - 1) * N_markets + u, (u - 1) * N_factors + 3] = d[u];
    }
  }
  LL = cholesky_decompose(add_diag(Z * multiply_lower_tri_self_transpose(L) * Z',
                                   sigma2_vec) / 252.0);
  L ~ lkj_corr_cholesky(nu);
  y ~ multi_normal_cholesky(mu_vec, LL);
}
generated quantities {
  // factor correlation matrix, Cov = L * L'
  matrix[N_factors * N_markets, N_factors * N_markets] Cov;
  Cov = multiply_lower_tri_self_transpose(L);
}
```
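To make the model structure concrete: the per-day covariance built in the `model` block is a low-rank term `Z * C * Z'` (with `C = L * L'` the factor correlation matrix) plus a diagonal, scaled by `1/252`. The following NumPy sketch reproduces that construction with made-up parameter values (the uniform draws and the identity stand-in for the LKJ draw are purely illustrative, not from my data):

```python
import numpy as np

# Dummy dimensions matching the "small" case described below
N_markets, N_factors, N_tenors, minT = 2, 3, 48, 1

rng = np.random.default_rng(0)
# Hypothetical parameter values within the bounds used in the Stan program
a = rng.uniform(0, 10, N_markets)
b = rng.uniform(0, 10, N_markets)
c = rng.uniform(0, 10, N_markets)
d = rng.uniform(0.05, 0.2, N_markets)
sigma2 = rng.uniform(0.1, 1.0, N_markets) ** 2

# Loading matrix Z, built exactly as in the model block
# (Python's 0-based i corresponds to Stan's i - 1)
Z = np.zeros((N_markets * N_tenors, N_markets * N_factors))
for i in range(N_tenors):
    for u in range(N_markets):
        z = a[u] / ((minT + i) / 12.0 + b[u])
        Z[i * N_markets + u, u * N_factors + 0] = z
        Z[i * N_markets + u, u * N_factors + 1] = np.sqrt(z) * c[u]
        Z[i * N_markets + u, u * N_factors + 2] = d[u]

# Factor correlation C = L L' (identity here as a stand-in for the LKJ draw)
C = np.eye(N_markets * N_factors)
sigma2_vec = np.tile(sigma2, N_tenors)

# Per-day covariance: low-rank term plus diagonal, scaled to daily units
Sigma = (Z @ C @ Z.T + np.diag(sigma2_vec)) / 252.0

print(Sigma.shape)  # (96, 96)
# rank of the low-rank term is at most N_markets * N_factors = 6
print(np.linalg.matrix_rank(Z @ C @ Z.T))
```

So the full covariance is a 96-dimensional matrix whose non-diagonal part has rank at most 6 in the small case.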

For small dimensions, the performance is reasonably good, with a run time of around 10 minutes. By small dimensions I mean `N_days=625`, `N_factors=3`, `N_markets=2`, and `N_tenors=48`, resulting in a Cholesky factor $L \in \mathbb{R}^{n \times n}$ with $n = 6$.

However, the program scales really badly. For `N_markets=13`, with a resulting Cholesky factor of size $n = 39$, the program takes over a week to finish, or does not finish at all.
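Part of this is presumably the dense `cholesky_decompose` of the (`N_markets*N_tenors`)-dimensional observation covariance on every log-density evaluation, whose cost grows roughly cubically in that dimension. A back-of-envelope check (a rough illustration only, not a measurement of Stan's internals):

```python
# Observation dimension of the multi_normal_cholesky likelihood
def obs_dim(n_markets, n_tenors=48):
    return n_markets * n_tenors

small = obs_dim(2)   # the "small" case
large = obs_dim(13)  # the large case
print(small, large)           # 96 624
print((large / small) ** 3)   # roughly 275x more work per Cholesky
```

So even before any sampler-geometry issues, each iteration is a few hundred times more expensive in the large case.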

I was wondering whether there is a way to restructure the code to improve performance in large dimensions, or whether this scaling behavior is to be expected and there is not much I can do.

Unfortunately, I cannot provide the actual dataset, but if need be I should be able to patch together a dummy dataset.