Hi,
I am relatively new to Stan and to Bayesian analysis generally. I have been working on a multi-step problem involving estimating intergenerational income elasticities. The first step is to account for measurement error (transitory variance) in log-income as a child to estimate long-term “permanent childhood income.”
I have multiple measures (at different ages) for each child, and multiple children per family. In the data, there are about 100,000 person-year observations, nested within about 9,000 persons, nested within about 4,000 families. I control for age and estimate “true” permanent income using the family and person random effects. The model has been painfully slow. I am using rstan in Rstudio.
I have created a smaller artificial data set (6000 observations, 1500 persons, 500 families) to validate the model, and it does a nice job recovering the parameters in a reasonable time frame (<1hr for 5,000 iterations). But when applied to the real data, it is a day-long ordeal, if not longer (e.g., with 5,000 or 10,000 iterations).
I am pasting the model in below, hoping for some guidance on speeding things up. I have done my best to use vectors rather than loops, and to reparameterize some of the hierarchical parameters.
data {
int<lower=1> Np; // number of person-year observations
int<lower=1> J; // number of persons
int<lower=1> F; // number of families
int<lower=1> Kp; // number of control variables
vector[Np] Yp; // outcome
matrix[Np,Kp] Xp; // controls
int<lower=1, upper=J> SubjP[Np]; // person IDs
int<lower=1, upper=F> FamP[Np]; // family IDs
int<lower=1, upper=J> Subj2[J]; // person IDs
int<lower=1, upper=F> Fam2[J]; // family IDs
}
parameters {
// parent income model
real alphaP; // intercept
vector[Kp] betaP; // control coefficients
vector[J] aP_i_raw; // person effects
vector[F] aP_f_raw; // family effects
real<lower=0> sP_e; // within-person sd
real<lower=0> sP_i; // between-person sd
real<lower=0> sP_f; // between-family sd
}
transformed parameters {
vector[J] aP_i;
vector[F] aP_f;
vector[J] Xtrue; // "true" person-specific outcome
aP_i = 0 + sP_i*aP_i_raw; // implies aP_i ~ normal(0, sP_i)
aP_f = 0 + sP_f*aP_f_raw; // implies aP_f ~ normal(0, sP_f)
Xtrue = alphaP + aP_f[Fam2] + aP_i[Subj2];
}
model {
vector[Np] muP;
alphaP ~ normal(0, 5);
betaP ~ normal(0, 1);
aP_i_raw ~ normal(0, 1);
aP_f_raw ~ normal(0, 1);
sP_i ~ normal(0, 2);
sP_f ~ normal(0, 2);
sP_e ~ normal(0, 2);
muP = alphaP + Xp*betaP + aP_f[FamP] + aP_i[SubjP];
Yp ~ normal(muP, sP_e);
}
[edit: set code as code]