Help speeding up a slow hierarchical model?

jfiel · June 21, 2017, 12:40am

Hi,

I am relatively new to Stan and to Bayesian analysis generally. I have been working on a multi-step problem involving estimating intergenerational income elasticities. The first step is to account for measurement error (transitory variance) in log-income as a child to estimate long-term “permanent childhood income.”

I have multiple measures (at different ages) for each child, and multiple children per family. In the data, there are about 100,000 person-year observations, nested within about 9,000 persons, nested within about 4,000 families. I control for age and estimate “true” permanent income using the family and person random effects. The model has been painfully slow. I am using rstan in Rstudio.

I have created a smaller artificial data set (6000 observations, 1500 persons, 500 families) to validate the model, and it does a nice job recovering the parameters in a reasonable time frame (<1hr for 5,000 iterations). But when applied to the real data, it is a day-long ordeal, if not longer (e.g., with 5,000 or 10,000 iterations).

I am pasting the model in below, hoping for some guidance on speeding things up. I have done my best to use vectors rather than loops, and to reparameterize some of the hierarchical parameters.

data {
	int<lower=1> Np; 					// number of person-year observations
	int<lower=1> J;						// number of persons
	int<lower=1> F; 					// number of families
	int<lower=1> Kp; 					// number of control variables
	vector[Np] Yp; 						// outcome
	matrix[Np,Kp] Xp; 					// controls
	int<lower=1, upper=J> SubjP[Np]; 	// person IDs
	int<lower=1, upper=F> FamP[Np]; 	// family IDs
	int<lower=1, upper=J> Subj2[J]; 	// person IDs
	int<lower=1, upper=F> Fam2[J]; 		// family IDs
}
parameters {
	// parent income model
	real alphaP; 						// intercept
	vector[Kp] betaP; 					// control coefficients 
	vector[J] aP_i_raw; 				// person effects
	vector[F] aP_f_raw; 				// family effects
	real<lower=0> sP_e; 				// within-person sd
	real<lower=0> sP_i; 				// between-person sd
	real<lower=0> sP_f; 				// between-family sd
}
transformed parameters {
	vector[J] aP_i;						
	vector[F] aP_f;						
	vector[J] Xtrue;					// "true" person-specific outcome
	aP_i = 0 + sP_i*aP_i_raw;			// implies aP_i ~ normal(0, sP_i)
	aP_f = 0 + sP_f*aP_f_raw;			// implies aP_f ~ normal(0, sP_f)
	Xtrue = alphaP + aP_f[Fam2] + aP_i[Subj2]; 
}
model {
	vector[Np] muP;
	alphaP ~ normal(0, 5); 
	betaP ~ normal(0, 1); 
	aP_i_raw ~ normal(0, 1); 	
	aP_f_raw ~ normal(0, 1); 	
	sP_i ~ normal(0, 2);
	sP_f ~ normal(0, 2);
	sP_e ~ normal(0, 2);
	muP = alphaP + Xp*betaP + aP_f[FamP] + aP_i[SubjP];    
	Yp ~ normal(muP, sP_e); 
}

[edit: set code as code]

Bob_Carpenter · July 17, 2017, 10:59pm

I missed that post ages ago, but your program’s already written about as efficiently as possible.

P.S. Sorry for the slow response—@betanalpha just fixed my settings and now I’m seeing all the old messages I missed.

Bob_Carpenter · July 17, 2017, 11:00pm

I should’ve added that if you have a large number of observations, the centered parameterization can be faster to mix.

And I don’t know how many iterations you’re targeting, but you shouldn’t need an n_eff in the thousands for most inferential purposes.

jfiel · July 19, 2017, 12:08am

Thanks for following up! There are a few parameters (especially some of the variance parameters) that have low n_eff values. This seems to be more of a problem in one data set with relatively few observations per group (family). A larger data set with more observations per group seems to require much fewer iterations.

But overall, I think things are manageable at this point, and I have ironed out enough of the kinks with my models using fewer numbers of iterations that I don’t mind waiting for them to re-run for a longer period of time.

Topic		Replies	Views
(Another) Slow Hierarchical Model in Stan Modeling fitting-issues , specification , performance	3	1772	September 29, 2017
How to speed up model? Modeling rstan , techniques , performance	7	811	September 9, 2020
Speeding up a hierarchical multinomial logit model Modeling	27	7745	October 18, 2022
Improving Hierarchical Model Efficiency Modeling rstan , fitting-issues , specification	7	536	February 14, 2023
Improving speed in hierarchical VAR modelling Modeling	0	369	August 22, 2019

Help speeding up a slow hierarchical model?

Related topics