Large model with skewed distributions

The following is a simple multilevel model estimating the effects of location and product on transaction distributions. The results are good (I use the fitted parameters to simulate transactions and test them against actual transactions), but it takes a long time to run. I'm hoping for suggestions on how to improve the approach and performance. Thanks:

data {
  int N;                           // number of observations
  vector[N] y;                     // observations

  int J;                           // number of groups - Product
  int PGrp[N];                     // Product group indexes

  int K;                           // number of groups - Location
  int LGrp[N];                     // Location group indexes

  real p_Mu;
  real p_sigma;
  real p_alpha;

  vector[J] p_PGrpMu;
  vector<lower=0>[J] p_sigma_PGrp;
  vector[J] p_alpha_PGrp;

  vector[K] p_LGrpMu;
  vector<lower=0>[K] p_sigma_LGrp;
  vector[K] p_alpha_LGrp;

  real muSigma;
  real sigmaSigma;
  real alphaSigma;

  real pGrpMuSigma;
  real pGrpSigmaSigma;
  real pGrpAlphaSigma;

  real lGrpMuSigma;
  real lGrpSigmaSigma;
  real lGrpAlphaSigma;
}

parameters {
  real mu;
  real<lower=0> sigma;
  real alpha;

  vector[J] PGrpMu;
  vector<lower=0>[J] sigma_PGrp;
  vector[J] alpha_PGrp;

  vector[K] LGrpMu;
  vector<lower=0>[K] sigma_LGrp;
  vector[K] alpha_LGrp;
}

model {
  target += cauchy_lpdf(mu | p_Mu, muSigma);
  target += cauchy_lpdf(PGrpMu | p_PGrpMu, pGrpMuSigma);
  target += cauchy_lpdf(LGrpMu | p_LGrpMu, lGrpMuSigma);

  target += cauchy_lpdf(sigma | p_sigma, sigmaSigma);
  target += cauchy_lpdf(sigma_PGrp | p_sigma_PGrp, pGrpSigmaSigma);
  target += cauchy_lpdf(sigma_LGrp | p_sigma_LGrp, lGrpSigmaSigma);

  target += cauchy_lpdf(alpha | p_alpha, alphaSigma);
  target += cauchy_lpdf(alpha_PGrp | p_alpha_PGrp, pGrpAlphaSigma);
  target += cauchy_lpdf(alpha_LGrp | p_alpha_LGrp, lGrpAlphaSigma);

  target += skew_normal_lpdf(y | mu + PGrpMu[PGrp] + LGrpMu[LGrp],
                             sigma + sigma_PGrp[PGrp] + sigma_LGrp[LGrp],
                             alpha + alpha_PGrp[PGrp] + alpha_LGrp[LGrp]);
}


You are using target += with the _lpdf functions, which includes the proportionality constants in the log density. For example:

target += cauchy_lpdf(mu | p_Mu, muSigma);

The equivalent sampling statement drops the proportionality constant, which makes each evaluation slightly cheaper:

mu ~ cauchy(p_Mu, muSigma);
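Applied to the whole model above, the sampling-statement form would look like this (same model and same posterior, just without the constant terms):

```stan
model {
  // sampling statements drop constant terms in the log density
  mu    ~ cauchy(p_Mu, muSigma);
  sigma ~ cauchy(p_sigma, sigmaSigma);
  alpha ~ cauchy(p_alpha, alphaSigma);

  PGrpMu     ~ cauchy(p_PGrpMu, pGrpMuSigma);
  sigma_PGrp ~ cauchy(p_sigma_PGrp, pGrpSigmaSigma);
  alpha_PGrp ~ cauchy(p_alpha_PGrp, pGrpAlphaSigma);

  LGrpMu     ~ cauchy(p_LGrpMu, lGrpMuSigma);
  sigma_LGrp ~ cauchy(p_sigma_LGrp, lGrpSigmaSigma);
  alpha_LGrp ~ cauchy(p_alpha_LGrp, lGrpAlphaSigma);

  y ~ skew_normal(mu + PGrpMu[PGrp] + LGrpMu[LGrp],
                  sigma + sigma_PGrp[PGrp] + sigma_LGrp[LGrp],
                  alpha + alpha_PGrp[PGrp] + alpha_LGrp[LGrp]);
}
```

Note that the dropped constants only affect the reported lp__, not the posterior draws.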

If your model is well behaved (no divergences or other warnings), you could try decreasing adapt_delta, which will increase the step size and thus require fewer gradient evaluations. It is somewhat risky, as you could start seeing divergences that you didn't have before.

Perhaps someone else can say whether replacing those Cauchy priors with something else could lead to a speedup; I don't have direct experience with that.

For reasonable models, decreasing adapt_delta below 0.8 should generally lower performance. The step size will be larger and fewer gradient evaluations will be needed per iteration, but the trajectories will be less accurate and will explore the relevant parts of the space more poorly.

Cauchys are almost always a bad idea, especially as priors. There may also be poor identifiability between the additive components, which more informative priors could help resolve.
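As an illustration of that suggestion, here is a sketch of what swapping the heavy-tailed Cauchy priors for lighter-tailed ones might look like. The normal/student_t choices and the reuse of the existing scale parameters are assumptions for illustration, not recommendations from the thread:

```stan
model {
  // illustrative replacements for the cauchy priors; distributions are placeholders
  mu    ~ normal(p_Mu, muSigma);              // much lighter tails than cauchy
  sigma ~ student_t(3, p_sigma, sigmaSigma);  // heavy-ish tails, finite variance
  alpha ~ normal(p_alpha, alphaSigma);

  PGrpMu     ~ normal(p_PGrpMu, pGrpMuSigma);
  sigma_PGrp ~ student_t(3, p_sigma_PGrp, pGrpSigmaSigma);
  alpha_PGrp ~ normal(p_alpha_PGrp, pGrpAlphaSigma);
  // ... likewise for the Location-level terms
}
```

Tighter priors like these also constrain the additive mu + PGrpMu + LGrpMu decomposition, which is where the identifiability problem lives.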

Hey @EllenIAH, the obvious suggestion here is to parallelise your model using the map_rect function. There is a minimalist tutorial here.

It'll essentially parallelise the log-probability evaluation and so drastically decrease the wall-clock time your model takes to run.
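To make the shape of a map_rect model concrete, here is a heavily simplified sketch for the likelihood above. It shards only the observations and, for brevity, ignores the group-level terms (in the real model the per-shard group indexes would go in x_i and the group effects would be folded into the shard function); the shard count and packing scheme are assumptions:

```stan
functions {
  // log-likelihood contribution of one shard of observations;
  // phi = shared parameters (mu, sigma, alpha), x_r = this shard's y values
  vector shard_lp(vector phi, vector theta, real[] x_r, int[] x_i) {
    return [skew_normal_lpdf(to_vector(x_r) | phi[1], phi[2], phi[3])]';
  }
}
transformed data {
  int S = 4;              // number of shards (assumes N is divisible by S)
  int M = N / S;
  real x_r[S, M];         // observations split into equal shards
  int x_i[S, 0];          // no integer data needed in this sketch
  for (s in 1:S)
    x_r[s] = to_array_1d(segment(y, (s - 1) * M + 1, M));
}
model {
  vector[0] theta[S];     // no shard-specific parameters in this sketch
  // evaluates the S shard log-likelihoods in parallel
  target += sum(map_rect(shard_lp, [mu, sigma, alpha]', theta, x_r, x_i));
}
```

You also need to compile with threading (or MPI) enabled and set the number of threads, otherwise map_rect runs serially.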
