Slow Stan Code

After a lot of effort I managed to get some results:

                  mean se_mean   sd      2.5%       25%       50%       75%     97.5% n_eff Rhat
z[1,1]           -0.12    0.01 0.39     -0.90     -0.39     -0.12      0.13      0.65   833 1.00
z[1,2]           -0.12    0.01 0.40     -0.88     -0.39     -0.12      0.15      0.63   804 1.00
z[1,3]            0.54    0.01 0.41     -0.25      0.26      0.53      0.80      1.38   808 1.00
z[1,4]            0.54    0.02 0.88     -1.17     -0.05      0.52      1.11      2.34  2100 1.00
z[1,5]            1.77    0.03 0.59      0.66      1.37      1.76      2.17      2.94   496 1.00
z[1,6]           -0.04    0.02 0.40     -0.81     -0.32     -0.05      0.22      0.77   623 1.00
z[1,7]           -1.75    0.02 0.56     -2.87     -2.13     -1.73     -1.37     -0.71   766 1.01
z[1,8]           -0.63    0.01 0.43     -1.47     -0.91     -0.63     -0.33      0.20   814 1.00
z[2,1]            0.29    0.01 0.41     -0.53      0.02      0.30      0.57      1.10  1030 1.00
z[2,2]            0.44    0.01 0.57     -0.64      0.06      0.43      0.82      1.59  1750 1.00
z[2,3]           -0.35    0.01 0.44     -1.24     -0.63     -0.35     -0.06      0.48  1147 1.00
z[2,4]           -0.49    0.02 0.73     -1.96     -0.96     -0.48      0.02      0.89  2100 1.00
z[2,5]           -0.97    0.02 0.72     -2.38     -1.45     -0.95     -0.50      0.43  1244 1.00
z[2,6]            0.47    0.01 0.41     -0.33      0.20      0.47      0.74      1.25  1079 1.00
z[2,7]           -0.61    0.02 0.68     -1.99     -1.08     -0.58     -0.14      0.68  1342 1.00
z[2,8]            1.08    0.02 0.56      0.05      0.70      1.07      1.45      2.23  1095 1.00
z[3,1]            0.48    0.01 0.50     -0.46      0.15      0.46      0.83      1.45  1174 1.00
z[3,2]            0.94    0.02 0.76     -0.57      0.41      0.94      1.47      2.45  1788 1.00
z[3,3]           -0.01    0.01 0.47     -0.96     -0.32      0.00      0.33      0.87  1262 1.00
z[3,4]           -0.09    0.02 0.89     -1.82     -0.71     -0.08      0.50      1.61  2100 1.00
z[3,5]           -0.84    0.02 0.76     -2.35     -1.33     -0.83     -0.35      0.65  1519 1.00
z[3,6]            0.25    0.01 0.48     -0.69     -0.07      0.26      0.58      1.18  1208 1.00
z[3,7]            0.46    0.02 0.72     -1.03     -0.03      0.48      0.95      1.89  1705 1.00
z[3,8]           -1.10    0.02 0.69     -2.57     -1.53     -1.07     -0.64      0.17  1518 1.00
z[4,1]           -0.33    0.01 0.52     -1.41     -0.67     -0.31      0.02      0.61  1299 1.00
z[4,2]            0.10    0.02 0.68     -1.35     -0.32      0.10      0.56      1.36  1347 1.00
z[4,3]           -0.37    0.01 0.49     -1.36     -0.68     -0.37     -0.04      0.56  1552 1.00
z[4,4]           -0.20    0.02 0.70     -1.58     -0.66     -0.22      0.25      1.24  1700 1.00
z[4,5]           -0.43    0.02 0.88     -2.18     -1.02     -0.46      0.16      1.22  1932 1.00
z[4,6]            0.54    0.02 0.56     -0.55      0.17      0.53      0.91      1.66  1332 1.00
z[4,7]            0.79    0.02 0.84     -0.96      0.24      0.79      1.37      2.43  2100 1.00
z[4,8]           -0.15    0.02 0.82     -1.77     -0.70     -0.16      0.39      1.51  1689 1.00
z[5,1]            0.79    0.02 0.56     -0.29      0.42      0.80      1.13      1.96  1285 1.00
z[5,2]           -1.77    0.02 0.81     -3.33     -2.32     -1.75     -1.22     -0.16  1751 1.00
z[5,3]            0.23    0.01 0.46     -0.72     -0.08      0.23      0.54      1.12  1473 1.00
z[5,4]            0.19    0.01 0.67     -1.13     -0.23      0.18      0.64      1.46  2100 1.00
z[5,5]            0.06    0.02 0.76     -1.51     -0.42      0.07      0.56      1.56  2100 1.00
z[5,6]            0.68    0.01 0.55     -0.42      0.33      0.67      1.03      1.78  1586 1.00
z[5,7]            0.25    0.02 0.76     -1.35     -0.24      0.26      0.73      1.80  1936 1.00
z[5,8]           -0.40    0.02 0.76     -1.86     -0.88     -0.40      0.09      1.08  1999 1.00
L_Omega[1,1]      1.00    0.00 0.00      1.00      1.00      1.00      1.00      1.00  2100  NaN
L_Omega[1,2]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,3]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,1]     -0.26    0.01 0.26     -0.70     -0.45     -0.28     -0.09      0.29  1197 1.00
L_Omega[2,2]      0.93    0.00 0.08      0.71      0.89      0.96      0.99      1.00  1633 1.00
L_Omega[2,3]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[3,1]     -0.18    0.01 0.27     -0.66     -0.38     -0.20      0.01      0.37  1914 1.00
L_Omega[3,2]     -0.04    0.01 0.29     -0.57     -0.24     -0.05      0.16      0.53  2100 1.00
L_Omega[3,3]      0.89    0.00 0.09      0.66      0.85      0.92      0.96      1.00  1176 1.00
L_Omega[3,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[3,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[4,1]     -0.69    0.01 0.22     -0.95     -0.86     -0.74     -0.59     -0.14  1103 1.00
L_Omega[4,2]      0.04    0.01 0.25     -0.46     -0.13      0.03      0.20      0.53  1550 1.00
L_Omega[4,3]      0.18    0.01 0.23     -0.32      0.04      0.19      0.33      0.64  1405 1.00
L_Omega[4,4]      0.54    0.01 0.18      0.22      0.40      0.52      0.67      0.90  1053 1.00
L_Omega[4,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[5,1]      0.04    0.01 0.25     -0.47     -0.15      0.04      0.22      0.52  1925 1.00
L_Omega[5,2]     -0.18    0.01 0.30     -0.70     -0.40     -0.21      0.03      0.41  1675 1.00
L_Omega[5,3]     -0.10    0.01 0.30     -0.64     -0.31     -0.12      0.11      0.49  1530 1.00
L_Omega[5,4]     -0.01    0.01 0.32     -0.62     -0.25     -0.01      0.23      0.57  1273 1.00
L_Omega[5,5]      0.77    0.00 0.13      0.47      0.69      0.79      0.87      0.96  1336 1.00
tau_unif[1]       0.56    0.00 0.12      0.37      0.48      0.55      0.63      0.83   858 1.01
tau_unif[2]       0.06    0.00 0.03      0.03      0.05      0.06      0.07      0.13  1039 1.00
tau_unif[3]       0.07    0.00 0.03      0.03      0.05      0.06      0.08      0.15   907 1.00
tau_unif[4]       0.24    0.00 0.06      0.15      0.19      0.23      0.27      0.40   815 1.00
tau_unif[5]       0.06    0.00 0.02      0.03      0.04      0.05      0.07      0.12  1251 1.00
gamma[1,1]        3.95    0.02 0.61      2.73      3.57      3.95      4.32      5.23   610 1.00
gamma[1,2]        0.18    0.00 0.06      0.06      0.15      0.18      0.22      0.31   905 1.00
gamma[1,3]        0.33    0.00 0.07      0.19      0.29      0.33      0.37      0.49   958 1.00
gamma[1,4]       -0.85    0.01 0.24     -1.35     -0.99     -0.85     -0.69     -0.37   833 1.00
gamma[1,5]        1.25    0.00 0.06      1.14      1.22      1.25      1.29      1.36  1119 1.00
sigma             1.07    0.00 0.00      1.06      1.06      1.07      1.07      1.08  2100 1.00
beta[1,1]         3.76    0.01 0.25      3.27      3.59      3.76      3.92      4.25  2100 1.00
beta[1,2]         0.23    0.00 0.02      0.19      0.22      0.23      0.24      0.26  2100 1.00
beta[1,3]         0.40    0.00 0.02      0.35      0.38      0.39      0.41      0.44  2100 1.00
beta[1,4]        -0.83    0.00 0.10     -1.03     -0.90     -0.83     -0.77     -0.64  2100 1.00
beta[1,5]         1.32    0.00 0.01      1.30      1.31      1.32      1.33      1.34  2100 1.00
beta[2,1]         3.77    0.00 0.23      3.33      3.62      3.77      3.92      4.20  2100 1.00
beta[2,2]         0.25    0.00 0.06      0.13      0.21      0.25      0.29      0.37  2100 1.00
beta[2,3]         0.47    0.00 0.11      0.27      0.40      0.47      0.54      0.69  1689 1.00
beta[2,4]        -0.67    0.00 0.08     -0.83     -0.72     -0.67     -0.62     -0.51  2100 1.00
beta[2,5]         1.02    0.00 0.02      0.98      1.01      1.02      1.04      1.06  1747 1.00
beta[3,1]         4.76    0.00 0.17      4.42      4.64      4.76      4.87      5.07  2100 1.00
beta[3,2]         0.12    0.00 0.02      0.09      0.11      0.12      0.13      0.15  2100 1.00
beta[3,3]         0.31    0.00 0.02      0.27      0.30      0.31      0.33      0.35  2010 1.00
beta[3,4]        -1.20    0.00 0.06     -1.32     -1.24     -1.20     -1.16     -1.08  2100 1.00
beta[3,5]         1.29    0.00 0.01      1.27      1.29      1.29      1.30      1.31  2100 1.00
beta[4,1]         4.83    0.03 1.52      2.07      3.84      4.74      5.74      8.11  2100 1.00
beta[4,2]         0.09    0.00 0.10     -0.12      0.02      0.09      0.16      0.28  2100 1.00
beta[4,3]         0.31    0.00 0.17     -0.05      0.21      0.30      0.40      0.64  2100 1.00
beta[4,4]        -1.20    0.01 0.47     -2.23     -1.48     -1.18     -0.89     -0.35  2100 1.00
beta[4,5]         1.30    0.00 0.04      1.21      1.27      1.30      1.32      1.37  2100 1.00
beta[5,1]         6.63    0.01 0.28      6.07      6.43      6.64      6.81      7.17  2100 1.00
beta[5,2]        -0.01    0.00 0.02     -0.05     -0.03     -0.01      0.00      0.02  2100 1.00
beta[5,3]         0.17    0.00 0.03      0.11      0.15      0.17      0.19      0.22  2100 1.00
beta[5,4]        -1.88    0.00 0.09     -2.05     -1.94     -1.88     -1.81     -1.70

However, I don't understand how it is possible to get NaN for some parameters, and why n_eff is 2100 for all the betas. Here is the code I ran:

data {
  int<lower=0> N;               // num individuals
  int<lower=1> K;               // num ind predictors
  int<lower=1> J;               // num groups
  // int<lower=1> L;            // num group predictors
  int<lower=1,upper=J> jj[N];   // group for individual
  matrix[N, K] x;               // individual predictors
  matrix[8, 1] u;               // group predictors
  vector[N] y;                  // outcomes
}
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  matrix[1, K] gamma;           // group coeffs
  real<lower=0> sigma;          // prediction error scale
}
transformed parameters {
  matrix[J, K] beta;
  vector<lower=0>[K] tau;       // prior scale
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';
  // beta = (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(rows_dot_product(beta[jj], x), sigma);
}

I ran the code with L = 1 because I don't want to have regressors for beta.

Any value that’s a constant winds up getting a NaN n_eff estimate. The upper triangular portion of a Cholesky factor is just constant zeros.

We should probably have flags in there that replace NaN with n/a or * or something to indicate they didn’t change.
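Here is a minimal R sketch of where that NaN comes from. This is not Stan's actual estimator (Stan combines chains and truncates the autocorrelation sum); it just shows the basic idea that n_eff divides the number of draws by one plus twice the summed autocorrelations, which is undefined for a constant chain:

# Naive effective sample size: N / (1 + 2 * sum of lag-k autocorrelations).
naive_ess <- function(x, max_lag = 20) {
  n <- length(x)
  rho <- sapply(seq_len(max_lag),
                function(k) cor(x[1:(n - k)], x[(k + 1):n]))
  n / (1 + 2 * sum(rho))
}

naive_ess(rnorm(2000))   # roughly 2000 for independent draws
naive_ess(rep(0, 2000))  # NA: cor() of a constant is undefined (sd is zero)

A constant parameter has zero sample standard deviation, so every autocorrelation is 0/0, and the whole estimate comes out NA/NaN.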

Ohh I see. Thanks!! Two more questions:
a) Should it be worrying that all the betas have such a high n_eff?
b) Could a non-centered parameterization of y improve the speed? (I checked the traceplots and it seems that all the parameters converge really fast, after ~200 iterations, so the problem seems to be the slow sampling of the posterior.)

No, high n_eff is a good thing! But what you should be looking at isn't the high n_eff, which is typical for things like regression coefficients and error terms, but the lower-n_eff "parameters" like lp__, which measures how well different log density values are mixing. Typically, hierarchical parameters in a hierarchical model mix more slowly.

Even if some parameters mix well, samplers can be slow to mix among log density levels in the posterior (this is related to how the momenta are randomized in HMC, but other samplers suffer from the same problem). So we usually want to make sure lp__ has a decent R-hat before we stop sampling.
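For example, with an rstan fit object (call it fit), you can inspect lp__ directly; both print() and traceplot() accept a pars argument:

print(fit, pars = "lp__")       # n_eff and Rhat for the log posterior
traceplot(fit, pars = "lp__")   # visual check of mixing across chains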

Thanks Bob! The n_eff of lp__ is super small (~300). Also, I noticed that there is correlation among some of the tau_unif parameters and also among some of the gamma. Could that be the problem? [screenshot of pairs plot attached]

This is from post 14, so that run of the model was with group-level regressors.

These require values for the group predictors. What did you specify for them?

I ran the code with L = 1 so the betas had only an intercept. I had a normal prior for the gamma, and I set u = 1. This is how I prepare the data (when I use u and gamma):


X          <- mydata[, c("Price", "Distribution", "Distribution on Display",
                         "Distribution on Leaflet", "Distribution Leaflet + Display")]
X          <- cbind(intercept = 1, X)
stanDat    <- list()
stanDat$N  <- nrow(mydata)                            # num of observations
stanDat$K  <- ncol(X)                                 # num of predictors
stanDat$J  <- length(unique(mydata$Brand))            # number of groups
stanDat$jj <- as.numeric(factor(mydata$Brand))        # group for individual
stanDat$x  <- as.matrix(X)                            # individual predictors
stanDat$y  <- log1p(mydata$`number of units in the package`)  # outcomes
stanDat$u  <- as.matrix(rep(1, 8))                    # group predictors (all ones)

I think that the problem is caused by the correlations. Those parameters have a really low n_eff. How can I remove them from the model?

Could the non-stationarity of the time series cause a problem?

Super small is 5. 300 is fine.

Yes, correlation among parameters can reduce effective sample size, but you really need to look at the mixing of the energy levels as reflected by lp__ directly. You can also look at the correlation of each parameter with lp__ itself; those plots all tend to be bell shaped.

The almost banana-shaped one at position (1, 3) in the diagram is the most problematic here, but even that doesn't look too bad.

When I say 300 is fine, the reason is that the effective sample size goes in the denominator of the MCMC central limit theorem under a square root. So with n_eff = 300, the MCMC standard error is the posterior standard deviation divided by the square root of 300, which means we've located the posterior mean to within about 1/17th of a posterior standard deviation. Getting to 1/100th would require about n_eff = 10000.
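In code, that's just the 1/sqrt(n_eff) scaling:

# MCMC standard error as a fraction of the posterior sd: 1 / sqrt(n_eff)
1 / sqrt(300)    # ~0.058, about 1/17 of a posterior sd
1 / sqrt(10000)  # 0.01, i.e. 1/100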

He can remove the gammas and include one intercept.

This is 1 * gamma.
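A minimal sketch of that version, with u and gamma dropped in favor of a single row of intercepts (mu is a hypothetical name; everything else follows the original post):

data {
  int<lower=0> N;
  int<lower=1> K;
  int<lower=1> J;
  int<lower=1,upper=J> jj[N];
  matrix[N, K] x;
  vector[N] y;
}
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  row_vector[K] mu;             // one intercept per predictor, replacing gamma
  real<lower=0> sigma;
}
transformed parameters {
  matrix[J, K] beta;
  vector<lower=0>[K] tau;
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  // rep_matrix stacks the same row of intercepts J times, which is
  // exactly what u * gamma computed when u was a column of ones
  beta = rep_matrix(mu, J) + (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  mu ~ normal(0, 5);            // same scale as the original prior on gamma
  y ~ normal(rows_dot_product(beta[jj], x), sigma);
}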

Thank you! I changed it. No matter how hard I try, I cannot improve the speed. I am trying to see if there is anything weird with the data. I have used different transforms (log y, log x) but it is still so slow. Sometimes it starts really fast (I get to 20% after 1-2 minutes) and then it sticks at the same point for hours.

Sure, but does your fit look reasonable? What are your speed expectations, and what is your current runtime? And what do you want to do with your output: prediction or interpretation? Do you expect future products that are not in your model?

I would transform all variables to roughly unit-normal. I know this is not required, but I like it when my variables are not on totally different scales.

qqplot(scale(log(bbb$`Number of Units Sold`)), rnorm(1000))
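A minimal sketch of that standardization, applied to the stanDat list from earlier (assuming the intercept is the first column of x, which should not be scaled):

stanDat$x[, -1] <- scale(stanDat$x[, -1])       # center/scale each predictor column
stanDat$y       <- as.numeric(scale(stanDat$y)) # outcome on unit scale too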

The fit was OK (some positive price elasticities, but in general OK). The problem is that the runtime changes every run. I was waiting for more than 12 hours and it was still at 20-30%. I am using a small dataset (~30,000 observations). I just want to do interpretation and to predict the sales of products that are already included in my model.