Slow Stan Code

After a lot of effort I managed to get some results:

                  mean se_mean   sd      2.5%       25%       50%       75%     97.5% n_eff Rhat
z[1,1]           -0.12    0.01 0.39     -0.90     -0.39     -0.12      0.13      0.65   833 1.00
z[1,2]           -0.12    0.01 0.40     -0.88     -0.39     -0.12      0.15      0.63   804 1.00
z[1,3]            0.54    0.01 0.41     -0.25      0.26      0.53      0.80      1.38   808 1.00
z[1,4]            0.54    0.02 0.88     -1.17     -0.05      0.52      1.11      2.34  2100 1.00
z[1,5]            1.77    0.03 0.59      0.66      1.37      1.76      2.17      2.94   496 1.00
z[1,6]           -0.04    0.02 0.40     -0.81     -0.32     -0.05      0.22      0.77   623 1.00
z[1,7]           -1.75    0.02 0.56     -2.87     -2.13     -1.73     -1.37     -0.71   766 1.01
z[1,8]           -0.63    0.01 0.43     -1.47     -0.91     -0.63     -0.33      0.20   814 1.00
z[2,1]            0.29    0.01 0.41     -0.53      0.02      0.30      0.57      1.10  1030 1.00
z[2,2]            0.44    0.01 0.57     -0.64      0.06      0.43      0.82      1.59  1750 1.00
z[2,3]           -0.35    0.01 0.44     -1.24     -0.63     -0.35     -0.06      0.48  1147 1.00
z[2,4]           -0.49    0.02 0.73     -1.96     -0.96     -0.48      0.02      0.89  2100 1.00
z[2,5]           -0.97    0.02 0.72     -2.38     -1.45     -0.95     -0.50      0.43  1244 1.00
z[2,6]            0.47    0.01 0.41     -0.33      0.20      0.47      0.74      1.25  1079 1.00
z[2,7]           -0.61    0.02 0.68     -1.99     -1.08     -0.58     -0.14      0.68  1342 1.00
z[2,8]            1.08    0.02 0.56      0.05      0.70      1.07      1.45      2.23  1095 1.00
z[3,1]            0.48    0.01 0.50     -0.46      0.15      0.46      0.83      1.45  1174 1.00
z[3,2]            0.94    0.02 0.76     -0.57      0.41      0.94      1.47      2.45  1788 1.00
z[3,3]           -0.01    0.01 0.47     -0.96     -0.32      0.00      0.33      0.87  1262 1.00
z[3,4]           -0.09    0.02 0.89     -1.82     -0.71     -0.08      0.50      1.61  2100 1.00
z[3,5]           -0.84    0.02 0.76     -2.35     -1.33     -0.83     -0.35      0.65  1519 1.00
z[3,6]            0.25    0.01 0.48     -0.69     -0.07      0.26      0.58      1.18  1208 1.00
z[3,7]            0.46    0.02 0.72     -1.03     -0.03      0.48      0.95      1.89  1705 1.00
z[3,8]           -1.10    0.02 0.69     -2.57     -1.53     -1.07     -0.64      0.17  1518 1.00
z[4,1]           -0.33    0.01 0.52     -1.41     -0.67     -0.31      0.02      0.61  1299 1.00
z[4,2]            0.10    0.02 0.68     -1.35     -0.32      0.10      0.56      1.36  1347 1.00
z[4,3]           -0.37    0.01 0.49     -1.36     -0.68     -0.37     -0.04      0.56  1552 1.00
z[4,4]           -0.20    0.02 0.70     -1.58     -0.66     -0.22      0.25      1.24  1700 1.00
z[4,5]           -0.43    0.02 0.88     -2.18     -1.02     -0.46      0.16      1.22  1932 1.00
z[4,6]            0.54    0.02 0.56     -0.55      0.17      0.53      0.91      1.66  1332 1.00
z[4,7]            0.79    0.02 0.84     -0.96      0.24      0.79      1.37      2.43  2100 1.00
z[4,8]           -0.15    0.02 0.82     -1.77     -0.70     -0.16      0.39      1.51  1689 1.00
z[5,1]            0.79    0.02 0.56     -0.29      0.42      0.80      1.13      1.96  1285 1.00
z[5,2]           -1.77    0.02 0.81     -3.33     -2.32     -1.75     -1.22     -0.16  1751 1.00
z[5,3]            0.23    0.01 0.46     -0.72     -0.08      0.23      0.54      1.12  1473 1.00
z[5,4]            0.19    0.01 0.67     -1.13     -0.23      0.18      0.64      1.46  2100 1.00
z[5,5]            0.06    0.02 0.76     -1.51     -0.42      0.07      0.56      1.56  2100 1.00
z[5,6]            0.68    0.01 0.55     -0.42      0.33      0.67      1.03      1.78  1586 1.00
z[5,7]            0.25    0.02 0.76     -1.35     -0.24      0.26      0.73      1.80  1936 1.00
z[5,8]           -0.40    0.02 0.76     -1.86     -0.88     -0.40      0.09      1.08  1999 1.00
L_Omega[1,1]      1.00    0.00 0.00      1.00      1.00      1.00      1.00      1.00  2100  NaN
L_Omega[1,2]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,3]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[1,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,1]     -0.26    0.01 0.26     -0.70     -0.45     -0.28     -0.09      0.29  1197 1.00
L_Omega[2,2]      0.93    0.00 0.08      0.71      0.89      0.96      0.99      1.00  1633 1.00
L_Omega[2,3]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[2,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[3,1]     -0.18    0.01 0.27     -0.66     -0.38     -0.20      0.01      0.37  1914 1.00
L_Omega[3,2]     -0.04    0.01 0.29     -0.57     -0.24     -0.05      0.16      0.53  2100 1.00
L_Omega[3,3]      0.89    0.00 0.09      0.66      0.85      0.92      0.96      1.00  1176 1.00
L_Omega[3,4]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[3,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[4,1]     -0.69    0.01 0.22     -0.95     -0.86     -0.74     -0.59     -0.14  1103 1.00
L_Omega[4,2]      0.04    0.01 0.25     -0.46     -0.13      0.03      0.20      0.53  1550 1.00
L_Omega[4,3]      0.18    0.01 0.23     -0.32      0.04      0.19      0.33      0.64  1405 1.00
L_Omega[4,4]      0.54    0.01 0.18      0.22      0.40      0.52      0.67      0.90  1053 1.00
L_Omega[4,5]      0.00    0.00 0.00      0.00      0.00      0.00      0.00      0.00  2100  NaN
L_Omega[5,1]      0.04    0.01 0.25     -0.47     -0.15      0.04      0.22      0.52  1925 1.00
L_Omega[5,2]     -0.18    0.01 0.30     -0.70     -0.40     -0.21      0.03      0.41  1675 1.00
L_Omega[5,3]     -0.10    0.01 0.30     -0.64     -0.31     -0.12      0.11      0.49  1530 1.00
L_Omega[5,4]     -0.01    0.01 0.32     -0.62     -0.25     -0.01      0.23      0.57  1273 1.00
L_Omega[5,5]      0.77    0.00 0.13      0.47      0.69      0.79      0.87      0.96  1336 1.00
tau_unif[1]       0.56    0.00 0.12      0.37      0.48      0.55      0.63      0.83   858 1.01
tau_unif[2]       0.06    0.00 0.03      0.03      0.05      0.06      0.07      0.13  1039 1.00
tau_unif[3]       0.07    0.00 0.03      0.03      0.05      0.06      0.08      0.15   907 1.00
tau_unif[4]       0.24    0.00 0.06      0.15      0.19      0.23      0.27      0.40   815 1.00
tau_unif[5]       0.06    0.00 0.02      0.03      0.04      0.05      0.07      0.12  1251 1.00
gamma[1,1]        3.95    0.02 0.61      2.73      3.57      3.95      4.32      5.23   610 1.00
gamma[1,2]        0.18    0.00 0.06      0.06      0.15      0.18      0.22      0.31   905 1.00
gamma[1,3]        0.33    0.00 0.07      0.19      0.29      0.33      0.37      0.49   958 1.00
gamma[1,4]       -0.85    0.01 0.24     -1.35     -0.99     -0.85     -0.69     -0.37   833 1.00
gamma[1,5]        1.25    0.00 0.06      1.14      1.22      1.25      1.29      1.36  1119 1.00
sigma             1.07    0.00 0.00      1.06      1.06      1.07      1.07      1.08  2100 1.00
beta[1,1]         3.76    0.01 0.25      3.27      3.59      3.76      3.92      4.25  2100 1.00
beta[1,2]         0.23    0.00 0.02      0.19      0.22      0.23      0.24      0.26  2100 1.00
beta[1,3]         0.40    0.00 0.02      0.35      0.38      0.39      0.41      0.44  2100 1.00
beta[1,4]        -0.83    0.00 0.10     -1.03     -0.90     -0.83     -0.77     -0.64  2100 1.00
beta[1,5]         1.32    0.00 0.01      1.30      1.31      1.32      1.33      1.34  2100 1.00
beta[2,1]         3.77    0.00 0.23      3.33      3.62      3.77      3.92      4.20  2100 1.00
beta[2,2]         0.25    0.00 0.06      0.13      0.21      0.25      0.29      0.37  2100 1.00
beta[2,3]         0.47    0.00 0.11      0.27      0.40      0.47      0.54      0.69  1689 1.00
beta[2,4]        -0.67    0.00 0.08     -0.83     -0.72     -0.67     -0.62     -0.51  2100 1.00
beta[2,5]         1.02    0.00 0.02      0.98      1.01      1.02      1.04      1.06  1747 1.00
beta[3,1]         4.76    0.00 0.17      4.42      4.64      4.76      4.87      5.07  2100 1.00
beta[3,2]         0.12    0.00 0.02      0.09      0.11      0.12      0.13      0.15  2100 1.00
beta[3,3]         0.31    0.00 0.02      0.27      0.30      0.31      0.33      0.35  2010 1.00
beta[3,4]        -1.20    0.00 0.06     -1.32     -1.24     -1.20     -1.16     -1.08  2100 1.00
beta[3,5]         1.29    0.00 0.01      1.27      1.29      1.29      1.30      1.31  2100 1.00
beta[4,1]         4.83    0.03 1.52      2.07      3.84      4.74      5.74      8.11  2100 1.00
beta[4,2]         0.09    0.00 0.10     -0.12      0.02      0.09      0.16      0.28  2100 1.00
beta[4,3]         0.31    0.00 0.17     -0.05      0.21      0.30      0.40      0.64  2100 1.00
beta[4,4]        -1.20    0.01 0.47     -2.23     -1.48     -1.18     -0.89     -0.35  2100 1.00
beta[4,5]         1.30    0.00 0.04      1.21      1.27      1.30      1.32      1.37  2100 1.00
beta[5,1]         6.63    0.01 0.28      6.07      6.43      6.64      6.81      7.17  2100 1.00
beta[5,2]        -0.01    0.00 0.02     -0.05     -0.03     -0.01      0.00      0.02  2100 1.00
beta[5,3]         0.17    0.00 0.03      0.11      0.15      0.17      0.19      0.22  2100 1.00
beta[5,4]        -1.88    0.00 0.09     -2.05     -1.94     -1.88     -1.81     -1.70

However, I don't understand how it is possible to get NaN for some parameters, and why n_eff is 2100 for all the betas. Here is the code I ran:

data {
  int<lower=0> N;               // num individuals
  int<lower=1> K;               // num ind predictors
  int<lower=1> J;               // num groups
  // int<lower=1> L;            // num group predictors
  int<lower=1,upper=J> jj[N];   // group for individual
  matrix[N, K] x;               // individual predictors
  matrix[8, 1] u;               // group predictors
  vector[N] y;                  // outcomes
}
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  matrix[1, K] gamma;           // group coeffs
  real<lower=0> sigma;          // prediction error scale
}
transformed parameters {
  matrix[J, K] beta;
  vector<lower=0>[K] tau;       // prior scale
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';
  // beta = (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(rows_dot_product(beta[jj], x), sigma);
}

I ran the code with L = 1 because I don't want to have regressors for beta.

Any value that’s a constant winds up getting a NaN n_eff estimate. The upper triangular portion of a Cholesky factor is just constant zeros.

We should probably have flags in there that replace NaN with n/a or * or something to indicate they didn’t change.
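Here is a minimal R sketch of where that NaN comes from. This is not Stan's actual estimator (Stan combines chains and truncates the autocorrelation sum); it just shows the basic idea that n_eff divides the number of draws by one plus twice the summed autocorrelations, which is undefined for a constant chain:

# Naive effective sample size: N / (1 + 2 * sum of lag-k autocorrelations).
naive_ess <- function(x, max_lag = 20) {
  n <- length(x)
  rho <- sapply(seq_len(max_lag),
                function(k) cor(x[1:(n - k)], x[(k + 1):n]))
  n / (1 + 2 * sum(rho))
}

naive_ess(rnorm(2000))   # roughly 2000 for independent draws
naive_ess(rep(0, 2000))  # NA: cor() of a constant is undefined (sd is zero)

A constant parameter has zero sample standard deviation, so every autocorrelation is 0/0, and the whole estimate comes out NA/NaN.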

Ohh I see. Thanks!! Two more questions:
a) Should it be worrying that all the betas have such a high n_eff?
b) Could a non-centered parameterization of y improve the speed? (I checked the traceplots and it seems that all the parameters converge really fast, after ~200 iterations, so the problem seems to be the slow sampling of the posterior.)

No, high n_eff is a good thing! But what you should be looking at isn't the high n_eff, which is typical for things like regression coefficients and error terms, but the lower-n_eff "parameters" like lp__, which measures how well different log density values are mixing. Typically, hierarchical parameters in a hierarchical model mix more slowly.

Even if some parameters mix well, samplers can be slow to mix among log density levels in the posterior (this is related to how the momenta are randomized in HMC, but other samplers suffer from the same problem). So we usually want to make sure lp__ has a decent R-hat before we stop sampling.
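For example, with an rstan fit object (call it fit), you can inspect lp__ directly; both print() and traceplot() accept a pars argument:

print(fit, pars = "lp__")       # n_eff and Rhat for the log posterior
traceplot(fit, pars = "lp__")   # visual check of mixing across chains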

Thanks Bob! The n_eff of lp__ is super small (~300). Also, I noticed that there is correlation among some of the tau_unif parameters and also among some of the gamma. Could that be the problem? [screenshot of pairs plot attached]

This is from post 14, so that run of the model was with group-level regressors.

These require values for the group predictors. What did you specify for them?

I ran the code with L = 1 so the betas had only an intercept. I had a normal prior for the gamma, and I set u = 1. This is how I prepare the data (when I use u and gamma):


X          <- mydata[, c("Price", "Distribution", "Distribution on Display",
                         "Distribution on Leaflet", "Distribution Leaflet + Display")]
X          <- cbind(intercept = 1, X)
stanDat    <- list()
stanDat$N  <- nrow(mydata)                            # num of observations
stanDat$K  <- ncol(X)                                 # num of predictors
stanDat$J  <- length(unique(mydata$Brand))            # number of groups
stanDat$jj <- as.numeric(factor(mydata$Brand))        # group for individual
stanDat$x  <- as.matrix(X)                            # individual predictors
stanDat$y  <- log1p(mydata$`number of units in the package`)  # outcomes
stanDat$u  <- as.matrix(rep(1, 8))                    # group predictors (all ones)

I think that the problem is caused by the correlations. Those parameters have a really low n_eff. How can I remove them from the model?

Could the non-stationarity of the time series cause a problem?

Super small is 5. 300 is fine.

Yes, correlation among parameters can reduce effective sample size, but you really need to look at the mixing of the energy levels as reflected by lp__ directly. You can also look at the correlation of each parameter with lp__ itself; those plots all tend to be bell shaped.

The almost banana-shaped one at position (1, 3) in the diagram is the most problematic here, but even that doesn't look too bad.

When I say 300 is fine, the reason is that the effective sample size goes in the denominator of the MCMC central limit theorem under a square root. So with n_eff = 300, the MCMC standard error is the posterior standard deviation divided by the square root of 300, which means we've located the posterior mean to within about 1/17th of a posterior standard deviation. Getting to 1/100th would require about n_eff = 10000.
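In code, that's just the 1/sqrt(n_eff) scaling:

# MCMC standard error as a fraction of the posterior sd: 1 / sqrt(n_eff)
1 / sqrt(300)    # ~0.058, about 1/17 of a posterior sd
1 / sqrt(10000)  # 0.01, i.e. 1/100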

He can remove the gammas and include one intercept.

This is 1 * gamma.
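A minimal sketch of that version, with u and gamma dropped in favor of a single row of intercepts (mu is a hypothetical name; everything else follows the original post):

data {
  int<lower=0> N;
  int<lower=1> K;
  int<lower=1> J;
  int<lower=1,upper=J> jj[N];
  matrix[N, K] x;
  vector[N] y;
}
parameters {
  matrix[K, J] z;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0,upper=pi()/2>[K] tau_unif;
  row_vector[K] mu;             // one intercept per predictor, replacing gamma
  real<lower=0> sigma;
}
transformed parameters {
  matrix[J, K] beta;
  vector<lower=0>[K] tau;
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  // rep_matrix stacks the same row of intercepts J times, which is
  // exactly what u * gamma computed when u was a column of ones
  beta = rep_matrix(mu, J) + (diag_pre_multiply(tau, L_Omega) * z)';
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  mu ~ normal(0, 5);            // same scale as the original prior on gamma
  y ~ normal(rows_dot_product(beta[jj], x), sigma);
}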

Thank you! I changed it. No matter how hard I try, I cannot improve the speed. I am trying to see if there is anything weird with the data. I have used different transforms (log y, log x) but it is still so slow. Sometimes it starts really fast (I get to 20% after 1-2 minutes) and then it sticks at the same point for hours.

Sure, but does your fit look reasonable? What are your speed expectations, and what is your current runtime? And what do you want to do with your output: prediction or interpretation? Do you expect future products that are not in your model?

I would transform all variables to roughly unit-normal. I know this is not required, but I like it when my variables are not on totally different scales.

qqplot(scale(log(bbb$`Number of Units Sold`)), rnorm(1000))
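A minimal sketch of that standardization, applied to the stanDat list from earlier (assuming the intercept is the first column of x, which should not be scaled):

stanDat$x[, -1] <- scale(stanDat$x[, -1])       # center/scale each predictor column
stanDat$y       <- as.numeric(scale(stanDat$y)) # outcome on unit scale too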

The fit was OK (some positive price elasticities, but in general OK). The problem is that the runtime changes every run. I was waiting for more than 12 hours and it was still at 20-30%. I am using a small dataset (~30,000 observations). I just want to do interpretation and to predict the sales of products that are already included in my model.