Hi there,
I’m trying to get up and running with modeling in Stan, and my first step is comparing the results of sklearn’s linear regression with Stan.
To keep it simple, I’m using one predictor variable, and one response variable (x, y). I’ve separated the dataset (length 26304) into two subsets, where x and y have length 17544, and x_new has length 8760.
Stan code:
data {
int<lower=0> N; // number of observations
vector[N] x; // predictor
vector[N] y; // outcome
int<lower=0> N_new;
vector[N_new] x_new;
}
parameters {
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // assumes noise is normally distributed
}
model {
y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
vector[N_new] y_new;
for (n in 1:N_new)
y_new[n] = normal_rng(x_new[n] * beta, sigma);
}
After compiling, in the fit step I get this error:
WARNING:pystan:Maximum (flat) parameter count (1000) exceeded: skipping diagnostic tests for n_eff and Rhat.
To run all diagnostics call pystan.check_hmc_diagnostics(fit)
I have default parameters (in pystan) for compiling – 4 chains, 1000 draws per chain.
What do I do with this warning? What does it mean?
@betanalpha ?
this is a PyStan error message - could you try this using CmdStanPy?
https://cmdstanpy.readthedocs.io/en/latest/
https://cmdstanpy.readthedocs.io/en/latest/getting_started.html#installation
Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.
Also, put some priors on alpha and beta:
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
Second, in the second-to-last line of the file,
y_new[n] = normal_rng(x_new[n] * beta, sigma);
you omit the intercept. Is this intentional, or did you mean
y_new[n] = normal_rng(alpha + x_new[n] * beta, sigma);
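To make the effect of the missing intercept concrete, here is a minimal sketch in plain Python (standard library only, not Stan). The values of alpha, beta, sigma and x_new_n are made up for illustration and are not estimates from the data in this thread. In the real model they would come from the posterior draws.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Made-up values for illustration only (assumptions, not fitted estimates).
alpha, beta, sigma = 2.0, 0.5, 1.0
x_new_n = 3.0
n_draws = 20000

# Mirrors normal_rng(x_new[n] * beta, sigma): intercept omitted.
without_intercept = [random.gauss(x_new_n * beta, sigma) for _ in range(n_draws)]

# Mirrors normal_rng(alpha + x_new[n] * beta, sigma): intercept included.
with_intercept = [random.gauss(alpha + x_new_n * beta, sigma) for _ in range(n_draws)]

# The two sets of predictions differ, on average, by roughly alpha.
shift = statistics.fmean(with_intercept) - statistics.fmean(without_intercept)
print(f"mean shift between the two versions: {shift:.2f} (alpha = {alpha})")
```

Unless the fitted intercept happens to be close to zero, every prediction made without it is offset by about alpha.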
There is no error message. It is just a warning that some automatic diagnostics are skipped (they can take a long time when there are many parameters).
Edit: We normally run diagnostics on the sample automatically, but users might want to save the fit before running diagnostics that can take a long time (ess / rhat for 100k, for example).
You can run all diagnostics manually with pystan.check_hmc_diagnostics(fit).
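For a rough sense of why these checks get expensive, here is a sketch of the classic (non-split) Rhat formula for a single variable, in plain Python. This is not PyStan's code, and PyStan's own implementation may differ (for example by splitting chains). The helper name classic_rhat and the simulated chains are assumptions made for illustration. A computation like this has to be repeated for every saved variable.

```python
import random
import statistics

def classic_rhat(chains):
    """Classic (non-split) potential scale reduction factor for one variable.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    b_over_n = statistics.variance(chain_means)
    # Pooled variance estimate, then the ratio to the within-chain variance.
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5

random.seed(1)  # fixed seed so the sketch is reproducible

# Four simulated chains that all sample the same distribution.
mixed = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four simulated chains where two are stuck around a different value.
stuck = [[random.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 5.0, 5.0)]

rhat_mixed = classic_rhat(mixed)
rhat_stuck = classic_rhat(stuck)
print(f"well-mixed chains: Rhat = {rhat_mixed:.3f}")
print(f"disagreeing chains: Rhat = {rhat_stuck:.3f}")
```

Values near 1 suggest the chains agree, while values well above 1 suggest they do not.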
omitting the intercept – not intentional.
I’ve added the priors and the intercept.
I think there’s something basic I’m not understanding – I want to predict 8760 new values. Is this the “parameter count” which is exceeding 1000?
Is this the correct way to generate predictions (quantities)?
As @ahartikainen notes, that warning only indicates that you have more than 1000 variables in your Stan output (“parameter count” in the warning refers to all variables saved, including those in the parameters block, transformed parameters block, and generated quantities block), so it’s not going to run diagnostics on all of them. All 8760 predictive variables defined in the generated quantities block will still be in the fit and available for your analysis.
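As a back-of-the-envelope check, the count for the model in this thread can be tallied in plain Python. The helper flat_count is hypothetical and is not part of PyStan. PyStan may also count internal quantities such as lp__, so treat the total as approximate.

```python
from math import prod

# Shapes of the saved variables for the model in this thread.
# An empty tuple means a scalar; (8760,) is a vector of length 8760.
saved_variables = {
    "alpha": (),       # parameters block
    "beta": (),        # parameters block
    "sigma": (),       # parameters block
    "y_new": (8760,),  # generated quantities block
}

def flat_count(shapes):
    """Total number of scalar values across all saved variables (hypothetical helper)."""
    return sum(prod(dims) for dims in shapes.values())

limit = 1000  # threshold quoted in the PyStan warning
total = flat_count(saved_variables)
print(f"flat count: {total}, exceeds limit of {limit}: {total > limit}")
```

The three model parameters are nowhere near the limit. The 8760 entries of y_new are what push the total past 1000.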
Thank you both for clarifying this! I appreciate it.