Generating quantities (beginner)

Hi there,

I'm trying to get up and running with modeling in Stan, and as a first step I'm comparing the results of sklearn's linear regression with Stan's.

To keep it simple, I'm using one predictor variable and one response variable (x, y). I've split the dataset (26304 rows) into two subsets: x and y have length 17544, and x_new has length 8760.
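For reference, here is a minimal sketch of how that split could be packaged into the dictionary the Stan `data` block expects (names `N`, `x`, `y`, `N_new`, `x_new`). It assumes the full series is held in two plain Python lists and that the first 17544 rows are used for fitting; the function name `make_stan_data` is hypothetical.

```python
def make_stan_data(x_all, y_all, n_train):
    """Split x/y into a fitting set and a prediction set and package them
    with the names used in the Stan data block (N, x, y, N_new, x_new).

    Hypothetical helper: assumes the first n_train rows are for fitting.
    """
    if len(x_all) != len(y_all):
        raise ValueError("x_all and y_all must have the same length")
    if not 0 <= n_train <= len(x_all):
        raise ValueError("n_train must be between 0 and len(x_all)")
    x, y = list(x_all[:n_train]), list(y_all[:n_train])
    x_new = list(x_all[n_train:])  # only x is needed for prediction
    return {"N": len(x), "x": x, "y": y, "N_new": len(x_new), "x_new": x_new}


# Example with the sizes from the post (the values themselves are placeholders).
data = make_stan_data(list(range(26304)), [0.0] * 26304, n_train=17544)
print(data["N"], data["N_new"])  # 17544 8760
```

The same dictionary can be passed as the data argument when sampling, in PyStan or CmdStanPy.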

Stan code:

data {
  int<lower=0> N;  // number of observations
  vector[N] x;  // predictor
  vector[N] y;  // outcome
  int<lower=0> N_new;
  vector[N_new] x_new;
}
parameters {
  real alpha;  // intercept
  real beta;   // slope
  real<lower=0> sigma; // noise standard deviation (residuals assumed normally distributed)
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N_new] y_new;
  for (n in 1:N_new)
    y_new[n] = normal_rng(x_new[n] * beta, sigma);
}

After compiling, in the fit step I get this error:
WARNING:pystan:Maximum (flat) parameter count (1000) exceeded: skipping diagnostic tests for n_eff and Rhat.
To run all diagnostics call pystan.check_hmc_diagnostics(fit)

I'm using PyStan's default sampling settings: 4 chains, 1000 draws per chain.

What do I do with this warning? What does it mean?

@betanalpha ?

This is a PyStan warning message. Could you try this using CmdStanPy?
https://cmdstanpy.readthedocs.io/en/latest/ (installation: https://cmdstanpy.readthedocs.io/en/latest/getting_started.html#installation)

Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.

Also, put some priors on alpha and beta in the model block:

alpha ~ normal(0, 1);
beta ~ normal(0, 1);
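To see what those two statements add, here is a plain-Python sketch of the log density the model block would then describe. It is illustrative only: Stan's `~` statements drop constant terms and Stan samples sigma on an unconstrained (log) scale with a Jacobian adjustment, so these numbers are not directly comparable to Stan's lp__. The helper names are hypothetical.

```python
import math


def normal_lpdf(v, mu, sd):
    """Log density of Normal(mu, sd) at v, including the normalizing constant."""
    z = (v - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def log_posterior(alpha, beta, sigma, x, y):
    """Unnormalized log posterior of the regression with the suggested priors.

    sigma only has its lower bound (an implicit flat prior on (0, inf)).
    This is the density on the constrained scale, without the Jacobian
    term Stan adds when it works with log(sigma).
    """
    if sigma <= 0:
        return -math.inf
    # alpha ~ normal(0, 1); beta ~ normal(0, 1);
    lp = normal_lpdf(alpha, 0, 1) + normal_lpdf(beta, 0, 1)
    # y ~ normal(alpha + beta * x, sigma);
    for xi, yi in zip(x, y):
        lp += normal_lpdf(yi, alpha + beta * xi, sigma)
    return lp
```

Note that normal(0, 1) priors are only weakly informative if x and y are on roughly unit scale; with 17544 observations the likelihood term will usually dominate anyway.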

Second, in the second-to-last line of the file:

y_new[n] = normal_rng(x_new[n] * beta, sigma);

you omit the intercept. Is this intentional, or did you mean:

y_new[n] = normal_rng(alpha + x_new[n] * beta, sigma);
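Conceptually, the corrected generated quantities block runs once per saved posterior draw: for each draw of (alpha, beta, sigma) it simulates one value of y_new[n] for every x_new[n]. A stdlib-only sketch of that loop follows, assuming the draws are given as a list of (alpha, beta, sigma) tuples (a hypothetical format; PyStan and CmdStanPy return arrays instead).

```python
import random


def posterior_predictive(draws, x_new, seed=None):
    """Mimic the generated quantities block: one y_new vector per posterior draw.

    draws: list of (alpha, beta, sigma) tuples, one per saved iteration.
    Returns len(draws) rows, each of length len(x_new).
    """
    rng = random.Random(seed)
    out = []
    for alpha, beta, sigma in draws:
        # Same as: y_new[n] = normal_rng(alpha + x_new[n] * beta, sigma);
        out.append([rng.gauss(alpha + xn * beta, sigma) for xn in x_new])
    return out


def predictive_mean(y_new_draws):
    """Average over draws to get one point prediction per x_new[n]."""
    n_draws = len(y_new_draws)
    return [sum(col) / n_draws for col in zip(*y_new_draws)]
```

Averaging over draws gives point predictions roughly comparable to sklearn's `predict`, while the spread across draws reflects both parameter uncertainty and the noise sigma.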

There is no error message, just a warning that some automatic diagnostics were skipped (they can take a long time for a large number of parameters).

Edit: We do run diagnostics automatically on the samples, but users often want to save the fit before running diagnostics, which can take a long time (ESS and Rhat for 100k parameters, for example).

You can run all diagnostics manually with pystan.check_hmc_diagnostics(fit).
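For intuition about what is computed per variable, here is a simplified split-Rhat in plain Python, following the classic formula (split each chain in half, then compare between-half and within-half variances). It is an illustration only: Stan's own implementation differs in details (newer releases use a rank-normalized version), and the effective sample size calculation is omitted. Repeating this kind of computation, plus ESS, for every one of thousands of saved quantities is what makes full diagnostics slow.

```python
import math
import random
from statistics import mean, variance


def split_rhat(chains):
    """Basic (non-rank-normalized) split-Rhat for one scalar quantity.

    chains: list of equal-length lists of draws, one list per chain.
    Values near 1 suggest the chains agree; values well above 1 signal
    poor mixing or a trend within chains.
    """
    halves = []
    for c in chains:
        n = len(c) // 2
        halves.append(c[:n])           # first half of the chain
        halves.append(c[len(c) - n:])  # second half (middle draw dropped if odd)
    n = len(halves[0])
    if n < 2:
        raise ValueError("need at least 4 draws per chain")
    b = n * variance([mean(h) for h in halves])  # between-half variance
    w = mean(variance(h) for h in halves)         # within-half variance
    if w == 0:
        raise ValueError("draws have zero within-chain variance")
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


# Usage: four chains of independent draws from the same distribution.
rng = random.Random(42)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
print(round(split_rhat(mixed), 3))  # close to 1
```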

Omitting the intercept was not intentional.
I've added the priors and the intercept.
I think there's something basic I'm not understanding: I want to predict 8760 new values. Is this the “parameter count” that is exceeding 1000?
Is this the correct way to generate predictions (quantities)?

As @ahartikainen notes, that warning only indicates that you have more than 1000 variables in your Stan output (“parameter count” in the warning refers to all saved variables, including those in the parameters, transformed parameters, and generated quantities blocks), so it won't run diagnostics on all of them. All 8760 predictive values defined in the generated quantities block will still be in the fit and available for your analysis.
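The arithmetic behind the warning, as a short sketch: every scalar entry saved in the output counts separately, so a vector[N_new] contributes N_new entries. This counts only the variables declared in the model; the exact number PyStan compares against 1000 may also include internal entries such as lp__, which does not change the conclusion.

```python
def flat_size(shape):
    """Number of scalar entries for a variable with the given dimensions."""
    size = 1
    for dim in shape:
        size *= dim
    return size


# Output variables declared in the model: three scalars plus vector[N_new] y_new.
N_new = 8760
declared = {"alpha": (), "beta": (), "sigma": (), "y_new": (N_new,)}

counts = {name: flat_size(shape) for name, shape in declared.items()}
total = sum(counts.values())
print(total, total > 1000)  # 8763 True
```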

Yes.

Thank you both for clarifying this! I appreciate it.