Getting an error I'm not familiar with when trying to plot the PPC

I'm getting two errors when trying to use ArviZ's plot_ppc. I want to see the simulations, their mean, and the observed data for this model. The model compiles and samples with no issue. It is a basic logistic regression, which I have not done before in Stan, or in any Bayesian framework for that matter.

This is the code I’m running for plotting the ppc.

fit = my_model.sample(data=gh_dict, iter_warmup=200, iter_sampling=1000,
                      show_progress=True,
                      output_dir="D:\\Stan")

inf_data = az.convert_to_inference_data(fit)

ppc = az.plot_ppc(inf_data, data_pairs={"y_rep":"issued_flag"})
fig2 = ppc.figure
fig2.savefig(r"C:\Users\JORDAN.HOWELL.GITDIR\PycharmProjects\pythonProject\gooseheadquotes\output\ppc.jpg",
             dpi=200, bbox_inches="tight")

Error/Warning 1

UserWarning: rcParams['plot.max_subplots'] (40) is smaller than the number of variables to plot (18179) in plot_posterior, generating only 40 plots
  warnings.warn(

Error/Warning 2

UserWarning: Your data appears to have a single value or no finite values
  warnings.warn("Your data appears to have a single value or no finite values")

Here is the model I’m running.

data {
    int<lower=0> N; // number policy
    int<lower=0, upper=1>  issued_flag[N]; // conversion numbers
    vector[N] n_lmt1;// normalized limit 1
    vector[N] n_pp_lmt; //normalized personal property limit
    vector[N] n_age;//normalized age of insured
    vector[N] n_aoh;//normalized age of home

}

parameters {
    real<lower=0> mu;
    real limit1_beta;
    real pplimit_beta;
    real age_beta;
    real aoh_beta;
}

model {
       mu ~ normal(0,3);
       limit1_beta ~ normal(0,5);
       pplimit_beta ~ normal(0,5);
       age_beta ~ normal(0,5);
       aoh_beta ~ normal(0,5);
       issued_flag~ bernoulli_logit(mu + n_lmt1*limit1_beta + n_pp_lmt*pplimit_beta + n_age*age_beta + n_aoh*aoh_beta);
}

generated quantities {
  vector[N] eta = mu + n_lmt1*limit1_beta + n_pp_lmt*pplimit_beta + n_age*age_beta + n_aoh*aoh_beta;
  int y_rep[N];
  // eta is on the logit scale, so draw with bernoulli_logit_rng;
  // bernoulli_rng expects a probability in [0, 1]
  for (n in 1:N)
    y_rep[n] = bernoulli_logit_rng(eta[n]);
}
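
A note on the generated quantities block: eta is a linear predictor on the log-odds scale, so a replicated outcome is a Bernoulli draw with success probability inv_logit(eta). The plain-Python sketch below only illustrates that relationship (it is roughly what Stan's bernoulli_logit_rng does for you). The eta values, the seed, and the helper names inv_logit and bernoulli_logit_draw are made up for the example and are not part of the model above.

```python
import math
import random


def inv_logit(eta):
    """Map a log-odds value to a probability, avoiding overflow in exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def bernoulli_logit_draw(eta, rng):
    """Draw 0 or 1 with success probability inv_logit(eta)."""
    return 1 if rng.random() < inv_logit(eta) else 0


rng = random.Random(0)        # fixed seed, for illustration only
eta = [-2.0, 0.0, 3.5]        # made-up linear predictor values
y_rep = [bernoulli_logit_draw(e, rng) for e in eta]

print([round(inv_logit(e), 3) for e in eta])
print(y_rep)
```

Passing eta straight to a function that expects a probability only works when eta happens to fall in [0, 1], which is why the logit-scale version is the safer choice here.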

I think you need to use the az.from_xyz interface to move the ppc variables into the correct group (posterior_predictive).

Do you mean that I should use az.from_cmdstanpy(fit) in lieu of az.convert_to_inference_data(fit)?

az.from_cmdstanpy(fit, posterior_predictive=["y_rep"])

See

https://arviz-devs.github.io/arviz/api/generated/arviz.from_cmdstanpy.html

and

https://arviz-devs.github.io/arviz/getting_started/CreatingInferenceData.html#from-cmdstanpy

So I tried this:

inf_data = az.from_cmdstanpy(posterior=fit,
                             posterior_predictive=["y_rep"],
                             observed_data={"y":gh_dict['issued_flag']})

I get the error:

KeyError: 'var names: "[\'y\'] are not present" in dataset'

I thought I had to choose a key name for the observed data, but apparently that's not right. What should go there? I also tried using 'issued_flag', which is the key name for the observed data in the dataset, and I got the same error.

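I think this error comes from this call:

ppc = az.plot_ppc(inf_data, data_pairs={"y_rep":"issued_flag"})

The key and value in data_pairs should be swapped, so that the key is the observed data name and the value is the posterior predictive name: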

ppc = az.plot_ppc(inf_data, data_pairs={"issued_flag": "y_rep"})

(Change "issued_flag" to whatever name the observed data has in inf_data, for example "y" if you passed observed_data={"y": ...}.)
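
To make the direction of the mapping concrete, here is a minimal plain-Python sketch of the lookup. It is not ArviZ's actual code: resolve_pairs is a hypothetical helper that only mimics the idea that each observed variable is paired with a predictive variable, with the same name used as the fallback when data_pairs has no entry for it.

```python
def resolve_pairs(observed_vars, predictive_vars, data_pairs=None):
    """Hypothetical helper: pair each observed variable with a predictive one."""
    data_pairs = data_pairs or {}
    pairs = {}
    for obs_name in observed_vars:
        # key = observed name, value = predictive name; fall back to the same name
        pred_name = data_pairs.get(obs_name, obs_name)
        if pred_name not in predictive_vars:
            raise KeyError(f"{pred_name!r} is not present in the predictive group")
        pairs[obs_name] = pred_name
    return pairs


predictive_vars = ["y_rep"]

# Observed name as key, predictive name as value: the pair resolves.
print(resolve_pairs(["issued_flag"], predictive_vars, {"issued_flag": "y_rep"}))

# Reversed mapping: there is no entry for "issued_flag", so the lookup
# falls back to "issued_flag", which the predictive group does not contain.
try:
    resolve_pairs(["issued_flag"], predictive_vars, {"y_rep": "issued_flag"})
except KeyError as err:
    print("KeyError:", err)

# Observed data stored as "y" with no data_pairs: "y" is looked up and is missing.
try:
    resolve_pairs(["y"], predictive_vars)
except KeyError as err:
    print("KeyError:", err)
```

The last case has the same shape as the 'y' KeyError quoted above: with the observed data stored under "y", the pairing would need to be {"y": "y_rep"}.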