I’m taking part in the Learn Bayesian Data Analysis and Stan with Jonah Gabry course.
Wanted to duplicate the examples given in CmdStanR & bayesplot using CmdStanPy & ArviZ but running into an issue with the PPC plotting.
Using the same Stan file and the same data in both R & Python. (The provided .RDS file is read into Python using the rpy2 package)
In R
y_rep <- fit_simple_pois$draws("y_rep", format = "matrix")
ppc_dens_overlay(y = standata_simple$complaints, yrep = y_rep[1:200,])
results in
while in Python
az_data = az.from_cmdstanpy(
posterior=fit_simple_pois,
posterior_predictive='y_rep',
observed_data={'complaints': pest_data.complaints},
)
az.plot_ppc(az_data, data_pairs={'complaints': 'y_rep'}
results in
Very confused by the shape of the ArviZ plot.
Any suggestions?
The Stan code is
data {
int<lower=1> N;
vector<lower=0>[N] traps; // vector implies real numbers
array[N] int<lower=0> complaints;
}
parameters {
real alpha;
real beta;
}
model {
complaints ~ poisson_log(alpha + beta * traps);
alpha ~ normal(log(7), 1);
beta ~ normal(-0.25, 0.5);
}
generated quantities {
array[N] int y_rep = poisson_log_rng(alpha + beta * traps);
}
The data in R
building_id date traps floors sq_footage_p_floor live_in_super monthly_average_rent average_tenant_age age_of_building total_sq_foot month complaints log_sq_foot_1e4
1 37 2017-01-15 8 8 5149.008 0 3846.949 53.87742 47 41192.06 1 1 1.415661
2 37 2017-02-14 8 8 5149.008 0 3846.949 53.87742 47 41192.06 2 3 1.415661
3 37 2017-03-16 9 8 5149.008 0 3846.949 53.87742 47 41192.06 3 0 1.415661
4 37 2017-04-15 10 8 5149.008 0 3846.949 53.87742 47 41192.06 4 1 1.415661
5 37 2017-05-15 11 8 5149.008 0 3846.949 53.87742 47 41192.06 5 0 1.415661
6 37 2017-06-14 11 8 5149.008 0 3846.949 53.87742 47 41192.06 6 0 1.415661
and in a Pandas DF in Python
building_id date traps floors sq_footage_p_floor live_in_super monthly_average_rent average_tenant_age age_of_building total_sq_foot month complaints log_sq_foot_1e4
1 37 17181.00 8.00 8.00 5149.01 0.00 3846.95 53.88 47.00 41192.06 1.00 1.00 1.42
2 37 17211.00 8.00 8.00 5149.01 0.00 3846.95 53.88 47.00 41192.06 2.00 3.00 1.42
3 37 17241.00 9.00 8.00 5149.01 0.00 3846.95 53.88 47.00 41192.06 3.00 0.00 1.42
4 37 17271.00 10.00 8.00 5149.01 0.00 3846.95 53.88 47.00 41192.06 4.00 1.00 1.42
5 37 17301.00 11.00 8.00 5149.01 0.00 3846.95 53.88 47.00 41192.06 5.00 0.00 1.42
CmdStanR v0.4.0
bayesplot v1.8.1
CmdStanPy v0.9.76
ArViz v0.11.2