Plotting LLH, exponential modelling and best practices

Hello,

I’m fairly new to this. I’ve set myself this task as an exercise: I’ve been trying to fit a set of data I have with an exponential, and then find the fitted parameter by maximising the likelihood (which I’d also like to plot).

  • I assume Stan might have this feature? (I’ve been trying to use the ‘lp__’ parameter.)
  • However, my first problem: I can’t get the fit working without severely narrowing the range on the model parameter. (I’ve looked into using the ‘target’ variable within my model, but I’m unsure about this.)
  • Is adding a constant the best way to deal with zeros in Stan?

According to other fitting methods, the best fit for the exponential parameter should be 0.0096, but my fit essentially hits its upper limit.
The data is taken from an MCA (multi-channel analyser), so it’s essentially a binned histogram.

Any help/comments/advice would be appreciated!

m_code = """
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real<lower=0, upper=0.1> beta;
  real sigma;
}
transformed parameters {
  vector[N] mu = (beta)*exp(-x*beta);
}
model {
  //vector[N] lh;

  sigma ~ uniform(0, 0.1); //or,
  //sigma ~ normal(0.1, 0.005);
  y ~ normal(mu, sigma);

  //alternative:
  //beta ~ gamma(2, 1./0.01); //or,
  //beta ~ normal(0.01, 0.0);
  //y ~ exponential(beta);

  //for (i in 1:N) {
  //  lh[i] = -log(mu[i])*y[i]; //?
  //}
  //target += sum(lh); //?
}
generated quantities{
  vector[N] log_lik;
  vector[N] y_hat;
  real log_lik_sum;

  // compute likelihood to plot later
  for (i in 1:N) {
    y_hat[i] = mu[i];
    log_lik[i] = -log(y_hat[i])*y[i];
  }
  log_lik_sum = sum(log_lik);
}
"""
import pystan
import matplotlib.pyplot as plt
import numpy as np
import arviz as az
%matplotlib inline

data = np.array([
0,6,11,0,1,1076,2142,3324,5786,10408,15185,15592,11576,7722,4812,2549,1385,703,333,210,121,
78,73,59,56,47,37,39,26,38,47,39,30,27,46,38,26,31,36,31,29,43,33,35,33,34,33,32,28,29,29,
30,26,34,33,28,29,34,25,26,31,32,35,15,29,19,21,23,20,31,23,28,18,32,31,30,31,14,29,18,19,
19,23,17,24,24,25,31,28,22,25,17,17,21,16,28,21,28,19,16,22,25,24,11,18,20,19,11,13,13,25,
18,21,16,20,13,16,12,11,13,13,17,19,17,17,10,8,12,11,14,18,12,17,16,11,20,18,10,10,12,15,17,
11,17,9,11,13,10,15,17,16,15,17,12,12,10,12,14,16,5,10,10,13,9,12,10,12,10,7,13,11,8,10,16,
7,5,12,11,7,9,11,15,7,9,12,8,10,10,7,3,10,6,10,5,4,9,6,6,8,8,7,5,13,2,8,7,6,7,11,4,6,6,7,12,
7,7,4,4,4,4,5,9,8,11,3,6,5,5,5,7,14,4,4,6,6,6,8,1,7,4,7,8,8,1,7,3,6,5,7,6,5,1,5,4,5,3,4,6,7,
5,4,8,4,3,2,2,1,9,5,6,3,12,5,7,6,4,5,4,8,3,1,2,6,7,4,5,4,5,0,3,3,3,5,1,4,1,1,1,4,4,4,2,3,2,
5,2,1,1,1,3,1,2,3,4,3,1,3,2,3,3,4,1,2,2,2,1,0,3,4,2,1,1,1,4,0,3,2,1,2,1,6,2,1,1,2,2,5,4,2,3,
1,1,1,2,0,5,0,4,2,2,1,1,1,3,0,2,1,1,1,2,2,1,2,3,0,1,2,1,2,2,1,0,1,3,2,1,2,0,2,2,2,1,1,2,5,3,
2,1,2,1,2,4,0,3,0,2,1,1,0,1,2,3,2,1,0,1,1,2,3,1,1,2,1,3,5,0,0,2,0,1,0,0,1,0,0,0,0,1,0,0,1,1,
0,0,1,0,1,2,0,1,0,1,0,0,0,0,2,0,0,0,1,0,1,1,1,1,0,1,1,1,2,1,1,0,2,0,2,2,2,0,2,1,0,1,0,0,2,0,
1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0])

data = data+1e-6 # to avoid issues with zeros

data_trunc = data[25::1] # truncate set to remove noise peak and just focus on exp tail
data_trunc_n = data_trunc/np.sum(data_trunc) # normalize data

bins = np.array(range(0,len(data_trunc_n)))

# quick plot to visualise data
plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()
plt.bar(bins, data_trunc_n, width=1)
plt.xlabel(r"Bins")
plt.ylabel(r"Counts")
plt.xlim([0, len(bins)])
plt.ylim([1e-4, max(data_trunc_n)])
ax.set_yscale('log')
plt.show()

#fitting
expo_data = {'N': len(data_trunc_n), 'y': data_trunc_n, 'x': bins}

sm = pystan.StanModel(model_code=m_code)
fit = sm.sampling(data=expo_data, iter=1000, warmup=100, chains=1, init=[dict(beta=0.009)])

az.plot_trace(fit, var_names=('beta', 'sigma'))
az.plot_trace(fit['lp__'])
plt.show()

# optimizing is better?
op = sm.optimizing(data=dict(y=data_trunc_n, N=len(data_trunc_n), x=bins))
print("beta:", op['beta'])

# LLH
betas = fit.extract()['beta'] # posterior draws
log_lik_sums = fit.extract()['log_lik_sum'] # posterior draws
plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()
plt.scatter(betas, log_lik_sums)
plt.xlabel(r"Exponential factor, $\beta$")
plt.ylabel(r"Log Likelihood")
plt.show()

edit: I found I had constrained my sigma parameter too much, but it still doesn’t land on the value the LLH maximum gives: 0.0092 vs 0.0096. I also tried flipping beta (i.e. fitting 1/beta), which doesn’t really help. Pretty sure I need to alter the ‘target’?

[figure: data]
[figure: fit]
[figure: llh]

Update:

I worked it out. This does what I wanted to do (but still open to comments if this is the right way to do things).

m_code = """
functions{
  real joint_exp_log(vector x, vector y, real beta){
    vector[num_elements(x)] l_prob;
    real l_prob_sum;
    l_prob = log(beta*exp(-x*beta)).*y; //elementwise multiplication
    l_prob_sum = sum(l_prob);
    return l_prob_sum;
  }
}
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real beta;
}
model {
  // example distributions:
  //beta ~ uniform(0, 0.1); // or,
  beta ~ normal(0.01, 0.005); // or,
  //beta ~ gamma(2, 1./0.01); // note: stan inverts the second gamma parameter compared to numpy
  target += joint_exp_log(x, y, beta); // weight the log-density at each bin position x by its (normalized) count y
}
generated quantities{
  real y_hat = joint_exp_log(x, y, beta); // total weighted log-likelihood for this draw (saved for plotting)
}
"""
import pystan
import matplotlib.pyplot as plt
import numpy as np
import arviz as az
%matplotlib inline

data = np.array([
0,6,11,0,1,1076,2142,3324,5786,10408,15185,15592,11576,7722,4812,2549,1385,703,333,210,121,
78,73,59,56,47,37,39,26,38,47,39,30,27,46,38,26,31,36,31,29,43,33,35,33,34,33,32,28,29,29,
30,26,34,33,28,29,34,25,26,31,32,35,15,29,19,21,23,20,31,23,28,18,32,31,30,31,14,29,18,19,
19,23,17,24,24,25,31,28,22,25,17,17,21,16,28,21,28,19,16,22,25,24,11,18,20,19,11,13,13,25,
18,21,16,20,13,16,12,11,13,13,17,19,17,17,10,8,12,11,14,18,12,17,16,11,20,18,10,10,12,15,17,
11,17,9,11,13,10,15,17,16,15,17,12,12,10,12,14,16,5,10,10,13,9,12,10,12,10,7,13,11,8,10,16,
7,5,12,11,7,9,11,15,7,9,12,8,10,10,7,3,10,6,10,5,4,9,6,6,8,8,7,5,13,2,8,7,6,7,11,4,6,6,7,12,
7,7,4,4,4,4,5,9,8,11,3,6,5,5,5,7,14,4,4,6,6,6,8,1,7,4,7,8,8,1,7,3,6,5,7,6,5,1,5,4,5,3,4,6,7,
5,4,8,4,3,2,2,1,9,5,6,3,12,5,7,6,4,5,4,8,3,1,2,6,7,4,5,4,5,0,3,3,3,5,1,4,1,1,1,4,4,4,2,3,2,
5,2,1,1,1,3,1,2,3,4,3,1,3,2,3,3,4,1,2,2,2,1,0,3,4,2,1,1,1,4,0,3,2,1,2,1,6,2,1,1,2,2,5,4,2,3,
1,1,1,2,0,5,0,4,2,2,1,1,1,3,0,2,1,1,1,2,2,1,2,3,0,1,2,1,2,2,1,0,1,3,2,1,2,0,2,2,2,1,1,2,5,3,
2,1,2,1,2,4,0,3,0,2,1,1,0,1,2,3,2,1,0,1,1,2,3,1,1,2,1,3,5,0,0,2,0,1,0,0,1,0,0,0,0,1,0,0,1,1,
0,0,1,0,1,2,0,1,0,1,0,0,0,0,2,0,0,0,1,0,1,1,1,1,0,1,1,1,2,1,1,0,2,0,2,2,2,0,2,1,0,1,0,0,2,0,
1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0])

data = data+1e-6 # to avoid issues with zeros

data_trunc = data[25::1] # truncate set to remove noise
data_trunc_n = data_trunc/np.sum(data_trunc) # normalize

bins = np.array(range(0,len(data_trunc_n)))

# quick plot to visualize data
plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()
plt.bar(bins, data_trunc_n, width=1)
plt.xlabel(r"Bins")
plt.ylabel(r"Counts")
plt.xlim([0, len(bins)])
plt.ylim([1e-4, max(data_trunc_n)])
ax.set_yscale('log')
#plt.savefig('./data.png', bbox_inches='tight')
plt.show()

sm = pystan.StanModel(model_code=m_code)
expo_data = {'N': len(data_trunc_n), 'y': data_trunc_n, 'x': bins}
fit = sm.sampling(data=expo_data, iter=10000, warmup=1000, chains=1, init=[dict(beta=0.009)])

az.plot_trace(fit, var_names=('beta'))
az.plot_trace(fit['lp__'])
plt.savefig('./parameterdist.png', bbox_inches='tight')
plt.show()

betas = fit.extract()['beta']
log_lik_sums = fit.extract()['y_hat']

plt.rcParams["figure.figsize"] = [10,3]
fig, ax = plt.subplots()
plt.scatter(betas, log_lik_sums, s=1)
plt.xlabel(r"Exponential factor, $\beta$")
plt.ylabel(r"Log Likelihood")

log_lik_sum_max = np.amax(log_lik_sums);
beta_fit = betas[np.argmax(log_lik_sums)];
str_i = "max: %.2f, beta: %.5f" % (log_lik_sum_max,beta_fit)
plt.text(0.85, 0.9,str_i, ha='center', va='center', transform=ax.transAxes)
plt.savefig('./mlh.png', bbox_inches='tight')
plt.show()

# Plot data & regression line
fig, ax = plt.subplots()
plt.plot(bins, beta_fit*np.exp(-bins*beta_fit), color='red')
plt.bar(bins, data_trunc_n, width=1)
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.xlim([0, len(bins)])
plt.ylim([1e-4, max(data_trunc_n)])
ax.set_yscale('log')
plt.savefig('./data_fit.png', bbox_inches='tight')
plt.show()
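
As a cross-check outside Stan, the same maximum can be found with a simple grid scan of the weighted log-likelihood, using the bins and data_trunc_n arrays from above (the grid limits here are arbitrary):

# scan the weighted log-likelihood sum(y * log(beta*exp(-x*beta))) over a grid of beta values,
# writing log(beta*exp(-x*beta)) as log(beta) - x*beta
beta_grid = np.linspace(0.005, 0.015, 2001)
llh_grid = np.array([np.sum(data_trunc_n * (np.log(b) - bins * b)) for b in beta_grid])
print("grid-scan best beta: %.5f" % beta_grid[np.argmax(llh_grid)])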

Cool that you delved in deep :-) You’ve definitely found your own way, which is very non-standard (and has some issues), but I hope you’ve learned something along the way.

So, a few things that seem weird to me:

  • Your data are in bins, so it looks like you observe a discrete quantity? If you are not observing continuous values, then the geometric distribution might be a better choice.
  • log(beta*exp(-x*beta)) is mathematically equivalent to log(beta) - x * beta; your formulation is likely to cause numerical issues (see the short sketch after this list).
  • The data don’t really look like they come from an exponential or geometric distribution, because the first few bins are very low. A negative binomial (for discrete values) or gamma (for continuous values) might be preferred. (Oh, I see, you get rid of those bins, but they could be included anyway if you believe this is reasonable.)
  • data+1e-6 shouldn’t be necessary (you just multiply by zero, so zero counts have zero contribution to target).
  • The parameter beta needs a lower bound (e.g. real<lower=0> beta;), as the exponential/geometric distribution is only defined for positive beta. Since negative beta will lead to rejected samples, this is almost certainly the reason why you had to put such a narrow prior on beta: you needed it to completely avoid negative values.
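
A minimal numpy sketch of the second point, with made-up beta and x values (the same fix applies inside the Stan model, i.e. write log(beta) - x*beta):

import numpy as np

beta = 0.0096
x = np.array([10.0, 1000.0, 100000.0])

naive = np.log(beta * np.exp(-x * beta))  # exp(-x*beta) underflows to 0 for the last entry, so log() gives -inf
stable = np.log(beta) - x * beta          # algebraically identical, no underflow

print(naive)   # last value is -inf (numpy also warns about log(0))
print(stable)  # all values finite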

Having data in bins is slightly non-standard, so first, I’ll show how you would go about your business, if what you had were the actual (continuous) numbers, not bins:

data {
  int<lower=0> N;
  vector<lower=0>[N] x;
}

parameters {
  real<lower=0> beta;
}

model {
  //This is the same as x ~ exponential(beta); we keep the target += form for similarity to the next model
  target += exponential_lpdf(x | beta);
}

Now if you have bins, the geometric distribution is probably better, and you need to weight the likelihood contributions by the counts in the bins, so:

data {
  int<lower=0> N;
  int<lower=0> x[N]; //Value represented by the bin
  int<lower=0> y[N]; //Number of observations in the bin
}

parameters {
  real<lower=0> beta;
}

model {
  for(n in 1:N) {
    //geometric distribution is a special case of the neg.binomial with phi = 1
    target += y[n] * neg_binomial_2_lpmf(x[n] | 1.0 / beta, 1.0);
  }
}
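
If you want a quick sanity check of the weighting idea outside Stan, here is a minimal numpy sketch for the continuous (exponential) version, with simulated bin values x and counts y standing in for real data: the weighted log-likelihood sum(y * (log(beta) - beta*x)) is maximised in closed form at beta = sum(y) / sum(y*x), which you can compare against what optimizing returns.

import numpy as np

# hypothetical binned data: x = value represented by the bin, y = counts in the bin
x = np.arange(500)
y = np.random.default_rng(1).poisson(1000 * 0.0096 * np.exp(-0.0096 * x))

# closed-form maximiser of the weighted exponential log-likelihood
beta_hat = y.sum() / (y * x).sum()
print("closed-form weighted MLE:", beta_hat)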

If you really want the maximum likelihood estimate, then maybe Stan is overkill, but optimizing really is the better choice than sampling. However, since you’ve included a prior on beta, optimizing will give you the MAP (maximum a posteriori) estimate. If you want maximum likelihood, you need to avoid priors (as I did in the examples).

Still, including priors is usually good (though it won’t make a big difference here, since you have quite a lot of data to estimate just one parameter).
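
To see how little the prior matters in a case like this, a toy sketch with simulated data, using the gamma(2, 1/0.01) prior from your commented-out code: the exponential likelihood is conjugate to a gamma(a, b) prior, so the MLE is n/sum(x) while the MAP is (a + n - 1)/(b + sum(x)).

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 0.0096, size=20000)  # simulated continuous observations

a, b = 2.0, 1.0 / 0.01                   # gamma(shape, rate) prior
mle = len(x) / x.sum()                   # maximum likelihood estimate of the rate
map_ = (a + len(x) - 1) / (b + x.sum())  # posterior mode under the conjugate gamma prior

print(mle, map_)  # with this much data the two are nearly identical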

Hope that illuminates more than confuses - feel free to ask for clarifications.

Hi Martin,
Thanks for replying!
You pointed out many mistakes (thanks!); I’ll be a bit more careful this time. Also, thanks for the tip about the neg_binomial_2 distribution!

m_code = """
data {
  int<lower=0> n_data;
  int<lower=0> n_predictions;
  real<lower=0> y[n_data];
}

parameters {
  real<lower=0, upper=0.015> beta;
}

model {
    //beta ~ uniform(0, 0.15);
    y ~ exponential(beta);
}
generated quantities{
  vector[n_predictions] y_hat;
  vector[n_data] log_lik;

  for(i in 1:n_predictions)
    y_hat[i] = exponential_rng(beta);

  for(i in 1:n_data)
    log_lik[i] = exponential_lpdf(y[i] | beta);
}
"""

for reference:

import pystan
import matplotlib.pyplot as plt
import numpy as np
import arviz as az
import corner
import time
%matplotlib inline
print(az.__version__)
print(pystan.__version__)
print(corner.__version__)

0.8.3
2.19.1.1
2.0.1

Create an array with my (small) data set. As I said earlier, it’s taken from an MCA (multi-channel analyser); this device bins data, so these numbers are bin contents.

I then remove the first few bins in the set, which are full of noise, and then make a normalised data set:

data = np.array([
0,6,11,0,1,1076,2142,3324,5786,10408,15185,15592,11576,7722,4812,2549,1385,703,333,210,121,
78,73,59,56,47,37,39,26,38,47,39,30,27,46,38,26,31,36,31,29,43,33,35,33,34,33,32,28,29,29,
30,26,34,33,28,29,34,25,26,31,32,35,15,29,19,21,23,20,31,23,28,18,32,31,30,31,14,29,18,19,
19,23,17,24,24,25,31,28,22,25,17,17,21,16,28,21,28,19,16,22,25,24,11,18,20,19,11,13,13,25,
18,21,16,20,13,16,12,11,13,13,17,19,17,17,10,8,12,11,14,18,12,17,16,11,20,18,10,10,12,15,17,
11,17,9,11,13,10,15,17,16,15,17,12,12,10,12,14,16,5,10,10,13,9,12,10,12,10,7,13,11,8,10,16,
7,5,12,11,7,9,11,15,7,9,12,8,10,10,7,3,10,6,10,5,4,9,6,6,8,8,7,5,13,2,8,7,6,7,11,4,6,6,7,12,
7,7,4,4,4,4,5,9,8,11,3,6,5,5,5,7,14,4,4,6,6,6,8,1,7,4,7,8,8,1,7,3,6,5,7,6,5,1,5,4,5,3,4,6,7,
5,4,8,4,3,2,2,1,9,5,6,3,12,5,7,6,4,5,4,8,3,1,2,6,7,4,5,4,5,0,3,3,3,5,1,4,1,1,1,4,4,4,2,3,2,
5,2,1,1,1,3,1,2,3,4,3,1,3,2,3,3,4,1,2,2,2,1,0,3,4,2,1,1,1,4,0,3,2,1,2,1,6,2,1,1,2,2,5,4,2,3,
1,1,1,2,0,5,0,4,2,2,1,1,1,3,0,2,1,1,1,2,2,1,2,3,0,1,2,1,2,2,1,0,1,3,2,1,2,0,2,2,2,1,1,2,5,3,
2,1,2,1,2,4,0,3,0,2,1,1,0,1,2,3,2,1,0,1,1,2,3,1,1,2,1,3,5,0,0,2,0,1,0,0,1,0,0,0,0,1,0,0,1,1,
0,0,1,0,1,2,0,1,0,1,0,0,0,0,2,0,0,0,1,0,1,1,1,1,0,1,1,1,2,1,1,0,2,0,2,2,2,0,2,1,0,1,0,0,2,0,
1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0])

data_trunc = data[25::1] # truncate set to remove noise
data_trunc_n = data_trunc/np.sum(data_trunc) # normalize

bins = np.array(range(0,len(data_trunc_n)))

Later, I find (especially when making the arviz plot_ppc) it’s easier if I deconstruct the binned data and work with sampled events, so I expand each bin into that many individual events:

data_i = []
for i, n_i in enumerate(data_trunc):
    for ii in range(n_i):
         data_i.append(i)
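
An equivalent, vectorised way to do the same unbinning, if useful (np.repeat repeats each bin index by its count; it gives a numpy array rather than a list, which works the same below):

data_i = np.repeat(np.arange(len(data_trunc)), data_trunc)  # repeat each bin index 'count' times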

quick plot to visualise the data:

# quick plot to visualize data
plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()
plt.bar(bins, data_trunc, width=1)
plt.xlabel(r"Bins")
plt.ylabel(r"Counts")
plt.xlim([0, len(bins)])
#plt.ylim([1e-4, max(data_trunc_n)])
ax.set_yscale('log')
plt.show()

[figure: data histogram]

compile the model:

starttime = time.time()
sm = pystan.StanModel(model_code=m_code)
print('Took {} seconds'.format(time.time() - starttime))
Took 40.821542263031006 seconds

now begin fitting:

expo_data = {'n_data': len(data_i),
             'y': data_i,
             'n_predictions': 2000}
n_chains = 4
starttime = time.time()
fit = sm.sampling(data=expo_data,
                  iter=10000, warmup=5000,
                  chains=n_chains,
                  init=[dict(beta=0.009) for i in range(n_chains)])
print('Took {} seconds'.format(time.time() - starttime))
#print(fit.stansummary()) # comment this out, it takes forever when n_predictions is large
Took 23.4742693901062 seconds
Inference for Stan model: anon_model_c505dc25772a95ae8e4471abf9ed01b1.
4 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=20000.

                mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
beta          9.7e-3  1.8e-6 1.5e-4 9.4e-3 9.6e-3 9.7e-3 9.8e-3   0.01   7357    1.0
y_hat[1]      103.81    0.73 102.67   2.56  30.55  72.48 145.11 376.04  19805    1.0
(output truncated)

use arviz data structure for conveniently formatted data, especially when using multiple chains!

az_data = az.from_pystan(
    posterior=fit,
    observed_data=["y"],
    posterior_predictive=["y_hat"],
    log_likelihood={"y": "log_lik"}
)

get the mean beta value and also the value for beta using the optimise method:

op = sm.optimizing(data=expo_data)
beta_fit_opt = op['beta']
beta_fit_mean = np.mean(np.concatenate(az_data.posterior['beta'].values)) # combine all chains
print("Best fit from optimize: {:.6f}, mean of betas: {:.6f}".format(beta_fit_opt, beta_fit_mean))
Best fit from optimize: 0.009713, mean of betas: 0.009714
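
As a sanity check: for an exponential likelihood the maximum likelihood estimate of the rate is just one over the sample mean, and since the prior here is essentially flat, optimizing should reproduce that value (up to optimiser tolerance):

print("closed-form 1/mean(y): {:.6f}".format(1.0 / np.mean(data_i)))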

the arviz.plot_trace shows this nicely

az.plot_trace(fit, var_names=('beta'))

to view the output and the predicted posteriors, I use the arviz plot_ppc. I wanted the bins shown as well, so I plotted it together with a histogram of the data; it also required a y-axis. I couldn’t find any options for these (is this the best way?)

az.style.use("arviz-darkgrid")

plt.rcParams["figure.figsize"] = [10,4]
ax = az.plot_ppc(az_data, kind='kde', data_pairs={"y":"y_hat"}, alpha=0.9)

ax[0].hist(az_data.observed_data["y"].values,
           bins=bins,
           density=True,
           color='k', lw=2, alpha=0.5, 
           width=1,
           zorder=100000)

ax[0].set_xlabel('y')
ax[0].set_xlim([0,20])

ax[0].set_ylabel('Density')
ax[0].set_yticks(np.arange(0,0.25,0.002))
ax[0].set_xlim([0,400])
ax[0].set_ylim([0,0.014])
ax[0].legend(loc='upper left', frameon=False, fontsize='small')
plt.show()

The posterior predictive ‘mean’ line has an artifact at x=0. Is there any way to fix this? Also, is there any way to edit the legend labels? (I looked into passing my own ‘labels’ as some form of kwarg, but couldn’t see one to use.)

Aside number 1 - Binned data

If I want to use the original binned data (data_trunc) in my Stan model, instead of the per-event exponential model above, I modify my model code (with the bin positions x and the counts y added to the data block):

   for(i in 1:n_data) 
     target += y[i] * exponential_lpdf(x[i] | beta);

and then fit with:

 expo_data = {'n_data': len(data_trunc), 
              'y': data_trunc,
              'x': bins,
              'n_predictions': 1000}

But I couldn’t see any way of getting the arviz plot_ppc to work with binned data. Is there any way?
I could replace the ‘observed_data’ group in the arviz inference data with the sampled data generated from the binned data (but then I really may as well use that to begin with):

observed_histogram = az_data.observed_data['y']
az_data_old = az.convert_to_inference_data(observed_histogram,group='observed_histogram')
az_data = az_data + az_data_old
del az_data.observed_data
az_data_new = az.convert_to_inference_data({"y":np.array(data_i)},group='observed_data')
az_data = az_data + az_data_new

Post prediction checks

I wanted to test the fitting. I checked the log likelihood,

betas = np.concatenate(az_data.posterior['beta'].values)
log_lik = np.concatenate(az_data.log_likelihood['y'].values)
lh = [i.sum() for i in log_lik]

beta_fit_lh_lh = np.amax(lh)
beta_fit_lh = betas[np.argmax(lh)]

print ("beta_fit_beta: {:.5f}, LH: {:.5f} (id: {:.0f})".format(beta_fit_lh, beta_fit_lh_lh, np.argmax(lh)))

plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()
plt.scatter(betas, lh, s=1)

plt.xlabel(r"Exponential factor, $\beta$")
plt.ylabel(r"Log Likelihood")
plt.show()
beta_fit_beta: 0.00971, LH: -23494.87768 (id: 3827)

I see the maximum is at the same value returned by the optimise method (and the mean beta). Is this the mechanism behind ‘optimise’?

I then compute the log likelihood of the data over the bins (called nLH below), as something else to check against. Here I’m comparing binned models, so I’m computing

nLH += log(model_i) * data_i

So I first pull all the chains together, create histograms (with the same binning as my data), normalise and then compute the nLH against my binned, normalised data:

betas = np.concatenate(az_data.posterior['beta'].values) # get beta parameter to plot against, all chains combined
y_hats = np.concatenate(az_data.posterior_predictive['y_hat'].values) # get y_hat data, all chains combined

lh_fit = np.zeros_like(betas) # create empty list
lh_pred = np.zeros_like(betas) # create empty list

for j in range(len(betas)):
    
  # log likelihood of data (nLH)
  f_opt = lambda x: betas[j]*np.exp(-x*betas[j])
  f_opt_n = [f_opt(x) for x in bins]
  lh_fit[j] = np.array([x*np.log(f) for x, f in zip(data_trunc_n, f_opt_n) if f>0]).sum()

  # log likelihood of predicted posteriors
  bincontents, npbins = np.histogram(y_hats[j], bins=bins, density=True)
  lh_pred[j] = np.array([x*np.log(f) for x, f in zip(bincontents, f_opt_n) if f>0]).sum()

# find maximum of nLH
beta_fit_lh_lh_fit = np.amax(lh_fit)
idx_fit = np.argmax(lh_fit)
beta_fit_lh_fit = betas[idx_fit]

# use mean value (i.e. comparable to the mean of the betas?)
idx_pred = (np.abs(lh_pred - lh_pred.mean())).argmin() #find mean then the closest value to it in array
beta_fit_lh_pred = betas[idx_pred]
beta_fit_lh_lh_pred = lh_pred[idx_pred]

print ("beta_fit_lh_fit: {:.5f}, LH: {:.5f} (id: {:.0f})".format(beta_fit_lh_fit, beta_fit_lh_lh_fit, idx_fit))
print ("beta_fit_lh_pred: {:.5f}, LH: {:.5f} (id: {:.0f})".format(beta_fit_lh_pred, beta_fit_lh_lh_pred, idx_pred))
beta_fit_lh_fit: 0.00971, LH: -5.63426 (id: 15887)
beta_fit_lh_pred: 0.00981, LH: -5.58654 (id: 14367)

this also returns a value for beta which is very close to the previous ones found (with optimise, mean, log_lik)

I look at the nLH of all the predicted posteriors. I couldn’t see how to use the arviz kde plot (or any other, such as pair plot) with specified contours. I want contours which show sigma levels. I found the corner package does this nicely:

az.style.use("default")
postsamples = np.vstack((betas, lh_pred)).T
levels = [0, 1-np.exp(-0.5),1-np.exp(-2)] # 1sigma and 2sigma contours
fig = corner.corner(postsamples,
                    quantiles=(0.05, 0.16, 0.84, 0.95), 
                    levels=levels,
                    plot_datapoints=False, 
                    plot_density=False,
                    fill_contours=True,
                    contour_kwargs={"colors":['red','blue']},
                    contourf_kwargs={"colors":['w','red','blue'], "alpha":0.4},
                    truths=[beta_fit_lh_pred, beta_fit_lh_lh_pred],
                    truth_color='grey',
                    show_titles=True, title_fmt=".5f",
                   )

The nLHs of all the predicted data sets look consistent; this must be a good thing?!

I’ve never used an HDI before, but I found it in the arviz package. So I compute it too:

hdi_lower, hdi_upper = az.hdi(az_data, hdi_prob=0.95, var_names=['beta'])['beta'].values
beta_fit_hdi = hdi_lower + (hdi_upper-hdi_lower)/2
print("HDI lower: {:.5f}, HDI upper: {:.5f}, width: {:.5f}, mid-point: {:.5f}".format(hdi_lower, hdi_upper, hdi_upper-hdi_lower, beta_fit_hdi))
HDI lower: 0.00941, HDI upper: 0.01000, width: 0.00059, mid-point: 0.00971

it also returns a value for beta which is consistent with the others.

I attempt to use the arviz plot_hdi. I first histogram all the predictive datasets and then plot the binned data. (Is there any other way to do this?)
It works, but the band looks quite large; I guess it does contain 95% of the values. Is this how it should be interpreted?

betas = np.concatenate(az_data.posterior['beta'].values) # get beta parameter to plot against
y_hats = np.concatenate(az_data.posterior_predictive['y_hat'].values) # get y_hat data
bincontents_list = []

for j in range(len(betas)):
  bincontents, npbins = np.histogram(y_hats[j], bins=range(len(bins)+1), density=True)
  bincontents_list.append(bincontents)

plt.rcParams["figure.figsize"] = [10,4]
fig, ax = plt.subplots()

ax = az.plot_hdi(bins, bincontents_list, hdi_prob=0.95, plot_kwargs={"ls": "--"})

plt.hist(data_i, width=1, bins=range(len(bins)+1), alpha=0.7, density=True)

plt.plot(bins, beta_fit_hdi*np.exp(-bins*beta_fit_hdi), color='red', label='hdi_midpoint ({:.5f})'.format(beta_fit_hdi))
plt.plot(bins, hdi_lower*np.exp(-bins*hdi_lower), color='yellow', label='hdi_lower ({:.5f})'.format(hdi_lower))
plt.plot(bins, hdi_upper*np.exp(-bins*hdi_upper), color='yellow', label='hdi_upper ({:.5f})'.format(hdi_upper))

plt.xlabel('$x$')
plt.ylabel('$y$')
plt.xlim([0, len(bins)])
plt.ylim([1e-4, max(data_trunc_n)*2])
ax.set_yscale('log')
plt.legend(loc="upper right", fontsize=9)
plt.show()

I could equally draw all the other lines of best fit for beta, but since they are so close, they would all fall on top of each other on this plot. I like this plot, as long as I’m creating it correctly… I’ll have to check it over.

next

  • I’ll compare the two models, exponential and neg_binomial_2_lpmf.
  • fit the two parameters of neg_binomial_2_lpmf
  • I’d like to see how loo works. I’ll attempt this (a minimal sketch is below).
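
A minimal sketch of the loo step, assuming the az_data object built above (az.loo uses the log_likelihood group that az.from_pystan filled from log_lik):

loo_result = az.loo(az_data)
print(loo_result)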

Many thanks if anybody read this all the way through.
I quite like Stan and ArviZ; there are some really nice tools here. In this post I’ve learnt them and tested them against the tools I know (e.g. the nLH), which might not be a standard way of doing things (is this mixing in frequentism?). Please point out any mistakes!

edit: changed log-likelihood ratio to just log-likelihood (not sure ratio was the correct thing to do).