Jeffreys priors for gamma distribution (unknown shape and scale)

Hello all,
I am doing Bayesian analysis on a gamma distribution, assuming the shape and scale parameters are unknown. The Jeffreys prior for this scenario is:

p (shape , scale ) = sqrt ( shape * trigamma ( shape ) - 1 ) / scale

My questions are:

  1. How do I encode this prior specificationin Stan? Is what I have below correct?
    target += 0.5 * log (shape * trigamma ( shape ) - 1 ) - log ( scale );

This code runs in Stan. But if priors for shape and scale are jointly distributed, don’t I need another distribution to define either the shape or scale distribution?

  1. To perform prior predictive check, how do I generate random numbers for this joint distribution prior without a separate expression defining either shape or scale distribution?

Your recommendations are highly appreciated.


1 Like

Hey Richie_m

I think you are on the right track. By default, Stan will adopt certain prior if you have not defined any.

presenting the code would be more insightful and you will get more feedback.

As suggested by GJGonzales, my sample code is given down below.

If I don’t supply a prior, it will default to uniform(0,positive_infinity). To make it explicit, I supplied shape ~ uniform(0,+infinity) and scale ~ uniform(0,+infinity) but it should have the same effect as commenting these out. Side question: encoding positive_infinity() results in an error, so I just used a very large number, say 5000. What is the correct syntax for specifying positive infinity?

So for these uniform priors, the generated quantities for prior predictive checks used uniform_rng for each of shape_prior and scale_prior.

If I used the Jeffreys prior, which is a joint function of shape and scale, how do I generate shape_prior and scale_prior?

==== R code =====
BayesGamma ← "data {
int N; // sample size of observations
real<lower = 0> Y[N]; // observed data
int N_Prior_Sim; // sample size for prior predictive check

parameters {
real<lower = 0> shape; // shape parameter of gamma distribution
real<lower = 0> scale; // scale parameter of gamma distribution

model {
// priors
// if uniform
shape ~ uniform ( 0,5000 ) ;
scale ~ uniform ( 0,5000 ) ;

// if Jeffreys
//target + =  0.5*log ( shape*trigamma ( shape ) - 1 ) - log ( scale ) ; 

// likelihood
target + = gamma_lpdf ( Y | shape, 1/scale ) ; // gamma_lpdf parameterized on shape and rate

generated quantities {
// for prior predictive check
vector [N_Prior_Sim] Y_Prior_Sim;

// if uniform priors
real shape_prior = fabs ( uniform_rng ( 0,5000 ) ) ; // fabs to keep prior positive
real scale_prior = fabs ( uniform_rng ( 0,5000 ) ) ; // fabs to keep prior positive

// if Jeffreys priors, how should shape_priors and scale_priors be formulated?

for ( i in 1:N_Prior_Sim ) Y_Prior_Sim[i] = gamma_rng ( shape_prior , 1/scale_prior ) ; //gamma_rng parameterized on shape and rate

model ← stan_model ( model_code = BayesGamma )

shape.true = 1; scale.true = 1
N = 20 # observed sample size
N_Prior_Sim ← N # sample size for prior predictive check
set.seed ( 1234 )
Y ← rgamma ( N,shape = shape.true,scale = scale.true ) # simulate data

compile into a data list

stan_data ← list ( Y = Y, N = N, N_Prior_Sim = N_Prior_Sim )

run Stan model

fit ← sampling ( model, data = stan_data, warmup = 500, iter = 2000, chains = 4, cores = 2, thin = 1 )

Extract posterior samples

post ← extract ( fit )

This effectively is the specification of the (joint) Jeffrey prior for the scale and the shape. There is no need to specify another separate prior for shape or scale.

I would however consider to specify your gamma distribution in terms of its mean and shape, as scale and shape are probably highly correlated parameters. Note that the mean of a gamma distribution is shape times scale. So given the specification of a mean and shape parameters in the model block [EDIT: Correction, this is the parameters block], the Jeffrey prior is:

target += 0.5 * log (shape * trigamma ( shape ) - 1 ) - log ( mean / shape);


target += 0.5 * log (shape * trigamma ( shape ) - 1 ) - log ( mean) + log(shape);


Thanks LucC for chiming in. Yes indeed, specifying the joint Jeffreys prior suffices, my code runs and outputs reasonable posteriors. My question though is how to generate random numbers for prior predictive checks. I don’t know how to do that with only the joint probability from the Jeffreys prior.

As for shape and scale parameters being highly correlated, that is indeed the case. So even if I specify priors independently, say with the respective uniform(0,+Inf), for each of the shape and scale, the posteriors show high correlation between the two parameters. In your suggestion, re-parameterizing from shape-scale to shape-mean, would that solve that correlation? The mean is derived as a function of shape and scale, so the effect of scale is still reflected in the mean.

I do plan to use another prior specification where shape and scale parameters are correlated, so I will compare the impact of that with using independent uniform priors and the joint Jeffreys prior.

Just remove the likelihood statement from your model, and then you will sample from the Jeffrey’s prior! If you want to be fancy, you add an if statement around your likelihood statement, and provide an additional integer data input “LL” (or named similar) that is allowed to be 0 or 1, and let that determine whether you calculate the likelihood or not.

The join posterior distribution of shape and scale of a gamma distribution should always be expected to show a lot of correlation. This is because infinitely multiple combinations of of shape and scale produce the same mean (because of the multiplicativeness). This has little to do with the priors that you specify for shape and scale. My suggestion is to reparameterise the model in terms of mean and shape, so to change the parameters that you estimate. Mean determines the location of the gamma distribution, and shape determines the variance relative to the mean. Therefore, mean and shape usually aren’t very correlated in the posterior (regardless of prior definition).

1 Like

Great insights and suggestions LucC!

Re: Prior Predictive Checks – I had read somewhere in McElreath’s Statistical Rethinking book that a way to do that is to run model without the data (i.e., by running an empty data set) but somehow I got an error message when I did that. I had also read in this Stan forum of commenting out the likelihood as you suggested. But I was doing it on the implicit default prior of uniform(-Inf,+inf), so without the likelihood (and the data), that did not work either. If I choose a finite yet large number for the prior, say uniform(0,5000), with the likelihood commented, I do get a posterior corresponding to the specified uniform prior. Applying that to the use of Jeffreys prior, again with likelihood commented, I get similar posteriors as when I used a uniform prior. So this observation validates that Jeffreys prior is indeed non-informative. I learned two things: 1) how to specify a joint prior such as Jeffreys; 2) do predictive check by omitting likelihoo.

Re: shape-mean vs shape-scale parameterization – I was able to demonstrate the non-correlated posteriors using the former parameterization, as you suggested. I will adopt that parameterization for my study.

Many thanks!!

1 Like

P.s.: note that a uniform prior does not by default mean that the prior is uninformative. A rare case where it truly is uninformative is the U(0,1) distribution for a proportion (and even there some may disagree XD). For unconstrained and positively constrained parameters, I would argue that a uniform prior is rarely uninformative. Look around the forum for discussions about what is and isn’t informative. Maybe here is a nice place to start: A Bayesian Workflow Question: prior sensitivity with respect to observed data for complex models

Hi @LucC,
If I can ask a follow-up question to you (and others who may also have some recommendations).

So I was able to proceed with specifying a joint prior density, actually using Reference prior as below, so slightly different from Jeffreys, but performance-wise, I get comparable posteriors between these two variations of weakly informative priors.

target += 0.5 * log ( ( shape * trigamma ( shape ) - 1 ) / shape ) - log ( mu / shape ) ;

This Reference prior works smoothly, over many scenarios I am evaluating, each with 2,000 iterations. No glitch whatsoever.

I am comparing the Reference prior to another prior that was designed to give more weight on the observed data, it is referred to as MDIP.

target += log ( mu / shape ) - log ( tgamma ( shape ) ) + ( shape - 1 ) * digamma ( shape ) / tgamma ( shape ) - shape ;

This MDIP prior is bit more complex than Reference. Using the MDIP, it also works, churning out results over many iterations, but then will just hang and be stucked so I have to stop and manually re-run the code skipping that data iteration. It will work again, for a number of iterations, until it gets hung up again.

This manual re-starting does not lend to running simulation that I can leave overnight. I thought it was a function of the particular data simulated at that iteration causing the hanging. But there’s nothing particularly extreme in the dataset at that iteration, just typical of the simulated gamma distribution parameters. Immediately re-running that data iteration does not work. But curiously, re-running that data iteration much, much later after the laptop has been turned off and “rested”, say the next morning, that data iteration will now work,

I tried using adapt_delta=0.99, but that did not solve the hanging. Any ideas how this issue can be overcome?


Hi Richie_M. I’m not familiar with the MDIP prior you are using. It could be that it’s an improper prior that doesn’t work well in the context of your model? If you’re not sure whether it’s an improper prior, you could do a 2-D integration of the probability density function over the parameter space and check that the integral is finite. I would also create and inspect plot of the pdf for each of the parameters, conditional on a range of values for the other parameter, and see how that compares to a similar plot for the Jeffrey’s prior. Are the tails thicker?

Side question: what are these repeated analyses for? Are you doing simulation based calibration?

Thanks LucC for the response.

This MDIP comes from this paper:

Moala, F. A.; Ramos, P. L.; Achcar, J. A., Bayesian Inference for Two-Parameter Gamma Distribution Assuming Different Noninformative Priors. Revista Colombiana de Estadistica 2013, 36 (2), 319-336.

According to this paper, the prior I wrote out in previous post yields proper posteriority.

As for what I am doing in my simulation, I am studying left-censoring of a gamma distribution. At increasing extent of censoring, I am evaluating how well frequentist and Bayesian methods could estimate the distribution parameters. So for each scenario on extent of censoring, I have to do many simulated data iterations (shooting for 1,000 iterations but I have to work out this hanging issue first) in order to evaluate mean, bias, MSE, etc of the estimates. If that’s what you mean as “simulation-based calibration”, then yes, that’s my goal.

The hanging issue may be due to: 1) the complex prior formula. 2) the censoring - If I simulate scenario with low extent of censoring, there’s no hanging issue. At increasing extent of censoring, that’s when hanging becomes more prevalent. So probably it is asking of HMC algorithm too much with highly censored data.

I would recommend that you recheck the equations in the paper. I’m assuming you’re trying to implement equation 17, which is supposed to yield a proper posterior? That equation uses the gamma function in several places (denoted capital letter Gamma), but you are using a “tgamma” in those places (which is the trigamma function, I guess?). Also, check the Stan documentation for numerically stable implementations of logarithms of functions. For instance, log(gamma(…)) has a stable implementation. Not sure about other functions, but it worth checking so can rule out numerical issues.

1 Like

Yes, I am using the modified MDIP (Equation 17). Stan has its own version/syntax of implementing some functions. Per the Stan reference manual, tgamma(real x) returns the gamma function applied to x. There is a separate function for trigamma as used in the Jeffreys and digamma as used in the MDIP. Using my code, I am able to get comparable shape and scale estimates from the paper, so I think I have the right code.

From what I experienced in this simulation, the instability of the Stan HMC leading to hanging is probably due to combination of that complicated MDIP function, having small sample size, having large extent of censoring, and also the shape and scale parameters (i.e., a shape=1, scale=1 scenario is less problematic as shape=2, scale=3).

Again much thanks for your insights and taking time to look at the paper and offering helpful coding tips.

Ok, good that you checked. Just consider this then: replace “log ( tgamma ( shape ) )” with “lgamma(shape)”. This provides the same result, but is numerically more stable.