Estimating the binomial rate when number of trials is uncertain


This is a simpler issue, related to the one I posted on June 2.
Suppose we have N = 100 trials and M = 40 known successes, and we wish to estimate the binomial rate parameter theta for the probability of success. But suppose N itself is uncertain (estimated): we have information that the true value of N (call it NT) is less than N, and that the true number of capable individuals is a proportion of N, where the parameter governing that proportion is distributed as a Beta random variable. We wish to estimate theta while accounting for the additional uncertainty about N. I have easily programmed this in a Fortran program using Metropolis-within-Gibbs, but I am unable to do it in Stan/RStan. The issue seems to be one of trying to sample a parameter declared as data or transformed data. Below are the .stan and .R files. Any help will be much appreciated:
//Simple binomial inference of probability of success theta given known
//number of trials and successes.
// NG started June 7 2018
//This will not work.

data {
  int<lower=0> N; //number of trials
  int<lower=0> M; //number of successes
  real<lower=0> B_Alpha;
  real<lower=0> B_Beta;
  real<lower=0> NT_Alpha;
  real<lower=0> NT_Beta;
}
transformed data {
  real<lower=0> NT;
}
parameters {
  real<lower=0, upper=1> theta; //probability of success
  real<lower=0, upper=1> z;
}
model {
  // priors
  theta ~ beta(B_Alpha, B_Beta);
  z ~ beta(NT_Alpha, NT_Beta);
  NT = floor(N*z);
  // likelihood:
  M ~ binomial(NT, theta);
}

#NG started June 7 2018
library(rstan)

N <- 100  # number of trials
M <- 40   # number of successes
B_Alpha <- 50   # beta alpha (number of prior successes) parameter for prior on success param theta
B_Beta <- 50    # beta beta (prior failures) parameter for prior on theta
NT_Alpha <- 100 # beta alpha param for distribution of true number of trials, z (NT = N*z)
NT_Beta <- 50   # beta beta param for z

stanDat <- list(N = N, M = M, B_Alpha = B_Alpha, B_Beta = B_Beta,
                NT_Alpha = NT_Alpha, NT_Beta = NT_Beta)

params <- c("theta", "NT")

#sample posterior:
Binomtst2fit <- stan(file = "Binomtst2.stan", data = stanDat, pars = params,
                     iter = 1000, chains = 4)



That is a contradiction in terms. The things in data and transformed data are things you condition on. The things in parameters are drawn from their joint distribution conditional on the things in data and transformed data. If you do not know N with certainty, then you need to marginalize it out of the posterior distribution.


I disagree. I have programmed models like this easily in a Fortran program running Metropolis-within-Gibbs. If there is information (such as genetic stock assignment data for salmon populations, for example), the uncertainty in the data concerning the number of trials can be delimited with either an uninformative or informative prior. We need the sampler to grab a random value from the prior on the number of trials and values for the parameters (the probability of success in the example I asked about), and (in the case of the simple binomial) sample a value of the likelihood given the fixed data for the number of successes (M). My guess is that my problem here with Stan is the need to estimate a random value for a parameter that is restricted to integers (the number of trials).


You can try to draw from posterior distributions with discrete unknowns using MCMC algorithms such as Metropolis-within-Gibbs, but you are still not conditioning on the number of trials if you are drawing from its posterior distribution. So, if you were going to try to do that in Stan, you would declare the unknown number of trials in the parameters block rather than the data or transformed data blocks.

Except Stan will throw an error if you try to do that, because the Hamiltonian Monte Carlo algorithms require that the posterior log-kernel be differentiable with respect to the unknowns, which is not the case for unknowns from a discrete parameter space. So, in order to proceed in Stan, you have to marginalize them out of the posterior distribution. When you marginalize them out, you typically get much better sampling performance than when you leave them in and use an MCMC algorithm that is not based on Hamiltonian dynamics. And the performance of other MCMC algorithms for posterior distributions with discrete unknowns is often not good enough to make reliable inferences in any finite number of MCMC iterations.
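For the model in the original post, the marginalization is just a finite sum over the possible values of NT. Here is a rough sketch of the marginal likelihood in Python/SciPy (not Stan), using the numbers from the first post; the function name and the epsilon guard are mine:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

N, M = 100, 40      # nominal number of trials, observed successes
a, b = 100.0, 50.0  # Beta prior on z, where NT = floor(N * z)

def log_marginal_lik(theta):
    """log p(M | theta) with the discrete NT summed out of the joint."""
    ks = np.arange(M, N + 1)  # NT must be at least M
    # P(NT = k) = P(k/N <= z < (k+1)/N) under z ~ Beta(a, b)
    log_p_k = np.log(stats.beta.cdf((ks + 1) / N, a, b)
                     - stats.beta.cdf(ks / N, a, b) + 1e-300)
    # p(M | NT = k, theta) for each candidate k
    log_binom = stats.binom.logpmf(M, ks, theta)
    # sum over k on the log scale
    return logsumexp(log_p_k + log_binom)
```

In Stan the same sum would be accumulated with log_sum_exp over log_p_k + binomial_lpmf(M | k, theta) in the model block, with theta remaining a continuous parameter.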


If you’re sampling N from some auxiliary distribution then you’re not even sampling from the full generative posterior, but some approximation thereof in a vein similar to cut from BUGS. In any case, Ben is right – you’re not conditioning on N if it’s sampled in any way.


Thank you both for the feedback. For an example of the approach I describe in a peer-reviewed publication, see Gayeski et al. Canadian Journal of Fisheries and Aquatic Sciences, volume 68, pp 498-510, 2011. In any event, uncertainty about data is a genuine issue, especially but not exclusively in ecology, and certainly lends itself to Bayesian methods.
But I am appreciating the challenges of doing this in Stan.

Here’s a simple but real example. I’d be interested in your thoughts on how it might be addressed in Stan. Consider fish migrating upstream in a river. We tag 1000 at site 1 (M) and take non-lethal tissue samples for later DNA analysis that will enable assigning the sampled individuals to natal populations (their destinations). At an upstream site 2 we perfectly detect (with probability 1) 500 of the tagged fish (n). But we know or suspect that some of the 1000 tagged fish are bound for tributary rivers between site 1 and site 2 and hence are in principle incapable of being detected at site 2 even if they survive tagging. Our main interest is in estimating the survival rate (s) of fish tagged at site 1 and bound for site 2 (and sites further upstream of site 2). Since some of the tagged fish cannot be detected at site 2 because they are not migrating there, we will under-estimate the survival rate from site 1 to site 2 if we use n ~ binomial(M, s). We have funds to analyze 300 of the tissue samples, and the results are 200 bound for site 2 and above, and 100 bound for destinations between sites 1 and 2. A reasonable approach is to employ a parameter p for the proportion of M that are bound for site 2 and above, to place a Beta(200, 100) prior on p, and then to sample from this distribution and multiply each such p by M, using the integer value of M*p in place of M in the binomial. Again, this is readily done using MCMC with Metropolis-within-Gibbs sampling.
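To make that concrete, here is a quick Python sketch of the procedure I have in mind (draws of p from the Beta(200, 100) prior, integer trial counts, and a flat grid over s; the numbers are the ones from the example above, and the grid/Monte Carlo setup is just for illustration):

```python
import numpy as np
from scipy import stats

M, n = 1000, 500       # tagged at site 1, detected at site 2
ap, bp = 200.0, 100.0  # Beta(200, 100) prior on p from the 300 DNA samples

# Monte Carlo over p: draw p from its prior, form the integer trial
# count, and average the binomial likelihood for each candidate s.
rng = np.random.default_rng(0)
p = rng.beta(ap, bp, size=20000)
NT = np.floor(M * p).astype(int)  # uncertain "true" number of trials

s_grid = np.linspace(0.01, 0.99, 197)
lik = np.array([stats.binom.pmf(n, NT, s).mean() for s in s_grid])
post = lik / lik.sum()            # flat prior on s -> normalized posterior
s_mean = (s_grid * post).sum()    # posterior mean of the survival rate
```

With these numbers the posterior for s centers near 500/667, roughly 0.75, rather than the naive 500/1000 = 0.5 you would get from n ~ binomial(M, s).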


I don’t know anything about this particular paper, but I wouldn’t put too much faith in peer review. The applied statistics literature is a mess.

I was having this very discussion with some fisheries people at U. Washington at the last ISEC in Seattle. They wanted to treat the tuna catch as exact and impose some hideous hack on the likelihood to avoid predicting negative numbers of tuna when a simple measurement error model would’ve solved the problem without a huge zero-avoiding hack.

This is a general maneuver in statistics; the general term is "measurement error". There are two chapters in the Stan manual on how to code these models.

If the variable is continuous, this is easy. If it’s finite and discrete, it can be marginalized (see the chapter on latent discrete parameters).

If it’s count data and the range is small, you can marginalize in Stan to within arithmetic precision. If it’s count data and it’s large (e.g., Poisson(1e6)), the marginalization becomes intractable. This is the one case where it’d be nice to have discrete sampling, because the discrete sampling won’t necessarily be intractable, just mix very slowly as discrete parameters tend to do in sampling (also explained in the chapter on latent discrete parameters with simple Monte Carlo [not even MCMC] examples). Often, it’s possible to use some kind of continuous approximation. For example, rather than treating the true measurement as an integer with a Poisson distribution, treat it as continuous with a gamma distribution, e.g.,

y ~ gamma(a, b);  // true value
y_meas ~ poisson(y);  // measured value

The only problem is that now the true value is continuous, so if you need it to be discrete (e.g., you need something other than expectations), this won’t work well.


Thanks for the thoughtful, helpful feedback. The peer-reviewed paper is one I am the lead author on, but I appreciate the point about peer review. In general, Bayesian analysis aims to address uncertainty in the best way possible, so uncertainty about data should be included in this scope. The mechanics of the implementation in software, and the ensuing accuracy or inaccuracy of the algorithms, are of course an important matter.


It’s in scope in the sense of being something we’re always considering, but measurement error isn’t always an explicit part of the models. More often it just gets rolled into a general noise term. We have to make tradeoffs and tend to only build our models out to the point our data sets support. While Bayes in theory should be able to support estimating parameters we know little about and then let us integrate out the uncertainty, in practice that’s often very challenging computationally or very sensitive to priors when we have very little prior information.


I took a quick look at the paper. I couldn’t isolate the model you were fitting in the paper. I often have that problem with applied stats papers. I see something like a likelihood, but don’t see any priors or any indication of how the constraint that population remain positive is managed to ensure that the distributions are proper.

Importance sampling using the prior as a proposal distribution only stands a chance in very simple problems where the posterior is consistent with and not very concentrated relative to the prior.
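For example, in a toy conjugate problem (made-up numbers, Python) where the posterior can be checked exactly, prior-as-proposal importance sampling does fine, precisely because the importance weight posterior/proposal reduces to the likelihood when the proposal is the prior:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 10, 7    # toy data: 7 successes in 10 trials
a, b = 2.0, 2.0 # Beta(2, 2) prior on theta

theta = rng.beta(a, b, size=100000)  # proposal = prior
w = stats.binom.pmf(k, n, theta)     # weight = likelihood (prior cancels)
post_mean = np.average(theta, weights=w)

exact = (a + k) / (a + b + n)        # conjugate Beta(9, 5) posterior mean
```

Here the weighted mean lands on the exact Beta(9, 5) posterior mean of 9/14. Make the posterior much more concentrated than the prior and the weights degenerate, which is the failure mode I mean.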

I didn’t follow the reasoning in

By sampling 10000000 random combinations of N, R, U, and M, we were assured of adequately sampling the joint parameter space.

Was there simulated data calibration? Raw randomization and iterations don’t mean much by themselves. For example, HMC can be more than thousands of times more effective per iteration than Metropolis, so measuring number of iterations in Metropolis doesn’t measure much.

SWL samples the prior distributions by direct simulation with calls to random number generators and then weights each sampled set of values of parameters by their likelihood, cumulating histograms and posterior summaries of the sampled parameter values weighted accordingly.

Shouldn’t the reweighting be by the posterior density (up to a proportion), not just the likelihood?



A follow-up on trying to model the simple binomial example I gave, with known recaptures n (400), known total trials (number marked) N (1000, but with uncertainty as to how many of N are bound for the detection site), and an unknown survival rate parameter (with a uniform or broad beta prior).
Here are the .stan and .R files. The data is hard-coded in the .R file. This compiles but produces the error that NT has value -2147483648 but is constrained to be >= 1. I don't understand why.

#NG June 23 2018
library(rstan)

N <- 1000.0  # number of trials (to be declared as a real)
M <- 400     # number of successes
T_Alpha <- 5 # alpha param of beta prior on success rate theta
T_Beta <- 5  # beta param of beta prior on theta
Z_Alpha <- 200 # alpha param of beta prior on z, the proportion of N bound for the detection site
Z_Beta <- 100  # beta param of beta prior on z

stanDat <- list(N = N, M = M, T_Alpha = T_Alpha, T_Beta = T_Beta,
                Z_Alpha = Z_Alpha, Z_Beta = Z_Beta)

params <- c("z", "R", "theta")

#inits <- list(theta = 0.5)
#sample posterior:
Binomtstfit <- stan(file = "Binomtst.stan", data = stanDat, pars = params,
                    iter = 1000, chains = 1)


//Simple binomial testing adding variable N using a
// NG June 23.

data {
  real<lower=0> N; //number of trials
  int<lower=0> M; //number of successes
  real<lower=0> T_Alpha;
  real<lower=0> T_Beta;
  real<lower=0> Z_Alpha;
  real<lower=0> Z_Beta;
}
transformed data {
  int<lower=1> NT;
}
parameters {
  real<lower=0, upper=1> z;
  real<lower=0, upper=1> theta; //probability of success
}
transformed parameters {
  real<lower=0> R;
  R = N*z;
}
model {
  // priors
  theta ~ beta(T_Alpha, T_Beta);
  z ~ beta(Z_Alpha, Z_Beta);
  // likelihood:
}

I also tried a brute-force variation. I created a variant of the .stan program where I first created the uncertain trial data in Matlab using N = 1000, z = betarnd(200,100,10000,1), K = N*z, then NT = poissrnd(K), just to ensure that the NT are integers. This created 10000 integers between approximately 500 and 800. Then in the .stan file I sampled the binomial likelihood using for (i in 1:10000) n ~ binomial(NT[i], theta).
This ran, but the posterior on theta had mean 0.6 with a range between 0.5995 and 0.6005, so it simply did not sample multiple values of NT. I went so far as to run it with a warmup of 50000 and 1,000,000 iterations.
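Looking at this again, I suspect the problem is that the loop adds all 10000 log-likelihood terms to the target, i.e., it multiplies the 10000 likelihoods (as if n had been observed once for each NT[i]) instead of averaging them, which would explain the extreme over-concentration. A quick Python sketch of the two computations (2000 hypothetical draws for speed, mimicking the Matlab step with floor(N*z) for simplicity):

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(2)
n, N = 500, 1000
z = rng.beta(200, 100, size=2000)
NT = np.floor(N * z).astype(int)  # integer trial counts, as in Matlab

theta = np.linspace(0.4, 0.95, 400)
# log p(n | NT_i, theta_j) for every draw i and grid point j
ll = stats.binom.logpmf(n, NT[:, None], theta[None, :])

log_product = ll.sum(axis=0)         # what the loop computes (product)
log_mixture = logsumexp(ll, axis=0)  # correct marginalization (average)

def grid_sd(logp):
    """Posterior sd of theta from an unnormalized log density on the grid."""
    p = np.exp(logp - logp.max())
    p /= p.sum()
    m = (theta * p).sum()
    return np.sqrt(((theta - m) ** 2 * p).sum())
```

The product posterior collapses to essentially a point, while the correctly marginalized mixture keeps the spread contributed by the uncertainty in NT; in Stan the mixture corresponds to a log_sum_exp over the draws in the model block rather than a plain loop of sampling statements.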



Thanks for looking at the paper. I will get back to you soon. I was pretty sure that I included the likelihood in the paper, but I’ll check. I did mis-speak about having used MCMC with Gibbs for that paper. Since it was a low-dimension probability problem, I used a program, “Sampling Weighted Likelihood”, briefly characterized in the paper. This program draws samples from the prior probability distributions and uses them directly to calculate the likelihood (thus not having to sample the posterior using MCMC), and bins the parameter values, scaling them to the sum of the bins at the end of the run. Broad, quasi-informative priors were placed on the parameters (described in the paper). But I will look at it again soon and reply in more detail.


Because the transformed data block of your Stan program does not define NT.


I may have just missed it. I was looking for an encapsulated definition rather than something I needed to piece together from the narrative.