Count data modeling with discrete parameters

schen5 · July 7, 2017, 10:36pm

I am a new user of Stan, and am fitting a count data, which I do not know what the distribution is. But:

Y = 2^(z+1) - alpha,
where
alpha ~ Poisson(lambda[z+1]) (corrected 07/10, thanks Bob)
and
z ~ poisson(theta)

Y is the count data (chromosome), z is a parameter represents the stage of this sample. Most of z should be 0,1,2,3… so most of Y should be 2,4,8,16… but occasionally we have 3,5,6,7,…15… and some other numbers. (when alpha !=0 so there is missing of chromosome)

I am defining a customized function Poipoi for distribution of alpha (to marginalize integer z). But since I do not know the distribution of Y ( I didn’t figure out how to do solve this analytically ), I am still having the issue of this integer parameter z.

Any suggestion how to model this kind of data? Thank you so much in advance!

Here is my code. It’s kind of messed up but you will know what I am trying to do. Thank you!!

functions {
      real Poipoi_lpmf(int alpha, int Z, vector[] lambda, real theta){
        vector[Z] lp;
        for(z in 1:Z)
          lp[z] = poisson_lpmf(z | theta) + poisson_lpmf(alpha | lambda[z]);
        return log_sum_exp(lp);
      }
    }

    data {
      int N;// total number of samples
      int Z; // total number of categories
      vector[N] Y; //number of  count
    }

    parameters {
      real<lower=0> theta;  //categorical distribution parameters (with length)
      vector<lower=0>[Z] lambda;
      int z[N];  // I know this is wrong
    }

    transformed parameters{
      vector<lower=0>[N] alpha;
      for (n in 1:N)
        alpha[n] = 2^(z[n]+1)-Y[n]
    }

    model {
      for(n in 1:N)
        alpha[n] ~ Poipoi(Z, lambda[z[n]+1], theta);
    }

Bob_Carpenter · July 10, 2017, 9:33pm

I’m afraid I don’t quite know what you’re trying to do. You probably want to rethink in terms of where the error’s coming in from sampling or measurement. I have no idea what alpha is—presumably some latent count? And then theta comes out of the blue in your math and has no explanation.

You can’t have integer parameters—they need to be marginalized out. If you marginalize it out, you don’t need to include it as a parameter. The manual has a chapter on latent discrete parameters with three fully worked examples.

It helps if you can write out the density with latent discrete parameters, then you have an explicit form to marginalize. For instance, if z was really an integer indicating which mixture component something is drawn from, then we might have something like this:

for (n in 1:N) {
  z[n] ~ categorical(theta);
  alpha[n] ~ poisson(lambda[z[n]]]);
}

schen5 · July 10, 2017, 10:27pm

Thank you Bob! And yes, I figured that my problem description was probably not clear…
The model itself is exactly as what you wrote on the code (despite that z is poisson in my model, this is because most of the case, z is zero, since Y (observed) is usually 2) – and that’s why I named it (distribution of alpha) “Poipoi” distribution.

The problem is that I do not know the distribution of Y. I was thinking to transfer Y to alpha, since alpha~Poipoi. But alpha = 2^(z+1) -Y, and z is the discrete parameter, and I am not sure how I should marginalize it (I mean this part of the expression)

Again, any advise would be highly appreciated !
Also, please let me know if I made the question clear, thank you!

Bob_Carpenter · July 10, 2017, 10:33pm

If z is truly Poisson, you can’t fit the model exactly in Stan and need to use something like the approximation you used (put a finite upper bound on z so that you can marginalize).

If you can write the density out, we can help you marginalize. So far, I still don’t understand what you’ve written down as a density at the front of your original post. There’s a Y that’s not used and I’m thinking maybe lambda(z) should be lambda[z]. What I’m looking for is:

p(Y, z, alpha, theta, lambda, ...) = ...

where you indicate the types of the variables and whether they’re data or parameters.

schen5 · July 10, 2017, 10:49pm

Thank you Bob!

Data description:

Y is the observed data (chromosome counts, so usually it is 2)
z (~Poisson) is the parameter, indicating the stats of that sample. if z = 0 (and alpha =0), then Y = 2 (Y = 2^0). But z can also be 1,2,or 3. If it is 3, it means there are a lot of duplicate of the chromosome in that sample. Its parameter is theta.
alpha (~Poisson) is the parameter, which indicate the amount of chromosome lost. Usually it is zero. Sometime you lose one chromosome, but it could be more than that.
lambda is the parameter for alpha, and since sample in each stages has their common lambda (lambda_1, lambda_2, lambda_3, lambda_4 for z = 0,1,2,3)

Further question

If z is truly Poisson, you can’t fit the model exactly in Stan and need to use something like the approximation

I do not understand why?? Is that because it only allows discrete? Do you suggest approximating with continuous valuables??

If you can write the density out, we can help you marginalize.

Thank you! I did not figure out the solution for the density of Y yet… In theory, is all density marginalizable??

And yes, that should be lambda[z+1]. (corrected the original post). Thank you for pointing out !

Bob_Carpenter · July 11, 2017, 3:32am

I just meant write the distribution down with the latent parameters—that’s why I included z.

Yes, in theory, any density is marginalizable as

p(a) = SUM_b p(a, b)

or in the continuous case as

p(a) = INTEGRAL p(a, b) d.b

They’re just not all marginalizable in practice. When the sum is infinite, the loop will never terminate.

schen5 · July 11, 2017, 3:22pm

Thank you again Bob! So in this case, do you think it is not marginalizable?
Also I would appreciate if anyone has any suggestion on alternative approach to specify the model ( I am quite new in bayesian modeling and will be grateful for any comments.) Thank you!

Bob_Carpenter · July 13, 2017, 5:59pm

The usual thing to use a continuous z[n]. For instance, the beta-binomial distribution uses a continuous latent parameter with a gamma distribution and then plugs that into the Poisson.

Topic		Replies	Views
How to implement a poisson count model into stan? Modeling rstan , fitting-issues , specification , poisson	1	483	January 25, 2023
Mixed discrete-continuous parametre Modeling	4	674	December 27, 2018
Modeling problem when N obeys Poisson distribution in binomial distribution Modeling hierarchical-model	9	766	September 30, 2021
Sampling distribution for SIR Model, non integer count data Modeling fitting-issues	4	765	February 25, 2022
Help with Poisson model Modeling loo	45	7604	March 3, 2018

Count data modeling with discrete parameters

Related topics