Hi,
what I am basically trying to do is to model a process in which I observe Y which is a convolution of c negative binomials:
Y=Y_1+...+Y_c ,
where each random variable Y_i \sim NB(\alpha_i, p_i) .
As described in this Paper (https://www.sciencedirect.com/science/article/abs/pii/S0167715206002185), the distribution of Y \sim NB(\alpha + K, p_{max} ) is "mixture negative binomial with mixture parameter \alpha + K and p_{max}", where
\alpha = \alpha_1 + ... + \alpha_c
p_{max} = max_i(p_i)
K is an integer random variable with pmf:
pr_k = R\delta_k, k=0,1…
where:
q_i = 1-p_i
R = \displaystyle\prod_{j=1}^c \left( \frac{p_{max} q_j}{q_{max}p_j} \right)
\delta_{k+1} = \frac{1}{k+1} \displaystyle\sum^{k+1}_{i=1} i \xi_i \delta_{k+1-i} , with \delta_0 = 0
and \xi_i = \displaystyle\sum_{j=1}^n \frac{\alpha_j(1-q_{max}p_j / q_jp_{max})^i}{i}
The stan model I managed to do so far is the following. The problem I encounter thereby is that I have the unkown k and y_i, i=1,...,c (for each observation n) that need to be integer values, but in the parameter or transformed parameter block (the way it is written now in the model) this is not possible.
I know that this has already been asked before, but the answers aren’t applicable to my model.
Is there a specific reason this is not allowed?
And might there be a work around for my model?
functions{
// functions for the pmf of the random variable K
real R(real[] alpha_c, real[] prob_c, real prob_m){
real r = 0;
real beta_m = prob_m / (1-prob_m);
for(j in 1:size(prob_c)){
r = r * ( (((1-prob_c[j]) / (prob_c[j])) * beta_m)^(-alpha_c[j]));
}
return r;
}
real epsilon(int i, real[] alpha_c, real[] prob_c, real prob_m){
real e;
real prob_m_s;
real beta_cj;
e = 0;
prob_m_s = (1-prob_m) / prob_m;
for(j in 1:size(prob_c)){
beta_cj = prob_c[j] / (1-prob_c[j]);
e += (alpha_c[j] * (1 - ( prob_m_s * beta_cj ))^i ) / i;
}
return e;
}
real delta(int k, real[] alpha_c, real[] prob_c, real prob_m);
real delta(int k, real[] alpha_c, real[] prob_c, real prob_m){
if( k == 0) {
return 1;
}else{
real d = 0;
for(i in 1:k){
d += i * epsilon(i, alpha_c, prob_c, prob_m ) * delta( k - i, alpha_c, prob_c, prob_m);
}
d = d / k;
return d;
}
}
// log pmf for K
real param_k_log(int k, real[] alpha_c, real[] prob_c, real prob_m){
real p;
p = R(alpha_c, prob_c, prob_m) * delta(k ,alpha_c, prob_c, prob_m);
return log(p);
}
}
data {
int N; // # of trials
int y[N]; // # of failures in trial n
int <lower=1> C; // # random variables (cells) in each N
}
parameters{
positive_ordered[C] beta_c;
real<lower=0, upper=1000> alpha_c[C];
simplex [C] y_unscaled[N];
}
transformed parameters{
real<lower=0, upper=0.999> prob_c[C]; // probability for each NB
int<lower=0>[C,N] y_c; // random variables y_1,...,y_c that sum up to y
real<lower=0, upper=0.999> prob_m;
real<lower=0> beta_m;
real<lower=0> alpha_m;
int<lower=0> k;
for(c in 1:C){
prob_c[c] = beta_c[c] / (beta_c[c] + 1);
for(n in 1:N){
y_c[c,n] = y_unscaled[n,c] * y[n];
}
}
prob_m = max(prob_c);
beta_m = prob_m / (1-prob_m);
alpha_m = sum(alpha_c);
}
model{
for( c in 1:C){
y_c[c] ~ neg_binomial(alpha_c[c], beta_c[c]);
}
k ~ param_k(alpha_c, prob_c,prob_m );
y ~ neg_binomial(alpha_m + k, beta_m);
}
Data can be generated in R with the following code:
y <- integer(N)
C <- 2
N <- 1000
alpha <- c(20,100)
prob <- c(0.9,0.8)
for(c in 1:C){
y <- y + rnbinom(n=N, size=alpha[c], prob = prob[c] )
}
I would very much appreciate any help, I have been stuck with this for quite a while.
Thanks!