Hi,
I have derived a discrete probability distribution for the variable z=\{0, 1, ... 2S-1, 2S\} and I have data which is an estimate of y_i=\frac{z_i}{2S}. However, due to the noise introduced when the data is gathered, y_i is a continuous variable between 0 and 1. I am having trouble formulating this model correctly in Stan, my first thought was to take the logit transform of y_i and then add normal noise to logit(\frac{k}{S}), however I think this fails when z=0,2S.
The probability distribution I have derived is:
p(Z=z | \mu , \gamma, \lambda, S) = \sum_{x=0}^{z} \binom{S}{x} \binom{S}{z-x} \frac{ \prod_{i=1}^{x}[(i-1)\lambda + (S-1)\mu] \prod_{j=1}^{S-x} [(j-1)\lambda + (S-1)\gamma] \prod_{k=1}^{z-x} [(k-1)\lambda + (S-1)\mu] \prod_{l=1}^{S-(z-x)}[(l-1)\lambda + (S-1)\mu]}{\left( \prod_{m=1}^{S}[(m-1)\lambda + (S-1)(\mu + \gamma)] \right)^2}
I would like to estimate the parameters \mu , \gamma, \lambda and S.
My Stan code:
functions{
vector methylation_log_prob(real lambda, real mu, real gamma, int S){
vector[S+1] lprob_x;
vector[2*S+1] lprob_z;
for (x in 0:S){
lprob_x[x+1] = lchoose(S, x);
for (j in 1:S){
if (j <= x){
lprob_x[x+1] += log((j-1)*lambda + (S-1)*mu);
}
if (j <= S-x){
lprob_x[x+1] += log((j-1)*lambda + (S-1)*gamma);
}
lprob_x[x+1] += -log((j-1)*lambda + (S-1)*(mu+gamma));
}
}
for (z in 0:2*S){
if (z <= S){
vector[z+1] lterm;
for (x in 0:z){
lterm[x+1] = lprob_x[x+1] + lprob_x[z-x+1];
}
lprob_z[z+1] = log_sum_exp(lterm);
}
else{
vector[2*S-(z-1)] lterm;
for (x in z-S:S){
lterm[S-x+1] = lprob_x[x+1] + lprob_x[z-x+1];
}
lprob_z[z+1] = log_sum_exp(lterm);
}
}
return lprob_z;
}
real Mvalue_lpdf(vector M, real lambda, real mu, real gamma, real sigma, int S){
vector[2*S+1] lprob_z;
vector[num_elements(M)] LL;
vector[2*S+1] LL_z;
real frac;
real min_M;
real max_M;
real logit_frac;
min_M = min(M);
max_M = max(M);
lprob_z = methylation_log_prob(lambda, mu, gamma, S);
for (i in 1:num_elements(M)){
for (z in 0:2*S){
frac = z;
frac = frac / (2*S);
logit_frac = logit(frac);
LL_z[z+1] = normal_lpdf(M[i]| logit_frac, sigma) + lprob_z[z+1];
}
LL[i] = log_sum_exp(LL_z);
}
return sum(LL);
}
}
data {
int<lower=0> N ; // Number of Sites
vector<lower=0,upper=1>[N] y ; // Fraction methylated
int<lower=2> S; // Stem Cell Number
}
transformed data{
vector[N] M;
M = logit(y);
}
parameters {
real<lower=0> lambda; // Replacement rate
real<lower=0> mu; // Methylation rate
real<lower=0> gamma; // Demethylation rate
real<lower=0> sigma;
}
model {
lambda ~ normal(0, 5); // Prior
mu ~ normal(0, 5); // Prior
gamma ~ normal(0, 5); // Prior
sigma ~ normal(0, 5); // Prior
M ~ Mvalue(lambda, mu, gamma, sigma, S);
}
When I attempt to run this model, I get the error message -
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: Exception: normal_lpdf: Location parameter is -inf, but must be finite! (in ‘unknown file name’ at line 46)
For reference, here is a histogram of my y_i values.