I am a previous \textsc{bugs} user. I have a moderate size data (650,000 2 dimensional samples). I aim to fit the distribution of its log ratio log(x/y). The size of data makes it infeasible to simulate using \textsc{bugs}.
By inspection, the distribution of ratio is like 7% zeros and the rest 93% follows a double exponential distribution with mean zero. The state is latent. So I aim to model the % and the scale of double exponential distribution. The model is very simple. However, the only difficulty I have is that \textsc{stan} does not support integer vector. So \texttt{int<lower=0,upper=1> x[N]; } would give me an error.
I have read the chapter latent discrete parameters but it does not help much.
I manage to get around with it by using a uniform distribution with a tight domain (-0.000005,0.000005) and a mixture model but it is a overkill I guess.
data {
int<lower = 0> N;
vector[N] y;
}
parameters {
real<lower=0, upper=.3> sigma1;
real<lower=0, upper=1> theta;
}
model {
for (n in 1:N)
target += log_mix(theta,
double_exponential_lpdf(y[n] | 0, sigma1),
uniform_lpdf(y[n] | -0.005, 0.005));
}
Thx for your help! You code indeed compiles but it gives a result of theta=0. I am not sure why. However, it seems that you also think I should fit a mixture model. I will still use my ‘overkill’
data {
int<lower = 0> N;
vector[N] y;
}
parameters {
real<lower=0, upper=.3> sigma1;
real<lower=0, upper=1> theta;
}
model {
for (n in 1:N)
target += log_mix(theta,
double_exponential_lpdf(y[n] | 0, sigma1),
uniform_lpdf(y[n] | -0.0005, 0.0005));
}
The idea here is that, in the dataset, the only sample between -0.0005, 0.0005 are exactly zero.
In the manual there is a zero inflated Poisson model. What I need is a zero inflated double exponential model. The fact double_exponential is continuous makes this problem a bit messy.
I believe the mixture approach as proposed by @bbbales2 in the 2nd post should work with a small correction. When the observation model is a mixture of a continuous component and a discrete component, the likelihood needs to be defined so that p(y \mid \sigma,\theta) evaluated at y=0 is just Pr(y=0 \mid \sigma, \theta), which in turn equals 1-\theta (conditionally on drawing from the continuous part of the mixture, getting exactly 0 has probability 0). (Attempt at theoretic justification at the end of this post). So, defining the parameters/priors as in @cc20002002’s latest post, we get
(In the real model, you probably want a weakly-informative prior rather than hard constraints for \sigma). I tested this with simulated data with N=10000,\sigma=0.1,\theta=0.93 and seemed to recover correct parameters (no other tests done).
Theory
The probability distribution of y (given parameters) does not have a density wrt. the Lebesgue measure (“continuous pdf”) nor a density wrt. the counting measure on integers (“discrete pmf”). We define the likelihood as the density with respect to a reference measure that is the sum of the Lebesgue measure and an atom of measure 1 at 0. When the density-of-interest is written as a mixture of the continuous component and the zero-component,
p(y) = \theta\,p_1(y) + (1-\theta)p_2(y),
the density of the continuous component (against our reference measure!), p_1(y), is 0 at y=0 and equals the “plain” double-exponential-pdf at any y\neq 0. The discrete component p_2(y) is 1 at y=0 and 0 elsewhere.