I’m using a percentage from one process “cancellation rate” to try and predict a result, “loss ratio” which is also a percentage.
Both have a lower bound of zero. Cancellation would have an upper bound of 1 with loss ratio having no upper bound.
I thought with the target, loss ratio, having a lower bound of zero and no upper bound, a cauchy distribution would cover it. But I’m getting predictions under 0 in the ppc (shown below).
data {
int<lower=1> n; // number of observations
vector<lower=0>[n] loss_ratio; //target
vector<lower=0>[n] cancellation; //monthly cancel rate
int<lower=1> p;
//vector<lower=0>[p] cancellation_ppc;
}
parameters {
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // scatter
}
transformed parameters {
// Any variable declared here is part of the output produced for draws.
vector[n] mu;
mu = alpha + beta * cancellation;
}
model {
// priors
alpha ~ normal(0,1); // prior for intercept
beta ~ normal(0, 1); // prior for cancellation rate
sigma ~ normal(0, 5); // prior for sigma
// likelihood
loss_ratio ~ cauchy(mu, sigma);
}
generated quantities {
vector[p] loss_ratio_ppc;
for (i in 1:p) {loss_ratio_ppc[i] = cauchy_rng(mu[i], sigma);}
}
My apologies as I’m still a novice at this.
You need to add the truncation for the Cauchy as in 4.2 Truncated data | Stan User’s Guide. Specifying a lower bound of 0 in the data block does not translate into a half Cauchy in the model block, that only applies in the parameters block.
Ok. So i changed my model to the following:
data {
int<lower=1> n; // number of observations
vector<lower=0>[n] loss_ratio; //target
vector<lower=0, upper=1>[n] cancellation; //monthly cancel rate
int<lower=1> p;
}
parameters {
real alpha; // intercept
real beta; // slope
real<lower=0> sigma; // scatter
}
transformed parameters {
// Any variable declared here is part of the output produced for draws.
vector[n] mu;
mu = alpha + beta * cancellation;
}
model {
// priors
alpha ~ normal(0,5); // prior for intercept
beta ~ normal(0, 5); // prior for cancellation rate
sigma ~ normal(0, 5); // prior for sigma
// likelihood
for (i in 1:n) {
loss_ratio[i] ~ normal(mu, sigma) T[0,];
}
}
generated quantities {
vector[p] loss_ratio_ppc;
for (i in 1:p) {loss_ratio_ppc[i] = normal_rng(mu[i], sigma) ;}
}
It looks better but i still see ppc samples being pulled below zero.
I tried to add the “T[0,]” to the generated quantities but that just throws an error. Not sure how to get it to quit sampling ppc under 0.
That’s b/c you need to make a normal_rng function that is constrained to positive values. See Truncated normal distribution - Wikipedia generating values from the truncated normal distribution.
1 Like
Isn’t that adding “T[0,]” constrain the normal_rng to positive values?
Only in the model block. Think of the generated quantities as a separate program. Stan does not have a built in truncated rng.
You can write the inverse transform sampling method or you can do rejection sampling (slower but easier to do).
functions {
real half_normal_rng(real mu, real sigma) {
real p = normal_cdf(0, mu, sigma); // cdf for bounds
real u = uniform_rng(p, 1);
return (sigma * inv_Phi(u)) + mu; // inverse cdf for value
}
}
...
generated quantities {
vector[p] loss_ratio_ppc;
for (i in 1:p) {loss_ratio_ppc[i] = half_normal_rng(mu[i], sigma) ;}
}
3 Likes