Problems with a Cauchy Model

I’m using a percentage from one process, “cancellation rate”, to try to predict a result, “loss ratio”, which is also a percentage.

Both have a lower bound of zero. Cancellation has an upper bound of 1, while loss ratio has no upper bound.

I thought that with the target, loss ratio, having a lower bound of zero and no upper bound, a Cauchy distribution would cover it. But I’m getting predictions under 0 in the posterior predictive check (PPC), as shown below.
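To illustrate the problem, here’s a quick standalone check (Python rather than Stan; the location 0.6 and scale 0.2 are made-up values for illustration) showing that Cauchy draws with a clearly positive location still land below zero fairly often:

```python
import math
import random

random.seed(0)

def cauchy_rng(mu, sigma):
    # Inverse-CDF draw from Cauchy(mu, sigma): the quantile function is
    # mu + sigma * tan(pi * (u - 0.5)) for u ~ Uniform(0, 1).
    u = random.random()
    return mu + sigma * math.tan(math.pi * (u - 0.5))

# Even with a positive location, the Cauchy's heavy tails put
# non-trivial mass below zero.
draws = [cauchy_rng(0.6, 0.2) for _ in range(10_000)]
share_negative = sum(d < 0 for d in draws) / len(draws)
print(f"share of draws below zero: {share_negative:.3f}")
```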

data {
    int<lower=1> n;           // number of observations
    vector<lower=0>[n] loss_ratio; //target
    vector<lower=0>[n] cancellation;  //monthly cancel rate
    int<lower=1> p;
    //vector<lower=0>[p] cancellation_ppc;
}

parameters {
    real alpha;               // intercept
    real beta;                // slope
    real<lower=0> sigma;      // scatter
}

transformed parameters {
    // Any variable declared here is part of the output produced for draws.
    vector[n] mu;
    mu = alpha + beta * cancellation;
}

model {
    // priors
    alpha ~ normal(0, 1);    // prior for intercept
    beta ~ normal(0, 1);     // prior for cancellation rate
    sigma ~ normal(0, 5);    // prior for sigma
    // likelihood
    loss_ratio ~ cauchy(mu, sigma);
}

generated quantities {
    vector[p] loss_ratio_ppc;
    for (i in 1:p) {loss_ratio_ppc[i] = cauchy_rng(mu[i], sigma);}
}

My apologies as I’m still a novice at this.

You need to add the truncation for the Cauchy, as in 4.2 Truncated data | Stan User’s Guide. Specifying a lower bound of 0 in the data block does not translate into a half-Cauchy in the model block; it only validates the data you pass in. Bounds induce constrained sampling only for variables declared in the parameters block.
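To make concrete what `T[0,]` does: it renormalizes the density by the probability mass above the bound, i.e. subtracts log(1 − F(0)) from the log density. A rough Python illustration (standard library only; this is the idea, not the Stan internals):

```python
import math

def cauchy_lpdf(x, mu, sigma):
    # Log density of Cauchy(mu, sigma).
    z = (x - mu) / sigma
    return -math.log(math.pi * sigma * (1.0 + z * z))

def cauchy_cdf(x, mu, sigma):
    # CDF of Cauchy(mu, sigma).
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

def truncated_cauchy_lpdf(x, mu, sigma, lower=0.0):
    # T[lower,] renormalizes the density by the retained mass:
    # subtract log(1 - F(lower)); values below the bound get -inf.
    if x < lower:
        return float("-inf")
    return cauchy_lpdf(x, mu, sigma) - math.log(1.0 - cauchy_cdf(lower, mu, sigma))

# Above the bound, the truncated log density exceeds the untruncated one,
# because we divide by a retained mass smaller than 1.
x, mu, sigma = 0.5, 0.2, 0.3
print(truncated_cauchy_lpdf(x, mu, sigma) > cauchy_lpdf(x, mu, sigma))  # True
```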

OK, so I changed my model to the following:

data {
    int<lower=1> n;           // number of observations
    vector<lower=0>[n] loss_ratio; //target
    vector<lower=0, upper=1>[n] cancellation;  //monthly cancel rate
    int<lower=1> p;
}

parameters {
    real alpha;               // intercept
    real beta;                // slope
    real<lower=0> sigma;      // scatter
}

transformed parameters {
    // Any variable declared here is part of the output produced for draws.
    vector[n] mu;
    mu = alpha + beta * cancellation;
}

model {
    // priors
    alpha ~ normal(0, 5);    // prior for intercept
    beta ~ normal(0, 5);     // prior for cancellation rate
    sigma ~ normal(0, 5);    // prior for sigma
    // likelihood
    for (i in 1:n) {
        loss_ratio[i] ~ normal(mu[i], sigma) T[0,];
    }
}

generated quantities {
    vector[p] loss_ratio_ppc;
    for (i in 1:p) {loss_ratio_ppc[i] = normal_rng(mu[i], sigma) ;}
}

It looks better, but I still see PPC samples being pulled below zero.

I tried adding “T[0,]” in the generated quantities block, but that just throws an error. I’m not sure how to stop it from sampling PPC values under 0.

That’s because you need to write a version of normal_rng that is constrained to positive values. See Truncated normal distribution - Wikipedia, the section on generating values from the truncated normal distribution.


Doesn’t adding “T[0,]” constrain normal_rng to positive values?

Only in the model block. Think of generated quantities as a separate program; Stan does not have a built-in truncated RNG.

You can implement the inverse transform sampling method, or you can do rejection sampling (slower, but easier to write).

functions {
    real half_normal_rng(real mu, real sigma) {
        real p = normal_cdf(0, mu, sigma);  // probability mass below the lower bound
        real u = uniform_rng(p, 1);         // uniform draw over the retained mass
        return (sigma * inv_Phi(u)) + mu;   // inverse cdf mapped back to the value
    }
}
...
generated quantities {
    vector[p] loss_ratio_ppc;
    for (i in 1:p) {loss_ratio_ppc[i] = half_normal_rng(mu[i], sigma);}
}
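If it helps to see it outside Stan, here is the same idea in plain Python (standard library only; parameter values are illustrative), with the rejection-sampling alternative alongside:

```python
import random
from statistics import NormalDist

random.seed(1)

def truncated_normal_rng(mu, sigma, lower=0.0):
    # Inverse-transform sampling: draw u uniformly from the CDF mass
    # above the lower bound, then map back through the normal quantile.
    std = NormalDist()
    p = std.cdf((lower - mu) / sigma)  # mass below the bound
    u = random.uniform(p, 1.0)
    return mu + sigma * std.inv_cdf(u)

def truncated_normal_rejection(mu, sigma, lower=0.0):
    # Rejection sampling: redraw until the value clears the bound.
    # Simple, but slow when most of the mass sits below the bound.
    while True:
        x = random.gauss(mu, sigma)
        if x >= lower:
            return x

draws = [truncated_normal_rng(0.1, 0.3) for _ in range(5_000)]
print(min(draws) >= 0.0)  # True: every draw respects the bound
```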
