Mixture with a deterministic component

Hi,

I am really struggling with the following issue:

Say that I want to model a stochastic process o_t that follows:

o_{t}=\begin{cases} 1 & \text{with probability } (1-p) \\ U[2,10] & \text{with probability } p \end{cases}

where p\sim\text{Beta}(\alpha, \beta).

(The context: o_t is an “outlier” process that occurs roughly once every N periods, where N is a function of \alpha and \beta; see Stock and Watson (2019).)
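
As a back-of-the-envelope illustration (my own, not from the paper): since E[p]=\alpha/(\alpha+\beta), outliers occur about once every N=(\alpha+\beta)/\alpha periods on average; for instance, the \text{Beta}(2.5, 37.5) prior used below gives N=40/2.5=16.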

How can I do this with Stan?

Thanks!

References

Stock, J. H., & Watson, M. W. (2019). Trend, Seasonal, and Sectoral Inflation in the Euro Area.

The structure of your model seems closely related to hurdle models. Check this out and see if it helps you.

Thanks. It looks like something similar, but I was unable to perfectly adjust it to my case. Anyway, it got me thinking so I tried to write something down. How about this?


data {
  int<lower=0> T;      // number of observations
  vector[T] y;         // observed series
}
parameters {
  real mu;                            // common location
  real<lower=0> sigma;                // baseline scale
  real<lower=0, upper=1> prob;        // outlier probability p
  vector<lower=2, upper=10>[T] out;   // outlier multiplier, conditional on being an outlier
}
model {
  prob ~ beta(2.5, 37.5);
  out ~ uniform(2, 10);   // implied by the bounds, but stated explicitly

  // marginalize over the discrete outlier indicator
  for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, out[t] * sigma),
                      normal_lpdf(y[t] | mu, sigma));
}

What I would really want is to get rid of the out[t] time series and just put it inside the target, e.g.,


for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, uniform(2,10) * sigma),
                      normal_lpdf(y[t] | mu, sigma));

But it won’t run…

I don’t get where the normal_lpdf comes from. By monkey-see-monkey-doing the manual, we have something like

for (t in 1:T)
  if (out[t] == 1)
    1 ~ bernoulli(1 - p);
  else {
    0 ~ bernoulli(p);
    out[t] ~ uniform(2, 10);
  }

where I have used the variable out for the o_t in your original post. The error model is a different thing. If you can explain the link between y_t and o_t we can probably move forward.

Thanks for the help!

OK, I’ll try to explain.

The model for y_t is

y_t\sim N(\mu, o_t\sigma)

where, as before,

o_{t}=\begin{cases} 1 & \text{with probability } (1-p) \\ U[2,10] & \text{with probability } p \end{cases}

and p\sim\text{Beta}(\alpha, \beta).

This means that most of the time o_t=1 and thus the standard deviation of y_t is \sigma, and once in a while there is an outlier that multiplies \sigma by something between 2 and 10.

OK. But why do you want to get rid of out? For a first stab at the model, I’d just write everything out and try to fit the model. Remember, premature optimisation is the root of all evil.

You mean why make out=1 most of the time? If so, the only reason is that I am trying to replicate an existing model where they do exactly that.

For the sake of clarity (?) I’ll just post my latest version of the model with your suggestion for modeling out[t] (the model is called the Stock and Watson univariate seasonal unobserved components stochastic volatility outlier-adjusted (UCSVO) model…):


data {
  int<lower=0> T;
  vector[T] y;
}
parameters {

  vector[T]   ln_sigma2_eps;    // log-variance of the trend innovations (random walk)
  vector[T]   ln_sigma2_eta;    // log-variance of the observation noise (random walk)
  vector[T]   ln_sigma2_seas;   // log-variance of the seasonal innovations (random walk)
  vector[T]   tau;              // trend
  vector[T+3] seas;             // seasonal component (with 4 initial states)
  vector<lower=2, upper=10>[T]   out;   // outlier multiplier

  real<lower=0, upper=0.1> gamma_eps;
  real<lower=0, upper=0.1> gamma_eta;
  real<lower=0, upper=0.1> gamma_seas;

  real<lower=0, upper=1> prob;

}
transformed parameters{

  real seas_1_4;
  seas_1_4 = seas[1] + seas[2] + seas[3] + seas[4];

}
model {

  // priors
  ln_sigma2_eps[1]  ~ normal(0, 10^6);
  ln_sigma2_eta[1]  ~ normal(0, 10^6);
  ln_sigma2_seas[1] ~ normal(0, 10^6);
  tau[1]            ~ normal(0, 10^6);
  seas_1_4          ~ normal(0, 10^6);
  gamma_eps         ~ uniform(0, 0.1);
  gamma_eta         ~ uniform(0, 0.1);
  gamma_seas        ~ uniform(0, 0.1);
  prob              ~ beta(2.5, 37.5);

  // random-walk log-volatilities
  for (t in 2:T)
    ln_sigma2_eps[t] ~ normal(ln_sigma2_eps[t-1], gamma_eps);

  for (t in 2:T)
    ln_sigma2_eta[t] ~ normal(ln_sigma2_eta[t-1], gamma_eta);

  for (t in 2:T)
    ln_sigma2_seas[t] ~ normal(ln_sigma2_seas[t-1], gamma_seas);

  // seasonal component
  for (t in 5:(T+3))
    seas[t] ~ normal(-seas[t-1] - seas[t-2] - seas[t-3] - seas[t-4], exp(0.5 * ln_sigma2_seas[t-3]));

  // outlier process
  for (t in 1:T)
    if (out[t] == 1)
      1 ~ bernoulli(1 - prob);
    else {
      0 ~ bernoulli(prob);
      out[t] ~ uniform(2, 10);
    }

  // trend
  for (t in 2:T)
    tau[t] ~ normal(tau[t-1], exp(0.5 * ln_sigma2_eps[t]));

  // observation equation
  for (t in 2:T)
    y[t] ~ normal(tau[t] + seas[t+3], out[t] * exp(0.5 * ln_sigma2_eta[t]));

}

Not at all. I meant: why get rid of the out variable in the program? Stuff like

for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, uniform(2,10) * sigma),
                      normal_lpdf(y[t] | mu, sigma));

would cut corners, but as you’ve found, this is not valid syntax.

Now, questions: (i) does this model run? (ii) if yes, how do the diagnostics look? The priors are BUGS-style, extremely flat priors that would, under most circumstances, induce terrible posterior geometry. I know you’re just trying to reproduce previous analyses, but it might be that these priors just render the model unrunnable.

I agree. I shouldn’t get rid of it since I’d like to examine it as well. Your approach is much better.

The model runs. Here are some diagnostics (for a rather small number of iterations):

[image: diagnostics output]

I follow the same priors as in the original study. How would you change them to improve things?

Btw, this is what out[t] looks like:

[image: plot of out[t]]

I’d expect it to be 1 most of the time and occasionally something larger. Any ideas on why this happens?

Can you also show whether there are any divergences or maximum treedepth exceedances? n_eff is also important to look at.

Hard to give a definitive answer without domain knowledge, but just decreasing the standard deviations from a million to about 10 would probably be sensible.
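
For example (just a sketch of the change, keeping everything else as is):

  // initial states: weakly informative instead of BUGS-style flat priors
  ln_sigma2_eps[1]  ~ normal(0, 10);
  ln_sigma2_eta[1]  ~ normal(0, 10);
  ln_sigma2_seas[1] ~ normal(0, 10);
  tau[1]            ~ normal(0, 10);
  seas_1_4          ~ normal(0, 10);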

No idea. What I recommend is to simplify your model a bit more, simulate some synthetic data and then fit the model to this data. If you simulate under the regime where it’s 1 most of the time and then can’t recover that, it signals problems.
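
Something like this standalone Stan program could generate synthetic data from the simplified outlier model (a sketch; the particular values of mu, sigma, and prob are made up for illustration, and you would run it with the fixed-parameter sampler):

data {
  int<lower=0> T;
}
generated quantities {
  real mu = 0;
  real sigma = 1;
  real prob = 0.0625;   // roughly one outlier every 16 periods
  vector[T] out;
  vector[T] y_sim;
  for (t in 1:T) {
    // o_t = 1 with probability 1 - p, U[2, 10] with probability p
    out[t] = bernoulli_rng(prob) ? uniform_rng(2, 10) : 1;
    y_sim[t] = normal_rng(mu, out[t] * sigma);
  }
}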

I’ll try to do this with a larger number of iterations.

A prior scale of 10 sounds reasonable. I never really understood why people use 10^6…

Does the code you’ve suggested for out[t] reflect such a process? The posterior mean I get for p is 0.6, which means I should see at least some 1s, but I see none.

This suggests a bug. It should be 1 about 4 times out of 10. I think I know what’s going on. You need

real<lower=1> out[T] 

Not going to work. The manual example assumes that out[] is known and discrete.
@Itamar_Caspi’s log_mix() model is the right way to do this with an unknown out[]. In this case the estimated out[t] is the value of the multiplier conditional on that year being an outlier. The probability that a given year is an outlier can be calculated afterwards:

generated quantities {
  vector[T] prob_outlier;
  for (t in 1:T) {
    // mu[t] stands for the location of y[t] (tau[t] + seas[t+3] in the UCSVO model)
    real lp_ok = log(1 - prob) + normal_lpdf(y[t] | mu[t], sigma);
    real lp_out = log(prob) + normal_lpdf(y[t] | mu[t], out[t] * sigma);
    // posterior probability that period t is an outlier
    prob_outlier[t] = exp(lp_out - log_sum_exp(lp_ok, lp_out));
  }
}

Good to know! Thanks. Continuing this, would it be better to directly generate out[t] first using some sort of log_mix and then add it to y[t]? If so, how can I declare that one of the states in the log_mix is fixed at 1?

Awesome, thanks! Thanks to this piece of code I see that my p is too high: it flags too many observations as outliers.

I guess it’s not too surprising if the model thinks everything is an outlier: the only difference an outlier makes is a (roughly) 6x bigger sigma, but sigma has a non-informative prior, so its size is difficult to determine.

log_mix() is just a hack to work around the limitation that Stan cannot sample discrete parameters. It cannot directly generate anything.

I don’t think Stock and Watson believe there’s something inherently discrete going on with inflation; the outlier process is just a way to prevent a few bad predictions from having a big effect on the estimates. A Gibbs sampler can mark surprising years as outliers and downweight them.
A more Stan-compatible solution is to use a long-tailed distribution; i.e., instead of

y[t] ~ normal(..., out[t] * ...);

it could be

y[t] ~ student_t(4, ..., ...);

A Student-t for y can be thought of as drawing out[t] from a (scaled) inverse chi distribution.
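
For what it’s worth, here’s a minimal sketch of that scale-mixture view (the variable name lambda is my own, and nu = 4 matches the example above): drawing a per-observation variance multiplier from an inverse-gamma and scaling the normal by its square root is marginally the same as the student_t likelihood.

data {
  int<lower=0> T;
  vector[T] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector<lower=0>[T] lambda;   // variance multiplier; sqrt(lambda[t]) plays the role of out[t]
}
model {
  real nu = 4;
  lambda ~ inv_gamma(0.5 * nu, 0.5 * nu);
  // marginally over lambda this is y ~ student_t(nu, mu, sigma)
  y ~ normal(mu, sqrt(lambda) * sigma);
}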
