Mixture with a deterministic component

Hi,

I am really struggling with the following issue:

Say that I want to model a stochastic process o_t that follows:

o_{t}=\begin{cases} 1 & \text{with probability } (1-p) \\ U[2,10] & \text{with probability } p \end{cases}

where p\sim\text{Beta}(\alpha, \beta).

(The context: o_t is an “outlier” process that occurs roughly once every N periods, where N is a function of \alpha and \beta; see Stock and Watson (2019).)
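
As a back-of-the-envelope illustration (my own, not from the paper): since E[p]=\alpha/(\alpha+\beta), outliers occur about once every N=(\alpha+\beta)/\alpha periods on average; for instance, the \text{Beta}(2.5, 37.5) prior used below gives N=40/2.5=16.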

How can I do this with Stan?

Thanks!

References

Stock, J. H., & Watson, M. W. (2019). Trend, Seasonal, and Sectoral Inflation in the Euro Area.

The structure of your model seems closely related to hurdle models. Check this out and see if it helps you.

Thanks. It looks like something similar, but I was unable to perfectly adjust it to my case. Anyway, it got me thinking so I tried to write something down. How about this?


data {
  int<lower=0> T;      // number of observations
  vector[T] y;         // observed series
}
parameters {
  real mu;                            // common location
  real<lower=0> sigma;                // baseline scale
  real<lower=0, upper=1> prob;        // outlier probability p
  vector<lower=2, upper=10>[T] out;   // outlier multiplier, conditional on being an outlier
}
model {
  prob ~ beta(2.5, 37.5);
  out ~ uniform(2, 10);   // implied by the bounds, but stated explicitly

  // marginalize over the discrete outlier indicator
  for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, out[t] * sigma),
                      normal_lpdf(y[t] | mu, sigma));
}

What I would really want is to get rid of the out[t] time series and just put it inside the target, e.g.,


for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, uniform(2,10) * sigma),
                      normal_lpdf(y[t] | mu, sigma));

But it won’t run…

I don’t get where the normal_lpdf comes from. By monkey-see-monkey-doing the manual, we have something like

for (t in 1:T)
  if (out[t] == 1)
    1 ~ bernoulli(1 - p);
  else {
    0 ~ bernoulli(p);
    out[t] ~ uniform(2, 10);
  }

where I have used the variable out for the o_t in your original post. The error model is a different thing. If you can explain the link between y_t and o_t we can probably move forward.

Thanks for the help!

OK, I’ll try to explain.

The model for y_t is

y_t\sim N(\mu, o_t\sigma)

where, as before,

o_{t}=\begin{cases} 1 & \text{with probability } (1-p) \\ U[2,10] & \text{with probability } p \end{cases}

and p\sim\text{Beta}(\alpha, \beta).

This means that most of the time o_t=1 and thus the standard deviation of y_t is \sigma, and once in a while there is an outlier that multiplies \sigma by something between 2 and 10.

OK. But why do you want to get rid of out? For a first stab at the model, I’d just write everything out and try to fit the model. Remember, premature optimisation is the root of all evil.

You mean why make out=1 most of the time? If so, the only reason is that I am trying to replicate an existing model where they do exactly that.

For the sake of clarity (?) I’ll just post my latest version of the model with your suggestion for modeling out[t] (the model is called the Stock and Watson univariate seasonal unobserved components stochastic volatility outlier-adjusted (UCSVO) model…):


data {
  int<lower=0> T;
  vector[T] y;
}
parameters {

  vector[T]   ln_sigma2_eps;    // log-variance of the trend innovations (random walk)
  vector[T]   ln_sigma2_eta;    // log-variance of the observation noise (random walk)
  vector[T]   ln_sigma2_seas;   // log-variance of the seasonal innovations (random walk)
  vector[T]   tau;              // trend
  vector[T+3] seas;             // seasonal component (with 4 initial states)
  vector<lower=2, upper=10>[T]   out;   // outlier multiplier

  real<lower=0, upper=0.1> gamma_eps;
  real<lower=0, upper=0.1> gamma_eta;
  real<lower=0, upper=0.1> gamma_seas;

  real<lower=0, upper=1> prob;

}
transformed parameters{

  real seas_1_4;
  seas_1_4 = seas[1] + seas[2] + seas[3] + seas[4];

}
model {

  // priors
  ln_sigma2_eps[1]  ~ normal(0, 10^6);
  ln_sigma2_eta[1]  ~ normal(0, 10^6);
  ln_sigma2_seas[1] ~ normal(0, 10^6);
  tau[1]            ~ normal(0, 10^6);
  seas_1_4          ~ normal(0, 10^6);
  gamma_eps         ~ uniform(0, 0.1);
  gamma_eta         ~ uniform(0, 0.1);
  gamma_seas        ~ uniform(0, 0.1);
  prob              ~ beta(2.5, 37.5);

  // random-walk log-volatilities
  for (t in 2:T)
    ln_sigma2_eps[t] ~ normal(ln_sigma2_eps[t-1], gamma_eps);

  for (t in 2:T)
    ln_sigma2_eta[t] ~ normal(ln_sigma2_eta[t-1], gamma_eta);

  for (t in 2:T)
    ln_sigma2_seas[t] ~ normal(ln_sigma2_seas[t-1], gamma_seas);

  // seasonal component
  for (t in 5:(T+3))
    seas[t] ~ normal(-seas[t-1] - seas[t-2] - seas[t-3] - seas[t-4], exp(0.5 * ln_sigma2_seas[t-3]));

  // outlier process
  for (t in 1:T)
    if (out[t] == 1)
      1 ~ bernoulli(1 - prob);
    else {
      0 ~ bernoulli(prob);
      out[t] ~ uniform(2, 10);
    }

  // trend
  for (t in 2:T)
    tau[t] ~ normal(tau[t-1], exp(0.5 * ln_sigma2_eps[t]));

  // observation equation
  for (t in 2:T)
    y[t] ~ normal(tau[t] + seas[t+3], out[t] * exp(0.5 * ln_sigma2_eta[t]));

}

Not at all. I meant: why get rid of the out variable in the program? Stuff like

for (t in 1:T)
    target += log_mix(prob,
                      normal_lpdf(y[t] | mu, uniform(2,10) * sigma),
                      normal_lpdf(y[t] | mu, sigma));

would cut corners, but as you’ve found, this is not valid syntax.

Now, questions: (i) does this model run? (ii) if yes, how do the diagnostics look? The priors are BUGS-style, extremely flat priors that would, under most circumstances, induce terrible posterior geometry. I know you’re just trying to reproduce previous analyses, but it might be that these priors just render the model unrunnable.

I agree. I shouldn’t get rid of it since I’d like to examine it as well. Your approach is much better.

The model runs. Here are some diagnostics (for a rather small number of iterations):

[image: diagnostics output]

I follow the same priors as in the original study. How would you change them to improve things?

Btw, this is what out[t] looks like:

[image: plot of out[t]]

I’d expect it to be 1 most of the time and occasionally something larger. Any ideas on why this happens?

Can you also show whether there are any divergences or maximum treedepth exceedances? n_eff is also important to look at.

Hard to give a definitive answer without domain knowledge, but just decreasing the standard deviations from a million to about 10 would probably be sensible.
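
For example (just a sketch of the change, keeping everything else as is):

  // initial states: weakly informative instead of BUGS-style flat priors
  ln_sigma2_eps[1]  ~ normal(0, 10);
  ln_sigma2_eta[1]  ~ normal(0, 10);
  ln_sigma2_seas[1] ~ normal(0, 10);
  tau[1]            ~ normal(0, 10);
  seas_1_4          ~ normal(0, 10);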

No idea. What I recommend is to simplify your model a bit more, simulate some synthetic data and then fit the model to this data. If you simulate under the regime where it’s 1 most of the time and then can’t recover that, it signals problems.
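
Something like this standalone Stan program could generate synthetic data from the simplified outlier model (a sketch; the particular values of mu, sigma, and prob are made up for illustration, and you would run it with the fixed-parameter sampler):

data {
  int<lower=0> T;
}
generated quantities {
  real mu = 0;
  real sigma = 1;
  real prob = 0.0625;   // roughly one outlier every 16 periods
  vector[T] out;
  vector[T] y_sim;
  for (t in 1:T) {
    // o_t = 1 with probability 1 - p, U[2, 10] with probability p
    out[t] = bernoulli_rng(prob) ? uniform_rng(2, 10) : 1;
    y_sim[t] = normal_rng(mu, out[t] * sigma);
  }
}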

I’ll try to do this with a larger number of iterations.

A prior scale of 10 sounds reasonable. I never really understood why people use 10^6…

Does the code you’ve suggested for out[t] reflect such a process? The posterior mean I get for p is 0.6, which means I should see at least some 1s, but I see none.

This suggests a bug. It should be 1 about 4 times out of 10. I think I know what’s going on. You need

real<lower=1> out[T] 

Not going to work. The manual example assumes that out[] is known and discrete.
@Itamar_Caspi’s log_mix() model is the right way to do this with an unknown out[]. In this case the estimated out[t] is the value of the multiplier conditional on that year being an outlier. The probability that a given year is an outlier can be calculated afterwards:

generated quantities {
  vector[T] prob_outlier;
  for (t in 1:T) {
    // mu[t] stands for the location of y[t] (tau[t] + seas[t+3] in the UCSVO model)
    real lp_ok = log(1 - prob) + normal_lpdf(y[t] | mu[t], sigma);
    real lp_out = log(prob) + normal_lpdf(y[t] | mu[t], out[t] * sigma);
    // posterior probability that period t is an outlier
    prob_outlier[t] = exp(lp_out - log_sum_exp(lp_ok, lp_out));
  }
}

Good to know! Thanks. Continuing this, would it be better to directly generate out[t] first using some sort of log_mix and then add it to y[t]? If so, how can I declare that one of the states in the log_mix is fixed at 1?

Awesome, thanks! Thanks to this piece of code I see that my p is too high: it flags too many observations as outliers.

I guess it’s not too surprising if the model thinks everything is an outlier: the only difference an outlier makes is a (roughly) 6x bigger sigma, but sigma has a non-informative prior, so its size is difficult to determine.

log_mix() is just a hack to work around the limitation that Stan cannot sample discrete parameters. It cannot directly generate anything.

I don’t think Stock and Watson believe there’s something inherently discrete going on with inflation; the outlier process is just a way to prevent a few bad predictions from having a big effect on the estimates. A Gibbs sampler can mark surprising years as outliers and downweight them.
A more Stan-compatible solution is to use a long-tailed distribution; i.e., instead of

y[t] ~ normal(..., out[t] * ...);

it could be

y[t] ~ student_t(4, ..., ...);

A Student-t for y can be thought of as drawing out[t] from a (scaled) inverse chi distribution.
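
For what it’s worth, here’s a minimal sketch of that scale-mixture view (the variable name lambda is my own, and nu = 4 matches the example above): drawing a per-observation variance multiplier from an inverse-gamma and scaling the normal by its square root is marginally the same as the student_t likelihood.

data {
  int<lower=0> T;
  vector[T] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector<lower=0>[T] lambda;   // variance multiplier; sqrt(lambda[t]) plays the role of out[t]
}
model {
  real nu = 4;
  lambda ~ inv_gamma(0.5 * nu, 0.5 * nu);
  // marginally over lambda this is y ~ student_t(nu, mu, sigma)
  y ~ normal(mu, sqrt(lambda) * sigma);
}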
