Regime Switching Model

Hi,

First, let me say that I am a beginner with Stan and I find it amazing both from an ease of use and documentation perspective.

I am trying to fit a 4 state regime switching model to some market data, and I am getting divergent transitions and very low n_eff values. The diagnostic charts in ShinyStan show that the chains are not mising well. I read the Workflow guide below and I understand that I need to change the parametrisation in my model, but I don’t really understand why the new parametrisation fixes the issue in the example.
https://betanalpha.github.io/assets/case_studies/rstan_workflow.html

I would welcome any comment on my implementation of the model, and any suggestion on how to fix my divergence issues.

To briefly describe my model, the value of r_t is distributed according to \mathcal{N}( \mu_i, \sigma_i), with i \in \{1,2,3,4\}. The rows of the transition matrix P are distributed according to a dirichlet distribution. Finally, there are constraints on the signs of the \mu_i. \Pi is the vector of unconditional probability of each state, and I try to impose constraints on average returns in states 1,2 and 3,4.

data {
  int<lower=1> T;                   // number of observations (length) - time series of returns
  real r[T];                        // weekly return
  
  // 1-2 bear regime
  // 3-4 bull regime
  real m[4];  // mu prior mean m = c(-0.7, 0.2, -0.2,0.3)
  real n[4];  // mu prior sd n = c(1, 1, 1, 1)
  real v[4];  // sigma2 prior v = c(1,1,1,1)
  real s[4];  // sigma2 prior s = c(1,1,1,1)
  
  vector<lower=0>[3] alpha[4];  // transit prior rows
  // {p11, p12, p14} ~ Dir(8, 1.5, 0.5), 
  // {p21, p22, p24} ~ Dir(1.5, 8, 0.5), 
  // {p31, p33, p34} ~ Dir(0.5, 8, 1.5), 
  // {p41, p43, p44} ~ Dir(0.5, 1.5, 8).
  
  simplex[4] s0;             // inital state
}

parameters {
  simplex[3] reppp[4];       // transit probs row i 
  real<upper=0> mu1;// bear market
  real<lower=0> mu2;// bear rally
  real<upper=0> mu3;// bull correction
  real<lower=0> mu4;// bull market
  real<lower=0> sigma2[4];   // sd
  // simplex[4] s0;             // inital state
}

transformed parameters {
  vector[4] eta[T]; // First, let’s define the likelihood contribution of an observation r[t] under each state
  simplex[4] si[T];  // probability of being in each state
  vector[T] f;      // likelihood
  
  vector[4] PI;                        // unconditional state probs
    
  matrix[4,4] P;           // transit probs
  real mu_bull;
  real mu_bear;

  // put together the transition matrix
   for (j in 1:4){
     P[j, 1] = reppp[j,1];
     if ( j == 1 || j==2 ){
       P[j, 2] = reppp[j,2];
       P[j, 3] = 0;
     } else {
       P[j, 2] = 0;
       P[j, 3] = reppp[j,2];
     }
     P[j, 4] = reppp[j,3];
   }

  
  // fill in etas (likelihood contribution of an observation r[t] under each state)
  {
    real mu[4];                // expected returns
    
    mu[1] = mu1;
    mu[2] = mu2;
    mu[3] = mu3;
    mu[4] = mu4;
   
    for(t in 1:T) {
      for(j in 1:4 ){
        eta[t,j] = exp(normal_lpdf(r[t] | mu[j], sigma2[j]));
      }
    }
  }
  
   // work out likelihood contributions
   f[1] = dot_product( P*s0, eta[1]);
   si[1] = ((P*s0) .* eta[1]) / f[1];
   
   // and for the rest
   for(t in 2:T) {
     f[t] = dot_product( P*si[t-1], eta[t]);
     si[t] =  ((P*si[t-1]) .* eta[t]) / f[t];
   }
  
  // unconditional probabilities PI
  {
    matrix[4,5] A;
    vector[5] ee;

    matrix[4,4] AAprime;
    vector[4] AE;

    A = append_col(P - diag_matrix(rep_vector(1., 4)),rep_vector(1., 4));
    ee = rep_vector(0., 5);
    ee[5] = 1.0;
    // PI = (A' * A) \ (A' * ee);
    AAprime = tcrossprod(A);
    AE = A * ee;
    PI = AAprime \ AE;

    mu_bear = PI[1]/(PI[1]+PI[2])*mu1 + PI[2]/(PI[1]+PI[2])*mu2;
    mu_bull = PI[3]/(PI[3]+PI[4])*mu3 + PI[4]/(PI[3]+PI[4])*mu4;
  }
  
}


// T observations (length)
model {
  for (kk in 1:4){
    reppp[kk] ~ dirichlet(alpha[kk]);  // Sample the transition probability
    sigma2[kk] ~ cauchy( v[kk], s[kk] );
  }
  mu1 ~ normal( m[1], n[1] );
  mu2 ~ normal( m[2], n[2] );
  mu3 ~ normal( m[3], n[3] );
  mu4 ~ normal( m[4], n[4] );
  
  // likelihood is really easy here!
  target += sum(log(f));

  if( mu_bull < 0){ target += negative_infinity(); }
  if( mu_bear > 0){ target += negative_infinity(); }
  
}

Howdy howdy! Lemme see if I have this straight.

This is a time series model with four states (bear market, bear rally, bull correction, and bull market).

At each time point you emit a real value (this is what is measured) and then transition to one of the other states or stay in your current state. You can’t get from every state to every other one, but that is encoded by only having simplex[3]s and not simplex[4]s.

The emissions are normally distributed and there are four mu/sigma parameters (a different mean and standard deviation for each state).

Before getting too far into your model, any chance your model is the one described here (search for “Alternatively, if the output is continuous” – it goes from page 2 to about page 6): https://github.com/luisdamiano/stancon18/blob/master/main_pdf.pdf? If so I’d start with that model.

There’s a Betancourt case study on the difficulties of using mixture models that is relevant here as well (Identifying Bayesian Mixture Models).

The couple things that stand out to me in the model you wrote are:

eta[t,j] = exp(normal_lpdf(r[t] | mu[j], sigma2[j]));

If you’re going to add these together and compute the log somewhere, this can lead to numerical problems (Google “logsumexp trick”). Sometimes these sorts of operations are necessary, but you try to package them up in log_sum_exp calls (this is available as a Stan function).

The other thing that scares me is:

  if( mu_bull < 0){ target += negative_infinity(); }
  if( mu_bear > 0){ target += negative_infinity(); }

Since mu_bull/mu_bear are functions of parameters, there’s definitely going to be a discontinuity in the log density around mu_bull/mu_bear == 0. The log density needs to be first differentiable for Stan to work.

1 Like

That is exactly right, you are describing what I am trying to achive much better than me!

Actually I am trying to estimate this model:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.210.1636&rep=rep1&type=pdf

Thank you for the pointers, the paper you mentionned looks very close to what I am trying to achieve, so I will have a good read and get back to the drawing board.

Beside the numerical instability that I should be able to address, if I understand correctly, the challenge with my model is to make the parameters identifiable, otherwise the chains will not mix. This is achieved in your reference by using ordered mu parameters. In my case I am trying to impose different constraints using the mu_bear/mu_bull functions, but this introduces a discontinuitiy in the log density that Stan can not deal with.

Hi again, thanks for your advice, it was quite helpful.

I removed the discontinuity by changing the parameters of the model, and it seemed to help the estimation. I also followed the model in tutorial to remove the potential numerical problems, but I was still not happy with with the result. Finally, following the tutorial, I added initial values for some of the parameters and I got rhat values close to 1.

Another option for non-convergence is to reduce the randomization interval and/or reduce the initial stepsize and increase the target acceptance rate. That’ll produce more stable, albeit slower, estimates.