Strange behaviour when sampling

Hello all!

I'm trying to do parameter estimation in a factor stochastic volatility model, and I'm getting some strange behaviour when sampling. As in this question: Strange convergence behaviour, I can sometimes run the model and get good results, and sometimes I get things like the plot below.
[plot image]

Any idea what may be the reason for this behaviour? Is it that, when it works, it does not explore the posterior well enough? In any case, is there anything one can do to get around this?
The code for the model is given below:

//2 dim factor model

data {
  int<lower=0> n; // Sample size
  matrix[2,n] y;  // data
}

parameters {
  vector[n] h_std; //standardized AR(1) for factor
  matrix[2,n] x_std; //standardized AR(1) idiosyncratic process
  vector[2] beta; //loading matrix (vector in this case)
  real<lower=0,upper=1> phi_h_std; 
  real<lower=0> sigma_h_std; 
  vector<lower=0,upper=1>[2] phi_x_std; 
  vector[2] mu_x; 
  vector<lower = 0>[2] sigma_x_std; 
}

transformed parameters {
  matrix[2,n] x; //idiosyncratic processes
  vector[n] h;   // factor volatility
  real<lower=-1,upper=1> phi_h; 
  real<lower=0> sigma_h;
  vector<lower=-1,upper=1>[2] phi_x; 
  vector<lower=0>[2] sigma_x; 
  
  phi_h = 2*phi_h_std - 1; 
  sigma_h = sqrt(sigma_h_std); 
  phi_x = 2*phi_x_std - 1; 
  sigma_x = sqrt(sigma_x_std);

  //Scale with standard deviation
  //Stationary assumption on h
  h = h_std * sigma_h;
  h[1] = h[1]/sqrt(1 - square(phi_h));

  //Scale with standard deviation
  //Stationary assumption on x
  for(t in 1:2){
    x[t,] = x_std[t,]*sigma_x[t];
    x[t,] = x[t,] + mu_x[t];
    x[t,1] = x[t,1]/sqrt(1 - square(phi_x[t]));
    //print("x[,t]:",x[,t]);
  }
  for(t in 2:n){
    h[t] = h[t] + phi_h*h[t-1];
    for(i in 1:2){
      x[i,t] = x[i,t] + phi_x[i]*(x[i,t-1] - mu_x[i]);
    }
  }

}

model {
  //Prior parameters 
  real B_beta = 1; 
  real b_mu = 0; 
  real B_mu = 10; 
  real a0 = 10; 
  real b0 = 3; 
  real B_sigma = 1;
  vector[2] null; 
  matrix[2,2] Sigma; //Covariance matrix for observations as function of parameters
  null[1] = 0; 
  null[2] = 0; 


  //Priors
  sigma_h_std ~ gamma(0.5,1/(2*B_sigma)); 
  sigma_x_std ~ gamma(0.5,1/(2*B_sigma));
  mu_x ~ normal(b_mu,B_mu);
  phi_h_std ~ beta(a0,b0); 
  phi_x_std ~ beta(a0,b0); 
  
  beta ~ normal(null,B_beta); 
  
  
  h_std ~ normal(0,1);
  x_std[1,] ~ normal(0,1); 
  x_std[2,] ~ normal(0,1); 
  
  //Contribution from observations
  for(i in 1:n){
    Sigma[1,1] = square(beta[1])*exp(h[i]) + exp(x[1,i]); 
    Sigma[1,2] = beta[1]*beta[2]*exp(h[i]); 
    Sigma[2,2] = square(beta[2])*exp(h[i]) + exp(x[2,i]); 
    Sigma[2,1] = Sigma[1,2]; 
    y[,i] ~ multi_normal(null,Sigma);
  }
}

generated quantities{
    matrix[2,n] y_new; 
    matrix[2,2] Sigma; //Covariance matrix for observations as function of parameters
    vector[2] null; 
    null[1] = 0; 
    null[2] = 0; 
  for(i in 1:n){
    Sigma[1,1] = square(beta[1])*exp(h[i]) + exp(x[1,i]); 
    Sigma[1,2] = beta[1]*beta[2]*exp(h[i]); 
    Sigma[2,2] = square(beta[2])*exp(h[i]) + exp(x[2,i]); 
    Sigma[2,1] = Sigma[1,2];
  y_new[,i] = multi_normal_rng(null,Sigma); 
  }
}

I hope this makes sense. Thanks in advance!

Jens

Looks like your model is multi modal. Step one should be figuring out if both modes are meaningful or if one of them is an artifact. For example in disease transmission models sometimes you get one mode that describes immediate extinction. If you’re interested in the probability that an epidemic starts you can’t ignore it, but if you’re interested in parameter estimates conditional on non-extinction you could (and then fiddle with the model to avoid multi-modality

1 Like

Thanks for the advice. All the parameters (phi_x[2], mu_x[2] and sigma_x[2]) are parameters in an AR(1) process. It seems like the process is collapsing (phi goes to one), so if the model is multi-modal, this mode does not make sense. I've tried different priors for the phi (which is one of the things I want to compare - the effect of different priors on phi). If one wants to ignore it, how does one do that? Just discard the samples where it jumps?

If you have chains that stay in a single mode, you can always run a bunch of chains and discard the ones that end up in the wrong mode (just keep in mind that what remains is a conditional posterior). If individual chains are jumping between modes, I would consider changing the priors. With AR models in particular this is hard because nearly-non-stationary models are very useful. The literature on AR models has some discussion of reparameterizing nearly-non-stationary models so that they are easier to fit, but I don't have references on hand, and last time I tried it I didn't work out anything that was really generically applicable. You might want to look into why you're getting multi-modality if your model is nearly non-stationary.

I get the same behaviour even with very strong priors, and it seems that when phi jumps very close to 1, I get divergences. As I understand from this question https://groups.google.com/forum/#!topic/stan-users/tK4axcrIKig, there is no way to choose a subset of chains?
Could it also be that Stan manages to explore the posterior better than earlier estimation methods, and therefore gets into these "hard" areas?

The strong priors may be your problem, or it may be that you are using strong but heavy-tailed priors - I don't know. It's worth figuring out which set of parameters creates a model that fits the data better (even in-sample).

Thanks for all the advice. I really appreciate it. I apologize for all the basic questions. I'm still very new to Bayesian statistics and HMC in particular, so this is very helpful.

1 Like

Correct. You want to parameterize the model so that you get the same result in multiple chains reliably to rule out the possibility that you’re getting biased estimates from not exploring the posterior thoroughly enough.

You don't want to set the prior parameters in the model block: that makes them variables, and then they require autodiff. Put them in the transformed data block.
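As a minimal sketch using the constants from the model above, that would look like:

```stan
transformed data {
  // Fixed prior constants: treated as data, computed once, no autodiff
  real B_beta = 1;
  real b_mu = 0;
  real B_mu = 10;
  real a0 = 10;
  real b0 = 3;
  real B_sigma = 1;
  vector[2] null = rep_vector(0, 2);
}
```

Quantities declared here are evaluated once before sampling and held fixed, so the sampler never differentiates through them.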

Those beta priors you have aren't particularly strong, if that's what you meant by strong priors. A Beta(10, 3) carries the weight of about 11 prior observations.

You might find it easier to work on the log-odds scale directly, where you can start putting down stronger priors more reasonably. E.g.,

phi_h_log_odds ~ normal(logit(0.3), 1);
phi_h_std = inv_logit(phi_h_log_odds);
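To make that concrete, here is a sketch of how those two lines might fit into the program, with phi_h_log_odds replacing the constrained phi_h_std as the parameter (the normal(logit(0.3), 1) prior is just an example, not a recommendation):

```stan
parameters {
  real phi_h_log_odds;  // unconstrained, on (-inf, inf)
  // ...
}
transformed parameters {
  real<lower=0, upper=1> phi_h_std = inv_logit(phi_h_log_odds);
  real<lower=-1, upper=1> phi_h = 2 * phi_h_std - 1;  // maps back to (-1, 1)
}
model {
  phi_h_log_odds ~ normal(logit(0.3), 1);
  // ...
}
```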

Hello Bob! Thanks for the response.

I have changed the way I handle the prior parameters; they are now a vector in the data block, vector<lower = 0>[6] priors. The prior I am most interested in is the prior on the phi's. Usually in these models one takes (phi + 1)/2 ~ Beta(20, 1.5), but I want to test weaker priors (in the code above it's Beta(10, 3)).
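For reference, a sketch of what I mean (the index positions priors[1] and priors[2] for the Beta shape parameters are just my own convention):

```stan
data {
  vector<lower=0>[6] priors;  // prior hyperparameters passed in as data
}
model {
  // e.g. priors[1] = 20, priors[2] = 1.5 for the usual Beta(20, 1.5),
  // or priors[1] = 10, priors[2] = 3 for the weaker prior above
  phi_h_std ~ beta(priors[1], priors[2]);
}
```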

I'm sorry, but I don't quite understand why you would want to work on the log-odds scale. Do you mean that it is easier to put stronger/weaker priors on that scale rather than on phi_h_std, which is defined on (0,1)?

Yes: there are a lot more choices on the (-infinity, infinity) scale than on (0, 1). It's also easier to generalize if you have predictors or hierarchical structure.
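For example, on the log-odds scale a hierarchical extension is just a normal hierarchy (a hypothetical sketch with K groups, not part of the model above):

```stan
parameters {
  real mu_phi;             // population mean on the log-odds scale
  real<lower=0> tau_phi;   // population scale
  vector[K] phi_log_odds;  // one AR coefficient per group
}
model {
  mu_phi ~ normal(0, 1);
  tau_phi ~ normal(0, 1);  // half-normal via the <lower=0> constraint
  phi_log_odds ~ normal(mu_phi, tau_phi);
}
```

Each inv_logit(phi_log_odds[k]) then lands back in (0, 1), with partial pooling across groups.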