Model Parameters highly correlated and do not converge well


Hello, I am trying to fit a dual-rate motor adaptation model using rstan. The model has 6 parameters per subject: retention rates Af and As, learning rates Bf and Bs, and the standard deviations of the state noise and the execution noise.
It also has 2 hidden states, a fast state and a slow state. The constraints are As > Af and Bf > Bs, with Af, As, Bf, Bs all in (0, 1). The noise terms are assumed to be Gaussian, centered at 0, with standard deviation > 0.
y is the observed data, p - y is the error term that drives learning, and nofeedback is just an indicator for whether the error is present on a given trial.
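
In equations (this is just the update rule implemented in the code below, writing e(n) = p(n) - y(n) for the error and dropping the no-feedback indicator):

$$x_f(n) = A_f \, x_f(n-1) + B_f \, e(n-1), \qquad x_s(n) = A_s \, x_s(n-1) + B_s \, e(n-1),$$

$$y(n) \sim \mathcal{N}\!\left(x_f(n) + x_s(n),\ \sqrt{\sigma_{\text{execute}}^2 + \sigma_{\text{state}}^2}\right).$$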

Here is my model:

data {
  int<lower=1> NSubjects;
  int<lower=1> TrialEnd[NSubjects];
  int<lower=1> max_length;
  matrix[max_length, NSubjects] y;
  matrix[max_length, NSubjects] p;
  matrix[max_length, NSubjects] nofeedback;
}

parameters {
  vector<lower=0, upper=1>[NSubjects] Af;
  vector<lower=0, upper=1>[NSubjects] Bs;
  vector<lower=0, upper=1>[NSubjects] delta_A; // Ensuring a minimum difference
  vector<lower=0, upper=1>[NSubjects] delta_B; // Ensuring a minimum difference
  vector<lower=0>[NSubjects] execute_noise;
  vector<lower=0>[NSubjects] state_noise;
}

transformed parameters {
  vector<lower=0, upper=1>[NSubjects] As;
  vector<lower=0, upper=1>[NSubjects] Bf;

  for (i in 1:NSubjects) {
    As[i] = Af[i] + delta_A[i]; // As is now guaranteed to be larger than Af
    Bf[i] = Bs[i] + delta_B[i]; // Bf is now guaranteed to be larger than Bs
  }
}

model {
  //  prior
  Af ~ normal(0.7, 0.1);
  Bs ~ normal(0.1, 0.1);
  delta_A ~ normal(0.2, 0.05); 
  delta_B ~ normal(0.2, 0.05); 
  execute_noise ~ cauchy(0, 2); 
  state_noise ~ cauchy(0, 2); 

  for (j in 1:NSubjects) {
    vector[max_length] fast;
    vector[max_length] slow;
    fast[1] = 0;
    slow[1] = 0;

    for (n in 2:TrialEnd[j]) {
      fast[n] = Af[j] * fast[n-1] + Bf[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
      slow[n] = As[j] * slow[n-1] + Bs[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
      y[n,j] ~ normal(fast[n] + slow[n], sqrt(execute_noise[j]^2 + state_noise[j]^2));
    }
  }
}

The problem is that the parameters don't converge well (no Rhat <= 1.05), and I also see high correlation between Af and As, and between Bf and Bs (this could be caused by the way I parameterize the constraint). Is there any way to make the model more robust? Thank you!!

Sometimes when models don't fit it's because the model doesn't capture the data well. You can often see this with probability mass piling up against constraints. I would suggest looking at variables like delta_A and delta_B, which have upper bounds, and checking their posterior values to make sure the priors are consistent with the fit.

I didn’t understand the parameterization here (I rewrote the transformed parameters into a one-liner with the same behavior as the original):

parameters {
  vector<lower=0, upper=1>[NSubjects] Af;
  vector<lower=0, upper=1>[NSubjects] Bs;
  vector<lower=0, upper=1>[NSubjects] delta_A; // Ensuring a minimum difference
  vector<lower=0, upper=1>[NSubjects] delta_B; // Ensuring a minimum difference
  ...
transformed parameters {
  vector<lower=0, upper=1>[NSubjects] As = Af + delta_A;
  vector<lower=0, upper=1>[NSubjects] Bf = Bs + delta_B;

I’m not sure if your doc or intent is wrong here. There’s no minimum difference between As and Af, because delta_A has a lower bound of 0. It will ensure there’s a non-zero difference as long as there’s no underflow in delta_A. Instead of this parameterization, I think it’d be clearer to write it this way:

parameters {
  vector<lower=0, upper=1>[NSubjects] Af;
  vector<lower=0, upper=1>[NSubjects] Bs;
  vector<lower=Af, upper=Af + 1>[NSubjects] As; // Ensuring a minimum difference
  vector<lower=Bs, upper=Bs + 1>[NSubjects] Bf; // Ensuring a minimum difference

Here the ordering is reversed between the two pairs (Bf > Bs, but As > Af). Was that intentional?
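
If the intent is also to keep As and Bf inside (0, 1), as the description at the top says, a sketch of that variant would tighten the upper bounds instead (note this is not the same behavior as the delta parameterization above):

parameters {
  vector<lower=0, upper=1>[NSubjects] Af;
  vector<lower=0, upper=1>[NSubjects] Bs;
  vector<lower=Af, upper=1>[NSubjects] As; // As > Af and As < 1
  vector<lower=Bs, upper=1>[NSubjects] Bf; // Bf > Bs and Bf < 1
  ...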

I also wasn’t sure what the intent was here:

  fast[n] = Af[j] * fast[n-1] + Bf[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
  slow[n] = As[j] * slow[n-1] + Bs[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
  y[n,j] ~ normal(fast[n] + slow[n], sqrt(execute_noise[j]^2 + state_noise[j]^2));

This is probably going to be very poorly identified, given the form. There are a lot of ways to make fast[n] bigger or smaller, and the same for slow[n], and then the normal distribution just combines them. So you are probably going to need to either reparameterize or find some other way to make the problem more identifiable.
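
One concrete instance you can see directly in the code: execute_noise and state_noise only enter the likelihood through sqrt(execute_noise^2 + state_noise^2) and have identical priors, so the data can’t separate them, at least as coded here. A minimal sketch of collapsing them into a single observation scale (obs_noise is my name, not from your model):

parameters {
  vector<lower=0>[NSubjects] obs_noise; // replaces execute_noise and state_noise
}
model {
  obs_noise ~ cauchy(0, 2);
  // ... same state updates as before ...
  y[n, j] ~ normal(fast[n] + slow[n], obs_noise[j]);
}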

You can get a substantial speedup in the final loop by pulling the sampling statement out of the loop and vectorizing it; the state updates themselves have to stay inside the loop because each trial depends on the previous one. That is, convert this

for (n in 2:TrialEnd[j]) {
  fast[n] = Af[j] * fast[n-1] + Bf[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
  slow[n] = As[j] * slow[n-1] + Bs[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
  y[n,j] ~ normal(fast[n] + slow[n], sqrt(execute_noise[j]^2 + state_noise[j]^2));
}

to

for (n in 2:TrialEnd[j]) {
  fast[n] = Af[j] * fast[n-1] + Bf[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
  slow[n] = As[j] * slow[n-1] + Bs[j] * (p[n-1,j] - y[n-1,j]) * (1 - nofeedback[n-1,j]);
}
// one vectorized sampling statement per subject; hypot(a, b) is sqrt(a^2 + b^2)
y[2:TrialEnd[j], j] ~ normal(fast[2:TrialEnd[j]] + slow[2:TrialEnd[j]], hypot(execute_noise[j], state_noise[j]));

Hi Bob, thank you so much! Yes, Bf > Bs and Af < As are correct by the model definition, and the states are updated iteratively. The model also says the observation is the sum of the fast and slow states with state noise plus execution noise, which is part of the definition (as on pages 5-6 here: https://www.nature.com/articles/s41598-022-13866-y). So if I’m implementing this model correctly, then the poor identifiability problem may be a result of the model itself?

Yes. I often run into this problem with models from the wild that I code in Stan. HMC is much better at diagnosing problems than something like Gibbs or Metropolis because it’s faster to explore. I’m often surprised that models don’t replicate after coding them in Stan, and the problem isn’t on Stan’s side; it’s highlighting a problem in the originally published fit.

I’ll try to take a look at that model and see if the Stan code matches, but I won’t be able to get to that until next week.