2 equivalent models produce different outputs (LBA model)

Hi,

I’m working on extending a Linear Ballistic Model (LBA) to accomodate trial-level regressors. I am currently running into a strange issue, being that 2 models that should (in my view) be completely equivalent, produce different outputs. I have no idea why this is the case and feel like I might be missing something obvious. Hopefuly someone could help and clarify this issue.

Starting with the original, fully-functional model, focusing on a particular parameter of the model, tau:

parameters {
...
 real<lower=0> mu_tau[N_cond]; // group-level parameter
real<lower=0> sigma_tau[N_cond];
 real<lower=0> tau[N, N_cond]; //individual parameter
...
}
model {
...
for (j in 1:N_cond) {
     mu_tau[j] ~ normal(1, .5)T[0,];
     sigma_tau[j] ~ gamma (1,1)
    }

for (i in 1:N) {
    for (j in 1:N_cond) {
          tau[i,j] ~ normal(mu_tau[j], sigma_tau[j])T[0,]; 
 // Likelihood 
         RT[i, j, , 1:n_trials] ~ lba(B, A[i], v[i, j,], s[i], tau[i,j]);
         }
     }
}

The likelihood function looks like this:

real lba_lpdf(matrix RT, real B, real A, vector v, real s, **real tau**) {

    real t;
    real b;
    real cdf;
    real pdf;
    vector[cols(RT)] prob;
    real out;
    real prob_neg;

    b = B;
    for (i in 1:cols(RT)) {
      **t = RT[1, i] - tau;** **//only relevant part for our issue**
      if (t > 0) {
        cdf = 1;
        for (j in 1:num_elements(v)) {
          if (RT[2, i] == j) {
            pdf = lba_pdf(t, b, A, v[j], s);
          } else {
            cdf = lba_cdf(t, b, A, v[j], s) * cdf;
          }
        }
        prob_neg = 1;
        for (j in 1:num_elements(v)) {
          prob_neg = Phi(-v[j]/s) * prob_neg;
        }
        prob[i] = pdf * (1-cdf);
        prob[i] = prob[i]/(1-prob_neg);
        if (prob[i] < 1e-10) {
          prob[i] = 1e-10;
        }

      } else {
        prob[i] = 1e-10;
      }
    }
    out = sum(log(prob));
    return out;
  }

Now, let’s say I want the value of tau to vary per trial, as a linear function of a different variable. To start, I’ve set up a model with an additional parameter taut, which I want to use to substitute tau with. For testing purposes, taut just takes on the values of individual tau values. It is defined like follows:

transformed parameters {
real taut [N, N_cond, Max_tr];

for (i in 1:N) {
    for (j in 1:N_cond) {
      int n_trials;
      n_trials = N_tr_cond[i, j];
      for (t in 1:n_trials) {
          **taut[i,j,t] = tau[i,j];** 
        }
      for (tX in n_trials:Max_tr) {
        taut[i,j,tX] = -2; // this is kind of ugly, but I since the number of trials can vary per //condition it just serves as a meaningless placeholder, just not to have NaNs in the output
      }
    }
  }
}

All I’ve done here is substitute a single real value of tau with an array of this value repeated t times (no. of trials).
The likelihood now changes to (changes bolded):

for (i in 1:N) {
    for (j in 1:N_cond) {
        RT[i, j, , 1:n_trials] ~ lba(B, A[i], v[i, j,], s[i], **taut[i,j,1:n_trials]**);
     }
}

And the likelihood function, correspondingly, to:

real lba_lpdf(matrix RT, real B, real A, vector v, real s, **real [] taut**) {

    real t;
    real b;
    real cdf;
    real pdf;
    vector[cols(RT)] prob;
    real out;
    real prob_neg;

    b = B;
    for (i in 1:cols(RT)) {
      
      **t = RT[1, i] - taut[i]**;
      if ( t> 0) ) { 
        cdf = 1;
        for (j in 1:num_elements(v)) {
          if (RT[2, i] == j) {
            pdf = lba_pdf(t, b, A, v[j], s);
          } else {
            cdf = lba_cdf(t, b, A, v[j], s) * cdf;
          }
        }
        prob_neg = 1;
        for (j in 1:num_elements(v)) {
          prob_neg = Phi(-v[j]/s) * prob_neg;
        }
        prob[i] = pdf * (1-cdf);
        prob[i] = prob[i]/(1-prob_neg);
        if (prob[i] < 1e-10) {
          prob[i] = 1e-10;
        }

        } else {
          prob[i] = 1e-10;
        }
      
    }
    out = sum(log(prob));
    return out;
  }

In case 1, I input a single real value of tau to the likelihood function, which is then reused for each iteration. In case 2, I input an array which should consist of the same value repeated by the number of trials. So it still should be the same value for each iteration.

From my understanding, these models should perform basically identically. However, when I run these models, the first does well, while the second one can never converge and usually has difficulties in finding more than 1 efficient sample per parameter. Can you help me understand what am I missing?

Best,
Wojciech

Could it be that here:

      for (t in 1:n_trials) {
          **taut[i,j,t] = tau[i,j];** 
        }
      for (tX in n_trials:Max_tr) {
        taut[i,j,tX] = -2; // this is kind of ugly, but I since the number of trials can vary per //condition it just serves as a meaningless placeholder, just not to have NaNs in the output
      }

You set the element taut[i, j, n_trials] twice? Stan loops include their endpoints, so you overwrite one of the taut values you wanted I think.

Ah - that’s a good catch, thank you. I’ve fixed that now. Unfortunately, it doesn’t solve the issue. The 2nd model is still unable to converge

Oh wow, sorry for the delay. I didn’t get a discourse notification for this.

If that didn’t fix it, the first thing I’d do would be set the elements of tau to be a constant (like 1.798 or some distinct, easy to identify, reasonable floating point number you know) and then add a print here:

for (i in 1:cols(RT)) {
  **t = RT[1, i] - taut[i]**;
  print(i, taut[i]);
  ...
}

And make sure some craziness isn’t coming out.

It’s really crude debugging, but there must be something different going on somewhere. If it’s too confusing to parse the output, you could just dump it in a file, make the same modifications for the case where tau is just 2 dimensions (instead of 3), and then diff the two files (assuming the file size doesn’t blow up too badly :D).

Thanks for that. I was actually unaware that you can use print statements in STAN, this makes life a lot easier :)

Regarding the issue, I have a hypothesis why this doesn’t work. I have only a very crude understanding of the way STAN computes the posterior densities, so I might be completely off, but this is my idea:
Since Model B introduces the trial-level variable taut, the level of parameters to estimate goes from 239 to 5239. Even though all values of taut are the same, I assume STAN still is trying to compute the posterior as if each trial value was a separate parameter, which makes the task virtually impossible. The way to circumvent this is to calculate the value of taut as a local variable. Let me know if my reasoning here is correct.

That would be a problem, but if taut is declared as a transformed parameter then it is just like a local variable. Only things in the parameters block are actually sampled.

Yeah I didn’t know it for a long while either haha. There should be a solution to this problem, keep digging :D.

Well then I have no clue why it behaves strange. Printing the values doesn’t reveal any weirdness going on. The next step would probably be to investigate the log sums of likelihood per iteration, but since I found a way around this, I will not pursue this further.

An effective work-around is to define taut as a local variable inside the lba_lpdf function:

vector[cols(RT)] taut;
taut = rep_vector(tau, cols(RT));

The rest works as in Model 2, so we simply subtract each taut value from trial RT:

for (i in 1:cols(RT)) {
   t = RT[1, i] - taut[i];

This method scales nicely to the actual issue, that is defining trial-level linear relationship. This can be done by defining taut as a variable outside of the main loop of the function and then filling it with needed values:

real taut;
 for (i in 1:cols(RT)) {
      taut = tau + beta * RT[3,i]; // tau acts as intercept, beta is the estimated coefficient, and RT[3,i] is the trial-lvl regressor

I guess it doesn’t really matter if I define a full-lentgh array/vector and then fill it with values or just define a single value which is overwriten each iteration, but my intuition tells me the second method should be a bit faster.

Since I only really care about the betas and intercepts, and not single-trial values of the regression (I just need to make sure they aren’t impossible, which I can do by assigning low probabilities to unrealistic values), this solution seems to do the job quite nicely.

1 Like