Fitted parameters do not evolve across iterations with self-defined distribution functions

evangeline · June 28, 2021, 1:27pm

I want to define the distribution function by myself for the drift diffusion model, replacing wiener function. But the fitted parameters in the output in the iterations do not change across iterations, namely, the parameters remain the same as the starting parameters. I think the parameters were not fitted at all but do not know why. Does anyone have any idea about what may cause it? Thanks ahead!

Here is the Stan code of the model.

functions {
  real[] analytic_ddm_linbound(real a1, vector b1, real a2, vector b2, real[] tevall){
      vector[size(tevall)] tmp;
      real teval[size(tevall)];
      for(i in 1:size(tevall)){
        if(tevall[i] == 0){
          teval[i] = 1e-30;
        }
        else{
          teval[i] = tevall[i];
          
        }
      }
       tmp = -2 * ((a1 - a2) ./ to_vector(teval) + b1 - b2); 


  int nMax = 100;
  real errbnd = 1e-10;
  vector[size(tevall)] suminc = to_vector(rep_array(0, size(tevall)));
  real checkerr = 0;
  vector[size(tevall)] inc;

  for(n in 0: nMax-1){
    inc = exp(tmp*n*((n+1)*a1-n*a2))*((2*n+1)*a1-2*n*a2)-
              exp(tmp*(n+1)*(n*a1-(n+1)*a2))*((2*n+1)*a1-2*(n+1)*a2);
    suminc = suminc + inc;
    vector[size(tevall)] a;
    for(i in 1:size(tevall)){
      
      if(suminc[i] == 0){
        a[i] = 1e-30;
      }
      else{
        a[i] = suminc[i];
      }
    }
    if(max(to_array_1d(fabs(inc./a))) < errbnd){
      checkerr = checkerr + 1;
      if(checkerr == 3){
        break;
        }
      } 
    else{
      checkerr = 0;
    }
  } // end for

  real dist[size(tevall)] = to_array_1d(exp(-(a1+b1.*to_vector(teval))^2 ./ to_vector(teval) ./ 2) / sqrt(2*pi()) ./ pow(to_vector(teval), 1.5).*to_vector(suminc));
  // /sqrt(2*pi())./pow(to_vector(teval), 1.5).*to_vector(suminc));
  //  /sqrt(2*pi())./pow(to_vector(teval), 1.5).*to_vector(suminc));

  for(i in 1:size(tevall)){
    if(dist[i] < 0 ){
      dist[i] = - dist[i];
    }
  }
  return dist;

  }

  real RT_distribution_lpdf(real[] y, vector delta, real noise, real b, real shift, real tau, real b_slopel, int flag) {
     // delta is drift rate
    // shift is initial bias, if shift is none, b_upper = b_lower = b
    // tau is non-decision time
    // b is half of total bound height
    // b_slope is the coefficient of linear boundary, then the upper boundary is B(t) = b + b_slope*t,
              //and the lower boundary is B(t) = -b - b_slope*t
    real b_lower = 2*b*shift;
    real b_upper = 2*b - b_lower;

    // Scale b, drift, and (implicitly) noise so new noise is 1
    b_lower = b_lower / noise;
    b_upper = b_upper / noise;
    
    real b_slope = b_slopel / noise;
    real prob[size(y)];
    real y_change[size(y)];
    int j = 0;
    int j_change = 0;
    real drift[size(y)];

    for(i in 1:size(y)){
      if((y[i] - tau)*b_slope + b <= 0){
          j += 1;
      }
      else{
        j_change += 1;
        y_change[j_change] = y[i] - tau;
        drift[j_change] = delta[i];
        
        
      }
      }

    real prob_change[size(y) - j];
  
  if(flag == 1){
    prob_change = analytic_ddm_linbound(b_upper, -to_vector(drift[:j_change]) + b_slope, -b_lower, -to_vector(drift[:j_change]) - b_slope, y_change[:j_change]);
  }
  else{
    prob_change = analytic_ddm_linbound(b_lower, to_vector(drift[:j_change]) + b_slope, -b_upper, to_vector(drift[:j_change]) - b_slope, y_change[:j_change]);
  }


    
// where to avoid 0 in y??????????????

    real lprob = sum(log(prob_change));
    print("lprob", lprob);
    return lprob;
}

    real partial_sum(real[,,] all_parameters_sliced, int start, int end, vector Al, matrix Bl, matrix Cl, real[,,] Ul, 
                            vector mu_dl, vector sigma_r, matrix X1l, int Wl, int Xdiml,

                            int[,] idx_rdm_obsl, real[,,] RTu_rdml, real[,,] RTl_rdml, real[,,] Cohu_rdml, real[,,] Cohl_rdml,
                            int[,] Nu_rdml, int[,] Nl_rdml
                            ) 
    
    {

    real lt = 0;

    for (n in 1:(end-start+1)) {

        // unpack parameters
        //itc
        real alpha_rdm_prl[Wl]      =   all_parameters_sliced[n,,1];
        real alpha_rdml[Wl]     =   all_parameters_sliced[n,,2];
        real beta_rdm_prl[Wl]     =   all_parameters_sliced[n,,3];
        real beta_rdml[Wl]        =   all_parameters_sliced[n,,4];
        real delta_rdm_prl[Wl]     =   all_parameters_sliced[n,,5];
        real tau_rdm_prl[Wl]    =   all_parameters_sliced[n,,6];
        real tau_rdml[Wl]       =   all_parameters_sliced[n,,7];
        real bound_slope_prl[Wl]  =  all_parameters_sliced[n,, 8];
        real bound_slope[Wl] = all_parameters_sliced[n,, 9];   

        real Xl[Wl,Xdiml];
        

        int s = start + (n - 1); 
	        for (w in 1:Wl) {
	            
	            if (w == 1) {
                    Xl[w,] = to_array_1d(inv_logit(X1l[s,])); 
                } else {
                    Xl[w,] = to_array_1d(inv_logit(diag_matrix(Al) * to_vector(Xl[w-1,]) + Bl * to_vector(Ul[s,w-1,])));
                }  
          

                lt += normal_lpdf(alpha_rdm_prl[w]    | mu_dl[1] + Cl[1,] * to_vector(Xl[w,]),sigma_r[1]);
                lt += normal_lpdf(beta_rdm_prl[w]   | mu_dl[2] + Cl[2,] * to_vector(Xl[w,]),sigma_r[2]);  
                lt += normal_lpdf(delta_rdm_prl[w]   | mu_dl[3] + Cl[3,] * to_vector(Xl[w,]),sigma_r[3]);  
                lt += normal_lpdf(tau_rdm_prl[w]   | mu_dl[4] + Cl[4,] * to_vector(Xl[w,]),sigma_r[4]); 
                lt += normal_lpdf(bound_slope_prl[w]   | mu_dl[5] + Cl[5,] * to_vector(Xl[w,]),sigma_r[5]); 

            	  if (idx_rdm_obsl[s,w] != 0) {                                    // if week exists

	                vector[Nu_rdml[s,w]] delta_cohu = delta_rdm_prl[w]*to_vector(Cohu_rdml[s,w,:Nu_rdml[s,w]]);
	                vector[Nl_rdml[s,w]] delta_cohl = delta_rdm_prl[w]*to_vector(Cohl_rdml[s,w,:Nl_rdml[s,w]]);

	                lt += RT_distribution_lpdf(RTu_rdml[s,w,:Nu_rdml[s,w]] | delta_cohu, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], -bound_slope[w], 1 );
                    lt += RT_distribution_lpdf(RTl_rdml[s,w,:Nl_rdml[s,w]] | delta_cohl, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], -bound_slope[w], 0 );
	            }                    
 	    }
 }

    return lt;

    }

}


data {

    // Gaussian model
    int W;                                                          // Number of weeks (typically 12)
    int N;                                                          // Number of subjects
    int Xdim;                                                       // Dimension of X - latent low dimensional structure of the phenotype

    // Exogenous variables
    int exo_q_num;                                                  // number of exogenous survey questions
    real U[N,W,exo_q_num];                                       // exogenous survey questions - missing weeks were linearly interpolated outside of Stan 
   
       // Random dot motion
    int<lower=0> idx_rdm_obs[N,W];                            // Indices of weeks WITH data
    int<lower=1> P_rdm;                                             // number of parameters
    int<lower=0> Nu_max_rdm;                            // Max (across subjects) number of upper boundary responses
    int<lower=0> Nl_max_rdm;                        // Max (across subjects) number of lower boundary responses
    int<lower=0> Nu_rdm[N,W];                     // Number of upper boundary responses for each subj
    int<lower=0> Nl_rdm[N,W];                         // Number of lower boundary responses for each subj
    real RTu_rdm[N, W, Nu_max_rdm];                // upper boundary response times
    real RTl_rdm[N, W, Nl_max_rdm];                    // lower boundary response times
    real Cohu_rdm[N, W, Nu_max_rdm];                           // coherence for correct trials
    real Cohl_rdm[N, W, Nl_max_rdm];                   // coherence for incorrect trials
    matrix[N,W] minRT_rdm;                          // minimum RT for each subject of the observed data
    real RTbound_rdm;  

}
transformed data {

    int num_par = P_rdm;
}
parameters {
    vector<lower=0, upper=1>[Xdim] A;
    matrix[Xdim, exo_q_num] B;
    matrix[num_par,Xdim] C;
    matrix[N, Xdim] X1;
    vector[num_par] mu_d;                     // constant offset
    vector<lower=0>[num_par] sigma_r; 

    real alpha_rdm_pr[N,W];              // rdm
    real beta_rdm_pr[N,W];               // rdm
    real delta_rdm_pr[N,W];              // rdm
    real tau_rdm_pr[N,W];                // rdm
    real bound_slope_pr[N,W];
}

transformed parameters {
    real<lower=0> alpha_rdm[N,W];                 
    real<lower=0, upper=1> beta_rdm[N,W];                  
    real<lower=RTbound_rdm, upper=max(minRT_rdm[,])> tau_rdm[N,W];
    real<lower=0> bound_slope[N,W];

    
   
    alpha_rdm = exp(alpha_rdm_pr);                // rdm
    bound_slope = exp(bound_slope_pr);                // rdm

    for (n in 1:N) {
        for (w in 1:W) {
            beta_rdm[n,w] = Phi(beta_rdm_pr[n,w]);                                                       // gng
            tau_rdm[n,w]  = Phi(tau_rdm_pr[n,w]) * (minRT_rdm[n,w] - RTbound_rdm) + RTbound_rdm;     // rdm
        }
    }            
}
model {
       
    A ~ normal(0.5, 0.05);
    to_vector(B) ~ normal(0, 0.05);
    to_vector(C) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    mu_d ~ normal(0, 0.1);
    sigma_r ~ normal(0, 0.1);

    // pack parameters for reduce sum
    real all_parameters[N, W, 9]; 



     for (n in 1:N) {
        for (w in 1:W) {
            all_parameters[n, w, 1] = alpha_rdm_pr[n,w];
            all_parameters[n, w, 2] = alpha_rdm[n,w];
            all_parameters[n, w, 3] = beta_rdm_pr[n,w];
            all_parameters[n, w, 4] = beta_rdm[n,w];
            all_parameters[n, w, 5] = delta_rdm_pr[n,w];
            all_parameters[n, w, 6] = tau_rdm_pr[n,w];
            all_parameters[n, w, 7] = tau_rdm[n,w];
            all_parameters[n, w, 8] = bound_slope_pr[n,w];
            all_parameters[n, w, 9] = bound_slope[n,w];
        }
    }                                      

        target += reduce_sum(partial_sum, all_parameters, 1, A, B, C, U, mu_d, sigma_r, X1, W, Xdim,
                            idx_rdm_obs, RTu_rdm, RTl_rdm, Cohu_rdm, Cohl_rdm, Nu_rdm, Nl_rdm); 
                                       
}

martinmodrak · July 2, 2021, 7:19pm

Hi,
this is a huge model, so it is very hard to debug. It is usually beneficial to start small, so e.g. jsut building a model where you observe a set of y generated by RT_distribution with the same delta, noise, band other parameters... OR maybe even simpler - treat most of the parameters as known and only fit one or two... Does theRT_distribution_lpdf` then let you recover the correct values (from simulated data)?

Generally when the chains don’t move, you also get a bunch of other warnings/errors (divergences, exceptions, …) which cause all transitions to be rejected. In some interfaces, those may not show up unless you run just a single chain - what do you see when you run a single chain? Do you get warnings about divergent transitions/treedepth?

Best of luck with your model!

evangeline · July 3, 2021, 3:47pm

Thanks! Actually, I have used similar models before but there was no problem at that time so I really have no idea how to further inspect it. I have tried running a single chain but there is no warnings or error. And indeed, there is tiny change in the parameters across iterations, but the change is really small. Do you have any idea under what cases such thing will happen?

evangeline · July 7, 2021, 2:06pm

Hi,

After digging into the model, I found that in this equation

                lt += RT_distribution_lpdf(RTu_rdml[s,w,:Nu_rdml[s,w]] | delta_cohu, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], -bound_slope[w], 1 );
                lt += RT_distribution_lpdf(RTl_rdml[s,w,:Nl_rdml[s,w]] | delta_cohl, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], -bound_slope[w], 0 );

If I use +bound_slope, the parameters will evolve well. But if I use -bound_slope, they will not evolve. Hope you have any idea about it ! Thanks again!

kholsinger · July 7, 2021, 6:02pm

Jumping in late and since the model is very complicated, I don’t understand what it’s doing, but I did notice that you have

real<lower=0> bound_slope[N,W];

Does it make any sense for bound_slopel in RT_distribution_lpdf to be negative? If not, it looks to me as if simply using bound_slope in those calls is what you want to do, which should (I think) be equivalent to using +bound_slope.

martinmodrak · July 9, 2021, 6:45pm

I tried looking a bit more into the model and I admit I am not much wiser. I just have no idea what is going on, so can’t help much. I also suspect (though I am not sure) that reduce_sum could hide warnings - what happens if you run the model single-threaded?

There is also a bunch of suspicious stuff in the model - those might and might not be related to the problems you are having.

 for(i in 1:size(tevall)){
        if(tevall[i] == 0){
          teval[i] = 1e-30;
        }
        else{
          teval[i] = tevall[i];
          
        }
      }
       tmp = -2 * ((a1 - a2) ./ to_vector(teval) + b1 - b2);

you are avoiding division by zero, but the quick fix could make the posterior be discontinous and could still behave badly - if tevall[i] can be zero for other reasons than numerical underflow during warmup, then it might be worth investigating why it becomes zero and try to avoid the zero values in the first place as you obviously believe that zeroes are implausible values for tevall.

if((y[i] - tau)*b_slope + b <= 0)

This means that the posterior will not be continuous in b_slope, tau and b (which are all paramters, if I understand the model correctly) - a discontinuous posterior can cause all sorts of issues (including the chains not moving).

inc = exp(tmp*n*((n+1)*a1-n*a2))*((2*n+1)*a1-2*n*a2)-
              exp(tmp*(n+1)*(n*a1-(n+1)*a2))*((2*n+1)*a1-2*(n+1)*a2)

If both arguments are large, the subtraction could result in a catastrophic cancellation. Maybe using log_diff_exp (and then summing the logarithms with log_sum_exp) could be more stable.

Hope that helps at least a bit.

Topic		Replies	Views
Problem of User defined distribution function in Stan Modeling	7	477	June 26, 2021
Fitting generalized logistic function Modeling fitting-issues	8	2930	October 2, 2017
Across trial drift-rate variability in the (hierarchical) drift diffusion model Modeling fitting-issues , cognitive-science	7	1180	February 5, 2019
Parameters during warmup far outside prior range Modeling	16	2104	February 26, 2018
Divergent transitions & BFMI low in a state-space model Modeling	6	1037	August 17, 2017

Fitted parameters do not evolve across iterations with self-defined distribution functions

Related topics