Problem of User defined distribution function in Stan

evangeline · June 7, 2021, 11:23am

I want to define the distribution function by myself for the drift diffusion model, replacing wiener function. But after I print the probability calculated in my model, I found it really small, around 1e-13. I wonder if there is anything wrong with my definition of probability distribution function, especially whether I should return the sum of the log of each probabilities. Hope someone could help me! Thanks ahead!

Here is my code:

functions {
  real[] analytic_ddm_linbound(real a1, vector b1, real a2, vector b2, real[] tevall){
      vector[size(tevall)] tmp;
      real teval[size(tevall)];
      for(i in 1:size(tevall)){
        if(tevall[i] == 0){
          teval[i] = 1e-30;
        }
        else{
          teval[i] = tevall[i];
          
        }
      }
       tmp = -2 * ((a1 - a2) ./ to_vector(teval) + b1 - b2); 


  int nMax = 100;
  real errbnd = 1e-10;
  vector[size(tevall)] suminc = to_vector(rep_array(0, size(tevall)));
  real checkerr = 0;
  vector[size(tevall)] inc;

  for(n in 1: nMax){
    inc = exp(tmp*n*((n+1)*a1-n*a2))*((2*n+1)*a1-2*n*a2)-
              exp(tmp*(n+1)*(n*a1-(n+1)*a2))*((2*n+1)*a1-2*(n+1)*a2);
    suminc = suminc + inc;
    vector[size(tevall)] a;
    for(i in 1:size(tevall)){
      
      if(suminc[i] == 0){
        a[i] = 1e-30;
      }
      else{
        a[i] = suminc[i];
      }
    }
    if(max(to_array_1d(fabs(inc./a))) < errbnd){
      checkerr = checkerr + 1;
      if(checkerr == 3){
        break;
        }
      } 
    else{
      checkerr = 0;
    }
  } // end for

  real dist[size(tevall)] = to_array_1d(exp(-(a1+b1.*to_vector(teval))^2 ./ to_vector(teval) ./ 2) / sqrt(2*pi()) ./ pow(to_vector(teval), 1.5).*to_vector(suminc));
  // /sqrt(2*pi())./pow(to_vector(teval), 1.5).*to_vector(suminc));
  //  /sqrt(2*pi())./pow(to_vector(teval), 1.5).*to_vector(suminc));

  for(i in 1:size(tevall)){
    if(dist[i] < 0 ){
      dist[i] = - dist[i];
    }
  }
  return dist;

  }

  real RT_distribution_lpdf(real[] y, vector delta, real noise, real b, real shift, real tau, real b_slopel, int flag) {
     // delta is drift rate
    // shift is initial bias, if shift is none, b_upper = b_lower = b
    // tau is non-decision time
    // b is half of total bound height
    // b_slope is the coefficient of linear boundary, then the upper boundary is B(t) = b + b_slope*t,
              //and the lower boundary is B(t) = -b - b_slope*t
    real b_lower = 2*b*shift;
    real b_upper = 2*b - b_lower;

    // Scale b, drift, and (implicitly) noise so new noise is 1
    b_lower = b_lower / noise;
    b_upper = b_upper / noise;
    
    real b_slope = b_slopel / noise;
    real prob[size(y)];
    real y_change[size(y)];
    int j = 0;
    int j_change = 0;
    real drift[size(y)];

    for(i in 1:size(y)){
      if((y[i] < tau) || (y[i]*b_slope + b <= 0)){
          j += 1;
      }
      else{
        j_change += 1;
        y_change[j_change] = y[i];
        drift[j_change] = delta[i];
        
        
      }
      }

    real prob_change[size(y) - j];
  
  if(flag == 1){
    prob_change = analytic_ddm_linbound(b_upper, -to_vector(drift[:j_change]) + b_slope, -b_lower, -to_vector(drift[:j_change]) - b_slope, y_change[:j_change]);
  }
  else{
    prob_change = analytic_ddm_linbound(b_lower, to_vector(drift[:j_change]) + b_slope, -b_upper, to_vector(drift[:j_change]) - b_slope, y_change[:j_change]);
  }


    
// where to avoid 0 in y?????????????? 

    real lprob = sum(log(to_vector(prob_change) * 10^9));
    print("prob_change", prob_change);
    print("lprob", lprob);
    return lprob;
}

    real partial_sum(real[,,] all_parameters_sliced, int start, int end, 
    //vector Al, matrix Bl, matrix Cl, real[,,] Ul, 
                            //vector mu_dl, vector sigma_r, matrix X1l, 
                            int Wl, 
                            //int Xdiml,

                            int[,] idx_rdm_obsl, real[,,] RTu_rdml, real[,,] RTl_rdml, real[,,] Cohu_rdml, real[,,] Cohl_rdml,
                            int[,] Nu_rdml, int[,] Nl_rdml
                            ) 
    
    {

    real lt = 0;

    for (n in 1:(end-start+1)) {

        // unpack parameters
        //itc
        real alpha_rdm_prl[Wl]      =   all_parameters_sliced[n,,1];
        real alpha_rdml[Wl]     =   all_parameters_sliced[n,,2];
        real beta_rdm_prl[Wl]     =   all_parameters_sliced[n,,3];
        real beta_rdml[Wl]        =   all_parameters_sliced[n,,4];
        real delta_rdm_prl[Wl]     =   all_parameters_sliced[n,,5];
        real tau_rdm_prl[Wl]    =   all_parameters_sliced[n,,6];
        real tau_rdml[Wl]       =   all_parameters_sliced[n,,7];
        real bound_slope_prl[Wl]  =  all_parameters_sliced[n,, 8];
        real bound_slope[Wl] = all_parameters_sliced[n,, 9];   

        //real Xl[Wl,Xdiml];
        

        int s = start + (n - 1); 
	        for (w in 1:Wl) {
	              
          

                //lt += normal_lpdf(alpha_rdm_prl[w]    | mu_dl[1] + Cl[1,] * to_vector(Xl[w,]),sigma_r[1]);
                //lt += normal_lpdf(beta_rdm_prl[w]   | mu_dl[2] + Cl[2,] * to_vector(Xl[w,]),sigma_r[2]);  
                //lt += normal_lpdf(delta_rdm_prl[w]   | mu_dl[3] + Cl[3,] * to_vector(Xl[w,]),sigma_r[3]);  
                //lt += normal_lpdf(tau_rdm_prl[w]   | mu_dl[4] + Cl[4,] * to_vector(Xl[w,]),sigma_r[4]); 
                //lt += normal_lpdf(bound_slope_prl[w]   | mu_dl[5] + Cl[5,] * to_vector(Xl[w,]),sigma_r[5]); 

                lt += uniform_lpdf(alpha_rdm_prl[w] | -100, 100);
                lt += uniform_lpdf(beta_rdm_prl[w] | -100, 100);
                lt += uniform_lpdf(delta_rdm_prl[w] | -100, 100);
                lt += uniform_lpdf(tau_rdm_prl[w] | -100, 100);
                lt += uniform_lpdf(bound_slope_prl[w] | -100, 100);

            	  if (idx_rdm_obsl[s,w] != 0) {                                    // if week exists

	                vector[Nu_rdml[s,w]] delta_cohu = delta_rdm_prl[w]*to_vector(Cohu_rdml[s,w,:Nu_rdml[s,w]]);
	                vector[Nl_rdml[s,w]] delta_cohl = delta_rdm_prl[w]*to_vector(Cohl_rdml[s,w,:Nl_rdml[s,w]]);

	                lt += RT_distribution_lpdf(RTu_rdml[s,w,:Nu_rdml[s,w]] | delta_cohu, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], bound_slope[w], 1 );
                    lt += RT_distribution_lpdf(RTl_rdml[s,w,:Nl_rdml[s,w]] | delta_cohl, 1.0, alpha_rdml[w], beta_rdml[w], tau_rdml[w], bound_slope[w], 0 );
	            }                    
 	    }
 }

    return lt;

    }

}


data {

    // Gaussian model
    int W;                                                          // Number of weeks (typically 12)
    int N;                                                          // Number of subjects
    int Xdim;                                                       // Dimension of X - latent low dimensional structure of the phenotype

    // Exogenous variables
    int exo_q_num;                                                  // number of exogenous survey questions
    real U[N,W,exo_q_num];                                       // exogenous survey questions - missing weeks were linearly interpolated outside of Stan 
   
       // Random dot motion
    int<lower=0> idx_rdm_obs[N,W];                            // Indices of weeks WITH data
    int<lower=1> P_rdm;                                             // number of parameters
    int<lower=0> Nu_max_rdm;                            // Max (across subjects) number of upper boundary responses
    int<lower=0> Nl_max_rdm;                        // Max (across subjects) number of lower boundary responses
    int<lower=0> Nu_rdm[N,W];                     // Number of upper boundary responses for each subj
    int<lower=0> Nl_rdm[N,W];                         // Number of lower boundary responses for each subj
    real RTu_rdm[N, W, Nu_max_rdm];                // upper boundary response times
    real RTl_rdm[N, W, Nl_max_rdm];                    // lower boundary response times
    real Cohu_rdm[N, W, Nu_max_rdm];                           // coherence for correct trials
    real Cohl_rdm[N, W, Nl_max_rdm];                   // coherence for incorrect trials
    matrix[N,W] minRT_rdm;                          // minimum RT for each subject of the observed data
    real RTbound_rdm;  

}
transformed data {

    int num_par = P_rdm;
}
parameters {
    //vector<lower=0, upper=1>[Xdim] A;
    //matrix[Xdim, exo_q_num] B;
    //matrix[num_par,Xdim] C;
    //matrix[N, Xdim] X1;
    //vector[num_par] mu_d;                     // constant offset
    //vector<lower=0>[num_par] sigma_r; 

    real alpha_rdm_pr[N,W];              // rdm
    real beta_rdm_pr[N,W];               // rdm
    real delta_rdm_pr[N,W];              // rdm
    real tau_rdm_pr[N,W];                // rdm
    real bound_slope_pr[N,W];
}

transformed parameters {
    real<lower=0> alpha_rdm[N,W];                 
    real<lower=0, upper=1> beta_rdm[N,W];                  
    real<lower=RTbound_rdm, upper=max(minRT_rdm[,])> tau_rdm[N,W];
    real<lower=0> bound_slope[N,W];

    
   
    alpha_rdm = exp(alpha_rdm_pr);                // rdm
    bound_slope = exp(bound_slope_pr);                // rdm

    for (n in 1:N) {
        for (w in 1:W) {
            beta_rdm[n,w] = Phi(beta_rdm_pr[n,w]);                                                       // gng
            tau_rdm[n,w]  = Phi(tau_rdm_pr[n,w]) * (minRT_rdm[n,w] - RTbound_rdm) + RTbound_rdm;     // rdm
        }
    }            
}
model {
       
    //A ~ normal(0.5, 0.05);
    //to_vector(B) ~ normal(0, 0.05);
    //to_vector(C) ~ normal(0, 0.05);
    //to_vector(X1) ~ normal(0, 0.1);
    //mu_d ~ normal(0, 0.1);
    //sigma_r ~ normal(0, 0.1);

    // pack parameters for reduce sum
    real all_parameters[N, W, 9]; 



     for (n in 1:N) {
        for (w in 1:W) {
            all_parameters[n, w, 1] = alpha_rdm_pr[n,w];
            all_parameters[n, w, 2] = alpha_rdm[n,w];
            all_parameters[n, w, 3] = beta_rdm_pr[n,w];
            all_parameters[n, w, 4] = beta_rdm[n,w];
            all_parameters[n, w, 5] = delta_rdm_pr[n,w];
            all_parameters[n, w, 6] = tau_rdm_pr[n,w];
            all_parameters[n, w, 7] = tau_rdm[n,w];
            all_parameters[n, w, 8] = bound_slope_pr[n,w];
            all_parameters[n, w, 9] = bound_slope[n,w];
        }
    }                                      

        target += reduce_sum(partial_sum, all_parameters, 1, 
        //A, B, C, U, mu_d, sigma_r, X1, 
        W, 
        //Xdim,
                            idx_rdm_obs, RTu_rdm, RTl_rdm, Cohu_rdm, Cohl_rdm, Nu_rdm, Nl_rdm); 
                                       
}

maxbiostat · June 7, 2021, 11:32am

Hi,

It would be helpful if you could write down the exact density you’re trying to implement so we can have a better idea. As it stands, the code is much too complicated to easily debug.

evangeline · June 8, 2021, 5:30am

Hi,

Sorry that this model is actually really difficult and hard to directly write it down. The main point is that I am referring the python version of this model here PyDDM/analytic.py at master · mwshinn/PyDDM · GitHub. And the original theorem of this model is here. file:///Users/chenwenqi/Downloads/1177705996.pdf

Thank you for your help! Hope it helps!

evangeline · June 8, 2021, 5:33am

I think my main problem is whether the definition of the probability distribution is correct. As you can see, I currently calculate the probability of one set of parameters and its variable as p = f(y | parameters) and return sum(log(p)) as the probability function. I wonder it is absolutely correct as the other distribution functions in Stan. Thanks!

maxbiostat · June 8, 2021, 10:28am

Right. The first thing to do is to numerically stabilise the code. For instance, this line

could be written as

log_inc =  log_diff_exp( tmp*n*((n+1)*a1-n*a2))*((2*n+1)*a1-2*n*a2, tmp*(n+1)*(n*a1-(n+1)*a2))*((2*n+1)*a1-2*(n+1)*a2);

and then

suminc = exp(log_sum_exp(log_inc));

or, even better, you could keep everything in log space until it is absolutely necessary to map back to the original space.

So my first advice is to go through your code and identify all of the points where you could be getting under/overflows and try to re-write those in a numerically stable way.

evangeline · June 8, 2021, 11:34am

Thanks! Will try that! Another question is how to deal with the scenario where the probability is 0 or close to 0. This will cause the log(probability) to be -inf, which will be harmful for the model. But I can also not disregard it at all because it will indicate that the loglikelihood will be small. Thanks ahead!

maxbiostat · June 8, 2021, 11:45am

That depends on whether the probability is exactly zero or just very small. If it’s the latter case, then all you need to do is make sure the probabilities are being accurately computed, i.e., not underflowing. If it’s the former, then something must necessarily be wrong with the model, since it gives zero probability to things that are actually observed.

evangeline · June 26, 2021, 5:12am

Thanks! I think I got it. Another problem is that the result of the model did not evolve, namely, for every iteration, the parameters remain the same as the starting parameters. I think the parameters were not fitted at all but do not know why. Do you have any idea about what may cause it? Thank you so much!

Topic		Replies	Views
Error message: Random variable[1] is nan, but must not be nan! Modeling	7	1921	December 15, 2019
User defined function syntax error: No matches for Modeling	2	1594	December 11, 2019
User defined probability function involving an integral Modeling	2	373	May 23, 2019
What's wrong with my stan code for estimating parameters with multivariate distribution Modeling	2	419	July 31, 2019
Case study: Extreme value analysis and user defined probability functions in Stan Modeling	4	2301	November 15, 2017

Problem of User defined distribution function in Stan

Related topics