Reparameterize model when loop is used

I want to reparameterize the model below. The key quantities that I think need to be reparameterized are itc_k_pr and itc_beta_pr, but they depend on X, which is updated in a loop in the model block. So my question is how to reparameterize them. If I follow the general instructions and put them in the transformed parameters block, I think itc_k_pr and itc_beta_pr will be sampled with a fixed X. So maybe I need to put them in the model block instead, but I wonder whether that will work.

Here is the original code; I have deleted some parts that are unnecessary for understanding the structure.

parameters {
    real itc_k_pr[N,W];                  // itc
    real itc_beta_pr[N,W];               // itc

    real<lower=-1, upper=1> C[2,Xdim];
    matrix[N, Xdim] X1;
    real<lower=0, upper=1> eta[N]; //eta
    matrix[Xdim, exo_q_num] B;
    vector[2] mu;
    vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {
  
    real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
    real<lower=0> itc_beta[N,W];
   


    itc_k = exp(itc_k_pr);                        // itc    
    itc_beta = exp(itc_beta_pr);                // itc
}

model {
       
    to_vector(to_matrix(C)) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    eta ~ normal(0.01, 0.1);// prior for eta
    to_vector(B) ~ normal(0, 0.05);
    mu ~ normal(0, 0.1);
    sigma_r ~ normal(0, 0.1);
    real X[N,W,Xdim];
    real itc_grad_sum[N, W, 2];
    
     for (s in 1:N) {
        real reward_sum = 0;                
        for (w in 1:W) {
            reward_sum += reward[s, w];
            matrix[2, Tr_itc[s,w]] itc_grad ;
            
            if (w == 1) {
                X[s,w,] = to_array_1d(inv_logit(X1[s,]));
            } else {
      
                vector[Xdim] a = eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1]);
                vector[Xdim] b = to_vector(X[s,w-1,]);
                X[s,w,] = to_array_1d( b + a); // 
    
  
            }  
            itc_k_pr[s,w] ~    normal(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]),sigma_r[1] );
            itc_beta_pr[s,w] ~ normal(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]),sigma_r[2] );  

            vector[Tr_itc[s,w]] ev_later   = xxxxxx
 	        vector[Tr_itc[s,w]] ev_sooner  = xxxxxx
 	        choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner)); 
            vector[Tr_itc[s,w]] itc_p = xxxx
            vector[Tr_itc[s,w]] log_likelihood_p_grad = xxxxx
            
            itc_grad = xxxx
              itc_grad_sum[s, w, ] =  xxxx
          
            
            
 	    }  
    }                                     
} 


Below is the reparameterized code that I think may work. But I am not sure about it and wonder how it should be revised so that the reparameterization is done correctly.

parameters {
    real itc_k_pr[N,W];                  // itc
    real itc_beta_pr[N,W];               // itc

    real itc_k_pr_standard[N,W];                  // itc
    real itc_beta_pr_standard[N,W];               // itc
    real<lower=-1, upper=1> C[2,Xdim];
    matrix[N, Xdim] X1;
    real<lower=0, upper=1> eta[N]; //eta
    matrix[Xdim, exo_q_num] B;
    vector[2] mu;
    vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {
    //real itc_k_pr[N,W];                  // itc
    //real itc_beta_pr[N,W];               // itc
    real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
    real<lower=0> itc_beta[N,W];
    


    itc_k = exp(itc_k_pr);                        // itc    
    itc_beta = exp(itc_beta_pr);                // itc
}

model {
       
    to_vector(to_matrix(C)) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    eta ~ normal(0.01, 0.1);// prior for eta
    to_vector(B) ~ normal(0, 0.05);
    mu ~ normal(0, 0.1);
    sigma_r ~ normal(0, 0.1);
    real X[N,W,Xdim];
    real itc_grad_sum[N, W, 2];
    
     for (s in 1:N) {
        real reward_sum = 0;                
        for (w in 1:W) {
            reward_sum += reward[s, w];
            matrix[2, Tr_itc[s,w]] itc_grad ;
            
            if (w == 1) {
                X[s,w,] = to_array_1d(inv_logit(X1[s,]));
            } else {
      
                vector[Xdim] a = eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1]);
                vector[Xdim] b = to_vector(X[s,w-1,]);
                X[s,w,] = to_array_1d( b + a); // 
    
  
            }  
           itc_k_pr_standard[s,w] ~ std_normal();
           itc_k_pr_standard[s,w] ~ std_normal();
           itc_k_pr[s,w] =  mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_standard[s,w];
           itc_beta_pr[s,w] =  mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_standard[s,w];

            vector[Tr_itc[s,w]] ev_later   = xxxxxx
 	        vector[Tr_itc[s,w]] ev_sooner  = xxxxxx
 	        choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner)); 
            vector[Tr_itc[s,w]] itc_p = xxxx
            vector[Tr_itc[s,w]] log_likelihood_p_grad = xxxxx
            
            itc_grad = xxxx
              itc_grad_sum[s, w, ] =  xxxx
        }
    }
}

It looks like you’re doing a non-centered parameterization for itc_k_pr and itc_beta_pr. The steps are: define those variables either in the transformed parameters block or as local variables in the model block, add new parameters with standard normal priors, and then transform those into the original parameters.
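For reference, the general pattern looks like this. This is a minimal sketch with placeholder names (theta, mu_theta, sigma_theta are generic, not from your model), and the location/scale are taken as data just for illustration:

```stan
data {
  real mu_theta;               // location (fixed here only for illustration)
  real<lower=0> sigma_theta;   // scale
}
parameters {
  real theta_std;              // standard-normal "raw" parameter
}
transformed parameters {
  // shift and scale the raw parameter to recover theta
  real theta = mu_theta + sigma_theta * theta_std;
}
model {
  theta_std ~ std_normal();    // implies theta ~ normal(mu_theta, sigma_theta)
}
```

The sampler explores the standard-normal theta_std, which has much better geometry than theta itself when sigma_theta is also being estimated.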

Looking at how you’ve changed:

itc_k_pr[s,w] ~    normal(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]),sigma_r[1] );
itc_beta_pr[s,w] ~ normal(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]),sigma_r[2] ); 

To:

itc_k_pr[s,w] =  mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_standard[s,w];
itc_beta_pr[s,w] =  mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_standard[s,w];

Seems okay, but I think you may have meant to_matrix(C)[2,] on the second line.

There’s also a problem here:

itc_k_pr_standard[s,w] ~ std_normal();
itc_k_pr_standard[s,w] ~ std_normal();

I think the second should be itc_beta_pr_standard.

Also these should no longer be parameters:

real itc_k_pr[N,W];                  // itc
real itc_beta_pr[N,W];               // itc

Thank you so much! I have successfully reparameterized it. But I have a question about how to recover itc_k_pr and itc_beta_pr from itc_k_pr_standard/itc_beta_pr_standard. I think maybe I should compute them in the generated quantities block, but that way I would have to repeat everything I did in the model block, which may be time-consuming. Is there a better way to get them?

If you define the variables in transformed parameters then they will appear in the output and also be available to use in the model block. Will this work?

But the parameters depend on a variable that is updated in a loop in the model block. If I define them in transformed parameters, they cannot use the updated variable. So I think it will not work?

Yeah so everything in that loop would need to move.

To be clear though, sampling statements (the ones with the ~) don’t directly modify any variables in the model. So even though they can’t move out of the model block, they aren’t updating any of the other variable values directly, so they don’t need to move.

This sort of transformation can get awkward if you end up with large numbers of transformed parameters you don’t want to print. Is that the case here?

Yeah, so do you mean move everything in the model block to the transformed parameters block? But I suppose I could move it all to the generated quantities block instead. Will that work?

Yeah, basically everything but the ~ statements.

The rules on this are something like:

  1. Things defined in the transformed parameters block can be used in the model block and generated quantities, and they get printed to the output.
  2. Things defined in the model block aren’t printed and can only be used in the model block.
  3. Things defined in generated quantities are printed and can only be used in generated quantities.

So if you’re using something in model block and generated quantities you can put it in transformed parameters. Also if it’s in transformed parameters it will get printed directly.
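As a minimal skeleton of those rules (placeholder names, not from your model):

```stan
parameters {
  real a;            // sampled by MCMC; printed
}
transformed parameters {
  real b = 2 * a;    // usable in model and generated quantities; printed
}
model {
  real c = b + 1;    // local variable: visible only in this block, never printed
  a ~ normal(c, 1);
}
generated quantities {
  real d = b - a;    // usable only in this block; printed
}
```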

Thanks! And I want to ask: what is the order in which Stan runs the code? If I move the loop into the transformed parameters block, will Stan first apply the distributions in the model block and then generate the parameters in the transformed parameters block?
Now my code is as follows:

# reparameterized
itc_gradient_policy = """
data {

    // Gaussian model
    int W;                                                          // Number of weeks (typically 12)
    int N;                                                          // Number of subjects
    int Xdim;                                                       // Dimension of X - latent low dimensional structure of the phenotype

    // Exogenous variables
    int exo_q_num;                                                  // number of exogenous survey questions
    real U[N,W,exo_q_num];                                       // exogenous survey questions - missing weeks were linearly interpolated outside of Stan 

    // Intertemporal choice
    int<lower=0> idx_itc_obs[N,W];                            // Indices of weeks WITH data
    int<lower=1> P_itc;                                             // number of parameters
    int<lower=0> T_max_itc;                                         // Max number of trials across subjects across weeks
    int<lower=0> Tr_itc[N, W];                              // Number of trials for each subj for each week
    real<lower=0> delay_later[N, W, T_max_itc];             // Delay later - subj x weeks x trials
    real<lower=0> amount_later[N, W, T_max_itc];            // Amount later - subj x weeks x trials      
    real<lower=0> amount_sooner[N, W, T_max_itc];           // Amount sooner - subj x weeks x trials
    int<lower=-1, upper=1> choice_itc[N, W, T_max_itc];     // choice itc - 0 for instant reward, 1 for delayed reward - subj x weeks x trials
    
    // reward over the tasks
    real reward[N, W];

}

parameters {

    real itc_k_pr_std[N,W];                  // itc
    real itc_beta_pr_std[N,W];               // itc  
    real<lower=-1, upper=1> C[2,Xdim];
    matrix[N, Xdim] X1;
    real<lower=0, upper=1> eta[N]; //eta
    matrix[Xdim, exo_q_num] B;
    vector[2] mu;
    vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {
    real itc_k_pr[N,W];                  // itc
    real itc_beta_pr[N,W];               // itc
    real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
    real<lower=0> itc_beta[N,W];
    real reward_sum;

    real X[N,W,Xdim];
    real itc_grad_sum[N, W, 2];

    
    for (s in 1:N) {
        reward_sum = 0;                
        for (w in 1:W) {
            reward_sum += reward[s, w];
            matrix[2, Tr_itc[s,w]] itc_grad ;
            
            if (w == 1) {
                X[s,w,] = to_array_1d(inv_logit(X1[s,]));
            } else {
                X[s,w,] = to_array_1d( to_vector(X[s,w-1,]) + eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1])); // 68
    
            }  
            itc_k_pr[s,w] =  to_array_1d(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_std[s, w])[1];
            itc_beta_pr[s,w] = to_array_1d(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_std[s,w])[1];  
            itc_k[s, w] = exp(itc_k_pr[s,w]);                        // itc    
            itc_beta[s, w] = exp(itc_beta_pr[s, w]);                // itc
            
            vector[Tr_itc[s,w]] ev_later   = to_vector(amount_later[s,w,:Tr_itc[s,w]])  ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7);
 	          vector[Tr_itc[s,w]] ev_sooner  = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
 	          
            vector[Tr_itc[s,w]] itc_p = 1 ./ (1 + exp(- itc_beta[s,w] * (ev_later - ev_sooner)));
            vector[Tr_itc[s,w]] log_likelihood_p_grad = to_vector(choice_itc[s,w,:Tr_itc[s,w]]) ./ (itc_p + 0.00001) - (1 - to_vector(choice_itc[s,w,:Tr_itc[s,w]])) ./ (1 - itc_p + 0.00001);
            
            itc_grad[1,]= to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p) * itc_beta[s,w] * exp(itc_k_pr[s,w]) .* (- ev_later .* to_vector(delay_later[s,w,:Tr_itc[s,w]])/7 ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7))); // grad of k
            itc_grad[2,] = to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p)*exp(itc_beta_pr[s, w]) .* (ev_later - ev_sooner)); // grad of beta


            //vector[Tr_itc[s,w]] ones_rep = to_vector(rep_array(1, Tr_itc[s,w]));
            itc_grad_sum[s, w, ] = to_array_1d(itc_grad * to_vector(rep_array(1, Tr_itc[s,w])));
            
            
 	    }  
    } 


}

model {
       
    to_vector(to_matrix(C)) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    eta ~ normal(0.01, 0.1);// prior for eta
    to_vector(B) ~ normal(0, 0.05);
    mu ~ normal(0, 0.1);
    sigma_r ~ normal(0, 0.1); 


    for(s in 1:N){
      for(w in 1:W){
          itc_beta_pr_std[s,w] ~ std_normal();
          itc_k_pr_std[s,w] ~ std_normal();
          vector[Tr_itc[s,w]] ev_later   = to_vector(amount_later[s,w,:Tr_itc[s,w]])  ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7);
 	        vector[Tr_itc[s,w]] ev_sooner  = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
          choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));


      }
    } 
                                       
} 
   



"""

I wonder whether this is exactly the same as the original code. I am confused because my model requires sampling from the prior and then using the sampled parameters in the next sampling step, so I wonder whether it really is equivalent.

Stan executes the transformed parameters block first, then the model block (or, when producing output, the generated quantities block).

The thing to keep in mind when thinking about sampling statements is that they’re not actually changing the variable on the left-hand side. For instance, this:

x ~ normal(0, 1);

Translates under the hood to:

target += normal_lupdf(x | 0, 1);

What that second statement does is increment the unnormalized log density that the MCMC algorithm uses to generate samples. The reference manual has more info on this (here and here).

So it is the same whether I put the updating code for the parameters in the transformed parameters block or the model block? I thought what you meant is that Stan will not change the variable but will assign a distribution to it, and will finally generate the posterior distribution using all the information in the transformed parameters block and the prior. Is that right?

Another question is how I can use the reduce_sum function to do parallel computing when all the calculations are in transformed parameters. Is there any way to compute in parallel in the transformed parameters or generated quantities block?

Sampling statements can only go in the model block.

Well, the only thing that changes the parameter values directly is the MCMC algorithm (description here), so in the course of evaluating the model block all you’re doing is computing \log p(\theta), which doesn’t change \theta directly. \theta gets updated by the MCMC algorithm.

This is probably the easiest way to get started with reduce_sum: Reduce Sum: A Minimal Example . It should work in any of the blocks.
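For instance, the minimal example in that tutorial boils down to something like this (a sketch of a simple normal likelihood, not your model; it uses the older array syntax to match your code):

```stan
functions {
  // partial log likelihood for a slice of y;
  // start/end give the slice's position in the full array
  real partial_sum(real[] y_slice, int start, int end, real mu, real sigma) {
    return normal_lupdf(to_vector(y_slice) | mu, sigma);
  }
}
data {
  int<lower=0> T;
  real y[T];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  int grainsize = 1;  // 1 lets Stan's scheduler choose the slice sizes
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  // splits y into slices, evaluates partial_sum on each in parallel,
  // and adds up the results
  target += reduce_sum(partial_sum, y, grainsize, mu, sigma);
}
```

The shared arguments after grainsize (here mu and sigma) are passed unchanged to every slice; only the first argument gets sliced.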