# Reparameterize model when loop is used

I want to reparameterize the model below. The key things that I think need to be reparameterized are itc_k_pr and itc_beta_pr. But they depend on X, which is updated in a loop in the model block, so my question is how I can reparameterize them. I think that if I follow the general instructions and put them in the transformed parameters block, itc_k_pr and itc_beta_pr will be sampled with a fixed X. So maybe I need to put them in the model block instead, but I wonder whether that will work.

Here is the original code; I deleted some parts that are unnecessary for understanding the structure.

parameters {
real itc_k_pr[N,W];                  // itc
real itc_beta_pr[N,W];               // itc

real<lower=-1, upper=1> C[2,Xdim];
matrix[N, Xdim] X1;
real<lower=0, upper=1> eta[N]; //eta
matrix[Xdim, exo_q_num] B;
vector[2] mu;
vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {

real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
real<lower=0> itc_beta[N,W];

itc_k = exp(itc_k_pr);                        // itc
itc_beta = exp(itc_beta_pr);                // itc
}

model {

to_vector(to_matrix(C)) ~ normal(0, 0.05);
to_vector(X1) ~ normal(0, 0.1);
eta ~ normal(0.01, 0.1);// prior for eta
to_vector(B) ~ normal(0, 0.05);
mu ~ normal(0, 0.1);
sigma_r ~ normal(0, 0.1);
real X[N,W,Xdim];

for (s in 1:N) {
real reward_sum = 0;
for (w in 1:W) {
reward_sum += reward[s, w];

if (w == 1) {
X[s,w,] = to_array_1d(inv_logit(X1[s,]));
} else {

vector[Xdim] a = eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1]);
vector[Xdim] b = to_vector(X[s,w-1,]);
X[s,w,] = to_array_1d( b + a); //

}
itc_k_pr[s,w] ~    normal(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]),sigma_r[1] );
itc_beta_pr[s,w] ~ normal(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]),sigma_r[2] );

vector[Tr_itc[s,w]] ev_later   = xxxxxx
vector[Tr_itc[s,w]] ev_sooner  = xxxxxx
choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));
vector[Tr_itc[s,w]] itc_p = xxxx

}
}
}



Below is the reparameterized code that I think may work. But I am not sure about it, and I wonder how it should be revised so that the reparameterization is done correctly.

parameters {
real itc_k_pr[N,W];                  // itc
real itc_beta_pr[N,W];               // itc

real itc_k_pr_standard[N,W];                  // itc
real itc_beta_pr_standard[N,W];               // itc
real<lower=-1, upper=1> C[2,Xdim];
matrix[N, Xdim] X1;
real<lower=0, upper=1> eta[N]; //eta
matrix[Xdim, exo_q_num] B;
vector[2] mu;
vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {
//real itc_k_pr[N,W];                  // itc
//real itc_beta_pr[N,W];               // itc
real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
real<lower=0> itc_beta[N,W];

itc_k = exp(itc_k_pr);                        // itc
itc_beta = exp(itc_beta_pr);                // itc
}

model {

to_vector(to_matrix(C)) ~ normal(0, 0.05);
to_vector(X1) ~ normal(0, 0.1);
eta ~ normal(0.01, 0.1);// prior for eta
to_vector(B) ~ normal(0, 0.05);
mu ~ normal(0, 0.1);
sigma_r ~ normal(0, 0.1);
real X[N,W,Xdim];

for (s in 1:N) {
real reward_sum = 0;
for (w in 1:W) {
reward_sum += reward[s, w];

if (w == 1) {
X[s,w,] = to_array_1d(inv_logit(X1[s,]));
} else {

vector[Xdim] a = eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1]);
vector[Xdim] b = to_vector(X[s,w-1,]);
X[s,w,] = to_array_1d( b + a); //

}
itc_k_pr_standard[s,w] ~ std_normal();
itc_k_pr_standard[s,w] ~ std_normal();
itc_k_pr[s,w] =  mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_standard[s,w];
itc_beta_pr[s,w] =  mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_standard[s,w];

vector[Tr_itc[s,w]] ev_later   = xxxxxx
vector[Tr_itc[s,w]] ev_sooner  = xxxxxx
choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));
vector[Tr_itc[s,w]] itc_p = xxxx

}
}
}


It looks like you're doing a non-centered parameterization for itc_k_pr and itc_beta_pr. So the steps are: define those variables either as transformed parameters or as local variables in the model block, add new parameters with standard normal priors, and then transform those into the original parameters.
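In its simplest form, the non-centered parameterization looks like this minimal sketch (not the thread's model; theta, mu, and sigma are hypothetical names, just to show the pattern):

```stan
parameters {
  real mu;
  real<lower=0> sigma;
  real theta_std;                        // raw standard-normal parameter
}
transformed parameters {
  real theta = mu + sigma * theta_std;   // non-centered shift-and-scale transform
}
model {
  theta_std ~ std_normal();              // implies theta ~ normal(mu, sigma)
  // ... likelihood terms that use theta ...
}
```

The sampler then explores the well-conditioned theta_std space instead of theta directly, which usually helps when sigma is poorly identified.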

Looking at how you've changed:

itc_k_pr[s,w] ~    normal(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]),sigma_r[1] );
itc_beta_pr[s,w] ~ normal(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]),sigma_r[2] );


To:

itc_k_pr[s,w] =  mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_standard[s,w];
itc_beta_pr[s,w] =  mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_standard[s,w];


That seems okay, but I think you may have meant to_matrix(C)[2,] on the second line.

There's also a problem here:

itc_k_pr_standard[s,w] ~ std_normal();
itc_k_pr_standard[s,w] ~ std_normal();


I think the second should be itc_beta_pr_standard.

Also, these should no longer be declared in the parameters block:

real itc_k_pr[N,W];                  // itc
real itc_beta_pr[N,W];               // itc


Thank you so much! I have successfully reparameterized it. But I have a question about how to recover itc_k_pr and itc_beta_pr from itc_k_pr_standard / itc_beta_pr_standard, since those are the quantities I want. I think maybe I should compute them in the generated quantities block, but then I would have to repeat everything I did in the model block, which may be time-consuming. Is there a better way to get them?

If you define the variables in transformed parameters, then they will appear in the output and also be available for use in the model block. Will that work?

But the parameters depend on a variable that is updated in the model block during the loop. If I define them in transformed parameters, they cannot use the updated variables. So I think it will not work?

Yeah, so everything in that loop would need to move.

To be clear though, sampling statements (the ones with the ~) don't directly modify any variables in the model, so even though those can't move out of the model block, they aren't updating any of the other variable values directly, so they wouldn't need to move.

This sort of transformation can get awkward if you end up with large numbers of transformed parameters you don't want to print. Is that the case here?

Yeah, so do you mean move everything in the model block to the transformed parameters block? But I suppose I could move everything to generated quantities instead. Would that work?

Yeah, basically everything but the ~ statements.

The rules on this are something like:

1. Things defined in the transformed parameters block can be used in the model block and in generated quantities, and they get printed to the output.
2. Things defined in the model block aren't printed and can only be used in the model block.
3. Things in generated quantities are printed and can only be used in generated quantities.

So if you're using something in both the model block and generated quantities, you can put it in transformed parameters. Also, if it's in transformed parameters it will get printed directly.
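Those rules can be sketched in a skeletal fragment (tp, m, and gq are hypothetical variables, just to illustrate visibility):

```stan
transformed parameters {
  real tp = 1;      // printed; visible in the model block and generated quantities
}
model {
  real m = tp + 1;  // local to the model block; never printed
}
generated quantities {
  real gq = tp + 2; // printed; cannot be referenced from the model block
}
```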

Thanks! And I want to ask: in what order does Stan run the code? If I move the loop into the transformed parameters block, will it first apply the distributions in the model block and then generate the parameters in the transformed parameters block?
Now my code is as follows:

// reparameterized
data {

// Gaussian model
int W;                                                          // Number of weeks (typically 12)
int N;                                                          // Number of subjects
int Xdim;                                                       // Dimension of X - latent low dimensional structure of the phenotype

// Exogenous variables
int exo_q_num;                                                  // number of exogenous survey questions
real U[N,W,exo_q_num];                                       // exogenous survey questions - missing weeks were linearly interpolated outside of Stan

// Intertemporal choice
int<lower=0> idx_itc_obs[N,W];                            // Indices of weeks WITH data
int<lower=1> P_itc;                                             // number of parameters
int<lower=0> T_max_itc;                                         // Max number of trials across subjects across weeks
int<lower=0> Tr_itc[N, W];                              // Number of trials for each subj for each week
real<lower=0> delay_later[N, W, T_max_itc];             // Delay later - subj x weeks x trials
real<lower=0> amount_later[N, W, T_max_itc];            // Amount later - subj x weeks x trials
real<lower=0> amount_sooner[N, W, T_max_itc];           // Amount sooner - subj x weeks x trials
int<lower=-1, upper=1> choice_itc[N, W, T_max_itc];     // choice itc - 0 for instant reward, 1 for delayed reward - subj x weeks x trials

real reward[N, W];

}

parameters {

real itc_k_pr_std[N,W];                  // itc
real itc_beta_pr_std[N,W];               // itc
real<lower=-1, upper=1> C[2,Xdim];
matrix[N, Xdim] X1;
real<lower=0, upper=1> eta[N]; //eta
matrix[Xdim, exo_q_num] B;
vector[2] mu;
vector<lower=0.00000001>[2] sigma_r;
}

transformed parameters {
real itc_k_pr[N,W];                  // itc
real itc_beta_pr[N,W];               // itc
real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
real<lower=0> itc_beta[N,W];
real reward_sum;

real X[N,W,Xdim];

for (s in 1:N) {
reward_sum = 0;
for (w in 1:W) {
reward_sum += reward[s, w];

if (w == 1) {
X[s,w,] = to_array_1d(inv_logit(X1[s,]));
} else {
X[s,w,] = to_array_1d( to_vector(X[s,w-1,]) + eta[s].*(reward[s, w] - reward_sum/w)* to_matrix(C)' * to_vector(itc_grad_sum[s, w-1])); // 68

}
itc_k_pr[s,w] =  to_array_1d(mu[1] + B * to_vector(U[s,w,]) + to_matrix(C)[1,] * to_vector(X[s,w,]) + sigma_r[1]*itc_k_pr_std[s, w])[1];
itc_beta_pr[s,w] = to_array_1d(mu[2] + B * to_vector(U[s,w,]) + to_matrix(C)[2,] * to_vector(X[s,w,]) + sigma_r[2]*itc_beta_pr_std[s,w])[1];
itc_k[s, w] = exp(itc_k_pr[s,w]);                        // itc
itc_beta[s, w] = exp(itc_beta_pr[s, w]);                // itc

vector[Tr_itc[s,w]] ev_later   = to_vector(amount_later[s,w,:Tr_itc[s,w]])  ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7);
vector[Tr_itc[s,w]] ev_sooner  = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);

vector[Tr_itc[s,w]] itc_p = 1 ./ (1 + exp(- itc_beta[s,w] * (ev_later - ev_sooner)));
vector[Tr_itc[s,w]] log_likelihood_p_grad = to_vector(choice_itc[s,w,:Tr_itc[s,w]]) ./ (itc_p + 0.00001) - (1 - to_vector(choice_itc[s,w,:Tr_itc[s,w]])) ./ (1 - itc_p + 0.00001);

itc_grad[1,]= to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p) * itc_beta[s,w] * exp(itc_k_pr[s,w]) .* (- ev_later .* to_vector(delay_later[s,w,:Tr_itc[s,w]])/7 ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7))); // grad of k
itc_grad[2,] = to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p)*exp(itc_beta_pr[s, w]) .* (ev_later - ev_sooner)); // grad of beta

//vector[Tr_itc[s,w]] ones_rep = to_vector(rep_array(1, Tr_itc[s,w]));

}
}

}

model {

to_vector(to_matrix(C)) ~ normal(0, 0.05);
to_vector(X1) ~ normal(0, 0.1);
eta ~ normal(0.01, 0.1);// prior for eta
to_vector(B) ~ normal(0, 0.05);
mu ~ normal(0, 0.1);
sigma_r ~ normal(0, 0.1);

for(s in 1:N){
for(w in 1:W){
itc_beta_pr_std[s,w] ~ std_normal();
itc_k_pr_std[s,w] ~ std_normal();
vector[Tr_itc[s,w]] ev_later   = to_vector(amount_later[s,w,:Tr_itc[s,w]])  ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7);
vector[Tr_itc[s,w]] ev_sooner  = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));

}
}

}

"""


I wonder whether this is exactly the same as the original code. I am confused because my model requires sampling from the prior and then using the sampled parameters in the next sampling step. So I wonder whether this is really equivalent.

The model executes the transformed parameters block first, then the model block or generated quantities.

The thing to keep in mind when thinking about sampling statements is that they're not actually changing the variable on the left-hand side. For instance, this:

x ~ normal(0, 1);


Translates under the hood to:

target += normal_lupdf(x | 0, 1);


What that second statement does is increment the unnormalized log density that the MCMC algorithm uses to generate samples. The reference manual has more info on this (here and here).

So it is the same whether I put the updating expressions for the parameters in the transformed parameters block or in the model block? I thought what you meant is that Stan will not change the variable but will assign a distribution to it, and finally generate the posterior distribution using all the information in the transformed parameters block and the prior. Is that right?

Another question is how I can use the reduce_sum function for parallel computing when all the calculations are in transformed parameters. Is there any way to compute in parallel in the transformed parameters or generated quantities block?

Sampling statements can only go in the model block.

Well, the only thing that changes the parameter values directly is the MCMC algorithm (description here), so in the course of evaluating the model block all you're doing is computing \log p(\theta) (which doesn't change \theta directly). \theta gets updated by the MCMC algorithm.

This is probably the easiest way to get started with reduce_sum: Reduce Sum: A Minimal Example. It should work in any of the blocks.
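For reference, a minimal reduce_sum setup along the lines of that case study might look like this (a hypothetical Poisson model, not the thread's model; partial_sum, y, and lambda are made-up names):

```stan
functions {
  // log-likelihood contribution of one slice of the data
  real partial_sum(int[] y_slice, int start, int end, real lambda) {
    return poisson_lpmf(y_slice | lambda);
  }
}
data {
  int<lower=1> N;
  int y[N];
}
parameters {
  real<lower=0> lambda;
}
model {
  int grainsize = 1;  // let the scheduler pick the slice sizes
  lambda ~ gamma(2, 2);
  // the likelihood sum is split across threads over slices of y
  target += reduce_sum(partial_sum, y, grainsize, lambda);
}
```

The first argument to partial_sum is the sliced data, start and end are the slice bounds, and any remaining arguments (here lambda) are passed through unchanged.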