# Help with vectorizing for loops

Hello everyone,

I am looking for a way to vectorize the two for loops that I have in my model. Any help would be appreciated. Here is my code:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
real<lower=0> t[N]; // response time data
real<lower=0, upper=1> X[N]; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
vector[Dim] PersParStar[n_person];
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs

}

transformed parameters {
vector[Dim] PersPar[n_person];
for(i in 1:n_person){
PersPar[i] = Zero + sigmaP .* (choleskyP * PersParStar[i]);
}
}

model {
for(j in 1:n_person){
PersParStar[j] ~ std_normal();
}
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
for(n in 1:N){
target += (bernoulli_logit_lpmf(Y[n] | (PersPar[ii[n],1] + PersPar[ii[n],2]*X[n]) - b[jj[n]])+
lognormal_lpdf(t[n] | beta[jj[n]] - (PersPar[ii[n],3] + PersPar[ii[n],4]*X[n]), alpha));
}
}
``````

I haven't tested these, but the above should help.

Thanks for the code. I'm not sure if I perhaps put the pieces of code together incorrectly, because I'm getting the `PARSER EXPECTED: <identifier>` error on the line where the response times, t, are declared in the data block. This is how my code looks:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
real<lower=0>[N] t; // response time data
real<lower=0, upper=1>[N] X; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
matrix[Dim, n_person] PersParStar;
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs

}

transformed parameters {
matrix[Dim, n_person] PersPar;
PersPar[i] = Zero + rep_matrix(sigmaP, n_person) .* (choleskyP * PersParStar);
}

model {
to_vector(PersParStar) ~ std_normal();
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
target += (bernoulli_logit_lpmf(Y | (PersPar[ii,2] .* X + PersPar[ii,1]) - b[jj]) +
lognormal_lpdf(t | beta[jj] - (PersPar[ii,4] .* X + PersPar[ii,3]), alpha));
}
``````

Never mind, the above error was due to the declarations of t and X in the data block having [N] before the variable names rather than after them. However, now I get a different error, related to vectorizing the for loops: in one of the lines, the index from the for loop had stayed in the code, and when I try to remove it I get the error `No matches for: Available argument signatures for operator+: Expression is ill formed.` The error points at this line in the transformed parameters block: `PersPar = Zero + rep_matrix(sigmaP, n_person) .* (choleskyP * PersParStar);`

This is how the whole code looks like:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
real<lower=0> t[N]; // response time data
real<lower=0, upper=1> X[N]; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
matrix[Dim, n_person] PersParStar;
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs

}

transformed parameters {
matrix[Dim, n_person] PersPar;
PersPar = Zero + rep_matrix(sigmaP, n_person) .* (choleskyP * PersParStar);
}

model {
to_vector(PersParStar) ~ std_normal();
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
target += (bernoulli_logit_lpmf(Y | (PersPar[ii,2] .* X + PersPar[ii,1]) - b[jj]) +
lognormal_lpdf(t | beta[jj] - (PersPar[ii,4] .* X + PersPar[ii,3]), alpha));
}
``````

Oh sorry, you need to use `rep_matrix` on `Zero` as well.


Thanks, that fixed that error. Now I just have an error on one of the last lines, where it says `PersPar[ii,1]`. Strangely, the error doesn't appear at `PersPar[ii,2]`, which appears earlier in the code. The error text is almost identical to the previous error, but says that there are no matches for `operator+` and for `operator.*`. Do you perhaps know what could be the cause?

Can you post the full error?

Certainly. Before, I'd just been looking at the error message in the pop-up balloon while editing, but now I ran the model and here is the whole message. It looks a bit weird:

``````
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Unknown variable: to_vector
No matches for:

vector .* real[]

Available argument signatures for operator.*:

vector .* vector
row vector .* row vector
matrix .* matrix

Expression is ill formed
error in 'model1a33711e61506_var_abil_speed_lin' at line 36, column 57
-------------------------------------------------
34:   beta ~ normal(0, 100);
35:   alpha ~ student_t(3, 0, 2.5);
36: target += (bernoulli_logit_lpmf(Y | ((PersPar[ii,2] .* X) + PersPar[ii,1]) - b[jj]) +
^
37:  lognormal_lpdf(t | beta[jj] - (PersPar[ii,4] .* X + PersPar[ii,3]), alpha));
-------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
failed to parse Stan model 'var_abil_speed_lin' due to the above error.
Calls: stan -> stan_model -> stanc
Execution halted
``````
The code:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
real<lower=0> t[N]; // response time data
real<lower=0, upper=1> X[N]; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
matrix[Dim, n_person] PersParStar;
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs

}

transformed parameters {
matrix[Dim, n_person] PersPar;
PersPar = rep_matrix(Zero, n_person) + rep_matrix(sigmaP, n_person) .* (choleskyP * PersParStar);
}

model {
to_vector(PersParStar) ~ std_normal();
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
target += (bernoulli_logit_lpmf(Y | ((PersPar[ii,2] .* X) + PersPar[ii,1]) - b[jj]) +
lognormal_lpdf(t | beta[jj] - (PersPar[ii,4] .* X + PersPar[ii,3]), alpha));
}
``````

I tried wrapping everything in the last lines in the `to_vector` function, like this:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
real<lower=0> t[N]; // response time data
real<lower=0, upper=1> X[N]; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
matrix[Dim, n_person] PersParStar;
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs

}

transformed parameters {
matrix[Dim, n_person] PersPar;
PersPar = rep_matrix(Zero, n_person) + rep_matrix(sigmaP, n_person) .* (choleskyP * PersParStar);
}

model {
to_vector(PersParStar) ~ std_normal();
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
target += (bernoulli_logit_lpmf(Y | (to_vector(PersPar[ii,2]) .* to_vector(X) + to_vector(PersPar[ii,1])) - b[jj]) +
lognormal_lpdf(t | beta[jj] - (to_vector(PersPar[ii,4]) .* to_vector(X) + to_vector(PersPar[ii,3])), alpha));
}
``````

However, now I am getting the following error:

``````
Chain 4: Unrecoverable error evaluating the log probability at the initial value.
Chain 4: Exception: matrix[multi,uni] index row: accessing element out of range. index 5 out of range; expecting index to be between 1 and 4  (in 'model38404540abd85_var_abil_speed_lin' at line 36)
``````
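For intuition, the out-of-range failure can be reproduced outside Stan (a NumPy sketch mirroring the declared shapes, not part of the model): `PersPar` is declared as `matrix[Dim, n_person]`, so `PersPar[ii, k]` selects rows by person id, but the matrix has only `Dim` rows.

```python
import numpy as np

# Mirror the shapes from the Stan model: PersPar declared as matrix[Dim, n_person].
Dim, n_person = 4, 10
PersPar = np.zeros((Dim, n_person))

# ii holds person ids (1..n_person in Stan; 0-based here). Using them as ROW
# indices only works if the matrix has n_person rows; PersPar has only Dim = 4.
ii = np.array([0, 4, 7])

try:
    PersPar[ii, 1]  # row ids 4 and 7 exceed the 4 available rows
except IndexError as err:
    print("IndexError:", err)
```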

I would be really grateful if someone could help me sort this out.

Try the following:

``````stan
data {
int<lower=0> n_person; // number of examinees
int<lower=0> n_item;  // number of items
int<lower = 1> Dim; // number of dimensions
int<lower = 1> N; // number of data points
int<lower = 1> jj[N]; // item id
int<lower = 1> ii[N]; // person id
int<lower=0, upper = 1> Y[N];
vector<lower=0>[N] t; // response time data
vector<lower=0, upper=1>[N] X; // time scale variable
vector[Dim] Zero; // vector of 0s for person parameter means
}

parameters {
matrix[n_person, Dim] PersParStar;
cholesky_factor_corr[Dim] choleskyP;   // Cholesky decomposition of correlation of person parameters
vector<lower=0>[Dim] sigmaP;  // variances of person parameters
vector[n_item] b;          // item difficulty parameters
vector[n_item] beta;       // item time intensity parameters
real<lower=0> alpha;      // common variance of RTs
}

transformed parameters {
matrix[n_person, Dim] PersPar = PersParStar * diag_pre_multiply(sigmaP, choleskyP)';
}

model {
to_vector(PersParStar) ~ std_normal();
sigmaP ~ student_t(3, 0, 2.5);
choleskyP ~ lkj_corr_cholesky(1);
b ~ normal(0, 100);
beta ~ normal(0, 100);
alpha ~ student_t(3, 0, 2.5);
target += bernoulli_logit_lpmf(Y | (PersPar[ii,1] + PersPar[ii,2] .* X) - b[jj]);
target += lognormal_lpdf(t | beta[jj] - (PersPar[ii,3] + PersPar[ii,4] .* X), alpha);
}
``````

By changing the data and parameters to use vectors and matrices, instead of arrays and arrays of vectors, it's a bit easier to use the vectorised forms.
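A side note on the `diag_pre_multiply` line: the standard non-centered idiom multiplies by the transpose, `PersParStar * diag_pre_multiply(sigmaP, choleskyP)'`, so that each row of `PersPar` gets covariance `diag(sigmaP) * Omega * diag(sigmaP)` with `Omega = choleskyP * choleskyP'`. A NumPy sketch (made-up numbers) checking that algebra:

```python
import numpy as np

# diag_pre_multiply(sigmaP, choleskyP) in Stan is diag(sigma) @ L.
# With rows of PersParStar ~ std_normal, rows of PersParStar @ M.T
# have covariance M @ M.T, which should equal diag(sigma) @ Omega @ diag(sigma).
sigma = np.array([0.5, 1.0, 1.5, 2.0])
Omega = np.array([[1.0, 0.3, 0.1, 0.0],
                  [0.3, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.0, 0.4],
                  [0.0, 0.1, 0.4, 1.0]])    # hypothetical correlation matrix
L = np.linalg.cholesky(Omega)               # plays the role of choleskyP
M = np.diag(sigma) @ L                      # diag_pre_multiply(sigmaP, choleskyP)

implied_cov = M @ M.T
target_cov = np.diag(sigma) @ Omega @ np.diag(sigma)
print(np.allclose(implied_cov, target_cov))  # True
```

Without the transpose the rows instead get covariance `L' * diag(sigmaP)^2 * L`, which is not the intended scaled correlation matrix.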


I'd also recommend using narrower priors, especially for the `bernoulli` parameters, as values on the logit scale approach the bounds of the function (0 and 1) very quickly.

Thank you very much for your help! This code seems to work! I would just like to ask you two additional questions:

1.) Can the `target` always be broken down into several steps? I thought the whole expression had to stay in a single `target` statement. What exactly is different if `target` is calculated here in two steps, as opposed to just putting `+` between the Bernoulli part and the lognormal part? Or is it the same?

2.) Do you think that priors of (0, 10) instead of (0, 100) for the bs and betas would be enough, or would I need to narrow things down more (we're trying to have non-informative priors)?

Many thanks!

1. `target` is an internal variable that Stan uses to accumulate the log density. Sampling statements using `~` are shorthand for telling Stan to add something to `target`, and we can also add to it explicitly with `target += something`. As such, the following are equivalent, at least in the sense of representing the same model (I'm not sure whether there are minor under-the-hood differences in the compiled C++).
``````stan
target += thing_1;
target += thing_2;
``````

is equivalent to

``````stan
target += thing_1 + thing_2;
``````
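That additivity can also be checked numerically; here is a small Python sketch (hypothetical values) that accumulates two log-density terms in two steps and in one:

```python
import math

# Hand-computed log densities standing in for the two lpmf/lpdf terms
# (made-up values, just to show that increments to target add up).
lp_bernoulli = math.log(0.7)  # log Bernoulli(1 | 0.7)
lp_lognormal = (-math.log(2.3 * 0.5 * math.sqrt(2 * math.pi))
                - (math.log(2.3) - 1.2) ** 2 / (2 * 0.5 ** 2))  # log LogNormal(2.3 | 1.2, 0.5)

# Two separate increments...
target_two_steps = 0.0
target_two_steps += lp_bernoulli
target_two_steps += lp_lognormal

# ...versus a single combined increment: the accumulated log density is identical.
target_one_step = 0.0
target_one_step += lp_bernoulli + lp_lognormal

print(target_two_steps == target_one_step)  # True
```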
2. "Non-informative" is not clearly defined in this case. For an intercept-only model, intuitions of non-informativeness often correspond to a prior that is flat on the probability scale; this corresponds approximately to `student_t(7.76, 0, 1.57)` and exactly to `logistic(0, 1)` on the logit scale. However, once we include covariates, then even if the intercept is uninformative on the probability scale, the prior pushforward distribution for the inverse logit of the linear predictor will still concentrate heavily near 0 and 1. To control the degree of this concentration, it is useful to tighten the priors on the coefficients as much as you're comfortable with. My personal experience has been that thinking carefully about these prior pushforward distributions in the context of any particular logistic regression has always led me away from "noninformative" priors, because they turn out to be informative about the pushforward distribution of the inverse-link linear predictor in ways I do not intend.
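To see this concentration concretely, here is a small Python simulation (a sketch, not from the thread): it pushes the wide `normal(0, 100)` difficulty prior, versus a tighter `normal(0, 2.5)`, through the inverse logit and counts prior draws landing in the extreme tails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def inv_logit(x):
    x = np.clip(x, -35.0, 35.0)  # avoid overflow in exp for extreme draws
    return 1.0 / (1.0 + np.exp(-x))

# Prior pushforward of the success probability for eta = theta - b,
# with theta ~ normal(0, 1) and b under two priors.
theta = rng.normal(0.0, 1.0, n)
p_wide = inv_logit(theta - rng.normal(0.0, 100.0, n))  # b ~ normal(0, 100)
p_tight = inv_logit(theta - rng.normal(0.0, 2.5, n))   # b ~ normal(0, 2.5)

# Fraction of prior draws already pushed into the extreme tails:
extreme = lambda p: np.mean((p < 0.01) | (p > 0.99))
print(extreme(p_wide))   # the wide prior puts most of its mass near 0 or 1
print(extreme(p_tight))  # the tighter prior keeps far more mass in the middle
```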