Regression - generic formulation much slower

Dear all,

(sorry if I’m double-posting - I tried posting it on the google group, but I don’t think it went through…)
I have coded a very simple hierarchical regression in Stan with 2 predictors that ran reasonably fast.
Then I tried to make the code generic so that I could use it for an arbitrary number of predictors (I coded it the way the manual suggests in the section on Linear Regression). While this still gives me the same parameter estimates (so something must be right), it takes much longer to run (~3 times slower). I'm not sure why, or whether there is anything I can do to speed it up?

Original model:
data {
  int datp;              // number of data points
  vector[datp] y;        // reaction time
  int nsub;              // number of subjects
  int ntrs[nsub];        // number of trials for each person
  int<lower=1> nregs;    // number of regressors in this design (including constant)
  matrix[datp, nregs] x; // design matrix - column 1: intercept, column 2: regressor, rows: data points for all subjects concatenated
  int subID[datp];       // which subject each data point belongs to
}

parameters {
  // group level inference parameters
  vector[2] beta_mu;        // constant (intercept) and effect of condition (slope)

  // structured noise: differences between individual subjects
  real<lower=0> sigma_e;    // error sd per measurement
  vector<lower=0>[2] sigma; // individual differences between people
  vector[nsub] betas0;      // RT adjustment for each person (intercept)
  vector[nsub] betas1;      // RT adjustment for slope
}

model {
  real mu;
  betas0 ~ normal(0, sigma[1]);
  betas1 ~ normal(0, sigma[2]);

  // Run regression
  for (i in 1:datp) {
    mu = (beta_mu[1] + betas0[subID[i]]) * x[i, 1] + (beta_mu[2] + betas1[subID[i]]) * x[i, 2];
    y[i] ~ lognormal(mu, sigma_e);
  }
}

Here is the new model:
data {
  int datp;              // number of data points
  vector[datp] y;        // reaction time
  int nsub;              // number of subjects
  int ntrs[nsub];        // number of trials for each person
  int<lower=1> nregs;    // number of regressors in this design (including constant)
  matrix[datp, nregs] x; // design matrix - column 1: intercept, column 2: regressor, rows: data points for all subjects concatenated
  int subID[datp];       // which subject each data point belongs to
}

parameters {
  vector[nregs] beta_mu;        // group level paras
  vector<lower=0>[nregs] sigma; // individual differences
  real<lower=0> sigma_e;        // error sd per measurement
  vector[nsub] betasRaw[nregs];
}

// Non-centered parameterization - this speeds up the sampling
transformed parameters {
  vector[nsub] betas[nregs];
  for (ir in 1:nregs) {
    betas[ir] = beta_mu[ir] + sigma[ir] * betasRaw[ir];
  }
}
// This block runs the actual model
model {
  real mu;
  // Individual subject
  for (ir in 1:nregs) {
    betasRaw[ir] ~ normal(0, 1);
  }

  // Run the regression
  for (i in 1:datp) {
    mu = to_vector(betas[:, subID[i]])' * x[i]';
    y[i] ~ lognormal(mu, sigma_e);
  }
}

Howdy Jacquie

On a quick read of the model, are your intercepts coming from your design matrix or your code?

matrix[datp,nregs] x; //design matrix - format: column 1:intercept, column 2: regressor, rows:datapoints for all

vs

vector[2] beta_mu; // constant (intercept) and effect of condition (slope)

And is what you want that each person’s betas come from a distribution around the group beta_mus? So somewhere should there be a:

beta_mu0 ~ normal(0.0, 10.0)
betas0 ~ normal(beta_mu0, sigmas[1])

Something like that?

It’s this line that seems suspicious to me (in both models):

mu = (beta_mu[1] + betas0[subID[i]])*x[i,1] + (beta_mu[2] + betas1[subID[i]])*x[i,2];

That help at all?

thanks for the quick reply:

On a quick read of the model, are your intercepts coming from your design matrix or your code?
matrix[datp,nregs] x; //design matrix - format: column 1:intercept, column 2: regressor, rows:datapoints for all

vs

vector[2] beta_mu; // constant (intercept) and effect of condition (slope)

Sorry, I could have labeled this better: the design matrix has a column of ones, so an intercept is fitted - i.e. the regression weight that is multiplied with that column of 1s.

And is what you want that each person’s betas come from a distribution around the group beta_mus? So somewhere should there be a:
beta_mu0 ~ normal(0.0, 10.0)
betas0 ~ normal(beta_mu0, sigmas[1])
Something like that?

yes, it’s meant to be a distribution around beta_mus and I thought that the line

mu = (beta_mu[1] + betas0[subID[i]])*x[i,1] + (beta_mu[2] + betas1[subID[i]])*x[i,2];

achieves this? I think it does because the results are the same if I code it like:

betas0 ~ normal(beta_mu0,sigmas[1])
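i.e. roughly, the full centred version would be something like this (just a sketch for illustration, reusing the names from my original model above - except that here betas0/betas1 are the per-subject coefficients themselves rather than zero-mean adjustments):

// centred parameterization: per-subject betas drawn directly around the group means
betas0 ~ normal(beta_mu[1], sigma[1]); // per-subject intercepts centred on the group intercept
betas1 ~ normal(beta_mu[2], sigma[2]); // per-subject slopes centred on the group slope

for (i in 1:datp) {
  mu = betas0[subID[i]] * x[i, 1] + betas1[subID[i]] * x[i, 2];
  y[i] ~ lognormal(mu, sigma_e);
}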

I've now made a new version of the script, after consulting the Stan manual more carefully again (section on Hierarchical Logistic Regression). It now runs much faster (and still gives the same results). But I'm still not sure why this code would run at a different speed? Is it something about the call to the to_vector command? (I don't think it's the line that computes mu in a loop before doing the sampling with the vector, because if I change this, it's about the same speed.)

data {
  int datp;            // number of data points
  vector[datp] y;      // reaction time
  int nsub;            // number of subjects
  int ntrs[nsub];      // number of trials for each person
  int<lower=1> nregs;  // number of regressors in this design (including constant)
  row_vector[nregs] x[datp]; // design matrix
  int subID[datp];     // which subject each data point belongs to
}
parameters {
  vector[nregs] beta_mu;
  vector[nregs] sigma;
  real<lower=0> sigma_e; // error sd per measurement
  row_vector[nregs] betasRaw[nsub];
}
transformed parameters {
  vector[nregs] betas[nsub];
  for (is in 1:nsub)
    betas[is] = beta_mu + betasRaw[is] * sigma;
}
model {
  real mu[datp];
  // Individual subject
  for (is in 1:nsub) {
    betasRaw[is] ~ normal(0, 1);
  }
  // Compute this first to speed up code - 'sampling' on a vector is faster than in a loop
  for (i in 1:datp) {
    mu[i] = x[i] * betas[subID[i]];
  }
  y ~ lognormal(mu, sigma_e);
}

yes, it's meant to be a distribution around beta_mus and I thought that the line

You're right, my apologies. Someone else probably knows the performance details better, but here's my go at it:

mu = to_vector(betas[:, subID[i]])' * x[i]';

is just creating a lot of extra shuffling of values that the autodiff has to track (a lot of these transformations have to be autodiff'd through, even if they look like simple identities). In C++ lingo, vector[nsub] betas[nregs] is a std::vector<Eigen::VectorXd> (an nregs-length C++ std::vector containing nsub-length Eigen::VectorXds), so the multiple indexing (betas[:,subID[i]]) isn't super straightforward: the first slice is across a C++ std::vector, and then the second index goes into each Eigen::VectorXd.

It might also be fast if the betas were:

vector[nregs] betas[nsub]; instead of vector[nsub] betas[nregs]; and then you can do the dot product with dot_product(v1, v2).

Or done as a matrix, but I think the thing you figured out looks good :D (and is easy to read).
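For what it's worth, here is a rough sketch of what that dot_product version of the regression loop could look like (untested - it assumes betas has been re-declared as vector[nregs] betas[nsub], with the transformed parameters block re-indexed to match):

real mu;
for (i in 1:datp) {
  // betas[subID[i]] and x[i] are both nregs long, so this is a plain dot product
  mu = dot_product(betas[subID[i]], x[i]);
  y[i] ~ lognormal(mu, sigma_e);
}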

Just for the record, I realised that I've made a mistake when defining the vectors/row vectors; it should say:

parameters {
  vector[nregs] beta_mu;
  vector<lower=0>[nregs] sigma;
  real<lower=0> sigma_e; // error sd per measurement
  vector[nregs] betasRaw[nsub];
}
transformed parameters {
  vector[nregs] betas[nsub];
  for (is in 1:nsub)
    betas[is] = beta_mu + betasRaw[is] .* sigma;
}