Student_t_cdf vectorization issue


#1

Hi, I am trying to write a Stan program for a robit model, which is a probit-style model that uses the Student t CDF (student_t_cdf) instead of the normal CDF.

My code works fine but since it is very slow, I tried to use vectorization.

It doesn’t work for some reason.

Can I not use vectorization for student_t_cdf?

Here is my code that works fine.

data {
  int<lower=0> N; // number of obs
  int<lower=0> K; // number of predictors
  int<lower=0,upper=1> u[N]; // outcomes
  matrix[N, K] v; // predictor variables
}
parameters {
  vector[K] beta; // beta coefficients
  real<lower=1> df;
}
model {
  vector[N] mu;
  beta ~ normal(0, 10); // normal priors for betas
  df ~ gamma(2, 0.1);
  mu = v*beta;
  for (n in 1:N) mu[n] = student_t_cdf(mu[n], df, 0, 1);
  u ~ bernoulli(mu);
}

And this is the vectorized code that I tried.

data {
  int<lower=0> N; // number of obs
  int<lower=0> K; // number of predictors
  int<lower=0,upper=1> u[N]; // outcomes
  matrix[N, K] v; // predictor variables
}
parameters {
  vector[K] beta; // beta coefficients
  real<lower=1> df;
}
model {
  vector[N] mu;
  beta ~ normal(0, 10); // normal priors for betas
  df ~ gamma(2, 0.1);
  mu = v*beta;
  mu = student_t_cdf(mu, df, 0, 1);
  u ~ bernoulli(mu);
}

And this is the error message I get.

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

base type mismatch in assignment; variable name = mu, type = vector; right-hand side type=real
error in ‘model276074a12540_robit’ at line 16, column 7

14:   df ~ gamma(2, 0.1);
15:   mu = v*beta; 
16:   mu = student_t_cdf(mu, df, 0, 1);
          ^
17:   u ~ bernoulli(mu);

PARSER EXPECTED:
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘robit’ due to the above error.
In addition: Warning message:
In readLines(file, warn = TRUE) :
incomplete final line found on ‘C:\Users\g1310\Desktop\Stan\robit.stan’

Or is there any other way that I can speed up my model?


#2

student_t_cdf returns a single scalar (the product of the CDFs when the input is a vector), which is why assigning its result to the vector mu fails with a type mismatch. Even if there were a version of the Student t CDF that took a vector and returned a vector, it would run the same loop in C++ as the one you wrote, so there would be no difference in execution speed; just less typing.
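
To spell that out, here is the vector call written as a formula. The notation is introduced only for illustration: F_ν stands for the standard Student t CDF with ν = df degrees of freedom, and p_n for the success probability of observation n.

```latex
\[
\texttt{student\_t\_cdf}(\mu, \nu, 0, 1) \;=\; \prod_{n=1}^{N} F_{\nu}(\mu_n),
\qquad
\text{whereas the model needs } p_n = F_{\nu}(\mu_n) \text{ for each } n = 1, \dots, N .
\]
```

The left-hand side is one real number, so it cannot fill a vector of N probabilities; the element-wise loop is what produces the p_n.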

Models like this are slow because estimating the degrees of freedom is hard. Unless you really think a Cauchy (df = 1) is plausible, you will often get better results by raising the lower bound on df a bit.
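
A minimal sketch of that suggestion, keeping the working loop from the question and changing only the bound on df. The value 2 is an illustrative assumption, not a number recommended in this thread, and the local names eta and p are introduced here so that the linear predictor and the probabilities are not stored in the same variable. The sketch uses the same array and comma syntax as the original post; newer Stan releases, as far as I know, expect `array[N] int<lower=0,upper=1> u;` and `student_t_cdf(eta[n] | df, 0, 1)` instead.

```stan
data {
  int<lower=0> N;              // number of obs
  int<lower=0> K;              // number of predictors
  int<lower=0,upper=1> u[N];   // outcomes
  matrix[N, K] v;              // predictor variables
}
parameters {
  vector[K] beta;              // beta coefficients
  real<lower=2> df;            // assumed bound: 2 instead of 1 excludes the Cauchy case
}
model {
  vector[N] eta;               // linear predictor
  vector[N] p;                 // success probabilities
  beta ~ normal(0, 10);        // normal priors for betas
  df ~ gamma(2, 0.1);
  eta = v * beta;
  // student_t_cdf returns a real, so p is filled one element at a time
  for (n in 1:N) p[n] = student_t_cdf(eta[n], df, 0, 1);
  u ~ bernoulli(p);
}
```

How far to raise the bound is a modeling choice; the tighter it is, the less of the heavy-tailed region the sampler has to explore.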