I am a little confused about how to change the following code to return parameters for each factor as opposed to just the vector.
I have the following model
```{stan, output.var="multi_regression"}
data {
  int<lower=0> N;       // number of data items
  int<lower=0> K;       // number of predictors
  matrix[N, K] x;       // predictor matrix
  vector[N] y;          // outcome vector
}
parameters {
  real alpha;           // intercept
  vector[K] beta;       // coefficients for predictors
  real<lower=0> sigma;  // error scale
}
model {
  y ~ normal(x * beta + alpha, sigma);  // likelihood
}
```
and the following data
```{r}
library(tidyverse)
library(rstan)

data(diamonds)
y <- select(diamonds, price)
x <- select(diamonds, carat, cut)
N <- nrow(diamonds)
K <- 2

# try to speed things up
rstan_options(auto_write = FALSE)
options(mc.cores = 4)

# fit the multi_regression model
multi_reg_model <- sampling(multi_regression,
                            data = list(y = y$price, x = x, N = N, K = K))
rstan::summary(multi_reg_model)$summary
```
```r
mf <- model.frame(price ~ carat + cut, data = diamonds)
y <- mf$price
x <- model.matrix(price ~ carat + cut, data = mf)
```
or at least that is basically what the rstanarm and brms R packages do when fitting models like this. If you want to get rid of the contrasts R generates for the ordered factor `cut`, you can first coerce `cut` to an unordered factor or call `model.matrix()` with an explicit `contrasts.arg`.
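For illustration (an added sketch, not part of the original reply), the coercion route looks like this:

```r
# Coerce cut to an unordered factor so model.matrix() uses the default
# treatment contrasts (one dummy column per non-reference level of cut).
diamonds$cut <- factor(diamonds$cut, ordered = FALSE)
x <- model.matrix(price ~ carat + cut, data = diamonds)
colnames(x)
```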
Thanks, with your suggestion I came up with the following, which seems to work:
```{r}
# Prepare data to replicate the following:
# lm(price ~ carat + cut, data = diamonds)
library(tidyverse)
library(rstan)

data("diamonds")
mf <- model.frame(price ~ carat + cut, data = diamonds)  # needed for factors
y <- mf$price
x <- model.matrix(price ~ carat + cut, data = mf)        # also needed for factors
N <- nrow(diamonds)
K <- 6  # number of columns in the model matrix (remember to count the factor contrasts)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

multi_reg <- sampling(multi_regression,
                      data = list(y = y, x = x, N = N, K = K))
rstan::summary(multi_reg)$summary
```
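As an aside (an added illustration, not part of the original post), the column names of the model matrix can be used to label the `beta` draws, which makes it clear which coefficient belongs to which level of `cut`:

```r
# Label the beta rows of the summary with the model.matrix column names
# so each factor-level coefficient is identifiable by name.
beta_summary <- rstan::summary(multi_reg, pars = "beta")$summary
rownames(beta_summary) <- colnames(x)
beta_summary
```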
One thing I am surprised about is how long this takes to run. I am new to Bayesian computing, but I have been waiting over 30 minutes. Is that a typical run time?
I received the following output:
```
1000 transitions using 10 leapfrog steps per transition would take 46.7 seconds.
```
That Stan output is basically a lie, so don’t pay any attention to it. But since that dataset has 50,000+ observations and you coded the likelihood without using sufficient statistics, it can take a while for NUTS to finish.
No, it is an exactly true statement conditioned on the clearly stated conditions. In problems that require many more leapfrog steps per transition the runtime will be correspondingly larger.
That is why I said it is basically a lie. Technically true, but completely misleading for what people infer it is supposed to be used for, which is to project the runtime. We should get rid of it.
@bgoodri Thanks again for the help. I am having difficulty following the code you linked to on GitHub. As a simple example, what changes could I make to my current multiple regression model to speed it up? I think if I could see that, I would be able to apply the same pattern to other problems.
This only works for linear, no-link, Gaussian models, but take this modified function to evaluate the log-likelihood:
```stan
functions {
  /**
   * Increments the log-posterior with the logarithm of a multivariate normal
   * likelihood with a scalar standard deviation for all errors.
   * Equivalent to normal_lpdf(y | intercept + X * beta, sigma) but faster.
   *
   * @param theta vector of coefficients (excluding intercept)
   * @param XpX matrix equal to X'X (excluding intercept) with columns of X centered
   * @param b precomputed vector of OLS coefficients (excluding intercept)
   * @param intercept scalar (assuming the columns of X have mean zero)
   * @param ybar precomputed sample mean of the outcome
   * @param SSR positive precomputed value of the sum of squared OLS residuals
   * @param sigma positive scalar for the standard deviation of the errors
   * @param N integer equal to the number of observations
   */
  void ll_mvn_ols_lp(vector theta, matrix XpX, vector b,
                     real intercept, real ybar,
                     real SSR, real sigma, int N) {
    target += -0.5 * (quad_form(XpX, theta - b) +
                      N * square(intercept - ybar) + SSR) / square(sigma)
              - N * (log(sigma) + log(sqrt(2 * pi())));
  }
}
```
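For context, here is a hedged sketch (my addition, not from the original post) of how the rest of such a Stan program might declare the precomputed inputs and call the function; the exact block layout is an assumption:

```stan
// Sketch only: surrounding blocks that could accompany ll_mvn_ols_lp()
data {
  int<lower=1> N;          // number of observations
  int<lower=1> K;          // number of centered predictors (excluding intercept)
  matrix[K, K] XpX;        // X'X for the centered predictors
  vector[K] b;             // OLS slopes on the centered predictors
  real ybar;               // sample mean of the outcome
  real<lower=0> SSR;       // sum of squared OLS residuals
}
parameters {
  real alpha;              // intercept (relative to the centered predictors)
  vector[K] theta;         // slopes
  real<lower=0> sigma;     // error scale
}
model {
  ll_mvn_ols_lp(theta, XpX, b, alpha, ybar, SSR, sigma, N);
}
```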
You are going to need to split the intercept out of `X`, center the columns of `X`, calculate the OLS solution, the sample average of the outcome, the sum of squared residuals, etc. Also, calculate `XpX` as `t(X) %*% X` in R. You can pass all that to the function and it will use much less autodiff to calculate the derivatives of the likelihood function.
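Under those assumptions, a rough R sketch of the precomputation might look like the following; `fast_lm` is a placeholder name for a compiled Stan model that combines the functions block with blocks like the ones sketched above:

```r
library(ggplot2)   # for the diamonds data
library(rstan)

data("diamonds")
mf <- model.frame(price ~ carat + cut, data = diamonds)
y  <- mf$price
X  <- model.matrix(price ~ carat + cut, data = mf)[, -1]  # drop the intercept column
Xc <- scale(X, center = TRUE, scale = FALSE)              # center the columns of X

XpX  <- crossprod(Xc)                        # t(Xc) %*% Xc
ols  <- lm.fit(x = cbind(1, Xc), y = y)      # OLS with the centered predictors
b    <- ols$coefficients[-1]                 # OLS slopes (excluding the intercept)
ybar <- mean(y)                              # sample mean of the outcome
SSR  <- sum(ols$residuals^2)                 # sum of squared OLS residuals
N    <- length(y)
K    <- ncol(Xc)

# 'fast_lm' is hypothetical: a compiled model that includes ll_mvn_ols_lp()
# fit <- sampling(fast_lm,
#                 data = list(N = N, K = K, XpX = XpX, b = b,
#                             ybar = ybar, SSR = SSR))
```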
The QR transformation is talked about in the manual and in one of Mike’s case studies. It will usually help and applies to any GLM.
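For reference, here is a sketch along the lines of the QR reparameterization described in the user's guide, applied to the original model; it assumes a Stan version that provides `qr_thin_Q()`/`qr_thin_R()` (older releases used `qr_Q()`/`qr_R()`):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  vector[N] y;
}
transformed data {
  // thin QR decomposition of the predictor matrix, rescaled as in the user's guide
  matrix[N, K] Q_ast = qr_thin_Q(x) * sqrt(N - 1);
  matrix[K, K] R_ast_inverse = inverse(qr_thin_R(x) / sqrt(N - 1));
}
parameters {
  real alpha;
  vector[K] theta;         // coefficients on Q_ast
  real<lower=0> sigma;
}
model {
  y ~ normal(Q_ast * theta + alpha, sigma);
}
generated quantities {
  vector[K] beta = R_ast_inverse * theta;  // coefficients on the original scale
}
```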
But really in this case, the model of diamond price does not fit that well, which is responsible for a lot of the slowdown. It is better to model log-price additively.
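Concretely (an added note, not from the original reply), that just means transforming the outcome before passing it to Stan:

```r
# Illustration only: model log-price rather than price; everything else is unchanged.
y <- log(mf$price)
```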
MCMCpack::MCMCregress uses essentially the same tricks of precomputing cross-product matrices as rstanarm::stan_lm does. But NUTS is generally a better way to do MCMC than Gibbs sampling and is always better than Metropolis-Hastings. If you compare MCMCpack::MCMClogit to either rstanarm::stan_glm or brms::brm with a binomial likelihood, there is a big difference in sampling efficiency.