I am curious about how other stan user’s workflow as it relates to predictive analysis. As a simple example lets say that you want to calculate the root mean squared error of predicted values vs actual from a multiple regression equation.
For reproducibility lets say that I have the following fake data in R:
#library(tidyverse)
library(rstan)
#library(bayesplot)
#library(brms)
n <- 1000
x1 <- rbinom(n,1,.7)
x2 <- runif(n,1,50)
x3 <- rnorm(n, 20, 10)
a <- 10
b1 <- .5
b2 <- .25
b3 <- 2
e <- rnorm(n, 0, .5^2)
y <- a + b1*x1 + b2*x2 + b3*x3 + e
x <- as.data.frame(list(x1=x1,x2=x2,x3=x3)) %>% as.matrix()
train_x <- x[1:700,]
test_x <- x[700:1000,]
train_y <- y[1:700]
test_y <- y[700:1000]
# multiple regression
fit <- stan(file = "lm_pred.stan", data = list(N=length(train_y),
X=train_x,
Y=train_y,
K=ncol(train_x),
N_new = length(test_y),
X_new = test_x,
Y_new = test_y))
fit
my lm_pred.stan
code is :
data{
int<lower=0> N;
int<lower=0> K;
matrix[N, K] X;
vector[N] Y;
int<lower=0> N_new;
matrix[N_new, K] X_new;
}
parameters{
real alpha;
vector[K] beta;
real<lower=0> sigma;
vector[N_new] Y_new;
}
model{
Y ~ normal(alpha + X * beta, sigma);
Y_new ~ normal(alpha + X_new * beta, sigma);
}
This leaves me with predictions in the console. I am used to using either the predict()
function or tidyverse
tools. I am just curious how other stan users would calculate RMSE.