# Workflow: Root Mean Squared Error

I am curious about how other stan user’s workflow as it relates to predictive analysis. As a simple example lets say that you want to calculate the root mean squared error of predicted values vs actual from a multiple regression equation.

For reproducibility lets say that I have the following fake data in R:

``````#library(tidyverse)
library(rstan)
#library(bayesplot)
#library(brms)

n <- 1000
x1 <- rbinom(n,1,.7)
x2 <- runif(n,1,50)
x3 <- rnorm(n, 20, 10)

a <- 10
b1 <- .5
b2 <- .25
b3 <- 2

e <- rnorm(n, 0, .5^2)
y <- a + b1*x1 + b2*x2 + b3*x3 + e
x <- as.data.frame(list(x1=x1,x2=x2,x3=x3)) %>% as.matrix()

train_x <- x[1:700,]
test_x <- x[700:1000,]
train_y <- y[1:700]
test_y <- y[700:1000]

# multiple regression
fit <- stan(file = "lm_pred.stan", data = list(N=length(train_y),
X=train_x,
Y=train_y,
K=ncol(train_x),
N_new = length(test_y),
X_new = test_x,
Y_new = test_y))

fit
``````

my `lm_pred.stan` code is :

``````data{
int<lower=0> N;
int<lower=0> K;
matrix[N, K] X;
vector[N] Y;

int<lower=0> N_new;
matrix[N_new, K] X_new;
}

parameters{
real alpha;
vector[K] beta;
real<lower=0> sigma;

vector[N_new] Y_new;
}

model{
Y ~ normal(alpha + X * beta, sigma);
Y_new ~ normal(alpha + X_new * beta, sigma);
}
``````

This leaves me with predictions in the console. I am used to using either the `predict()` function or `tidyverse` tools. I am just curious how other stan users would calculate RMSE.

Both rstanarm and brms have `predict` functions that take a `newdata` argument. For things as simple as your example, you should be using that. If you were to write this in Stan code, `Y_new` should not be declared in parameters, but rather in `generated quantities`, like this

``````generated quantities {
vector[N_new] Y_new;
{
vector[N_new] mu_new = alpha + X_new * beta;
for (n in 1:N_new) Y_new[n] = normal_rng(mu_new[n], sigma);
}
}
``````

Thanks, I like using brms and I understand that this would normally be a model that I would fit with brms, but I am practicing stan so I can fit more realistic models in the future.

Is it possible to fit a model in stan then use `predict` from brms on new data? If so, how would I get a `stanfit` object from my code above with the purpose of passing it to `brms::predict`?

No

Okay, so for models written in stan how do most people calculated root mean squared error for predicted values?

Skipping sections of the Stan model you already have, something like this:

``````data {
.
.
.
// Observed Y values at the X_new values, not used in model estimation
vector [N_new] Y_test;
}
.
.
.
generated quantities {
vector[N_new] Y_new;
vector[N_new] indiv_squared_errors;
real <lower = 0> sum_of_squares;
real <lower = 0> root_mean_squared_error;

{
vector[N_new] mu_new = alpha + X_new * beta;
for (n in 1:N_new)  {
Y_new[n] = normal_rng(mu_new[n], sigma);
indiv_squared_errors[n] = (Y_test[n] - Y_new[n])^2;
}
}
// sum of squares, data set size and scale specific
sum_of_squares = sum(indiv_squared_errors);

// divide by number of new / test points and sqrt for RMSE
root_mean_squared_error = sqrt(sum_of_squares / N_new);

}
``````

The usually condition on all the data and calculate the ELPD with `loo::loo`.