Specifying year fixed effects in Stan from a numerical year variable with too many years to write down manually



Hello Stan users,

Currently, in R, I have a simple OLS with year fixed effects of the form:

y ~ x1 + x2 + x3 + factor(year)

After estimating that, I calculate clustered standard errors using the variable “groups” as the cluster.

Now I want to migrate that to Stan to fit a Bayesian version of it. Specifying a Bayesian linear model is pretty easy in Stan. My understanding, however, is that the ideal way to migrate the model above would be a multilevel model with only the intercept varying at the clustering level, which would approximate the clustered standard errors I have been computing in the frequentist framework.

No problem there either: after some reading I was able to properly specify a multilevel model in Stan. The remaining problem is that I have no idea, and couldn’t find an example, of how to specify the year fixed effects in Stan. Sure, I could create a dummy for each year and manually include all those dummy variables and their beta coefficients in my Stan code, but that sounds really impractical since my dataset covers hundreds of years for each cluster (variable “group”).

So, my question: is there a practical way of doing that? That is, a practical way of using a numerical variable as dummy-coded fixed effects? Perhaps generating the dummies for each year in R and then using a for-loop in Stan to declare those dummies, their coefficients, and their priors?

Any ideas, and perhaps links to examples already out there that I wasn’t able to find, would be very helpful. Also, feel free to ask me for more information if needed.


You can either create a design matrix in R and pass it to Stan as data, or skip the dummies entirely: declare ‘vector[n_years] beta_year;’ in the parameters block and add ‘+ beta_year[year_idx[i]]’ to your regression equation. The index comes in as data, created in R with ‘year_idx = as.numeric(factor(year))’, which maps each distinct year to an integer from 1 to n_years.
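
To make the indexing approach concrete, here is a minimal sketch of what the Stan program could look like. The names N, X, beta and sigma, the choice of three predictors, and all of the priors are assumptions for illustration, not something taken from your model. It uses the ‘array[N] int’ syntax of recent Stan versions; older versions would declare ‘int year_idx[N];’ instead. The varying intercept from your multilevel model is left out here; it could be added with a group index in the same way.

```stan
data {
  int<lower=1> N;                                  // number of observations
  int<lower=1> n_years;                            // number of distinct years
  array[N] int<lower=1, upper=n_years> year_idx;   // as.numeric(factor(year)) from R
  matrix[N, 3] X;                                  // columns x1, x2, x3 (assumed layout)
  vector[N] y;                                     // outcome
}
parameters {
  vector[3] beta;               // slopes for x1, x2, x3
  vector[n_years] beta_year;    // one effect per year
  real<lower=0> sigma;          // residual standard deviation
}
model {
  // Placeholder priors (assumptions); pick scales that suit your data.
  beta ~ normal(0, 5);
  beta_year ~ normal(0, 5);
  sigma ~ normal(0, 5);

  // No separate global intercept: each beta_year acts as that year's
  // intercept, so the model is not overparameterised.
  // beta_year[year_idx] picks the right year effect for every row at once.
  y ~ normal(X * beta + beta_year[year_idx], sigma);
}
```

On the R side the data list would carry ‘N = nrow(d)’, ‘n_years = length(unique(d$year))’ and ‘year_idx = as.numeric(factor(d$year))’, where ‘d’ is a hypothetical name for your data frame. Note that ‘factor(year)’ in lm drops one year as the reference level, whereas this sketch drops the global intercept instead; the fitted values are the same, only the parameterisation differs.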