@Bob_Carpenter thank you for engaging in such depth. I think this is really important, so I'm grateful we can discuss it. I've started to realize that, depending on background, different people have different conceptual frameworks of modeling, and sometimes we make incorrect assumptions about what we are communicating.
I'll try my best to answer. First some general thoughts and then specific replies. (PS: This became much longer than I planned. I hope it's not too basic. It helped me clear up how I think about this, and maybe it will be helpful for others who end up reading this discussion. I might write a blog post with more details and illustrations - consider this a rough starting point. I will add specific replies to your comments in a separate comment.)
Contents:
- Model components in distributional regression
  - A core model
  - A regression model layer
- What 'model', 'model family', and 'model comparison' refer to
- Understanding the bottleneck
  - Stan components
    - Core model likelihood
    - Data structure
    - Parameter structure
    - Priors
    - Regression layer
    - Optimizations
  - What a user has and what they want to achieve
  - Bottleneck: How do the goals map to changes in Stan code?
- How do `brms` and `bmm` solve the bottleneck?
Model components in distributional regression
We (and brms more generally) implement a distributional regression approach to modeling. Within this framework, we can consider a model at two levels of abstraction:
- The core model, which specifies a likelihood function of the data given a closed set of parameters and their priors.
In a simple statistical model this is a distributional family - a normal, Poisson or whatever, with its parameters. A more complicated example could be a drift diffusion model, which gives the joint likelihood of response times and binary choices as a function of the parameters drift rate, bias, threshold, and non-decision time. Another example could be a mixture of 3 von Mises distributions, which then has parameters for each component distribution plus the mixture weights. The parameters of this core model are a closed set and each plays a distinct part in the likelihood function (e.g. always 5 scalar parameters). In essence this is the part of the model that looks like this:
$$p(y \mid \theta_a, \theta_b, \ldots)$$
where $y$ is the data (one or more variables), and $\theta_a$, $\theta_b$, etc. are the core parameters
- A regression model layer, which specifies for a particular dataset how the core parameters change over observations.
This is a standard statistical formula of the type:
$$g(\theta) = X\beta$$
where $g()$ is some link function that unconstrains a parameter, $\beta$ are the regression parameters, and $X$ is the model matrix. Such a formula is specified for every core parameter of the model. This can include fixed and random effects at different levels. In contrast to the core model, which has a closed set of scalar parameters, the regression model now has a vector of regression parameters for each core parameter. The core model priors correspond to the intercepts, and the regression layer needs to additionally specify priors for the effects.
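To make the two levels concrete, here is a minimal `brms`-style sketch (the names `rt`, `condition`, `subject`, and `my_data` are placeholders I made up for illustration): the family supplies the core model, and one formula per core parameter supplies the regression layer.

```r
library(brms)

# Core model: a lognormal family with core parameters mu and sigma.
# Regression layer: one formula per core parameter.
f <- bf(
  rt ~ condition + (1 | subject),  # regression layer for mu
  sigma ~ condition                # distributional formula for sigma
)

# Inspect the generated Stan program without fitting anything
# (my_data stands for a long-format data.frame, see below):
# make_stancode(f, data = my_data, family = lognormal())
```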
What do the terms "model", "model comparison", etc. refer to?
This, I think, is where misunderstandings often occur, because different fields use these terms differently, and even the same field uses them differently depending on the context.
In the cognitive modeling literature, by "model" we often refer to the core model (1 above) - a drift diffusion model, a linear ballistic accumulator, unequal-variance signal detection, a 3-parameter mixture model of memory, etc. In the bmm package we use the term "measurement model" in this sense - a core model without the regression layer. In the field, when we talk about "different models", we usually mean different core models.
This corresponds to the idea of a "model family" in the statistical literature, where instead, when you discuss alternative models, you usually mean different regression-level specifications of the same model family (e.g. comparing a model in which a parameter $\theta_a$ is allowed to vary with some predictor against the same core model with $\theta_a$ held fixed).
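To illustrate the two senses of "comparing models" in `brms` terms (again with placeholder names):

```r
library(brms)

# "Different models" in the statistical sense:
# same core model (family), different regression-layer specifications.
m_varying <- bf(rt ~ condition + (1 | subject))  # the parameter (here mu) varies with condition
m_fixed   <- bf(rt ~ 1 + (1 | subject))          # the parameter is constant across conditions

# "Different models" in the cognitive-modeling sense:
# same regression layer, different core models, e.g. fitting m_varying
# once with family = lognormal() and once with family = shifted_lognormal().
```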
Understanding the bottleneck
To understand where the bottleneck and friction points in modeling for many of us come from, and how `brms` and `bmm` help in generating the Stan code for distributional models, we need to understand:
- how the distinct components of such models map to Stan code
- what goals modelers usually want to achieve
- what we start with and what we want to end up with in a typical workflow
- how the modeling goals map to changes in these components
- how we move from what we have to what we want to get in Stan
Stan components
Here I don't mean the syntactic blocks `data {}`, `parameters {}`, `model {}` - although there is some overlap, the components below are sometimes spread across multiple blocks. (A sketch of how these components show up in generated code follows the list.)
1. Core model likelihood. One of:
- a built-in function such as `normal_lpdf(y | mu, sigma)`;
- a complete custom function defined in the `functions {}` block, of the type `my_fun_lpdf(y | mu, sigma)`. This can either specify a series of equations from basic math functions, or combine them with built-in likelihood functions;
- a sequence of code in `model {}` that calls several functions, storing values in intermediate variables. Equivalent to the second option, but messier.
2. Data structure. Consists of:
- Variables that contain:
- observations
- auxiliary information (size of other data vectors, number of observations, number of predictors, etc)
- predictors for the regression layer
3. Parameter structure. Consists of:
- Variables that define which model parameters are sampled (rather than fixed)
- Typically a variable corresponds to a core model parameter, and its structure (a vector, array, whatever) contains the values for the regression components
4. Priors
5. Regression layer
- code in `model {}` which defines how the model matrix is combined with the regression parameter vectors to produce a final parameter vector to be used in the model likelihood - this is needed because the likelihood function needs to know, for each (set of) observations, which parameters to use. Usually trickiest with random effects, because the most efficient ways of coding them are not the cleanest (e.g. just using linear algebra is slower than looping over observations and subsetting the parameter vector based on the random-effect indicator…)
6. Optimizations
- code that plays no conceptual role in the model, but improves sampling or inference (e.g. reparameterization, centering predictors, applying link functions)
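One way to see how these components show up in practice, without writing them by hand, is to inspect code generated by `brms` (a sketch with placeholder names; the listed variable names are what current `brms` versions happen to use, and `my_data` is a long-format data.frame like the toy example in the next section):

```r
library(brms)

# Generate (but don't compile or fit) the Stan program for a simple
# distributional model, to see where each component above ends up.
code <- make_stancode(
  bf(rt ~ condition + (1 | subject), sigma ~ condition),
  data   = my_data,
  family = lognormal()
)
cat(code)
# In current brms versions you will find, roughly:
# - data {}: observations (Y), model matrix (X), sizes (N, K), grouping indices (J_1)
# - parameters {}: regression coefficients (b), group-level SDs (sd_1), standardized effects (z_1)
# - model {}: the regression layer building mu, then lognormal_lpdf(Y | mu, sigma)
# - plus optimization details such as the centered model matrix (Xc)
```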
What a user has and what they want to achieve
This depends on conventions in each field, of course, but in the behavioral sciences the following is very common.
We have:
- Data(sets): typically in a long-format data.frame, where one column stores the observations (or more than one, with one column per distinct observation type, e.g. response times and accuracy), and other columns store potential predictor variables for fixed and random effects. This means that each row represents a single observation (or a sufficient statistic) and all of its predictors (a toy example follows below).
- Potential core model candidates
- Potential predictor candidates
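For concreteness, a toy dataset of this kind might look like the following (made-up columns and values):

```r
# One row per observation: the observed value(s) plus all of its predictors.
my_data <- data.frame(
  rt        = c(0.61, 0.85, 0.47, 0.92),  # observed response times (seconds)
  accuracy  = c(1, 0, 1, 1),              # a second observation type, if present
  condition = c("A", "A", "B", "B"),      # fixed-effect predictor
  subject   = c(1, 1, 2, 2)               # grouping variable for random effects
)
```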
All three of these might vary, and typically the goal is to do some form of inference:
- Inference about the regression layer
Definition: For a given dataset and core model, which set of predictors on which parameters leads to the best model fit.
Use case: Very typical with pure statistical models, such as testing the hypothesis that reaction times in condition A are faster than in condition B, using the default Gaussian or log-normal core model. But also applicable with complicated domain-specific models such as drift diffusion models, in which we test the hypothesis that the drift rate is slower in condition A vs B.
Assumptions: This type of data is described by a particular core model, and this particular dataset includes conditions that affect the parameters of the model.
What varies:
- data: fixed in a particular instance, but we might want to apply the same regression approach to different experiments/data sources.
- regression formula
- Inference about the core model
Definition: For data coming from a particular paradigm (e.g., memory decisions in delayed estimation), what core model is "the best".
Use case: usually the aim is to compare different data-generating assumptions (process models) or to compare which model best describes this type of data (measurement models).
Assumptions: we include regression components to account for variability over datasets (random slopes for participants, fixed effects for conditions), so that we can do a fair comparison of different core models
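As a rough sketch of how the two goals translate into a fitting-and-comparison workflow (placeholder names throughout, and LOO is just one of several comparison tools one might use):

```r
library(brms)

# Goal 1: inference about the regression layer
# (same core model, different predictor structure on mu)
fit_full <- brm(bf(rt ~ condition + (1 | subject)), data = my_data, family = lognormal())
fit_null <- brm(bf(rt ~ 1 + (1 | subject)), data = my_data, family = lognormal())
loo_compare(loo(fit_full), loo(fit_null))

# Goal 2: inference about the core model
# (same regression layer, different core models / families)
fit_ln <- brm(bf(rt ~ condition + (1 | subject)), data = my_data, family = lognormal())
fit_sl <- brm(bf(rt ~ condition + (1 | subject)), data = my_data, family = shifted_lognormal())
loo_compare(loo(fit_ln), loo(fit_sl))
```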
Bottleneck: How do the goals map to changes in Stan code?
The big bottleneck is that much of the Stan code we need to write depends on details we don't care about, which introduces a lot of overhead. The names and sizes of variables and parameters, and the regression-layer syntax, all depend on the specific dataset and regression formula.
When we want to compare several regression formulations of the same core model on the same dataset, we need either 1) as many copies of the Stan code as there are variations, each with a lot of code duplication and idiosyncratic changes, or 2) a very generic script with an abstract structure that somehow maps those choices onto the same code by varying the inputs. The latter is very difficult for the average user to write. On top of that, you need to create a lot of the optimization code and regression plumbing anew for each project. These scripts can run to hundreds of lines for relatively simple cases.
A typical workflow without `brms`, for someone without much training in software engineering (which is true for most statistical users), would be:
- if you've done this before, copy a script you have for a similar situation (e.g. another time I applied a logistic multi-level regression with random effects to a memory experiment - a regression-layer inference goal; or my 3 different custom-likelihood measurement models)
- adapt variable names and sizes for the current data with the basic model
- make multiple copies of the code, one for each regression version, and change relevant code
- debug and test interactively
- write complex R code to translate your tidy long-format data.frame into a Stan-style data structure (a sketch of this step follows below). Repeat as many times as you have alternative models, or write a custom function to do it for you
- and a bunch of other things.
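For instance, the data-translation step alone typically looks something like this (a sketch, not taken from any particular script; names are made up):

```r
# Manually translate the tidy data.frame into the list a hand-written Stan
# program expects: every name and size here must match the data {} block exactly.
stan_data <- list(
  N         = nrow(my_data),
  y         = my_data$rt,
  condition = as.integer(factor(my_data$condition)),
  J         = length(unique(my_data$subject)),
  subject   = as.integer(factor(my_data$subject))
)
# ...and this has to be redone (or generalized) for every alternative model.
```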
How do `brms` and `bmm` solve the bottleneck?
You could say at this point that a solution to all these problems is to write abstracted code that can be reused, and whose outputs will depend on the inputs, rather than having a different script for each input. Well, this is exactly what `brms` does.
Given:
- a model family (core model; built-in or user-defined Stan code)
- a long-format data frame
- a standardized regression formula
- priors specified via user-friendly syntax
It:
- transforms the data into an appropriate format for Stan
- generates boilerplate code that does not depend on the core or regression model
- creates mappings between user-specified variables and standardized Stan code for data and parameter variables
- defines the regression layer syntax to feed parameters to the core model
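Put together, the user-facing side of all this is quite compact (a sketch with the same placeholder names as above):

```r
library(brms)

f      <- bf(rt ~ condition + (1 | subject), sigma ~ condition)
priors <- prior(normal(0, 1), class = "b")  # user-friendly prior syntax

# brms generates both the Stan program and the matching data list...
code  <- make_stancode(f, data = my_data, family = lognormal(), prior = priors)
sdata <- make_standata(f, data = my_data, family = lognormal())

# ...or does everything in one call:
# fit <- brm(f, data = my_data, family = lognormal(), prior = priors)
```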
🔥 Several things make it very flexible:
- the `brmsformula` syntax:
  - via linear and non-linear formulas it can specify nearly arbitrary regression-layer code, custom link functions, etc.
  - it turns out that it can do many things it was not designed for, which is what we take advantage of in `bmm`
- custom families:
  - users can write a function in Stan code and then pass that to `brms` via the `custom_family()` option. Through this you can define any custom core model you can code in Stan.
- Stan code injection:
  - via the `stanvars` argument, you can specify arbitrary Stan code to be inserted in any code section.
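A minimal sketch of the last two mechanisms combined - a hand-written custom family whose likelihood is injected through `stanvars`. The names and the toy likelihood are mine, just to show the mechanics; `brms` already ships `shifted_lognormal()` as a built-in family, and a real model would also bound `ndt` below the minimum response time.

```r
library(brms)

# Define a custom core model for brms: its distributional parameters and links.
shifted_ln <- custom_family(
  "shifted_ln",
  dpars = c("mu", "sigma", "ndt"),
  links = c("identity", "log", "log"),
  lb    = c(NA, 0, 0),
  type  = "real"
)

# The core model likelihood, injected into the functions {} block via stanvars.
stan_fun <- "
  real shifted_ln_lpdf(real y, real mu, real sigma, real ndt) {
    return lognormal_lpdf(y - ndt | mu, sigma);
  }
"

fit <- brm(
  bf(rt ~ condition + (1 | subject)),  # regression layer on mu as usual
  data     = my_data,
  family   = shifted_ln,
  stanvars = stanvar(scode = stan_fun, block = "functions")
)
```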
Summary - role of brms and bmm
Based on the last section, this means that as long as you have a model that can be described in the above framework, you can generate the Stan code for it with `brms`. You still have to provide some Stan code if 1) you are using a core model that is not a typical statistical family, or 2) you want to affect some of the code logic via `stanvars`. Thus, `brms` provides you with three methods for injecting Stan code: custom families (tell `brms` to use that function to increment the target); `stanvars` (inject code into arbitrary blocks, before or after existing code); and non-linear regression formulas (inject code into how parameter vectors are combined to produce a final parameter passed to the core model).
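And a sketch of the third method: a non-linear formula that controls how parameters are combined before reaching the core model (the learning-curve form and the `trial` column are made up for illustration):

```r
library(brms)

# mu follows an exponential learning curve, and each non-linear parameter
# (a = asymptote, r = rate) gets its own regression layer.
f <- bf(
  rt ~ a * exp(-r * trial),   # how the parameters combine
  a ~ 1 + (1 | subject),      # regression layer for a
  r ~ 1 + condition,          # regression layer for r
  nl = TRUE
)

# make_stancode(f, data = my_data, family = lognormal()) shows the injected
# nlp_a / nlp_r computations feeding the likelihood.
```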
All of this gives you, in `brms`, a lot of control over the Stan code that you care about changing manually, while automatically generating the parts of the code that can be treated as boilerplate.
This works, but it still requires more code than just fitting a typical mixed-effects Gaussian regression. What `bmm` then does is provide a framework that smooths this process for custom domain models, so that users don't have to write even those parts themselves. And for me as a model developer, I can use `bmm` to specify the common parts of a model I'm working on and then iterate on the code more easily, supplying only changes to the relevant code parts and letting the rest be generated automatically.