brms multilevel model: syntax issue

Hi, Stan team,

I am a newbie with brms (I learned via the rethinking package), and I am having trouble figuring out the syntax for specifying a hyper-prior. Specifically, I want to model a gamma-Poisson process of the response variable O (observed) as a function of an intercept term, a location-specific proxy variable (L, location; P, proxy), and an hourly (H) spline term by location. The formula looks like this:

formula <- bf(
  O ~ alpha + betaP * P * L + s(HOUR, by = L)
)

I want to specify the following priors:
𝛼~ 𝑁𝑜𝑟𝑚𝑎𝑙(0, 2)
𝛽_𝑃~𝑁𝑜𝑟𝑚𝑎𝑙(3,𝜎𝑃)
𝑓(𝐻)_𝐿~𝑆𝑡𝑢𝑑𝑒𝑛𝑡′ 𝑠(3, 0, 2.5)
𝜎𝑃~ 𝐸𝑥𝑝(1)

(recopying in LaTeX)
\alpha \sim \text{Normal}(0, 2)
\beta_P \sim \text{Normal}(3, \sigma_P)
f(H)_L \sim \text{Student-}t(3, 0, 2.5)
\sigma_P \sim \text{Exponential}(1)

I have tried a couple of iterations:

fit3priors_1 <- c(
  prior(normal(0, 2), class = "Intercept"),
  prior(student_t(3, 0, 2.5), class = "sds"),
  prior(normal(3, 2), class = "b", coef = "PROXY"),
  prior(exponential(1), class = "sd", group = "PROXY")
)

fit3priors_2 <- c(
  prior(normal(0, 2), class = "Intercept"),
  prior(student_t(3, 0, 2.5), class = "sds"),
  prior(normal(3, sigmaP), class = "b", coef = "PROXY"),
  prior(exponential(1), class = "sd", group = "sigmaP")
)

… but so far, no dice. I keep getting errors that say my priors do not correspond to any model parameter, which makes me realize that I am clearly not specifying this right. Could someone help me define these priors in brms syntax? Thanks in advance for helping with a novice coding question!

G’day @rtpanik22,
Is betaP a variable in your data or did you mean it as a parameter? If the latter, you don’t want it in your formula in bf – you’ll just need to find the corresponding parameter (should be class = "b" and coef = P). Also, what’s meant by P * L? Are proxy and location two separate variables? Just didn’t seem to match your description.
I don’t think you’ll use group = in the prior call since the formula didn’t have any varying effects.

You may need to share an example or show how the data is structured.
When I screw up with syntax for priors, I find the get_prior function very useful.
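
Here is a minimal sketch of that check. The column names (OBSERVED, PROXY, LOCATION, HOUR), the simulated values, and the negbinomial() family are all placeholders I am assuming, not your actual setup, and I am reading P * L as an interaction. The second formula is a hypothetical varying-slope variant, included only to show when rows with class "sd" and a group appear.

```r
library(brms)

# Simulated stand-in data (hypothetical): 2 locations x 24 hours
set.seed(1)
d <- expand.grid(HOUR = 0:23, LOCATION = factor(1:2))
d$PROXY <- rpois(nrow(d), lambda = 2)
d$OBSERVED <- rnbinom(nrow(d), mu = 1 + 2 * d$PROXY, size = 5)

# Population-level terms only: no varying effects in the formula
f_fixed <- bf(OBSERVED ~ PROXY * LOCATION + s(HOUR, by = LOCATION))
p_fixed <- get_prior(f_fixed, data = d, family = negbinomial())
print(p_fixed)

# Hypothetical variant with a PROXY slope that varies by LOCATION
f_vary <- bf(OBSERVED ~ PROXY + (0 + PROXY | LOCATION) + s(HOUR, by = LOCATION))
p_vary <- get_prior(f_vary, data = d, family = negbinomial())
print(p_vary)
```

Each row of the printed table is a class/coef/group combination that prior() can target. A prior whose combination is not in the table is what produces the "do not correspond to any model parameter" error. No model is compiled or sampled here, so the check is quick.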

Thanks so much for your response and patience! This is my first time posting on Stan, so I have a lot to learn.

  • Beta_P is the parameter. So, you’re right that bf() should look like:
    formula <- bf(OBSERVED ~ PROXY * LOCATION + s(HOUR, by = LOCATION)). My mistake when typing this formula previously!

  • Proxy, P, and Location, L, are two separate variables. Proxy is an indirect measure of the response variable Observed, O, and its relationship to O varies by Location, L. So, O \sim \alpha + \beta_P P*L + s(HOUR, by = L) translates to: the (average) observed data can be estimated by an intercept term, a proxy whose relationship to the observed data varies by location, and a spline that lets that average vary by hour at each location. I think that this is the correct way to build this model, but please feel free to add corrections here if you have them!

Here is a (schematic) example of how the data is structured. Each location has somewhere between 15 and 24 hours of observation per day, and I have corresponding “proxies” (i.e., imperfect estimates) of those observations at each hour. The goal is to figure out if and where these proxies can be used to estimate O(bserved).

LOCATION  HOUR  OBSERVED  PROXY
1         0     0         0
1         1     0         0
1         2     1         0
1         3     1         1
1         4     5         2
1         5     8         4
1         6     10        3
2         0     0         0
2         1     0         0
2         2     1         0
2         3     1         1
2         4     5         2
2         5     8         4
2         6     10        3

Does this help? Thanks in advance for any additional guidance you can provide!