I’m looking to model a proportion p that is sampled at regular intervals (say, each year) and is known to be nondecreasing as the years go on. For example, the proportion of cars on the road that were made after 1990.
TL;DR: Is there a way to do this with higher-level syntax, e.g. brms? Or do I need to write Stan code? I optimistically guessed brm(..., family = exponential(link = "logit")), but that threw an error.
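(For the record, the failed guess looked roughly like the call below; the formula is just a placeholder, not my real model:)
# roughly what I tried; the formula is a stand-in, and this is the call that errored
brm(y | weights(w) ~ trend,
    family = exponential(link = "logit"),
    data = dat)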
Generally, I’m trying to stick with higher-level interfaces like brms. I’m still learning to write Stan code. To show that I’ve tried, though, here is a suboptimal implementation of a (weighted) version of:
\text{logit}(p) = \alpha - \beta \exp\left( -\lambda t \right)
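(Since \beta, \lambda > 0, the subtracted term shrinks toward zero as t grows, so the right-hand side is increasing in t and approaches \alpha; that is what makes p nondecreasing, levelling off just below \text{logit}^{-1}(\alpha).)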
data {
  int<lower=0> N;                   // number of observations
  array[N] int<lower=0, upper=1> y; // binary outcome variable: non-certified
  vector[N] trend;                  // centered and scaled measure of time
  vector<lower=0>[N] w;             // observation weights
}
parameters {
  real<lower=0> alpha;              // intercept; expect greater than 50-50 odds of certification
  real<lower=0> beta;               // should be positive to ensure decrease (we subtract)
  real<lower=0> lambda;             // decay rate, constrained to be positive
}
model {
  vector[N] log_odds;
  // Priors
  alpha ~ exponential(3);
  beta ~ exponential(3);
  lambda ~ exponential(1);
  // Log-odds for each observation
  for (i in 1:N) {
    log_odds[i] = alpha - beta * exp(-lambda * trend[i]);
  }
  // Likelihood, weighted
  for (i in 1:N) {
    target += w[i] * bernoulli_logit_lpmf(y[i] | log_odds[i]);
  }
}
generated quantities {
  vector[N] y_pred;
  vector[N] log_lik;                // log likelihood for each observation
  for (i in 1:N) {
    y_pred[i] = bernoulli_logit_rng(alpha - beta * exp(-lambda * trend[i]));
    log_lik[i] = bernoulli_logit_lpmf(y[i] | alpha - beta * exp(-lambda * trend[i]));
  }
}
As you can see, I’m quite the beginner and I’m sure this could benefit from vectorization, etc.
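(If it helps to see what I mean, here is roughly the model block I have in mind, untested: the log-odds collapse to one vectorized line, but the per-observation weights seem to keep the likelihood in a loop, since the vectorized bernoulli_logit_lpmf returns a single summed value.)
model {
  // elementwise: exp() applied to a vector, scalar-vector arithmetic broadcast
  vector[N] log_odds = alpha - beta * exp(-lambda * trend);
  // Priors
  alpha ~ exponential(3);
  beta ~ exponential(3);
  lambda ~ exponential(1);
  // Likelihood, weighted; the weights force the explicit loop
  for (i in 1:N) {
    target += w[i] * bernoulli_logit_lpmf(y[i] | log_odds[i]);
  }
}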
My point is more about the workflow after conditioning on the data. I can kind of work with the CmdStanMCMC result using, for example:
mod1 <- cmdstanr::cmdstan_model(...) # file containing the code above
fit1 <- mod1$sample(..., data = dat)
draws1 <- fit1$draws(variables = c("alpha", "beta", "lambda"))
posterior::as_draws(draws1)
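and quick summaries then come from, e.g.:
posterior::summarise_draws(draws1)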
Things get a little clunky if, for example, I just want to compare to other models I’ve implemented with brms:
mod0 <- brm(y | weights(w) ~ 1, family = bernoulli(), data = dat)
loo_list <- list(loo::loo(mod0), fit1$loo())
loo::loo_model_weights(loo_list)
… then I get:
Error: Each object in the list must have the same dimensions.
There’s friction with other post-fitting tools too, like the stuff in the marginaleffects package. Anyway.
General advice on when to cut bait and work with brms vs. Stan code, or vice versa, is probably too vague to be useful. So maybe it’s helpful to restrict this to just the topic at the beginning, i.e., nondecreasing proportions in a population over time?
P.S. Please don’t take this as a complaint! I don’t expect lower-level interfaces to have implemented every S3 method some other package has. All these packages are great, Stan is great, and I’m so glad for the work this community is pouring into making Bayesian methods approachable for people like me who specialize in other fields.
P.P.S. Thanks in advance for your time and attention!