I’m looking to model a proportion p that is sampled at regular intervals (say, each year) and is known to be nondecreasing as the years go on. For example, the proportion of cars on the road that were made after 1990.
TL;DR: Is there a way to do this with higher-level syntax, e.g. brms? Or do I need to write Stan code? I optimistically guessed brm(..., family = exponential(link = "logit")), but that threw an error.
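(For the record, the failed guess looked roughly like the call below; the formula is just a placeholder, not my real model:)
# roughly what I tried; the formula is a stand-in, and this is the call that errored
brm(y | weights(w) ~ trend,
    family = exponential(link = "logit"),
    data = dat)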
Generally, I’m trying to stick with higher-level interfaces like brms. I’m still learning to write Stan code. To show that I’ve tried, though, here is a suboptimal implementation of a (weighted) version of:
\text{logit}(p) = \alpha - \beta \exp\left( -\lambda t \right)
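(Since \beta, \lambda > 0, the subtracted term shrinks toward zero as t grows, so the right-hand side is increasing in t and approaches \alpha; that is what makes p nondecreasing, levelling off just below \text{logit}^{-1}(\alpha).)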
data {
  int<lower=0> N;                   // number of observations
  array[N] int<lower=0, upper=1> y; // binary outcome variable: non-certified
  vector[N] trend;                  // centered and scaled measure of time
  vector<lower=0>[N] w;             // observation weights
}
parameters {
  real<lower=0> alpha;              // intercept; expect greater than 50-50 odds of certification
  real<lower=0> beta;               // should be positive to ensure decrease (we subtract)
  real<lower=0> lambda;             // decay rate, constrained to be positive
}
model {
  vector[N] log_odds;
  // Priors
  alpha ~ exponential(3);
  beta ~ exponential(3);
  lambda ~ exponential(1);
  // Log-odds for each observation
  for (i in 1:N) {
    log_odds[i] = alpha - beta * exp(-lambda * trend[i]);
  }
  // Likelihood, weighted
  for (i in 1:N) {
    target += w[i] * bernoulli_logit_lpmf(y[i] | log_odds[i]);
  }
}
generated quantities {
  vector[N] y_pred;
  vector[N] log_lik;                // log likelihood for each observation
  for (i in 1:N) {
    y_pred[i] = bernoulli_logit_rng(alpha - beta * exp(-lambda * trend[i]));
    log_lik[i] = bernoulli_logit_lpmf(y[i] | alpha - beta * exp(-lambda * trend[i]));
  }
}
As you can see, I’m quite the beginner and I’m sure this could benefit from vectorization, etc.
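(If it helps to see what I mean, here is roughly the model block I have in mind, untested: the log-odds collapse to one vectorized line, but the per-observation weights seem to keep the likelihood in a loop, since the vectorized bernoulli_logit_lpmf returns a single summed value.)
model {
  // elementwise: exp() applied to a vector, scalar-vector arithmetic broadcast
  vector[N] log_odds = alpha - beta * exp(-lambda * trend);
  // Priors
  alpha ~ exponential(3);
  beta ~ exponential(3);
  lambda ~ exponential(1);
  // Likelihood, weighted; the weights force the explicit loop
  for (i in 1:N) {
    target += w[i] * bernoulli_logit_lpmf(y[i] | log_odds[i]);
  }
}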
My point is more about the workflow after conditioning on the data. I can kind of work with the CmdStanMCMC result using, for example:
mod1 <- cmdstanr::cmdstan_model(...) # file containing the code above
fit1 <- mod1$sample(..., data = dat)
draws1 <- fit1$draws(variables = c("alpha", "beta", "lambda"))
posterior::as_draws(draws1)
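and quick summaries then come from, e.g.:
posterior::summarise_draws(draws1)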
Things get a little clunky if, for example, I just want to compare to other models I’ve implemented with brms:
mod0 <- brm(y | weights(w) ~ 1, family = bernoulli(), data = dat)
loo_list <- list(loo::loo(mod0), fit1$loo())
loo::loo_model_weights(loo_list)
… then I get:
Error: Each object in the list must have the same dimensions.
There’s friction with other post-fitting tools too, like the stuff in the marginaleffects package. Anyway.
General advice on when to cut bait and work with brms vs. Stan code, or vice versa, is probably too vague to be useful. So maybe it’s helpful to restrict this to just the topic at the beginning, i.e., nondecreasing proportions in a population over time?
P.S. Please don’t take this as a complaint! I don’t expect lower-level interfaces to have implemented every S3 method some other package has. All these packages are great, Stan is great, and I’m so glad for the work this community is pouring into making Bayesian methods approachable for people like me who specialize in other fields.
P.P.S. Thanks in advance for your time and attention!