Measurement error in nonlinear mixed effect models

Hi,

I am fitting a nonlinear model with negative binomial errors and the predictor X as a latent variable with measurement error.

I set up the model formulation like so:

mod2 = bf(Y ~ a*(me(X, X_sd))^b,
          a ~ 1 + Ed*Tt + (1|g1) + (1|g2),
          b ~ 1 + Ed*Tt + (1|g1),
          nl = TRUE)

nl1 = brm(formula = mod2, data = dat2, 
          prior = prior1, family = negbinomial(link="identity"), 
          iter = 2000, chains = 3, cores = 3,
          control = list(adapt_delta = 0.9),
          save_model = "mod-me.txt")

Here is the error I get when I try to make the stan code:

> make_stancode(formula = mod2, data = dat2, 
+               family = negbinomial(link="identity"))
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  me(real, real)

Function me not found.
 error in 'model32e441e674d_file32e47e7a7a6e' at line 72, column 54
  -------------------------------------------------
    70:   for (n in 1:N) {
    71:     // compute non-linear predictor values
    72:     mu[n] = nlp_a[n] + nlp_b[n] * (me(C_1[n] , C_2[n]));
                                                             ^
    73:   }
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file32e47e7a7a6e' due to the above error.

Does brms not recognize the me() function when it is used inside a nonlinear formula? Should I formulate the model differently?

If required, I can provide data to test this model.

Thanks,
Meghna

  • Operating System: Windows 10
  • brms Version: 2.13.0

Hi,

Bumping this issue. In case I have been unclear in describing the problem, my apologies; please let me know how I can improve the question.

My basic question is:

How do I set up a nonlinear model in brms so that measurement error in the predictor is included? I want to set it up like this:

mod2 = bf(Y ~ a*(me(X, X_sd))^b,
          a ~ 1 + Ed*Tt + (1|g1) + (1|g2),
          b ~ 1 + Ed*Tt + (1|g1),
          nl = TRUE)

I get errors (see thread) when I specify the brms model in this way.

Perhaps @paul.buerkner can help?

Thanks,
Meghna

me() can only be used in the linear formulas of the model, not in the nonlinear formula itself.

I am not sure if this will work for your purposes, but perhaps you can extract X from the nonlinear component of your formula.

Something like:

Y ~ a*Xtrue^b
Xtrue ~ me(X,X_sd)
a ~ etc.
b ~ etc. etc.

Hi Paul,

Thanks for that clarification. What do you think of the suggestion by @franzsf?

I did actually try to set up the model that way:

mod2 = bf(Y ~ exp(a)*X^b,
          X ~ me(X_mean, X_sd),
          a ~ 1 + e*T + (1|g1) + (1|g2),
          b ~ 1 + e*T + (1|g1),
          nl = TRUE)
prior1 = c(
  set_prior("lognormal(0,2)", nlpar = "X",
            coef = "Intercept"),
  set_prior("negbinomial(1, 5)", nlpar = "X",
            coef = "meXmeanXsd"),
  set_prior("normal(0.2, 0.5)", nlpar = "a", coef = ""),
  set_prior("normal(0.2, 0.5)", nlpar = "b", coef = ""),
  set_prior("normal(-0.1, 0.5)", nlpar = "a", coef = "eE1"),
  set_prior("normal(0.2, 0.5)", nlpar = "b", coef = "eE1"),
  set_prior("normal(0.25, 0.5)", nlpar = "a", coef = "eE2"),
  set_prior("normal(-0.2, 0.5)", nlpar = "b", coef = "eE2"),
  set_prior("normal(-1.2, 0.5)", nlpar = "a", coef = "eE3"),
  set_prior("normal(-0.4, 0.5)", nlpar = "b", coef = "eE3")
)

However, when I try to check the stancode using:

make_stancode(formula = mod2, data = dat2, prior=prior1,
              family = negbinomial(link="log"))

I get this error:

Error: Priors on population-level coefficients are required in non-linear models, but none were found for parameter 'X'. See help(set_prior) for more details.

I don’t understand why, because I have specified the prior for X. What am I missing here?

FYI, I used get_prior() to check whether my priors were specified correctly.

get_prior(formula = mod2, data = newdat2, 
          family = negbinomial(link="log"))

Thanks for the suggestion @franzsf. I did try the model that way. Please see my response to @paul.buerkner

Not sure if you want the intercept. Perhaps go with X ~ 0 + me(…)

Please tell me if the error persists.
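Spelled out, the intercept-free version would look something like this (a hedged sketch reusing the variable names from the earlier posts; untested):

```r
# Drop the intercept on the latent predictor with "0 +" so that no
# population-level Intercept coefficient (and prior) is required for X.
mod2 <- bf(Y ~ exp(a) * X^b,
           X ~ 0 + me(X_mean, X_sd),
           a ~ 1 + e*T + (1|g1) + (1|g2),
           b ~ 1 + e*T + (1|g1),
           nl = TRUE)
```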

Thanks Paul. Removing the intercept seemed to correct for that error message. However, now the compilation fails with this error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  negbinomial_lpdf(vector, int, int)

Function negbinomial_lpdf not found.

Can we not specify a negative binomial distribution as a prior? Sorry if I am missing something fundamental here.

Do you have a minimal reproducible example?

Hi Paul…would it be OK if I send you a part of the data privately?

A clarification though: can we specify a negative binomial distribution as a prior in Stan/brms, or am I doing that wrong? The error seems to happen because brms is unable to assign a negbinomial prior. I specified this prior: set_prior("negbinomial(1, 5)", nlpar = 'X', coef = "meXmeanXsd"),
For which I get this error:

Function negbinomial_lpdf not found.
 error in 'modeldd9876bc970_filedd988482a0' at line 94, column 45
  -------------------------------------------------
    92:   }
    93:   // priors including all constants
    94:   target += negbinomial_lpdf(bsp_X | 1, 5);
                                                    ^
    95:   target += normal_lpdf(b_a[1] | 0.2, 0.5);

Is this happening because, when X is drawn from X_mean using a measurement error model, the draw is a continuous real value from a normal distribution, whereas X is an integer that follows a negative binomial distribution? Should I change the model specification to:

mod2 = bf(Y ~ exp(a)*exp(X)^b,
          X ~ me(X_mean, X_sd),
          a ~ 1 + e*T + (1|g1) + (1|g2),
          b ~ 1 + e*T + (1|g1),
          nl = TRUE)

OR

mod2 = bf(Y ~ exp(a)*X^b,
          log(X) ~ me(log(X_mean), log(X_sd)),
          a ~ 1 + e*T + (1|g1) + (1|g2),
          b ~ 1 + e*T + (1|g1),
          nl = TRUE)

Thank you for your help!

A negbinomial prior won't work because Stan cannot sample discrete parameters.

I would add that for most use cases, the gamma distribution is a good continuous approximation to the negative binomial. Nevertheless, it is possible that the need for a negative binomial hints at some deeper problem with the match between the model and reality. Why was the negative binomial a natural choice of prior in your case? Note that me() currently implies normally distributed measurement error, and the true value of X is assumed unbounded (while the negative binomial assumes non-negative integers and the gamma assumes positive values).
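As an illustrative check of why that approximation works (my own code, with arbitrarily chosen mu and theta): gamma(shape = theta, rate = theta / mu) matches the negative binomial's mean mu and its overdispersion term mu^2 / theta (the NB variance being mu + mu^2 / theta).

```r
# Moment check for the gamma approximation to NegBin(mu, theta):
mu <- 5
theta <- 2
g <- rgamma(1e6, shape = theta, rate = theta / mu)
mean(g)  # close to mu = 5
var(g)   # close to mu^2 / theta = 12.5
```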

Best of luck with your model!

Thanks @paul.buerkner and @martinmodrak for your inputs.

The choice of negbinom was dictated by the nature of the data. This predictor is a non-negative integer (counts) with a highly skewed distribution: we get 1 or 2 entities in most samples, but a few samples have as many as 800 entities. For each X, we had two paired sampling plots to assess data mismatch in adjacent samples, i.e., measurement error. What I need is to draw the true value of X from a negative binomial, but with the mean of the negbinom drawn from a log-normal distribution informed by the observed data.

Xtrue ~ NegBin(Xmean, theta); Xmean ~ lognorm(Xobs, sigma)

I thought to do so with a measurement error component, because that is the eventual goal with this analysis: Y relates to X that has been sampled with error. Apologies for the rookie mistake about how the measurement error part works.

I guess my current nonlinear formulation is incorrect. So, how can I model X and Y from a joint distribution as:

Xtrue ~ NegBin(Xmean, theta); Xmean ~ lognorm(Xobs, sigma)
Y ~ NegBin(exp(a) * Xtrue^b, phi)

Is this possible using brms or would this need to be coded from scratch in Stan? Would appreciate your help!

Thanks,
Meghna


Unfortunately the model as given is impossible in Stan. For various technical reasons, Stan doesn’t support discrete unknown parameters (as Xtrue would need to be). Note that while other probabilistic languages do support discrete parameters, they are usually very hard to fit anyway. For some models, you can get rid of discrete parameters with suitable math tricks, but I don’t think that’s possible for your model. So what can you do?

In Stan, you could use Xtrue ~ gamma(theta, theta / Xmean), which should be quite close to Xtrue ~ NegBin(Xmean, theta) for most practical purposes. This form of measurement error unfortunately isn't supported in brms. I would however expect that if you took log_x = log(X) (or log(X + 1) if you have zeroes) and used normal measurement error for log_x, the results likely wouldn't be very different. (It is also possible that your data are more dispersed than what the negative binomial would handle, and the log-normal error could actually improve the fit. But that is just speculation.)

One thing the continuous approximations wouldn't handle is that with Xtrue ~ NegBin you can have big “steps” in predictions (e.g. 75% probability Xtrue = 0 and 20% probability Xtrue = 1). If a or b are large (say a = 200, b = 5), this would mean a 75% probability that Y is around zero, a 20% probability that Y is around 200, and in the remaining 5% of cases Y would be larger than 200 * 2^5 = 6400, effectively making the response multimodal. The continuous approximations would smooth out those big steps and keep your response unimodal. To what extent you need this discrete behaviour in your model is your call (and if you absolutely need it, there might be some ways to approximate it more closely).
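The sizes of those steps can be inspected directly from the NB probability mass function; for example, with mu = 0.5 and theta = 1:

```r
# Probability mass at the first few integer values of Xtrue:
mu <- 0.5
theta <- 1
dnbinom(0, size = theta, mu = mu)  # ~0.67: most of the mass sits at Xtrue = 0
dnbinom(1, size = theta, mu = mu)  # ~0.22
dnbinom(2, size = theta, mu = mu)  # ~0.07
```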

Does that make sense?

EDIT: Wanted to check that my reasoning is correct here with some simulations. First the gamma approximation to NB:

mu <- 0.5
theta <- 1
phi <- 30
a <- 200
b <- 1.1

N <- 1e6
nb_samples <- rnbinom(N, mu = mu, size = theta)
gamma_samples <- rgamma(N, shape = theta, rate = theta / mu)

breaks <- 0:max(nb_samples, gamma_samples)
hist(nb_samples, breaks = breaks)
hist(gamma_samples, breaks = breaks)

[histogram of nb_samples]

[histogram of gamma_samples]

Pretty close, right? So what happens when we simulate the whole process:

nb_samples_y <- rnbinom(N, mu = a * nb_samples ^ b, size = phi)
gamma_samples_y <- rnbinom(N, mu = a * gamma_samples ^ b, size = phi)

breaks2 <- seq(0, max(nb_samples_y, gamma_samples_y), length.out = 100)
hist(nb_samples_y, breaks = breaks2)
hist(gamma_samples_y, breaks = breaks2)

[histogram of nb_samples_y]
[histogram of gamma_samples_y]

That’s qualitatively very different behavior!

However, if we rerun with mu <- 30 the approximation becomes very good:

[histogram of nb_samples_y with mu <- 30]
[histogram of gamma_samples_y with mu <- 30]