Help defining non-linear model

Hi,

I am trying to define a non-linear model, but I am not sure what form it should take. I have a dataset where items can belong to class A or B, and, let’s say, one predictor X which determines which class an item belongs to. However, I additionally know that some proportion P of the items have no relation to X and belong to class A or B by accident. At the same time, I do not know how well X predicts class assignment for the non-randomly assigned items, so I cannot just look at the accuracy of the classifier to determine P.

Now, is it possible to define this model in brms (Stan)? I tried the model described here: https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html (the last one), which is somewhat similar to what I expect to be happening, but when I run it on simulated data where I know the proportions of randomly assigned items, the estimates do not reflect these known proportions.

The data could look something like this (in this case we assume x1 completely determines the class of the non-random items):

library(tibble)

inv_logit <- plogis  # plogis is the base-R inverse logit

n <- 900  # items whose class follows x1
k <- 100  # items assigned to a class at random
x1 <- inv_logit(rnorm(n + k, 0, 10))
class_t <- ifelse(head(x1, n) > 0.5, "a", "b")           # class determined by x1
class_r <- ifelse(round(runif(k, 0, 1)) == 0, "a", "b")  # class assigned at random

df_t <- tibble(class = c(class_t, class_r),
              x1 = x1)

I also tried something like this:

model <- brm(bf(class ~ round(b2) * inv_logit(b1) + round(1 - b2) * c
               , b2 ~ 1
               , c ~ 1
               , b1 ~ 1 + x1
               , nl = TRUE)
            , data = df_t
            , family = bernoulli("identity")
            , prior = c(prior(normal(-4, 3), nlpar = "b1", lb = -10, ub = 0)
                      , prior(beta(1, 1), nlpar = "b2", lb = 0, ub = 1)
                      , prior(beta(1, 1), nlpar = "c", lb = 0, ub = 1)
                        )
            , iter = 4000, warmup = 1000, chains = 4, cores = 4
            , seed = 123
            , control = list(adapt_delta = 0.9)
              )

But that also does not seem to do what I want. Any help would be greatly appreciated.

Thanks!

Hey Matias, your data generators both create class labels a and b with the same probabilities (.5 for both). The signal doesn’t differ from the noise.
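A quick way to see this (using plogis as the inverse logit): rnorm(., 0, 10) is symmetric about 0, so its inverse logit exceeds 0.5 about half the time, and both class_t and class_r come out roughly 50/50:

```r
# rnorm(., 0, 10) is symmetric about 0, so plogis() of it
# exceeds 0.5 half the time -- the labels are ~50/50 either way
set.seed(42)
x1 <- plogis(rnorm(1e5, 0, 10))
mean(x1 > 0.5)  # roughly 0.5
```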

Since you have just two labels, you have a binary classification problem. A natural first model would be a logistic regression: y_i \sim Bernoulli(logit^{-1}(\alpha + \beta x_i)), where x_i \in \mathbb{R} is your predictor and \alpha, \beta are the coefficients you’d be fitting. There’s a good intro to fitting logit models with Stan here. Could this help?
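In case it helps, a minimal sketch of that model with base R’s glm (brm(class ~ x1, family = bernoulli()) would be the brms equivalent); the alpha = -0.5 and beta = 2 used for simulation are made-up values:

```r
# Simulate y ~ Bernoulli(inv_logit(alpha + beta * x)) and fit it back
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-0.5 + 2 * x))  # alpha = -0.5, beta = 2
fit <- glm(y ~ x, family = binomial())
coef(fit)  # estimates should land near -0.5 and 2
```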

Hi Emir,

Thank you for your answer, but I don’t see how a simple logistic regression helps me here. I am trying to estimate k, which is why I’m trying to define a non-linear model. But maybe what I want isn’t possible?

IMO, from your data-generating function, it’s impossible to recover k. Say Bob and Alice were flipping coins. They perform a total of 1000 flips of a fair coin between them. They both flip the coin in exactly the same way. How many times did Alice flip the coin?

Well, except that for my data, each time Alice and Bob flip a coin, they make a prediction first. Alice happens to be an expert coin flipper (let’s say she’s at 95% accuracy, but I don’t actually know the exact value) while Bob is at random chance. In my data I have the predictions made by Alice and Bob. I know Bob flipped k coins, so at least k/2 coins should have a wrong prediction. Is there no way to estimate k in this scenario? Or at least fit a model which allows both data-generating processes to be taken into account?

The model y_i \sim Bernoulli(logit^{-1}(\alpha + \beta x_i)) is incorrect for a portion of the data, where the correct model should be y_i \sim Bernoulli(logit^{-1}(\alpha)).

Actually, reading a bit about it, I think a mixture model is the correct solution. If I define the model as a mixture of two Bernoulli distributions with the formula bf(class ~ 1, mu1 ~ x1, mu2 ~ 1), the estimated mixing proportions do seem to closely resemble the proportions in the simulated data.
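For completeness, this is roughly how I’d expect that mixture to be written out in full in brms (a sketch assuming the df_t from above, not a carefully tuned call):

```r
library(brms)

# Two-component Bernoulli mixture: one component driven by x1,
# one intercept-only ("random") component
fit_mix <- brm(
  bf(class ~ 1, mu1 ~ x1, mu2 ~ 1),
  data = df_t,
  family = mixture(bernoulli("logit"), bernoulli("logit")),
  chains = 4, cores = 4, seed = 123
)
summary(fit_mix)  # theta1 / theta2 are the estimated mixing proportions
```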

Hey Matias, Bernoulli mixtures won’t be identifiable. Say you have two Bernoulli random variables distributed according to \theta_1, \theta_2 \in [0,1] and combined according to \lambda \in [0,1]. We might model the likelihood of some data as p(y=1\mid\theta_1,\theta_2,\lambda)=\lambda \theta_1 + (1-\lambda) \theta_2. We can see from the form that there exists some \theta_3 \in [0,1] such that \theta_3=\lambda \theta_1 + (1-\lambda) \theta_2, and hence the mixture can be replaced with a single Bernoulli variable with parameter \theta_3. But for any \theta_3 there are infinitely many ways of picking \theta_1, \theta_2, \lambda.

Hence, any generating process that is a mixture of Bernoullis will collapse into a single Bernoulli, and any single Bernoulli can be decomposed into a mixture in infinitely many ways; therefore Bernoulli mixtures are not identifiable without additional information.
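A quick numeric illustration of the collapse:

```r
# Different (theta1, theta2, lambda) triples all produce the same marginal
mixes <- rbind(c(0.9, 0.3, 0.5),
               c(0.8, 0.4, 0.5),
               c(1.0, 0.2, 0.5),
               c(0.6, 0.6, 0.2))
theta3 <- mixes[, 3] * mixes[, 1] + (1 - mixes[, 3]) * mixes[, 2]
theta3  # all 0.6: the data alone cannot distinguish these mixtures
```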

Why does the model calculate correct mixing proportions for simulated data? I ran 20 or so simulations with different mixture proportions and the estimated mixing proportions were always (ballpark) “correct”.

It’s identifiable because there is additional information in X. The model will stop working if beta is so small that no observed value of X makes a near-certain prediction, or if alpha is so large that an item can belong to class A only by accident.


I think this solves it then. Thank you!


I hadn’t noticed that @Matias_Guzman_Naranjo was passing x1 to the model as data. That said, I still don’t see how this model is identifiable. The mixture components are B(logit^{-1}(\alpha + \beta x)) and B(logit^{-1}(\alpha)), I guess?

One expects that “a” occurs in class_r at a rate of .5, and that 95% of N = n + k will agree with x1, and hence that ~95% of the mixture weight will go to B(logit^{-1}(\alpha + \beta x)). k is then estimated as .05*N. This would be equivalently answered analytically using k = N - sum(x1 > .5).

To me it seems that k, even if x1 is data, cannot be recovered; it’ll always just be the complement of whatever didn’t happen to align with the predictor. I would appreciate it if you could elaborate a bit to the contrary.

Looking at it some more: since the incidental agreement with x1 in class_r will inform the component B(logit^{-1}(\alpha + \beta x)), I’d expect the situation to get more dire when k > n. Then at some point the majority of the allocation to B(logit^{-1}(\alpha + \beta x)) will actually be the result of noise, and k will be more wrong.

In this case we have a very large beta, so predictions are basically just whether x1 exceeds some unknown threshold, and k is of course always the complement of what didn’t align with the prediction.
The question is whether you can recover the threshold. The answer is that you can measure the error rate on either side, and the model assumes those must be equal. That assumption can narrow down the location of the threshold enough, at least in the simulated data.
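To make the equal-error-rate point concrete, a sketch on the simulated data (using plogis for inv_logit): the signal items always agree with the x1 > 0.5 prediction, while the random items disagree about half the time, so doubling the mismatch count roughly recovers k:

```r
set.seed(123)
n <- 900; k <- 100
x1 <- plogis(rnorm(n + k, 0, 10))
class_t <- ifelse(head(x1, n) > 0.5, "a", "b")           # follow x1 exactly
class_r <- ifelse(round(runif(k, 0, 1)) == 0, "a", "b")  # random labels
pred <- ifelse(x1 > 0.5, "a", "b")
m <- sum(c(class_t, class_r) != pred)  # mismatches; about k / 2 in expectation
k_hat <- 2 * m
k_hat  # close to the true k = 100
```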
