Distribution regression for non-linear parameters

Hi,

I’m fitting a non-linear model with brms, where I am interested in estimating the effects of a treatment on the variation of some data, which has a sigmoid form. I have had success so far with something like this, where y is the variable I’m hoping to predict, t is time, and treatment is a categorical variable with three levels.

formula <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
  + lf(offset ~ 1, slope ~ 1, scale ~ 1, shift ~ 1)
)

I’d now like to also see if I can also use treatment in estimating the variation of some of the non-linear parameters. Essentially, I am curious if the change in variation of treatment can be apportioned to specific non-linear parameters varying less, or if it simply reduces the residual variation.

However, when I try something like this:

formula <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
  + lf(slope ~ 1, sigma ~ treatment)
  + lf(offset ~ 1, scale ~ 1, shift ~ 1)
)

I get the output “Replacing initial definitions of parameters ‘sigma’”.

How do I also model the non-linear parameter “slope” with a distribution where I can predict the parameters?

One idea I had was to explicitly code slope as a latent variable, and provide a column full of NaNs when fitting the model:

formula <- (
    bf(y ~ shift + scale * inv_logit(mi(slope) * (t - offset))
       , sigma ~ 1 + t:treatment
       , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
       , nl=TRUE
    )
    + bf(slope | mi() ~ 1, sigma ~ treatment)
    + lf(offset ~ 1, scale ~ 1, shift ~ 1, resp="y")
)

This seems to produce the right model structure. However, I wanted to check if this is in fact the way to go about this and that I’m not introducing any errors here? In particular, down the line I’d actually like to fit a multivariate model, and if I wanted to take the same approach for two outcome variables I’d need to add separate latent variables for them - which is pretty daunting when I think about it 😅

Are there any other ways of achieving this that keep the complexity low?

EDIT: After attempting to make the latent variable approach work, I don’t actually think it works, as the missing variable operator doesn’t seem to work in non-linear formulas.


R version 4.0.2 (2020-06-22)
brms_2.16.1

So, after discovering this thread: Mi() with "non-linear" model?, I decided to continue down the route of using a latent variable to try and place a distribution over slope.

However, I am running into some strange issues. I first wanted to effectively re-create my simple model, but using a latent variable for slope. For reference, my original model, priors, and fit procedure are:

formula.test.1 <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , offset ~ 1
     , scale ~ 1
     , shift ~ 1
     , slope ~ 1
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
)

prior.test.1 <- c(
    # priors for the sigmoid curve:
    prior(normal(0.5, 0.2), nlpar='offset')
    , prior(normal(1.5, 1), nlpar='scale', lb=0)
    , prior(normal(0, 0.2), nlpar='shift')
    , prior(normal(3, 2), nlpar='slope', lb=0)
    # priors for parameters for sigma
    , prior(normal(0, 2), dpar="sigma", lb=0)
    , prior(lognormal(-3, 1.5), dpar="sigma", class="Intercept")
)

brm(
    data = df.test, formula = formula.test.1, prior = prior.test.1,
     # note these are short chains for iteration
    iter = 1000, warmup = 500, chains = 2, cores = 2
  )

Producing the following, promising model results:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ shift + scale * inv_logit(slope * (t - offset)) 
         sigma ~ 1 + t:treatment
         offset ~ 1
         scale ~ 1
         shift ~ 1
         slope ~ 1
   Data: df.test (Number of observations: 780) 
  Draws: 2 chains, each with iter = 500; warmup = 0; thin = 1;
         total post-warmup draws = 1000

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Intercept              0.01      0.00     0.01     0.01 1.01      560      622
offset_Intercept             0.53      0.01     0.52     0.54 1.01      311      416
scale_Intercept              1.05      0.02     1.02     1.10 1.01      253      303
shift_Intercept             -0.06      0.01    -0.07    -0.05 1.01      330      504
slope_Intercept              5.44      0.15     5.13     5.72 1.01      263      331
sigma_t:treatmentleft        0.08      0.01     0.07     0.09 1.00      588      579
sigma_t:treatmentneutral     0.08      0.01     0.07     0.09 1.01      533      460
sigma_t:treatmentright       0.08      0.01     0.07     0.09 1.00      555      592

Now, at least conceptually, should the following not be the same?

formula.test.2 <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , offset ~ 1
     , scale ~ 1
     , shift ~ 1
     , slope ~ 0 + mi(slopelatent)
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
  + bf(slopelatent | mi() ~ 1
       , family = brmsfamily("gaussian", link="identity", link_sigma="identity"))
  + set_rescor(FALSE)
)

prior.test.2 <- c(
    # priors for the sigmoid curve:
    prior(normal(0.5, 0.2), nlpar='offset', resp='y')
    , prior(normal(1.5, 1), nlpar='scale', resp='y', lb=0)
    , prior(normal(0, 0.2), nlpar='shift', resp='y')
    # fix b for slope to 1 as the latent parameter will be modelled instead:
    , prior(constant(1), nlpar='slope', resp='y', lb=0)
    # priors for parameters for sigma
    , prior(normal(0, 2), dpar="sigma", resp="y", lb=0)
    , prior(lognormal(-3, 1.5), dpar="sigma", resp="y", class="Intercept")
    # priors for slope parameter distribution
    , prior(normal(3, 2), resp="slopelatent", class="Intercept")
    # fix the sigma on this distribution to something very low, to mimic fit.test.1
    , prior(constant(0.00001), resp="slopelatent", class="sigma")
)

fit.test.2 <- brm(
    data = df.test %>% mutate(slopelatent=as.numeric(NA)),
    formula = formula.test.2, prior = prior.test.2,
    iter = 1000, warmup = 500, chains = 2, cores = 2
  )

The resulting fit however fails completely:

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: y ~ shift + scale * inv_logit(slope * (t - offset)) 
         sigma ~ 1 + t:treatment
         offset ~ 1
         scale ~ 1
         shift ~ 1
         slope ~ 0 + mi(slopelatent)
         slopelatent | mi() ~ 1 
   Data: df.test (Number of observations: 780) 
  Draws: 2 chains, each with iter = 500; warmup = 0; thin = 1;
         total post-warmup draws = 1000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_y_Intercept              0.22      0.10     0.10     0.32 3.05        2       11
slopelatent_Intercept          0.02      0.02     0.00     0.03 2.65        2       13
y_offset_Intercept            -0.74      0.04    -0.78    -0.70 2.61        2       14
y_scale_Intercept              2.43      0.13     2.30     2.56 2.46        2       11
y_shift_Intercept              1.33      0.30     1.03     1.63 2.24        3       11
sigma_y_t:treatmentleft        2.45      0.86     1.59     3.31 2.93        2       11
sigma_y_t:treatmentneutral     1.68      0.97     0.71     2.67 3.03        2       11
sigma_y_t:treatmentright       3.24      2.40     0.84     5.65 3.05        2       11
y_slope_mislopelatent          1.00      0.00     1.00     1.00   NA       NA       NA

Family Specific Parameters: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_slopelatent     0.00      0.00     0.00     0.00   NA       NA       NA

In my mind, the two models should behave similarly, since:

  • the relationship between slopelatent and slope is fixed (b=1)
  • The intercept for slopelatent has the same prior as slope does in the first model
  • The scale of the slopelatent distribution is fixed to a very low number
  • all values of slopelatent are inferred

Obviously, the resulting stan code looks quite different. Is there an obvious reason why the second model formulation is so much more challenging for Stan to sample?

Thanks,
Jan