Distribution regression for non-linear parameters

So, after discovering this thread (Mi() with "non-linear" model?), I decided to continue down the route of using a latent variable to place a distribution over the slope.

However, I am running into some strange issues. As a first step, I wanted to re-create my simple model, but with a latent variable for the slope. For reference, my original model, priors, and fit procedure are:

formula.test.1 <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , offset ~ 1
     , scale ~ 1
     , shift ~ 1
     , slope ~ 1
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
)

prior.test.1 <- c(
    # priors for the sigmoid curve:
    prior(normal(0.5, 0.2), nlpar='offset')
    , prior(normal(1.5, 1), nlpar='scale', lb=0)
    , prior(normal(0, 0.2), nlpar='shift')
    , prior(normal(3, 2), nlpar='slope', lb=0)
    # priors for parameters for sigma
    , prior(normal(0, 2), dpar="sigma", lb=0)
    , prior(lognormal(-3, 1.5), dpar="sigma", class="Intercept")
)

fit.test.1 <- brm(
    data = df.test, formula = formula.test.1, prior = prior.test.1,
    # note: short chains for quick iteration
    iter = 1000, warmup = 500, chains = 2, cores = 2
  )

This produces the following promising results:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ shift + scale * inv_logit(slope * (t - offset)) 
         sigma ~ 1 + t:treatment
         offset ~ 1
         scale ~ 1
         shift ~ 1
         slope ~ 1
   Data: df.test (Number of observations: 780) 
  Draws: 2 chains, each with iter = 500; warmup = 0; thin = 1;
         total post-warmup draws = 1000

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Intercept              0.01      0.00     0.01     0.01 1.01      560      622
offset_Intercept             0.53      0.01     0.52     0.54 1.01      311      416
scale_Intercept              1.05      0.02     1.02     1.10 1.01      253      303
shift_Intercept             -0.06      0.01    -0.07    -0.05 1.01      330      504
slope_Intercept              5.44      0.15     5.13     5.72 1.01      263      331
sigma_t:treatmentleft        0.08      0.01     0.07     0.09 1.00      588      579
sigma_t:treatmentneutral     0.08      0.01     0.07     0.09 1.01      533      460
sigma_t:treatmentright       0.08      0.01     0.07     0.09 1.00      555      592
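
My actual df.test isn't shown here, but for anyone wanting to reproduce the setup, data of the same shape can be simulated roughly as follows (the generating values are illustrative guesses read off the estimates above, not my original script):

library(dplyr)

# Stand-in for df.test: 780 rows, a sigmoid in t, with sigma growing in t
# (the sigma t:treatment coefficients above are all ~0.08, so the
# treatments are treated identically here)
set.seed(1)
df.sim <- expand.grid(
  t = seq(0, 1, length.out = 260),
  treatment = c("left", "neutral", "right")
) %>%
  mutate(
    mu = -0.06 + 1.05 * plogis(5.44 * (t - 0.53)),
    y  = rnorm(n(), mean = mu, sd = 0.01 + 0.08 * t)
  )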

Now, conceptually at least, shouldn't the following be equivalent?

formula.test.2 <- (
  bf(y ~ shift + scale * inv_logit(slope * (t - offset))
     , sigma ~ 1 + t:treatment
     , offset ~ 1
     , scale ~ 1
     , shift ~ 1
     , slope ~ 0 + mi(slopelatent)
     , family = brmsfamily("gaussian", link='identity', link_sigma="identity")
     , nl=TRUE
  )
  + bf(slopelatent | mi() ~ 1
       , family = brmsfamily("gaussian", link="identity", link_sigma="identity"))
  + set_rescor(FALSE)
)

prior.test.2 <- c(
    # priors for the sigmoid curve:
    prior(normal(0.5, 0.2), nlpar='offset', resp='y')
    , prior(normal(1.5, 1), nlpar='scale', resp='y', lb=0)
    , prior(normal(0, 0.2), nlpar='shift', resp='y')
    # fix b for slope to 1 as the latent parameter will be modelled instead:
    , prior(constant(1), nlpar='slope', resp='y', lb=0)
    # priors for parameters for sigma
    , prior(normal(0, 2), dpar="sigma", resp="y", lb=0)
    , prior(lognormal(-3, 1.5), dpar="sigma", resp="y", class="Intercept")
    # priors for slope parameter distribution
    , prior(normal(3, 2), resp="slopelatent", class="Intercept")
    # fix the sigma on this distribution to something very low, to mimic fit.test.1
    , prior(constant(0.00001), resp="slopelatent", class="sigma")
)
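
Before fitting, I find it useful to sanity-check that the prior names (class / nlpar / resp) line up with what brms expects for the combined formula (output omitted):

# verify that the classes/nlpars/resps above match what brms expects
get_prior(formula.test.2, data = df.test %>% mutate(slopelatent = as.numeric(NA)))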

fit.test.2 <- brm(
    data = df.test %>% mutate(slopelatent=as.numeric(NA)),
    formula = formula.test.2, prior = prior.test.2,
    iter = 1000, warmup = 500, chains = 2, cores = 2
  )

However, the resulting fit fails completely:

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: y ~ shift + scale * inv_logit(slope * (t - offset)) 
         sigma ~ 1 + t:treatment
         offset ~ 1
         scale ~ 1
         shift ~ 1
         slope ~ 0 + mi(slopelatent)
         slopelatent | mi() ~ 1 
   Data: df.test (Number of observations: 780) 
  Draws: 2 chains, each with iter = 500; warmup = 0; thin = 1;
         total post-warmup draws = 1000

Population-Level Effects: 
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_y_Intercept              0.22      0.10     0.10     0.32 3.05        2       11
slopelatent_Intercept          0.02      0.02     0.00     0.03 2.65        2       13
y_offset_Intercept            -0.74      0.04    -0.78    -0.70 2.61        2       14
y_scale_Intercept              2.43      0.13     2.30     2.56 2.46        2       11
y_shift_Intercept              1.33      0.30     1.03     1.63 2.24        3       11
sigma_y_t:treatmentleft        2.45      0.86     1.59     3.31 2.93        2       11
sigma_y_t:treatmentneutral     1.68      0.97     0.71     2.67 3.03        2       11
sigma_y_t:treatmentright       3.24      2.40     0.84     5.65 3.05        2       11
y_slope_mislopelatent          1.00      0.00     1.00     1.00   NA       NA       NA

Family Specific Parameters: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_slopelatent     0.00      0.00     0.00     0.00   NA       NA       NA
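
The Rhat and ESS values already give it away, but the chain behaviour can also be inspected directly (a quick check, output omitted):

# inspect the failing fit: trace plots and divergence counts
plot(fit.test.2)                       # trace plots per parameter
np <- nuts_params(fit.test.2)          # sampler diagnostics
sum(subset(np, Parameter == "divergent__")$Value)  # number of divergent transitions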

In my mind, the two models should behave similarly (a direct check is sketched after this list), since:

  • the relationship between slopelatent and slope is fixed (b = 1)
  • the intercept for slopelatent has the same prior as slope has in the first model
  • the scale of the slopelatent distribution is fixed to a very small value
  • all values of slopelatent are inferred
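
If the two parameterisations really were equivalent, the posterior for the slopelatent intercept should line up with slope_Intercept from the first fit. A quick check would be (parameter names taken from the summaries above):

# compare the implied slope posteriors from the two fits
post1 <- as_draws_df(fit.test.1)
post2 <- as_draws_df(fit.test.2)
summary(post1$b_slope_Intercept)
summary(post2$b_slopelatent_Intercept)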

Naturally, the resulting Stan code looks quite different (see below for how to extract it). Is there an obvious reason why the second formulation is so much more challenging for Stan to sample?
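
For reference, both generated programs can be pulled out for a side-by-side look (make_stancode() on the unfitted formulas works equally well):

# extract the generated Stan programs for comparison
stancode(fit.test.1)
stancode(fit.test.2)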

Thanks,
Jan