Difficulty fitting a bounded continuous (gaussian) response

I am attempting to fit a model to a bounded continuous response: ratings of auditory stimuli. Participants use a slider to rate each stimulus on a continuous (1 decimal place) scale from 0 to 3.

I have tried a couple of different distributions for the bounded response (gaussian, lognormal, skew_normal), but they all seem to miss key characteristics of the response distribution. One of the conditions is nested within subjects.

# Formula
fmla <- bf(Rating | trunc(lb = 0, ub = 3) ~ 0 + Intercept +
                   cond1 +
                   cond2 +
                   cond3 +
                   (1|SubjID/cond3),
           center = FALSE)

All of my attempts so far have resulted in PPCs like this:

I have tried fairly tight priors on the intercept (normal(1.5, 0.05)) and sigma (normal(0, 0.5)), but the predictions don’t seem to shift at all.

I have also tried rescaling the data to (0, 1) and using a Beta distribution, but that results in almost the same PPC discrepancy.
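For reference, that Beta attempt looked roughly like this (a sketch rather than my exact call; the (y * (n - 1) + 0.5) / n squeeze is one common way to keep the rescaled values strictly inside (0, 1), and Beta_Mod / Rating01 are just placeholder names):

## Rescale the 0-3 ratings to (0, 1), squeezing the boundary values inward
n <- nrow(Data)
Data$Rating01 <- (Data$Rating / 3 * (n - 1) + 0.5) / n

Beta_Mod <- brm(
        bf(Rating01 ~ 0 + Intercept +
                   cond1 + cond2 + cond3 +
                   (1|SubjID/cond3),
           center = FALSE),
        data = Data,
        family = Beta()
)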

Any suggestions on how to capture what’s happening in these data?

Edit to add example data:

Data <- 
structure(list(SubjID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                    4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
                                    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                    6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
                                    8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
                                    8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
                                    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 
                                    10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
                                    10L, 10L, 10L, 10L, 10L, 10L), 
                                  .Label = c("00", "04", "06", "09", 
                                             "11", "16", "21", "25", "27", "31", "33"), 
                                  class = "factor"), 
Cond = structure(c(3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L), 
          .Label = c("1", "2", "3", "4", "5", "6"), 
          class = "factor"),  
Rating = c(0.2, 
           1.5, 0.7, 0, 1.4, 1.1, 0, 1.8, 1.2, 0.2, 1.6, 1, 0.3, 1.5, 
           0.9, 0.4, 1.5, 1.1, 0.4, 1.8, 1, 0.5, 1.7, 1.2, 0, 0, 0.4, 
           0, 0, 0.3, 0.4, 0, 0.2, 0, 0.4, 0.4, 0.7, 1, 1, 0.7, 1, 1, 
           1, 1, 1, 1, 1, 0.9, 0, 1, 1, 1, 2, 2.6, 1, 3, 2.3, 1, 2, 
           2.7, 2.4, 3, 2, 1.1, 3, 2.2, 1, 1.9, 1.6, 1, 2.4, 1.6, 0, 
           0.8, 0, 0, 0, 0, 0, 0, 0, 0, 0.4, 0, 1.5, 2.5, 1.8, 1.2, 
           2, 1.5, 1.3, 1.4, 1.4, 2.1, 1.3, 1.5, 0.2, 1.1, 1.9, 0.2, 
           2, 1.5, 0.4, 2, 1.5, 0, 1.3, 1.7, 0.8, 1.5, 2.3, 0.8, 1.4, 
           2, 0.3, 1.1, 1.8, 0, 1.2, 1.8, 0.2, 1, 0.6, 0.6, 0, 0, 0, 
           1.1, 1, 1, 0, 0, 1, 1.1, 1.1, 0.8, 0.9, 1.5, 0.9, 1.2, 1.1, 
           0.8, 1.5, 1.3, 0, 1, 1.2, 0, 1.2, 0.6, 0, 0.9, 1.1, 0, 0.8, 
           1, 0.7, 1.2, 1, 1, 1.1, 1.2, 0.9, 1.1, 1.2, 0.7, 1.1, 1.2, 
           0, 2.2, 1.6, 1.8, 1.3, 0.6, 0, 1.3, 1.3, 1.7, 1, 0.5, 2, 
           1, 1.3, 1, 1.7, 1.5, 1.7, 0.8, 1.3, 1.6, 1.3, 1.3, 1.4, 1.2, 
           1, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 1.2, 0.5, 1.2, 1, 1, 1, 
           1.3, 1.1, 1.3, 1, 1, 1, 1.2, 1, 0)), row.names = c(NA, -216L
           ), 
class = "data.frame") 

Attempt with the above example data:

## Formula
Cond_fmla <- bf(Rating | trunc(lb = 0, ub = 3) ~ 0 + 
                        Cond +
                        (1|SubjID),
                center = TRUE)

# Get list of possible priors
Cond_list <- get_prior(Cond_fmla,
                       data = Data,
                       family = gaussian()) 

## Set priors
Cond_priors <- c(
        set_prior("normal(2,1)",
                  class = "b"),
        
        set_prior("exponential(0.25)",
                  class = "sd",
                  coef = "Intercept",
                  group = "SubjID"),
        
        set_prior("normal(0,2)",
                  class = "sigma")
)


## Run model
Cond_Mod <- brm(
        Cond_fmla,
        Data,
        family = gaussian(),
        prior = Cond_priors,
        inits = "random",
        iter = 2000,
        warmup = 1000,
        chains = 4,
        cores = ncores,
        sample_prior = TRUE,
        save_pars = save_pars(all = TRUE),
        seed = 42,
        control = list(max_treedepth = 14,
                       adapt_delta = 0.99)
)

Resulting PPC:

[Example_PPC]

If I remove the upper and lower bounds from the response, the location improves somewhat:

[No_Bound_PPC]

But I still see density in yrep where there is none in the data (at and below 0). Am I misunderstanding the way truncation works?

Hey @JLC,

Just as an idea, you could try to normalize your data and use a beta distribution. It might handle the boundedness of your data better.

It also seems like your data are bimodal. Have you checked whether there are a bunch of zeros in your data? If so, a zero-inflated model might perform better (e.g. a zero-inflated beta; see the sketch below).
If it’s just low values rather than exact zeros, you could try a mixture model instead, but mixtures are harder to specify and sample from.
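Untested sketch of the zero-inflated idea (ZIB_Mod and RatingZ are placeholder names; zero_inflated_beta() wants the outcome in [0, 1), so you would keep the exact zeros and squeeze the positive ratings into (0, 1)):

## Keep exact zeros; squeeze positive ratings strictly into (0, 1)
Data$RatingZ <- ifelse(Data$Rating == 0, 0, 0.999 * Data$Rating / 3)

ZIB_Mod <- brm(
        bf(RatingZ ~ 0 + Cond + (1|SubjID)),
        data = Data,
        family = zero_inflated_beta()
)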

Thank you very much for the suggestions, @scholz!

There are a lot of zeros in the data, but they’re not really zeros. The data come from a rating scale with zero as the baseline rating and 3 as the maximum, so finding a good distribution is a bit tricky.

What I’ve done so far is shift the ratings into the 1-4 range and then fit a lognormal mixture model to capture the bimodal nature of the observations.

I’m not there yet, but I think I’m making progress:
[Mix_PPC]

This is the approach I have so far:

mix <- mixture(lognormal, lognormal)

Mix_fmla <- bf(Rating | trunc(ub = 4) ~ 1 +                     
                    Cond1 +
                    (1|Cond2) +
                    (1|Cond3) +
                    (1|SubjID),
                center = TRUE)

Mix_priors <- c(    
    set_prior("normal(1,0.5)",
              class = "Intercept",
              dpar = "mu1"),
    
    set_prior("normal(2,0.5)",
              class = "Intercept",
              dpar = "mu2"),
    
    set_prior("normal(0,3)",
              class = "b",
              dpar = "mu1"),
    
    set_prior("normal(0,3)",
              class = "b",
              dpar = "mu2"),    

    set_prior("normal(0,2)",
              class = "sigma1"),
    
    set_prior("normal(0,2)",
              class = "sigma2")
)

## Run model
Mix_Mod <- brm(
    Mix_fmla,
    Data,
    family = mix, 
    prior = Mix_priors,
    inits = 0,
    iter = 6000,
    warmup = 2500,
    chains = 4,
    cores = ncores,
    sample_prior = TRUE,
    save_pars = save_pars(all = TRUE),
    seed = 42,
    control = list(max_treedepth = 16,
                   adapt_delta = 0.999)
)

Looks a lot better than before. I would still try the zero-inflated beta. You can then compare the different models using loo_compare(loo(m1), loo(m2)). The models have to use the same outcome, so you’d have to fit your other models on the normalized outcome as well (which shouldn’t be a problem, I think; see the sketch below).

I am also not sure you gain anything from moving from (0,3) to (0,4).
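For example (Mix_Mod and ZIB_Mod are placeholder names, and the comparison is only meaningful once both models are fit to the same outcome variable):

## Compare expected out-of-sample fit via PSIS-LOO
loo_compare(loo(Mix_Mod), loo(ZIB_Mod))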

Thanks again, @scholz!

I tried a quick zero-one-inflated beta and it doesn’t look too bad.
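Roughly what I ran (a sketch; zero_one_inflated_beta() takes an outcome in [0, 1], so I rescaled by the scale maximum, and ZOIB_Mod / Rating01 are placeholder names):

## Rescale 0-3 ratings to [0, 1]; exact zeros and ones are handled
## by the inflation parts of the family
Data$Rating01 <- Data$Rating / 3

ZOIB_Mod <- brm(
        bf(Rating01 ~ 0 + Cond + (1|SubjID)),
        data = Data,
        family = zero_one_inflated_beta()
)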

Some more informative priors might help push predictions a little closer. What do you think?

Have you compared your models with loo_compare?
I can’t really help you with your priors as I have no experience with what you are working with, but you can always start by plotting them.
Run your models with the sample_prior = "only" option and then run pp_check() on the result. It will show you how your priors look in aggregate and on the outcome scale, so you can check whether they reflect your expectations (see the sketch below).
You should be careful not to tune your priors to the point that they closely resemble your data, as that is not what priors are meant for.
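E.g., reusing the objects from your mixture attempt above (a sketch):

## Sample from the priors only (no conditioning on the data),
## then inspect the implied outcome distribution
Prior_Mod <- brm(
        Mix_fmla,
        data = Data,
        family = mix,
        prior = Mix_priors,
        sample_prior = "only",
        chains = 4,
        cores = ncores,
        seed = 42
)
pp_check(Prior_Mod)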

If it makes sense for your design, I think a hurdle model would work well. It asks: what is the probability of this stimulus being detected, and then, if it is detected, what is the probability of the rating being x? A hurdle negative binomial, gamma, or lognormal may work well.
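A sketch of the lognormal version (Hurdle_fmla and Hurdle_Mod are placeholder names; hu is the hurdle part, i.e. the probability of an exact zero, and the ratings can stay on their original 0-3 scale since hurdle_lognormal() handles zeros and positive values separately):

## P(rating = 0) is modelled via hu; positive ratings via a lognormal
Hurdle_fmla <- bf(Rating ~ 0 + Cond + (1|SubjID),
                  hu ~ 0 + Cond + (1|SubjID))

Hurdle_Mod <- brm(
        Hurdle_fmla,
        data = Data,
        family = hurdle_lognormal()
)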