Difficulty fitting a bounded continuous (gaussian) response

I am attempting to fit a model to a bounded continuous response: ratings of auditory stimuli. Participants use a slider to rate each stimulus on a continuous (1 decimal place) scale from 0 to 3.

I have tried a couple of different distributions for the bounded response (gaussian, lognormal, skew_normal), but they all seem to miss key characteristics of the response distribution. One of the conditions is nested within subjects.

# Formula
fmla <- bf(Rating | trunc(lb = 0, ub = 3) ~ 0 + Intercept +
                   cond1 +
                   cond2 +
                   cond3 +
                   (1|SubjID/cond3),
           center = FALSE)

All of my attempts so far have resulted in PPCs like this:

I have tried fairly tight priors on the intercept (normal(1.5, 0.05)) and sigma (normal(0, 0.5)), but the predictions don’t seem to shift at all.

I have also tried rescaling the data to (0, 1) and using a Beta distribution, but that results in almost the same PPC discrepancy.
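For reference, that Beta attempt looked roughly like this (a sketch rather than my exact call; the (y * (n - 1) + 0.5) / n squeeze is one common way to keep the rescaled values strictly inside (0, 1), and Beta_Mod / Rating01 are just placeholder names):

## Rescale the 0-3 ratings to (0, 1), squeezing the boundary values inward
n <- nrow(Data)
Data$Rating01 <- (Data$Rating / 3 * (n - 1) + 0.5) / n

Beta_Mod <- brm(
        bf(Rating01 ~ 0 + Intercept +
                   cond1 + cond2 + cond3 +
                   (1|SubjID/cond3),
           center = FALSE),
        data = Data,
        family = Beta()
)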

Any suggestions on how to capture what’s happening in these data?

Edit to add example data:

Data <- 
structure(list(SubjID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                    4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
                                    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                    6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
                                    8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
                                    8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
                                    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 
                                    10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
                                    10L, 10L, 10L, 10L, 10L, 10L), 
                                  .Label = c("00", "04", "06", "09", 
                                             "11", "16", "21", "25", "27", "31", "33"), 
                                  class = "factor"), 
Cond = structure(c(3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 
          1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 
          1L, 4L, 3L, 1L, 4L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 
          2L, 5L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 3L, 1L, 4L, 6L, 
          2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L, 6L, 2L, 5L), 
          .Label = c("1", "2", "3", "4", "5", "6"), 
          class = "factor"),  
Rating = c(0.2, 
           1.5, 0.7, 0, 1.4, 1.1, 0, 1.8, 1.2, 0.2, 1.6, 1, 0.3, 1.5, 
           0.9, 0.4, 1.5, 1.1, 0.4, 1.8, 1, 0.5, 1.7, 1.2, 0, 0, 0.4, 
           0, 0, 0.3, 0.4, 0, 0.2, 0, 0.4, 0.4, 0.7, 1, 1, 0.7, 1, 1, 
           1, 1, 1, 1, 1, 0.9, 0, 1, 1, 1, 2, 2.6, 1, 3, 2.3, 1, 2, 
           2.7, 2.4, 3, 2, 1.1, 3, 2.2, 1, 1.9, 1.6, 1, 2.4, 1.6, 0, 
           0.8, 0, 0, 0, 0, 0, 0, 0, 0, 0.4, 0, 1.5, 2.5, 1.8, 1.2, 
           2, 1.5, 1.3, 1.4, 1.4, 2.1, 1.3, 1.5, 0.2, 1.1, 1.9, 0.2, 
           2, 1.5, 0.4, 2, 1.5, 0, 1.3, 1.7, 0.8, 1.5, 2.3, 0.8, 1.4, 
           2, 0.3, 1.1, 1.8, 0, 1.2, 1.8, 0.2, 1, 0.6, 0.6, 0, 0, 0, 
           1.1, 1, 1, 0, 0, 1, 1.1, 1.1, 0.8, 0.9, 1.5, 0.9, 1.2, 1.1, 
           0.8, 1.5, 1.3, 0, 1, 1.2, 0, 1.2, 0.6, 0, 0.9, 1.1, 0, 0.8, 
           1, 0.7, 1.2, 1, 1, 1.1, 1.2, 0.9, 1.1, 1.2, 0.7, 1.1, 1.2, 
           0, 2.2, 1.6, 1.8, 1.3, 0.6, 0, 1.3, 1.3, 1.7, 1, 0.5, 2, 
           1, 1.3, 1, 1.7, 1.5, 1.7, 0.8, 1.3, 1.6, 1.3, 1.3, 1.4, 1.2, 
           1, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 1.2, 0.5, 1.2, 1, 1, 1, 
           1.3, 1.1, 1.3, 1, 1, 1, 1.2, 1, 0)), row.names = c(NA, -216L
           ), 
class = "data.frame") 

Attempt with the above example data:

## Formula
Cond_fmla <- bf(Rating | trunc(lb = 0, ub = 3) ~ 0 + 
                        Cond +
                        (1|SubjID),
                center = TRUE)

# Get list of possible priors
Cond_list <- get_prior(Cond_fmla,
                       data = Data,
                       family = gaussian()) 

## Set priors
Cond_priors <- c(
        set_prior("normal(2,1)",
                  class = "b"),
        
        set_prior("exponential(0.25)",
                  class = "sd",
                  coef = "Intercept",
                  group = "SubjID"),
        
        set_prior("normal(0,2)",
                  class = "sigma")
)


## Run model
Cond_Mod <- brm(
        Cond_fmla,
        Data,
        family = gaussian(),
        prior = Cond_priors,
        inits = "random",
        iter = 2000,
        warmup = 1000,
        chains = 4,
        cores = ncores,
        sample_prior = TRUE,
        save_pars = save_pars(all = TRUE),
        seed = 42,
        control = list(max_treedepth = 14,
                       adapt_delta = 0.99)
)

Resulting PPC:

[Example_PPC]

If I remove the upper and lower bounds from the response, the location improves somewhat:

[No_Bound_PPC]

But I still see density in yrep where there is none in the data (at and below 0). Am I misunderstanding the way truncation works?

Hey @JLC,

Just as an idea, you could try to normalize your data and use a beta distribution. It might handle the boundedness of your data better.

It also seems like your data are bimodal. Have you checked whether there are a bunch of zeros in your data? If so, a zero-inflated model might perform better (e.g. a zero-inflated beta; see the sketch below).
If it’s just low values rather than exact zeros, you could try a mixture model instead, but mixtures are harder to specify and sample from.
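Untested sketch of the zero-inflated idea (ZIB_Mod and RatingZ are placeholder names; zero_inflated_beta() wants the outcome in [0, 1), so you would keep the exact zeros and squeeze the positive ratings into (0, 1)):

## Keep exact zeros; squeeze positive ratings strictly into (0, 1)
Data$RatingZ <- ifelse(Data$Rating == 0, 0, 0.999 * Data$Rating / 3)

ZIB_Mod <- brm(
        bf(RatingZ ~ 0 + Cond + (1|SubjID)),
        data = Data,
        family = zero_inflated_beta()
)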

Thank you very much for the suggestions, @scholz!

There are a lot of zeros in the data, but they’re not really zeros. The data come from a rating scale with zero as the baseline rating and 3 as the maximum, so finding a good distribution is a bit tricky.

What I’ve done so far is shift the ratings into the 1-4 range and then fit a lognormal mixture model to capture the bimodal nature of the observations.

I’m not there yet, but I think I’m making progress:
[Mix_PPC]

This is the approach I have so far:

mix <- mixture(lognormal, lognormal)

Mix_fmla <- bf(Rating | trunc(ub = 4) ~ 1 +                     
                    Cond1 +
                    (1|Cond2) +
                    (1|Cond3) +
                    (1|SubjID),
                center = TRUE)

Mix_priors <- c(    
    set_prior("normal(1,0.5)",
              class = "Intercept",
              dpar = "mu1"),
    
    set_prior("normal(2,0.5)",
              class = "Intercept",
              dpar = "mu2"),
    
    set_prior("normal(0,3)",
              class = "b",
              dpar = "mu1"),
    
    set_prior("normal(0,3)",
              class = "b",
              dpar = "mu2"),    

    set_prior("normal(0,2)",
              class = "sigma1"),
    
    set_prior("normal(0,2)",
              class = "sigma2")
)

## Run model
Mix_Mod <- brm(
    Mix_fmla,
    Data,
    family = mix, 
    prior = Mix_priors,
    inits = 0,
    iter = 6000,
    warmup = 2500,
    chains = 4,
    cores = ncores,
    sample_prior = TRUE,
    save_pars = save_pars(all = TRUE),
    seed = 42,
    control = list(max_treedepth = 16,
                   adapt_delta = 0.999)
)

Looks a lot better than before. I would still try the zero-inflated beta. You can then compare the different models using loo_compare(loo(m1), loo(m2)). The models have to use the same outcome, so you’d have to fit your other models on the normalized outcome as well (which shouldn’t be a problem, I think; see the sketch below).

I am also not sure you gain anything from moving from (0,3) to (0,4).
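For example (Mix_Mod and ZIB_Mod are placeholder names, and the comparison is only meaningful once both models are fit to the same outcome variable):

## Compare expected out-of-sample fit via PSIS-LOO
loo_compare(loo(Mix_Mod), loo(ZIB_Mod))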

Thanks again, @scholz!

I tried a quick zero-one-inflated beta and it doesn’t look too bad.
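Roughly what I ran (a sketch; zero_one_inflated_beta() takes an outcome in [0, 1], so I rescaled by the scale maximum, and ZOIB_Mod / Rating01 are placeholder names):

## Rescale 0-3 ratings to [0, 1]; exact zeros and ones are handled
## by the inflation parts of the family
Data$Rating01 <- Data$Rating / 3

ZOIB_Mod <- brm(
        bf(Rating01 ~ 0 + Cond + (1|SubjID)),
        data = Data,
        family = zero_one_inflated_beta()
)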

Some more informative priors might help push predictions a little closer. What do you think?

Have you compared your models with loo_compare?
I can’t really help you with your priors as I have no experience with what you are working with, but you can always start by plotting them.
Run your models with the sample_prior = "only" option and then run pp_check() on the result. It will show you how your priors look in aggregate and on the outcome scale, so you can check whether they reflect your expectations (see the sketch below).
You should be careful not to tune your priors to the point that they closely resemble your data, as that is not what priors are meant for.
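E.g., reusing the objects from your mixture attempt above (a sketch):

## Sample from the priors only (no conditioning on the data),
## then inspect the implied outcome distribution
Prior_Mod <- brm(
        Mix_fmla,
        data = Data,
        family = mix,
        prior = Mix_priors,
        sample_prior = "only",
        chains = 4,
        cores = ncores,
        seed = 42
)
pp_check(Prior_Mod)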

If it makes sense for your design, I think a hurdle model would work well. It asks: what is the probability of this stimulus being detected, and then, if it is detected, what is the probability of the rating being x? A hurdle negative binomial, gamma, or lognormal may work well.
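A sketch of the lognormal version (Hurdle_fmla and Hurdle_Mod are placeholder names; hu is the hurdle part, i.e. the probability of an exact zero, and the ratings can stay on their original 0-3 scale since hurdle_lognormal() handles zeros and positive values separately):

## P(rating = 0) is modelled via hu; positive ratings via a lognormal
Hurdle_fmla <- bf(Rating ~ 0 + Cond + (1|SubjID),
                  hu ~ 0 + Cond + (1|SubjID))

Hurdle_Mod <- brm(
        Hurdle_fmla,
        data = Data,
        family = hurdle_lognormal()
)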