Capturing difference in within-subject variance in multilevel model with map2stan

Hi,

I’ve been using map2stan to fit a multilevel model to a dataset. I was hoping to model the difference in variance between individuals, and am having difficulty achieving this.

Here’s a dummy example of the approach I have been taking:

nullModel <- ulam(
   alist(
      outcome ~ dnorm(mu, sigma),
      mu <- a[ID] + beta[ID] * predictor,

      # Individual-level priors
      a[ID] ~ dnorm(a_mu, a_sig),
      beta[ID] ~ dnorm(beta_mu, beta_sig),
      sigma ~ dcauchy(0, 2),

      # Hyperparameter priors
      a_mu ~ dnorm(0, 2),
      a_sig ~ dcauchy(0, 1) & T[0,],
      beta_mu ~ dnorm(0, 2),
      beta_sig ~ dcauchy(0, 1) & T[0,]

), data = d )
      

The issue arises from the fact that I expect some difference between individuals in the variance of their scores. My question is, how can I specify the model such that I end up with individual a, beta, and sigma parameters for each individual in the grouping structure?

My initial attempt involved applying the same approach to the sigma parameter, like so:

      sigma ~ dcauchy(sigma_mu, sigma_sig),

      sigma_mu ~ dnorm(0, 2),
      sigma_sig ~ dcauchy(0, 1)

However, this returned the error message:
invalid class “map2stan” object: invalid object for slot “coef” in class “map2stan”: got class “NULL”, should be or extend class “numeric”

Am I going about this the wrong way? Any advice is hugely appreciated.

Sorry for any naïvety in my question, too!

Hi and welcome to the community!

First,

Do you set a prior on intercept?

Second, I believe that the dcauchy() in rethinking is already truncated?

Third, if you use individuals’ within measurement then I believe you can use this notation in rethinking (if you have a column ID for each individual):

a[ID]

so one example could be,

m <- ulam(
  alist(
    outcome ~ dnorm(mu, sigma),
    mu <- a + b * x + a[ID],
    a ~ dnorm(0,10),
    b ~ dnorm(0,2),
    a[ID] ~ dnorm(0,5),
    sigma ~ dexp(1)
  ), data = d, cores = 4, chains = 4
)

where x is a predictor and a[ID] is a varying intercept for each individual (but this would require a few measurements per individual I guess).

1 Like

Thanks for the welcome! And thanks for your help!

Whoops, that’s an error in the dummy code. intercept should just be replaced with a. I’ll update the original post to reflect that, for posterity.

This is, in part, what I the initial post should already do. The original code above provides an estimate of individual subjects’ intercept and slope, but doesn’t provide an estimate of the variance around those estimates. This is what I was hoping to achieve.

Hope this clarifies the question somewhat!

I think you should check out the chapter on categorical variables in the book.

If you want to estimate sigma for each ID I believe you could do, for example, something like this,

mu <- ... + a[ID],
a[ID] ~ dnorm(0, sigma_id),
sigma_id ~ dexp(2)

Unless I’m missing something, this would provide an estimate of sigma of the mean of the unique intercepts for each ID, which is subtly different from an estimate of the sigma of the observations themselves within that ID.

The a[ID] part of the definition of mu exists within the original code (missing the [ID] part in the definition, I’ll add that to the original code though it doesn’t affect it).

I have a feeling I’m misunderstanding something very straightforward here, sorry if that’s the case!

You’re right - could it be that what you want is really,

mu <- a[id] + b[id]*x
c[a,b][id] ~ multi_normal(c(a,b), Rho, sigma_id)

i.e., a varying effects model?

I believe this would offer an improvement on the current model insofar as it would allow for a prior to be set over the correlation matrix (and make the model specification more succinct), but, to my understanding, wouldn’t provide an estimate of the within-individual variance of scores.

Adding here that there’s a snippet that solves a similar problem here. The goal within the present framework would be analagous to finding log_sigma_id in the linked code – essentially reverse-engineering the sigma_id values that were created to simulate the data. The main difference is that there would only be one observation per individual within each nation.

Well if you only have one observation per individual I have a hard time seeing how to estimate the within variance? Maybe I’m misunderstanding you now :)

Sorry, that’s a confusing statement on my part.
The “one observation per individual” was referring to the linked code solving a similar problem, but using a data structure involving a number of individuals nested within countries.
The data in the problem I’m trying to solve involves a number of observations nested within individuals.

But if you have a multilevel model you can get the individual’s variance from the posterior? Please see Chapter 13 in McElreath’s book, and in particular Figure 13.4 (Figure 12.4 in the first edition).