Issues with convergence for latent variable model in brms

I am interested in fitting a latent variable model to a set of data. I have played with the priors quite a bit and made them very tight, adjusted adapt_delta and max_treedepth, and increased the iterations to a high number, but I am still not getting convergence. If anyone could take a look at the model and check whether I have specified it incorrectly, I would greatly appreciate it. Or is there something else I can do with the priors to help convergence?

The dataset was collected to investigate whether there is any systematic bias, or difference in variability, in the weight measurements (called total weight, or TW, in the dataset) recorded by three different machines (identified as GAC, Vol, and Perten). There are about 650 samples (each with a unique Plot:Trial:Year combination), each measured nine times: three times on each of the three machines. I modeled each of those nine readings as an indicator of a latent variable, the sample's true weight TWtrue. Each machine was allowed its own sigma parameter, to determine whether any machine gives more variable readings, and a random intercept was fit for each sample as (1 | Plot:Trial:Year).

Here is a DAG of the model.


The data.table looks like this. Rep is just an identifier, so Rep 1 for one machine is not especially related to Rep 1 for another machine. The first three rows represent the nine readings of a single sample: three rows, each with one reading from each of the three machines.

      Trial Year Plot Rep  GAC Perten  Vol
   1:    H6 2020  101   1 55.8   56.3 56.3
   2:    H6 2020  101   2 55.8   56.1 56.0
   3:    H6 2020  101   3 55.8   56.9 56.2
   4:    H6 2020  102   1 56.3   57.2 56.3
   5:    H6 2020  102   2 56.3   57.4 56.5
1933:    U8 2020  314   2 57.0   56.1 54.9
1934:    U8 2020  314   3 57.0   56.1 54.9
1935:    U8 2020  315   1 54.4   55.5 54.7
1936:    U8 2020  315   2 54.4   55.6 54.5
1937:    U8 2020  315   3 54.4   55.6 54.4

Here I create a column of NAs for the latent variable and then define the components of the model formula. TWtrue is coded as missing, predicts each of the three observed variables, and gets a random intercept for each sample.

dat_wide[, TWtrue := as.numeric(NA)]

bf_gac <- bf(GAC ~ mi(TWtrue))
bf_vol <- bf(Vol ~ mi(TWtrue))
bf_per <- bf(Perten ~ mi(TWtrue))
bf_tw <- bf(TWtrue | mi() ~ 0 + (1|Plot:Trial:Year))

Here are the priors. I centered the prior for each machine's intercept on that machine's overall mean, and set the prior for each machine's slope around 1, because we would not expect the slope to differ much from 1.

overall_means <- dat[, .(TW = mean(TW, na.rm = TRUE)), by = Machine]
stanvars <- stanvar(overall_means$TW[1], 'mean_gac') + 
  stanvar(overall_means$TW[2], 'mean_per') + 
  stanvar(overall_means$TW[3], 'mean_vol')

latent_model_priors <- c(
  prior(normal(mean_gac, 1), class = Intercept, resp = GAC),
  prior(normal(mean_per, 1), class = Intercept, resp = Perten),
  prior(normal(mean_vol, 1), class = Intercept, resp = Vol),
  prior(normal(1, 0.25), class = b, resp = GAC),
  prior(normal(1, 0.25), class = b, resp = Perten),
  prior(normal(1, 0.25), class = b, resp = Vol),
  prior(gamma(5, 5), class = sigma, resp = GAC),
  prior(gamma(5, 5), class = sigma, resp = Perten),
  prior(gamma(5, 5), class = sigma, resp = Vol),
  prior(gamma(2, 5), class = sigma, resp = TWtrue)
)
Here are two versions of the model, one with and one without residual correlation. I was not really sure which is more appropriate, so any guidance on that would also be helpful. Neither of these models converges, and both take a very long time to run on this modestly sized dataset. Thanks in advance for any help.

fit_latent <- brm(bf_gac + bf_vol + bf_per + bf_tw + set_rescor(TRUE), data = dat_wide,
                  prior = latent_model_priors,
                  chains = 4, iter = 10000, warmup = 7500, seed = 91919, stanvars = stanvars,
                  control = list(adapt_delta = 0.95, max_treedepth = 20))

fit_latent_norescor <- brm(bf_gac + bf_vol + bf_per + bf_tw + set_rescor(FALSE), data = dat_wide,
                  prior = latent_model_priors,
                  chains = 4, iter = 10000, warmup = 7500, seed = 19191, stanvars = stanvars,
                  control = list(adapt_delta = 0.95, max_treedepth = 20))

Have you already looked at resources like this?

Thanks for the response! The Scott Claessens post is actually what I used as a template for specifying the model. But I just wasn’t sure if I had correctly incorporated the random effect in the right place, because the Claessens example does not have one.

I’m far from an expert here, but in both cases they constrain parameters with either constant() or a “stupidly narrow” prior. Yours seem much broader. Again, not sure what the solution is in your case, but something to consider.

Yes, I noticed that some of the coefficients were set to be constant at 1 in the examples, but I did not think that constraining any of the parameters to be constant was appropriate in this case because I am not trying to do confirmatory factor analysis. I am actually trying to use the data to estimate what the slope and intercept are that relate the observed values from the different machines to the unobserved true value.

I am afraid the model will remain unidentified. Without any constraints, different combinations of loadings and scores give the same linear predictor, and thus the same log posterior. This leads to multiple modes in the posterior and will prevent you from computing a meaningful posterior mean and uncertainty for the slopes you are interested in.

Fixing one loading to 1 should solve this.
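In brms this constraint can be imposed through the prior rather than the formula: a constant() prior pins a parameter at a fixed value. A minimal sketch, replacing the normal(1, 0.25) prior on the Vol slope from the question (choosing Vol as the reference is arbitrary; intercept priors as in the question are omitted here):

```r
# Identify the scale of the latent variable by fixing one loading at 1.
# constant(1) replaces prior(normal(1, 0.25), class = b, resp = Vol).
latent_model_priors_fixed <- c(
  prior(normal(1, 0.25), class = b, resp = GAC),
  prior(normal(1, 0.25), class = b, resp = Perten),
  prior(constant(1), class = b, resp = Vol),  # reference loading
  prior(gamma(5, 5), class = sigma, resp = GAC),
  prior(gamma(5, 5), class = sigma, resp = Perten),
  prior(gamma(5, 5), class = sigma, resp = Vol),
  prior(gamma(2, 5), class = sigma, resp = TWtrue)
)
```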

Alternatively, you could try post-processing the loadings as described in:
Merkle, E. C., & Wang, T. (2018). Bayesian latent variable models for the analysis of experimental psychology data. Psychonomic Bulletin & Review, 25(1), 256-270.

Briefly, in each posterior draw you flip the sign of the loadings depending on the sign of a chosen reference loading, so that all draws end up in the same mode before you summarize them.
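For what it's worth, a hypothetical sketch of that sign-switching step on the posterior draws (the parameter names here are assumptions about how brms names the mi() coefficients; check yours with variables(fit)):

```r
library(posterior)

draws <- as_draws_df(fit_latent_norescor)

# Choose one loading as the reference; whenever it is negative in a draw,
# flip the signs of all three loadings in that draw so every draw is
# relabeled into the same mode before summarizing. (The latent scores in
# that draw would need the same flip if you use them downstream.)
loading_cols <- c("bsp_GAC_miTWtrue", "bsp_Vol_miTWtrue", "bsp_Perten_miTWtrue")
flip <- sign(draws$bsp_Vol_miTWtrue)
draws[loading_cols] <- draws[loading_cols] * flip
```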

Hope that helps!
All the best,


Thanks Lucas! Just to confirm, because I am not familiar with the terminology of factor analysis: the term “loading” here refers to the slope relating the latent variable to one of the observed values, correct?

Yes, the model is, in the end, a CFA, which is what the Scott Claessens post replicates. The two most common constraints are to fix one factor loading (the regression of an item on the latent variable) to 1, or to fix the variance of the latent variable to 1. These do not change the model; they only change the point of reference for interpreting the parameters. I prefer to fix the latent variance to 1.

Also, I think you could do this in the blavaan (Bayesian lavaan) R package, which uses Stan as a backend, and save yourself a lot of trouble.
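For reference, a minimal blavaan sketch of the measurement part (this treats each row as independent, so it ignores the repeated measurements per sample; bcfa() fixes the first loading at 1 by default, as lavaan does):

```r
library(blavaan)

# One-factor Bayesian CFA: latent true weight measured by three machines.
# The GAC loading is fixed at 1 by default for identification.
model <- 'TWtrue =~ GAC + Vol + Perten'
fit_blavaan <- bcfa(model, data = dat_wide, n.chains = 4)
summary(fit_blavaan)
```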


I wanted to follow up and say that I did eventually achieve convergence, but only after setting residual correlation to FALSE and constraining two parameters to a constant 1: sigma_TWtrue, the residual standard deviation of the unobserved true total weight, and b_Vol_miTWtrue, the slope relating the unobserved true value to the observed Vol reading (Vol is considered a more reliable standard measurement than the other two machines being compared). That is acceptable and still yields valid inference about all the other parameters, correct?
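In case it helps anyone later, those two constraints can be written as constant() priors while keeping everything else from the original specification; a sketch:

```r
constrained_priors <- c(
  prior(normal(mean_gac, 1), class = Intercept, resp = GAC),
  prior(normal(mean_per, 1), class = Intercept, resp = Perten),
  prior(normal(mean_vol, 1), class = Intercept, resp = Vol),
  prior(normal(1, 0.25), class = b, resp = GAC),
  prior(normal(1, 0.25), class = b, resp = Perten),
  prior(constant(1), class = b, resp = Vol),        # reference machine
  prior(gamma(5, 5), class = sigma, resp = GAC),
  prior(gamma(5, 5), class = sigma, resp = Perten),
  prior(gamma(5, 5), class = sigma, resp = Vol),
  prior(constant(1), class = sigma, resp = TWtrue)  # latent scale fixed
)

fit_constrained <- brm(
  bf_gac + bf_vol + bf_per + bf_tw + set_rescor(FALSE),
  data = dat_wide, prior = constrained_priors, stanvars = stanvars,
  chains = 4, seed = 91919
)
```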

Also, thanks again for your help, I wish I could mark both as solutions!