Estimating people's factor scores (theta) from item response model


I’m running an item response model (i.e., a graded response model in item response theory). The model is a three-level model where the item responses are nested within multiple timepoints (waves), which are nested within participants. How can I generate the estimated IRT factor score (theta or ability score) for each participant–wave combination? For example, if participant 1 has responses at waves 1 and 3, and participant 2 has responses at waves 2, 3, and 4, how can I generate the estimated factor score for the following five participant–wave combinations?
-participant 1 at wave 1
-participant 1 at wave 3
-participant 2 at wave 2
-participant 2 at wave 3
-participant 2 at wave 4

My model syntax is below (the participant is identified by “ID”):

ordinalGRM2PL_formula <- bf(
  response ~ 1 + age + rater + age:rater + (1 |i| item) + (1 + age |p| ID) + (1 |w| ID:wave),
  disc ~ 1 + age + rater + age:rater + (1 |i| item) + (1 + age |p| ID) + (1 |w| ID:wave)

ordinalGRM2PL_model <- brm(
  formula = ordinalGRM2PL_formula,
  data = mydata,
  family = brmsfamily("cumulative", "logit")

I’m a new user to brms, so any help would be greatly appreciated. Thanks!

  • Operating System: Windows 10
  • brms Version: 2.10.0

Sorry, I don’t really understand IRT models, hopefully @paul.buerkner is not busy and can chime in?

You can use brms’ posterior_linpred function to get such factor scores.


Thanks for following up. I’m not sure how to use the posterior_linpred() function to generate people’s factor scores. Below, I’ll provide a minimal reproducible example:


rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')


a <- matrix(rlnorm(20,.2,.3))
diffs <- t(apply(matrix(runif(20*2, .3, 1), 20), 1, cumsum))
diffs <- -(diffs - rowMeans(diffs))
d <- diffs + rnorm(20)

itemData <- simdata(a, d, 2400, itemtype = 'graded')
participantData <- expand.grid(ID = 1:200, wave = 1:4, rater = 1:3)
myData <- cbind(participantData, itemData)

myData[which(myData$ID %% 2 == 0 & myData$wave == 2), 4:ncol(myData)] <- NA
myData[which(myData$ID %% 2 != 0 & myData$wave == 3), 4:ncol(myData)] <- NA

myData_long <- myData %>%
  pivot_longer(-c("ID","wave","rater"), names_to = "item", values_to = "response")

myData_long$ID <- as.factor(myData_long$ID)
myData_long$wave <- as.factor(myData_long$wave)
myData_long$rater <- as.factor(myData_long$rater)
myData_long$item <- as.factor(myData_long$item)
myData_long$response <- factor(myData_long$response, levels = c(0, 1, 2), ordered = TRUE)

formula_va_ord_2pl <- bf(
  response ~ 1 + rater + (1 |i| item) + (1 |w| wave) + (1 |p| ID),
  disc ~ 1 + rater + (1 |i| item) + (1 |w| wave) + (1 |p| ID)

mixedEffectsGRM <- brm(
  formula = formula_va_ord_2pl,
  data = myData_long,
  family = brmsfamily("cumulative", "logit"),
  chains = 1,
  iter = 100,
  seed = 1234

mixedEffectsGRM_fitted <- fitted(mixedEffectsGRM)
mixedEffectsGRM_predict <- predict(mixedEffectsGRM)
mixedEffectsGRM_posterior1 <- posterior_linpred(mixedEffectsGRM)
mixedEffectsGRM_posterior2 <- posterior_linpred(mixedEffectsGRM, newdata = myData_long)

dim(mixedEffectsGRM_fitted) #36000 x 4 x 3
dim(mixedEffectsGRM_predict) #36000 x 3
dim(mixedEffectsGRM_posterior1) #50 x 36000
dim(mixedEffectsGRM_posterior2) #50 x 36000

nrow(unique(myData[,c("ID","wave")])) #800 ID-wave combinations

First, I simulate the data. Then I run the item response model in brms. Then I check the output for various functions, including fitted(), predict(), and posterior_linpred(). However, the output from all of these functions leads to output that is much larger than expected. I’m looking to obtain the factor scores for all 800 ID-wave combinations. Therefore, my final output should be a single vector of length 800. It seems that the fitted values are generated at the item-level, but I’m looking to output factor scores at the person-wave level. Paul showed how to do this at the person-level in his paper on item response modeling by using the ranef() function (the reference is below). However, I’m not sure how to do generate the factor scores at the person-wave level. Your help is greatly appreciated!

Bürkner, P.-C. (2019). Bayesian Item Response Modelling in R with brms and Stan. arXiv. doi: 1905.09501

The coef() method should be what you are looking for. Basically

1 Like

I assumed that you wanted factor scores for each ID, split by wave.

Here is what I meant in more detail.
(sorry, I am a data.table person and don’t use the tidyverse much, hope this can still be understood)

First transpose the results from posterior_linpred, make a data.table, and rename columns, and merge with myData_long.

tmp = data.table(t(mixedEffectsGRM_posterior1))
setnames(tmp,names(tmp), paste0("iter.",1:50))
complete.myData_long = data.table(myData_long)[complete.cases(myData_long)]
plp_wide = cbind(complete.myData_long, tmp)

Next make really long format, that has also iterations in long format.

plp_long = 
     id.vars = names(myData_long), = "iter")

Now we can aggregate by ID, wave and iteration:

plp_long.ID.wave.iter = 
plp_long[,list(latent = mean(value)),
          by = c("ID","wave","iter")]

With this in hand, we can for example calculate means, and CIs by wave and ID:

plp.long.ID.wave = 
plp_long_ID_wave_iter[,list(m = mean(latent),
                            CI05 = quantile(latent,.05),
                            CI95 = quantile(latent,.95)),
                        by = c("ID","wave")]
1 Like

Thanks very much to both of you for your help. A few questions: is there a reason to prefer the approach recommended by Paul vs. the one recommended by Guido (or vice versa)?

As for Paul’s suggestion to use the coef() function, I get NULL when I use the command suggested (coef(mixedEffectsGRM)[["ID:wave"]]). I’m uploading a text file here of the data returned from running coef(mixedEffectsGRM). coef.txt (20.7 KB)

Guido, your code is very helpful. I think I was able to follow it. It appears that it calculates the latent estimate of ability first across items within ID, wave, and iteration, and then it averages the latent estimate of ability across iterations within ID and wave. Is that correct? The code looks great. My only question is that, looking at the ability estimates, the ability estimates look considerably different from item-to-item, even for the same person at the same wave (see below for a table of means for each item). This is not necessarily surprising. However, this could be a problem if a person forgets to complete some items and therefore different items are going into the averages for different people. Is there a way to account for the potential bias of different items being endorsed by different people so that the estimate of a person’s estimate is driven by their latent factor score (and less by which items they completed)? Indeed, we administer slightly different (but overlapping) item sets at different ages/waves to make sure the measure retains construct validity at each of the different ages. Your thoughts are greatly appreciated.

> plp_long %>% group_by(item) %>% summarize(latent = mean(value, na.rm = TRUE))
# A tibble: 20 x 2
   item     latent
   <fct>     <dbl>
 1 Item_1   26.8  
 2 Item_10   1.08 
 3 Item_11 -38.3  
 4 Item_12  -0.223
 5 Item_13 -31.6  
 6 Item_14  -6.57 
 7 Item_15   0.992
 8 Item_16  18.4  
 9 Item_17  41.7  
10 Item_18 -17.6  
11 Item_19  38.5  
12 Item_2  -20.8  
13 Item_20  -7.92 
14 Item_3   -6.58 
15 Item_4    0.573
16 Item_5  -12.2  
17 Item_6   -4.85 
18 Item_7  -14.7  
19 Item_8  -11.6  
20 Item_9   -8.84

Your varying effects may be named differently so just look at the coef output itself without further subsetting it.

I think coeff outputs have coefficients for ID and for wave, but they do not provide factors scores for individuals at different waves. This is why I proposed to start with posterior_linpred.

1 Like

Ah I see. Thanks for clarifying it. I though I had seen ID:wave in the original post but maybe in another context.

1 Like

Let me restate that understood that you want to get one factor score per individual and item. If you were after something else, the approach I proposed is not necessarily useful.

I agree the that factor scores for items look funny. But I am also not sure what to expect, given that (a) I am not familier with how you set up the data-simulation and (b) I am not sure that I would look at factor scores for items. I would be interested in item difficulties, but the approach I proposed was specifically for factor scores, I don’t think one can used it to get item difficulties.
(I am also not sure how many iterations you used to get these results. iter = 100 is fine if you just want to have a look at the structure of the output. If you want to check the results itself, I’d stick with the recommended iter = 2000).

More generally, I think the model you are using is already quite complex. So you could start by checking if the posterior_linpred approach allows you to recover factor scores. A simple example could for example involve only 20 subjects, each with 10 items, over 2 waves, where factor scores generally increase from wave 1 to two. This is also a data set that is small enough that your can run it with iter = 2000, chains = 4

1 Like

Yes, for a model with ID:wave (which would have been my first attempt to model the described data) I would also use the coeff command.

That all makes sense. Thanks very much for your helpful and prompt responses. I will use coef() to obtain the factor scores if I include ID:wave in the model, whereas I will use posterior_linpred() to obtain the factor scores if I don’t include ID:wave in the model. And I will also conduct a simulation to verify that the factor scores obtained from posterior_linpred() correspond to what would be expected (e.g., if the data are generated such that factor scores increase from T1 to T2).

1 Like

@Guido_Biele I’m trying to use this processing with my data, and I’m running into challenges with the step you noted to turn the data into “really long format, that also has iterations in long format” (see quote above). The challenge is that my data set (myData_long) is very large (over 6,000 rows and over 30,000 columns), and when I run the melt function from the data.table package, it gives me the following error:

Error: cannot allocate vector of size 1.3 Gb

To give you a sense of how much it expands, I tried running it on a subset of the data (1,000 rows X 1,000 columns) and it expands to 992,000 rows X 10 columns (it would be many many more rows if I used the whole data set.

Do you have suggestions either for functions that can melt the large frame, or for other processing approaches? I also tried the melt function from the reshape package, but got the same error. I understand that this isn’t a Bayesian/stan issue, but thought you might have some suggestions given the scope of data that are sometimes examined with Bayesian methods. Thanks very much for your help!

1.3G is large, but not that large.

One thing you can try is to free up RAM, the simplest way being closing all other programs except RStudio. In R you can delete all intermediate data.tables or data.frames that you don’t need anymore (though I remember reading that this might not be sufficient).

Another possibility is to use fewer iterations, e.g. only 250 from each chain.