I’m puzzling over how to supply hypergeometric_lpmf with integer parameters. Maybe it requires a tparameters block?
The data I’m modeling is the number of guesses it takes to get the next character (or word) right in a cloze completion task. So you’ve got a text snippet, “…nd he went to the store to buy milk, but was waylaid by a _”, and you’re meant to guess which letter goes in the blank.
There are 26 letters in the English alphabet, and you guess until you get it right, which amounts to sampling without replacement. So the hypergeometric distribution seemed closest to the underlying data-generating process.
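To make the sampling scheme concrete, here is a minimal base-R sketch of the guess-count distribution. It assumes every guess is drawn uniformly at random from the letters not yet tried, which real participants certainly don't do, and the variable names are only illustrative.

```r
# Assumption: guesses are uniform over the letters not yet tried.
n_letters   <- 26                       # letters in the alphabet
n_correct   <- 1                        # letters that fit the blank
n_incorrect <- n_letters - n_correct    # letters that do not fit

guess <- 1:n_letters                    # possible values of num_guesses

# P(no correct letter among the first guess - 1 draws), from the hypergeometric pmf
p_all_wrong_before <- dhyper(0, m = n_correct, n = n_incorrect, k = guess - 1)

# P(the correct letter is drawn now | every earlier draw was wrong)
p_hit_now <- n_correct / (n_letters - (guess - 1))

# P(the first correct guess happens on draw number `guess`)
p_first_correct <- p_all_wrong_before * p_hit_now

print(round(p_first_correct, 4))
```

Under that assumption each guess count from 1 to 26 comes out equally likely. The sketch also shows that the number of guesses until the first hit is a waiting time built from hypergeometric probabilities, not the hypergeometric count itself.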
The hypergeometric requires combinations of integers; how can I make sure it gets integers?
Or/also, am I missing something here?
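For reference on the integer requirement, this is a small sketch of how I understand Stan's `hypergeometric_lpmf(n | N, a, b)` to line up with base R's `dhyper`. The wrapper name `hypergeo_lpmf_r` is hypothetical and exists only for this illustration. The argument mapping is my reading of the two sets of documentation.

```r
# Hypothetical helper mirroring Stan's argument order:
#   n = successes observed, N = number of draws,
#   a = successes in the population, b = failures in the population.
# All four are counts, which is why Stan declares them as int.
hypergeo_lpmf_r <- function(n, N, a, b) {
  stopifnot(all(c(n, N, a, b) == round(c(n, N, a, b))))  # counts only
  dhyper(x = n, m = a, n = b, k = N, log = TRUE)
}

# Log-probability that the one correct letter is among the first 5 of 26 tried
lp <- hypergeo_lpmf_r(n = 1, N = 5, a = 1, b = 25)
print(exp(lp))
```

The pmf is a ratio of binomial coefficients, so a real-valued `mu` from a linear predictor has no natural place among these arguments.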
Here’s code for a simple model (I’ve tried many different things at this point; this is just a starting point for discussion):
# Load necessary libraries
library(brms)
library(tidyverse)
library(tidybayes)
library(patchwork)
# Define constants for hypergeometric distribution
N <- 26 # Total number of letters in the alphabet
m <- 1 # Number of correct letters
nincorrect <- 25 # Number of incorrect letters
# Define a custom family function for the hypergeometric distribution
# Note: The hypergeometric distribution isn't natively supported in brms, so we define it as a custom family
hypergeo_family <- custom_family(
"hypergeo",
dpars = c("mu", "nincorrect", "k"),
links = c("identity", "identity", "identity"),
lb = c(0, 0, 0),
type = "int"
)
# Define custom functions for the likelihood and posterior predictive density
dparse <- stanvar(
scode = "
real hypergeo_lpmf(int y, int mu, int nincorrect, int k) {
return hypergeometric_lpmf(y | mu, nincorrect, k); // all arguments are declared int here; vint is not actually used
}",
block = "functions"
)
# Fit a Bayesian model with brms
m_hypergeo <-
brm(
formula = num_guesses ~ GPT_num_guesses + GPT_ans_prob + OANC_mean_guesses + prop_left + (1|ppt.code),
family = hypergeo_family,
data = model_data,
stanvars = dparse,
cores = 4,
seed = 1,
iter = 2000
)
# Summary of the model
summary(m_hypergeo)
# Plot model diagnostics
plot(m_hypergeo)
The error I see with this particular implementation is:
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0
Semantic error in 'string', line 57, column 16 to column 58:
-------------------------------------------------
55: }
56: for (n in 1:N) {
57: target += hypergeo_lpmf(Y[n] | mu[n], nincorrect, k);
^
58: }
59: }
-------------------------------------------------
Ill-typed arguments supplied to function 'hypergeo_lpmf':
(int, real, real, real)
Available signatures:
(int, int, int, int) => real
The second argument must be int but got real