thank you for all the responses! I’m working through this slowly, and I want to check a few things:
in the example, @desislava said:
reponse_group1 <- fitted(fullunimputed, newdata = tibble(primetype = "1"), summary=FALSE)
so I have imputed and unimputed models. The unimputed are the same, but to run tests quickly and check things, so I’ll use them to test my understanding and run through your suggestions with.
Q1: How might I do fitted() on an imputed model (brm_multiple
)
Q2: I’m not familiar with fitted(), is it similar to as.draws
?
Q2: As @desislava guessed, my model is a little more complicated (though only slightly!) I add in standardised age (z.age
), and whether or not they smoke (0,1). I’ll put some reproducible data below. When using fitted
and a new dataframe, I assume I need to also include other variables from the model? If I incorporate smoking and z.age
, I would guess I do so holding z.age
at its mean, and smoker
at its mode?
I like the suggestion to answer in probabilities, as @desislava’s code said, so I’m trying to understand exactly what is going on there.
library(brms)
library(tictoc)
library(tidyverse)
library(posterior)
set.seed(2)
# Create the ID column
ids <- sprintf("%04d", 1:30)
# Create the sentiment column
sentiments <- c("negative", "positive", "neutral")
sentiment_col <- sample(sentiments, 30, replace = TRUE)
# Create the group column
groups <- rep(1:3, each = 10)
# Create the smoker column
smoker_col <- sample(0:1, 30, replace = TRUE)
# Create the z.age column
z_age_col <- rnorm(30)
# Create the dataframe
df <- data.frame(ID = ids,
sentiment = sentiment_col,
group = groups,
smoker = smoker_col,
z.age = z_age_col)
# Run the model
tic()
unimputed_mod <- brm(
formula = sentiment ~ 1 + group+z.age +smoker,
data = df,
family = categorical(refcat = "negative"),
seed = 64, chains = 5, control = list(adapt_delta = 0.999),
iter = 1000,warmup = 400,
cores = 5
)
toc() #17.716 sec elapsed
print(unimputed_mod)
# Family: categorical
# Links: muneutral = logit; mupositive = logit
# Formula: sentiment ~ 1 + group + z.age + smoker
# Data: df (Number of observations: 30)
# Draws: 5 chains, each with iter = 1000; warmup = 400; thin = 1;
# total post-warmup draws = 3000
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muneutral_Intercept -1.44 1.47 -4.42 1.40 1.00 2349 2149
# mupositive_Intercept -1.95 1.44 -4.97 0.67 1.00 2195 1784
# muneutral_group 1.02 0.73 -0.34 2.51 1.00 1920 2012
# muneutral_z.age -0.05 0.79 -1.57 1.56 1.00 2000 1896
# muneutral_smoker -2.07 1.25 -4.72 0.24 1.00 1950 1672
# mupositive_group 0.82 0.65 -0.35 2.14 1.00 2015 2006
# mupositive_z.age -0.72 0.81 -2.36 0.76 1.00 1820 1604
# mupositive_smoker 0.35 1.09 -1.69 2.55 1.00 2034 2050
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
IF I run a model without age and smoker:
df$group<-as.character(df$group)
unimputed_mod1 <- brm(
formula = sentiment ~ 1 + group,
data = df,
family = categorical(refcat = "negative"),
seed = 64, chains = 5, control = list(adapt_delta = 0.999),
iter = 1000,warmup = 400,
cores = 5
)
reponse_group1 <- fitted(unimputed_mod1, newdata = tibble(group = "1"), summary=FALSE)
reponse_group2 <- fitted(unimputed_mod1, newdata = tibble(group = "2"), summary=FALSE)
posterior::summarise_draws(reponse_group1 - reponse_group2)
# variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
# <chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1 negative 0.393 0.403 0.186 0.187 0.0608 0.682 1.00 2029. 2158.
# 2 neutral -0.497 -0.508 0.173 0.174 -0.760 -0.197 1.00 2677. 2385.
# 3 positive 0.104 0.102 0.185 0.187 -0.195 0.411 1.00 2597. 2174.
Now the model output gives log odds, is the reponse_group1
reponse_group2
also log odds, so for the example above, the summarise_draws
are those numbers still log odds, and should be converted to actual odds? Or are the numbers crude numbers? so could I say 'if I survey a person from group 1 and a person from group 2, the person from group 1 is more likely to give a negative (0.393) and a positive answer (0.104) and less likely to give a neutral answer than the person from group 2?
Apologies for so many Qs and such a long thread! I want to make sure I am understanding this.