Hi Stan Forum, It is my first time here, and I don’t write code in Stan, just code in brms, but I will write ```Stan before my code chunks
I am wondering if you can just tell me how to reformat my data so I can do the analysis that the reviewer asked me to do–my friend thinks the reviewer suggests that I am supposed to be getting a coefficient for each question from the brms output and I am not yet getting that.
I attach my data environment here along with an abbreviated version of code I have developed for you to run.
To review, the reviewer suggested the following:"An alternative approach to investigate the effects on both general attitudes and specific components would be to use multilevel linear modeling with items nested within participants, including random effects of the term. In R lme4 notation, the formula for this model would look like: response ~ undocumented + (1 | participant) + (undocumented | item). Fixed effects would indicate the general trend, while random effects estimates would show how these effects differ depending on the item used. "
Model seems to match their description.The difficulty for me as always is getting my data into a format that allows me to do this sort of analysis. my friend earlier suggested the following:
First read in the attached mydata2.csv file into R as mydata2
Then:
- Step one: recode each item-variable so that it goes in the same (e.g. pro immigrant direction) and has the same number of levels:#this is for the coding where don’t know is 0
mydata2$immfavor_proimm<-revalue(mydata2$immfavor, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immrights_proimm<-revalue(mydata2$immrights, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immlang_proimm<-revalue(mydata2$immlang, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immcrime_proimm<-revalue(mydata2$immcrime, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immjob_proimm<-revalue(mydata2$immjob, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immunit_proimm<-revalue(mydata2$immunit, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$letin_proimm<-revalue(mydata2$letin, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$immideas_proimm<-revalue(mydata2$immideas, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$immgrowth_proimm<-revalue(mydata2$immgrowth, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
- Step two: make sure all the variables are ordered in the right direction:
mydata2$letin_proimm<-ordered(mydata2$letin_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immrights_proimm<-ordered(mydata2$immrights_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immfavor_proimm<-ordered(mydata2$immrights_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immlang_proimm<-ordered(mydata2$immlang_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immcrime_proimm<-ordered(mydata2$immcrime_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immideas_proimm<-ordered(mydata2$immideas_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immjob_proimm<-ordered(mydata2$immjob_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immgrowth_proimm<-ordered(mydata2$immgrowth_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immunit_proimm<-ordered(mydata2$immunit_proimm, levels = c("1", "2", "3", "4", "5"))
- step three: restructure the data into long format where each row is a response to an individual question/item for a particular participant (with individual questions nested within each participant):
library(tidyr)
responses.df <- mydata2%>%
select(turkID = turkID,
undocumented,
letin_proimm,
immrights_proimm,
immfavor_proimm,
immlang_proimm,
immcrime_proimm,
immideas_proimm,
immjob_proimm,
immgrowth_proimm,
immunit_proimm, female, hispanic, white, age, northeast, south, midwest, lnincome, educ, born_usa, democrat, republican, media, polview, auth, ethnocentrism)%>%
distinct(turkID, .keep_all = TRUE)%>%#Headsup don't forget the full inclusion criteria in the writeup.
pivot_longer(c(letin_proimm, immrights_proimm, immfavor_proimm, immlang_proimm, immcrime_proimm, immideas_proimm, immjob_proimm, immgrowth_proimm, immunit_proimm),
names_to = "question")%>%
mutate(ppnt_response = factor(value,
ordered = TRUE,
levels = c(1,2,3,4,5)) )#probably this is the default anyway but nice to be explicit.
responses.df$value<-as.factor(responses.df$value)
- run the model with the following code–I believe this is correct if participant would be MTurkID and “item” is supposed to be question–but the output provided only gives the output alone which contains only one coefficient “undocumented” which I believe its composite association on all variables:
fit13<- brm(ppnt_response ~
undocumented + (1 | turkID) + (undocumented | question),
data = responses.df, init="0",
family = cumulative("probit"), chains = 10, cores = 2,
control = list(adapt_delta = 0.99))
output: Family: cumulative
Links: mu = probit; disc = identity
Formula: value ~ undocumented + undocumented + (1 | turkID) + (undocumented | question)
Data: responses.df (Number of observations: 3582)
Draws: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 1000
Group-Level Effects:
~question (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 0.38 0.13 0.21 0.74 1.00
sd(undocumented) 0.18 0.08 0.04 0.39 1.00
cor(Intercept,undocumented) -0.14 0.38 -0.77 0.65 1.00
Bulk_ESS Tail_ESS
sd(Intercept) 351 696
sd(undocumented) 290 414
cor(Intercept,undocumented) 1126 726
~turkID (Number of levels: 398)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.68 0.03 0.62 0.74 1.00 456 558
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.14 0.15 -1.45 -0.83 1.00 404 514
Intercept[2] -0.42 0.15 -0.73 -0.12 1.00 400 469
Intercept[3] -0.12 0.15 -0.43 0.18 1.00 400 435
Intercept[4] 0.83 0.15 0.51 1.13 1.00 411 503
undocumented -0.08 0.10 -0.29 0.12 1.00 581 703
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Is this not what the reviewer asked me to do? I thought that what question here is the equivalent of what the reviewer calls item and ppnt_response is hte same as what the reviewer called response, but my colleague seems to suggest i should get a coefficient for every item in the output. However, that is not what I see here.
if I need to create other variables from the data by coding to what the reviewer suggests, let me know how I do that. What I see from the output is not an estimate for each of the 9 items/questions but only a general estimate for the undocumented variable.If that is what I should expect from this model and everything is fine, then I could move ahead with providing you results and my results based on this minimal output.
Thank you!
Jacob
datax.csv (73.8 KB)
Reproducible example of data prepration and analysis for BRMS.R (3.2 KB)