Want to confirm my data is in the correct structure

Jacob_Richard_Thomas · January 9, 2024, 2:22pm

Hi Stan Forum, It is my first time here, and I don’t write code in Stan, just code in R with brms, but I can write the code in “stan” chunks if that some how makes it easier for you.

I am wondering if you can just tell me how to reformat my data so I can do the analysis that the reviewer asked me to do–my friend thinks the reviewer suggests that I am supposed to be getting a coefficient for each question from the brms output and I am not yet getting that.
I attach my data environment here along with an abbreviated version of code I have developed for you to run.
To review, the reviewer suggested the following:"An alternative approach to investigate the effects on both general attitudes and specific components would be to use multilevel linear modeling with items nested within participants, including random effects of the term. In R lme4 notation, the formula for this model would look like: response ~ undocumented + (1 | participant) + (undocumented | item). Fixed effects would indicate the general trend, while random effects estimates would show how these effects differ depending on the item used. "
Model seems to match their description.The difficulty for me as always is getting my data into a format that allows me to do this sort of analysis. my friend earlier suggested the following:
First read in the attached mydata2.csv file into R as mydata2Then:1) Step one: recode each item-variable so that it goes in the same (e.g. pro immigrant direction) and has the same number of levels:#this is for the coding where don’t know is 0

mydata2$immfavor_proimm<-revalue(mydata2$immfavor, c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immrights_proimm<-revalue(mydata2$immrights,  c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immlang_proimm<-revalue(mydata2$immlang,  c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immcrime_proimm<-revalue(mydata2$immcrime,  c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immjob_proimm<-revalue(mydata2$immjob,  c("-2"="5", "-1"="4", "0"="3", "1"="2", "2"="1"))
mydata2$immunit_proimm<-revalue(mydata2$immunit, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$letin_proimm<-revalue(mydata2$letin, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$immideas_proimm<-revalue(mydata2$immideas, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))
mydata2$immgrowth_proimm<-revalue(mydata2$immgrowth, c("2"="5", "1"="4", "0"="3", "-1"="2", "-2"="1"))

Step two: make sure all the variables are ordered in the right direction:mydata2$letin_proimm<-ordered(mydata2$letin_proimm, levels = c(“1”, “2”, “3”, “4”, “5”))

mydata2$immrights_proimm<-ordered(mydata2$immrights_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immfavor_proimm<-ordered(mydata2$immrights_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immlang_proimm<-ordered(mydata2$immlang_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immcrime_proimm<-ordered(mydata2$immcrime_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immideas_proimm<-ordered(mydata2$immideas_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immjob_proimm<-ordered(mydata2$immjob_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immgrowth_proimm<-ordered(mydata2$immgrowth_proimm, levels = c("1", "2", "3", "4", "5"))
mydata2$immunit_proimm<-ordered(mydata2$immunit_proimm, levels = c("1", "2", "3", "4", "5"))

step three: restructure the data into long format where each row is a response to an individual question/item for a particular participant (with individual questions nested within each participant):


library(tidyr)
responses.df <- mydata2%>%
    select(turkID = turkID,
           undocumented,
           letin_proimm,
           immrights_proimm,
           immfavor_proimm,
           immlang_proimm,
           immcrime_proimm,
           immideas_proimm,
           immjob_proimm,
           immgrowth_proimm,
           immunit_proimm, female, hispanic, white, age, northeast, south, midwest, lnincome, educ, born_usa, democrat, republican, media, polview, auth, ethnocentrism)%>%
    distinct(turkID, .keep_all = TRUE)%>%#Headsup don't forget the full inclusion criteria in the writeup.
    pivot_longer(c(letin_proimm, immrights_proimm, immfavor_proimm, immlang_proimm, immcrime_proimm, immideas_proimm, immjob_proimm, immgrowth_proimm, immunit_proimm),
                 names_to = "question")%>%
    mutate(ppnt_response = factor(value,
                                    ordered = TRUE,
                                    levels = c(1,2,3,4,5)) )#probably this is the default anyway but nice to be explicit.
responses.df$value<-as.factor(responses.df$value)

run the model with the following code–I believe this is correct if participant would be MTurkID and “item” is supposed to be question–but the output provided only gives the output alone which contains only one coefficient “undocumented” which I believe its composite association on all variables:


fit13<- brm(ppnt_response ~
             undocumented + (1 | turkID) + (undocumented | question),
           data = responses.df, init="0",
           family = cumulative("probit"), chains = 10, cores = 2,
           control = list(adapt_delta = 0.99))

output: Family: cumulative
Links: mu = probit; disc = identity
Formula: value ~ undocumented + undocumented + (1 | turkID) + (undocumented | question)
Data: responses.df (Number of observations: 3582)
Draws: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 1000

Group-Level Effects:
~question (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept) 0.38 0.13 0.21 0.74 1.00
sd(undocumented) 0.18 0.08 0.04 0.39 1.00
cor(Intercept,undocumented) -0.14 0.38 -0.77 0.65 1.00
Bulk_ESS Tail_ESS
sd(Intercept) 351 696
sd(undocumented) 290 414
cor(Intercept,undocumented) 1126 726

~turkID (Number of levels: 398)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.68 0.03 0.62 0.74 1.00 456 558

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.14 0.15 -1.45 -0.83 1.00 404 514
Intercept[2] -0.42 0.15 -0.73 -0.12 1.00 400 469
Intercept[3] -0.12 0.15 -0.43 0.18 1.00 400 435
Intercept[4] 0.83 0.15 0.51 1.13 1.00 411 503
undocumented -0.08 0.10 -0.29 0.12 1.00 581 703

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Is this not what the reviewer asked me to do? I thought that what question here is the equivalent of item and ppnt_response is the same as what hte reviewer called response and TurkID is the same as participant, but a Bayesian colleague seems to suggest i should have a different coefficient estimate for every item/question in my model and that is not what I see in my output. What I see from the output is not an estimate for each of the 9 items/questions but only a general estimate for the undocumented variable.If that is what I should expect from this model and everything is fine, then I could move ahead with providing you results and my results based on this minimal output.
if my code is not correct and I need to create other variables from the data by coding to what the reviewer suggests, I would like to know how I do that.
Thank you,
Jacob

datax.csv (73.8 KB)
Reproducible example of data prepration and analysis for BRMS.R (3.2 KB)

Ara_Winter · January 22, 2024, 8:20pm

Welcome.

Thanks for details and providing sample data and code. I am commenting here to bump this up to the top.

jd_c · January 22, 2024, 8:38pm

Howdy!
In the model that you fit, you will get a single ‘population-level effect’ (in the lingo Paul Christian-Burkner uses in brms) and a varying slope for the effect of undocumented on ppnt_response for every level of the group question. These varying slopes are actually offsets from the population-level effect of undocumented on ppnt_response, and they are not printed in the model summary output. To see the effect of undocumented on ppnt_response for each level of question, you can either extract the samples and manually add the varying offset for each level of question to the coefficient for undocumented, or you can use the coef() command which does that for you (see the documentation for details and particularly the example code given for the coef() function).

Topic		Replies	Views
Want to confirm my data is in the correct structure for brms regression given output Modeling brms	1	285	January 10, 2024
Understanding Multilevel modelling STAN code Modeling	2	172	April 26, 2024
BRMS to STAN Compatibility Modeling rstan , brms	3	194	February 12, 2024
Brms approach to Stan models for nested versus non-nested multi-level models? brms	11	5847	June 23, 2019
Brms paper published Publicity	2	1243	September 23, 2017

Want to confirm my data is in the correct structure

Related topics