Greetings,
I am relatively new to Bayesian statistics and have a few beginner's questions.
Here is the data I am working on. All variables except two are nominal/categorical. The idea is to assess the effect of each predictor on the outcome (Alternation, a categorical variable with two levels), with Native_Language as a random effect.
'data.frame': 3485 obs. of 12 variables:
$ Native_Language : Factor w/ 27 levels "Bulgarian","Chinese",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Agent_Pos : Factor w/ 3 levels "NOUN","PRON",..: 2 1 2 2 1 3 3 3 3 3 ...
$ Agent_Animacy : Factor w/ 2 levels "Animate","Inanimate": 1 1 1 1 2 2 2 2 2 2 ...
$ Verb : Factor w/ 194 levels "absorb","accuse",..: 161 77 77 77 22 11 77 77 77 77 ...
$ Semantic_Class : Factor w/ 6 levels "a","c","f","nd",..: 1 1 1 1 1 3 1 1 1 1 ...
$ Theme_Pos : Factor w/ 3 levels "NOUN","PRON",..: 3 3 2 2 2 1 1 1 1 1 ...
$ Theme_Animacy : Factor w/ 2 levels "Animate","Inanimate": 2 2 2 2 1 2 2 2 2 2 ...
$ Theme_length : num 0.699 0.845 0.903 0.301 0.477 ...
$ Recipient_Pos : Factor w/ 3 levels "NOUN","PRON",..: 2 2 2 1 2 2 3 2 2 2 ...
$ Recipient_Animacy: Factor w/ 2 levels "Animate","Inanimate": 2 1 1 2 1 1 2 1 1 1 ...
$ Recipient_length : num 0.301 0.301 0.602 0.477 0.477 ...
$ Alternation : Factor w/ 2 levels "Double Object",..: 1 1 1 2 2 1 2 1 1 1 ...
Here is my model, with weakly informative priors:
library(brms)
alternation_model <- brm(
  Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy +
    Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length +
    Recipient_Pos + Recipient_Animacy + Recipient_length,
  data = df,
  family = bernoulli(link = "logit"),
  warmup = 1000, iter = 3000, chains = 4, cores = 4
)
The results are as follows:
Family: bernoulli
Links: mu = logit
Formula: Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy + Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length + Recipient_Pos + Recipient_Animacy + Recipient_length
Data: df (Number of observations: 3485)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Group-Level Effects:
~Native_Language (Number of levels: 27)
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.29      0.07     0.16     0.44 1.00     3318     5461
Population-Level Effects:
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                     -1.20      0.41    -2.02    -0.40 1.00    14616     6076
Agent_PosPRON                 -0.12      0.11    -0.33     0.09 1.00    13552     6037
Agent_PosPROPN                 0.06      0.18    -0.31     0.42 1.00    15941     6282
Agent_AnimacyInanimate        -0.40      0.10    -0.61    -0.19 1.00    13728     6330
Semantic_Classc               -0.88      0.22    -1.32    -0.44 1.00    16805     5672
Semantic_Classf                0.51      0.17     0.17     0.85 1.00    16874     5879
Semantic_Classnd              -1.85      0.32    -2.49    -1.26 1.00    15973     6077
Semantic_Classp               -2.76      0.43    -3.68    -1.98 1.00    16276     5585
Semantic_Classt                0.45      0.16     0.14     0.75 1.00    15594     5528
Theme_PosPRON                  1.25      0.22     0.83     1.68 1.00    16202     5617
Theme_PosPROPN                 0.43      0.51    -0.62     1.40 1.00    16901     5790
Theme_AnimacyInanimate         0.60      0.27     0.09     1.13 1.00    16548     5931
Theme_length                  -1.16      0.24    -1.64    -0.69 1.00    16606     6162
Recipient_PosPRON             -1.48      0.13    -1.74    -1.23 1.00    12939     6610
Recipient_PosPROPN             0.02      0.27    -0.50     0.55 1.00    17995     6193
Recipient_AnimacyInanimate     0.62      0.11     0.39     0.84 1.00    15439     6132
Recipient_length               2.31      0.29     1.75     2.86 1.00    12869     6375
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
1) My first question is how to interpret the results, particularly the negative estimates (I assume these are posterior medians used as the measure of centrality). Do negative values mean that the data are left-skewed? Or are they simply correlation results, i.e. the predictor and the outcome are negatively correlated? If so, should I say that, for instance, Semantic_Class being p (the semantic class labeled p) reduces the likelihood of the outcome being Double Object (coded as 1 in the factor data)?
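To make the question concrete, here is a minimal sketch of how such estimates are usually read under a logit link: each coefficient is a change in log-odds, so `exp()` gives an odds ratio and `plogis()` (the inverse logit) gives a probability. The three numbers are copied from the summary above. Two assumptions are mine: the probabilities are taken at the reference levels with both length variables at 0 and the Native_Language offset at 0, and "the modelled outcome level" is whichever level the model treats as 1.

```r
# Estimates copied from the population-level summary (log-odds scale).
est <- c(Intercept = -1.20, Semantic_Classp = -2.76, Recipient_length = 2.31)

# exp() converts a log-odds coefficient into an odds ratio:
# below 1 = lower odds of the modelled outcome level, above 1 = higher odds.
odds_ratio <- exp(est[c("Semantic_Classp", "Recipient_length")])

# plogis() is the inverse logit: it maps log-odds to a probability.
# Assumption: all other predictors at their reference level / zero.
p_reference <- plogis(est[["Intercept"]])
p_class_p   <- plogis(est[["Intercept"]] + est[["Semantic_Classp"]])

print(round(odds_ratio, 3))
print(round(c(reference = p_reference, class_p = p_class_p), 3))
```

So a negative estimate would not say anything about skew; it would mean the odds of the modelled outcome level are lower than at the reference level of that predictor.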
- What about the random effect (Native_Language)? The Group-Level Effects result is:
sd(Intercept) 0.29 0.07 0.16 0.44 1.00 3318 5461
However, I got results for each of the 27 levels from describe_posterior(results, test = c("p_direction", "rope", "bayesfactor", "equivalence_test", "p_map"), effects = 'random'):
Parameter | Median | 95% CI | p (MAP) | pd | ROPE | % in ROPE | Equivalence (ROPE) | Rhat | ESS | BF
------------------------------------------------------------------------------------------------------------------------------------------------------------
Native_Language:Bulgarian | 0.16 | [-0.16, 0.52] | 0.711 | 83.58% | [-0.18, 0.18] | 54.75% | Undecided | 0.999 | 5583.00 | 1.28
Native_Language:Chinese | -0.05 | [-0.45, 0.38] | 0.956 | 59.05% | [-0.18, 0.18] | 66.01% | Undecided | 1.000 | 6049.00 | 0.789
Native_Language:Chinese-Cantonese | 0.33 | [ 0.08, 0.62] | 0.067 | 99.28% | [-0.18, 0.18] | 10.92% | Undecided | 0.999 | 3733.00 | 11.57
Native_Language:Czech | 0.32 | [-0.07, 0.72] | 0.287 | 95.15% | [-0.18, 0.18] | 23.81% | Undecided | 1.000 | 4753.00 | 3.71
Native_Language:Dutch | -0.05 | [-0.45, 0.33] | 0.970 | 61.05% | [-0.18, 0.18] | 66.38% | Undecided | 1.000 | 5849.00 | 0.777
Native_Language:Finnish | 0.22 | [-0.16, 0.60] | 0.541 | 87.75% | [-0.18, 0.18] | 43.07% | Undecided | 1.000 | 4672.00 | 1.65
Native_Language:French | -0.29 | [-0.72, 0.13] | 0.429 | 92.22% | [-0.18, 0.18] | 30.76% | Undecided | 1.000 | 4848.00 | 2.43
Native_Language:German | 0.05 | [-0.27, 0.41] | 0.939 | 62.68% | [-0.18, 0.18] | 72.17% | Undecided | 1.000 | 6180.00 | 0.743
Native_Language:Greek | -0.26 | [-0.64, 0.08] | 0.376 | 93.08% | [-0.18, 0.18] | 31.62% | Undecided | 1.000 | 4868.00 | 2.50
Native_Language:Hungarian | -0.25 | [-0.59, 0.07] | 0.325 | 93.95% | [-0.18, 0.18] | 33.31% | Undecided | 1.000 | 4682.00 | 2.37
Native_Language:Italian | 0.30 | [-0.04, 0.70] | 0.276 | 95.60% | [-0.18, 0.18] | 24.20% | Undecided | 1.000 | 5318.00 | 2.86
Native_Language:Japanese | -0.18 | [-0.54, 0.14] | 0.645 | 85.12% | [-0.18, 0.18] | 49.86% | Undecided | 0.999 | 6196.00 | 1.29
Native_Language:Korean | -0.02 | [-0.37, 0.31] | 0.976 | 53.95% | [-0.18, 0.18] | 74.19% | Undecided | 1.000 | 5629.00 | 0.820
Native_Language:Lithuanian | -0.03 | [-0.40, 0.34] | 0.984 | 55.53% | [-0.18, 0.18] | 70.14% | Undecided | 0.999 | 5587.00 | 0.922
Native_Language:Macedonian | -0.38 | [-0.78, 0.03] | 0.206 | 97.22% | [-0.18, 0.18] | 16.47% | Undecided | 1.000 | 5201.00 | 5.61
Native_Language:Norwegian | 0.02 | [-0.38, 0.40] | 0.975 | 54.47% | [-0.18, 0.18] | 69.24% | Undecided | 1.000 | 6209.00 | 0.775
Native_Language:Persian | -0.23 | [-0.63, 0.09] | 0.475 | 91.10% | [-0.18, 0.18] | 37.60% | Undecided | 1.000 | 5291.00 | 1.91
Native_Language:Polish | -0.19 | [-0.56, 0.16] | 0.598 | 87.08% | [-0.18, 0.18] | 47.65% | Undecided | 0.999 | 4846.00 | 1.60
Native_Language:Portuguese | 0.24 | [-0.11, 0.58] | 0.427 | 91.80% | [-0.18, 0.18] | 38.25% | Undecided | 1.000 | 5300.00 | 2.12
Native_Language:Punjabi | -1.35e-03 | [-0.47, 0.50] | > .999 | 50.25% | [-0.18, 0.18] | 58.38% | Undecided | 1.000 | 7047.00 | 0.847
Native_Language:Russian | 0.01 | [-0.37, 0.38] | 0.993 | 53.62% | [-0.18, 0.18] | 71.27% | Undecided | 0.999 | 7402.00 | 0.746
Native_Language:Serbian | 0.06 | [-0.29, 0.42] | 0.947 | 62.75% | [-0.18, 0.18] | 69.72% | Undecided | 1.000 | 6268.00 | 0.778
Native_Language:Spanish | 0.22 | [-0.13, 0.63] | 0.547 | 88.62% | [-0.18, 0.18] | 42.65% | Undecided | 0.999 | 4609.00 | 1.57
Native_Language:Swedish | 0.19 | [-0.14, 0.53] | 0.594 | 86.35% | [-0.18, 0.18] | 48.75% | Undecided | 1.000 | 5871.00 | 1.40
Native_Language:Tswana | -0.13 | [-0.47, 0.21] | 0.780 | 77.15% | [-0.18, 0.18] | 60.83% | Undecided | 0.999 | 5093.00 | 0.921
Native_Language:Turkish | 0.02 | [-0.38, 0.41] | 0.998 | 52.65% | [-0.18, 0.18] | 68.38% | Undecided | 0.999 | 7183.00 | 0.791
Native_Language:Urdu | -0.07 | [-0.55, 0.36] | 0.975 | 63.15% | [-0.18, 0.18] | 58.70% | Undecided | 0.999 | 7198.00 | 0.953
How should I interpret Native_Language as a group-level effect (the single sd(Intercept) value) compared with the results for the individual levels?
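Here is a rough sketch of how the two outputs might relate. It assumes that each per-language row is a deviation from the population-level intercept on the log-odds scale, and that sd(Intercept) is the standard deviation of those deviations across languages. The three medians are copied from the table above. As a simplification, the population intercept is treated as fixed at its point estimate and all other predictors are at their reference level or zero.

```r
# Population-level intercept and group-level SD, copied from the summary.
intercept    <- -1.20
sd_intercept <- 0.29

# Posterior medians for three languages, copied from the table above.
# Assumption: these are offsets from the population intercept (log-odds).
offsets <- c("Chinese-Cantonese" = 0.33, Korean = -0.02, Macedonian = -0.38)

# Language-specific baseline probability of the modelled outcome level.
p_by_language <- plogis(intercept + offsets)

# Rough range of baselines implied by +/- 2 SD of between-language variation.
typical_range <- plogis(intercept + c(-2, 2) * sd_intercept)

print(round(p_by_language, 3))
print(round(typical_range, 3))
```

Read this way, sd(Intercept) summarises how much the languages differ from one another overall, while the individual rows say where each language sits within that spread.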
Edit: an additional question. In the factor coding, the first level, ‘Double Object’, is coded as 1 and ‘Prepositional’ as 2. Since brms/rstanarm uses dummy coding, I assume the levels are then recoded as 0/1, with 1 = Double Object and 0 = not Double Object (i.e. Prepositional), so the results describe Double Object. How can I get the regression results for Prepositional without re-running the model?
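One way to check the coding assumption is a small experiment with base R's `glm()` on simulated data. This is a stand-in and not the brms model: `x`, `alt` and the simulated effect sizes are hypothetical. In `glm()` with a binomial family, the first factor level is treated as failure (0) and the other level as success (1). As far as I know brms follows the same convention for a Bernoulli response, but that is worth verifying. The sketch also shows that swapping the reference level only flips the sign of every coefficient.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Simulated binary outcome (hypothetical stand-in for the real data).
y01 <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
alt <- factor(ifelse(y01 == 1, "Prepositional", "Double Object"),
              levels = c("Double Object", "Prepositional"))

# glm() treats the FIRST level as 0, so this models P(Prepositional).
fit_a <- glm(alt ~ x, family = binomial())

# Make "Prepositional" the first level: this now models P(Double Object).
alt_flipped <- relevel(alt, ref = "Prepositional")
fit_b <- glm(alt_flipped ~ x, family = binomial())

# The two fits carry the same information with opposite signs.
print(rbind(first_level_DO = coef(fit_a), first_level_PP = coef(fit_b)))
```

If the same holds for the fitted model, the estimates for the other outcome level would just be the reported estimates with the sign reversed (and the interval bounds negated and swapped), with no need to refit.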
Finally, is there a commonly agreed guideline for reporting a Bayesian regression analysis?
I got the basic idea of describing the posterior distribution (following the explanations provided here), yet I cannot determine the direction of the correlation.