Regarding outcomes: A or not A vs. A or B, is there a difference? Help a beginner building a model

Greetings,

I am trying to do my very first Bayesian analysis, and as a beginner I have a bunch of questions. Below you can see a snapshot of my data. Briefly, I would like to estimate the effects of several categorical predictors, along with two numeric ones, on a categorical outcome with two levels.

Therefore, my model is as follows:
df_model <- stan_glmer(
  Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy +
    Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length +
    Recipient_Pos + Recipient_Animacy + Recipient_length,
  data   = df,
  family = binomial(link = "logit")
)

and the result is:

stan_glmer
 family:       binomial [logit]
 formula:      Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy + 
	   Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length + 
	   Recipient_Pos + Recipient_Animacy + Recipient_length
 observations: 3485
------
                           Median MAD_SD
(Intercept)                -1.2    0.4  
Agent_PosPRON              -0.1    0.1  
Agent_PosPROPN              0.1    0.2  
Agent_AnimacyInanimate     -0.4    0.1  
Semantic_Classc            -0.9    0.2  
Semantic_Classf             0.5    0.2  
Semantic_Classnd           -1.8    0.3  
Semantic_Classp            -2.7    0.4  
Semantic_Classt             0.4    0.2  
Theme_PosPRON               1.3    0.2  
Theme_PosPROPN              0.5    0.5  
Theme_AnimacyInanimate      0.6    0.3  
Theme_length               -1.2    0.3  
Recipient_PosPRON          -1.5    0.1  
Recipient_PosPROPN          0.0    0.3  
Recipient_AnimacyInanimate  0.6    0.1  
Recipient_length            2.3    0.3  

Error terms:
 Groups          Name        Std.Dev.
 Native_Language (Intercept) 0.29    
Num. levels: Native_Language 27 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

and also:

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy + 
	   Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length + 
	   Recipient_Pos + Recipient_Animacy + Recipient_length
 algorithm:    sampling
 sample:       4000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 3485
 groups:       Native_Language (27)

Estimates:
                                                   mean   sd   10%   50%   90%
(Intercept)                                      -1.2    0.4 -1.7  -1.2  -0.7 
Agent_PosPRON                                    -0.1    0.1 -0.3  -0.1   0.0 
Agent_PosPROPN                                    0.1    0.2 -0.2   0.1   0.3 
Agent_AnimacyInanimate                           -0.4    0.1 -0.5  -0.4  -0.3 
Semantic_Classc                                  -0.9    0.2 -1.2  -0.9  -0.6 
Semantic_Classf                                   0.5    0.2  0.3   0.5   0.7 
Semantic_Classnd                                 -1.9    0.3 -2.2  -1.8  -1.5 
Semantic_Classp                                  -2.8    0.4 -3.3  -2.7  -2.2 
Semantic_Classt                                   0.4    0.2  0.2   0.4   0.6 
Theme_PosPRON                                     1.3    0.2  1.0   1.3   1.5 
Theme_PosPROPN                                    0.4    0.5 -0.2   0.5   1.1 
Theme_AnimacyInanimate                            0.6    0.3  0.3   0.6   0.9 
Theme_length                                     -1.2    0.3 -1.5  -1.2  -0.8 
Recipient_PosPRON                                -1.5    0.1 -1.6  -1.5  -1.3 
Recipient_PosPROPN                                0.0    0.3 -0.3   0.0   0.4 
Recipient_AnimacyInanimate                        0.6    0.1  0.5   0.6   0.8 
Recipient_length                                  2.3    0.3  1.9   2.3   2.7 
b[(Intercept) Native_Language:Bulgarian]          0.2    0.2  0.0   0.2   0.4 
b[(Intercept) Native_Language:Chinese]           -0.1    0.2 -0.3   0.0   0.2 
b[(Intercept) Native_Language:Chinese-Cantonese]  0.3    0.1  0.2   0.3   0.5 
b[(Intercept) Native_Language:Czech]              0.3    0.2  0.1   0.3   0.6 
b[(Intercept) Native_Language:Dutch]             -0.1    0.2 -0.3  -0.1   0.2 
b[(Intercept) Native_Language:Finnish]            0.2    0.2  0.0   0.2   0.5 
b[(Intercept) Native_Language:French]            -0.3    0.2 -0.6  -0.3   0.0 
b[(Intercept) Native_Language:German]             0.1    0.2 -0.2   0.1   0.3 
b[(Intercept) Native_Language:Greek]             -0.3    0.2 -0.5  -0.3   0.0 
b[(Intercept) Native_Language:Hungarian]         -0.3    0.2 -0.5  -0.2   0.0 
b[(Intercept) Native_Language:Italian]            0.3    0.2  0.1   0.3   0.6 
b[(Intercept) Native_Language:Japanese]          -0.2    0.2 -0.4  -0.2   0.0 
b[(Intercept) Native_Language:Korean]             0.0    0.2 -0.2   0.0   0.2 
b[(Intercept) Native_Language:Lithuanian]         0.0    0.2 -0.3   0.0   0.2 
b[(Intercept) Native_Language:Macedonian]        -0.4    0.2 -0.6  -0.4  -0.1 
b[(Intercept) Native_Language:Norwegian]          0.0    0.2 -0.2   0.0   0.3 
b[(Intercept) Native_Language:Persian]           -0.2    0.2 -0.5  -0.2   0.0 
b[(Intercept) Native_Language:Polish]            -0.2    0.2 -0.4  -0.2   0.0 
b[(Intercept) Native_Language:Portuguese]         0.2    0.2  0.0   0.2   0.5 
b[(Intercept) Native_Language:Punjabi]            0.0    0.2 -0.3   0.0   0.3 
b[(Intercept) Native_Language:Russian]            0.0    0.2 -0.2   0.0   0.3 
b[(Intercept) Native_Language:Serbian]            0.1    0.2 -0.2   0.1   0.3 
b[(Intercept) Native_Language:Spanish]            0.2    0.2  0.0   0.2   0.5 
b[(Intercept) Native_Language:Swedish]            0.2    0.2  0.0   0.2   0.4 
b[(Intercept) Native_Language:Tswana]            -0.1    0.2 -0.4  -0.1   0.1 
b[(Intercept) Native_Language:Turkish]            0.0    0.2 -0.2   0.0   0.3 
b[(Intercept) Native_Language:Urdu]              -0.1    0.2 -0.4  -0.1   0.2 
Sigma[Native_Language:(Intercept),(Intercept)]    0.1    0.0  0.0   0.1   0.1 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

First question: As far as I understand, the basic concept underlying Bayesian statistics is to assess the chance of success, i.e. whether the outcome is A or not A. In my case the outcome is either A or B, so the success rate of A is the failure rate of B. Am I correct?

Second question: Is the formula given above correct? Again, 11 predictors and 1 outcome with 2 levels, with Native_Language as the random effect over outcomes.

Third question: How should I interpret the results? In a frequentist approach, it is possible to see whether the likely outcome is A or B given the predictors. Is it possible to get the same kind of answer within the Bayesian framework? If so, how? The ShinyStan graphics do not help much :/

I have never taken statistics or any related courses, since I graduated with a degree in Educational Sciences, more specifically Language Teaching (I also have a Ph.D. in the same field and am familiar with several types of statistics through work in Corpus Linguistics). I am a self-learner, so I would really appreciate less technical explanations if possible.

Welcome to Discourse!

yes & yes

The parameter estimates from a Bayesian analysis of a model mean the exact same thing as the parameter estimates from a frequentist analysis of the same model. You can interpret them in exactly the same way. The difference between the Bayesian and frequentist approaches has to do with how the estimates are obtained. There are two main differences:

  1. In Bayesian analysis, there is a prior that influences the computation.
  2. In Bayesian model fitting with MCMC (which is what Stan does by default), you don’t get just one estimate, but rather an entire posterior distribution of estimates. Any one of these posterior samples is interpretable just like the frequentist estimate, but the ensemble of the posterior samples fully captures the uncertainty (according to the model you’ve specified) in what the estimate says or means.
    To interpret the ensemble of posterior samples, we apply our interpretation to each sample and then look at the posterior distribution of those interpretations. For example, if “interpreting” means checking whether a parameter is “large” or “small”, we check this across all posterior draws to figure out how certain we are about our interpretation. If “interpreting” means predicting results at new data values, we make that prediction for each posterior draw and look at the posterior distribution of the predictions, as in the sketch below.
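For example, with your fitted object (assuming it is still called df_model, as in your call), both kinds of interpretation could look roughly like the following sketch; the newdata rows just reuse a few rows of df as stand-in values for illustration:

library(rstanarm)

# Every row of this matrix is one posterior draw of all model parameters.
draws <- as.matrix(df_model)

# Interpreting a coefficient across draws: how sure are we that the
# Recipient_length effect is positive, i.e. pushes the outcome towards the
# level of Alternation that R treats as "success"?
mean(draws[, "Recipient_length"] > 0)

# A posterior interval instead of a single point estimate:
quantile(draws[, "Recipient_length"], probs = c(0.05, 0.5, 0.95))

# Predicting at new data values: one predicted probability per posterior draw.
new_obs <- df[1:3, ]                      # stand-in "new" rows, purely for illustration
pred    <- posterior_epred(df_model, newdata = new_obs)
colMeans(pred)                            # posterior mean Pr("success") for each new row

Built-in helpers like posterior_interval() and posterior_predict() work the same way: everything is computed per posterior draw and then summarised.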

Awesome explanation! All clear now. One more question about priors: when dealing with priors, we state them in terms of what? For instance:

prior           = normal(0, 5),
prior_intercept = student_t(4, 0, 10),
prior_aux       = cauchy(0, 3)

What are these numbers 0, 3, 4, 5, 10 the values of? Means? Medians? Something else? When we define priors, how do we choose them, and more importantly, what do they actually stand for? I get the general idea of setting priors, but I am not sure about the details.
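Written out with the argument names from rstanarm's ?priors help page, I believe the call would look like the sketch below (same formula and data as before), but I am still not sure what those locations and scales actually mean for the coefficients:

df_model_priors <- stan_glmer(
  Alternation ~ (1 | Native_Language) + Agent_Pos + Agent_Animacy +
    Semantic_Class + Theme_Pos + Theme_Animacy + Theme_length +
    Recipient_Pos + Recipient_Animacy + Recipient_length,
  data            = df,
  family          = binomial(link = "logit"),
  prior           = normal(location = 0, scale = 5),              # all regression coefficients
  prior_intercept = student_t(df = 4, location = 0, scale = 10),  # the intercept
  prior_aux       = cauchy(location = 0, scale = 3)               # auxiliary parameter, if the family has one
)

# Shows which priors were actually used for this fit:
prior_summary(df_model_priors)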