Multiple Outcome Variables

Hi, I am new to both Stan and Bayesian estimation methods. I have individual-level panel data, and for each individual I have three outcome variables, say y1, y2, y3. y1 and y2 are binary variables, where one depends on the other, and y3 is a categorical variable. I want to jointly model these three dependent variables. I know how to write down the likelihood of the observations in the frequentist way; however, I am not sure how to do it using Bayesian methods.
My question is: can I estimate the model by using the log-likelihood as the objective function via target +=? If not, how should I do it?

If you want to fit a simple multivariate linear regression model, then all that Bayesianism entails is specifying a prior distribution on your parameters (presumably beta and Sigma). In Stan, you can choose virtually any distribution to this end, and you would probably want to make it quite diffuse (e.g. a large variance for the beta prior).

The likelihood function is the objective function in both the frequentist and the Bayesian paradigm, although in a Bayesian approach, you would simply add the log of the prior distribution(s) to the log likelihood. If you specify as priors

beta ~ normal(mu, sigma);   // in Stan, the second argument is a standard deviation, not a variance
Sigma ~ wishart(nu, V);

Stan interprets this as "please add normal_lupdf(beta | mu, sigma) and wishart_lupdf(Sigma | nu, V) to my target", i.e. to the log objective function.
Hence, you can also write the main regression equation either as

y ~ multi_normal(X * beta, Sigma);

or equivalently as

target += multi_normal_lupdf(y | X * beta, Sigma);

The target += ... functionality matters primarily for reparameterizations, when you need to manually add the Jacobian adjustment to the target. Unless you’re dealing with more complex structures, you can probably stick to the more elegant variable ~ distribution() notation.
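As a minimal illustration of that last point (a generic sketch, not part of the regression above): if you sample an unconstrained log_sigma but want to state the prior on sigma = exp(log_sigma), you have to add the log Jacobian of the transform, which here is just log_sigma, to the target yourself:

parameters {
  real log_sigma;                         // sampled on the unconstrained log scale
}
transformed parameters {
  real<lower=0> sigma = exp(log_sigma);
}
model {
  sigma ~ lognormal(0, 1);                // prior stated on sigma, not on log_sigma
  target += log_sigma;                    // Jacobian adjustment for the exp() change of variables
}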


The question is about binary outcomes that are not assumed to be independent.

The likelihood is known, so I understood the question as asking which additional ingredients a Bayesian estimation requires over a frequentist one. The linear regression is an example to this end, since a prudent choice of prior surely depends on the details of the likelihood.

Yes, this is where I have trouble. Can I simply use an if/else condition to express the distribution? Something like:

if (y1 == 1)
  y2 ~ binomial(...)

In my case, y2 only matters when y1 = 1; otherwise it is not observed.

If y1 is data and you are just branching on data, then it should be fine. Ideally, you want to make sure that the likelihood is continuously differentiable in the parameters; if it’s not, HMC can end up rejecting most of its proposals (which makes sampling very inefficient) or just outright break.
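For what it’s worth, here is a rough sketch of what that branching could look like for the three outcomes described above (the predictors x, the logit links, and the coefficient names are placeholders, and the panel structure is ignored for brevity):

data {
  int<lower=1> N;
  int<lower=2> C;                        // number of categories for y3
  matrix[N, 3] x;                        // placeholder predictors
  array[N] int<lower=0, upper=1> y1;
  array[N] int<lower=0, upper=1> y2;     // only meaningful when y1[n] == 1 (e.g. coded 0 otherwise)
  array[N] int<lower=1, upper=C> y3;
}
parameters {
  vector[3] b1;
  vector[3] b2;
  matrix[C - 1, 3] b3;                   // one category's coefficients fixed at 0 for identifiability
}
model {
  b1 ~ normal(0, 2.5);
  b2 ~ normal(0, 2.5);
  to_vector(b3) ~ normal(0, 2.5);
  for (n in 1:N) {
    y1[n] ~ bernoulli_logit(x[n] * b1);
    if (y1[n] == 1)                      // y2 enters the likelihood only where it is observed
      y2[n] ~ bernoulli_logit(x[n] * b2);
    y3[n] ~ categorical_logit(append_row(0, b3 * x[n]'));
  }
}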

The only thing you need in addition to the likelihood is priors. I want to emphasize that the prior can only be understood in the context of the likelihood (the link is to Gelman et al.'s paper of that title). That is, you need to know your likelihood in order to formulate your prior knowledge relative to it. For example, you can’t put a prior on the intercept of a linear regression without knowing the scale of the independent (x) and dependent (y) variables.
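As a toy illustration of that last point (a hypothetical model, not from the thread): the same intercept prior can be far too tight or comfortably weakly informative depending on the scale of y.

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;                // assume x and y have been centered and scaled
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 2.5);     // reasonable only because y is on unit scale;
                              // absurdly tight if y were, say, raw house prices in dollars
  beta ~ normal(0, 2.5);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);
}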