I’m trying to write a partial pooling linear regression model with respect to both the slope and intercept. I’ve worked from an example to produce a random intercept model, which I think works well, see below:
int <lower = 1> N; // number of data points
int <lower = 1> D; // number of datasets
vector [N] y; // dependent variable
vector [N] x; // independent variable
int dataset [N]; // dataset identifier
vector [D] alpha1;
// vector [D] beta1;
real <lower = 0> sigma;
real <lower = 0> sigma_alpha1;
// real <lower = 0> sigma_beta1;
vector [N] alpha = alpha0 + alpha1[dataset];
// vector [N] beta = beta0 + beta1[dataset];
y ~ normal(alpha + beta0 * x, sigma);
alpha1 ~ normal(0, sigma_alpha1);
// beta1 ~ normal(0, sigma_beta1);
I believe that the posterior estimate of \alpha_1 describes the optimal extent of pooling between intercepts in the different datasets, as indicated by the data.
I wanted to use the same approach for the intercept parameter, \beta. However, I am not able to run the model with those lines un-commented (and changing the model to:
y ~ normal(alpha + beta * x, sigma), because I am not able to multiply the parameter vector \beta by the data vector x.
I have been unable to find a way of structuring a random slope stan model in the same way as for random intercepts. I’m not sure if this is because it is a fundamentally more difficult problem or because I’m not working in the right direction.
Any advice would be much appreciated.
if I understand you correctly and “not able to run” means you cannot compile the model, this is probably just a small misunderstand of how Stan works with vectors. Unlike R (and others) Stan distinguishes (column)
* is matrix multiplication (
%*% in R), so you can multiply row vector with a column vector (order matters you either get a single value or a matrix). You are probably looking for the
.* operator which represents element-wise multiplication (this is what
* does in R).
Hope that helps.
Thanks for your reply! This is exactly the issue I was having.
I’ve since found this type of model in this case study: https://mc-stan.org/users/documentation/case-studies/radon.html (in case anyone hasn’t seen it).
Re-writing the model as
y ~ normal(alpha + beta .* x, sigma) does allow it to run. The issue is resolved.
…however, I wondered if you (or anyone else) also had any thoughts regarding the below:
In this separate example: https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html, a covariance matrix is used to describe the correlation between the slope and intercept parameters. This is consistent with the approach in Chapter 14 of Statistical Rethinking.
I’m wondering: why correlation in the posterior samples of the two parameters isn’t sufficient here?
Why does fitting a correlation coefficient parameter help?
Does this just better define the dependency? Or does it also help in pooling for the slope and intercept parameters?
It means you allow the model to take the correlation explicitly into account. If there indeed is correlation in the actual data, this could even decrease the effective number of parameters i.e. it can help in information sharing. Note that modelling correlation on the parameters speaks about correlation structure in the observed data while correlations in posterior are about model configurations consistent with the data. I.e. if there is a negative correlation between slope and intercept in the posterior, than you cannot say “the bigger intercept the smaller slope in the real world”, you can only say something like “model configurations consistent with the data either have large slopes or large intercepts but not both”.
Does that make sense?
Hi @martinmodrak - thanks for your thoughts!
This seems like a subtle difference on first reading.
As I understand it,
(a) correlations in the posterior, and
(b) a posterior estimate of a correlation parameter
…both describe the correlation between slope and intercept parameters, conditional on the data. They can both be accounted for when making predictions.
…can be accounted for explicitly.
…implicitly includes the same information regarding correlations between the parameters.
Is that fair to say?
I think the difference is important. When you explicitly model the correlation you can (at least in principle) say: “The actual correlation between slope and intercept in the real world is likely within x and y”, or in an applied language something like “Patients with high blood pressure at baseline tended to experience decrease in blood pressure following the intervention while patients with low blood pressure at baseline tended to experience an increase”.
On the other hand when you don’t model the correlation, but your posterior is correlated, you can only say something like “The data are consistent with either patients having high blood pressure at baseline and a decrease following intervention OR low blood pressure at baseline and an increase following intervention. We can rule out high-baseline + increase and low baseline + decrease”
Is the difference clearer?
Just to check my understanding of your answer…
There are a couple of key differences:
- Explicitly modelling the correlation allows it to be quantified (as part of the Bayesian model), and
- This also allows for more meaningful interpretation in terms of understanding results outside of the model
…and finally, in the context of the initial question, you’ve also explained:
Yes, that’s roughly what I had in mind.
@martinmodrak - thanks very much for your explanations, and thank you for taking the time!
I’ll continue to read on this topic, but this was very helpful.