Correlation between questionnaire measurements and parameters estimated in regression

Dear all,
I’m trying to make a model with human data (31 subjects, each ~ 180 trials).
There are two steps:

  1. For each person predict their behaviour on each trial of the task as a function of different factors (regressors) in a logistic regression.
  2. Measure the correlation between one of the regression weights (in the example here number 2) and an independent questionnaire measure (that is read in as ‘data’)

I have maybe found a solution, but I’m not certain that it is correct. What I’m wondering about specifically is how I’m using the regression weight (regressor 2) for each person to correlate it with the questionnaire data; whether it is wrong to put the ‘data’ together with an estimated parameter as a ‘transformed parameter’ to put it into the multi_normal? In the model fit, maybe not surprisingly, there are only 2 ‘samples’ for the part of the transformed parameter that are given as data (and some rhats for this are ‘nan’). Everything else has rhats of about 1.0 and many samples. [I’m also uploading the .stan file because while the formatting looks good in the screen where I enter it, it doesn’t look good in the preview).

data {
  int<lower=1> N;                      //number of data points
  int<lower=1> Nsub;                   //number of participants
  int<lower=1> Npreds;                 //number of regressors/predictors (count excludes the constant)
  int subI[N];                         //which person a trial bleongs to
  real QuesScores[Nsub];               //independent questionnaire measurement for each person
  int<lower=0,upper=1> respMat[N];     //participants' responses(0 or 1 - independent variable)
  row_vector[Npreds+1] datMat[N];      //predictors for regression (also has a constant)
}

parameters {
  // Group level
  vector[Npreds+1] beta_mu;
  vector<lower=0>[Npreds+1] beta_sd;

  // Individual people
  vector[Npreds+1] beta_pr[Nsub];

  // For correlation
  real regQuesMu; 
  vector<lower=0>[1] sigma;//[2] sigma;
  real<lower=-1,upper=1> r; // correlation 
}

transformed parameters{
  vector[2] combRegQues[Nsub]; // ??putting together the estimated parameter and the questionnaire score (to later use with multinormal distribution)
  vector[2] combRegQuesMu;
  vector[Npreds+1] beta[Nsub];
  cov_matrix[2] T;

  // transform regression weights (non-centered distr. to speed up)
  for (is in 1:Nsub){
    beta[is]   = beta_pr[is].*beta_sd+beta_mu;   
  }

   // Add in the questionnaires??
  combRegQuesMu[1] = beta_mu[2];
  combRegQuesMu[2] = regQuesMu;
  for (is in 1:Nsub){
    combRegQues[is,1] = beta[is,2];
    combRegQues[is,2] = QuesScores[is];
  }
  
  // fill the covariance matrix
  T[1,1] = sigma[1]*sigma[1];
  T[1,2] = r * sigma[1] * beta_sd[2];//sigma[2];
  T[2,1] = T[1,2];
  T[2,2] = beta_sd[2]*beta_sd[2];//sigma[2]*sigma[2];
}


// This block runs the actual model
model {
  real mu[N];

  // Define the priors
  beta_mu    ~ cauchy(0,5);
  beta_mu[1] ~ cauchy(0,20);
  beta_sd    ~ cauchy(0,5);
  regQuesMu  ~ cauchy(0,5); // i have zscored the input
  
  for (is in 1:Nsub){
    beta_pr[is] ~ normal(0,1);//(beta_mu,beta_sd);
  }

  // For each trial, combine all the predictors and outcomes
  for (i in 1:N){ 
    mu[i] = datMat[i]*beta[subI[i]];  
  }
  
  // Describe the regression model (i.e. how data and parameters relate)
  respMat  ~ bernoulli_logit(mu);
  
  // Correlation between regression weight and questionnaire score
  combRegQues ~ multi_normal(combRegQuesMu,T); 
}

temp.data.R (833.5 KB)
ToyExamplCorrRegressorQuestionnaireScoreV2.stan (2.4 KB)
ToyExamplCorrRegressorQuestionnaireScoreV2.stan (2.4 KB)

[edit: escaped code]

Sounds like what the economists call a “discrete choice” model.

For correlation, you need two parallel vectors. The regression weight is a single value. So I’m not sure what you mean. You can measure the correlation of two parameters in the posterior by taking the parlalel vectors to be the value of the parameters in each draw making up a sample (a sample is a sequence of draws, so the sample has a size).

That’s fine. In fact, you need to do just this to deal with “missing data” problems—there’s a section in the manual.

You may need tighter priors than those Cauchy’s.

This:

  beta_mu    ~ cauchy(0,5);
  beta_mu[1] ~ cauchy(0,20);

is unlikely to be what you want, as it gives beta_mu[1] a prior that’s a product of Cauchy densities. What you probably want is

  beta_mu[2: ]   ~ cauchy(0,5);
  beta_mu[1] ~ cauchy(0,20);

so that each beta_mu[n] gets its own simple prior.

You may want to tighten the priors if you’re having problems—Cauchy is very week.

This

for (is in 1:Nsub){
    beta_pr[is] ~ normal(0,1);//(beta_mu,beta_sd);
  }

can be vectorized as

to_vector(beta_pr) ~ normal(0, 1);

and it’ll be faster and simpler. You’d need to declare beta_pr as a matrix.

This

    combRegQues[is,1] = beta[is,2];
    combRegQues[is,2] = QuesScores[is];

can just be

combRegQues[ , 1] = beta[ , 2];
combRegQues[ , 2] = QuesScores;

and this

for (i in 1:N){ 
    mu[i] = datMat[i]*beta[subI[i]];  
  }

can just be

mu = dataMat .* beta[subI];

if everything’s declared as a vector.

You need to escape with triple back-ticks on the line before and after. I edited the original post for you.

...code...

Dear Bob,
thank you so much for all these extremely useful pointers!!

For correlation, you need two parallel vectors. The regression weight is a single value. So I’m not sure what you mean. You can measure the correlation of two parameters in the posterior by taking the parlalel vectors to be the value of the parameters in each draw making up a sample (a sample is a sequence of draws, so the sample has a size).

Actually that’s what I have (I think) - I get for each person a single parameter (e.g. beta[sub1,2], beta[sub2,2] etc.)? or is this not what my code is doing? And then I also have a single questionnaire measurement from each person (QuesScore[nsub]); and then I thought I’m combining them into a 2d-vector (combRegQues[nsub,2]) and then calculating the correlation of that:
combRegQues ~ multi_normal(combRegQuesMu,T);

This:

beta_mu ~ cauchy(0,5);
beta_mu[1] ~ cauchy(0,20);
is unlikely to be what you want, as it gives beta_mu[1] a prior that’s a product of Cauchy densities. What you probably want is

beta_mu[2: ] ~ cauchy(0,5);
beta_mu[1] ~ cauchy(0,20);
so that each beta_mu[n] gets its own simple prior.

Thank you very much for pointing this out (and the other language recommendations)! I assumed I was just over-writing beta_mu[1] with my code; (I didn’t know how to write beta_mu[2:], so ‘over-writing’ (which isn’t what I did it seems) was the only thing I could think of

Calculating correlation within a model would look like this:

c = corr(a, b);

where a is a vector and b is a vector and corr() is a correlation function (we don’t have one built in, but we should—it’s on our to-do list).

What you’re doing is estimating the parameters of a multivariate normal density—that’s different than calculating correlations on a sample. It’s estimating covariance and you’re probably going to be using priors. The posteiror mean or mode isn’t going to correspond to the correlation of arguments (I think an MLE will get the actual correlation estimate if there’s no regularization, but I’m not enough of a math stats person to know that without going through a lot of calculations—it might not be true).

This kind of confusion is why we’re probably going to just get rid of the ~ notation—it’s leading to nothing but confusion among our users. All it does is increment the log density. The left-hand side has to already be defined. Data is read in externally, so that’s always defined. Parameters are provided by the estimation algorithm, so they’re always defined. Anyting else (transformed data, transformed parameters, local variables) must be explicitly defined with a value before being used on the left-hand side of the sampling statements.

thank you for taking the time to look at this again!

Now I’m completely confused - I thought [following the example on ‘latent dirichlet allocation’ in chapter 14.5 of the manual, section ‘correlated topic model’] that if I use a multi_normal, I can express the covariance as a function of the correlation


parameters {
vector[K] mu; // topic mean
corr_matrix[K] Omega; // correlation matrix
vector<lower=0>[K] sigma; // scales
vector[K] eta[M];
simplex[V] phi[K];
}
transformed parameters {
… eta as above …
cov_matrix[K] Sigma;
for (m in 1:K)
// logit topic dist for doc m
// word dist for topic k
// covariance matrix
Sigma[m, m] = sigma[m] * sigma[m] * Omega[m, m];
for (m in 1:(K-1)) {
for (n in (m+1):K) {
Sigma[m, n] = sigma[m] * sigma[n] * Omega[m, n];
Sigma[n, m] = Sigma[m, n];
} }
}

so here the important entry of the covariance matrix is:
sigma * sigma * Omega - where Omega is the correlation;
so I thought for my model that Omega measures the correlation between my measures of interest?

That needs to be a quadratic form sigma' * Omega * sigma. But it’s much more efficient and stable to do with a Cholesky-factor parameterization as described in the manual chapter on regression.

To use a simpler example, when you have a model defined for a parameter sigma by p(y | sigma) = normal(y | 0, sigma), and you estimate sigma, that’s not caclulating the sample standard deviation of y—it’s estimating a parameter. If you calculate var(y), then you’re calculating a sample variance, which itself may be viwed as an estimate of the variance of the data generating process based on observed data y (assuming the distribution has mean 0 for simplicity). If you divide by n-1 in the calculation of var(y), you get an unbiased estimate; if you divide by n you get the maximum likelihood estimate (which is biased).

Blockquote
That needs to be a quadratic form sigma’ * Omega * sigma.

yes, ok. I was a bit lazy with how i wrote it - in the code i’m doing it element-by-element…

So I’ve tried out now to use a simple script for correlations:

data{
  int nsub;
  vector[2] y[nsub]; // data
}

parameters{
  // Correlation for each variable
  vector[2] mu;
  vector<lower=0>[2] sigma;
  real<lower=-1,upper=1> r;  
}

transformed parameters{
  cov_matrix[2] T;
  // fill the covariance matrix
  T[1,1] = sigma[1]*sigma[1];
  T[1,2] = r * sigma[1] * sigma[2];
  T[2,1] = T[1,2];
  T[2,2] = sigma[2]*sigma[2];
}
   
model{ 
  // prior
  mu ~ normal(0,10);

  // Correlation
  y ~ multi_normal(mu,T);
}

And I’ve compared the ‘r’ that I get with doing a ‘normal’ Pearson Correlation, and what JASP says. and it is pretty much the same. So I guess for practical purposes, coding a correlation with multi_normal is a way to do it? (I’m sorry I did not quite understand what you said about estimating a parameter vs. calculating a sample standard deviation - I thought ‘estimating a parameter’ is the statistical measure I’m interested?)

But one last (hopefully) thing I’m wondering now in my original code - where I’m making a correlation between a ‘questionnaire measure’ for each participant that I’ve put in as data and a parameter that I’ve estimated for each participant (a regression weight from a regression model on a task that participants performed that is independent from the questionnaire measure) - the question I still have is whether adding in the questionnaire measure should affect (through the correlation) the estimate of the regression weight? Because the same mean and standard deviation feature in the equations for the regression and the correlation. The relevant lines from the code are where I use beta_mu and beta_sd to create regression weights (‘beta’) for each person. And where I use the same beta_mu and beta_sd for setting up the vector of means (‘combRegQuesMu’ - combination of regression weight (estimated) and questionnaire score) and the covariance matrix for the correlation (‘T’)

The issue here is that you’re estimating a model parameter rather than just calculating a sample correlation. For example, if we take a uniform prior on a chance-of-success parameter then estimate it using Bayesian methods, the posterior mean won’t be the same as the posterior mode. The MLE gives you the modal estimate and the Bayesian approach gives you the posterior mean or median as an estimate. The MLE is the only estimate here that matches the empirical frequency because the posterior will be a beta distribution and the mean, median, and mode are all different.

This is something you can evaluate yourself by trying it both ways. In general, if you use a hierarchical prior with correlation (as, for example, we suggest doing for multilevel models in the regression chapter of the manual), then yes, it will generally affect the coefficients—it will pull them to the population mean along the covariance matrix.

you are right, of course. there are small differences in the estimated values. I wonder whether it is the addition of a prior, or actually rather the fact that the same data (the regression weights) also features in the correlation. I’m thinking that this is maybe not what I’d want - I don’t really want the estimation of a correlation with one arbitrary questionnaire score to affect the estimate of the regression weights measured in an independent task (even though it is only a small effect). In this case, would the best thing be to make 2 separate Stan models? One that estimates the regression coefficients - and I then take e.g. the mean value (Across samples) for each person. To feed this regression weight for each person than into a separate Stan model that does the correlation with the questionnaire score. Why I was hoping to have a different solution is that now I’m loosing some information of the estimate of the regression weight for each person by only taking a single estimate, rather than the full distribution.

You almost never want to do that as it “cuts” the communication between the models. If you can, create a generative model that describes all the data jointly, then just fit that in one go.

Exactly why you want to build the joint model.

I completely agree. This is why I set out to build this joint model.
But, the big problem is that how people in psychology normally conceive of a behaviour of a task and questionnaire measures is that the task provides some kind of ‘objective measure’ (-> regression weight/ parameter estimate) of how people behave; and one wants to relate this then to subjective self report (the questionnaire). It feels weird in this framework to think that the parameter estimate from the task can be influenced by including the questionnaire score as a correlation with the regression weight.