Phylogenetic regression in Stan: can this model be improved?

Bob_Carpenter · September 27, 2024, 9:11pm

Hi, @aammd. Thanks for sharing everything here. I don’t see how this code could run without s and n being the same value, because x is an n-vector and it is being drawn multi-normal with a covariance matrix vcv_x that is s x s. I also see that the data simulation did not involve the variable s. It’s also unusual to have only a single observation for a multi-normal as it’s hard to infer things like variance. Are you going to use this in a situation where there would be more x and y values to sample? Otherwise, your priors are going to have a huge effect on the posterior with just a single data point. Imagine taking y ~ normal(mu, sigma), where y is only a single data point and mu and sigma have priors—it’s not enough data to get a good handle on either mu or sigma, so the prior has a huge effect.

The code was fine as you wrote it. I pulled out a factor in the covariance and added offsets and multipliers to move the normal distributions back closer to standard normal on the unconstrained scale. And I used compound declare and define to shorten things. I would’ve added the constraint cov_matrix to phyvcv but I didn’t have the patience to work out whether it’s required to be symmetric positive definite in order for the result to be—if you add enough to the diagonal, anything’s positive semi-definite.

data {
  int<lower=0> n;
  int<lower=0> s;
  vector[n] x;
  vector[n] y;
  matrix[s, s] phyvcv;
}
transformed data {
  vector[n] zero_vec = rep_vector(0, n);
}
parameters {
  real b0;
  real<offset=0.5, multiplier=0.5> b1;
  real sigma_x;
  real sigma_y;
  real<offset=3, multiplier=0.2> logit_lambda_x;
  real<multiplier=0.2> logit_lambda_y;
}
transformed parameters {
  real<lower=0, upper=1> lambda_x = inv_logit(logit_lambda_x);
  real<lower=0, upper=1> lambda_y = inv_logit(logit_lambda_y);
}
model {
  b0 ~ std_normal();
  b1 ~ normal(0.5, 0.5);
  sigma_x ~ exponential(1);
  sigma_y ~ exponential(1);
  logit_lambda_x ~ normal(3, 0.2);
  logit_lambda_y ~ normal(0, 0.2);
  matrix[s, s] vcv_x 
    = sigma_x^2 * add_diag(lambda_x * phyvcv, 1 - lambda_x);
  matrix[s, s] vcv_y
    = sigma_y^2 * add_diag(lambda_y * phyvcv, 1 - lambda_y);
  x ~ multi_normal(zero_vec, vcv_x);
  y ~ multi_normal(b0 + b1 * x, vcv_y);
}

A way you could make this faster is to parameterize the covariance matrix as a Cholesky factor. The multiplication is easy, but I’m not sure how to do the equivalent of add_diag.

For simplicity, I’d be inclined to just define lambda_x and lambda_y as parameters and give them beta priors that closely mimic your normal(3, 0.2) and normal(0, 0.2), both of which are unimodal once transformed with inverse logit.

Topic		Replies	Views
Implementing phylogenetic logistic regression for binary response variable in Stan Modeling techniques , specification	5	1247	March 27, 2019
Improving efficiency for dynamic coevolutionary Stan model Modeling	12	119	August 8, 2024
Difficulty using stan for phylogenetic comparative methods Modeling rstan , techniques , phylogenetic , brms	1	50	March 7, 2025
Phylogenetic signal for bernoulli families by brms brms fitting-issues , specification	13	1333	December 4, 2020
Trouble understanding difference between stan_lmer and a hand coded equivalent Modeling	3	504	February 22, 2018

Phylogenetic regression in Stan: can this model be improved?

Related topics