Retrieving sigma coefficients when sigma comes from a function

Hello,

I would like to retrieve the sigma parameters using a distributional model with brms. The data is generated as:

b0 = 0.8   # together with b1, defines sigma below (targets: b0*e and b1*e)
b1 = 0.2
a = 0.5    # main effect of E on y
h2 = 0.4
h = sqrt(h2)
e = sqrt(1 - h^2)

E = rnorm(1000, 0, 1)   # environment
N = length(E)
g = rnorm(N, 0, 1)      # genetic info

sigma = sqrt((b0*e + b1*e*E)^2)   # i.e. abs(b0*e + b1*e*E)
y = a*E + b0*h*g + b1*h*E*g + rnorm(N, 0, sigma)
dat = data.frame(y, g, E)

print(paste0("b0*e = ", round(b0*e, 3), " and b1*e = ", round(b1*e, 3))) 
"b0*e = 0.62 and b1*e = 0.155"

This is the model I am running but it has the problem of negative values for the scale parameter:

f = bf(y ~ g + E + g * E, sigma ~ 1 + E)
fit = brm(f, data = dat, family = brmsfamily("gaussian", link_sigma = "identity"), chains = 1)
summary(fit)

If I try the non-linear approach, the values are right, but I still get the scale issue:

f = bf(y ~ g + E + g * E) + 
    nlf(sigma ~ sqrt(vest)^2) + 
    lf(vest ~ 1  + E)

fit = brm(f, data = dat, 
    family = brmsfamily("gaussian", link_sigma = "identity"), chains = 1, 
      prior = c(prior(student_t(7, 0, 8), class='b', coef = 'E', nlpar = 'vest')))

SAMPLING FOR MODEL '8714549d61cef2c62abe24dcb0310027' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: normal_lpdf: Scale parameter[1] is nan, but must be > 0!  (in 'model7d783aa92632_8714549d61cef2c62abe24dcb0310027' at line 42)
...

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ g + E + g * E 
         sigma ~ sqrt(vest)^2
         vest ~ 1 + E
   Data: dat (Number of observations: 1000) 
Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 1000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.00      0.02    -0.04     0.04 1.00      932      749
vest_Intercept     0.61      0.01     0.58     0.64 1.00      785      746
vest_E             0.16      0.01     0.14     0.17 1.00      813      546
g                  0.53      0.02     0.49     0.56 1.00      722      615
E                  0.50      0.01     0.47     0.52 1.00      844      765
g:E                0.13      0.01     0.11     0.16 1.00      711      813

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

For sure I am making a silly mistake. Any suggestions on how to specify the model to recover b0*e and b1*e for sigma?

Thank you in advance!

The model above is equivalent to the Stan code below, but I still don’t know how to avoid taking the square root of negative values in order to get the coefficients that define sigma.

// generated with brms 2.15.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> K_sigma;  // number of population-level effects
  matrix[N, K_sigma] X_sigma;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  int Kc_sigma = K_sigma - 1;
  matrix[N, Kc_sigma] Xc_sigma;  // centered version of X_sigma without an intercept
  vector[Kc_sigma] means_X_sigma;  // column means of X_sigma before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
  for (i in 2:K_sigma) {
    means_X_sigma[i - 1] = mean(X_sigma[, i]);
    Xc_sigma[, i - 1] = X_sigma[, i] - means_X_sigma[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  vector[Kc_sigma] b_sigma;  // population-level effects
  real Intercept_sigma;  // temporary intercept for centered predictors
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + Xc * b;
    // initialize linear predictor term
    vector[N] sigma = Intercept_sigma + Xc_sigma * b_sigma;
    for (n in 1:N) {
      // inverse function
      sigma[n] = sqrt(sigma[n])^2;
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  target += student_t_lpdf(Intercept_sigma | 3, 0, 2.5);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // actual population-level intercept
  real b_sigma_Intercept = Intercept_sigma - dot_product(means_X_sigma, b_sigma);
}
I’ve only skimmed your Stan code, but it appears to me that you want

sigma[n] = sqrt(sigma[n]^2);

instead of

sigma[n] = sqrt(sigma[n])^2

Even if I’ve got that right, it may not fix your problem, but it might be a start.

Kent

Thanks for your reply, @kholsinger!

The model below using brms gives me a good approximation of the coefficients that define sigma (vest):

f1 = bf(y ~ g + E + g * E) + 
    nlf(sigma ~ (sqrt(vest))^2) + 
    lf(vest ~  1 + E)


Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.04      0.02    -0.00     0.08 1.00     7694     6459
vest_Intercept     0.60      0.01     0.58     0.63 1.00     7736     5898
vest_E             0.15      0.01     0.13     0.17 1.00     6958     6162
g                  0.49      0.02     0.45     0.53 1.00     7650     5821
E                  0.53      0.02     0.50     0.56 1.00     7592     5897
g:E                0.13      0.02     0.09     0.16 1.00     7499     6472

But I get these messages:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: normal_lpdf: Scale parameter[1] is nan, but must be > 0!  (in 'model65de1eb63ec1_a2b8f3797cffb15953b3bf0a1f419319' at line 42)

If I use:

f2 = bf(y ~ g + E + g * E) + 
    nlf(sigma ~ sqrt(vest^2)) + 
    lf(vest ~  1 + E)

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -0.74      0.34    -1.12    -0.39 3.04        2       11
vest_Intercept     1.24      2.00    -0.34     6.25 3.04        2       11
vest_E            -3.25      3.82   -13.38    -0.49 3.04        2       11
g                  0.11      1.55    -1.44     1.66 3.04        2       11
E                 -0.15      0.52    -0.67     0.37 3.03        2       11
g:E                1.69      0.26     1.43     1.96 3.04        2       11

The coefficients are way off. As far as I understand, the sigma formula should include the inverse (link) transformation.

Thank you again!
Sebastian

Hi,
I think the main problem is probably this:

This is equivalent to taking the absolute value of vest; however, the absolute value is a problem for Stan because it makes the posterior not smooth (the derivative is not continuous).

I understand you use it to be able to work with the “identity” link, but what is the reason you don’t want to use the default “log” link (which is used precisely because it avoids those issues)? Maybe understanding the motivation will help us find a better solution for you.

Alternatively you can use the “softplus” link (Rectifier (neural networks) - Wikipedia), which behaves almost like identity for positive values of the linear predictor.
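
For reference, a minimal sketch of what that could look like (my own addition, assuming your brms version accepts "softplus" for link_sigma):

f_sp = bf(y ~ g + E + g * E, sigma ~ 1 + E)
fit_sp = brm(f_sp, data = dat,
             family = brmsfamily("gaussian", link_sigma = "softplus"),
             chains = 1)
# sigma is then softplus(sigma_Intercept + sigma_E * E) = log(1 + exp(...)),
# which stays positive for any value of the linear predictor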

Best of luck with your model!

Thanks, @martinmodrak. I am trying to verify and reproduce a test of the proportional contribution of h and e to the variability of an unobserved y. The purpose of the model is to run that test using this hypothesis:

hyp <- "g * vest_E = g:E * vest_Intercept"
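
In brms, this would be evaluated with the hypothesis() function; a minimal sketch, assuming fit is the fitted model from above:

hypothesis(fit, hyp)
# reports the posterior estimate and CI of g * vest_E - g:E * vest_Intercept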

Using the data generated below, the difference between g * vest_E and g:E * vest_Intercept should be zero.

b0 = 0.8
b1 = 0.2
a = 0.5
h2 = 0.4
h = sqrt(h2)
e = sqrt(1-h^2)

E = rnorm(1000, 0, 1)
N = length(E)
g = rnorm(N,0,1)

sigma = sqrt((b0*e + b1*e*E)^2)
y = a*E + b0*h*g + b1*h*E*g + rnorm(N,0,sigma)
dat = data.frame(y, g, E)

"b0*e = 0.62 and b1*e = 0.155"
"G = b0*h = 0.506 and GxE = b1*h = 0.126"
"diff1 = 0.816 and diff2 = 0.816"

If I use the default log link, the scale of the coefficients for sigma is multiplicative (not additive), and I cannot recover the coefficients b0*e and b1*e that define sigma in order to perform the test. I could use exp() in sigma to generate the data and the log link to recover the coefficients, but that changes a bit the data generation I am trying to test (by the way, it’s not my model; I am just testing it and the scale it uses).

So I think the model as you present it is very weird. For example, for fixed b1, an increase in E leads to an increase in sigma for some observations and a decrease for others; to demonstrate this with the values you have:

b0 = 0.8
b1 = 0.2
h2 = 0.4
h = sqrt(h2)
e = sqrt(1-h^2)
E = c(1, - 4.5)
sqrt((b0*e + b1*e*E)^2)
#> [1] 0.77459667 0.07745967
sqrt((b0*e + b1*e*(E + 0.5))^2)
#> [1] 0.8520563 0.0000000

So with the identity link + absolute value, the coefficients for sigma are very hard to interpret unless you can make sure that the linear predictor is never smaller than 0 (e.g. by constraining all coefficients AND predictors to be > 0). The softplus link provides a similar effect in that it makes the result always positive, but “softly”.
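
To make “softly” concrete, a quick numeric illustration of softplus(x) = log(1 + exp(x)):

softplus <- function(x) log1p(exp(x))
softplus(c(-2, 0, 2, 5))
#> [1] 0.1269280 0.6931472 2.1269280 5.0067153
# close to x once x is clearly positive, but never zero or negative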

You also assume the coefficients for sigma and y are shared, which is currently not supported by brms and also makes, IMHO, little sense: why would something have exactly the same effect on the mean and the variance? (If there is a reason for this, then maybe better understanding the context would help to see how one can handle this better.)

So even if we find a way to make the model work as you’ve written it, I think any result from such a model would very likely be meaningless. If you are 100% sure you want to use the model exactly like that, we can think of ways to make it work, but I would really first stop and ask why we are using such a model and think about alternatives that would make more sense and likely also be easier to fit.

Thanks for taking the time to review this, @martinmodrak!

Let me go into the details. This is a scaling model for gene x environment (GxE) interactions and a test to check whether the GxE interaction you get from your model comes from a scaling model in which the proportional variation of the phenotype remains constant across E.

Here is the generative scaling model (assuming all the variables have variance 1 and mean 0). We have an unobserved phenotype (G = genetic info):

Y^*_i = h G_i + e \epsilon_i

where e = \sqrt{1 - h^2} and \epsilon_i is an unobserved error term.

The observed Y takes the form (E = environment):

Y_i = a_0 + a_1 E_i + (b_0 + b_1 E_i) Y_i^*
Y_i = a_0 + a_1 E_i + b_0hG_i + b_1hE_iG_i + b_0e\epsilon_i + b_1e E_i\epsilon_i

The model to run with real data would be (this is what I am trying to estimate using brms):

Y_i = \tau_0 + \tau_1 E_i + \pi_0G_i + \pi_1E_iG_i + \lambda_0\epsilon_i + \lambda_1 E_i\epsilon_i

Using the coefficients above, I can run a test to check whether my empirical model comes from the scaling model in which the contribution to the phenotype remains constant across E:

\frac{\pi_0}{\lambda_0} = \frac{\pi_1}{\lambda_1} = \frac{b_0h}{b_0e} = \frac{b_1h}{b_1e}
H_0 : \pi_0\lambda_1 - \pi_1\lambda_0 = 0
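
To spell out why the null holds under the scaling model (my own intermediate step, using the definitions above): substituting \pi_0 = b_0h, \pi_1 = b_1h, \lambda_0 = b_0e, \lambda_1 = b_1e gives

\pi_0\lambda_1 - \pi_1\lambda_0 = (b_0h)(b_1e) - (b_1h)(b_0e) = b_0b_1(he - he) = 0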

Sorry, I dropped the ball a bit on this. So rewriting the model a bit just for my own clarity:

Y_i \sim N(\mu_i, \sigma_i)
\mu_i = \tau_0 + \tau_1 E_i + \pi_0 G_i + \pi_1 E_i G_i
\sigma_i = \lambda_0 + \lambda_1 E_i

Some things to note:

  • If E_i is a discrete predictor (as appears to be the case in the paper you linked), you just end up having a different fixed variance for each level of E. So changing the link function to log still allows each level to have its own variance. The only thing that changes is the structure of the prior, but that can IMHO be easily adjusted to not be very different from the identity-link case (see the sketch after this list). If E is to be a smooth predictor then you can’t match it exactly, and as I said above you can get some very weird behaviour from such a model.
  • If you allow \lambda_0, \lambda_1 < 0, the model is implicitly multimodal: flipping the signs on both lambdas gives you, IMHO, exactly the same likelihood. Weirdly enough, the paper reports fitted coefficients for some lambdas less than zero, so they are probably not worried about this, possibly precisely because they only have discrete (categorical) E.
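
A minimal sketch of the first point (my own illustration; it assumes E is recoded as a factor in dat): with the default log link, every level of E still gets its own residual SD.

f_disc = bf(y ~ g * E, sigma ~ 1 + E)                      # E as a factor
fit_disc = brm(f_disc, data = dat, family = gaussian())    # default log link for sigma
# exponentiating the sigma linear predictor gives one residual SD per level of E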

The hypothesis test as you envision it will have its own challenges, but let’s keep that for later.

So the big question is: do you need continuous E? If not, just use the log link. If yes, I think the softplus link is going to give you behaviour quite close to the model in the paper while hopefully not breaking brms.

Hope that helps at least a bit.

Thanks again, @martinmodrak. In principle, E_i can be continuous or discrete. In their simulation, though, E_i is continuous. If I use the softplus link (based on my analysis), I can’t reproduce the test they propose because the scale of the coefficients is different (multiplicative vs additive).

The key question for me would be: let’s assume I get some data and estimate the model using a softplus link. I then try to use the test they propose to rule out that the scaling model is generating the GxE coefficient, but because the scale of the coefficients is different, the test seems useless. This would be independent of whether or not the scaling model is a good model to test (that is my actual research question).

For now, I just changed the data-generating definition so that the log link works and I can reproduce the scaling test:

sigma = exp(b0*e + b1*e*E) 
y = rnorm(N, a0 + a1*E + b0*h*G + b1*h*E*G, sigma)
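
With this data-generating process, the default log link should return the sigma coefficients directly, since log(sigma) = b0*e + b1*e*E. A minimal sketch (assuming dat now contains y, G, and E from the code above):

f_log = bf(y ~ G + E + G * E, sigma ~ 1 + E)
fit_log = brm(f_log, data = dat, family = gaussian(), chains = 1)
summary(fit_log)
# sigma_Intercept should be close to b0*e = 0.62 and sigma_E close to b1*e = 0.155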

Thank you so much!
