Odd values from student_t_rng

I am running the model shown below to fit longitudinal data with multivariate skew-normal random effects and t-distributed errors. When I extract the randomly generated samples (y_tilde) to examine the posterior predictive distribution, I find some outlandish values and am curious what might be causing them. The model converges, and there is no indication of divergent transitions or any other issues. However, for some data points I am getting random samples well beyond what would be expected. For instance, for one observation, the posterior predictive distribution given the specific covariate values and estimated random effects is:

     mean  se_mean    sd  2.5%    25%    50%    75%  97.5%     n_eff  Rhat
    22.90     0.09  8.91  8.10  18.49  22.86  27.27  38.38  10368.83  1.00

However, for one iteration the randomly sampled value is -398.6698. At this iteration, y_hat = 21.9732, nu = 3.328337, and sigma_e = 4.754296. Even using the rt function in R, the minimum from 100,000 random samples (i.e., min(y_hat + rt(1e5, nu)*sigma_e)) is only -121.6858.
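For reference, a minimal R sketch of that check (the values are the ones quoted above; the set.seed call is added only for reproducibility):

    # Check how extreme a t-distributed draw with these parameters can get;
    # values taken from the single iteration quoted above.
    set.seed(1)
    y_hat   <- 21.9732
    nu      <- 3.328337
    sigma_e <- 4.754296
    min(y_hat + rt(1e5, nu) * sigma_e)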

Any ideas as to what the issue might be?

Thank you all immensely for your help!

    data{
      int<lower = 1> N;                       // Number of patients
      int<lower = N> Total;                   // Total number of observations
      int<lower = 1, upper = N> ID[Total];    // Patient ID for each observation
      vector<lower = 0>[Total] y_obs;         // Observed response
      vector[Total] Time;                     // Time of longitudinal measurement
      vector[Total] Cov1;                     // Covariate 1
      vector[Total] Cov2;                     // Covariate 2
      vector[Total] Cov3;                     // Covariate 3
      vector[Total] Cov4;                     // Covariate 4
    }
    parameters{
      real<lower = 0> nu;                     // Degrees of freedom
      real beta0_raw;                               
      real beta1_raw;                               
      real beta2_raw;
      real beta3_raw;
      real beta4_raw;
      real beta5_raw;
      real beta6_raw;
      vector[2] alpha_raw;                    // Skewness parameter
      matrix[N, 2] b_i_raw;                   // Random effects
      real<lower = 0> sigma_e;                // Std. dev. of residual
      vector<lower = 0>[2] sigma_b;           // Std. dev. of random effects
      cholesky_factor_corr[2] L_b;            // Cholesky factor of random-effects correlation
    }
    transformed parameters{
      vector[Total] y_hat;                    // Expected value
      real beta0;
      real beta1;
      real beta2;
      real beta3;
      real beta4;
      real beta5;
      real beta6;
      vector[2] alpha;
      vector[2] b_i[N];
      alpha = [10, 0]' + [10, 5]' .* alpha_raw;
      for(n in 1:N){
        b_i[n] = sigma_b .* (L_b * (b_i_raw[n]'));
      }
      beta0 = 10*beta0_raw;
      beta1 = 10*beta1_raw;
      beta2 = 5*beta2_raw;
      beta3 = 5*beta3_raw;
      beta4 = 5*beta4_raw;
      beta5 = 5*beta5_raw;
      beta6 = 5*beta6_raw;
      for(n in 1:Total){
        y_hat[n] = b_i[ID[n], 1] + beta0 +  
                   (b_i[ID[n], 2] + beta1)*(Time[n]) + 
                   beta2*Cov1[n] + 
                   beta3*Cov2[n] + 
                   beta4*Cov3[n] + 
                   beta5*Cov4[n] + 
                   beta6*(Cov3[n] * Time[n]);
      }
    }
    model{
      L_b ~ lkj_corr_cholesky(1);
      nu ~ lognormal(0, 1);
      beta0_raw ~ normal(0, 1);
      beta1_raw ~ normal(0, 1);
      beta2_raw ~ normal(0, 1);
      beta3_raw ~ normal(0, 1);
      beta4_raw ~ normal(0, 1);
      beta5_raw ~ normal(0, 1);
      beta6_raw ~ normal(0, 1);
      sigma_b ~ lognormal(0, 1);
      sigma_e ~ lognormal(0, 1);
      alpha_raw ~ normal(0, 1);
      to_vector(b_i_raw) ~ normal(0, 1);
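      // Skew-normal construction: adding log Phi(alpha . b_i_raw[n]) to the
      // standard-normal log density of b_i_raw gives the random effects an
      // Azzalini-type multivariate skew-normal distribution.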
      for(n in 1:N){
        target += normal_lcdf(dot_product(alpha, b_i_raw[n]) | 0, 1);
      }
      for(n in 1:Total){
        y_obs[n] ~ student_t(nu, y_hat[n], sigma_e);
      }
    }
    generated quantities{
      vector[Total] y_tilde;
      for(i in 1:Total){
        y_tilde[i] = student_t_rng(nu, y_hat[i], sigma_e);
      }
    }

t_{3.3} is very heavy-tailed. I don't think this is unexpected. If such values are actually implausible in your modelling scenario, then you have extra information about nu that you should be using (i.e., change the lognormal(0, 1) prior to something more appropriate). At the moment your lognormal(0, 1) prior is saying that, a priori, 50% of the time the distribution for y_obs / y_tilde has no mean (nu <= 1), and ~75.5% of the time it has no second moment, i.e. no variance (nu <= 2).
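Those numbers follow directly from the lognormal CDF and the standard moment conditions for the Student-t (no mean for nu <= 1, no variance for nu <= 2); a quick check in R:

    # Prior mass that nu ~ lognormal(0, 1) puts on each regime:
    plnorm(1)   # P(nu <= 1): t has no mean      -> 0.5
    plnorm(2)   # P(nu <= 2): t has no variance  -> ~0.756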

Run that rt simulation a few times to see that the minimum is not a stable estimate at this sample size.
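For example, a quick sketch using the numbers from your post (exact values will differ from run to run):

    # The minimum of a heavy-tailed sample is highly unstable across runs:
    replicate(10, min(21.9732 + rt(1e5, 3.328337) * 4.754296))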


Thank you very much! And you are correct about the rt check: even bumping it up to a cool million random samples, I find a very different minimum value. I apologize for not being more careful in my testing. Thanks again for your help and time.
