Hi,
I have a model for semi-continuous positive data, where the output can be exactly zero or a positive real value. I got great advice from this post.
For my prior predictive check, I use lognormal_rng to generate samples with the code below. The problem is that lognormal_rng returns negative values for y_pred (predictions over the test data), which is strange. I tried forcing the location of the lognormal (its mean on the log scale) to be strictly non-negative, because I saw that it is sometimes negative (Stan is silent about that and does not throw an error). So why the negative values? And what are the best practices to avoid the overflow problem that leads to them?
I'd really appreciate it if someone could help me debug this. Thanks! Here is the relevant code:
data {
int<lower=1> N; // number of observations
int<lower=1> N_pred;
int<lower=1> I; // number of predictors
matrix[N_pred,I] X_test; // matrix of predictors
}
transformed data {
vector[N_pred] ones_N_pred = rep_vector(1, N_pred);
matrix[N_pred,I+1] X_test_intercept = append_col(ones_N_pred, X_test);
}
parameters {
real mu;
vector[I+1] beta_coeff_zi; // intercept and predictor coefficients
real<lower=0> sigma; // scale parameter
vector<lower=-mu>[I+1] beta_coeff_raw; // implies lower=0 on beta_coeff so the lognormal location is non-negative
}
transformed parameters {
vector[I+1] beta_coeff = mu + beta_coeff_raw;
}
model {
beta_coeff_raw ~ normal(0, 1);
beta_coeff_zi ~ normal(0, 1);
sigma ~ exponential(1);
}
generated quantities {
vector[N_pred] y_pred = to_vector(lognormal_rng(X_test_intercept * beta_coeff, sigma));
vector[N_pred] labels = to_vector(bernoulli_rng(inv_cloglog(X_test_intercept * beta_coeff_zi)));
y_pred = y_pred .* labels;
}
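To double-check the generative step outside Stan, here is a minimal NumPy sketch of the same prior predictive draw (the predictors and coefficients are placeholder random values, not the actual data or fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

N_pred, I = 100, 3
X_test = rng.normal(size=(N_pred, I))          # placeholder predictor matrix
X = np.hstack([np.ones((N_pred, 1)), X_test])  # append intercept column

beta = rng.normal(size=I + 1)                  # stand-in for beta_coeff
beta_zi = rng.normal(size=I + 1)               # stand-in for beta_coeff_zi
sigma = rng.exponential(1.0)                   # stand-in for sigma

# lognormal part: X @ beta is the location on the log scale
y_pred = rng.lognormal(mean=X @ beta, sigma=sigma)

# hurdle part: inv_cloglog(x) = 1 - exp(-exp(x)) gives the zero/nonzero labels
p = 1.0 - np.exp(-np.exp(X @ beta_zi))
labels = rng.binomial(1, p)

y_pred = y_pred * labels  # zero out the draws with label 0
```

Every nonzero draw here should be strictly positive, regardless of the sign of the location.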
Here is the histogram of y_pred[y_pred != 0], which clearly shows many negative values from lognormal_rng:
I also tried using normal_rng instead of lognormal_rng and then applying y_pred[y_pred != 0] = exp(y_pred[y_pred != 0]), but I run into the overflow issue there as well.
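As an illustration of the overflow path (my guess at what happens, not verified inside Stan): in double precision, exp overflows to infinity once its argument exceeds roughly 709.78, so a large draw on the log scale cannot produce a finite positive value:

```python
import numpy as np

# double-precision exp overflows once the argument exceeds ~709.78
with np.errstate(over="ignore"):
    big = np.exp(np.float64(710.0))  # overflows to inf, not a negative number
    ok = np.exp(np.float64(709.0))   # still finite
```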
PS: Here is the input to the Stan code as a DataFrame.