Negative numbers cause Bayesian model to fail

Hi everyone!

I am running a Bayesian hierarchical model, and one of my explanatory variables contains several negative values. When I run my rstan model, I receive an error saying that NaNs were produced, and I know rstan does not accept NA values in the data.

Is there any way to still run my model with negative values, or is there a way to transform my variable without losing actual data?

Thank you to all who reply!

It would help to know what your model is. For example, if a model raises a number to a non-integral power, a negative value would cause a problem. There are of course other ways that a negative number could be a problem.
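For instance, in R (a minimal sketch; the value -2 is just an example):

(-2)^0.5   # NaN: negative base with a non-integral exponent
log(-2)    # NaN, with a "NaNs produced" warning
sqrt(-2)   # NaN, with a warning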

Hi jjramsey,

Thank you for the information on C++. My model uses a log transformation to make the distribution approximately normal. As in the examples at http://www.cplusplus.com/reference/cmath/log/, some of my x values are negative, which gives a similar error:

input.to.stan <- stan.input()
Warning message:
In log(in.data[, y.col]) : NaNs produced

Because of this, my fit1 and fit2 do not run due to this error:

fit1 <- stan(model_code=input.to.stan$model, data=input.to.stan$data,
    init=input.to.stan$inits, chain=0)
Error in FUN(X[[i]], …) : Stan does not support NA (in y) in data
failed to preprocess the data; sampling not done

Given this problem, is there another way to run my model while keeping the negative values?
I hope this helps!

In that case, you can’t use a log transform of your data (for a log transform to work, the data must be strictly positive).

There are other options, like log(const + data), where const is big enough to make the sum positive, but that choice of constant is hard to justify.
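For what it’s worth, a minimal sketch of that shifted-log idea in R (the variable x and the offset are hypothetical; the constant is chosen only so that every shifted value is positive):

x <- c(-3.2, -0.5, 1.7, 4.1)    # example data containing negatives
const <- abs(min(x)) + 1        # offset large enough that x + const > 0
y <- log(x + const)             # shifted-log transform; no NaNs produced
# Any results on the log scale now refer to (x + const) rather than x itself,
# which is part of why this transform is hard to justify.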

It might be easier to model the data on its natural scale and work out a sensible (non-normal) likelihood.
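As a rough illustration only (a deliberately simplified, non-hierarchical regression on the natural scale; the data names and the Student-t likelihood are assumptions, not your actual model):

library(rstan)

model_code <- "
data {
  int<lower=1> N;
  vector[N] x;        // explanatory variable, may be negative
  vector[N] y;        // response on its natural scale
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  real<lower=1> nu;   // degrees of freedom for a heavier-tailed likelihood
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 5);
  nu ~ gamma(2, 0.1);
  y ~ student_t(nu, alpha + beta * x, sigma);  // no log transform needed
}
"

stan_data <- list(N = length(x), x = x, y = y)   # x and y assumed to exist
fit <- stan(model_code = model_code, data = stan_data, chains = 4)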

Hi Daniel_Simpson,

Thank you for your reply. I removed the log transformation, modeled my data on its natural scale, and it worked well.
As a follow-up, what are the advantages of modelling with a log transformation versus on the natural scale? Other than the log transformation making my data more normally distributed, and the natural scale allowing negative values, are there any other advantages?

Thank you again!

You really want to think about the generative process and the noise scale. If you have values that need to be positive for some reason, then you can log transform to an unconstrained scale, and you will often find that the log-transformed values are approximately normal even though the original values weren’t.

The main difference is that the lognormal has multiplicative error (error is proportional to the value), whereas the normal has additive error that’s independent of the current value.
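A small simulation sketch of that distinction in R (the parameter values are arbitrary and only for illustration):

set.seed(1)
mu <- 1:100                                        # underlying signal
y_additive <- mu + rnorm(100, 0, 5)                # normal: noise has the same scale everywhere
y_multiplicative <- mu * exp(rnorm(100, 0, 0.2))   # lognormal: noise scales with the value
# Equivalently, log(y_multiplicative) = log(mu) + normal noise,
# so the error is additive on the log scale but proportional on the natural scale.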