Problem recovering generated predictions for box-cox model

Hello,

I am trying to specify a box-cox model and generate predictions in the original scale.

I get several errors with this message: "The following variables have undefined values". The problem is in this line:
y_pred[n] = (normal_rng(alpha + log(x[n])*beta, sigma) * lambda + 1)^(1/lambda);

What am I doing wrong? Toy data here: https://github.com/sdaza/lambda/tree/master/data
Thanks!

data {
    int<lower=0> N;
    real x[N]; 
    real y[N]; 
}

parameters {
    real alpha;
    real beta;
    real lambda;
    real<lower=0> sigma;
}

model {
    lambda ~ normal(0,2);
    beta ~ normal(0,2);
    alpha ~ normal(0,2);

  if(lambda == 0){
        for(i in 1:N) {
                log(y[i]) ~ normal(alpha + log(x[i])*beta, sigma);
                target += -log(y[i]);
        }
    } else {
        for(i in 1:N){
                (y[i]^lambda - 1)/lambda ~ normal(alpha + log(x[i])*beta, sigma);
                target += (lambda - 1)*log(y[i]);
            }
        }
    }

generated quantities {
   real y_pred[N];
   real log_lik[N];

for(n in 1:N){
       if(lambda != 0){
          y_pred[n] = (normal_rng(alpha + log(x[n])*beta, sigma)* lambda + 1)^(1/lambda);
          log_lik[n] = normal_lpdf((y[n]^lambda - 1)/lambda | alpha + log(x[n])*beta, sigma) + (lambda-1)*log(y[n]);
        } else {
           y_pred[n] = exp(normal_rng(alpha + log(x[n])*beta, sigma));
           log_lik[n] = normal_lpdf(log(y[n]) | alpha +  log(x[n])*beta, sigma) - log(y[n]);
     }
     }
}

Hmm, something funky is going on here. The generated quantities block is generating NaNs in the output, and RStan is complaining that nothing was assigned (by default, variables declared in generated quantities are NaN). To debug this, throw some print statements into your generated quantities:

      y_pred[n] = (normal_rng(alpha + log(x[n])*beta, sigma)* lambda + 1)^(1/lambda);
      if(is_nan(y_pred[n])) {
        print(alpha);
        print(log(x[n]));
        // ... whatever
      }

There are some cases in which the normal_rng function results in values that, after transforming, end up being NaN.

Using these values (from printing some iterations):

alpha = -1.26999
logx = 7.90211 
beta = 0.101912
lambda = 1.12192
sigma = 0.316502


a = rnorm(100, alpha + beta*logx, sigma)
df = data.frame(value = a, tf = (a * lambda + 1)^(1/lambda))
df[is.na(df$tf),]

value  tf
9  -1.1070863 NaN
10 -1.1268851 NaN
16 -1.1499958 NaN
35 -1.2567046 NaN
38 -0.9276948 NaN
39 -0.9124289 NaN
42 -0.9351521 NaN
59 -1.1140359 NaN
89 -1.4590619 NaN

I should somehow put a lower bound on the values I get from the random function.

Any suggestions?

Thanks!

Oh, it’s trying to raise a negative number to a power less than one, which would be a complex number, so everything blows up.
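To make that concrete, here is a small Python sketch (not from the thread) of the inverse Box-Cox transform, using the lambda value printed above. Whenever the normal draw z satisfies z*lambda + 1 < 0, the real-valued fractional power is undefined:

```python
import math

lam = 1.12192  # lambda value printed in the post above

def inv_boxcox(z, lam):
    """Inverse Box-Cox: (z*lam + 1)^(1/lam).
    When the base is negative and 1/lam is non-integer, the real-valued
    power is undefined, so we return NaN (which is effectively what
    Stan's ^ produces here)."""
    base = z * lam + 1
    if base < 0:
        return float("nan")
    return base ** (1 / lam)

print(inv_boxcox(0.5, lam))    # fine: base is positive
print(inv_boxcox(-1.1, lam))   # NaN: base is negative, matching the R table above
```

This reproduces the pattern in the R output above: every draw below roughly -1/lambda turns into NaN.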

I’m out of my element on the Box-Cox thing. Does this post help at all: User written Box-Cox type transformation function?

It looks like in those models they have avoided doing the posterior predictives that you’re using. Maybe for the same reason. Does this seem familiar to you @Stephen_Martin?

Oh btw to specifically limit the output of the rng – you can just do rejection sampling. It’s a bit crude unfortunately.
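A minimal sketch of that rejection-sampling idea in Python (hypothetical helper, not from the thread): keep redrawing from the normal until the draw lands where the inverse transform is real-valued. Note this truncates the predictive distribution, which is exactly why it's crude:

```python
import random

def truncated_inv_boxcox_draw(mu, sigma, lam, max_tries=10_000):
    """Rejection-sample z ~ Normal(mu, sigma) subject to z*lam + 1 > 0,
    then apply the inverse Box-Cox transform. Redraws until the base is
    positive, so the result is always a real number (at the cost of
    truncating the predictive distribution)."""
    for _ in range(max_tries):
        z = random.gauss(mu, sigma)
        base = z * lam + 1
        if base > 0:
            return base ** (1 / lam)
    raise RuntimeError("rejection sampling failed to find a valid draw")

# Values printed earlier in the thread: mu = alpha + beta*logx
y = truncated_inv_boxcox_draw(-0.465, 0.317, 1.12192)
print(y)  # always positive and finite
```

The same loop translates directly into a Stan generated quantities block with a while loop around normal_rng.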

Thanks! Yes, I understand. I am trying to find a good solution. I plan to combine several models (stacking) where box-cox is one of them, so I would need predictions to weight them and use them later.

I hit a similar issue. Here’s what I basically did:

data {
	int N;
	vector[N] y;
	int K;
	matrix[N,K] X;
}

parameters {
	vector[K] beta;
	real lambda;
	real<lower=0> sigma;
}

model {
	lambda ~ normal(0,2);
	beta ~ normal(0,2);
	sigma ~ cauchy(0,5);

	if(lambda == 0){
		for(i in 1:N){
			log(y[i]) ~ normal(X[i]*beta,sigma);
			target += -log(y[i]);
		}
	} else {
		for(i in 1:N){
			(y[i]^lambda - 1)/lambda ~ normal(X[i]*beta,sigma);
			target += (lambda - 1)*log(y[i]);
		}
	}
}
generated quantities {
  vector[N] log_lik;
  vector[N] yrep;
  for(n in 1:N){
    if(lambda != 0){
      log_lik[n] = normal_lpdf((y[n]^lambda - 1)/lambda | X[n]*beta, sigma) + (lambda-1)*log(y[n]);
      yrep[n] = normal_rng(X[n]*beta, sigma); 
    } else {
      log_lik[n] = normal_lpdf(log(y[n]) | X[n]*beta, sigma) - log(y[n]);
      yrep[n] = normal_rng(X[n]*beta, sigma);
    }
  }
}

Then outside of stan, in R, I used:

bcSamp <- as.matrix(bcOut,pars=c('yrep','lambda'))
bcYrep <- t(apply(bcSamp,MARGIN = 1,function(x){
  (x[1:300]*x[301] + 1)^(1/x[301]) - 1 + stan_data_t$minY
}))

Mind you, I had done a “x - min(x) + 1” transform prior to using the stan model, hence the extra little “-1 + stan_data_t$minY” tidbit.

That seems to have worked.
The gist of it is - Fit the Box-cox model; obtain random draws ON THE BC-TRANSFORMED scale; then in R un-transform those yreps per-iteration. It handles those problems a bit better. You may still get some Inf/NaN problems, mind you, but such is life.
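The same per-iteration un-transform can be written vectorized; here is a Python/NumPy sketch of it (assumed array shapes, not from the thread), keeping NaN where the base goes negative rather than producing complex numbers:

```python
import numpy as np

def untransform_yrep(yrep, lam, min_y):
    """Invert Box-Cox draws per posterior iteration.
    yrep: (iterations, N) draws on the BC-transformed scale.
    lam:  (iterations,) lambda draws, one per iteration.
    min_y: the minimum used in the "x - min(x) + 1" shift before fitting.
    Negative bases become NaN instead of complex numbers."""
    lam = lam[:, None]                  # broadcast lambda over observations
    base = yrep * lam + 1
    # abs() avoids invalid-power warnings; np.where then keeps NaN
    # wherever the base was actually negative.
    out = np.where(base > 0, np.abs(base) ** (1 / lam), np.nan)
    return out - 1 + min_y              # undo the "x - min(x) + 1" shift
```

This mirrors the R apply() call above, one row per posterior draw.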


Thanks.

That lambda == 0 case shouldn’t arise unless you explicitly initialize at zero because it’s a measure zero event (not quite with floating point, but close enough). Were you running into problems with that otherwise?

It’s worth doing the power calculation in a loop, then you can vectorize the matrix product and normal; and you can pull the Jacobian out (I’m assuming that’s what it is—I always have to work powers out from first principles) into a single vectorized operation:

{
  vector[N] y_pow;
  for (n in 1:N) y_pow[n] = y[n]^lambda;
  (y_pow - 1) / lambda ~ normal(X * beta, sigma);
  target += (lambda - 1) * sum(log(y));
}

We really need to vectorize ^. It’s an easy first project for someone wanting to dive into the language and math lib.

I’m sure the lambda == 0 case never actually occurs; it was included to be “correct”. Basically, that code was being used for a simulation paper - correctness was valued over efficiency, because I wanted to avoid a reviewer saying “that BC transform is incomplete! What about the lambda = 0 case?”

And you’re right; the code could be made more efficient. Again - correctness (and readability) were valued over efficiency in this particular case.

Reviewer code. Got it :-) Thanks for the explanation.

In general, writing code to be readable is good. I only try to optimize when I need the speed.

Thanks @Stephen_Martin and @Bob_Carpenter. I was using a much cruder and probably wrong solution: skip the predictions with invalid values and assign a fallback value after 5000 attempts. The uncertainty of the estimates, especially at the bottom of the distribution, becomes weird:

         i = 1;
         while ((is_nan(y_pred[n])) && (i <= 5000)) {
            y_pred[n] = ((normal_rng(alpha + log(x[n])*beta, sigma)) * lambda + 1)^(1/lambda);
            i = i + 1;
         }

         if (is_nan(y_pred[n])) {
           y_pred[n] = 0; // very crude solution!
         } 

I have further questions.

  • Does the way you compute log_lik make it possible to compare with models that do not transform the dependent variable, using, for instance, WAIC or LOO?
  • I don’t quite follow your definition of target. Could you give some hints on that?

Thank you!
Sebastian

You need to stick to the _lpdf and _lpmf form so as not to drop constants. Otherwise, just be careful about what’s in the likelihood.

After the vector change of variables (y_pow - 1) / lambda, it applies the Jacobian for the whole vector all at once.
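Spelling that Jacobian out (a derivation from the change of variables already in the code above):

```latex
z_n = \frac{y_n^{\lambda} - 1}{\lambda}
\quad\Rightarrow\quad
\frac{\partial z_n}{\partial y_n} = y_n^{\lambda - 1}
\quad\Rightarrow\quad
\log\left|\frac{\partial z_n}{\partial y_n}\right| = (\lambda - 1)\log y_n .
```

Summing over n gives the target += (lambda - 1) * sum(log(y)); line. In the lambda = 0 case, z_n = log y_n, whose log Jacobian is -log y_n, matching target += -log(y[i]);.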
