Frontier model needs (unnecessary?) restriction

Although this is a long post, my question is in the end simple: why do I need a <lower=0> statement in my Stan model? Below I explain everything in detail. First comes a block of R code that should work as long as the packages are installed; the Stan model is embedded in the R code as a string and written to a tempfile. After the R code comes the ‘real’ question part. Hopefully somebody can clarify what my problem is!

The problem I am trying to solve is the same as the first example in Griffin and Steel (2007), although I have already modified all of the prior statements.

A working JAGS model is included at the end.

1. R code

library(frontier)
library(data.table)
library(AER)
library(rstan)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

## data is first 123 rows of electricity dataset

data("Electricity1970")
dtel <- as.data.table(Electricity1970)
dtel <- dtel[1:123,]
dtel[,Y:=log( cost/fuel )]
dtel[,LF:=log( labor/fuel )]
dtel[,KF:=log( capital/fuel )]
dtel[,Q:=log( output )]
dtel[,Q2:=log( output )^2]
dtel[,id:=1:123]
xmat <- as.data.frame(dtel[,.(Q,LF,KF,Q2,id)])
## base case

sfa( Y ~ Q + LF + KF + Q2, data = dtel )

## Stan model

stanmodel <- "data{
    int<lower=1> N;
    int<lower=1> N_id;
    real y[N];
    real Q[N];
    real LF[N];
    real KF[N];
    real Q2[N];
    int id[N];
    real rstar;
}
transformed data{
    real<lower=0> lambda0;
    lambda0 = -1.0 * log(rstar);
}
parameters{
    vector[N_id] u;
    real b1;
    real b2;
    real b3;
    real b4;
    real a;
    real<lower=0> lambda;
    real<lower=0> sigma;
}
model{
    vector[N] mu;
    sigma ~ cauchy( 0 , 1 );
    b4 ~ normal( 0 , 1 );
    b3 ~ normal( 0 , 1 );
    b2 ~ normal( 0 , 1 );
    b1 ~ normal( 0 , 1 );
    a ~ normal( 0 , 3 );
    lambda ~ exponential(lambda0);
    for ( j in 1:N_id ){
        u[j] ~ exponential(lambda);
    }
    for ( i in 1:N ){
        mu[i] = a + b1 * Q[i] + b2 * LF[i] + b3 * KF[i] + b4 * Q2[i] + u[id[i]];
    }
    y ~ normal( mu , sigma);
}"

tmpf=tempfile()
writeLines(text=stanmodel, con=tmpf)

initF <- function(){
    ## lambda0 is transformed data, not a parameter, so it needs no init
    list(u=rep(1,123), a=-10, b1=1, b2=1, b3=1, b4=1, lambda=10, sigma=1)
}
staneldat <- list(
    rstar = 0.875,
    y = dtel$Y,
    Q = xmat$Q,
    LF = xmat$LF,
    KF = xmat$KF,
    Q2 = xmat$Q2,
    id = xmat$id,
    N_id = 123,    
    N = 123)
fit <- stan(file=tmpf,
            data=staneldat,
            init=initF,
            chains=3,
            iter=2000)

2. The Question(s)

The problem is as follows. The current specification contains the line

vector[N_id] u;

When I run the model I get many errors like:

Exception thrown at line 36: exponential_log: Random variable is -0.000101942, but must be >= 0!     1

Line 36 of the Stan model is:

u[j] ~ exponential(lambda);

Question
I really don’t understand why I get the error: there are no restrictions on u, which just holds exponential draws with a rate parameter that itself has an exponential prior. In the end u is only used as a component of mu, so why does the estimation break down (n_eff is very low)? Could the problem be in the initialization? I doubt it; at some point I put print statements in the code and never saw a negative u value.

I resolved the issue by:

vector<lower=0>[N_id] u;

Now I only get a very occasional error:

Exception thrown at line 36: exponential_log: Inverse scale parameter is 0, but must be > 0!     4

Although the <lower=0> solution works, I don’t like it because I have no clue why it works. Could somebody please explain:

  • Is putting in the <lower=0> statement the correct thing to do and, if so, why do I need it?
  • Can my problem be resolved without adding <lower=0>?

3. Working JAGS model

library(R2jags)
jagdat <- list(y = dtel$Y,
               X = as.data.frame(dtel[,.(Q,LF,KF,Q2,id)]),
               p = 4,
               N = 123,
               K = 123,
               rstar = 0.875)
jinit <- function(){
    list(alpha=-8,
         lambda=10,
         beta=c(1,1,1,1),
         u=rep(1,123),
         prec=1)}
jagmodel <- "model {
	for (i in 1:N) {
		u[i] ~ dexp(lambda)
		eff[i] <- exp(- u[i])
	}
	for ( k in 1:K ) {
		firm[k] <- X[k, p + 1]
		mu[k] <- alpha + u[firm[k]] + inprod(beta[1:p], X[k, 1:p])
		y[k] ~ dnorm(mu[k], prec)
	}
	lambda0 <- -log(rstar)
	lambda ~ dexp(lambda0)
	alpha ~ dnorm(0.0, 1.0E-06)
	for (i in 1:p) {
		beta[i] ~ dnorm(0.0, 1.0E-06)
	}
	prec ~ dgamma(0.001, 0.001)
	sigmasq <- 1 / prec
}"
totempf <- function(modelstr){
    tmpf=tempfile()
    writeLines(text=modelstr, con=tmpf)
    return(tmpf)
}
tmpf <- totempf(jagmodel)
jagsfit_basic <- jags.parallel(
    model.file = tmpf,
    parameters.to.save = c("alpha","beta",
                           "prec","lambda","sigmasq"),
    data = jagdat,
    inits = jinit,
    n.chains = 3,
    n.thin = 1,
    n.iter = 60e3)

Yes. You need it because otherwise NUTS can and will venture into negative territory on the u dimensions, in which case the gradient is not defined.

No.

Thanks so much, Ben! I have two short follow-up questions:

  1. Does this then hold for any variable storing draws from a non-negative probability distribution like the exponential? And does it make sense to add the <lower=0> restriction preventively?

  2. I have not encountered this problem before, even though I worked through McElreath’s Rethinking book and tried quite a few things. Could it be that in this particular case lots of draws occur close to zero, giving NUTS plenty of opportunity to stray to the negative side?

Yes.

It never makes sense to leave out the <lower=0> bound when the variable must logically be non-negative.

It is not particular to this case.
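
In other words, declare constraints that match the support of the distribution. A few common cases, as a sketch (these distributions are illustrative, not taken from the model above):

parameters{
    real<lower=0> sigma;              // half-Cauchy or exponential: support [0, inf)
    real<lower=0,upper=1> theta;      // beta: support (0, 1)
    real<lower=-1,upper=1> rho;       // e.g. a correlation: support [-1, 1]
}
model{
    sigma ~ cauchy( 0 , 1 );
    theta ~ beta( 2 , 2 );
    rho ~ uniform( -1 , 1 );
}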

Thanks Ben!!

To rephrase @bgoodri’s response,

u ~ exponential(lambda);

does not impose a positivity constraint on u. If u is not declared to be positive with a <lower=0> bound, then the Stan algorithms will try to explore negative values, for which the corresponding exponential density is ill-defined.
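
A helpful way to see why: a sampling statement in Stan is nothing more than an increment of the accumulated log density. Up to dropped constants, the statement above is shorthand for

target += exponential_lpdf(u | lambda);

so it adds a term to the log density but has no power to keep the sampler away from negative values of u; only the declared constraint can do that.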

Thanks for the additional remark. In any case, it is clear to me that I will have to learn more about the underlying algorithms to prevent such problems in the future.

The point isn’t algorithm specific. The issue is that every set of parameter values that satisfies the declared constraints should have positive density (equivalently, a finite log density).

Stan transforms all constrained parameters to be unconstrained so that all of the algorithms can work on unconstrained parameters (i.e., ones that have support on all of R^N). Given the vagaries of floating-point arithmetic, we’ll never truly achieve that goal because of underflow and other numerical issues.

When your Stan program satisfies this constraint, it provides a model with support on the entire unconstrained parameter space, so none of the algorithms falls into undefined territory.
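
For a lower bound of zero that transform is a log. Writing it out by hand shows what the <lower=0> declaration does under the hood; the following toy program (a sketch, with lambda passed in as data purely for illustration) behaves like declaring real<lower=0> u; together with u ~ exponential(lambda):

data{
    real<lower=0> lambda;    // fixed rate, for illustration only
}
parameters{
    real log_u;              // unconstrained: lives on all of R
}
transformed parameters{
    real<lower=0> u;
    u = exp(log_u);          // the back-transform is always positive
}
model{
    // exponential log density at u, plus the log Jacobian of the
    // transform u = exp(log_u), which is just log_u itself
    target += exponential_lpdf(u | lambda) + log_u;
}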

I’m about to write a FAQ, and this may very well be entry number one. We say it all over the manual, but people tend to work from examples rather than reading the manual.