Case Study: Stochastic Production and Cost Frontiers

I’m using Stan for some different projects and decided that it might be useful to write out the basic stochastic frontier models as a case study. These are “composed error models” and used for efficiency analysis in econometrics. The models are not that complicated but I would appreciate any feedback you can provide. In particular, is there a more elegant way of combining a user-defined function and using the loo package? I’ve found Stan to be a useful piece of software for my work, but I’m never quite sure that I’m using best practices.

Check the repository for the case study at
https://bitbucket.org/aframsey/stancase_frontier/src/master/
or view a PDF of the final output. The PDF is a little wonky.
spfstan.pdf (144.4 KB)

If there’s interest in this case study, I may update with a few of the more advanced models for panel data. Or hierarchical models. But that might generate a whole new set of questions for the Stan forums!

2 Likes

Just for concreteness, I’m pasting in one of the models from the linked repo:

functions {
   
   real hnormal_lpdf(vector y, real sigma) {

      int N = rows(y);

      vector[N] prob;
      real lprob;
      for (i in 1:N) {
         prob[i] = log(2*sigma)-log(pi())-(y[i]^2*sigma^2)/pi();
      }
      lprob = sum(prob);
      return lprob;
   }

   real nhnormal_pdf(real y, row_vector x, real alpha, vector beta, real sigma, real lambda) {
      real prob;
      real sigs = sqrt(sigma^2 + lambda);
      real delt = sqrt(lambda)/sigma;
      real epsi = y - alpha - x * beta;
      prob = (0.5) * log(2 / pi()) - log(sigs) + normal_lcdf(epsi*delt/sigs|0,1) - (epsi^2)/(2*sigs^2);
      return prob;
   }
}

data {
   int<lower=0> N; // Number of observations
   int<lower=0> K; // Number of factors of cost function
   matrix[N, K] x; // Matrix of factors of cost function
   vector[N] y;    // Vector of costs
}
parameters {
   real alpha;            // Intercept
   vector[K] beta;        // Coefficients of the cost function
   real<lower=0> sigma;   // Error standard deviation
   real<lower=0> lambda;  // Parameter of half-normal inefficiency distribution
   vector<lower=0>[N] u;  // Inefficiency terms
}
model {
   alpha ~ normal(0, 100);
   beta ~ normal(0, 100);
   sigma ~ inv_gamma(0.001, 0.001);
   lambda ~ gamma(1, 1/37.5);
   u ~ hnormal(lambda);
   y ~ normal(alpha + x * beta + u, sigma);
}
generated quantities {
   vector[N] eff;
   vector[N] log_lik;
   eff = exp(-u);
   for (i in 1:N) {
      log_lik[i] = nhnormal_pdf(y[i], x[i], alpha, beta, sigma, lambda);
   }
}

First, we don’t recommend those diffuse gamma priors—there’s a discussion in the Stan manual in the chapter on regression. We generally recommend at least weakly informative priors that indicate the scale of the parameters.

You can vectorize for efficiency and you don’t need all those intermediates. This,

   real hnormal_lpdf(vector y, real sigma) {

      int N = rows(y);

      vector[N] prob;
      real lprob;
      for (i in 1:N) {
         prob[i] = log(2*sigma)-log(pi())-(y[i]^2*sigma^2)/pi();
      }
      lprob = sum(prob);
      return lprob;
   }

can be reduced to a much more efficient one liner

real hnormal_lpdf(vector y, real sigma) {
  return rows(y) * (log(2 * sigma) - log(pi()))
     - dot_self(y) / pi() * square(sigma);
}

with only N + 3 multiplications, one division, and two subtractions. I took the additional measure of guessing that y might be data, so that the dot-self and division by pi() are together so as to minimize operations involving parameters (which is where the real cost is because of the need to store the expressions and compute derivatives).

2 Likes

Hi

I know that your post about Stochastic Productions frontiers is too old, but I’m interested in this kind of models. Are your project alive?