Calculating layers in a Neural Net

For a two-layer network with tanh, I used this Stan function:

  /**
   * Returns the linear predictor for a restricted Boltzmann machine (RBM).
   * Assumes one hidden layer with tanh activation.
   *
   * @param x Predictors (N x M)
   * @param alpha First-layer weights (M x J)
   * @param beta Second-layer weights (J x (K - 1))
   * @return Linear predictor for the output layer of the RBM.
   */
  matrix rbm(matrix x, matrix alpha, matrix beta) {
    return tanh(x * alpha) * beta;
  }
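
For example, you could wire the output into a categorical likelihood in the model block like this (the declarations of N, K, and y, and the zero column appended for identification, are just one way to set it up, not part of the function itself):

  model {
    // rbm() gives an N x (K - 1) linear predictor; append a zero
    // column so each row is a full K-vector of logits.
    matrix[N, K] eta = append_col(rbm(x, alpha, beta), rep_vector(0, N));
    for (n in 1:N)
      y[n] ~ categorical_logit(eta[n]');
  }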

More layers look just the same.
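
For instance, a version with two hidden tanh layers just composes another weight matrix (the function name and the extra gamma argument here are mine, purely illustrative):

  /**
   * Returns linear predictor for a network with two hidden
   * tanh layers.
   *
   * @param x Predictors (N x M)
   * @param alpha First-layer weights (M x J1)
   * @param beta Second-layer weights (J1 x J2)
   * @param gamma Output-layer weights (J2 x (K - 1))
   * @return Linear predictor for the output layer.
   */
  matrix nn2(matrix x, matrix alpha, matrix beta, matrix gamma) {
    return tanh(tanh(x * alpha) * beta) * gamma;
  }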

This is still a mess of inefficient autodiff compared to building the backprop algorithm statically. So if we really wanted to do these efficiently, we’d write custom derivatives for functions like rbm() directly in C++. Generic autodiff requires a lot of extra memory and is also slower; a direct C++ implementation would need very little memory and be at least 4 times faster.

But for reasons @betanalpha mentions and because this still isn’t going to scale in parallel, we haven’t been very focused on neural nets (aka deep belief nets).
