That’s what we’d call a logistic regression. David MacKay, in his information theory book, presents logistic regression as classification with a single neuron.

If there are N observations with M continuous predictors (“neurons”) and a single binary outcome, then you have an N \times M data matrix x, a coefficient vector \beta of size M, and a binary observation vector y of size N. To fit the model, you need to evaluate the log likelihood function

\mathcal{L}(\beta) = \log \mbox{bernoulli}(y \mid \mbox{logit}^{-1}(x \, \beta))
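As a concrete reference point, here is a minimal NumPy sketch of that likelihood, computed literally as written (the function name and argument layout are my own, not from the text):

```python
import numpy as np

def log_likelihood(y, x, beta):
    """Naive Bernoulli log likelihood with an explicit inverse logit.

    y: binary observations, shape (N,)
    x: predictor matrix, shape (N, M)
    beta: coefficient vector, shape (M,)
    """
    p = 1.0 / (1.0 + np.exp(-(x @ beta)))  # inverse logit, elementwise
    # Bernoulli log pmf: log p where y == 1, log (1 - p) where y == 0
    return np.sum(np.where(y == 1, np.log(p), np.log(1.0 - p)))
```

This direct form is fine for moderate x \, \beta, but np.exp overflows and np.log(1 - p) loses precision when the linear predictor is large in magnitude, which motivates the composed version discussed next.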

The matrix-vector multiply requires N \times M multiply-add operations, the inverse logit requires N additions, subtractions, and exponentiations, and the Bernoulli requires N conditionals and N logarithms. In practice it’s not done quite that way, because we have a `bernoulli_logit` function that composes the two operations and provides more stable arithmetic and unfolded derivatives. Either way, it’s \mathcal{O}(N \times M), and the derivatives require the same order of operations.
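To illustrate the composed form (a sketch of the idea behind a function like `bernoulli_logit`, not its actual implementation), one can use the identities \log p = -\log(1 + e^{-a}) and \log(1 - p) = -\log(1 + e^{a}) for a = x \, \beta, evaluated with a log-sum-exp primitive so that large |a| never overflows:

```python
import numpy as np

def log_likelihood_logit(y, x, beta):
    """Stable composed form: log Bernoulli(y | inv_logit(a)) for a = x @ beta.

    Uses log p = -log(1 + exp(-a)) and log(1 - p) = -log(1 + exp(a)),
    computed via logaddexp to avoid overflow for large |a|.
    """
    a = x @ beta
    # flipping the sign of a for y == 0 lets both cases share one formula
    sign = np.where(y == 1, 1.0, -1.0)
    return -np.sum(np.logaddexp(0.0, -sign * a))
```

Unlike the naive exp-then-log route, this stays finite even when the linear predictor is in the hundreds, and it fuses the inverse logit and Bernoulli passes into a single vectorized expression.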