One example of custom_function_lpdf

It is really hard to find a coding example. I read through the Stan manual, but I can't get a custom function going. Suppose I tried to rewrite this tutorial model using a custom function:

data {
    int N; // number of obs (pregnancies)
    int M; // number of groups (women)
    int K; // number of predictors
    
    int y[N]; // outcome
    row_vector[K] x[N]; // predictors
    int g[N];    // map obs to groups (pregnancies to women)
}
parameters {
    real alpha;
    real a[M]; 
    vector[K] beta;
    real<lower=0,upper=10> sigma;  
}
model {
  alpha ~ normal(0,100);
  a ~ normal(0,sigma);
  beta ~ normal(0,100);
  for(n in 1:N) {
    y[n] ~ bernoulli(inv_logit( alpha + a[g[n]] + x[n]*beta));
  }
}

I tested out a direct substitution:

functions {
   real my_func_lpdf(row_vector x, real alpha, vector beta){
	y ~ bernoulli(inv_logit( alpha + x*beta));
	return y;
   }
}

and it bombs (I kind of expected that). Do I need to use the target += syntax inside the function? Please show me the right syntax to make this example work.

You need to pass y to your function, and since y is an integer (binary) outcome, your function should be called my_func_lpmf rather than _lpdf. The function should not return y; it should return the log probability mass, e.g. return bernoulli_logit_lpmf(y | alpha + x * beta);
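A minimal sketch of that, assuming the function handles one observation at a time the way the loop in the original model does (my_func is just a placeholder name):

functions {
  // returns the log pmf of a single binary outcome y
  real my_func_lpmf(int y, real alpha, row_vector x, vector beta) {
    return bernoulli_logit_lpmf(y | alpha + x * beta);
  }
}

In the model block you could then write either target += my_func_lpmf(y[n] | alpha + a[g[n]], x[n], beta); or y[n] ~ my_func(alpha + a[g[n]], x[n], beta);, folding the group effect into the intercept argument.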

I am not trying to just replace y ~ bernoulli(inv_logit( alpha + x*beta)); with bernoulli_logit. I want to wrap the distribution inside my own lpdf function.

functions {
   real my_func_lpmf(row_vector x, real alpha, vector beta){
	// what goes here?
	return result;
   }
}

I think the only rule is that the lpmf/lpdf functions need to return the log of their probability mass/density functions. You can do whatever you want on the inside to compute that log probability.

If, for some reason, you wanted a slow version of bernoulli_lpmf, you could do this:

functions {
  // slow, loop-based equivalent of bernoulli_lpmf(y | theta)
  real my_func_lpmf(int[] y, real theta){
    real total = 0.0;
    for(n in 1:size(y))
      total = total + bernoulli_lpmf(y[n] | theta);  // accumulate per-observation log pmf
    return total;
  }
}
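
For completeness, a usage sketch (assuming y is the integer array from the data block and theta is a probability parameter declared elsewhere):

model {
  target += my_func_lpmf(y | theta);
  // or, using the sampling-statement shorthand (Stan drops the _lpmf suffix):
  // y ~ my_func(theta);
}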

Hope that helps!

Thanks. I pretty much figured out what the deal is with lpdf. The rest of the question, how to implement bernoulli_logit, is just incidental logic.

That’s not going to work in Stan. The sampling notation doesn’t literally draw a new random variate, but rather it’s just shorthand. That is, these three are the same:

y ~ bernoulli(inv_logit(alpha + x * beta));
y ~ bernoulli_logit(alpha + x * beta);
target += bernoulli_logit_lpmf(y | alpha + x * beta);

That is, y needs to be defined before any of these three is called.
They’re not always exactly the same because the ~ notation will drop additive constants in the log density—it’s just that there are no constants to drop in the Bernoulli.
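
As a hypothetical illustration with a different distribution (not part of the model above), these two statements differ only by an additive constant (-0.5 * log(2 * pi) per observation), since the scale is fixed at 1:

y ~ normal(mu, 1);                 // drops terms that do not depend on parameters
target += normal_lpdf(y | mu, 1);  // keeps the full log density, constants included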

Now if what you want to do is increment the log density inside a function, the function's name needs to end in _lp. So you could do this:

functions {
  void foo_lp(int[] y, matrix x, real alpha, vector beta) {
    y ~ bernoulli_logit(alpha + x * beta);  // increments target; allowed because the name ends in _lp
  }
}
model {
  foo_lp(y, x, alpha, beta);  // equivalent to: y ~ bernoulli_logit(alpha + x * beta)
}

What you are looking for may be something like bernoulli_rng(theta), which returns a random draw with theta chance of being 1 and 1 - theta chance of being 0. That can only be used in the generated quantities block.
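
For this model, that might look like the following sketch, where y_rep is just a placeholder name for the replicated outcomes:

generated quantities {
  int y_rep[N];
  for (n in 1:N)
    y_rep[n] = bernoulli_rng(inv_logit(alpha + a[g[n]] + x[n] * beta));
}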

Plus, everything’s faster with vectorization, so you really want to do everything together:

y ~ bernoulli_logit(alpha + x * beta);

where y is an N-array of integers, x is the N x K data matrix, alpha is a real parameter, and beta is a K x 1 vector.
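
Putting that together with the original model, a vectorized version might look like the sketch below; it assumes x is redeclared as a matrix and a as a vector so that a[g] and x * beta vectorize:

data {
  int N;                // number of obs (pregnancies)
  int M;                // number of groups (women)
  int K;                // number of predictors
  int y[N];             // outcome
  matrix[N, K] x;       // predictors as a matrix
  int g[N];             // map obs to groups (pregnancies to women)
}
parameters {
  real alpha;
  vector[M] a;          // vector, so a[g] can be added elementwise
  vector[K] beta;
  real<lower=0, upper=10> sigma;
}
model {
  alpha ~ normal(0, 100);
  a ~ normal(0, sigma);
  beta ~ normal(0, 100);
  y ~ bernoulli_logit(alpha + a[g] + x * beta);
}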
