Custom Likelihood function with hierarchical model

bcych · July 10, 2019, 11:53pm

Hi, I’m trying to run a hierarchical model. The model involves fitting a line to a set of observed x and y data. In this case x is not a predictor variable, but x and y covary linearly (their errors are assumed to be independent). The likelihood function and number of parameters for this problem can be significantly reduced with some analytical integration (see https://bayes.wustl.edu/etj/articles/leapz.pdf). As such, I want to have my model use a custom likelihood function.

The idea is that my data should predict some group level ‘intensity’ parameter (the slope of the line multiplied by a constant). This is informed by the individual level parameters i.e. I_{individual} ~ StudentT(\nu, \mu=I_{site}, \sigma).

My code runs but gives completely incorrect results for a test dataset. I think this might have to do with the syntax I’ve used. The line:
likelihood_lpdf(S2xx|S2xy,S2yy,xbar,ybar,n,vari,theta,b_scaled,p_fail)
isn’t really what I want it should be more like
likelihood_lpdf(S2xx,S2xy,S2yy,xbar,ybar,n,vari|theta,b_scaled,p_fail)
but this isn’t good syntax in stan.

I’m also not entirely sure that I’m dealing with correctly in the likelihood function itself. Technically each of the elements in the vector should be a different parameter in the hierarchical model. I’m trying to vectorize the problem by taking each of these parameters and taking the product of the resulting vector of log probabilities (i.e. the probability of all of these parameters at the same time given the data should be equivalent to an AND statement). My code is below:

functions {
  real likelihood_lpdf(data vector S2xx, data vector S2xy, data vector S2yy, data vector xbar, data vector ybar, data vector n, data vector vari, vector theta, vector b, vector p_fail) {
    return log(sqrt(dot_product((1 - p_fail) .* exp(-(n ./ (2 * vari)) .* ((S2xx .* sin(theta) .* sin(theta)) - (2 * S2xy .* cos(theta) .* sin(theta)) + (S2yy .* sin(theta) .* sin(theta))) + (cos(theta) .* (b - ybar + xbar .* tan(theta)) .* (b - ybar + xbar .* tan(theta)))) + p_fail * 0.00001, (1 - p_fail) .* exp(-(n ./ (2 * vari)) .* ((S2xx .* sin(theta) .* sin(theta)) - (2 * S2xy .* cos(theta) .* sin(theta)) + (S2yy .* sin(theta) .* sin(theta))) + (cos(theta) .* (b - ybar + xbar .* tan(theta)) .* (b - ybar + xbar .* tan(theta)))) + p_fail * 1e-12)));
  }
}
data {
    int<lower=1> N;
    vector[N] S2xx;
    vector[N] S2yy;
    vector[N] S2xy;
    vector[N] xbar;
    vector[N] ybar;
    vector[N] B_lab;
    vector[N] vari;
    vector[N] NRM0;
    vector[N] n;
}

parameters {
    vector<lower=-pi()/2,upper=pi()/2>[N] theta;
    vector<lower=0,upper=2>[N] b;
    vector<lower=0,upper=1>[N] p_fail;
    real <lower=-pi()/2,upper=pi()/2> theta_site;
}

transformed parameters {
    vector[N] intensity = -tan(theta) .*B_lab;
    vector[N] b_scaled = b .* NRM0;
    real intensity_site= -tan(theta_site) * mean(B_lab);
}

model {
    intensity ~ student_t(3,intensity_site,6);
    target += likelihood_lpdf(S2xx|S2xy,S2yy,xbar,ybar,n,vari,theta,b_scaled,p_fail);
}

KariOlson · July 12, 2019, 3:42pm

On the syntax, the conditional notation requirement is just notation - it has no effect on calculation. It occurs when you name a function _lpdf. If you don’t like this notation, change the function name.

I can’t penetrate what you’re trying to do with the likelihood calculation but you say:

While it is (imho) far more likely that you have made a mistake in your likelihood function, it is possible that you are not getting the expected results due to computational issues. Recall that an AND statement in probability space is multiplication, however when using log_probabilities AND is addition because log(ab)=log(a)+log(b). Similarly you might be helped computationally by recalling log(sqrt(c))=0.5log(c). When you reformulate in terms of log_probabilities, it is also a really good idea to look at the built-in functions in the STAN reference manual that computationally optimize some common operations (“composed functions”), like lmultiply: log(sqrt(c))=0.5*log(c)=lmulitply(0.5,c)

Topic		Replies	Views
Bayesian hierarchical modeling of a two-stage process model (Planning stages) Modeling techniques	2	521	September 26, 2023
Log likelihood for hurdle model Modeling loo	6	1713	May 24, 2017
Modelling joint likelihood of two outcomes & parameter correlation Modeling rstan	1	577	August 26, 2022
Custom mixed normal PDF/likelihood Modeling fitting-issues	1	361	September 4, 2023
Advice me in Theoretical concept for GP Likelihood General rstan , techniques	1	859	December 17, 2021

Custom Likelihood function with hierarchical model

Related topics