Model for point process in Stan

Hello!
I am trying to write a model for a 2D point process where the number of points (locations) is fixed and known by design, while the locations themselves (the coordinates) are random and are what I am trying to model. I am not sure whether I am doing something wrong or whether this is just not possible in Stan.

Setup
For now I am just simulating data to ensure my model is working.
I define a Gaussian process \phi(s) on a 1x1 unit square and pick n observed locations s_1, \dots, s_n, each with density p(s) = \exp(\phi(s)) / \int \exp(\phi(t))\,dt.
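For illustration, a discretized version of this data-generating process can be written as a standalone Stan program and run with the fixed_param algorithm. This is only a sketch: the grid-based approximation and the names used here (N_grid, grid_pts, cell_area, n_obs) are assumptions for the sketch, not part of the model below.

data {
  int<lower=1> N_grid;                 // number of grid cells covering the unit square
  array[N_grid] vector[2] grid_pts;    // grid cell centers
  real<lower=0> cell_area;             // area of one grid cell (1.0 / N_grid for a regular grid)
  int<lower=1> n_obs;                  // number of locations to draw
  real<lower=0> sigma2;                // true GP marginal variance
  real<lower=0> rho;                   // true GP length scale
  real phi_mean;                       // true GP mean
}
generated quantities {
  vector[N_grid] phi;                  // one realization of the latent field on the grid
  array[n_obs] vector[2] coords;       // simulated locations (grid cell centers)
  real true_integral;                  // grid approximation of the integral of exp(phi(s))
  {
    matrix[N_grid, N_grid] SIGMA = gp_exponential_cov(grid_pts, sqrt(sigma2), rho);
    matrix[N_grid, N_grid] L_SIGMA = cholesky_decompose(add_diag(SIGMA, 1e-9));
    vector[N_grid] p;
    phi = multi_normal_cholesky_rng(rep_vector(phi_mean, N_grid), L_SIGMA);
    true_integral = sum(exp(phi)) * cell_area;
    p = softmax(phi);                  // p(s) = exp(phi(s)) / integral exp(phi(t)) dt, discretized
    for (i in 1:n_obs)
      coords[i] = grid_pts[categorical_rng(p)];
  }
}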

My observations, and what I consider the data, are just the locations (coordinates) of the points that I observe (which I can use to define a variance-covariance matrix for the prior of the spatial process \phi(s)) and the number of points that I observe.

So my observations would follow an inhomogeneous Poisson process with log intensity \phi(s), but with the number of observations known.

Since the points are conditionally independent given \phi, the normalizing integral appears once per point, and the likelihood for s = (s_1, s_2, \dots, s_n) is:
L(s \mid \text{params}) = P(N=n) \times \frac{\prod_{i=1}^{n} \exp(\phi(s_i))}{\left(\int \exp(\phi(t))\,dt\right)^{n}} = 1 \times \frac{\prod_{i=1}^{n} \exp(\phi(s_i))}{\left(\int \exp(\phi(t))\,dt\right)^{n}}

The log likelihood is:
\log L(s \mid \text{params}) = \sum_{i=1}^{n} \phi(s_i) - n \log \int \exp(\phi(t))\,dt

With priors:

\phi \sim GP(\phi_\mu, \Sigma) \\ \sigma^2 \sim \mathrm{Gamma}(3,4) \\ \rho \sim \mathrm{InvGamma}(3,0.5)

Where \sigma^2 and \rho are hyperparameters of the variance-covariance matrix \Sigma.


Stan model

I first consider the case in which I already know the integral and pass its value to the model as data (I am simulating the data, so for now I can compute a very good approximation to this integral).
I tried implementing this in Stan by adding a likelihood term for the probability of each location being sampled, but something is wrong because the likelihood explodes (becomes very large). I wanted to see if anyone can tell me whether I am implementing something incorrectly or whether this is even possible.

functions{
  // point-process log likelihood with fixed n:
  // sum_i phi(s_i) - n * log( integral of exp(phi) )
  real pp_lp(int s, vector phi_locs, real est_integral){
    return sum(phi_locs) - s * log(est_integral);
  }
}
data {
  int<lower=0> s;
  array[s] vector[2] coords;
  real est_integral;
}
parameters {
  real<lower=0> sigma2;
  real<lower=0> rho;
  vector[s] phi_locs;
  real phi_mean;
}
transformed parameters{
  matrix[s,s] SIGMA;
  matrix[s,s] L_SIGMA;
  
  SIGMA = gp_exponential_cov(coords, sqrt(sigma2), rho);
  L_SIGMA = cholesky_decompose(SIGMA);
  
}
model {
  target += normal_lpdf(phi_mean | 0, 1);
  target += inv_gamma_lpdf(sigma2 | 4, 0.5);
  target += gamma_lpdf(rho | 2, 4);
  
  target += multi_normal_cholesky_lpdf(phi_locs|rep_vector(phi_mean,s),L_SIGMA);

 // likelihood 
 target += pp_lp(s,phi_locs,est_integral);
}

Thanks for any help!

I think there might be a few sources of confusion here. First, if the number of points is fixed then it’s not really a Poisson process of any kind (the whole defining feature of PPs is that the number of points in a subset of the domain is Poisson distributed with mean equal to the integral of the intensity function over that subset).

However, assuming that the model is correct for the task at hand, I think the problem is tractable. With observed locations s=\{\vec{s}_1, \dots, \vec{s}_n\} and parameters \theta = \{\sigma^2, \rho\}, the line

\log p(s \mid \phi, \theta) = \sum_{i=1}^{n}\phi(\vec{s}_i) - n\log\int \exp(\phi(\vec{t}))\,d\vec{t}

is correct, but note that I’ve added the dependence on the realization of the field \phi. In other words, \phi(\vec{s}_i) is the value of a realization of a GP at \vec{s}_i, not the multivariate normal density evaluated at \vec{s}_i. Also, because of the doubly stochastic nature of the model you can’t precompute the integral of the intensity field. So you may need to take a latent-variable approach where you model \phi as the appropriate multivariate normal, find its values at the observed locations, and numerically integrate it.

So maybe something like:

data {
  ...
  int N_pts; //number of points to evaluate latent field at
  array[N_pts] vector[2] pts; //points to evaluate latent field at
  ...
}
parameters {
  ...
  vector[N_pts] eta;
  ...
}

transformed parameters{
  vector[N_pts] phi;
    {
    matrix[N_pts, N_pts] L_SIGMA;
    matrix[N_pts, N_pts] SIGMA = gp_exponential_cov(pts, sqrt(sigma2), rho);

    L_SIGMA = cholesky_decompose(SIGMA); //if you are getting numerical instability add a small jitter to diagonal
    phi = L_SIGMA * eta;
  }
}

model {
   ...
   eta ~ normal(0,1);
   real int_phi = ... // numerically do this integral. If you're using a regular grid you could maybe do something like sum(phi * grid cell area)
   vector[N_obs] phi_s = ... // look up the values of phi corresponding to the observed locations
  ... //do appropriate target increment with the above
}

I haven’t tried the above code or attempted to optimize it; it's just some food for thought.

Edit: I realized I wasn’t super careful with exp(\phi) vs \phi in the pseudocode. I think the general procedure shouldn’t change much.
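To be a bit more concrete about the elided pieces (and using exp(\phi) as noted in the edit), the model block might look something like the following. I am assuming a regular grid with a known cell_area passed as data, and that the first N_obs entries of pts are chosen to coincide with the observed locations; both of these are just assumptions for the sketch.

model {
   ...
   eta ~ normal(0, 1);

   // Riemann-sum approximation of the integral of the intensity exp(phi) over the unit square
   real int_phi = sum(exp(phi)) * cell_area;

   // if the observed locations are included as the first N_obs grid points,
   // their latent-field values are just the first N_obs entries of phi
   vector[N_obs] phi_s = phi[1:N_obs];

   // point-process log likelihood with fixed n
   target += sum(phi_s) - N_obs * log(int_phi);
}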

Upon some further thought, I think the problem you’re describing reduces to (2-d) density estimation via Gaussian processes. This is covered in Section 21.5 of Bayesian Data Analysis and probably in other texts on Gaussian processes.

Thank you so much for the help! The approach you described makes sense, but it involves quite a large (N_pts, N_pts) matrix defining the grid.

I was trying to precompute the integral from the known truth ahead of time in order to check whether the above model would work (and then concentrate later on approximating the integral in the sampler). I am having a hard time understanding why this would not be possible. You mention that \phi(\vec{s}_i) depends on the realization of the GP, but if we are able to know the true value of the integral ahead of time (from the simulation), wouldn't it be sufficient to pass it as a known constant for the likelihood calculation? When we approximate it with the grid, wouldn't we still just be approximating a value that should eventually converge to the true value of the integral? I see how this could be a pointless exercise since we are never able to actually precompute the integral in real life, but I am quite confused as to why it would not work.

Thank you again for your time and for the recommendation - I will look into Section 21.5 of BDA.