Kernel Density Estimation with Stan

Dear All,

This is going to be a terrible post, because what I am essentially looking for is someone to point me in the direction of appropriate programming techniques. I apologize.

I am currently trying to implement a two-step regression. In the first step, I estimate the empirical density of some variable, which serves as a normalizing constant for the right-hand side in the second step (the actual regression). So far I have estimated the empirical density by fitting a mixture of Gaussian likelihoods to the data, retrieving the parameters, and simulating predicted values, to which I apply R's approxfun() so I can evaluate the density at any point. I understand that this is a bit odd, since I could just apply R's built-in KDE to the observed data.

I would, however, like to implement the whole project in Stan, so that the uncertainty from estimating the empirical density propagates into the uncertainty of the regression; otherwise I am plugging in point estimates without regard to the underlying uncertainty.

Thus my question: is there any advantage to estimating the empirical density with Stan, and if so, how can I make my code more efficient? And furthermore, is there a simple way of combining the density estimation, the calculation of the right-hand side, and the regression itself in one script?
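
To make the last question concrete, here is a rough skeleton of the single program I am imagining (x, alpha, beta, tau, and the exact form of the normalization are placeholders for my actual setup):

data{
  int<lower=1> K;    // number of mixture components
  int<lower=1> N;    // number of data points
  vector[N] y;       // variable whose density serves as the normalizing constant
  vector[N] x;       // regression outcome (placeholder)
}
parameters{
  simplex[K] theta;          // mixing proportions
  ordered[K] mu;             // locations of mixture components
  vector<lower=0>[K] sigma;  // scales of mixture components
  real alpha;                // placeholder intercept
  real beta;                 // placeholder slope
  real<lower=0> tau;         // placeholder regression scale
}
model{
  vector[K] log_theta = log(theta);
  vector[N] log_f; // log density of y, evaluated at each observation
  // step 1: mixture density estimation (as in the first program below)
  for(n in 1:N){
    vector[K] lps = log_theta;
    for(k in 1:K)
      lps[k] += normal_lpdf(y[n] | mu[k], sigma[k]);
    log_f[n] = log_sum_exp(lps);
    target += log_f[n];
  }
  // step 2: regression with the density-normalized right-hand side
  x ~ normal(alpha + beta * (y ./ exp(log_f)), tau);
}

Written this way, the density parameters and the regression parameters are estimated jointly, so the uncertainty in the density estimate propagates into the regression.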

Mixture Model Estimation

data{
  int<lower=1> K; // number of mixing components
  int<lower=1> N; //number of data points
  real y[N];      //observations
}
parameters{
  simplex[K] theta;         //mixing proportions
  ordered[K] mu;            //locations of mixture components
  vector<lower=0>[K] sigma; //scales of mixture components
}
model{
  vector[K] log_theta = log(theta); //Cache log calculation
  sigma ~ lognormal(0, 2);           //weakly informative prior on scales
                                     //(lognormal(20, 5) would put the median scale at exp(20))
  mu ~ normal(50, 15);               //weakly informative prior on locations
  for(n in 1:N){
    vector[K] lps = log_theta;
    for(k in 1:K)
      lps[k] += normal_lpdf(y[n] | mu[k], sigma[k]);
    target += log_sum_exp(lps);
  }
}
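
One idea for collapsing the two scripts is to append a generated quantities block to the model above, so each posterior iteration also produces a predicted value (a sketch using the built-in categorical_rng and normal_rng):

generated quantities{
  real y_pred; //one draw from the estimated mixture per iteration
  {
    int k = categorical_rng(theta);       //pick a mixture component
    y_pred = normal_rng(mu[k], sigma[k]); //draw from that component
  }
}

This would make the separate program below unnecessary, and the y_pred draws would automatically reflect the posterior uncertainty in theta, mu, and sigma.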

Separate Generation of y_pred Draws

data{
  int<lower=1>        K; //number of mixture components
  simplex[K]          theta; //weights of components (validated to sum to 1)
  vector[K]           mu; //position parameters
  vector<lower=0>[K]  sigma; //scale parameters
}
transformed data{
  vector[K] log_theta = log(theta); //log weights, so mixture weights enter as sums rather than products
}
parameters{
  real y_pred; //samples
}
model{
  vector[K]   lps;
  for(k in 1:K){
    lps[k]    = log_theta[k]; //weights
    lps[k]    += normal_lpdf(y_pred | mu[k], sigma[k]); //estimated log density function
  }
  target += log_sum_exp(lps);
}
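
Since both programs evaluate the same mixture log density, I imagine it could be factored into a user-defined function (a sketch; the name mix_normal_lpdf is my own):

functions{
  real mix_normal_lpdf(real y, vector log_theta, vector mu, vector sigma){
    int K = rows(mu);
    vector[K] lps = log_theta; //start from the log weights
    for(k in 1:K)
      lps[k] += normal_lpdf(y | mu[k], sigma[k]);
    return log_sum_exp(lps); //log of the weighted mixture density
  }
}

The model block above would then reduce to target += mix_normal_lpdf(y_pred | log_theta, mu, sigma), and the estimated density at any point y would be exp(mix_normal_lpdf(y | log_theta, mu, sigma)), which is what I need for the right-hand side of the regression.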