I’m fitting a four-parameter logistic curve where the response has measurement error and is rounded to the nearest integer. My goal is to estimate parameters of the logistic curve as well as the measurement error. In addition, I am fitting a mixture of normals for the underlying density (with unknown number of components, therefore setting a large number of mixture of components). I know I can do this in two steps but I plan to relate the two in a bigger model later.
The program takes about 20 minutes to run. Is there a way to optimize so it performs better? Can I vectorize the first for loop? From what I’ve read vectorizing doesn’t necessarily increase performance. Is 2000 iterations enough?
The program does estimate the parameters well and seems to have good mixing between the chains. The mixture of normals seems to really slow the program down, but I don’t want to remove it.
The Stan and R programs are below.
Stan:
functions{
// get current true y value based on coefficient estimates
real getytrue(vector coef, real x){
real mb;
real fx;
real ytrue;
mb = (2*coef[3]*coef[4])/(coef[3]+coef[4]);
fx = 1/(1+exp(-mb*(coef[2]-x)));
ytrue = coef[1]*(fx*exp(coef[3]*(coef[2]-x))+(1-fx)*exp(coef[4]*
(coef[2]-x)))/(1+fx*exp(coef[3]*(coef[2]-x))+(1-fx)*
exp(coef[4]*(coef[2]-x)));
return(ytrue);
}
// likelihood for rounded data
real ylikelihood(real yobs, real ytrue,real ysig){
return(log_diff_exp(normal_lcdf(yobs+.5 | ytrue,ysig),normal_lcdf(yobs-.5 | ytrue,ysig)));
}
}
data {
int N;
vector[N] yobs;
vector[N] x;
int n_groups;
int Ngrid;
real xgrid[Ngrid];
}
parameters {
ordered[n_groups] mu;
simplex[n_groups] Theta;
real<lower=0> sigma[n_groups];
vector<lower=0>[4] coef;
real ysig;
}
model {
real contributions[n_groups];
real y_mu;
real alpha = 0.5;
// priors
sigma ~ cauchy(0,2);
mu ~ normal(0,20);
Theta ~ dirichlet(rep_vector(alpha, n_groups));
coef ~ lognormal(0,5);
ysig ~ cauchy(0,2);
// update logistic coefficients based on rounded data
for(i in 1:N){
y_mu = getytrue(coef,x[i]);
target+=ylikelihood(yobs[i], y_mu, ysig);
}
// update mixture normals
for(i in 1:N) {
for(k in 1:n_groups) {
contributions[k] = log(Theta[k]) + normal_lpdf(x[i] | mu[k], sigma[k]);
}
target += log_sum_exp(contributions);
}
}
// save density estimate
generated quantities{
real dens[Ngrid];
// compute MIC density
for (i in 1:Ngrid) {
dens[i] = 0;
for (j in 1:n_groups) {
dens[i]=dens[i]+(Theta[j]*1/sqrt(2*pi()*sigma[j]^2)*
exp(-(xgrid[i]-mu[j])^2/(2*sigma[j]^2)));
}
}
}
R program
library(mixtools)
library(rstan)
set.seed(0807)
### Generate data
nobs=700; ysig=2
popmn=c(-2.5,-1,3); popstd=c(2,1,2); popprob=c(.5,.2,.3)
x=rnormmix(n=nobs,lambda=popprob,mu=popmn,sigma=popstd)
# Four parmeter logistic curve
coef=c(35,1.17,.1,1.2)
mb = (2*coef[3]*coef[4])/(coef[3]+coef[4])
fx = 1/(1+exp(-mb*(coef[2]-x)))
ytrue=coef[1]*(fx*exp(coef[3]*(coef[2]-x))+(1-fx)*exp(coef[4]*
(coef[2]-x)))/(1+fx*exp(coef[3]*(coef[2]-x))+(1-fx)*
exp(coef[4]*(coef[2]-x)))
yobs=round(ytrue+rnorm(nobs,0,ysig))
plot(x,yobs,pch=16,cex=.5)
### Run Stan model
options(mc.cores = parallel::detectCores())
compiled_model <- stan_model(file="logistic.stan")
dat=list(yobs=yobs,n_groups=6,N=nobs,x=x,xgrid=seq(-5,5,length=1000),Ngrid=1000)
init_fun <- function() {list(mu=seq(min(x),max(x),length=6),sigma=rep(1,6),Theta=rep(1/6,6),
coef=c(30,1,1,1),ysig=5)}
fit <- sampling(compiled_model, data = dat, iter = 2000,chains = 3,verbose=TRUE,init=init_fun)
### Output
plot(fit, plotfun = "trace", pars = c('ysig','coef'), inc_warmup = FALSE)
summary(fit,pars=c('ysig','coef'))