Implementing DS(G,m) custom prior for STAN


#1

@deepstan and I respectfully request any assistance in implementing a custom prior distribution in STAN.

I have attached a one-page description of the model for the rat tumor example, along with relevant R scripts. We were successful in implementing the beta-binomial model but having some issues to make it work for our DS(G,m) prior model ( https://www.nature.com/articles/s41598-018-28130-5), which is an extension of the vanilla beta prior.

Thank you for your time and consideration. Please let us know if we can provide any additional information or insights into our request.

ModelDescription.pdf (112.7 KB)
LP.basis.R (696 Bytes)
RatTumorExample.r (1.3 KB)


#2

Just code it up manually! Check chapter 23, “Custom Probability Functions” of the users manual (edit: this is the 2.17.0 manual – things are changing soon).

You’ll probably have to just code in all m Legendre polynomials that you want and that’ll be very manual, but it souldn’t be too bad.


#3

Thanks Ben. Just to clarify, do you mean to code those Legendre polynomials in C++? We have very little expertise in that. It would be really helpful to have some function inside Stan that can compute Legendre polynomials.


#4

No need to code C++. See the chapter 23 and a case study http://mc-stan.org/users/documentation/case-studies/gpareto_functions.html


#5

The easiest way to get this working might just be to manually code up however many Legende polynomials you need. Just find the eqs. somewhere and copy them down. No need to make things too fancy. Like @avehtari said you should be able to code these as functions in your Stan model.


#6

Thank you @avehtari and @bbbales2 for your time and assitance! Very much appreciated! I will work on implementing your suggestions.


#7

Thank you @avehtari and @bbbales2 for helping to integrate DS(G,m) with STAN…your advice helped us implement the DS(G,m) model described in the “ModelDescription.pdf” attachment of the original post:

library(rstan)
### MC Options
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
########
# Rat Data from Gelman's BDA (Chptr 5)
J <- 71
y <-  as.integer(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 5, 2, 5, 2, 7, 7, 3, 3, 2, 9, 10, 4, 4, 
4, 4, 4, 4, 4, 10, 4, 4, 4, 5, 11, 12, 5, 5, 6, 
5, 6, 6, 6, 6, 16, 15, 15, 9,4))
n <- as.integer(c(20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 18, 
18, 17, 20, 20, 20, 20, 19, 19, 18, 18, 27, 25, 24, 
23, 20, 20, 20, 20, 20, 20, 10, 49, 19, 46, 17, 49, 
47, 20, 20, 13, 48, 50, 20, 20, 20, 20, 20, 20, 20, 
48, 19, 19, 19, 22, 46, 49, 20, 20, 23, 19, 22, 20, 
20, 20, 52, 46, 47, 24,14))
#####################
####### STAN
# The model specification
rat.model2.string <- "
functions{
	real DSbeta_lpdf(real theta , real alpha, real beta, real c1,
									real c2, real c3, real c4){
		
		real lprobBeta;
		real Goftheta;
		real lprobDS;
		Goftheta = exp(beta_lcdf(theta | alpha, beta));
		lprobBeta = beta_lpdf(theta|alpha, beta);
		lprobDS = log(1 + c1*sqrt(12)*(Goftheta-0.5) +
						  c2*sqrt(5)*(6*Goftheta^2 - 6*Goftheta +1) +
						  c3*sqrt(7)*(20*Goftheta^3 - 30*Goftheta^2 +12*Goftheta -1)+
						  c4*3*(70*Goftheta^4 -140*Goftheta^3+90*Goftheta^2-20*Goftheta+1));
		return(lprobBeta + lprobDS);
		}
	}	
data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    real<lower=0,upper=1> theta[J];
	real c1;
	real c2;
	real c3;
	real c4;
}
model {
    theta[J] ~ DSbeta(alpha, beta, c1, c2, c3, c4);
    y ~ binomial(n, theta);
	alpha ~ normal(2.4, 5);
	beta ~ normal(14,5);
	c1 ~ normal(0,5);
	c2 ~ normal(0,5);
	c3 ~ normal(0,5);
	c4 ~ normal(0,5);
}"
# Running the model
stan.mcmc.rat <- stan(model_code= rat.model2.string, 
					 data= list(J = J, y = y, n = n), 
					 pars=c("c1", "c2", "c3", "c4"), 
					 chains=3, iter=10000, warmup=2000,
					 control = list(adapt_delta = 0.99))

Although the code is running, it is not producing desirable output for the parameters…@deepstan and I suspect that there is something we have missed in the construction of the model. We respectfully request your assistance in verifying whether we have properly coded the model in STAN.


#8

Why are you only providing this prior for a single (the last) component of theta? Should this be in a loop or should DSbeta accept a vector variate?


#9

It’ll be more stable if you work with logGoftheta = beta_lcdf(theta | alpha, beta) and then use log-sum-exp to define lprobDS. It’ll be more efficient if you collect up all the terms of various orders. Those will be the arguments to log-sum-exp. Then you can do log1p_exp with the result or just throw in a 0 term for the 1.


#10

@Bob_Carpenter, thank you for your recommendations…they are greatly appreciated. To answer your question about the prior, it should be applied to all theta and I have made that correction. I will also work on your recommendations for stability and efficiency. Again, many thanks!