Bayesian inference in Stan

#1

I have been learning modeling in Stan using MCMC sampling technique for Bayesian estimation. But I have not quite gotten some of the details in the modeling part using Stan, so hopefully someone could help me out here. My problem is the following:

Assume I need to estimate some posterior distribution of p(x|y_h), where I am given the prior distribution p(x|y) ~ N(Ay, sigma^2*I) (I= identity matrix, sigma^2 = some constant and x=Ay), and p(y|y_h) ~ N(y_h, \sigma_{y}^2). I want to model the Bayesian inference of this whole thing in Stan, and use R to run the modeling. I would like to do this by modifying the code below, but I am confused which variable among y_h and Ay should I enter as data and which one should I enter as parameters;p

functions {
// Define log probability density function
real myNormal_lpdf(real y, real mu, real sigma) {
return -log(2 * pi()) / 2 - log(sigma)
- square(mu - y) / (2 * sigma^2);
}
}
data {
int N;
real y[N];
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
// Priors
mu ~ normal(0, 10);
sigma ~ cauchy(0,10);
// Likelihood
for(n in 1:N){
target += myNormal_lpdf(y[n] | mu, sigma);
}
}
generated quantities{
real y_ppc;
{
real x;
x = uniform_rng(0, 1);
// Phi is the probit function in Stan, the CDF of the standardised Normal N(0,1)
// inv_Phi is the quantile function of the standardised Normal N(0,1)
y_ppc = mu + sigma * inv_Phi(x);
}
}

#2

The data block is for known valuesâ€”you need to supply those to run. The parameters block is for the unknown values you want to estimate. You should take a look at the Stan manualâ€”it goes over the different kinds of deterministic and random variables used in stat modeling and how to code them in Stan.

#3

@Bob_Carpenter thank you so much for your comment. I read the Stan manual, but I could not find it helpful to the problem above, because I want to see a sample code that demonstrates the use of Bayesian inference (your explanation is much better). I tried modifying the original code above, and adding the main R files that executes the â€śmodelStringâ€ť below, but the code failed to run (it keeps giving me the error message: PARSER EXPECTED: whitespace to end of file. FOUND AT line 35: }, but I checked my code and saw no whitespace at the end of line 35). Also, I am not sure if the code does what i want it do though, so could you please help review the code below to see if my coding setup is correct? I really appreciate your help!

library("truncnorm")
modelString= "data {

#   // Define log probability density function
#   real myNormal_lpdf(real x, real A%*%yT, real sigma2*rep(1, rtruncnorm(ncol, 0, 1))) {
#     return -log(2 * pi()) / 2 - log(sigma2*rep(1, rtruncnorm(ncol, 0, 1)))
#     - square(A%*%yT - x) / (2 * (sigma2*rep(1, rtruncnorm(ncol, 0, 1)))^2);
#   }
# }

# data {
#   int N;
#   real yh[N];
}
# parameters {
#   real<lower = 0> sigma2;
#   real<lower=0> sigma;
real<lower = 0> x;
# }

# model {
#   // Priors
#   yT ~ normal(yh, sigma*rep(1, rtruncnorm(ncol, 0, 1)));
#   // Likelihood
#   for(n in 1:N) {
#     target +=  myNormal_lpdf(x| A%*%yT, sigma2*rep(1, rtruncnorm(ncol, 0, 1)));
#   }
# }

# generated quantities{
#   real yh_ppc;
#   {
#     real x;
#     x = uniform_rng(0, 1);
#     // Phi is the probit function in Stan, the CDF of the standardised Normal N(0,1)
#     // inv_Phi is the quantile function of the standardised Normal N(0,1)
#     yh_ppc =  mu + sigma * inv_Phi(x);}
}"

library(rstan)
library(MASS)
library(truncnorm)
set.seed(1)
nrow=16;
ncol = 24;
sigma=0.2;
sigma2=0.3;
lambda = matrix(sample.int(100, size = ncol*nrow, replace = T),nrow,ncol);
lambda = lambda - diag(lambda)*diag(x=1, nrow, ncol);
yh = rpois(ncol,lambda) + rtruncnorm(ncol,0,1,mean = 0, sd = 1);
A = matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_A^1 - Done
0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_A^2 - Done
0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_B^1 - Done
1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_B^2 - Done
0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_C^1 - Done
0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0, #x_C^2 - Done
0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0, #x_D^1 - Done
0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, #x_D^2 - Done
0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0, #x_E^1 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0, #x_E^2 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0, #x_F^1 - Done
0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0, #x_F^2 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0, #x_G^2 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1, #x_G^1 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0, #x_H^1 - Done
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1), #x_H^2 - Done
nrow, ncol, byrow= TRUE);
truehist(x, col="#B2001D")
lines(density(x), col="skyblue", lwd=2)
summary(yT)
ret_sm <- stan_model(model_code =  modelString) # Compile Stan code
fit <- sampling(ret_sm, warmup=100, iter=600, seed=1,
data=list(yh, N=length(yh)))
stan_trace(fit, inc_warmup = TRUE)
stan_hist(fit)
print(fit, probs=c(0.025, 0.5, 0.975))
summary(extract(fit, "yh_ppc")[["yh_ppc"]])
plot(ecdf(x), main="Posterior predictive check")
lines(ecdf(extract(fit, "yh_ppc")[["yh_ppc"]]), col="#B2001D")

#4

Iâ€™m pretty sure copy-paste failed somewhere here:

myNormal_lpdf(real x, real A%%yT, real sigma2rep(1, rtruncnorm(ncol, 0, 1)))

Thereâ€™s a little button (6th from the left) at the top of the text box to do preformatted code. Could you put your code in one of those boxes so itâ€™s easier to see? It looks like a little html delimiter thing: </>.

#5

@bbbales2 Thank you for your help. I put my code into the format you like. But the strange thing is, the mu and sigma for my log-normal pdf is equal to A%%yT, and sigma2rep(1, rtruncnorm(ncol,0,1)), respectively. The code could execute this, but it keeps saying this message:
Error in if (h > 0) ceiling(diff(range(x))/h) else 1L : missing value where TRUE/FALSE needed

when it runs this line: truehist(x, col="#B2001D"). I donâ€™t see whyâ€¦Also, I am interested in finding the posterior distribution of p(yT|x), which is proportional to p(x|yT)p(yT) (likelihood functionprior), but I am not sure if the variable for my log-normal pdf function, defined from 5th-7th line above, is actually x? Should it be yT?

Sorry but I am very confused when coding Bayesianâ€¦;p

#6

I couldnâ€™t understand the formulasâ€”theyâ€™re garbled as to types of things and order of whatâ€™s prior and whatâ€™s a parameter.

I donâ€™t think you want to write your own normal distribution.

You enter everything in the data block as data.

Iâ€™d strongly suggest following along a course or textbook presentation of Bayesian stats as there are a lot of subtleties that will trip up the unaware and it will be hard to ask questions on a forum like this that we can understand.

#7

@Bob_Carpenter The prior is p(yT), which is this line: yT ~ normal(yh, sigma*rep(1, rtruncnorm(ncol, 0, 1))). And I did not write my own normal distribution though. Itâ€™s just that itâ€™s technically easier to deal with log-normal distribution, and Stan does not give us the built-in function for log-normal distribution. I am not sure what confused you? Could you clarify?? Parameter is just sigma, sigma2 and x, I think? Because my Bayesian formulation is: p(yT|x) is proportional to p(yT)*p(x|yT), so maybe I am wrong on the parameter??

The data is yh, which is generated as a vector whose entries follows Poisson distribution + some noises (epsilon) that follow normal distribution with mean 0 and S.D = 1. By the way, this forum does not have Latex code for typing Math formula?? Itâ€™s so hard to type these distributions without using LaTex.

#8

@bbbales2 @Bob_Carpenter could either of you please help point out the mistake in my code above? I still could not see why it gives this error message:
Error in if (h > 0) ceiling(diff(range(x))/h) else 1L : missing value where TRUE/FALSE needed.

Is this a mismatch of dimension due to the label col="#B2001D" in truehist(x, col="#B2001D")?

#9

The error I get when I run your code is:

> truehist(x, col="#B2001D")


I think you have the syntax of all this a little mixed up. Variables are not shared between R and Stan, and not all R code works in Stan (and not all Stan code works in R). Iâ€™d recommend looking around online for a simpler Stan tutorial to start with (to get used to how things are coded). And like Bob said, before you spend too much time doing that youâ€™ll want to make sure you have the stats stuff under control (or the code will seem unnatural anyway).

There is a lognormal function in Stan, as well, so you shouldnâ€™t have to code one yourself.

Good luck!

#10

@bbbales2 thank you so much for your patience. But what did you mean by " Variables are not shared between R and Stan, and not all R code works in Stan?" I meant, why the variables in Stan (sigma, sigma2 and x) are not shared by R (I specified sigma and sigma2 in my execution code in R). I think the stats stuff behind Bayesian is very simple, but I am struggled to implement it in R using Stan (probably since I am a very bad programmer)

Do you know ifI stated the priors and the likelihood function as above and ran the execution code, the prior and likelihood functions are automatically multiplied with each other? Is the syntax in Stan to call out the built-in function for lognormal pdf = normal_log? Honestly, I have not been able to find a useful reference sources to learn about modeling in Stan with Bayesian inference;p Could you suggest some?

#11

There are examples of using Stan with R here.

Even more examples of Stan programs here. There is also a comprehensive user manual. A good place to start might be to replicate a worked example.

#12

Thank you for your link. Could you find some examples of using Stan to model Bayesian? I did read through the user manual, but it shows absolutely nothing on coding Bayesian modeling (it shows the theory, which I could understand easily). But did you have a chance to help look through my code above? If you saw what went wrong, please help let me know.

#13

I couldnâ€™t undersetand what you wrote in math, but I can read your code. The main problem with the code is the declaration for sigma, which as it states in the manual in many places, should be:

real<lower=0> sigma;


because sigma is constrained to be positive. Every possible value of the parameters that meet the declared constraint should have support in the model.

Otherwise, assuming you push in reasonable data, it should work.

Itâ€™s much more efficient to write

y ~ normal(mu, sigma);


than your direct increment. And itâ€™s better in generated quantities to just write

real x = normal_rng(mu, sigma);


If you want case studies, check out the docs tab of http://mc-stan.org/ There are literally hundreds of examples of using Stan to do Bayesian inference.

#14

@Bob_Carpenter thank you so much for your great help. I think I did write the line â€śreal<lower=0> sigmaâ€ť but I accidentally left some spaces between â€śrealâ€ť and â€ś<lower=0>â€ť Did that cause the problem? So you meant I should declare, within the parameters section, real x = normal_rng(mu, sigma)?? Also, I think I should delete the entire log pdf function above, but does this cause any problems to prevent the code from running?

Also, in the â€śparametersâ€ť section, would it be OK to declare a statement like real<lower=0> sigma = sigma*rep(1, rtruncnorm(ncol, 0, 1)?? I meant, I want to assign a formula for our parameter this way, would that be OK?

Finally, if I indicated the prior and likelihood functions as in the code, do I need any code line to indicate the posterior distribution? I meant, do I need to have a code line saying target*yT? One more thing, I am not sure why the author of the original code uses for loop for the likelihood function. Do you know if this loop is essential for doing Bayesian?

#15

No idea what you mean by period.

You should declare and define x within the generated quantities block; _rng functions arenâ€™t allowed elsewhere and you canâ€™t assign to parameters.

You need the lower bound on sigma because it must be positive.

#16

@Bob_Carpenter Sorry I meant <lower=0>. But I think I messed up on the coding really badly, which is the result of trying to modify some codes without understanding clearly what itâ€™s doingâ€¦my intention was only to generate a posterior distribution of yh given the observed data x (vector form). I did not mean to do the math like the one indicated in the generated quantities. We donâ€™t need this section for the code to give us meaningful output in terms of the posterior distribution if we use Bayesian and MCMC, do we?

Also, I did indicate sigma >=0 in my original code (itâ€™s within the parameter section).

#17

The generated quantities block is just for computing (perhaps random) functions of other random variables efficiently. You donâ€™t need to have anything there. You can just fit a simple model with parameters (and perhaps transformed parameters).

Discourse may have eaten your constraints on sigma because theyâ€™re in plain text. Letâ€™s see:

real<lower=0> sigma; // did they show up?

Nope â€” they get eaten if you donâ€™t escape. With escapes they show up:

real<lower=0> sigma;


You to use three back ticks () on their own line before and after code.

As usual, it's <a href="https://meta.discourse.org/t/post-format-reference-documentation/19197">a complete mess using markdown</a>.

#18

After studying for the examples, I still cannot figure out the correctly modified code that demonstrates the math that I want. Could you please help me point out where i should fix in the code above to make it work exactly like my math below, Mr. Carpenter?

Basically, I want to use Bayesian inference to find the posterior distribution of a true data yT, given the data x (find p(yT|x)). From the theory of Bayesian inference, we know p(yT|x) is proportional to p(x|yH)p(yH), where yH = historical data obtained in the past, x = observed data currently. Now, we make a key assumption: yT ~ N(y_h, sigma_1) (y_h = vector, sigma_1 = matrix), and x ~ N(Ay_T, sigma_2) where A = a pre-defined 16*24 matrix shown in the original code above.

My big question: How could I modify the code above so that I could use MCMC to draw a sampling of distribution of p(yT|x), and then graph it to see if it follows normal distribution?

#19

Hi,

I donâ€™t understand at all what you want to estimate. Could you please explain the following:
1: What is your data, and what are the dimensions of it?
2: Which parameters do you want to estimate?
3: How are the parameters related to the data?
4: Which priors do you have for your parameters.

best,
Ole-Petter

#20

1. The given historical data (yh) are generated as in the following code lines:
> lambda = matrix(sample.int(100, size = ncol*nrow, replace = T),nrow,ncol);
        lambda = lambda - diag(lambda)*diag(x=1, nrow, ncol);
yh = rpois(ncol,lambda) + rtruncnorm(ncol,0,1,mean = 0, sd = 1);


The dimension is a ncol*1 vector, where ncol=24 in this case.

1. The parameters I tried to estimate is (yT given parameter x) where x = observed data. Note that yT #x because yT is real-life data, while x might have measurement error
2. yT is assumed to follow normal distribution with mean yh and variance = sigma_h (make up this variance!). Similarly, x is assumed to follow normal distribution with mean AyT, and variance = sigma_T (make up this variance!). A is a matrix with size nrowncol = 16*24, defined as follows:

A = matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1),
nrow, ncol, byrow= TRUE);

1. The prior that I have is: p(yT). the likelihood function is p(x|yT).

I hope this is clear and makes sense now. Please let me know if it still does not make sense. I am still struggled to make the code run based on all the info above.