# Bayesian inference in Stan

Did you really mean to use lambda three times? I don’t see how that connects to yT or x.

Can you just write out the density you’re trying to code? That’s all Stan’s doing under the hood. We need all the dimensions and to know whether a variable is observed (data) or not (parameters).

Part of our confusion may be that Stan’s doc is written following Gelman et al.'s Bayesian Data Analysis, where we use Greek letters for unknown parameters, y for modeled data (usually “observations” of some kind) and x for unmodeled data like predictors in a regression.

There are simple examples like this in the manual chapter on measurement error models.

yT connects with lambda through its normal distribution with mean yh and sigma_h (keep in mind yh depends on lambda). Similar reasoning applied to x to see why x depends on lambda

Density function of yh(y) = e^(-lambda)*lambda^t/t! + [1/(2pi)^0.5]*e^(y-1)^2/2. Density function of yT(y) = [1/(2pisigma_h)^0.5]e^(y-yh)^2/(2sigma_h). Density function of x(y) = 1/(2pisigma_T)^0.5]e^(y-AyT)^2/(2*sigma_T).

I hope the above is helpful, although it’s very annoying not to have LaTex embedded to type math formula. I did not expect this problem to confuse everyone here in terms of the mathematics;p

What you list as the density function of yh is not a valid density function. If you want to mix multiple densities then you need to weight them appropriately, and even then you’re trying to mix a discrete and continuous density function in what seems like a nonsensical manner.

We keep going back to the math because the confusion in your code is indicating that something deeper is going on. And clearly something deeper is going on!

Perhaps it would be easier if you tried to give the code that you would use to generate a sample from this data generating process? Or provide some context as to the application?

If the math formatting is bothering you then type in up in TeX and attach a PDF.

@betanalpha and @Bob_Carpenter: Thank you so much for your clear comment. Now, please let me explain further about the actual motivation of generating yh in a very specific way.

Purpose: I need to generate a matrix yh representing a set of historical data which follow Poisson process. Now, because it’s historical, there might be some errors into it, so we add to each entry a positive epsilon to account for such error. Now, for the sake of modeling and the math, we assume that this epsilon follows a truncated normal distribution with mean 0 and variance 1.

Context: yh stands for number of people getting to/from various bus stations at a particular point in time (for example, yh_12 = number of people getting on at station 1 and getting off at station 2). This is why all the entries on the diagonal of matrix yh have to be 0.

My code for modeling the above

lambda = matrix(sample.int(100, size = ncol*nrow, replace = T),nrow,ncol);
lambda = lambda - diag(lambda)*diag(x=1, nrow, ncol);
yh = rpois(ncol,lambda) + rtruncnorm(ncol,0,1,mean = 0, sd = 1);

I’m still confused here because yh is discrete but you’re adding continuous noise to it.

What does sample.int do? That just generated a matrix with uniformly generated entries between 0 and 100? Is that really the generating process for the model, uniform(0, 100)?

R is so confusing. That * is elementwise multiplication and diag(lambda) pulls the diagonal out of lambda whereas the second use of diag(x=1, nrow, ncol) just creates a diagonal matrix of size nrow x ncol with entries 1. I think that second diag() is redundant as it just scales the diagonoals by 1, and the net effect is to set the diagonals of lambda to zero.

Then I’m lost on the rpois? lambda is a matrix and ncol is a number, so what comes out? And then you add it to an ncol-vector of continuous parameters.

Still no idea what’s going on here or what the resulting shape of yh is going to be.

No, you are on the right track. rpois would come out to be a vector of size ncol x 1. I am not an expert in R, so it confuses me too, but I don’t see any problems coming from generating yh this way (you are correct that we are adding a continuous noise to each discrete components of vector yh, but why is this weird?). As I explained, yh stands for measurement of the number of people going from one bus station to another bus station. It should have error in such measurement, and it’s unreasonable to say the error needs to be discrete.

Again, I am sure there is nothing wrong in generating yh this way. The hard thing now is, how to cook up the Bayesian estimation using all the information about x and yT I posted above? I need your help in coding this Bayesian in Stan…

What are you actually measuring that comes up with fractional people? I don’t see how you could measure a fractional yh[i,j].

If you have a continuous measurement and a latent Poisson variable underlying it, you’re not going to be able to fit this in Stan anyway—it can’t handle discrete count data, because Stan requires discrete parameters to be marginalized out. You’d need to use BUGS or JAGS for that.

You made a good point, but because this is an estimation, I think it does not make sense to have an estimation that purely produces discrete data, does it? Could you please elaborate on how I could implement this Bayesian modeling in JAGS or BUGS?? I have never used/learned about them before…;p

You’ll have to take that up in their forums.

I don’t understand what you’re trying to do. Let me reiterate that for Bayesian inference you need to define a joint density p(y, theta) where y is observed data and theta are unknown parameters. Then you can estimate the posterior p(theta | y). That’s it. I can’t figure out what your joint density is in terms of what is the data shape and what are the parameter shapes and what is the density function.

That’s correct. The joint density p(y, theta) = p(y|theta)p(theta) (this is a conditional probability rule). Now, we have y|theta ~ N(yh, sigma_h) and theta ~ N(Ay, sigma_T). I thought Stan could automatically approximate the joint density of p(y|theta)p(theta) (at least, that’s what my advisor told me) when specifying the distributions of (y|theta) and (theta)?

What happend to theta? I don’t see it in N(yh, sigma_h).

And to fully define the model, we need to know what is data and what its shape is (e.g., is y a vector or what?) and what is being estimated as a parameter and what its shape is (I can’t tell here as theta doesn’t actually show up).

You are right. My bad. y|theta ~ N(A*theta, \sigma_theta). y is a VECTOR (I mentioned it several times in my previous posts) . It should make sense now?

You’ll need to help me out by putting this all together. We get a lot of mail and this thread is already ridiculously long.

So one component of p(y, theta) is

p(y, theta) = PRODUCT_n  Normal(y[n] | A * theta, sigma_theta) ...


Now what is A and theta and sigma_theta. Matrix, vector, and scalar perhaps? What are the dimensions?

It’s really difficult for us to put user questions together from guesswork. What looks obvious to someone knee deep in a model isn’t obvious to someone from the outside unless it’s a standard model.

1 Like

Thank you so much for your patience, Dr. Carpenter. I wonder if we could move this on-going discussion into a private conversation?
Regards to your question: A is a matrix defined as follows (nrow = 16, ncol=24):

A = matrix(c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1),
nrow, ncol, byrow= TRUE);


sigma_theta = sigma*I where I = identity column vector with size nrow x 1 = 16x1,` and sigma is some positive constant (choose anything, really).

I am really sorry for your poor experience with my question. But I indeed addressed all the components that you asked in multiple times above, when other users also asked me about them. I thought you would read them, but my apology for making that assumption. Please let me know if this is good enough for you to lend a hand on helping me implement the code?

No, I don’t want to move this private.

What I’m asking for is a single post that defines the density with all of the components so I don’t have to work back through the 35 posts in this thread piecing it together like a puzzle.

What I need is

• density function defined as a function of the parameters and data
• sizes and shapes for all variables
• distinction between which variables are data (known) and which are parameters (unknown)

From that, I can write a Stan program in no time at all. Without that, it’s guesswork.

Sure. I would create another thread, and tag you now. What do you meant with “shape of all variables?” I know they are column vectors, but that’s about all I could say regards to this question.

Yes, that’s what I mean by shape — scalar, vector, matrix, etc.

I am waiting for your help here Sir: A hard modeling problem using Bayesian inference.