MCMC sampling does not work when execute

fitting-issues

#41

Could you please give me an example of marginalize discrete unknowns out of the posterior? I think truncating in this case works, because the unknown should be finite number (less than 1000, probably)


#42

There is an entire chapter (currently the 15th) in the Stan User Manual about it.


#43

Do you happen to know if PyMC3 could work much easier for MCMC when we need to model prior distributions to be Poisson?


#44

The best thing to do even if you were using PyMC3, BUGS, etc. is to marginalize out the discrete unknowns. Otherwise, it is very difficult to sample from the posterior distribution efficiently.


#45

Thank you so much for your insight. But for PyMC3 I plan to mimic this link using Metrpolis Hastings rather than NUTS: https://am207.github.io/2017/wiki/Lab7_bioassay.html.

Regards to Stan: Would there be a difference if I tried changing the default simulation model from NUTS (a HMC) to M-H (what is the default for calling out this simulation model in stanfit(), do you know?) HMC primarily works with continuous variables, while M-H works very well with discrete varaible.


#46

No it doesn’t, which is why we never exposed the M-H algorithm in Stan.


#47

Metropolis-Hastings certainly doesn’t work well in general for combinatorially hard problems like variable selection where there are exponentially many variable configurations and a very high degree of dependency.

In many models that can be expressed using integer parameters, it’s almost always more efficient and robust in Stan or in other systems, to marginalize them out if it’s tractable.

For something like missing Poisson data, Metropolis might work. The problem you’ll have is that HMC is very sensitive to its tuning parameters and changing the discrete parameters changes the continuous part of the posterior conditionally. You can code this in PyMC3, but you should be careful in checking conergence. As far as I know, they haven’t extensively evaluated whether these models work. For instance, you’ll want to be careful about divergences which will indicate potential bias in estimates (don’t know if PyMC3 reports these, but they probably do). The way to test more generally is to generate fake data from the model then make sure you can recover the model parameters to within posterior intervals. And please report back if you’re successful—I’m curious as to how something like this will go in PyMC3.


#48

So I read Chapter 15 in the Stan manual, but did not quite see how to apply it to my problem at hand now. Basically, I need to marginalize the prior p(yT|y) where yT~ Poisson(lambda), while my likelihood p(x|yT) ~ Normal(A*yT, sigma) is very easy to manage, even when its mean contain the parameter we want to estimate. So if I take the log of p(yT|y).p(x|yT), I would get log(Poisson(lambda)) + log(Normal(A*yT, sigma)). Could you please help me on modeling this in Stan so that I could run the built-in Metropolis-Hastings algorithm in R? Also, even after I finish the sampling part, would I be able to get the posterior distribution of x|y using mcmc_trace() or plot(stanfit2, pars = c("yT")) command?


#49

Yes, I would let you know right away if PyMC3 works out for me. But I still want to get it with R;) Could you please show me how to marginalize the product of p(yT|y)p(x|yT) where yT|y ~ Poisson(lambda) and x|yT ~ Normal(A*yT, sigma) (A = a matrix, you could either make it up or use the matrix in the OP).


#50
for (i in 0:2147483647) // or other large integer
  target += poisson_log_lpmf(i | log_lambda) + normal_lpdf(x | A * i, sigma);

To get the posterior predictive distribution of x just draw from a Poisson and use that realization multiplied by A to draw from a normal distribution with standard deviation sigma. You can do that in generated quantities or on the outside in R.


#51

Thank you so much for your prompt help!! I wonder what does poisson_log_lpmf(i | log_lambda) stand for (I did not see this function in the manual, only saw poisson_lpdf())?

Also, I think I did not clear it somehow, but what I am trying to find is the posterior distribution of yT based on the product p(x|y_T)p(y_T|y) (this final form is attained after some algebraic manipulations already). So shouldn’t I replace i by yT[i]? Furthermore, i should be a column vector in order to make A*i makes sense. But i here is just a number, so I am not sure if you are discretizing everything basically??

Finally, could you elaborate on To get the posterior predictive distribution of x just draw from a Poisson and use that realization multiplied by A to draw from a normal distribution with standard deviation sigma? I meant, didn’t you already do that with the target variable? Also, in this case, what does my model section in Stan become?


#52

I tried your model with the following code in Stan, and it tells me the mismatch in the variable type (once again, same error as before;p) for poisson_log_lpmf. The mistake originated from the fact that parameters cannot be determined as int-type variable, but poisson_log_lpmf requires one input as int variable or int array. How could we get around this:(

 data {
  int<lower=1> ncol;
  int<lower=1> nrow;
  vector[ncol] yH;
  int x[nrow];
  matrix[nrow, ncol] A;
  vector[nrow] sigma_x;
  vector[ncol] sigma_y;
  vector[nrow] epsilon;
  int log_lambda[ncol];
}

parameters {
   vector[ncol] yT;
}

model {
  for (i in 0:ncol){
       target += poisson_log_lpmf(yT[i]| log_lambda[i]) + normal_lpdf(x[i]| A[i,]*yT, sigma_x[i]);
  }  
}

#53

@bgoodri: could you please be patient and help me overcome this difficult problem? I am stumbled…;p


#54

poisson_log_lpmf will not accept a vector, only an int or an int array. So, you have to loop over observations and apply poisson_log_lpmf to the loop index.


#55

Yes, you are correct. But then when I tried that, it still gives me error that yT[i] cannot be a vector. Could you please help try my above code yourself and let me know why it still cannot accept yT[i] as an int?

Also, could you elaborate on what exactly does the target variable represent? I think target equals to the log of p(yT|y) x p(x|yT). How do we go back from the log to the pdf of the posterior, (aka, p(y|yT))? Is it as simple as applying exp() to target?

Now, I am thinking of re-writing the first term on the RHS of target +=... as log(Poisson) simply as log(exp(-lambda[i]\)*(lambda[i])^(yT[i]))/Gamma(yT[i]+1). But I am not sure if Gamma accepts 'yT` declared as a vector rather than an int-type. How do you think?


#56

The manual lists what signatures are available for which functions in the index.


#57

By saying “loop over observations”, you mean I have to loop over yH rather than yT? Could Stan still work in a model without parameters being specified??


#58

Well, you have to specify the continuous parameters, but then Stan can work very well by marginalizing over the discrete unknowns that cannot be declared in the parameters block. However, that means for each observation, you have to evaluate the log-probability that the discrete count takes on each integer value between 0 and some big number and then do a log_sum_exp with the other distribution. All of this is discussed in the Stan User Manual.


#59

I read the Stan User Manual - Chapter 15, but when I tried to apply into my model above, it does not simply work. The reason is because of the way the example does marginalization (they have more parameters than I do, and the terms that I have Poisson on, is the Prior, not the likelihood. My prior (yT|yH, A) is conditional over 2 (out of 3) parameters already; my likelihood is (x[i] | d). That’s why when you say “specifying the continuous parameter”, I switched to vector[ncol] yT, but then cannot fit this into poisson_log_lpmf() because of the constraint on input-type for this particular function. Not sure how could I overcome this issue in R:p


#60

It doesn’t matter whether the distribution is for the prior or the likelihood because the process of marginalization is the same. And it doesn’t matter if you try to declare a discrete unknown as an integer array or a vector because Stan is going to refuse to deal with discrete unknowns.

If you just simulate some data in R according to your generative model, such as

x <- rpois(1)
y <- rnorm(1, mean = x, sd = 1)

and do that many times, you get a marginal distribution of y. And if you have a bunch of observed y values that come from that data-generating process, you can use that marginal distribution as a likelihood with something like

vector[big_integer] log_weights;
vector[big_integer] log_dens;
for (i in 0:big_integer) {
  log_weights[i] = poisson_log_lpmf(i | log_lambda);
  log_dens[i] = normal_log_lpdf(y | i, 1);
}
target += log_sum_exp(log_weights + log_dens);

but you have to do that for each observation.