Sampling from a matrix in STAN

evihas · August 5, 2021, 12:04am

Hi, all stan users and experts,

I am dealing with some issues when coding a program using stan. In my stan code, I need to take random data samples (data raws) from the generated data matrix relevant to the likelihood function. But, I don’t know how to do that. Can anyone help me here?

Thank you

jsocolar · August 5, 2021, 12:36am

I think there’s some ambiguity in what you are trying to do. Are you able to share a bit more about the model? Note that the likelihood, as a function of the data and the parameters must be completely deterministic in a Stan program. Random-number generation is restricted to the transformed data and generated quantities blocks of a Stan program.

evihas · August 5, 2021, 12:58am

Hi @jsocolar,

Thanks for getting back to me. Okay, I will try to explain in this way.

I am trying to use stan in the Bayesian calibration of the microsimulation lung cancer model and here, the likelihood function follows poison distribution. At this point, I need to figure out how many individuals at each disease state with particular age groups. What I am trying to do is simulate lung cancer data and then do sampling to find summary distributions.

Does that make sense for you? Are there any different kinds of approaches for doing this?

Thank you

jsocolar · August 5, 2021, 1:28am

It sounds like maybe you want to repeatedly simulate datasets under known parameters and then fit a model to those datasets using Stan. Is that an accurate description? In general, the way to do this is to simulate the data externally to Stan and then fit the model using Stan.

evihas · August 5, 2021, 1:34am

That’s great. Ultimately, I need to end up with calibrated parameters.

But, I don’t know how to link data that is simulated externally with stan to calibrate parameters. If you have any idea please kindly share with me.

Thank you

jsocolar · August 5, 2021, 1:53am

So I should correct myself a little bit. The dataset simulation can be done inside a Stan program, but if you aren’t comfortable setting that up then the most straightforward and familiar way to achieve this will be to simulate the data externally.

For an overview of how to do it all inside the Stan program (maybe you’ve already seen this), consult the Stan User’s Guide here: 26.3 SBC in Stan | Stan User’s Guide

To do it outside of the Stan program, just simulate your data in R (or Python etc, but I see from your previous posts that you use R) and then fit the Stan model to those data using rstan or cmdstanr.

evihas · August 5, 2021, 2:15am

First, thank you @jsocolar ,

Yes, I am using both rstan and cmdstanr (but, I would prefer to work with cmdstanr). Here, I will follow SBC in stan.

One thing, let’s say there are 100000 individual simulations in my data table, and a random sample of 1000 individuals is needed to be sampled. What stan command can be used for that?

Thank you

jsocolar · August 5, 2021, 2:19am

I’m sorry; I don’t understand what you are trying to achieve. Can you provide some code to illustrate what you are trying to achieve and where you are getting hung up?

In what data table?

For what purpose?

evihas · August 5, 2021, 2:31am

Sorry for the inconveniences caused to you.

This is the paper I am following.

Okay, lets’ say like that, first, I need to simulate 1000 micro-simulations as individual-level data, then, I need to make samples of size 100 from these micro-simulations in order to count how many individuals are in different disease states with time (different independent samples for each disease states)
Also I need to repeat these steps 10 times(say) using a for loop.

Does that make sense for you? :(

Thank you.

jsocolar · August 5, 2021, 3:05am

I think I still don’t understand. What you describe above can be achieved by running 1000 micro-simulations using the MILC package mentioned in the abstract of the paper you linked, then randomly selecting 100 of them (but why then did you bother to simulate the other 900?) with something like base::sample(1:1000, 100), and wrapping this whole process in a for (i in 1:10) {...} loop. The end result of this process would be that you would have 1000 microsimulations sitting saved in your R environment.

Clearly this isn’t what you’re aiming for. If it were, you would just run the microsimulation 1000 times and be done with it. Since you’re talking about doing this with Stan, I imagine that at some point you want to fit a statistical model to the output of the simulations. And this is still the missing piece in your description of the problem. What is that Stan model? To answer this question, don’t worry about the part of the Stan program that simulates the data; pretend like you have a single fixed dataset. What does that dataset look like? What model are you trying to fit to those data? How do you write the model in Stan (provide code)? I think we’ll be able to move forward if we can get ourselves on the same page about what the model you’re working with looks like.

And don’t worry about any inconvenience to me; I answer questions here because I enjoy it.

evihas · August 5, 2021, 4:48am

Hi @jsocolar ,

I much appreciated your ideas.
I hope I can solve the issue in my code along with what you have suggested to me. I am doing manual coding rather than using MILC package and also this a small part of my work. So, it is little bit hard to explain all here.

Will back here if i have further coding errors.

Thank you

evihas · August 10, 2021, 1:06pm

Hi @jsocolar,

I am here again.
I need to find the inverse of CDF of equation (1) described in the attached paper above.
Can I use inverse() for that?

Any idea is highly appreciated.

Thank you

jsocolar · August 10, 2021, 3:04pm

No, inverse in Stan is for inverting a matrix, not for finding the inverse of a function. Your best bet, I’d guess, is to plug the function into Wolfram and ask it for the inverse. Note that I don’t have access to the paper, and can’t see equation 1, so it’s possible that there are better options depending on the function itself.

evihas · August 11, 2021, 1:01am

See the equation (1) here

ijm-00164.pdf (1.4 MB)

evihas · August 11, 2021, 1:31am

Hi @jsocolar ,

I am weird with some situations when going to define functions for random number generation. See this function below, and I need to use this y in user defined function block (i.e. in my likelihood). As it is not possible to use _rng in functions block, where should I accurately define this?

real fun1(real x1; real x2){ 
    real y = exponential_rng(1/83.5);
    if(x1 < y && y < x2){
      return(y);
    } else {
      fun1(x1,x2);
    }
  }

Thank you.

jsocolar · August 11, 2021, 1:33am

You cannot use random number generation anywhere in the likelihood. The likelihood must be a deterministic function of the parameters and data.

jsocolar · August 11, 2021, 1:39am

What distribution do you need the inverse cdf for? The distribution of failure times implied by that function? In any case, either Wolfram can give you a usable inverse cdf or it can’t. If it can’t, then I definitely won’t have anything to add :/

evihas · August 11, 2021, 2:54am

Thank you.
Okay, even though random numbers can be generated in transformed parameter block or generated quantity block, there are no way to generate it using a function as I mentioned in above R code?

jsocolar · August 11, 2021, 3:02am

Random numbers cannot be generated in the transformed parameter block, just in transformed data and generated quantities. The key principle here is that the likelihood must be a deterministic function of the parameters and the data/transformed data. For more see the Stan Users Guide section here: 18.4 Functions acting as random number generators | Stan User’s Guide

evihas · August 11, 2021, 3:07am

ohh… yes not transformed parameter block, I meant transformed data block.

Thank you

Topic		Replies	Views
Using Stan for MLE-like estimation? Modeling	5	1382	September 4, 2019
Sample from a custom distribution in the generated quantiles block Modeling posterior-predictive , rng	3	465	July 1, 2022
Simulate design matrix using Stan General	1	991	May 9, 2017
Fitting Stan our using Stan_model and then sampling Modeling	1	459	July 14, 2021
Using Stan as part of a Gibbs sampling algorithm General	1	484	January 13, 2024

Sampling from a matrix in STAN

Related topics