Hi,

Sorry for the non-informative title. I just couldn’t come up with a better one.

I have a dataset with a little over 1,000 observations, each with a vector of 0s and 1s associated with it. Each vector has 10 elements. Each observation's log-likelihood is the sum of two different summations: one over the 0 elements of the vector, and another over the 1 elements. When computing the maximum likelihood estimates in R, my solution was to create two subsets from the original data, one for each type of outcome (0 or 1). Then I computed the two parts of the log-likelihood separately and added them at the end to get the total log-likelihood. I’m having trouble doing similar operations in Stan. I think the main issue is that when I subset my data, I end up with vectors of different sizes for each observation, and I’m unsure how to deal with this kind of data in Stan.

Do you have any suggestions?

Thanks a lot for your time.

Sorry, your question fell through the cracks a bit.

I’ll admit I don’t really understand the concern. A Stan program is, in the end, just a way to compute the log-likelihood (plus the log prior). If you can write it in R, it usually (but not always) has a straightforward counterpart in Stan. Could you share the R code and your attempts at converting it?

Also, if the data require some preprocessing, it is often useful to do the preprocessing in R and pass processed data to Stan (simply because it is usually faster to write and debug R than Stan code).
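As a rough illustration of why the subsetting may not be needed at all: if every observation keeps its full 0/1 vector, the 0/1 value itself can select which of the two terms applies, so all vectors stay the same length. The sketch below is in Python rather than R or Stan only to keep it self-contained. The functions `log_term_one` and `log_term_zero` and the parameter list `theta` are hypothetical placeholders, since the actual model was not posted.

```python
import math

# Hypothetical data: 3 observations, each a 0/1 vector
# (the original post has about 1,000 observations of length 10).
y = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

# Hypothetical per-position parameters, one per vector element.
theta = [0.2, 0.5, 0.7, 0.4]


def log_term_one(j, theta):
    # Placeholder for the summand that applies to elements equal to 1.
    return math.log(theta[j])


def log_term_zero(j, theta):
    # Placeholder for the summand that applies to elements equal to 0.
    return math.log(1.0 - theta[j])


def loglik_subsets(y, theta):
    # Approach from the question: split each row into the positions of
    # its 1s and 0s (subsets of varying size), then add the two sums.
    total = 0.0
    for row in y:
        ones = [j for j, v in enumerate(row) if v == 1]
        zeros = [j for j, v in enumerate(row) if v == 0]
        total += sum(log_term_one(j, theta) for j in ones)
        total += sum(log_term_zero(j, theta) for j in zeros)
    return total


def loglik_masked(y, theta):
    # Alternative: keep fixed-size rows and let the 0/1 value pick the term.
    total = 0.0
    for row in y:
        for j, v in enumerate(row):
            total += v * log_term_one(j, theta) + (1 - v) * log_term_zero(j, theta)
    return total
```

Both functions should return the same total up to floating-point rounding. The masked form translates to a plain double loop over fixed-size arrays, which tends to be easier to write in Stan than variable-length subsets.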

Best of luck with your model!

Thanks for the reply, Martin. I think I have figured out a solution. I’m new to the language and not an experienced programmer. I think I was just intimidated by it. It’s getting better, though. Thanks for your response anyway.

Hi, would this help? https://mc-stan.org/docs/2_23/stan-users-guide/ragged-data-structs-section.html

You would just need to reorder your data in R.
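The reordering that chapter describes amounts to storing all the variable-length pieces in one long vector plus a vector of group sizes, and then walking through it with a running position. Here is a minimal sketch of that layout, written in Python for brevity; the same preprocessing could be done in R before passing the data to Stan. The helper names `flatten_ragged` and `iter_segments` and the example data are made up for illustration.

```python
def flatten_ragged(groups):
    # Concatenate variable-length groups into one flat list and
    # record how many elements each group contributed.
    flat = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    return flat, sizes


def iter_segments(flat, sizes):
    # Recover each group from the flat list using a running position,
    # analogous to stepping through a long vector one segment at a time.
    pos = 0
    for s in sizes:
        yield flat[pos:pos + s]
        pos += s


# Hypothetical example: positions of the 1 elements for three observations.
# The second observation has no 1s, so its group is empty.
ones_positions = [[1, 4], [], [2, 3, 7]]
flat, sizes = flatten_ragged(ones_positions)
recovered = list(iter_segments(flat, sizes))
```

The flat vector and the sizes vector are both ordinary fixed-size arrays, so they can be declared in a Stan data block even though the original groups differ in length.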

Thank you for the reference!