HMM/State Space Model with unobserved outcome


I’ve been trying to figure this out for several days now, but I still can’t find a definitive answer.

I am trying to fit an HMM/state space model with a binary outcome where one of the two outcome values is never observed.

Suppose you want opinions on how satisfied students are with their education. To attract responses, you are giving away free lunch boxes, and you have 100 of them. To get a lunch box, a person has to fill out a short survey. However, you do not know how many people walked by your table and decided not to pick up a lunch box, so you have no information about them. Essentially, you have 100 observations, all of which are 1 in your logit/probit model, and you have some information about each of them. You do not know how many 0s there are in your model; it could be 100, or 10,000.

How should I write the model block in Stan? I could write something like

decision ~ normal( +, sigma_d);
satisfaction ~ normal(decision +, sigma_s);

In this model, I know satisfaction has 100 rows because I have 100 surveys on hand. I also know there are 100 rows of decision equal to 1 (decided to take the survey and the lunch box), but I do not know how many 0s there are (that is, how many total rows to assign to decision), so I do not know how to code it in Stan. The ultimate goal is to estimate the satisfaction of all students with their education, but again, suppose we do not know how many students there are. Is there a way to model this?

One way to do it is to give decision a ridiculously large number of rows (say, 1 million) and stop once there are 100 1s. I am not sure, but I have a feeling that I won’t be able to recover the students’ true opinions this way. Also, I couldn’t find where such code would go in Stan.
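To make the "stop at 100 ones" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each passer-by independently takes a lunch box with the same probability `p`; under that assumption, the number of 0s seen before the 100th 1 follows a negative binomial distribution with mean 100(1 − p)/p. The function name and the values of `p` are hypothetical, not part of any fitted model.

```python
def expected_nonparticipants(n_participants, p):
    """Expected number of 0s before the n-th 1, assuming each passer-by
    independently participates with probability p (negative binomial mean)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return n_participants * (1.0 - p) / p


# Two assumed participation probabilities, chosen to match the
# "100 or 10,000 zeros" scenarios described in the question.
for p in (0.5, 100 / 10100):
    print(p, expected_nonparticipants(100, p))
```

Both values of `p` are equally compatible with having observed exactly 100 participants and nothing else, which is why the number of rows cannot be pinned down without extra information or an assumption about `p`.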

Any comments/suggestions will be greatly appreciated!


It seems like you don’t know anything about the non-respondents, which makes this kind of hard?

E.g., my girlfriend likes anchovies. Do I like anchovies? Maybe I’m missing something?

Thanks for your response, Charles!

That’s correct. I don’t know anything, even the exact number of people who chose not to participate. I understand it’s a difficult scenario intuition-wise, but is it possible to do modeling-wise?

It’s actually quite a common scenario: you only observe those who chose to participate, and you don’t even know how many potential participants there are. For example, say you run a fast-food restaurant in a shopping mall. You only observe the people who bought food from you, and you don’t know how many people considered you or walked by, yet you still want to know the dining preferences of all the shoppers.

Sure, but unless you know or assume something about the non-participants, there’s just no extra information to improve the estimate beyond your observations…

Perhaps you could put a prior on the non-response rate and marginalise over it. Meanwhile, you could still say something about your model conditional on any particular value of the non-response rate.

As others have said, if your response rate might be 0.0001% and you want to infer something about the population from your sample, then you just can’t. At least with a prior, you may be able to make statements about the non-response rates for which something is true or some condition holds.
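As a rough illustration of conditioning on the response rate, here is a minimal Python sketch. It assumes satisfaction is reduced to a binary "satisfied / not satisfied" answer and that nothing at all is known about non-respondents, so their satisfied share could be anywhere between 0 and 1. The function names, the 70% observed share, and the discrete prior are all made up for the example.

```python
def population_bounds(p_respondents, response_rate):
    """Worst-case bounds on the population share satisfied, given the share
    among respondents and an assumed response rate. Non-respondents are
    allowed to be anywhere from 0% to 100% satisfied."""
    low = response_rate * p_respondents
    high = low + (1.0 - response_rate)
    return low, high


def marginal_bounds(p_respondents, prior):
    """Average the bounds over a discrete prior {response_rate: weight}.
    Because the bounds are linear in the response rate, this bounds the
    prior-expected population share."""
    total = sum(prior.values())
    low_sum, high_sum = 0.0, 0.0
    for rate, weight in prior.items():
        low, high = population_bounds(p_respondents, rate)
        low_sum += weight / total * low
        high_sum += weight / total * high
    return low_sum, high_sum


# Hypothetical data: 70 of the 100 respondents are satisfied.
p_obs = 0.7
for rate in (0.9, 0.5, 0.01):
    print(rate, population_bounds(p_obs, rate))

# Hypothetical prior over the response rate.
prior = {0.9: 0.2, 0.5: 0.5, 0.01: 0.3}
print(marginal_bounds(p_obs, prior))
```

The interval is tight when the assumed response rate is high and spans almost all of [0, 1] when it is tiny, which is the sense in which a statement can hold for some response rates and not for others.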