Stan does not support NA in data

Hi
I am trying to run a model to estimate the stock abundance of a species. My data are from three different surveys conducted over several years, but none of the surveys was conducted every year, so I have NAs in my dataset. Is there a way to solve this problem?

Best regards
Hiwin

Yes, there is a way to solve this, and the Stan reference manual tells you how.

Hey, that was a bit nasty. It’s not as if the manual is only 10 pages, and the answer is not that straightforward.

Hiwin: you should definitely look at the manual. See “11. Missing Data & Partially Known Parameters” if you want to treat the NAs as missing data. Otherwise, check “16. Sparse and Ragged Data Structures”.
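
For reference, the pattern in the missing data chapter looks roughly like the sketch below. The normal likelihood, the priors, and the names N_obs, N_mis, y_obs and y_mis are only illustrative assumptions, not taken from Hiwin's model. On the R side you would split the vector with is.na() and pass only the observed values.

```stan
data {
  int<lower=0> N_obs;        // number of observed values
  int<lower=0> N_mis;        // number of missing values (the NAs in R)
  vector[N_obs] y_obs;       // observed values only, no NAs
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[N_mis] y_mis;       // each missing value is declared as a parameter
}
model {
  mu ~ normal(0, 10);        // placeholder priors, assumed for illustration
  sigma ~ normal(0, 5);
  y_obs ~ normal(mu, sigma); // observed values inform mu and sigma
  y_mis ~ normal(mu, sigma); // missing values share the same distribution
}
```

The posterior draws of y_mis are then the imputed values.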

Sorry, I was in a hurry this morning. It reads like RTFM, I understand.

I agree that the manual is huge, but it is also organized quite well, and missing data really does have a dedicated section. So specific questions about what’s in there are very welcome.

(Like most users, I barely read user manuals and I don’t like them, but the Stan manual is a real exception. It is written with great care, thanks to @Bob_Carpenter.)

I had similar issues with my missing data, and the manual did basically provide the solution. You need to go through sections 11.1 and 16.1, as mentioned above, and apply them to your model.

Searching’s not always so easy, especially in PDFs (I need to move the manual to HTML, but it’s huge). You won’t find any mention of the R-specific notation NA in the manual. At least I don’t recall putting it in there.

What’s also missing from the manual is Ben’s really nice approach. You can decompose the full matrix with missing data and observed data conceptually as

X = X_miss + X_obs;

where the observed data matrix X_obs is sparse and has zeroes where data is missing; X_miss has parameters where data is missing and zeroes where it’s observed.

Rather than actually adding them and computing X * beta, you can instead use

X_miss * beta + X_obs * beta

using the sparse matrix multiplication function csr_matrix_times_vector for the multiplies and plain old addition for the addition.
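
A rough sketch of how that could look in a regression, assuming both matrices are passed in compressed row storage (CSR) form. All the names here (w_obs, v_obs, u_obs, v_mis, u_mis, x_mis) are hypothetical, and the standard-normal model for the missing predictors assumes standardized columns; replace it with whatever suits your predictors.

```stan
data {
  int<lower=1> N;                               // rows of X
  int<lower=1> K;                               // columns of X
  vector[N] y;                                  // outcome
  // CSR form of X_obs: only the observed entries are stored
  int<lower=0> nz_obs;
  vector[nz_obs] w_obs;                         // observed values
  array[nz_obs] int<lower=1, upper=K> v_obs;    // their column indices
  array[N + 1] int<lower=1> u_obs;              // row start positions
  // CSR sparsity pattern of X_miss: positions of the missing entries
  int<lower=0> nz_mis;
  array[nz_mis] int<lower=1, upper=K> v_mis;
  array[N + 1] int<lower=1> u_mis;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
  vector[nz_mis] x_mis;                         // missing entries of X
}
model {
  // X * beta computed as X_obs * beta + X_miss * beta
  vector[N] eta
      = csr_matrix_times_vector(N, K, w_obs, v_obs, u_obs, beta)
        + csr_matrix_times_vector(N, K, x_mis, v_mis, u_mis, beta);
  x_mis ~ std_normal();                         // assumed model for missing predictors
  beta ~ normal(0, 5);                          // placeholder priors
  sigma ~ normal(0, 5);
  y ~ normal(eta, sigma);
}
```

The values of X_miss are the parameter vector x_mis; only its sparsity pattern (v_mis, u_mis) is data.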

Sorry for the late reply. I’ve been out of office.
Thanks for all your replies. I will give it a go and get back to you if I fail.
Best regards
Hiwin

I am completely new to this and I’m having some difficulties implementing the code in my model. Any help is appreciated.
BarentsSea2.R (2.1 KB)
BarentsSea.stan (3.9 KB)

The datasets that include NAs are survR, survRu, survE and CPUE.

Best regards Hiwin

You’re starting with a fairly difficult example. Unlike BUGS and JAGS, Stan will not accept NA data. You need to model it all explicitly as a parameter as outlined in the missing data chapter of the manual.
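
To give an idea of the shape this can take for a survey series, here is a minimal sketch. It is not based on the attached BarentsSea.stan, and every name, prior and the random-walk process model are assumptions for illustration only. Abundance is a parameter in every year, and the survey contributes to the likelihood only in the years it actually ran, so no NA ever reaches Stan.

```stan
data {
  int<lower=2> T;                               // years in the assessment
  int<lower=0, upper=T> N_obs;                  // years in which this survey ran
  array[N_obs] int<lower=1, upper=T> year;      // which years those were
  vector<lower=0>[N_obs] index_obs;             // survey index in those years
}
parameters {
  vector[T] log_abundance;                      // latent log abundance, every year
  real log_q;                                   // log catchability of the survey
  real<lower=0> sigma_obs;
  real<lower=0> sigma_proc;
}
model {
  // placeholder random-walk process model (an assumption, not a stock model)
  log_abundance[1] ~ normal(0, 10);
  log_abundance[2:T] ~ normal(log_abundance[1:(T - 1)], sigma_proc);
  log_q ~ normal(0, 2);
  sigma_obs ~ normal(0, 1);
  sigma_proc ~ normal(0, 1);
  // observation model, applied only to the years with data
  index_obs ~ lognormal(log_q + log_abundance[year], sigma_obs);
}
```

With three surveys you would repeat the N_obs, year and index_obs block once per survey, each with its own catchability. With a single index, log_q and the overall abundance level are separated only by their priors, so a real model needs catch data or informative priors to fix the scale.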