Stan does not support NA in data

Hiwin · February 23, 2018, 8:25am

Hi
I am trying to run a model to estimate the stock abundance of a species. My data come from three different surveys conducted over several years, but none of the surveys was conducted every year, so I have NAs in my dataset. Is there a way to solve this problem?

Best regards
Hiwin

wds15 · February 23, 2018, 8:38am

Yes, there is a way to solve this, and the Stan reference manual tells you how.

bnicenboim · February 23, 2018, 1:41pm

Hey, that was a bit nasty. It’s not as if the manual is 10 pages, and the answer is not that straightforward.

Hiwin: you should definitely look at the manual. See “11. Missing Data & Partially Known Parameters” if you want to treat the NAs as missing data. Otherwise, check “16. Sparse and Ragged Data Structures”
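
For the ragged route, the general idea is to drop the unsurveyed years and pass one long vector plus the size of each group. Here is a minimal Python sketch of that data preparation, assuming made-up survey names and values (`surveys`, `to_ragged`, and the keys `K`, `N`, `s`, `year`, `y` are illustrative, not taken from the attached files); the same reshaping can be done in R before calling Stan.

```python
# Hypothetical survey series: year -> abundance index.
# Years in which a survey was not run are simply absent.
surveys = {
    "survey_a": {2001: 1.2, 2003: 0.9, 2004: 1.1},
    "survey_b": {2002: 0.7, 2003: 0.8},
    "survey_c": {2001: 2.0, 2002: 1.9, 2004: 2.3},
}


def to_ragged(series_by_group):
    """Flatten groups of unequal length into one vector plus group sizes."""
    y, year, sizes = [], [], []
    for name in sorted(series_by_group):
        observations = series_by_group[name]
        for yr in sorted(observations):
            year.append(yr)
            y.append(observations[yr])
        sizes.append(len(observations))
    return {"K": len(sizes), "N": len(y), "s": sizes, "year": year, "y": y}


ragged_data = to_ragged(surveys)
print(ragged_data)
```

In the Stan program, the group sizes `s` would then be used to walk through `y` one survey at a time, and `year` tells the model which year each observation belongs to.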

wds15 · February 23, 2018, 4:41pm

Sorry, I was in a hurry this morning. It reads like RTFM, I understand.

I agree that the manual is huge, but it is also organized quite well, and missing data really does have a dedicated section. So specific questions about what’s in there are very welcome.

(I also barely read user manuals; like most users, I don’t like them. But the Stan manual is a real exception. It is written with great care thanks to @Bob_Carpenter)

Panagiotis_Arsenis · February 23, 2018, 6:30pm

I had similar issues with my missing data and indeed the manual basically provided the solution. You need to go through sections 11.1 and 16.1 as mentioned above and apply accordingly.

Bob_Carpenter · February 27, 2018, 7:32am

Searching’s not always so easy, especially in PDFs (I need to move the manual to HTML, but it’s huge). You won’t find any mention of the R-specific notation NA in the manual. At least I don’t recall putting it in there.

What’s also missing from the manual is Ben’s really nice approach. Conceptually, you can decompose the full matrix into a missing-data part and an observed-data part:

X = X_miss + X_obs;

where the observed data matrix X_obs is sparse and has zeroes where data is missing; X_miss has parameters where data is missing and zeroes where it’s observed.

Rather than actually adding them and computing X * beta, you can instead use

X_miss * beta + X_obs * beta

using the sparse matrix multiplication function csr_matrix_times_vector for the multiplies and plain old addition for the addition.
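
To see why the split gives the same linear predictor, here is a small pure-Python sketch. It is only an illustration: `to_csr` and `csr_times_vector` are hypothetical stand-ins written for this example (in Stan the multiply would be `csr_matrix_times_vector`), the matrix and `beta` are made up, and the fixed numbers in `fill` play the role of values that would be parameters in a real model.

```python
def to_csr(dense):
    """Compressed-row storage: nonzero values w, 1-based column indices v,
    and 1-based row start positions u (length = rows + 1)."""
    w, v, u = [], [], [1]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                w.append(x)
                v.append(j + 1)
        u.append(len(w) + 1)
    return w, v, u


def csr_times_vector(n_rows, w, v, u, b):
    """Multiply a CSR-stored matrix by a dense vector b."""
    out = []
    for i in range(n_rows):
        start, stop = u[i] - 1, u[i + 1] - 1
        out.append(sum(w[k] * b[v[k] - 1] for k in range(start, stop)))
    return out


# Hypothetical predictor matrix; None marks a missing entry (R's NA).
X = [[1.0, None], [2.0, 3.0], [None, 4.0]]
fill = iter([0.5, 1.5])  # stand-ins for the missing-value parameters
beta = [2.0, -1.0]

# X_obs: zero where missing. X_miss: zero where observed.
X_obs = [[0.0 if x is None else x for x in row] for row in X]
X_miss = [[next(fill) if x is None else 0.0 for x in row] for row in X]

eta_miss = csr_times_vector(len(X), *to_csr(X_miss), beta)
eta_obs = csr_times_vector(len(X), *to_csr(X_obs), beta)
eta = [a + b for a, b in zip(eta_miss, eta_obs)]
print(eta)  # [1.5, 1.0, -1.0]
```

The result matches multiplying the filled-in matrix by `beta` directly, but only the nonzero entries of each part are ever stored or multiplied.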

Hiwin · March 6, 2018, 7:44am

Sorry for the late reply. I’ve been out of office.
Thanks for all your replies. I will give it a go and get back to you if I fail.
Best regards
Hiwin

Hiwin · March 7, 2018, 7:57am

I am completely new to this and I’m having some difficulties implementing the code in my model. Any help is appreciated.
BarentsSea2.R (2.1 KB)
BarentsSea.stan (3.9 KB)

The datasets that include NAs are survR, survRu, survE and CPUE.

Best regards Hiwin

Bob_Carpenter · March 7, 2018, 10:04pm

You’re starting with a fairly difficult example. Unlike BUGS and JAGS, Stan will not accept NA data. You need to model it all explicitly as a parameter as outlined in the missing data chapter of the manual.
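
A first step in that direction is to split each series into what was observed and where the gaps are, so that no NA ever reaches Stan. The sketch below is a minimal Python illustration under stated assumptions: the series is made up, `split_missing` is a hypothetical helper, and the names `N_obs`, `N_mis`, `ii_obs`, `ii_mis`, `y_obs` are chosen to resemble the manual's missing data example, so check them against the chapter itself. The same split is easy to do in R with `is.na()`.

```python
def split_missing(series):
    """Split a series into observed values and 1-based positions of
    observed and missing entries (Stan indexes from 1)."""
    ii_obs = [i + 1 for i, v in enumerate(series) if v is not None]
    ii_mis = [i + 1 for i, v in enumerate(series) if v is None]
    y_obs = [v for v in series if v is not None]
    return {
        "N_obs": len(ii_obs),
        "N_mis": len(ii_mis),
        "ii_obs": ii_obs,
        "ii_mis": ii_mis,
        "y_obs": y_obs,
    }


# Hypothetical yearly survey index; None plays the role of R's NA.
survey_index = [1.2, None, 0.9, 1.1, None]
missing_data = split_missing(survey_index)
print(missing_data)
```

In the Stan program, `y_obs` would be declared as data and a vector of length `N_mis` as parameters, with the two index arrays used to reassemble the full series inside the model.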
