Which rows of my dataset are used by the model?

Pietro · March 24, 2019, 1:23pm

Perhaps a stupid question but I could not find a reply anywhere.

My dataset has over 2000 rows.
In the model, the number of observations is a little over 1000.
I get the warning message (Rows with NAs have been excluded).
How do I know exactly which rows have been used by the model?
I tried complete.cases() on my dataset, but the number of rows does not match.

(I am asking because I have both linear and categorical predictors, and to plot from fitted() I would need to know which observation belongs to which category.)

Cheers

torkar · March 24, 2019, 1:32pm

Hi,

are NAs coded as NA only, or do you have empty cells to signify NAs, or something else?

foo <- data[complete.cases(data), ]

This should give you only the rows that contain no NAs. However, note what ?complete.cases tells you:

A current limitation of this function is that it uses low level functions to determine lengths and missingness, ignoring the class. This will lead to spurious errors when some columns have classes with length or is.na methods, for example "POSIXlt", as described in PR#16648.
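
One possible reason for the mismatch: complete.cases() on the whole data frame also drops rows that have NAs in columns the model never uses, whereas a model typically excludes a row only when a variable in its formula is missing. Below is a minimal sketch in base R, assuming a toy data frame and a formula y ~ x + group (the data and all column names are hypothetical, and it assumes the formula names its variables explicitly rather than using "."). It restricts complete.cases() to the formula's variables and cross-checks the result against model.frame().

```r
# Toy data standing in for the real dataset (hypothetical column names).
dat <- data.frame(
  y     = c(1.2, NA, 3.1, 4.0, 5.5, 6.1),
  x     = c(0.5, 1.0, NA, 2.0, 2.5, 3.0),
  group = factor(c("a", "b", "a", NA, "b", "a")),
  notes = c(NA, "ok", "ok", "ok", NA, "ok")  # not part of the model
)

# Hypothetical model formula; replace with the one actually fitted.
f <- y ~ x + group

# Only the variables named in the formula matter for row exclusion.
vars <- all.vars(f)
used <- complete.cases(dat[, vars])

# For comparison: complete.cases() over every column, including 'notes'.
all_cols <- complete.cases(dat)

# Row indices and the data subset restricted to the formula's variables.
used_rows <- which(used)
dat_used  <- dat[used, ]

# Cross-check: model.frame() with na.omit keeps the same rows.
mf <- model.frame(f, data = dat, na.action = na.omit)
stopifnot(identical(rownames(mf), rownames(dat_used)))
```

Here dat_used$group lines up row by row with the fitted values of a model that dropped rows this way, so it can be used to label them by category. If the counts still disagree with what the model reports, the missing values are probably coded as something other than NA (empty strings, for instance).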
