Fail to run due to relatively high number of missing observations

Panagiotis_Arsenis · February 21, 2018, 8:02pm

Hello,

referring to a previous post (Missing data in a 2PL (IRT) model - #37 by Panagiotis_Arsenis), we currently have the following Stan code:

functions {
  real rsm(int y, real theta, real beta, vector kappa) {
    vector[rows(kappa) + 1] unsummed;
    vector[rows(kappa) + 1] probs;
    unsummed = append_row(rep_vector(0, 1), theta - beta - kappa);
    probs = softmax(cumulative_sum(unsummed));
    return categorical_lpmf(y + 1 | probs);
  }
  real rsm_rng(real theta, real beta, vector kappa) {
    vector[rows(kappa) + 1] unsummed;
    vector[rows(kappa) + 1] probs;
    unsummed = append_row(rep_vector(0, 1), theta - beta - kappa);
    probs = softmax(cumulative_sum(unsummed));
    return categorical_rng(probs);
  }
}
data {
  int<lower=1> I; // # items
  int<lower=1> J; // # persons
  int<lower=1> N; // # observations
  int<lower=1> N_mis; // # missing observations
  int<lower=1, upper=I> ii[N]; // item for n
  int<lower=1, upper=J> jj[N]; // person for n
  int<lower=0> y[N]; // response for n
}
transformed data {
  int m; // # steps
  m = max(y);
}
parameters {
  vector[I] beta;
  vector[m-1] kappa_free;
  vector[J] theta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[m] kappa;
  kappa[1:(m-1)] = kappa_free;
  kappa[m] = -1*sum(kappa_free);
}
model {
  beta ~ normal(0, 3);
  target += normal_lpdf(kappa | 0, 3);
  theta ~ normal(0, sigma);
  sigma ~ exponential(.1);
  for (n in 1:N)
    target += rsm(y[n], theta[jj[n]], beta[ii[n]], kappa);
}
generated quantities {
  vector[N_mis] y_mis;
  for (n in 1:N_mis)
    y_mis[n] = rsm_rng(theta[jj[n]], beta[ii[n]], kappa);
}

The issue here is that when N_mis > N, Stan will come back with this:

Exception: : accessing element out of range. index 7455 out of range; expecting index to be between 1 and 7454; index position = 1jj (in ‘model52b1217b4c_1e9f8627a1bbd76d859458272d0dfc57’ at line 54)

We are not sure how to deal with this. Any suggestions?

Panos

ahartikainen · February 21, 2018, 11:07pm

I’m just speculating here:

Are y_mis[n] and y[n] somehow related?

Would this achieve the same goal if you randomly picked an integer between 1 and N each round and used that as n?

mike-lawrence · February 21, 2018, 11:33pm

The problem is that jj and ii are of length N, but in the generated quantities block you’re indexing into them up to N_mis, which, as you say, is > N.
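
One way to make the lengths line up is sketched below in Python. This is an assumption about how the data might be prepared, since the thread does not show the data-preparation code: split the persons-by-items response table into observed and missing cells and build a separate pair of index arrays for each group. The names `build_stan_data`, `ii_mis` and `jj_mis` are hypothetical; the Stan program would need matching declarations of length N_mis in its data block and would have to index those arrays, rather than ii and jj, in generated quantities.

```python
# Hypothetical helper: split a persons-by-items table into observed and
# missing cells, with one pair of index arrays per group.
def build_stan_data(responses):
    """responses[j][i] is person j's answer to item i, or None if missing."""
    ii, jj, y = [], [], []      # observed cells, length N
    ii_mis, jj_mis = [], []     # missing cells, length N_mis
    for j, row in enumerate(responses, start=1):   # Stan indices start at 1
        for i, value in enumerate(row, start=1):
            if value is None:
                ii_mis.append(i)
                jj_mis.append(j)
            else:
                ii.append(i)
                jj.append(j)
                y.append(value)
    return {
        "I": len(responses[0]),
        "J": len(responses),
        "N": len(y),
        "N_mis": len(ii_mis),
        "ii": ii, "jj": jj, "y": y,
        "ii_mis": ii_mis, "jj_mis": jj_mis,
    }


# Toy table: 3 persons, 2 items, more missing than observed responses.
toy = [[2, None], [None, None], [None, 0]]
stan_data = build_stan_data(toy)
```

With the toy table, N is 2 and N_mis is 4, so a loop over 1:N_mis only stays in range if it reads from the length-4 arrays.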

Panagiotis_Arsenis · February 23, 2018, 6:38pm

Well, both N_mis and N come from the same data set: every response that is missing (NA) counts towards N_mis, and every response that is not missing counts towards N.

Panagiotis_Arsenis · February 23, 2018, 7:44pm

I see, this sounds right. If I simply redeclare ii and jj with length N_mis, i.e. as ii[N_mis] and jj[N_mis], I get the following:

Error in new_CppObject_xp(fields$.module, fields$.pointer, …) :
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=ii; position=0; dims declared=(10285); dims found=(7454) (in ‘model79fbfba1a_d0f40f95b93802bc6d0fcad905f25cf3’ at line 24)

A bit more insight here if possible?

syclik · February 23, 2018, 9:04pm

The error message is telling you that the data you passed in is the wrong size. I’ll try to break it down.

“mismatch in dimension declared and found”: it’s telling you that the size you declared and what it found were different.
“processing stage=data initialization”: it’s telling you this happened while it was initializing data, i.e. reading data from R / Python / command line.
“variable name=ii”: check ii
“position = 0”: this might throw you off and is unimportant for this error. It’s telling you in which dimension of the array the problem was found.
“dims declared=(10285)”: in your program, it’s expecting ii to be length 10285
“dims found=(7454)”: in your data, check the size of ii. I bet it’s length 7454

The error messages contain a lot of info if you unpack them.
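
The same comparison can be made before the data ever reaches Stan. Below is a minimal sketch in Python, assuming the data are held in a dict keyed by variable name. `check_dims` and the `declared` mapping are hypothetical helpers written for this illustration, not part of any Stan interface, and the sizes are the ones inferred from the two error messages in this thread.

```python
def check_dims(data, declared):
    """Return a list of mismatches between declared and actual array lengths.

    declared maps an array name to the name of the size variable that the
    Stan data block uses for it, e.g. {"ii": "N"} for `int ii[N];`.
    """
    problems = []
    for name, size_name in declared.items():
        expected = data[size_name]
        found = len(data[name])
        if expected != found:
            problems.append(
                f"variable name={name}; dims declared=({expected}); "
                f"dims found=({found})"
            )
    return problems


# Sizes inferred from the error messages: ii declared with length N_mis
# (10285) while the array that was passed in has length N (7454).
data = {"N": 7454, "N_mis": 10285, "ii": [1] * 7454}
print(check_dims(data, {"ii": "N_mis"}))  # one mismatch reported
print(check_dims(data, {"ii": "N"}))      # empty list: sizes agree
```

Changing only the declaration moves the mismatch from run time to data loading; the array passed in and its declared size have to change together.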

Bob_Carpenter · February 27, 2018, 7:41am

It’s also telling you which line in the model is causing problems: the “at line” number at the end of each message.

That’s always a good place to start debugging. (Just thought I’d call that out explicitly in addition to @syclik’s comments.)

Panagiotis_Arsenis · February 28, 2018, 11:19am

Thank you for this helpful information.
