Specifying input data as array of integers from R to Stan

I have a model named Model2.stan that I think is correct, because I do not have any error when running with rstan:

stan_model(file=“Model2.stan”)

But when I try to fit my model, I have an error: PARSER FAILED TO PARSE INPUT COMPLETELY
STOPPED AT LINE 1
I am new in stan this is my first model, and I really do not understand this error. It seems that it should be a problem with my input data, maybe the data specified in R do not have the good format. In particular I should specify arrays of integers. I tried several syntax in R without any success. Following is the model, the specified data and the fitting command which ends with an error.
The model:

//
// The input data: 
// 'n0' an array of integers,
// 'n' an array if integers, 
// 'n0' and 'n'  of length 'N'.
data {
  int<lower=1> N;
  int<lower=0> n0[N];
  int<lower=0> n[N];
}

// The parameters of the model.
parameters {
  vector<lower=0>[N] lambda;
}
// Transformed parameters. Here theta and delta calculated from lambda
transformed parameters {
  vector[N] theta;    
  vector[N] delta; 
//  real delta[N];
  theta = cumulative_sum(lambda);
  delta = exp(-theta);
}
// The model to be estimated. 
model {
  vector[N] logitlambda;
  real temp;
  logitlambda = logit(lambda);
  // prior
  for (i in 1:61){
      logitlambda[i] ~ normal(0, 1000);
  } 
  temp ~ normal(0, 1000);
  for (i in 62:N){
      logitlambda[i] = temp;
  }  
  n0 ~ binomial(n,delta);
}

The data specified in R:

n0 <- array(as.integer(c(6,4,5,6,6,1,6,4,8,1,6,9,5,8,10,7,10,7,22,11,22,52,34,22,23,22,30,31,30,32,34,34,39,17,27,24,26,40,16,22,22,30,25,21,20,22,24,22,22,21,26,27,10,22,14,17,16,11,23,5,25,33,9,16,7,13,8,6,3,9,11,13,4,5,4,8,1,5,4,0,3,6,1,0,2,1,0,0,1,0,0,1,0,0,1)))
n <- array(as.integer(c(6,4,5,6,6,1,6,4,8,1,6,9,5,8,1,7,10,7,22,11,22,52,34,22,23,22,30,31,30,32,34,34,39,17,27,24,26,40,16,22,22,30,25,21,20,22,24,22,22,21,26,27,10,22,14,17,16,11,23,5,25,33,9,16,7,13,8,6,3,9,11,13,4,5,4,8,1,5,4,0,3,6,1,0,2,1,0,0,1,0,0,1,0,0,1)))

dat = list(
  N = length(n0),
  n0 = n0,
  n = n
)

And the fitting ending with an error:

fit = stan(
  model_code = "Model2.stan",
  data   = dat,
  iter   = 200,
  warmup = 100,
  chains = 4,
  seed=4938483
)

Any help would be greatly appreciated, because this is my first modeling with stan and I’m starting to get discouraged (I had problems specifying my model too).

I don’t think there is any issue with your model syntax or your data. I used your code to estimate the model using cmdstanr and I don’t get the same error. However, lambda looks like it should be a probability but is only bounded from below. When lambda is greater than 1, logit(lambda) and in turn logitlambda[i] ~ normal(0, 1000) are undefined. It isn’t obvious that the error you’re getting is related to this issue, but you might add an upper bound to lambda and see if it solves it.

In general, I’d recommend moving from RStan to cmdstanr if you can. I believe it is the most up-to-date. I’ve found it to be more stable and clearer when it runs into problems. And, thankfully, the installation can be done entirely from within R at this point. Getting started with CmdStanR • cmdstanr

One more thing I noticed is that temp is declared in the model block but it isn’t given a value. You either need to assign it a value or move it to the parameters block.

One last point: the 15th observation is invalid. It is equivalent to

10 \sim binomial(1, p)

That might also be causing the problem

Thanks a lot Simon!
I move to cmdstanr. Then I put the temp parameter in the parameters block. And I use a more specific gaussian distribution for the priors (sd of 10 instead of 1000).
It is now running without any error, here is the final model:

// The input data: 
// 'n0' and 'n'  of length 'N' the number of ages observed in the sample.
data {
  int<lower=1> N;
  array[N] int<lower=0> n0; //int<lower=0> n0[N];
  array[N] int<lower=0> n;//int<lower=0> n[N];
}

// The parameters of the model.
parameters {
  vector<lower=0,upper=1>[N] lambda;
  real temp;
}
// Transformed parameters. Here theta and delta calculated from lambda
transformed parameters {
  vector[N] theta;    
  vector[N] delta; 
//  real delta[N];
  theta = cumulative_sum(lambda);
  delta = exp(-theta);
}
// The model to be estimated. 
model {
  vector[N] logitlambda;
  logitlambda = logit(lambda);
  // prior
  for (i in 1:61){
      logitlambda[i] ~ normal(0, 10);
  } 
  temp ~ normal(0, 10); //uniform(-11,3)
  for (i in 62:N){
      logitlambda[i] = temp;
  }  
  n0 ~ binomial(n,delta);
}

But there is something I do not understand: I want all the components of lambda (and hence of logitlambda to be the same from the 62th to the last one (N=95). That is why I wrote in my model

temp ~ normal(0, 10); //uniform(-11,3)
  for (i in 62:N){
      logitlambda[i] = temp;
  }  

But when I look at the output, I do not have the same results for the components 62 to 95 (and they are quite different from one component to another). Do you think the way I wrote the model does not correspond to what I want?

Glad it helped! Per your model, I don’t intuitively get the distinction between lambda, theta, and delta, nor why you want 62:N to be constant. If you could provide some explanation of what you’re trying to do with the model then that might help.

Either way, let me make sure I understand what’s happening…

  1. You have a vector of probabilities lambda, one for each observation N
  2. On one end, you’re transforming the [0,1] probabilities lambda into (-Inf, Inf) real numbers logitlambda and you are putting normal priors on logitlambda
  3. On the other end, you are taking a cumulative sum of the probabilities lambda and storing them as theta
  4. Finally, you take the exponentiation of the negative cumulative sum (exp(-theta)) and storing them as delta. These are then the probabilities that you’re using to model your data

So there is something about the cumulative probabilities that relate to the outcome.

Based on what I’m seeing, I think I have a few reasonable suggestions on how to get what you’re looking for.

First, you should put logitlambda in the parameters section and then transform them to lambda (and so on). In Stan, you need to adjust by the Jacobian if you make distributional assumptions of non-linear transforms of parameters (see here). I am not an expert on this, but I think it applies to this case, where lambda is the parameter and logitlambda is the non-linearly transformed parameters. Note that your model might run without this change and appear to work, but it might end up giving you the wrong answer because it is sampling from the wrong posterior.

Second, on further inspection, temp isn’t doing anything in your model. To put it one way, it never works its way backwards to affect the data model. (bad description, but best way I can describe it). So that’s just to say that it isn’t achieving what you want in creating a constant probability from 62 onwards.

Finally, I would suggest only specifying 62 elements of lambda/logitlambda rather than N.

So, putting it all together, here are some of the pieces you’ll want to consider…

parameters{
    // 1. specify logitlambda as a parameter
    // 2. only specify 62 elements of logitlambda 
    vector[62] logitlambda; 
}
transformed parameters{
    // 3. treat lambda as a transformed parameter
    vector[N] lambda;

    lambda[1:61] = inv_logit(logitlambda[1:61]);
    // 4. Assign the last element of logitlambda to the last 62:N elements of lambda
    lambda[62:N] = inv_logit(logitlambda[62]);
}
model {
    // 5. Put a prior on logitlambda; no need for temp
    logitlambda ~ normal(0, 10);
}

Hopefully this provides some helpful hints for future models as well. The Jacobian/transform issue comes up regularly. Best of luck as you dive deeper into Stan.

1 Like

Thank you Simon. You explained what I had the intuition for: logitlambda should probably be in the parameter section and temp never works its way backwards.
It is beginning to be quite interesting from a modelling point of you, as I really need the last components of lambda to be equal.
As you answer my intial question, I suggest that I close this post and I will begin a new one in modeling, in which I will provide some explanation on what I want to do. Maybe you can put your answer on this new post. Thnks again for your help, it is encouraging to use stan!

1 Like