Syntactically correct Stan model not working

Hi All,

I have a program that, while syntactically correct, does not actually run in either rstan or CmdStan.

I took inspiration from the Mark-Recapture model in the Stan User Manual.

The specific error I receive is

SAMPLING FOR MODEL 'model' NOW (CHAIN 1).
R(55719,0x7fffa061a380) malloc: *** error for object 0x7f8eb43e7758: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Error in unserialize(socklist[[n]]) : error reading from connection

SAMPLING FOR MODEL 'model' NOW (CHAIN 2).
R(55747,0x7fffa061a380) malloc: *** error for object 0x7fadc8024768: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

The program is below.

data {
  int<lower=2> N; // number of sampled specimens
  int<lower=1> Hstar; // number of unique haplotypes
  int<lower=2> perms; // number of permutations
  int HAC_mat[perms]; // array of sampled haplotype labels
}

transformed parameters {
  real<lower=N> Nstar;
  vector<lower=1,upper=Hstar>[perms] H;
  Nstar = (N * Hstar) / mean(H);
}

model {
  HAC_mat ~ binomial(Hstar, N / Nstar);
}

Here is some fake data to try it out via RStudio:

N <- 10
Hstar <- 5
perms <- 5
HAC_mat <- c(3, 4, 1, 1, 2)
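
For reference, here is roughly how I pass the data to rstan (assuming the program above is saved as model.stan); the sampling call is where the error appears:

library(rstan)

# fake data from above
stan_data <- list(N = 10, Hstar = 5, perms = 5, HAC_mat = c(3, 4, 1, 1, 2))

# assuming the program above is saved as "model.stan";
# this is the call that triggers the malloc error for me
fit <- stan(file = "model.stan", data = stan_data, chains = 2)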

Any ideas here?

Alternatively, I'm thinking of employing binomial_rng in a for loop, which would avoid having to specify a model block.

Not sure if this is the entire problem, but one issue is that H doesn't have a value, yet you're trying to use it to calculate Nstar.

@jonah

Ah yes. Thanks for that.

I have removed the declaration for H (that was a mistake on my part) and would instead need Nstar to be

Nstar = (N * Hstar) / mean(HAC_mat);

However, the mean function does not work on integer arrays. I could change HAC_mat to be a real array (even though it's not), but then the binomial distribution in the model block would obviously need to be changed.

I haven't come across any solution in the User's Guide.

Any workaround here?

The one workaround I see is to write

 Nstar = (N * Hstar) / (sum(HAC_mat) * 1.0 / perms)

However, the program still doesn't run; I get the same error as in my original post.

Does

mean(to_vector(HAC_mat))

work?

Also a few other comments/questions:

  • Your transformed parameters block doesn't contain transformations of parameters but rather transformations of data. So you can move it to the transformed data block and it will run faster.

  • Not having a parameters block means you need to use the fixed_param version of the sampler. In RStan I think it's stan(..., algorithm = "Fixed_param"); in CmdStanR it's $sample(fixed_param = TRUE). (See the sketch after this list.)

  • Not having a parameters block or a generated quantities block means you're not estimating anything or generating anything. What exactly are you trying to do here?
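
To expand on the fixed_param point, a fixed-parameter run in RStan would look roughly like this (assuming the data list from earlier in the thread and that the program is saved as model.stan):

library(rstan)

stan_data <- list(N = 10, Hstar = 5, perms = 5, HAC_mat = c(3, 4, 1, 1, 2))

# fixed_param sampler: nothing is estimated, the program is just
# evaluated at the supplied data
fit <- stan(file = "model.stan", data = stan_data,
            algorithm = "Fixed_param", chains = 1, iter = 1)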


Thanks again.

I'll try to explain more clearly.

I am basing my model on the Lincoln-Petersen mark-recapture Stan code found in the User's Guide.

Essentially, I have an estimate of Nstar from an R package of mine, but I don't have a good idea of its standard deviation (or other quantities of interest). Thus, I thought Stan might be a natural way to get this information.

Nstar is computed as

Nstar = (N * Hstar) / H

where N and Hstar are both integers, whereas H is a mean that takes on a real value between 1 and Hstar. H corresponds to the number of recaptures (R) in the mark-recapture model, except that H is not an integer. HAC_mat contains 'perms' integers, each in {1, …, Hstar}.

I place a binomial prior on the elements of HAC_mat. Next, I take the mean of these values. Finally, I plug all the information into the above equation for Nstar. No other probability distribution seems to work (because of the required type declarations).

I would then get all required information from the Stan output.
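
In rough R terms, the procedure I have in mind is something like the sketch below (p is just a placeholder for the unknown binomial probability N / Nstar):

# sketch only: simulate HAC_mat from a binomial, take its mean, and
# propagate it through the equation for Nstar
N <- 10; Hstar <- 5; perms <- 5
p <- 0.5  # placeholder, not a value I actually know

Nstar_draws <- replicate(1000, {
  HAC_sim <- rbinom(perms, size = Hstar, prob = p)
  (N * Hstar) / mean(HAC_sim)  # ignoring the unlikely all-zero corner case
})

sd(Nstar_draws)  # the kind of uncertainty summary I am after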

Let me know if this explanation helps.

As far as I can see, the output of your model is just a function of the data:

(N * Hstar) / mean(HAC_mat)

This is a fixed number and can be calculated directly in R; there is no need for Stan at all.
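
For example, with the fake data posted above:

N <- 10; Hstar <- 5; HAC_mat <- c(3, 4, 1, 1, 2)
(N * Hstar) / mean(HAC_mat)  # 22.727..., a single number with no uncertainty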

Yeah exactly. The Stan program is syntactically correct but it's not estimating anything.

Here both N and Nstar are constants, so there's nothing for Stan to learn. (Also, if Nstar = (N * Hstar) / mean(HAC_mat), then isn't N / Nstar just mean(HAC_mat) / Hstar, so you don't need to define Nstar at all?) Because everything in this Stan program is fixed to constant values, there's nothing for Stan to do here.

Your use case may indeed be suitable for Stan, but the way the program is currently written you don't need Stan at all, like @dlakelan said. So I think we might still be misunderstanding what you want to get out of Stan. If you're willing to keep trying to explain it, I'm willing to keep listening and trying to help!

For that you can just use a vector instead of an integer array, which will let you use other distributions. If you need the object as both real numbers and integers, you can pass it in twice, using both the vector type and the int type. But that's not going to matter here unless there's something for Stan to estimate.
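
On the R side that would look something like this (the names are just illustrative):

# pass the same values twice so the Stan program can declare one copy
# as an integer array and the other as a vector; names are illustrative
stan_data <- list(
  N = 10, Hstar = 5, perms = 5,
  HAC_mat_int  = c(3L, 4L, 1L, 1L, 2L),  # for an int[perms] declaration
  HAC_mat_real = c(3, 4, 1, 1, 2)        # for a vector[perms] declaration
)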

Thanks @jonah!

After some thinking and revisiting the mark-recapture example in the Stan User's Guide, I realize that I may have been thinking too much about developing the most realistic model while ignoring perhaps the most basic one.

It turns out that everything "works" if I adopt a Poisson likelihood on N, even though it is fully observed.

I suppose I'm struggling, then, to see how the Lincoln-Petersen mark-recapture Stan model is useful (or even works as expected), despite the number of recaptures (R) also being fully specified as data.

Here's the link to the mark-recapture model in the Stan User's Guide:

Here's my Stan program.

data {
  int<lower=2> N; // number of sampled specimens
  int<lower=1> Hstar; // number of unique haplotypes
  real<lower=0,upper=Hstar> H; // mean of HAC_mat (real-valued)
  int K; // number of draws for the posterior predictive check
}
parameters {
  real<lower=N> Nstar;
}
model {
  N ~ poisson((Nstar * H) * 1.0 / Hstar);
}
generated quantities {
  real Nstar_tilde[K];
  for (i in 1:K) {
    Nstar_tilde[i] = poisson_rng((Nstar * H) * 1.0 / Hstar); // posterior predictive check
  }
}

Note the inclusion of a generated quantities block above to check that the model is doing a good job. As it stands, I simply return the statistic of interest, but I should probably plot histograms of the predictive draws to see whether the model is indeed capturing the observed data.
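
For completeness, here is roughly how I plan to fit and check it in R (assuming the program is saved as model2.stan, with H = mean(HAC_mat) from the fake data and K = 100 predictive draws):

library(rstan)

stan_data <- list(N = 10, Hstar = 5, H = 2.2, K = 100)
fit <- stan(file = "model2.stan", data = stan_data)

print(fit, pars = "Nstar")

# posterior predictive check: compare the observed N to the predictive draws
Nstar_tilde <- extract(fit, pars = "Nstar_tilde")$Nstar_tilde
hist(as.vector(Nstar_tilde), main = "Posterior predictive draws", xlab = "N")
abline(v = stan_data$N, lwd = 2)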

Any advice here going forward?