Syntactically correct STAN model not working

Hi All,

I have a program that, while syntactically correct, does not actually run in either rstan or CmdStan.

I took inspiration from the Mark-Recapture model in the Stan User Manual.

The specific error I receive is

SAMPLING FOR MODEL 'model' NOW (CHAIN 1).
R(55719,0x7fffa061a380) malloc: *** error for object 0x7f8eb43e7758: pointer being     freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Error in unserialize(socklist[[n]]) : error reading from connection

SAMPLING FOR MODEL 'model' NOW (CHAIN 2).
R(55747,0x7fffa061a380) malloc: *** error for object 0x7fadc8024768: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

The program is below.

data {
  int<lower=2> N; // number of sampled specimens
  int<lower=1> Hstar; // number of unique haplotypes
  int<lower=2> perms; // number of permutations
  int HAC_mat[perms]; // array of sampled haplotype labels
}

transformed parameters {
  real<lower=N> Nstar;
  vector<lower=1,upper=Hstar>[perms] H;
  Nstar = (N * Hstar) / mean(H);
}

model {
  HAC_mat ~ binomial(Hstar, N / Nstar);
}

Here is some fake data to try it out via RStudio:

N <- 10
Hstar <- 5
perms <- 5
HAC_mat <- c(3, 4, 1, 1, 2)

Any ideas here?

Alternatively, I’m thinking to employ binomial_rng in a for loop, which will avoid having to specify a model.

Not sure if this is the entire problem, but one issue is that H doesn’t have a value but you’re trying to use it to calculate Nstar.

@jonah

Ah yes. Thanks for that.

I have removed the declaration for H (that was a mistake on my part) and would need instead Nstar to be

Nstar = (N * Hstar) / mean(HAC_mat);

However, the mean function does not work on integer arrays. I could change HAC_mat to be a real array (even though it’s not), but then the binomial distribution in the model block would obviously need to be changed.

I haven’t come across any solution in the User’s Guide.

Any workaround here?

The one I see is to write

 Nstar = (N * Hstar) / (sum(HAC_mat) * 1.0 / perms)

However the program still doesn’t run, getting the same error as in my original post.

Does

mean(to_vector(HAC_mat))

work?

Also a few other comments/questions:

  • Your transformed parameters block doesn’t contain transformations of parameters but rather transformations of data. So you can move it to the transformed data block and it will run faster.

  • Not having a parameters block means you need to use the fixed_param version of the sampler. In RStan I think it’s stan(..., algorithm="Fixed_param"). In CmdStanR it’s $sample(fixed_param=TRUE).

  • Not having a parameters block or a generated quantities block means you’re not estimating anything or generating anything. What exactly are you trying to do here?

1 Like

Thanks Again.

I’ll try to explain more clearly.

I am basing my model on the Lincoln-Peterson mark-recapture Stan code found in the User Guide.

Essentially, I have an estimate of Nstar from an R package of mine, but I don’t have a good idea as to its standard deviation (or other quantities of interest). Thus, I thought Stan might be a natural way to get this information.

Nstar is computed as

Nstar = (N * Hstar) / H

where N and Hstar are both integers, whereas H is a mean that takes on a real value between 1 and Hstar. H corresponds to the number of recaptures ( R ) in the mark-recapture model, except that H it is not an integer. HAC_mat contains ā€˜perms’ number integers {1, …, Hstar}.

I place a binomial prior on the elements of HAC_mat. Next, I take the mean of these values. Finally, I plug all the information into the above equation for Nstar, No other probability distribution seems to work (due to type declarations required).

I would then get all required information from the Stan output.

Let me know if this explanation helps.

As far as I can see the output of your model is just a function of the data:

(N * Hstar) / mean(HAC_mat)

This is a fixed number, and can be calculated in R, there is no need of Stan at all.

Yeah exactly. The Stan program is syntactically correct but it’s not estimating anything.

Here both N and Nstar are constants so there’s nothing for Stan to learn (Also, if Nstar = (N * Hstar) / mean(HAC_mat) then isn’t N/Nstar just mean(HAC_mat)/Hstar, so you don’t need to define Nstar at all?). Because everything in this Stan program is just fixed to constant values, there’s nothing for Stan to do here.

Your use case may indeed be suitable for Stan but the way the program is currently written you don’t need Stan at all, like @dlakelan said. So I think we might still be misunderstanding what you want to get out of Stan. If you’re willing to keep trying to explain it I’m willing to keep listening and trying to help!

For that you can just use a vector instead of an integer array and it will let you use other distributions. If you need the object as both real numbers and integers then you can pass it in twice, using both the vector type and the int type. But that’s not going to matter here unless there’s something for Stan to estimate.

Thanks @jonah!

After some thinking and revisting the mark-recapture example in the Stan User Guide, I realize that I may be thinking too much on developing the most realistic model, while ignoring perhaps the most basic one.

Turns out that everything ā€œworksā€ if I adopt a poisson likelihood on N, even though it’s fully observed.

I suppose I’m struggling then to see how the Lincoln-Peterson mark-recapture Stan model is useful (or even works as expected) despite the number of recaptures ( R ) also being fully specified as data.

Here’s the link to the mark-recapture model in the Stan User’s Guide:

Here’s my Stan program.

data {
  int<lower=2> N;
  int<lower=1> Hstar;
  real<lower=0,upper=Hstar> H;
 int K; // for posterior predictive check
}
parameters {
  real<lower=N> Nstar;
}
model {
  N ~ poisson((Nstar * H) * 1.0 / Hstar);
}
generated quantities {
  real Nstar_tilde[K];
  for (i in 1:K) {
    Nstar_tilde[i] = poisson_rng((Nstar * H) * 1.0 / Hstar); // posterior predictive check
  }
}

Note above the inclusion of a generated quantities block to check that the model is doing a good job. As it stands, I simply return the statistic of interest, but should probably plot the individual histograms to see that the model is indeed capturing the same data as seen in the observations.

Any advice here going forward?