Dynamic vector size within model


#1

Hi,
I am trying to go with a model but I am missing something and I am pretty sure it is straight forward but can’t seem to figure it out.
I am trying to do a model where one “vector” has a dynamic size:

library(rstan)

model.code <- "

data{
int<lower=1> T;
real<lower=0> median;
int s;
real y[T];
}
parameters {
real<lower=0> p;
real<lower=0> n;
//int s;
real<lower=0> delta;
real<lower=0> tau;
real<lower=0> t0[s];
real<lower=0> sigma;
real<lower=0> mu;



}
model{
 real ps[T];
 //p ~ beta(1,1);
 //n ~ uniform(0,5);
 //s ~ binomial(n,p); 
 delta ~ uniform(0,5);
 t0 ~ uniform(1,T);
 tau ~ inv_gamma(0.001,0.001) ;
 for (i in 1:T){
  ps[i] =  normal_lpdf( y[i] | mu,sigma );
  for(j in 1:s){
    if(t0[j] <= i){
      ps[i] = ps[i] + normal_lpdf( y[i] | median * delta * exp(-( i - t0[j] ) / tau),sigma );
     }
   }
 }
target += log_sum_exp(ps);
}
"
data <- rnorm(100)+5*rpois(100,1) // bad dummy data (should have some exp. decay)
med <- median(data)
model.dat <- list(T=100,median=med,s=5,y=data)
stan.fit <- stan(model_code=model.code,
                 model_name="lm_decay",
                 data=model.dat)

As you can see I have commented the part with the binomial part and added it as a parameter:

 //p ~ beta(1,1);
 //n ~ uniform(0,5);
 //s ~ binomial(n,p); 

However, I’d like not to comment it and have it as part of my model.
Is that possible?


#2

Not directly, why do this thing?


#3

To model an unknown number of exponential decay shock in a time serie model
and have a second shock when the first is not finished.
You could model the number of “parallel” possible shocks as a binomial.


#4

So n can only be an integer, not a real number,and only takes on values 0, 1, …, 5, which makes it easy to sum out. Check out th emixture chapter.


#5

Thanks @sakrejda.
n is a natural number ℕ = {0, 1, 2, …,T} where T is the sample size.
I am trying to follow the example but something is wrong:

model.code <- "
functions {
  int num_zero(int[] y) {
    int nz;
    nz = 0;
    for (n in 1:size(y)){
      if (y[n] == 0){
        nz = nz + 1;
      }
    }
    return nz;
  }
}
data{
int<lower=1> T;
real<lower=0> median;
real y[T];
int<lower=0> N_trials;
}
transformed data {
  int<lower=0, upper=T> trials[N_trials];
  int<lower=0, upper=T> s;
  s = num_zero(trials);
}
parameters {
real<lower=0> p_trials;
real<lower=0> delta;
real<lower=0> tau;
real<lower=0> sigma;
real alpha;
real<lower=0> t0[s];

  


}
model{
 real ps[T];
 alpha ~ normal(0,10);    
 sigma ~ cauchy(0,5);
 p_trials ~ beta(1,1);
 trials ~ bernoulli(p_trials); 
 delta ~ uniform(0,5);
 t0 ~ uniform(1,T);
 tau ~ inv_gamma(0.001,0.001);
 for (i in 1:T){
  ps[i] = 0;
  for(j in 1:s){
    if(t0[j] <= i){
      ps[i] = ps[i] +  median * delta * exp(-( i - t0[j] ) / tau);
     }
   }
 }
 for (nn in 1:T){
  y[nn] ~ normal(alpha  + ps[nn]  , sigma);
 }
}
"
data <- rnorm(100)+5*rpois(100,1)
med <- median(data)
model.dat <- list(T=100,median=med,N_trials=5,y=data)
stan.fit <- stan(model_code=model.code,
                 model_name="lm_decay",
                 data=model.dat)

I am just doing a Bernoulli and summing the 0 (I could sum the 1 but for the sake of the example it is the same) and storing the result in s:

int<lower=0, upper=T> trials[N_trials];
int<lower=0, upper=T> s;
s = num_zero(trials);
...
 p_trials ~ beta(1,1);
 trials ~ bernoulli(p_trials); 

And .I am getting this error:

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  model15dbd43ae3188_lm_decay_namespace::model15dbd43ae3188_lm_decay: trials[k0__] is -2147483648, but must be greater than or equal to 0
trying deprecated constructor; please alert package maintainer
Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  no valid constructor available for the argument list
failed to create the sampler; sampling not done

What am I missing?


#6

trials is undefined here.


#7

A vector is a data type in Stan—you’re talking about arrays here.

Stan doesn’t support integer parameters—any integers need to be marginalized out.

Also, the number of parameters needs to remain fixed across iterations. Data and transformed data are fixed once after data load. Every other vector size is dynamic.

So there’s really not a way to handle the kind of model you’re trying to build.

We have alternative prior recommendations to BUGs (which you appear to be follwoing with diffuse inverse gammas and uniform on constrained intervals)—their recommendations can lead to both statistical and computational problems. The manual chapter on regression has details and reference, and there’s a wiki we have on priors.


#8

Thanks a lot for all your help.
I am trying to adapt my model such as I won’t have this problem.
One of the thing I was thinking of is allowing peak at every point but with a prior that is concentred at 0.
Initially I was thinking of doing a prior that is a mixture model but I don’t think it will work.
Really what I want is to have a heaviside step function with a dynamic “x” and when it is not 0 multiply it by a gamma prior.
I understand it is a bit hard to understand.
I am basically trying to fit something like that:


generate_data <- function(n_point=100,noise_mu=1000,noise_sigma=5,n_peak=5,coef_shape=5,coef_rate=1,tau_min=3,tau_max=10){
  idx <- rbinom(n_peak,n_point, runif(n_peak,0,1))
  noise <-  rnorm(n_point,noise_mu,noise_sigma)
  init_data <- noise 
  coef_delta <- rgamma(n_peak,coef_shape,coef_rate)
  tau <- runif(n_peak,tau_min,tau_max)
  data <- init_data
  for (i in seq_along(idx)){
    v <- idx[i]
    delt <- coef_delta[i]
    t <- tau[i]
    data <- data + median(init_data) * delt * as.numeric(seq(1,n_point) >= v ) * c( rep(0,v-1),   exp(-c(seq(0,n_point-v)/t    )))   
  }
  list(
    data = data,
    tau = tau,
    delta = coef_delta,
    peak_idx = idx,
    noise = noise
  )
}

Where any of the parameters from the function could change.
i.e.


generated <- generate_data()

data <- generated$data

plot(data,t="l")

The model I came up with is this one:


model.code <- "

data{
int<lower=1> T;
real y[T];
}

transformed data{
  real<lower=0> median; //median of y
  int middle; // idx of middle data
  real y_sorted[T];
  middle = T / 2;
  y_sorted= sort_asc(y);
  if (T % 2)
	{
		median = (y_sorted[middle] + y_sorted[middle + 1]) / 2.0;
	}
	else
	{
		median = y_sorted[middle ];
	}
}

parameters {
  real<lower=0> delta[T];
  real<lower=0.0000001> tau[T];
  real<lower=0> sigma;
}

transformed parameters {
  vector[T] decay;                  // decay
  for (i in 1:T){
    decay[i] = 0;
    for(j in 0:(i-1)){
        decay[i] = decay[i] +  median * delta[i-j] * exp(-j / tau[i-j]);
    }
  }
}

model{
 real ps[T];
 sigma ~ cauchy(0,5);
 delta ~ exponential(1);
 y ~ normal( decay  , sigma);
 
}
generated quantities{
 real ps_rep[T];
 real y_rep[T];
  for (i in 1:T){
  ps_rep[i] = 0;
  for(j in 0:(i-1)){
      ps_rep[i] = ps_rep[i] +  median * delta[i-j] * exp(-j / tau[i-j]);
   }
  }

 for (nn in 1:T){
  y_rep[nn] =  ps_rep[nn] ; 
 }


}
"

However, as you can see I am not really handling the delta as I should.
And I end up having my sampler not happy:

model.dat <- list(T=100,y=data)
stan.fit <- stan(model_code=model.code,
                 model_name="lm_decay",
                 data=model.dat)
There were 4000 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmupThere were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

More importantly the fit is just bad

library(bayesplot)
y_rep <- rstan::extract(stan.fit,"y_rep")
ppc_dens_overlay(y = data,
                 yrep = y_rep[[1]][1:100,])

Could someone advise me on how I could solve this or just give me a hint?


#9

I’m afraid that’s too much to digest. It looks like a standard generative model until you get to the definition of data. I couldn’t follow the medians or the R.


#10

Thanks @Bob_Carpenter to take the time to at least read it.
I will work on it and try to figure it by myself.
What I am trying to model is shocks that follow an exponential decay model.
I am using the median * delta as the amplitude of the shocks (the idea was that delta tell me the order of magnitude of the shocks). tau control the length of it.
Therefore, the idea is that you have some “noise” (modeled by a normal distribution in R) and at some random point you have shocks ( median * delta(t-t0) * exp(-(t-t0)/tau) where median * delta(t-t0) is the amplitude and tau is the decay) that are additive (you can have a second shocks while the “first” shock is not finished).
If for example you think about it as modeling website traffic. You have the “normal” traffic that is modeled with the “noise” data (the name is all except intuitive ahaha) and at some random point of time you’re having an article the goes viral that generate a shock that follow an exponential decay. Assuming you post new article everyday you can virtually have new shocks everyday (but it is very unlikely). Therefore, I need a model that is flexible enough to allow new shocks at every point while keeping track of the “active” shocks.
Initially I wanted to allow the number of shocks to be drawn from a prior but from your previous answer it is not possible.
Therefore, I thought of allowing new shock at each points and having delta being a mixture concentrated at 0 and a gamma distribution (so you’ll not have new shock at every points) but I didn’t really like this option either. I guess I could also add a Bernoulli as my indicator if there is a shock at each point (i.e. s(t-t0) * median * delta(t-t0)).


#11

But you can have a large number of them and a Dirichlet prior that favors subsetting and get a close approximation of a Dirichlet Process. It’s just very hard to fit (and would be even if we implemented DPs in the usual way).

Thanks, the description was helpful.

This gets hard to fit because of the combinatorics if you have a bunch of indicators and the outputs interact (as in behaving as regression coefficients). Something like Dirichlet(1e-4) really likes to have a lot of near-zero values in the simplexes it generates. They’ll round to zero with floating point in many cases.


#12

Thanks a lot again for answering me.
From what you said:

Am I missing an easier solution for such cases?
Additionally, I know that there are lot of resources for stan and was wondering if there was some code/litterature somewhere I could be using that could help me.
I have tried to run some code but I am a bit stuck on how to do the subsetting part.
Where I am where I know it does not work…


parameters {
  real<lower=0.0000001> tau[T];
  real<lower=0> sigma;
  simplex[2] mix[T];
}
transformed parameters {
  vector[T] decay;                  // decay
  real delta[T];
  for (i in 1:T){
    delta[i] = log(exp(log(mix[i,1])+gamma_lpdf(y[i]|5,1)) + exp(log(mix[i,2]) * 0.0000000001)); //this is for sure not good..
  }
  for (i in 1:T){
    decay[i] = 0;
    for(j in 0:(i-1)){
        decay[i] = decay[i] +  median * delta[i-j] * exp(-j / tau[i-j]);
    }
  }
}

#13

The basic idea is that you just have a mixture model with K components where K is larger than you think you need. But now that I think harder about it, I don’t see how to map it onto the impulse placement problem you’re dealing with.

It might work if you just assume there are a large number K of impulses (more than you actually see), then you have

parameters {
  simplex[K] theta;  // mixture ratios
  ...
model {
  theta ~ dirichlet(rep_vector(1e-4, K));
  ...

That is you model it as a mixture of K impulses. But many of the theta[i] will be zero, so it’ll wind up not quite being like K impulses. No idea if that’ll work. Usually it’s a straight mixture problem without the ordering/spline basis.


#14

Thanks bob,
Sadly it does not look to work well.
As you suspected the theta get mostly on the first parameters.

Another way I can generate data is as a mixture of 0 and a gamma

generate_data_2 <- function(n_point=100,noise_mu=1000,noise_sigma=5,coef_shape=5,coef_rate=1,tau_min=3,tau_max=10,prop.shock=0.02){
  
  noise <-  rnorm(n_point,noise_mu,noise_sigma)
  init_data <- noise
  #create mixture of 0 and gamma
  mix <- density(c(rgamma(1000000*prop.shock,coef_shape,coef_rate) , rbeta(1000000*(1-prop.shock),0,1)))
  #draw from it
  delta <- sample(mix$x,n_point,prob=mix$y,replace=T)
  tau <- runif(n_point,tau_min,tau_max)
  data <- init_data
  for (i in seq(n_point)){
    for (j in 0:(i-1)){
      data[i] <- data[i] +  median(init_data) * delta[i-j] * exp(-j / tau[i-j]);
    }
  }
  list(
    data = data,
    tau = tau,
    delta = delta,
    noise = noise
  )
}

I wonder if I could translate that into stan using some kind of mixture prior.


#15

Then you probably don’t want to use a K that’s so large.


#16

Aren’t you already using a mixture prior? I thought that’s what we were talking about.


#17

I think overall I am a bit confused ahaha.
What I am trying to model is shocks that follow this:
Given a shock i we have δ(t−t0(i))∗C(i)∗exp(−(t−t0(i))/τ(i))
Where δ(t−t0(i)) is the amplitude of the shock that follow image.
C(i) is the initial value (i.e. median of the data). t0(i) is the initial time of the shock and τ(i) is the decay of the shock i.
Therefore assuming you can have multiple shocks we have:
image
where n is the total number of shocks.
The alternative that I initially thought about was to change this and have shocks at each point.
In order to do that I change δ as follow image.
Therefore we will have a bunch of δ that are 0 that allow the subsetting.
Which means we now have:
image
Where I am confused is how to model δ in stan such that I can do prediction on out of sample data.


#18

To do prediction out of sample, you need to generate from all the parameters distributions. Just code the simulation in the generated quantities block. For instance, in a normal, to predict new points you’d do this:

generated quantities {
  vector[N_new] y;
  for (n in 1:N_New) y[n] = normal_rng(mu, sigma);
}

If your generative process is more complex, you need to code that. For instance, in a regression, mu is replaced with x_new * beta where x_new is the data matrix for the new observations and beta is the coefficient vector.


#19

I was curious about this as I also use impulse functions, though at the moment in an observed form. Maybe I’ve read through too quickly, but, assuming you’re working in discrete time, couldn’t you just sample the number of impulses per time point from a binomial distribution, and multiply your delta effect by that? Or collapse your ‘number of deltas’ and ‘magnitude of deltas’ parameters into one, and use a continuous distribution instead of a binomial.


#20

Hi @Charles_Driver,
The problem with using the binomial is that you need two things:

  1. Number of shocks/impulse
  2. start time of the impulse.

However, this would result in dynamic vector size which would be dictated by the number of shocks you sampled hence my initial topic :-).
I am not sure what you mean by collapse the number of delta and magnitude of delta.