Equivalent of pyro `mask`?

There is an interesting blog post from Uber engineering showing how to vectorize a Pyro model using pyro.plate and pyro.mask.

An non-optimized Stan version of the same model is given below.

In the Pyro version, pyro.mask allows each branch of the conditional in the for loop to be vectorized. Is there an equivalent approach in Stan?

Cheers,

Micky

functions {
  vector softplus(vector x) {
    return log1p(exp(x));
  }
}

data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y_obs;
    int<lower=0,upper=1> censored[N];
}

parameters { 
  real alpha;
  real beta;
}

transformed parameters {
  vector[N] scale;
  scale = softplus(alpha * x + beta);
}


model { 
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  
  for(i in 1:N) {
    if(censored[i]) {
      censored[i] ~ bernoulli_lpmf(1.0 - exponential_cdf(y_obs[i],1/ scale[i]));
    } else {
      y_obs[i] ~ exponential(1/ scale[i]);
    }
  }
}

1 Like

To answer my own question, you can achieve this by manually transforming the data into two vectors for censored and uncensored observations.

functions {
  vector softplus(vector x) {
    return log1p(exp(x));
  }
}

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y_obs;
  int<lower=0,upper=1> censored[N];
}

transformed data {
  int<lower=0> N_censored = sum(censored);
  int<lower=0> N_uncensored = N - N_censored;
  
  int<lower=1,upper=N_censored+1> c;
  int<lower=1,upper=N_uncensored+1> u;
  
  vector[N_censored] x_censored;
  vector[N_censored] y_censored;
  vector[N_uncensored] x_uncensored;
  vector[N_uncensored] y_uncensored;
  
  u = 1;
  c = 1;
  for(i in 1:N){
    if(censored[i]){
      x_censored[c] = x[i];
      y_censored[c] = y_obs[i];
      c += 1;
    } else {
      x_uncensored[u] = x[i];
      y_uncensored[u] = y_obs[i];
      u += 1;
    }
  }
}

parameters { 
  real alpha;
  real beta;
}

transformed parameters {
  vector[N_uncensored] scale_uncensored;
  vector[N_censored] scale_censored;
  vector[N_censored] q;
  
  scale_uncensored = softplus(alpha * x_uncensored + beta);
  scale_censored = softplus(alpha * x_censored + beta);
  
  for(i in 1:N_censored){
    q[i] = exponential_cdf(y_censored[i], 1 / scale_censored[i]);
  }
}

model { 
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  1 ~ bernoulli(1 - q);
  y_uncensored ~ exponential(1 ./ scale_uncensored);
}

Note that I’m still looping in the transformed data block to populate q. I’m getting a dimension mismatch error when I the the following:

q = exponential_cdf(y_censored, rate_censored);

Dimension mismatch in assignment; variable name = q, type = vector; right hand side type = real.

The functions reference indicates that cdf functions support vectorization so I’m clearly doing something wrong - any ideas?

Cheers,

Micky

1 Like

The message is telling you that exponential_cdf returns a single number, even if the parameters are vectors. The single number is the multiple of all _cdf calls for each of the vector elements. I am not sure why that is or what it is useful for. For the _lpdf functions the vectorized versions return a single number that is the sum of the log probabilities over the vector, which makes sense, so maybe this is by analogy?