Equivalent of pyro `mask`?

enetsee · July 12, 2019, 3:31pm

There is an interesting blog post from Uber engineering showing how to vectorize a Pyro model using pyro.plate and pyro.mask.

An non-optimized Stan version of the same model is given below.

In the Pyro version, pyro.mask allows each branch of the conditional in the for loop to be vectorized. Is there an equivalent approach in Stan?

Cheers,

Micky

functions {
  vector softplus(vector x) {
    return log1p(exp(x));
  }
}

data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y_obs;
    int<lower=0,upper=1> censored[N];
}

parameters { 
  real alpha;
  real beta;
}

transformed parameters {
  vector[N] scale;
  scale = softplus(alpha * x + beta);
}


model { 
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  
  for(i in 1:N) {
    if(censored[i]) {
      censored[i] ~ bernoulli_lpmf(1.0 - exponential_cdf(y_obs[i],1/ scale[i]));
    } else {
      y_obs[i] ~ exponential(1/ scale[i]);
    }
  }
}

enetsee · July 13, 2019, 11:49am

To answer my own question, you can achieve this by manually transforming the data into two vectors for censored and uncensored observations.

functions {
  vector softplus(vector x) {
    return log1p(exp(x));
  }
}

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y_obs;
  int<lower=0,upper=1> censored[N];
}

transformed data {
  int<lower=0> N_censored = sum(censored);
  int<lower=0> N_uncensored = N - N_censored;
  
  int<lower=1,upper=N_censored+1> c;
  int<lower=1,upper=N_uncensored+1> u;
  
  vector[N_censored] x_censored;
  vector[N_censored] y_censored;
  vector[N_uncensored] x_uncensored;
  vector[N_uncensored] y_uncensored;
  
  u = 1;
  c = 1;
  for(i in 1:N){
    if(censored[i]){
      x_censored[c] = x[i];
      y_censored[c] = y_obs[i];
      c += 1;
    } else {
      x_uncensored[u] = x[i];
      y_uncensored[u] = y_obs[i];
      u += 1;
    }
  }
}

parameters { 
  real alpha;
  real beta;
}

transformed parameters {
  vector[N_uncensored] scale_uncensored;
  vector[N_censored] scale_censored;
  vector[N_censored] q;
  
  scale_uncensored = softplus(alpha * x_uncensored + beta);
  scale_censored = softplus(alpha * x_censored + beta);
  
  for(i in 1:N_censored){
    q[i] = exponential_cdf(y_censored[i], 1 / scale_censored[i]);
  }
}

model { 
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  1 ~ bernoulli(1 - q);
  y_uncensored ~ exponential(1 ./ scale_uncensored);
}

Note that I’m still looping in the transformed data block to populate q. I’m getting a dimension mismatch error when I the the following:

q = exponential_cdf(y_censored, rate_censored);

Dimension mismatch in assignment; variable name = q, type = vector; right hand side type = real.

The functions reference indicates that cdf functions support vectorization so I’m clearly doing something wrong - any ideas?

Cheers,

Micky

martinmodrak · July 15, 2019, 9:38am

The message is telling you that exponential_cdf returns a single number, even if the parameters are vectors. The single number is the multiple of all _cdf calls for each of the vector elements. I am not sure why that is or what it is useful for. For the _lpdf functions the vectorized versions return a single number that is the sum of the log probabilities over the vector, which makes sense, so maybe this is by analogy?

Topic		Replies	Views
Converting PYMC to Stan Model Modeling	0	328	August 20, 2023
Vectorized if-else? Modeling	2	995	June 25, 2019
Transformed parameters with target += Modeling techniques	1	675	January 11, 2019
How to vectorize a for loop in Stan? Modeling	2	360	April 6, 2023
Help with vectorizing Stan program Modeling	9	759	January 1, 2021

Equivalent of pyro `mask`?

Related topics