# Integer array in parameters block

Hi, I’m a novice currently working on a project involving IBP stick-breaking. I have run into an issue: the Bernoulli distribution in my model block requires an integer matrix declared in the parameters block, which Stan does not allow. I was wondering whether there are any methods I can use to avoid this situation.

Below are the pieces of code that I have written:

v_l \sim \mathrm{Beta}(\alpha, 1),

\theta_{(k)} = \prod_{l=1}^{k} v_l,

\gamma_{jk} \mid \theta_{(k)} \sim \mathrm{Bernoulli}(\theta_{(k)}),

where j = 1, \ldots, G and k = 1, \ldots, K.
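As a quick numerical aside, the stick-breaking construction above can be sketched in Python (this illustration is separate from the Stan code; the values of `K` and `alpha` are arbitrary placeholders):

```python
import random

# Stick-breaking construction for the IBP feature probabilities:
# theta_(k) = prod_{l=1}^{k} v_l, with v_l ~ Beta(alpha, 1).
def stick_breaking(K, alpha, rng):
    theta = []
    prod = 1.0
    for _ in range(K):
        v = rng.betavariate(alpha, 1.0)  # v_l in (0, 1)
        prod *= v                        # running product theta_(k)
        theta.append(prod)
    return theta  # non-increasing sequence in (0, 1)

theta = stick_breaking(5, 2.0, random.Random(0))
```

Because each v_l lies in (0, 1), the theta values decay toward zero, which is what lets a finite truncation level K approximate the infinite IBP.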

parameters {
...
vector<lower=0, upper=1>[K] v;
int<lower=0, upper=1> gamma[G,K];
...
}

transformed parameters{

vector<lower=0, upper=1>[K] theta; // cumulative products are not a simplex
{
theta = v;

for(k in 2:K){
theta[k] = theta[k-1] * v[k];
}

}

}

model {

v ~ beta(alpha, 1);

for(j in 1:G){
for(k in 1:K){
gamma[j,k] ~ bernoulli(theta[k]);
}
}

...

}


Thanks.

What’s not clear from this question is how the parameter gamma gets used in the model. In general, when you want to include integer parameters, the solution in Stan is to marginalize over those parameters (which often leads to a more computationally efficient model regardless of whether you are using Stan or not). But whether this marginalization is feasible, and how it can be implemented, depends on the details of how the likelihood is computed conditional on gamma.

If the likelihood can be factorized as the product of likelihoods associated with particular observations (or groups of observations), each of which depends only on one element of gamma, then the marginalization will be straightforward. If there are observations whose likelihoods depend jointly on many elements of gamma, then the marginalization might be combinatorially hard/intractable.

However, if the dependence on multiple elements of gamma takes one of several particular forms, there might be efficient algorithms for computing the marginalization without dealing with the exponential complexity. For example, in hidden Markov models we can use the forward algorithm to efficiently compute the marginalized likelihood.
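For the easy case, where an observation's likelihood depends on a single binary indicator, the sum over that indicator can be done directly in log space. A minimal Python sketch (the two normal components and the specific numbers are just placeholders, not the model in this thread):

```python
import math

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def normal_logpdf(x, sigma):
    # log density of a Normal(0, sigma) distribution
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - 0.5 * (x / sigma) ** 2

# Marginalize one Bernoulli(theta) indicator g out of a two-component
# model: p(x) = (1 - theta) * f0(x) + theta * f1(x).
def log_marginal(x, theta, sigma0, sigma1):
    return log_sum_exp(
        math.log1p(-theta) + normal_logpdf(x, sigma0),  # g = 0 branch
        math.log(theta) + normal_logpdf(x, sigma1),     # g = 1 branch
    )
```

In Stan this same two-component pattern is what the built-in `log_mix` function computes.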

Thanks a lot!

I want to use ‘gamma’ to construct a spike-and-slab prior like this:

\pi(\beta_{jk}|\gamma_{jk}) = (1-\gamma_{jk}) \psi(\beta_{jk}|\lambda_{0k})+\gamma_{jk}\psi(\beta_{jk}|\lambda_{1})

where \psi(\beta \mid \lambda) = \frac{\lambda}{2}\exp\{ -\lambda|\beta| \} is a Laplace prior.

Is \beta data, or is it another parameter?

Another parameter.

I’m writing a factor model.

This looks like a combinatorially hard marginalization. I can’t guarantee that there’s no efficient trick to compute it, particularly because I still haven’t seen the full model written out. But it looks like each data point will depend jointly on all the betas, and therefore on all the gammas. To fit this properly, you would need to marginalize over all 2^{GK} possible values of the gamma parameter. This might be tractable if G and K are quite small, but it quickly becomes infeasible as those values grow. The marginalization could be computed by evaluating the full log-likelihood conditional on every possible value of gamma and then taking the log_sum_exp of the results.
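The brute-force marginalization just described can be sketched as follows; `joint_loglik` is a placeholder for the model's full joint log-density conditional on an indicator vector:

```python
import itertools
import math

def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

# Enumerate every configuration of n binary indicators and log-sum-exp
# the joint log-densities: log p(data) = log sum_g p(data, g).
def marginal_loglik(n, joint_loglik):
    terms = [joint_loglik(g) for g in itertools.product((0, 1), repeat=n)]
    return log_sum_exp(terms)  # 2**n terms: infeasible once n = G*K is large
```

As a sanity check: if the joint factorizes as independent Bernoulli(0.25) priors times a constant likelihood, the priors sum out to 1 and the marginal equals that constant.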

Factor model:

\mathbf{Y} = \mathbf{B} \boldsymbol{\Omega} + \mathbf{E}

Where the observation matrix \mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_i, \ldots, \mathbf{y}_N) \in \mathbb{R}^{G \times N}, the loading matrix \mathbf{B} \in \mathbb{R}^{G \times K}, the latent factor matrix \boldsymbol{\Omega} = (\boldsymbol{\omega}_1, \ldots, \boldsymbol{\omega}_i, \ldots, \boldsymbol{\omega}_N) \in \mathbb{R}^{K \times N}, and the idiosyncratic matrix \mathbf{E} \in \mathbb{R}^{G \times N}. \mathbf{Y} is observed data and we want to estimate \mathbf{B}.

For each observation \mathbf{y}_i \in \mathbb{R}^{G}

\mathbf{y}_i \mid \boldsymbol{\omega}_i, \mathbf{B}, \mathbf{\Sigma} \sim \mathcal{N}_G( \mathbf{B}\boldsymbol{\omega}_i, \mathbf{\Sigma}), \quad \boldsymbol{\omega}_i \sim \mathcal{N}_K( \mathbf{0}, \mathbf{I}_K)

Where \mathbf{\Sigma} = \mathrm{diag}\{ \sigma_j^2 \}_{j=1}^G and i = 1, \ldots, N.

Marginally,

\mathbf{y}_i \mid \mathbf{B}, \mathbf{\Sigma} \sim \mathcal{N}_G( \mathbf{0}, \mathbf{B}\mathbf{B}^T + \mathbf{\Sigma} )
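For completeness, the marginal covariance follows in one line from \operatorname{Cov}(\boldsymbol{\omega}_i) = \mathbf{I}_K and the independence of \boldsymbol{\omega}_i and the noise:

\operatorname{Cov}(\mathbf{y}_i) = \mathbf{B}\operatorname{Cov}(\boldsymbol{\omega}_i)\mathbf{B}^T + \mathbf{\Sigma} = \mathbf{B}\mathbf{I}_K\mathbf{B}^T + \mathbf{\Sigma} = \mathbf{B}\mathbf{B}^T + \mathbf{\Sigma}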

The whole code is like this:

data {
int<lower=0> G;
int<lower=0> N;
matrix[G,N] Y;
int<lower=0> K;
real<lower=0> alpha;
}

parameters {
vector<lower=0, upper=1>[K] v;
cov_matrix[G] Sigma; // the model treats Sigma as diagonal; only its diagonal gets a prior below
int<lower=0, upper=1> gamma[G,K];
matrix[G,K] B;
}

transformed parameters{

vector<lower=0, upper=1>[K] theta;
{
theta = v;

for(k in 2:K){
theta[k] = theta[k-1] * v[k];
}

}

}

model {

v ~ beta(alpha, 1);

for(j in 1:G){
for(k in 1:K){
gamma[j,k] ~ bernoulli(theta[k]);
}
}

for(j in 1:G){
Sigma[j,j] ~ inv_gamma(0.5,0.5);
}

// spike-and-slab prior on the loadings B
for(j in 1:G){
for(k in 1:K){
target += log_sum_exp( log(1-gamma[j,k]) + double_exponential_lpdf(B[j,k]|0,100),
log(gamma[j,k])  + double_exponential_lpdf(B[j,k]|0,0.1)
);
}
}

for(i in 1:N){
target += multi_normal_lpdf(col(Y, i) | rep_vector(0, G), B*B' + Sigma);
}

}



K is large, and I want to set K=100. It seems infeasible.

By the way, would this be feasible in Nimble?

Models like this are nominally feasible in Nimble, but in general accurate posterior inference in these settings is hard. The basic intuition is that there are in excess of 2^100 combinations, and the sampler necessarily will visit only a tiny fraction of them, and so it is difficult to be confident that there aren’t important modes in unexplored parts of the parameter space.


Thank you very much