 Slow sampling for latent variable model

I am trying to run a relatively simple latent variable model on a matrix of count data (code and replication data provided; a brief explanation of the model is at the end). The problem is that the sampling time seems incredibly long considering how small my data set is. The dimensions of my document-term matrix are 1600x1800.
multinomial_wordfish.stan (2.7 KB)

When applied to my dataset, the estimated times are the following:

1000 transitions using 10 leapfrog steps per transition would take 15500 seconds.

1000 transitions using 10 leapfrog steps per transition would take 16110 seconds.

1000 transitions using 10 leapfrog steps per transition would take 16470 seconds.

1000 transitions using 10 leapfrog steps per transition would take 16460 seconds.

I have tried tightening the priors and initializing at a non-random starting point (following Slapin and Proksch (2008), whose Wordfish model is closely related to mine). However, it has taken more than 12 hours for the chains to run the first 200 iterations, which is way beyond the estimates. I wonder whether this behavior is normal, or whether there is something I'm not taking into account.

I'm on Windows with a 4-core processor; the Python version is 3.7.11 and PyStan is 2.19.1.1.

Any suggestions would be appreciated, as I'm not very familiar with Stan.

Here follows a brief explanation of the model I am trying to estimate:

[y_{i1}, \ldots, y_{iJ}] \sim \text{Multinomial}(\pi_{i1}, \ldots, \pi_{iJ}; y_{i+})

where y_{ij} are the counts of word j in document i and y_{i+} is the total length of document i. The event probabilities are given by

\pi_{ij} = \frac{\exp(\psi_j + \theta_i \beta_j)}{\sum_{j'=1}^{J} \exp(\psi_{j'} + \theta_i \beta_{j'})}

where \psi_j are word fixed effects, \beta_j are the ideological positions of words, and \theta_i the ideological positions of documents.
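As a concrete illustration (a standalone sketch with made-up toy values), the event probabilities are just a row-wise softmax of the matrix of logits \psi_j + \theta_i \beta_j:

```python
import numpy as np
from scipy.special import softmax

# Toy values (hypothetical): 2 documents, 3 words
psi   = np.array([0.5, -0.2, 0.1])   # word fixed effects
beta  = np.array([1.0, -1.0, 0.0])   # word ideological positions
theta = np.array([0.8, -0.5])        # document ideological positions

# pi[i, j] = exp(psi_j + theta_i * beta_j) / sum_j' exp(psi_j' + theta_i * beta_j')
pi = softmax(psi[None, :] + theta[:, None] * beta[None, :], axis=1)

# Each row is a valid probability vector over the vocabulary
assert np.allclose(pi.sum(axis=1), 1.0)
```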

The following code simulates a dataset similar to mine (I couldn’t upload mine because it is >4MB) and runs the model:

import numpy as np
from scipy.special import softmax
import pystan as stan

## Simulate data
np.random.seed(1)

D = 1500
V = 1500
N = V * D
C = np.random.randint(10,1000, size = D)

theta = np.random.normal(size = D)
theta = np.expand_dims(theta,axis = -1)
beta  = np.random.normal(scale = 3, size = V)
beta  = np.expand_dims(beta,axis = -1)
psi   = np.random.normal(scale = 3, size = V)
psi   = np.expand_dims(psi,axis = -1)

pi  = softmax(np.transpose(psi) + np.multiply(theta,np.transpose(beta)),axis=1)

DTM = np.empty((D,V))
for i in range(0, D):
    DTM[i, :] = np.random.multinomial(C[i], pi[i, :])

# Read and compile the model in C++
with open('multinomial_wordfish.stan', 'r') as f:
    multinomial_wordfish = f.read()
sm = stan.StanModel(model_code=multinomial_wordfish)

test_data = {'D': D,
             'V': V,
             'N': N,
             'counts': DTM.astype('int').flatten(),
             'neg_id': theta.argmin(),
             'pos_id': theta.argmax()}
fit = sm.sampling(data=test_data, iter=2000, chains=4, init='0')
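One thing worth double-checking before sampling (a standalone sketch with toy sizes, since `DTM` itself is too large to show here): `flatten()` is row-major by default, so in Stan's 1-based indexing document i occupies entries V*(i-1)+1 through V*i of `counts`, which is what the slice in the model block assumes.

```python
import numpy as np

D, V = 4, 6                            # toy sizes standing in for 1500 x 1500
DTM = np.arange(D * V).reshape(D, V)   # stand-in document-term matrix
counts = DTM.flatten()                 # row-major ('C' order) by default

for i in range(1, D + 1):              # Stan-style 1-based document index
    # Stan's counts[V*(i-1)+1 : V*i] corresponds to this 0-based slice:
    segment = counts[V * (i - 1): V * i]
    assert (segment == DTM[i - 1]).all()
```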

To clarify, I suspect that the part that is slowing down the sampling is the following:

for (i in 1:D) {
  counts[(V*(i-1)+1):(V*i)] ~ multinomial(softmax(psi_constrained + theta[i] * beta)); // https://mc-stan.org/docs/2_18/functions-reference/softmax.html
}

That is, the word counts of every document are drawn from a multinomial. I did my best, but is there any way to vectorize this bit?
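Not a Stan answer, but the algebra behind the vectorization can be sketched in NumPy (toy sizes, made-up parameters): summing the multinomial log-kernels over documents is the same as taking the elementwise product of the count matrix with a D x V matrix of log-probabilities, so one matrix expression can stand in for the per-document loop:

```python
import numpy as np
from scipy.special import log_softmax

rng = np.random.default_rng(0)
D, V = 200, 300                        # toy sizes

# Hypothetical stand-ins for psi, beta, theta
psi   = rng.normal(size=V)
beta  = rng.normal(size=V)
theta = rng.normal(size=D)
counts = rng.integers(0, 5, size=(D, V))

# D x V matrix of logits, computed once
eta = psi[None, :] + theta[:, None] * beta[None, :]

# Loop version: one multinomial log-kernel per document
ll_loop = sum(counts[i] @ log_softmax(eta[i]) for i in range(D))

# Vectorized version: elementwise product with the log-probability matrix
ll_vec = (counts * log_softmax(eta, axis=1)).sum()

assert np.isclose(ll_loop, ll_vec)
```

The corresponding move in Stan would be to build the matrix of logits once in the model or transformed parameters block and express the likelihood without the per-document softmax call, though profiling first is sensible.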

You could try profiling to check how much time that part really takes before optimizing too much. See the vignette "Profiling Stan programs with CmdStanR" on the cmdstanr site.