Is it possible to compute log_sum_exp(matrix) and get a column (so just apply the operation on each element in every row) or get a row (so apply the operation on each element in every column)? I am trying to speed up the forward algorithm in a hidden Markov model longitudinal problem. But right now it loops thru each person, each period and all the possible states. It is quite slow. I want to see if I can increase the speed if I can do this log_sum_exp operation for every person together in a vectorized manner.

This is the code that I am running. The forward-algorithm part under the `model{}`

is looping every person (I), every period (T), and every state (S). I want to vectorize that portion of the code. The sticking point is the use of `log_sum_exp`

. Thank you.

```
data {
int<lower=1> S; // num states
int<lower=1> I; // num customers
int<lower=1> T; // num calibration periods
int<lower=0> Y[I,T]; // observed behavior
int<lower=1> K; // Binomial number trials
}
parameters {
matrix[S*(S-1), I] z; // indep normals for transition prob.
cholesky_factor_corr[S*(S-1)] L_Omega; // cholesky corr. for transition prob.
row_vector[S*(S-1)] mu_theta; // mean of unconstrained transition prob.
vector<lower=0,upper=pi()/2>[S*(S-1)] tau_unif; // scaled variance of transition prob.
simplex[S+1] mu_unif_phi; // transformed state dependent prob.
simplex[S] ppi; // initial prob.
}
transformed parameters{
matrix[I,S*(S-1)] theta; // individual transition parameters
vector<lower=0>[S*(S-1)] tau; // variance of transition prob.
matrix[S,S] log_Q[I]; // log transition prob.
for (s in 1:S*(S-1))
tau[s] = 2.5 * tan(tau_unif[s]);
theta = rep_matrix(mu_theta,I) + (diag_pre_multiply(tau,L_Omega) * z)'; // heterogenity for transition prob.
for (i in 1:I){
for (k in 1:S){
row_vector[S-1] ttheta;
ttheta = theta[i,((S-1)*(k-1)+1):((S-1)*k)];
log_Q[i,k,]=to_row_vector(log_softmax(to_vector(append_col(ttheta,0))));
// add the zero at the end, so vector is size S,
// and compute softmax (multinomial logistic link)
}
}
}
model {
to_vector(z) ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(2);
mu_theta ~ normal(0, 2);
mu_unif_phi~dirichlet(rep_vector(1.0,S+1));
{
vector[S] mu_phi = head(cumulative_sum(mu_unif_phi),S);
// Forward algorithm
for (i in 1:I){
real acc[S];
real gamma[T,S];
for (k in 1:S)
gamma[1,k] = log(ppi[k])+ binomial_lpmf(Y[i,1]|K,mu_phi[k]);
for (t in 2:T) {
for (k in 1:S) {
for (j in 1:S){
acc[j] = gamma[t-1,j] + log_Q[i,j,k] + binomial_lpmf(Y[i,t]|K,mu_phi[k]) ;
}
gamma[t,k] = log_sum_exp(acc);
}
}
target +=log_sum_exp(gamma[T]);
}
}
}
```