# Log_sum_exp(matrix) by column or by row

Is it possible to apply `log_sum_exp` to a matrix and get back a column vector (the operation applied across each row) or a row vector (the operation applied down each column)? I am trying to speed up the forward algorithm in a longitudinal hidden Markov model. Right now the code loops through each person, each period, and all the possible states, and it is quite slow. I want to see whether I can increase the speed by doing this `log_sum_exp` operation for every person together in a vectorized manner.

This is the code that I am running. The forward-algorithm part under `model {}` loops over every person (I), every period (T), and every state (S). I want to vectorize that portion of the code; the sticking point is the use of `log_sum_exp`. Thank you.

``````stan
data {
  int<lower=1> S;       // num states
  int<lower=1> I;       // num customers
  int<lower=1> T;       // num calibration periods
  int<lower=0> Y[I,T];  // observed behavior
  int<lower=1> K;       // binomial number of trials
}

parameters {
  matrix[S*(S-1), I] z;                            // indep. normals for transition prob.
  cholesky_factor_corr[S*(S-1)] L_Omega;           // Cholesky corr. for transition prob.
  row_vector[S*(S-1)] mu_theta;                    // mean of unconstrained transition prob.
  vector<lower=0,upper=pi()/2>[S*(S-1)] tau_unif;  // scaled variance of transition prob.
  simplex[S+1] mu_unif_phi;                        // transformed state-dependent prob.
  simplex[S] ppi;                                  // initial prob.
}

transformed parameters {
  matrix[I,S*(S-1)] theta;       // individual transition parameters
  vector<lower=0>[S*(S-1)] tau;  // variance of transition prob.
  matrix[S,S] log_Q[I];          // log transition prob.
  vector<lower=0,upper=1>[S] mu_phi;  // state-dependent prob. (mu_phi was used but
                                      // never defined in the original; a cumulative
                                      // sum of the simplex is one way to get S
                                      // increasing probabilities)

  for (s in 1:S*(S-1))
    tau[s] = 2.5 * tan(tau_unif[s]);
  theta = rep_matrix(mu_theta, I) + (diag_pre_multiply(tau, L_Omega) * z)';  // heterogeneity for transition prob.
  mu_phi = cumulative_sum(mu_unif_phi[1:S]);
  for (i in 1:I) {
    for (k in 1:S) {
      row_vector[S-1] ttheta = theta[i, ((S-1)*(k-1)+1):((S-1)*k)];
      // append the zero at the end, so the vector is size S,
      // and compute softmax (multinomial logistic link)
      log_Q[i,k] = to_row_vector(log_softmax(to_vector(append_col(ttheta, 0))));
    }
  }
}

model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  mu_theta ~ normal(0, 2);
  mu_unif_phi ~ dirichlet(rep_vector(1.0, S+1));
  {
    // Forward algorithm
    for (i in 1:I) {
      real acc[S];
      real gamma[T,S];
      for (k in 1:S)
        gamma[1,k] = log(ppi[k]) + binomial_lpmf(Y[i,1] | K, mu_phi[k]);
      for (t in 2:T) {
        for (k in 1:S) {
          for (j in 1:S)
            acc[j] = gamma[t-1,j] + log_Q[i,j,k] + binomial_lpmf(Y[i,t] | K, mu_phi[k]);
          gamma[t,k] = log_sum_exp(acc);
        }
      }
      target += log_sum_exp(gamma[T]);
    }
  }
}
``````

I’m not exactly sure if you can do a whole matrix, but I think there’s a way to speed up at least part of your forward algorithm implementation. Here’s an example in Stan of an HMM with gamma state-dependent distributions:

``````stan
data {
  int<lower=0> T;   // length of the time series
  vector[T] steps;  // step lengths
  int<lower=1> N;   // number of states
}

parameters {
  positive_ordered[N] mu;    // mean of gamma - ordered
  vector<lower=0>[N] sigma;  // SD of gamma
  simplex[N] gamma[N];       // tpm
}

transformed parameters {
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;

  // transform mean and SD to shape and rate
  for (n in 1:N)
    shape[n] = mu[n]*mu[n] / (sigma[n]*sigma[n]);

  for (n in 1:N)
    rate[n] = mu[n] / (sigma[n]*sigma[n]);
}

model {
  vector[N] logp;
  vector[N] logptemp;
  matrix[N,N] log_gamma_tr;

  // priors (for N = 2 states)
  mu[1] ~ normal(1, 1);
  mu[2] ~ normal(5, 1);
  sigma ~ student_t(3, 0, 1);

  // transpose the tpm and take the log of each entry
  for (i in 1:N)
    for (j in 1:N)
      log_gamma_tr[j,i] = log(gamma[i,j]);

  // forward variables at t = 1 (uniform initial distribution)
  for (n in 1:N)
    logp[n] = -log(N) + gamma_lpdf(steps[1] | shape[n], rate[n]);

  // likelihood computation
  for (t in 2:T) {
    for (n in 1:N) {
      logptemp[n] = log_sum_exp(to_vector(log_gamma_tr[n]) + logp)
                    + gamma_lpdf(steps[t] | shape[n], rate[n]);
    }
    logp = logptemp;
  }

  target += log_sum_exp(logp);
}
``````

In the likelihood computation, `log_sum_exp` is applied to one row of the transposed t.p.m. at a time, which removes the innermost loop over previous states. Hope that speeds it up a bit.
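For the binomial HMM in the question, the same trick could look roughly like this. This is only a sketch: it assumes you precompute `log_Q_tr[i] = log_Q[i]'` (the transposed log transition matrix for person `i`, a hypothetical name) in `transformed parameters`, so that row `k` of `log_Q_tr[i]` holds the log probabilities of moving *into* state `k`:

``````stan
// Sketch: forward recursion for person i without the innermost j-loop.
// Assumes log_Q_tr[i] = log_Q[i]' is available.
vector[S] lgamma;
vector[S] lgamma_new;
for (k in 1:S)
  lgamma[k] = log(ppi[k]) + binomial_lpmf(Y[i,1] | K, mu_phi[k]);
for (t in 2:T) {
  for (k in 1:S)
    lgamma_new[k] = log_sum_exp(to_vector(log_Q_tr[i,k]) + lgamma)
                    + binomial_lpmf(Y[i,t] | K, mu_phi[k]);
  lgamma = lgamma_new;
}
target += log_sum_exp(lgamma);
``````

This still loops over persons and periods, but each `log_sum_exp` now handles one whole row of the transition matrix at once.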


You can also use the `'` operator to transpose the matrix. `log_sum_exp` works with both vectors (columns) and row_vectors, so you can just give it whichever slice you need to sum.
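For example (a sketch; `M` is just an arbitrary matrix):

``````stan
matrix[3,4] M;
// ... fill M ...
real r = log_sum_exp(M[2]);       // sums over row 2 (a row_vector)
real c = log_sum_exp(col(M, 3));  // sums over column 3 (a vector)
real s = log_sum_exp(M);          // sums over the whole matrix
``````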


Right, I am trying to apply `log_sum_exp` to a matrix, but my goal is to perform the computation by row or by column of that matrix and get the corresponding vector or row_vector as the result.

`log_sum_exp` applied to a matrix returns a single real, so it will sum over the entire matrix. So you need to break it down with a for-loop, I think.
If it were me, I would write it as a function. Something like…

``````stan
  vector matrix_logsumexp(matrix Mat) {
    int NR = rows(Mat);
    vector[NR] res;
    for (i in 1:NR) {
      res[i] = log_sum_exp(Mat[i]);  // row i is a row_vector
    }
    return res;
  }  // matrix_logsumexp
``````

Then you can expose the function and test it in R to make sure it is doing what you think it should (useful for us non-experts with complex functions). Is this what you were thinking of?
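A column-wise analogue could be written along the same lines (a sketch; `matrix_logsumexp_cols` is just an illustrative name):

``````stan
  // Returns a row_vector with one log-sum-exp per column.
  row_vector matrix_logsumexp_cols(matrix Mat) {
    int NC = cols(Mat);
    row_vector[NC] res;
    for (j in 1:NC) {
      res[j] = log_sum_exp(col(Mat, j));  // column j is a vector
    }
    return res;
  }
``````

Equivalently, `matrix_logsumexp(Mat')` gives the same values as a (column) vector.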
