I’ve the code below for two different implementations of a Markov-Switching Model where one uses a multinomial and one a probit implementation for the transition probabilities. It turns out that the multinomial implementation takes about twice the time compared to the probit implementation and I was wondering why that is. Both model recover parameters drawn from the respective prior distributions and initialize from the same prior. Most of the additional time is spent on the warmup and before even the first warm up draws come in.

multinomial

```
transformed parameters {
vector[K] logalpha[T, N];
vector[K] A[N, T, K]; // A[t][i][j] = p(z_t = j | z_{t-1} = i)
matrix[Md_var, K] lambda_prime[K];
{
int count = 1;
for (i in 1:K){
for (j in 1:K){
for (q in 1:Md_var){
if (i == j){
lambda_prime[i, q, j] = 0;
} else {
lambda_prime[i, q, j] = lambda[count]; count += 1;
}
}
}
}
}
for (nn in 1:N){
for (t in 1:T){
for (i in 1:K){
vector[K] unA;
vector[K] nom;
real den;
nom = to_vector(exp(z_var[startstop[nn, 1] + t - 1] * lambda_prime[i]));
den = log(log_sum_exp(nom));
unA = exp(log(nom) - den);
A[nn, t, i] = softmax(unA);
}
}
}
{ // Forward algorithm log p(z_t = j | x_{1:t})
// logalpha[t, j] = Pr(z_t = j| x_{1:t})
// logalpha[t-1, i] = Pr(z_t = i| x_{1:t-1})
// normal_lpdf(y[t] | mu[j] + phi[j] * y[t - 1], sigma): p(x_t|z_t = j) local evidence at t for j
for (nn in 1:N){
for (j in 1:K){
logalpha[1, nn, j] = log(pi1[nn, j]) + normal_lpdf(y[startstop[nn, 1]] | mu[j], sigma);
}
for (t in 2:T){
for (j in 1:K) {
real accumulator[K];
for (i in 1:K) {
accumulator[i] = logalpha[t - 1, nn, i] +
log(A[nn, t, i, j]) +
normal_lpdf(y[startstop[nn, 1] + t - 1] | mu[j] +
phi[j] * y[startstop[nn, 1] + t - 2] +
x_sha[startstop[nn, 1] + t - 1] * zeta +
x_var[startstop[nn, 1] + t - 1] * beta[:,j], sigma);
}
logalpha[t, nn, j] = log_sum_exp(accumulator);
}
}
}
}
}
```

probit

```
transformed parameters {
vector[2] logalpha[T, N];
simplex[2] A[N, T, 2]; // A[t][i][j] = p(z_t = j | z_{t-1} = i)
for (nn in 1:N){
for (t in 1:T){
A[nn, t, 1, 1] = normal_cdf(z_sha[startstop[nn, 1] + t - 1] * gamma +
z_var[startstop[nn, 1] + t - 1] * lambda[:,1] , 0, 1);
A[nn, t, 2, 2] = normal_cdf(z_sha[startstop[nn, 1] + t - 1] * gamma +
z_var[startstop[nn, 1] + t - 1] * lambda[:,2], 0, 1);
A[nn, t, 1, 2] = 1 - A[nn, t, 1, 1];
A[nn, t, 2, 1] = 1 - A[nn, t, 2, 2];
}
}
{ // Forward algorithm log p(z_t = j | x_{1:t})
// logalpha[t, j] = Pr(z_t = j| x_{1:t})
// logalpha[t-1, i] = Pr(z_t = i| x_{1:t-1})
// normal_lpdf(y[t] | mu[j] + phi[j] * y[t - 1], sigma): p(x_t|z_t = j) local evidence at t for j
for (nn in 1:N){
for (j in 1:2){
logalpha[1, nn, j] = log(pi1[nn][j]) + normal_lpdf(y[startstop[nn, 1]] | mu[j], sigma);
}
for (t in 2:T){
for (j in 1:2) {
real accumulator[2];
for (i in 1:2) {
accumulator[i] = logalpha[t - 1, nn, i] +
log(A[nn, t, i, j]) +
normal_lpdf(y[startstop[nn, 1] + t - 1] | mu[j] +
y[startstop[nn, 1] + t - 2] * phi[j] +
x_sha[startstop[nn, 1] + t - 1] * zeta +
x_var[startstop[nn, 1] + t - 1] * beta[:,j], sigma);
}
logalpha[t, nn, j] = log_sum_exp(accumulator);
}
}
}
}
}
```