Dear wonderful Stan team,

I’m returning to some very old Stan code (from 2014!) that I’m hoping to “modernize.”

The setup is very similar to a standard multi-logit problem. Units belong to one of K types and we want to model the probability that a unit belongs to a given type as a function of covariates X. The complication is that types are (partially) latent. For types {A, B, C}, we can directly observe a subset of units that are type A or type B, but we can never observe units of type C directly — for those units we only know that they are, say, *either* A or C. As a result, I can no longer use the off-the-shelf multi-logit Stan models (e.g., Section 9.6 of the Stan manual).

The good news is that I can easily modify the likelihood for these units directly, which seems to work well (see attached code). At the same time, I feel like my code *must* be inefficient for this non-standard setting. I’m planning to expand this to much more complex models, so any help would be greatly appreciated.

[The motivation for this setup is a classic causal inference problem where we’re trying to estimate compliance types: {Always Takers, Never Takers, and Compliers}. We can directly observe some Always Takers and some Never Takers. But we can never directly observe Compliers.]
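As a numeric sanity check on the marginalized likelihood described above, here is a small Python sketch (the type ordering and probabilities are made up for illustration; it assumes the usual no-Defiers monotonicity, so each (z, d) cell is either a single observed type or a two-type mixture):

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(xs)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_lik(log_pi, z, d):
    """Log-likelihood contribution of one unit, with
    log_pi = [log P(AT), log P(NT), log P(C)] (hypothetical ordering),
    assuming no Defiers (monotonicity)."""
    if z == 0 and d == 1:    # treated without assignment: Always Taker
        return log_pi[0]
    if z == 1 and d == 0:    # assigned but untreated: Never Taker
        return log_pi[1]
    if z == 0 and d == 0:    # untreated, unassigned: Never Taker or Complier
        return log_sum_exp([log_pi[1], log_pi[2]])
    return log_sum_exp([log_pi[0], log_pi[2]])  # z == 1, d == 1: AT or Complier

pi = [0.2, 0.3, 0.5]                    # made-up type probabilities
log_pi = [math.log(p) for p in pi]
print(math.exp(log_lik(log_pi, 1, 1)))  # ≈ 0.7 = P(AT) + P(C)
```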

Stan code, R file, and knitted PDF attached (apparently I can’t post Rmd or html?).

Many thanks,

Avi

pscore_script_for_list.R (1.4 KB)

pscore_script_for_list.pdf (167.6 KB)

pscore_model_for_list.stan (1.3 KB)

```
parameters {
  matrix[N_type - 1, N_cov] beta_raw;
}

transformed parameters {
  matrix[N_type, N_cov] beta;

to_vector(beta) ~ normal(0, 5);
```

In your posted code I found this one. I didn’t look further.

Thanks for taking a look. But that code chunk is copy-and-paste from the Stan manual. Is that no longer the recommendation? It's certainly easy to change that part.

If you have a minute, I’m most worried about the main model block.

Thanks again.

Which version/page is that?

You have to use:

```
to_vector(beta_raw) ~ normal(0, 5);
```

Ah. Good catch. That was my mistake!

Again, I was focused on the main model block (hence missing that typo!).

I’d especially welcome comments there.

Thanks again,

Avi

Also, while I obviously just made a coding error, it might be worth noting this simple point in the discussion on p. 135 of the manual. Looking at it again, this was easy to gloss over in going through that section.

While that would certainly speed this up, it unfortunately destroys the underlying structure of the problem. For example, I don't believe I could use the posterior for beta to make a marginal prediction about a unit being a Complier (corresponding to pi[3] in what you wrote). Perhaps I'm missing something?

You could omit the log_sum_exp and log_softmax and work directly on the probability scale:

```
matrix[N, N_type] pi = exp((beta * x)');  // x declared as an N_cov x N matrix
```

Then you could try:

```
target += log(pi[n, G[n]] / sum(pi[n]));
```

just to get the idea. You could also left-multiply by a design matrix X built from G to vectorize, something like:

```
target += sum(log(rows_dot_product(X, pi))) - sum(log(pi * rep_vector(1, N_type)));
```

(Sorry, not my day.)
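For intuition, the probability-level expression agrees with the log_softmax version; a quick Python check with made-up logits:

```python
import math

def softmax(a):
    """Numerically stable softmax of a list of logits."""
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [e / s for e in exps]

a = [0.3, -1.2, 0.8]           # hypothetical logits for one unit (one row of beta * x)
pi = [math.exp(x) for x in a]  # unnormalized probabilities, as in exp(beta * x)
g = 2                          # observed type (0-based index)

lp_prob = math.log(pi[g] / sum(pi))  # probability-scale version
lp_soft = math.log(softmax(a)[g])    # log_softmax version
print(abs(lp_prob - lp_soft) < 1e-12)  # same answer either way
```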

This was an excellent suggestion, cutting compute time by about 80 percent for similar ESS relative to the previous model (with my typo fixed). Amazing that that change made such a difference.

Thanks again for the help. Incredibly useful.

@andre.pfeuffer got the big one by converting to a matrix multiply rather than a loop for `beta * x`.

The next step would be to break down `x` based on `z` and `d`, because you only ever need one or two columns out of the result. And there are a lot of zeroes that aren't efficient to multiply.

Some smaller suggestions:

```
to_vector(beta) ~ normal(0, 5);
```

should be

```
to_vector(beta_raw) ~ normal(0, 5);
```

because you don’t need the priors on the zero entries.

It doesn’t help with speed, but

```
row_vector[N_cov] zeros;
zeros = rep_row_vector(0, N_cov);
```

can use declare-define to put it on a single line

```
row_vector[N_cov] zeros = rep_row_vector(0, N_cov);
```

And was that really supposed to be `0` and not `1`? Using a `1` would be an intercept, but using `0` doesn't do anything I can see.
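An aside on the zeros row: if it is the identification row appended to `beta_raw` (as in the manual's multi-logit parameterization), pinning one category's coefficients at `0` is what identifies the model, since softmax is invariant to adding a constant to every logit. A quick Python check with made-up logits:

```python
import math

def softmax(a):
    """Numerically stable softmax of a list of logits."""
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [e / s for e in exps]

a = [0.7, -0.2, 0.0]            # reference category's logit pinned to 0
shifted = [x + 1.5 for x in a]  # add any constant to every logit...
same = all(abs(p - q) < 1e-12 for p, q in zip(softmax(a), softmax(shifted)))
print(same)                     # ...and softmax is unchanged, hence the zero row
```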

Many thanks, Bob.

I’m not sure if I totally follow your idea to break down x based on z and d, since I need the full multinomial structure.

Regardless, that new declare-define notation is pretty slick!

Thanks again,

Avi

Bob_Carpenter (Developer), February 19:


Let me write the relevant loop out. I moved the inner multiplies out into a single matrix multiply, which requires declaring `x` as a matrix:

```
{
  matrix[N_type, N] beta_x = beta * x;  // x declared as an N_cov x N matrix
  for (n in 1:N) {
    vector[N_type] log_pi = log_softmax(col(beta_x, n));
    if (z[n] == 0 && d[n] == 1)
      target += log_pi[1];                          // observed Always Taker
    else if (z[n] == 1 && d[n] == 0)
      target += log_pi[2];                          // observed Never Taker
    else if (z[n] == 0 && d[n] == 0)
      target += log_sum_exp(log_pi[2], log_pi[3]);  // Never Taker or Complier
    else if (z[n] == 1 && d[n] == 1)
      target += log_sum_exp(log_pi[1], log_pi[3]);  // Always Taker or Complier
  }
}
```

I had thought you’d be able to avoid computing some of the components, but that won’t work given the softmax.

Softmax itself isn't the most efficient way to compute just `log_pi[1]` in these cases, because it produces a multivariate return and builds up a bigger structure than necessary. So instead of

```
target += log_softmax(a)[n];
```

it's better to do

```
target += a[n] - log_sum_exp(a);
```

which can be coded in Stan as

```
n ~ categorical_logit(a);
```

This only computes the gradients needed.
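The identity behind this is `log_softmax(a)[n] = a[n] - log_sum_exp(a)`; a quick Python check with made-up logits:

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(xs)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(a):
    lse = log_sum_exp(a)
    return [x - lse for x in a]

a = [1.0, -0.5, 0.25]  # hypothetical logits
n = 1                  # observed category (0-based)

full = log_softmax(a)[n]        # builds the whole vector, then keeps one entry
direct = a[n] - log_sum_exp(a)  # computes only the entry that's needed
print(abs(full - direct) < 1e-12)
```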

I actually don't see how to do what I suggested, since you need all of the columns of `beta * x`.

What is the size of `log_pi` (i.e., `N_type`) here? If it's 3, then the last two cases can be reduced to the first two, because `sum(softmax(a)[1:3]) = 1`.
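Concretely, with three types `P(1 or 3) = 1 - P(2)`, so the two-component mixture term can be recovered from a single softmax entry; a Python sketch with made-up logits:

```python
import math

def softmax(a):
    """Numerically stable softmax of a list of logits."""
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [e / s for e in exps]

a = [0.4, -0.3, 1.1]  # hypothetical logits, N_type = 3
pi = softmax(a)
log_pi = [math.log(p) for p in pi]

# P(type 1 or type 3) two ways: explicit mixture vs. 1 - P(type 2)
mixture = math.log(math.exp(log_pi[0]) + math.exp(log_pi[2]))
reduced = math.log1p(-pi[1])
print(abs(mixture - reduced) < 1e-12)
```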