Unpooled multiple linear regression model

My stan model looks like the following (saved in a file called unpooled.stan):


When including Stan code in your post it really helps if you make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example, use

data{
    // size of data
    int<lower=0> N;                      // number of training instances
    int<lower=0> K;                      // number of input variables
    int<lower=0> J;                      // number of landcover classes (group-level; J)
    
    // data
    matrix[N, K] X;                      // input matrix (X)
    vector[N] y;                         // target vector (Y)
    int<lower=0, upper=J> landcover_idx[N];    // indicator variable for each landcover class (u)
}
parameters{
    vector[J] alpha;                    // intercepts (a_j[i])
    matrix[J, K] beta;                  // coefficients (b_j[i])
    real<lower=0> sigma;
}
model{
    //  priors
    alpha ~ normal(0, 1);
    // beta ~ normal(0, 0.5);

    //  likelihood
   vector[N] y_hat;
   for (n in 1:N){
       y_hat[n] = alpha[landcover_idx[n]] + beta[landcover_idx[n]] * X[n]);
    }

    y ~ normal(y_hat, sigma);
}

The python code for creating the data and sampling from it using cmdstanpy

import pandas as pd
import numpy as np 
from cmdstanpy import cmdstan_path, CmdStanModel, CmdStanMCMC


def make_dummy_data():
    variables = [f"SM_lag_{i}" for i in range(6)]
    variables += [f"PCP_lag_{i}" for i in range(6)]
    variables += [f"VCI_lag_{i}" for i in range(6)]
    N_samples = 100
    
    X_train = pd.DataFrame({
        var: np.random.random(N_samples)
        for var in variables
    })
    y_train = np.random.random(len(X_train))
    lc_idx = np.repeat([0, 1, 2], 20)[:len(X_train)]
    K = len(X_train.columns)

    data = dict(
        N=X_data.shape[0],
        K=K,
        X_train=X_data.values,
        y_train=y_data.values,
        landcover_idx=lc_idx,
        J=len(np.unique(lc_idx))
    )
    return data


if __name__ == "__main__":
    data = make_dummy_data()

    # build model
    stan_file = "unpooled.stan"
    stan_model = CmdStanModel(stan_file=stan_file)
    stan_model.compile()

    # fit model
    model_fit: CmdStanMCMC = stan_model.sample(
        data=data,
        chains=4,
        parallel_chains=4,
        seed=1111,
        show_progress=True,
    )

I get the following error:

INFO:cmdstanpy:compiling stan file /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan to exe file /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled
ERROR:cmdstanpy:Stan program failed to compile:
WARNING:cmdstanpy:
--- Translating Stan model to C++ code ---
bin/stanc  --o=/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.hpp /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan
Semantic error in '/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan', line 31, column 45 to column 74:
   -------------------------------------------------
    29:     vector[N] y_hat;
    30:     for (n in 1:N){
    31:         y_hat[n] = alpha[landcover_idx[n]] + (beta[landcover_idx[n]] * X[n]);
                                                      ^
    32:      }
    33:
   -------------------------------------------------

Ill-typed arguments supplied to infix operator *. Available signatures:
(int, int) => int
(real, real) => real
(row_vector, vector) => real
(real, vector) => vector
(vector, real) => vector
(matrix, vector) => vector
(complex, real) => complex
(complex, complex) => complex
(real, row_vector) => row_vector
(row_vector, real) => row_vector
(row_vector, matrix) => row_vector
(real, matrix) => matrix
(vector, row_vector) => matrix
(matrix, real) => matrix
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: row_vector, row_vector.
make: *** [/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.hpp] Error 1

Command ['make', '/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled']
	error during processing No such file or directory

I am guessing the key line is this:

Ill-typed arguments supplied to infix operator *. Available signatures:

This refers to the multiplication of my matrix object (matrix[J, K] beta; indexed into a row vector), multiplied by another row vector (matrix[N, K] X;)

How can I correctly specify/code the mean for each group? Is it possible to give the same prior to each Beta/Alpha separately? Or would I be better just fitting the model using different data samples?

The reason I am fitting an unpooled model with one .stan file is because I want to progress from this unpooled estimate to a hierarchical model.

Iā€™m a little confused by your post because you correctly identify the error with your code then ask about a different thing right away. You cannot matrix multiply a 1 x k and a 1 x k because they do not conform, so you need to transpose the second row vector to return a scalar dot product (assuming your data are shaped for that, if not you need to rearrange the matrices and then transpose the beta matrix).

As far as priors go you can just loop through beta one row-vector at a time and give the rows normal priors. Someone might chime in with a more efficient approach, but that would work.

1 Like