Unpooled multiple linear regression model

tommylees112 · December 21, 2021, 11:36pm

My stan model looks like the following (saved in a file called unpooled.stan):

When including Stan code in your post it really helps if you make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example, use

data{
    // size of data
    int<lower=0> N;                      // number of training instances
    int<lower=0> K;                      // number of input variables
    int<lower=0> J;                      // number of landcover classes (group-level; J)
    
    // data
    matrix[N, K] X;                      // input matrix (X)
    vector[N] y;                         // target vector (Y)
    int<lower=0, upper=J> landcover_idx[N];    // indicator variable for each landcover class (u)
}
parameters{
    vector[J] alpha;                    // intercepts (a_j[i])
    matrix[J, K] beta;                  // coefficients (b_j[i])
    real<lower=0> sigma;
}
model{
    //  priors
    alpha ~ normal(0, 1);
    // beta ~ normal(0, 0.5);

    //  likelihood
   vector[N] y_hat;
   for (n in 1:N){
       y_hat[n] = alpha[landcover_idx[n]] + beta[landcover_idx[n]] * X[n]);
    }

    y ~ normal(y_hat, sigma);
}

The python code for creating the data and sampling from it using cmdstanpy

import pandas as pd
import numpy as np 
from cmdstanpy import cmdstan_path, CmdStanModel, CmdStanMCMC


def make_dummy_data():
    variables = [f"SM_lag_{i}" for i in range(6)]
    variables += [f"PCP_lag_{i}" for i in range(6)]
    variables += [f"VCI_lag_{i}" for i in range(6)]
    N_samples = 100
    
    X_train = pd.DataFrame({
        var: np.random.random(N_samples)
        for var in variables
    })
    y_train = np.random.random(len(X_train))
    lc_idx = np.repeat([0, 1, 2], 20)[:len(X_train)]
    K = len(X_train.columns)

    data = dict(
        N=X_data.shape[0],
        K=K,
        X_train=X_data.values,
        y_train=y_data.values,
        landcover_idx=lc_idx,
        J=len(np.unique(lc_idx))
    )
    return data


if __name__ == "__main__":
    data = make_dummy_data()

    # build model
    stan_file = "unpooled.stan"
    stan_model = CmdStanModel(stan_file=stan_file)
    stan_model.compile()

    # fit model
    model_fit: CmdStanMCMC = stan_model.sample(
        data=data,
        chains=4,
        parallel_chains=4,
        seed=1111,
        show_progress=True,
    )

I get the following error:

INFO:cmdstanpy:compiling stan file /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan to exe file /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled
ERROR:cmdstanpy:Stan program failed to compile:
WARNING:cmdstanpy:
--- Translating Stan model to C++ code ---
bin/stanc  --o=/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.hpp /Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan
Semantic error in '/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.stan', line 31, column 45 to column 74:
   -------------------------------------------------
    29:     vector[N] y_hat;
    30:     for (n in 1:N){
    31:         y_hat[n] = alpha[landcover_idx[n]] + (beta[landcover_idx[n]] * X[n]);
                                                      ^
    32:      }
    33:
   -------------------------------------------------

Ill-typed arguments supplied to infix operator *. Available signatures:
(int, int) => int
(real, real) => real
(row_vector, vector) => real
(real, vector) => vector
(vector, real) => vector
(matrix, vector) => vector
(complex, real) => complex
(complex, complex) => complex
(real, row_vector) => row_vector
(row_vector, real) => row_vector
(row_vector, matrix) => row_vector
(real, matrix) => matrix
(vector, row_vector) => matrix
(matrix, real) => matrix
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: row_vector, row_vector.
make: *** [/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled.hpp] Error 1

Command ['make', '/Users/tommylees/github/LEARNING/Hierarchical-Bayesian-ARDL/unpooled']
	error during processing No such file or directory

I am guessing the key line is this:

Ill-typed arguments supplied to infix operator *. Available signatures:

This refers to the multiplication of my matrix object (matrix[J, K] beta; indexed into a row vector), multiplied by another row vector (matrix[N, K] X;)

How can I correctly specify/code the mean for each group? Is it possible to give the same prior to each Beta/Alpha separately? Or would I be better just fitting the model using different data samples?

The reason I am fitting an unpooled model with one .stan file is because I want to progress from this unpooled estimate to a hierarchical model.

jack_monroe · December 22, 2021, 1:26am

I’m a little confused by your post because you correctly identify the error with your code then ask about a different thing right away. You cannot matrix multiply a 1 x k and a 1 x k because they do not conform, so you need to transpose the second row vector to return a scalar dot product (assuming your data are shaped for that, if not you need to rearrange the matrices and then transpose the beta matrix).

As far as priors go you can just loop through beta one row-vector at a time and give the rows normal priors. Someone might chime in with a more efficient approach, but that would work.

Topic		Replies	Views
Linear Mixed Model Modeling	2	429	January 21, 2020
Multi Normal model Modeling	4	599	December 21, 2019
Specifying a Multinomial Logistic Regression with Unpooled Estimates Modeling pystan , specification , multinomial-response	3	1122	November 29, 2021
Help with simple multilevel model code [solved] Modeling specification	5	1202	February 10, 2018
Help with structuring a (simple) random slope hierarchical model in Stan Modeling	8	1386	July 11, 2019

Unpooled multiple linear regression model

Related Topics