Difficulty passing data to PyStan: variable name=N; dims declared=(); dims found=(1)

I’m coming from R and rstan and trying some of the same models with PyStan. I’ve hit a wall trying to sample from a Bernoulli model: the error seems to say I am not passing ‘N’ in the data, but it is there in the data dict.

The error I get is:
RuntimeError: Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=N; dims declared=(); dims found=(1) (in 'How_Stan_Model.stan' at line 2)

The model is:

data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  vector[N_2] r_2_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
  r_2_1 = (sd_2[1] * (z_2[1]));
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  }
  // priors including all constants
  target += normal_lpdf(b | 0,1);
  target += normal_lpdf(Intercept | 0,1);
  target += cauchy_lpdf(sd_1[1] | 0,1)
    - 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_1[1] | 0, 1);
  target += cauchy_lpdf(sd_2[1] | 0,1)
    - 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_2[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += bernoulli_logit_lpmf(Y | mu);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // additionally draw samples from priors
  real prior_b = normal_rng(0,1);
  real prior_Intercept = normal_rng(0,1);
  real prior_sd_1_1 = cauchy_rng(0,1);
  real prior_sd_2_1 = cauchy_rng(0,1);
  // use rejection sampling for truncated priors
  while (prior_sd_1_1 < 0) {
    prior_sd_1_1 = cauchy_rng(0,1);
  }
  while (prior_sd_2_1 < 0) {
    prior_sd_2_1 = cauchy_rng(0,1);
  }
}

I pulled the data from a few files and created a single dictionary of the data:

Stan_Data = {'N' : How_Dict3["N"],
             'Y' : How_Dict2["Y"],
             'K' : How_Dict3["K"],
             #'X' : How_Dict1["X"],
             'X' : How_Dict1,
             'Z_1_1' : How_Dict2["Z_1_1"],
             'Z_2_1' : How_Dict2["Z_2_1"],
             'J_1' : How_Dict2["J_1"],
             'J_2' : How_Dict2["J_2"],
             'N_1' : How_Dict3["N_1"],
             'M_1' : How_Dict3["M_1"],
             'NC_1' : How_Dict3["NC_1"],
             'N_2' : How_Dict3["N_2"],
             'M_2' : How_Dict3["M_2"],
             'NC_2' : How_Dict3["NC_2"],
             'prior_only' : How_Dict3["prior_only"]}

Stan_Data.keys()
Out[169]: dict_keys(['N', 'Y', 'K', 'Z_1_1', 'Z_2_1', 'J_1', 'J_2', 'N_1', 'M_1', 'NC_1', 'N_2', 'M_2', 'NC_2', 'prior_only'])

Hi, what does How_Dict3['N'] contain?

It should be an integer, not a numpy array or a list.

'N' : int(How_Dict3['N'][0])
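If several of the size variables come in as one-element lists (which is exactly what `to_dict('list')` produces), a small helper can coerce them all the same way. A sketch, with the helper name made up for illustration:

```python
def as_stan_int(value):
    """Return a plain Python int from a scalar or a one-element list."""
    try:
        return int(value)        # already a scalar (int, numpy int, ...)
    except TypeError:
        return int(value[0])     # one-element list, e.g. [38]

print(as_stan_int([38]))   # -> 38
print(as_stan_int(38))     # -> 38
```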

Thank you! That solved the N issue, but now I have a problem with X, which is a dict itself (I have it as a matrix in R).

ValueError: Variable X is neither int nor float nor list/array thereof
How_Dict1.keys()
Out[177]: dict_keys(['Intercept', 'Condition', 'Session_Type', 'Facilitator', 'Condition:Session_Type', 'Session_Type:Facilitator'])

Is there a way to preserve X as a matrix/dict within the overall data object?

Yes, it should be a list of lists, or I usually use numpy arrays.

Is that a named list in R?

I would transform it first to a pandas DataFrame (if it is 2D) and then use .values

import pandas as pd
...
'X' : pd.DataFrame(my_dict['X']).values

Check that the column order is right. If you need the transpose, use .T on the object: .values.T
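For example, a dict of equal-length columns (hypothetical column names here) round-trips to an (N, K) array like this:

```python
import pandas as pd

# Hypothetical design-matrix columns as a dict of lists -- the shape
# that DataFrame.to_dict('list') produces.
X_dict = {'Intercept': [1, 1, 1], 'Condition': [0, 1, 0]}

X = pd.DataFrame(X_dict).values   # rows = observations, columns = predictors
print(X.shape)                    # (3, 2)
```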

Thank you!

X is/was a matrix in R, but this helped. With a little more wrangling, I’m able to run the model in PyStan now.

For anyone who might come across this thread, here is my data import and prep (from .txt and .csv files written in R)

# Data files
How_Data1 = pd.read_csv("How_Stan_Data1.txt", delim_whitespace=True)
How_Data2 = pd.read_csv("How_Stan_Data2.csv")
How_Data3 = pd.read_csv("How_Stan_Data3.csv")

# Turn into dictionaries
How_Dict1 = How_Data1.to_dict('list')
How_Dict2 = How_Data2.to_dict('list')
How_Dict3 = How_Data3.to_dict('list')
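Note that `to_dict('list')` turns every column into a Python list, even a one-row column holding a single size, which is why the scalar casts below are needed. A toy illustration with made-up values:

```python
import pandas as pd

# Hypothetical one-row table of sizes, like How_Data3 above.
df = pd.DataFrame({'N': [38], 'K': [6]})
d = df.to_dict('list')

print(d)              # {'N': [38], 'K': [6]} -- each scalar is wrapped in a list
print(int(d['N'][0])) # 38 -- unwrapped back to a plain int for Stan
```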

Stan_Data = {'N' : int(How_Dict3['N'][0]),
             'Y' : How_Dict2["Y"],
             'K' : int(How_Dict3['K'][0]),
             'X' : pd.DataFrame(How_Data1).values,
             'Z_1_1' : How_Dict2["Z_1_1"],
             'Z_2_1' : How_Dict2["Z_2_1"],
             'J_1' : How_Dict2["J_1"],
             'J_2' : How_Dict2["J_2"],
             'N_1' : int(How_Dict3['N_1'][0]),
             'M_1' : int(How_Dict3['M_1'][0]),
             'NC_1' : int(How_Dict3['NC_1'][0]),
             'N_2' : int(How_Dict3['N_2'][0]),
             'M_2' : int(How_Dict3['M_2'][0]),
             'NC_2' : int(How_Dict3['NC_2'][0]),
             'prior_only' : int(How_Dict3['prior_only'][0])}
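Before sampling, it can save a round of cryptic dimension errors to assert that each entry matches its declaration in the data block. A sketch with toy values standing in for Stan_Data:

```python
import numpy as np

# Toy data dict standing in for Stan_Data; values are made up.
stan_data = {
    'N': 4, 'K': 2,
    'Y': [0, 1, 1, 0],
    'X': np.ones((4, 2)),
    'J_1': [1, 1, 2, 2],
}

assert isinstance(stan_data['N'], int)                          # scalar, not a length-1 list
assert np.asarray(stan_data['Y']).shape == (stan_data['N'],)    # int Y[N]
assert stan_data['X'].shape == (stan_data['N'], stan_data['K']) # matrix[N, K] X
print("data shapes OK")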

Also, if you have this data already in R you can do stan_rdump in R and then pystan.read_rdump in Python.

Thank you very much!

I’m kind of learning in reverse. I’ve started with brms and then used its ‘stancode()’ and ‘standata()’ functions to generate the initial Stan code and data formats, and now I’m converting over to a PyStan implementation.

Your tips and tricks are a tremendous help!