New User Question - Can you use character variables in the data/model blocks?

Dear All,

I am a new user of STAN (today), and can confirm the new user material, online examples and R code provided are excellent, and have allowed me to fit and assess models on day 1.

I have a query, and my apologies if this message is poorly structured or overly naive.

I wish to fit complex non-linear mixed-effects meta-analysis type models that I currently fit using SAS PROC MCMC. In the toy example code below, I fit a very simple model: the Data Block reads in an integer “drugnum” variable, which is either 0 or 1 in the dataset (indicating whether the drug treatment was given). In the Model Block, an IF statement then defines the model (“mymod”) differently depending on this variable. This works (great!). However, I need to read in a different variable, say “DrugName”, which contains “Placebo”, “Drug 1”, “Drug 2” etc., and then have code in the Model Block such as:

if (DrugName[n] == “Placebo”) mymod = myint ;
if (DrugName[n] == “Drug 1”) mymod = myint + drugeff ;

Is it possible to read in character strings like this in the Data Block, and use them in the Model Block?

I appreciate this may be slow computation wise, but I have over 40 drugs, and many parts to the “mymod” computation, and hence trying to do this with (integer) flag variables for each drug would not be feasible.

I hope this is sufficiently clear and concise, and look forward to any suggestions.

Best wishes


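data {
  int N;
  real y[N];
  real myse[N];
  real baseline[N];  // used in the model below; declaration assumed (missing from the original post)
  int drugnum[N];
}

parameters {
  // assumed declarations: the original post omitted this block
  real myint;
  real drugeff;
}

model {
  real mymod;
  myint   ~ normal(0, 100);
  drugeff ~ normal(0, 100);
  for (n in 1:N) {
    if (drugnum[n] == 0) mymod = baseline[n] + myint;
    if (drugnum[n] == 1) mymod = baseline[n] + myint + drugeff;
    y[n] ~ normal(mymod, myse[n]);
  }
}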

Data looks like:

y, myse, drugnum, drugname
8, 1, 0, “Placebo”
9, 1, 1, “Drug 1”

No. According to the reference guide:

Arguments for built-in and user-defined functions and local variables are required to be basic data types, meaning an unconstrained primitive, vector, or matrix type or an array of such.

You have to handle the drug names in R and only pass ID numbers to Stan.
You should probably declare drugeff as an array and then do

  for (n in 1:N) {
    if (drugnum[n] == 0) mymod = baseline[n] + myint;
    if (drugnum[n] != 0) mymod = baseline[n] + myint + drugeff[drugnum[n]];
    y[n] ~ normal(mymod, myse[n]);
  }
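
For completeness, here is a minimal sketch of how the whole program might look with drugeff indexed by drug ID. It is an illustration under assumptions, not a tested model for your data: K (the number of active drugs) is a name introduced here, baseline is assumed to be a data covariate, and drugnum is assumed to be coded in R as 0 for placebo and 1..K for the drugs. It uses the array syntax of newer Stan releases (array[N] int), whereas the code above uses the older int x[N] form.

```stan
data {
  int<lower=1> N;                          // number of observations
  int<lower=0> K;                          // number of active drugs (assumed name)
  vector[N] y;
  vector<lower=0>[N] myse;
  vector[N] baseline;                      // assumed data covariate
  array[N] int<lower=0, upper=K> drugnum;  // 0 = placebo, 1..K = drug ID assigned in R
}
parameters {
  real myint;
  vector[K] drugeff;                       // one effect per active drug
}
model {
  myint ~ normal(0, 100);
  drugeff ~ normal(0, 100);
  for (n in 1:N) {
    real mymod = baseline[n] + myint;
    if (drugnum[n] != 0) {
      mymod += drugeff[drugnum[n]];        // add the effect of this row's drug
    }
    y[n] ~ normal(mymod, myse[n]);
  }
}
```

With over 40 drugs this keeps the model block the same length; only K and the ID coding in R change.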

Thanks to Yizhang and Nhuurre for their clear replies. I am surprised, as I thought I would have been able to use characters as elements of the vectors/matrices.

As a note to the developers, this is a very big limitation for me, since my models are around 2 pages long, with code like:

If drugclass = “SU” and Drug in (“Glipizide”, “Glimepiride”, “Gliclazide”) then drug1eff = parmx…
If drugclass2 = “GLP1” and Drug2 in (“Exenatide”, “Lixisenatide”) then drug2eff = parmy…

This is much more readable (less prone to errors) compared with:

If drugclass = 4 and Drug in (2, 23, 37) then …
If drugclass = 6 and Drug in (7, 16) then …

Clearly possible, but I struggle to understand why this limitation is there, given there is no such restriction in PROC MCMC (is C in SAS so different to C++ for STAN?)

Thanks again!

Stan doesn’t have an “in” operator either. If you really want to write Glimeazide instead of 7 you can always do

transformed data {
  int GLIMEAZIDE = 7;
}

or maybe pass the identifier as data.
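
To make that concrete, here is a rough sketch of how named constants plus a small user-defined function could stand in for the missing “in” test. Everything specific is an assumption for illustration: the helper name is_in, the data names drugclass and drug, and in particular which integer belongs to which drug (the IDs are borrowed from the integer example earlier in the thread and must match whatever coding is used when the data are prepared). It uses the array syntax of newer Stan releases.

```stan
functions {
  // Returns 1 if x equals any element of values, else 0 (emulates an "in" test).
  int is_in(int x, array[] int values) {
    for (i in 1:size(values)) {
      if (x == values[i]) {
        return 1;
      }
    }
    return 0;
  }
}
data {
  int<lower=1> N;
  array[N] int drugclass;   // integer class code per row (assumed name)
  array[N] int drug;        // integer drug code per row (assumed name)
}
transformed data {
  // Hypothetical ID assignments; they must agree with the data coding.
  int SU = 4;
  int GLIPIZIDE = 2;
  int GLIMEPIRIDE = 23;
  int GLICLAZIDE = 37;
  array[3] int SU_DRUGS = {GLIPIZIDE, GLIMEPIRIDE, GLICLAZIDE};
}
parameters {
  real parmx;
}
model {
  parmx ~ normal(0, 100);
  for (n in 1:N) {
    real drug1eff = 0;
    if (drugclass[n] == SU && is_in(drug[n], SU_DRUGS)) {
      drug1eff = parmx;
    }
    // drug1eff would then enter the non-linear part of the model
  }
}
```

The conditions then read close to the SAS version, and the integer codes appear in exactly one place.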

Thank you…perhaps this may be a work-around - I will give it a go.

For the linear parts of your model, construct a model matrix outside of Stan and make sure you have the labeling straight there. Then you can pass the model matrix into Stan and your code will likely get much shorter.
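
As a minimal sketch of what that could look like on the Stan side (assuming the matrix is built outside Stan, for example with model.matrix() in R; the names X, P and beta are placeholders introduced here, and the syntax is that of newer Stan releases):

```stan
data {
  int<lower=1> N;            // number of observations
  int<lower=1> P;            // number of columns in the model matrix
  matrix[N, P] X;            // model matrix built outside Stan
  vector[N] y;
  vector<lower=0>[N] myse;
}
parameters {
  vector[P] beta;            // one coefficient per column of X
}
model {
  beta ~ normal(0, 100);
  y ~ normal(X * beta, myse);  // vectorised likelihood, no per-drug IF statements
}
```

The drug labels then live only in the column names of the matrix on the R side.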

Thank you for replying…unfortunately the need to refer to the drug names is in a part of the model with non-linear functions of the parameters, and hence I cannot code this outside. I wonder if others have found this restriction equally limiting - for my modelling work, I need IF THEN type code for the (non-linear) models I fit, allowing different parts of the model to refer to different drugs (like here) or different endpoints (e.g. SBP, DBP, weight etc.) and hence making code ‘readable’ (i.e. not integers to define what is what) is pretty important.