New User Question - Can you use character variables in the data/model blocks?

Al_in_Sweden · September 13, 2019, 4:45pm

Dear All,

I am a new user of Stan (as of today), and can confirm that the new user material, online examples and R code provided are excellent and allowed me to fit and assess models on day 1.

I have a query, and my apologies if this message is poorly structured or overly naive.

I wish to fit complex non-linear mixed-effect meta-analysis type models that I currently fit using SAS PROC MCMC. In the toy example code below, I fit a very simple model, where we read in a “drugnum” variable (as an integer) in the Data Block, which is either 0 or 1 in the dataset (representing drug treatment, no/yes). In the Model Block, we then use an IF statement to define the model (“mymod”) differently depending on this variable. This works (great!). However, I would need to read in a different variable, say “DrugName”, which contains “Placebo”, “Drug 1”, “Drug 2” etc., and then have code in the Model Block such as:

if (DrugName[n] == "Placebo") mymod = myint;
if (DrugName[n] == "Drug 1") mymod = myint + drugeff;

Is it possible to read in character strings like this in the Data Block, and use them in the Model Block?

I appreciate this may be slow computationally, but I have over 40 drugs and many parts to the “mymod” computation, so doing this with (integer) flag variables for each drug would not be feasible.

I hope this is sufficiently clear and concise, and look forward to any suggestions.

Best wishes

Al


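data {
  int<lower=0> N;
  real y[N];
  real baseline[N];                  // baseline value for each observation
  real<lower=0> myse[N];
  int<lower=0, upper=1> drugnum[N];  // 0 = placebo, 1 = drug
}

parameters {
  real myint;
  real drugeff;
}

model {
  real mymod;
  myint   ~ normal(0, 100);
  drugeff ~ normal(0, 100);

  for (n in 1:N) {
    if (drugnum[n] == 0) mymod = baseline[n] + myint;
    if (drugnum[n] == 1) mymod = baseline[n] + myint + drugeff;
    y[n] ~ normal(mymod, myse[n]);
  }
}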

Data looks like:

y, myse, drugnum, drugname
8, 1, 0, "Placebo"
9, 1, 1, "Drug 1"
etc.

yizhang · September 13, 2019, 4:50pm

No. According to the reference guide:

Arguments for built-in and user-defined functions and local variables are required to be basic data types, meaning an unconstrained primitive, vector, or matrix type or an array of such.

nhuurre · September 13, 2019, 4:56pm

You have to handle the drug names in R and only pass ID numbers to Stan.
You should probably declare drugeff as an array and then do

  for (n in 1:N) {
    if (drugnum[n] == 0) mymod = baseline[n] + myint;
    if (drugnum[n] != 0) mymod = baseline[n] + myint + drugeff[drugnum[n]];
    y[n] ~ normal(mymod, myse[n]);
  }
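The suggestion above can be sketched as a complete program. This is only an illustration under a few assumptions: `K` (the number of active drugs) is a hypothetical name not used in the thread, the drug names are assumed to have been mapped to the IDs 1..K outside Stan with 0 reserved for placebo, and the declarations use the `array[N] real` syntax of newer Stan releases (on the Stan version current in this thread the same declaration would be written `real y[N]`).

```stan
data {
  int<lower=1> N;                          // number of observations
  int<lower=1> K;                          // number of active drugs (hypothetical name)
  array[N] real y;                         // observed responses
  array[N] real baseline;                  // baseline value per observation
  array[N] real<lower=0> myse;             // standard errors
  array[N] int<lower=0, upper=K> drugnum;  // 0 = placebo, 1..K = drug ID assigned outside Stan
}
parameters {
  real myint;
  array[K] real drugeff;                   // one effect per active drug
}
model {
  myint ~ normal(0, 100);
  drugeff ~ normal(0, 100);
  for (n in 1:N) {
    real mymod = baseline[n] + myint;      // placebo prediction
    if (drugnum[n] != 0) {
      mymod += drugeff[drugnum[n]];        // add the effect of this row's drug
    }
    y[n] ~ normal(mymod, myse[n]);
  }
}
```

With this layout, adding a drug means adding a new ID to the data and increasing `K`, rather than adding a new flag variable and a new branch.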

Al_in_Sweden · September 14, 2019, 7:44am

Thanks to Yizhang and Nhuurre for their clear replies. I am surprised, as I thought I would have been able to have characters as elements of the vectors/matrices.

As a note to the developers, this is a very big limitation for me, since my models are around 2 pages long, with code like:

If drugclass = "SU" and Drug in ("Glipizide", "Glimepiride", "Gliclazide") then drug1eff = parmx…
If drugclass2 = "GLP1" and Drug2 in ("Exenatide", "Lixisenatide") then drug2eff = parmy…

This is much more readable (less prone to errors) compared with:

If drugclass = 4 and Drug in (2, 23, 37) then …
If drugclass = 6 and Drug in (7, 16) then …

This is clearly possible, but I struggle to understand why this limitation exists, given that there is no such restriction in PROC MCMC (is C in SAS so different from C++ in Stan?)

Thanks again!

nhuurre · September 14, 2019, 9:07am

Stan doesn’t have an “in” operator either. If you really want to write Glimeazide instead of 7, you can always do

transformed data {
  int GLIMEAZIDE = 7;
  ...
}

or maybe pass the identifier as data.
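As a rough sketch of how named constants could keep the conditions readable, the program below pairs them with a small helper that stands in for the missing “in” operator. Everything beyond the idea of named integer constants is an assumption: the helper `is_in`, the array `SU_GROUP`, the data names `drugclass` and `drug`, and the pairing of each drug name with a particular number (the numbers are taken from the integer example earlier in the thread) are all hypothetical and must match whatever coding was used when the data were prepared. The declarations use the `array[...]` syntax of newer Stan releases.

```stan
functions {
  // Hypothetical helper: returns 1 if x equals any element of ids, otherwise 0.
  int is_in(int x, array[] int ids) {
    for (i in 1:size(ids)) {
      if (ids[i] == x) {
        return 1;
      }
    }
    return 0;
  }
}
data {
  int<lower=1> N;
  array[N] real y;
  array[N] real<lower=0> myse;
  array[N] int<lower=1> drugclass;  // integer class code assigned outside Stan
  array[N] int<lower=1> drug;       // integer drug code assigned outside Stan
}
transformed data {
  // Hypothetical ID assignments; they must agree with the coding of the data.
  int SU = 4;
  int GLIPIZIDE = 2;
  int GLIMEPIRIDE = 23;
  int GLICLAZIDE = 37;
  array[3] int SU_GROUP = {GLIPIZIDE, GLIMEPIRIDE, GLICLAZIDE};
}
parameters {
  real myint;
  real parmx;
}
model {
  myint ~ normal(0, 100);
  parmx ~ normal(0, 100);
  for (n in 1:N) {
    real mymod = myint;
    // Reads much like: drugclass = "SU" and Drug in ("Glipizide", ...)
    if (drugclass[n] == SU && is_in(drug[n], SU_GROUP)) {
      mymod += parmx;
    }
    y[n] ~ normal(mymod, myse[n]);
  }
}
```

The conditions then name drugs and classes instead of bare integers, although the name-to-number mapping still has to be kept in sync with the data by hand.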

Al_in_Sweden · September 14, 2019, 10:50am

Thank you…perhaps this may be a work-around - I will give it a go.

sakrejda · September 15, 2019, 11:39am

For the linear parts of your model, construct a model matrix outside of Stan and make sure you have the labeling straight there. Then you can pass the model matrix into Stan and your code will likely get much shorter.
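A minimal sketch of the model-matrix approach, assuming the matrix is built outside Stan (for example with one indicator column per drug) and passed in as data. The names `P`, `X` and `beta` are hypothetical, and this covers only a purely linear predictor, not the non-linear parts of a model.

```stan
data {
  int<lower=1> N;           // number of observations
  int<lower=1> P;           // number of columns in the model matrix
  matrix[N, P] X;           // model matrix built and labelled outside Stan
  vector[N] y;              // observed responses
  vector<lower=0>[N] myse;  // standard errors
}
parameters {
  vector[P] beta;           // one coefficient per column of X
}
model {
  beta ~ normal(0, 100);
  y ~ normal(X * beta, myse);  // vectorised likelihood, no per-drug branching
}
```

The drug labels then live only in the column names of the matrix on the R side, and the Stan program contains no drug-specific code at all.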

Al_in_Sweden · September 17, 2019, 5:10pm

Thank you for replying. Unfortunately, the need to refer to the drug names is in a part of the model with non-linear functions of the parameters, and hence I cannot code this outside of Stan. I wonder if others have found this restriction equally limiting. For my modelling work, I need IF THEN type code for the (non-linear) models I fit, allowing different parts of the model to refer to different drugs (like here) or different endpoints (e.g. SBP, DBP, weight etc.), and hence making code ‘readable’ (i.e. not using integers to define what is what) is pretty important.
