Developing a General ChatGPT Prompt for Stan

Hi all,

I was wondering whether it might be worth developing a ChatGPT prompt template for Stan. I’ve seen @Bob_Carpenter mention a few times that he asks GPT to use Stan’s built-in functions (“don’t code like a physicist”), and I was wondering if we could share useful prompts. This would have the added benefit of making it more efficient to train ChatGPT to be good at Stan. We could also potentially distil some Stan style-guide information into it, although I’m less clear on how to prompt for that.

Prompt Draft 1:
Prioritise using Stan’s built-in functions where possible.

2 Likes

@bgoodri has been interested in fine-tuning for Stan help, so he may have something to say about this

1 Like

This would be great - for training, we’d need to have “vanilla” and “optimized” versions of a bunch of models. Maybe we could mine sections of the “example-models” repo.
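
For instance, a hypothetical “vanilla”/“optimized” training pair (made up for illustration, not taken from example-models) might contrast an explicit-loop likelihood with its built-in GLM equivalent:

data {
    int<lower=1> N;                     // Number of observations
    int<lower=1> D;                     // Number of features
    matrix[N, D] x;                     // Feature matrix
    vector[N] y;                        // Continuous outcome
}

parameters {
    real alpha;
    vector[D] beta;
    real<lower=0> sigma;
}

// "vanilla" model block: explicit loop and dot product
model {
    for (n in 1:N)
        y[n] ~ normal(alpha + dot_product(x[n], beta), sigma);
}

// "optimized" model block: one call to the built-in GLM likelihood
// model {
//     y ~ normal_id_glm(x, alpha, beta, sigma);
// }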

Note that ChatGPT is already pretty good at writing Stan code. A while ago I asked ChatGPT the following question:

Please help me translate the following statistical model description into a Stan program.
The model is a Bayesian logistic regression model with a sparsity-inducing prior on the coefficients.

We define the model as:
$$
\begin{aligned}
\tau &\sim \text{Gamma}(0.5, 0.5) \\
\lambda_d &\sim \text{Gamma}(0.5, 0.5) \\
\beta_d &\sim \mathcal{N}(0, 1) \\
y_n &\sim \text{Bernoulli}\bigl(\sigma\bigl((\tau \lambda \odot \beta)^\top x_n\bigr)\bigr),
\end{aligned}
$$
where τ is a scalar global coefficient scale, λ is the vector of local scales, β is the vector of unscaled coefficients, x are the features, and y are the labels. d indexes the feature dimensions, while n indexes the examples. σ is the logistic function and ⊙ is the Hadamard product.

Here is ChatGPT’s response - do you see ways to make it more efficient?

Certainly! Below is the Stan program that translates your Bayesian logistic regression model with a sparsity-inducing prior on the coefficients.

data {
    int<lower=1> N;                     // Number of observations
    int<lower=1> D;                     // Number of features
    array[N] int<lower=0, upper=1> y;   // Binary labels (0 or 1)
    matrix[N, D] x;                     // Feature matrix
}

parameters {
    real<lower=0> tau;                  // Global scale parameter τ
    vector<lower=0>[D] lambda;          // Local scale parameters λ_d
    vector[D] beta;                     // Unscaled coefficients β_d
}

model {
    // Priors
    tau ~ gamma(0.5, 0.5);               // Prior for τ
    lambda ~ gamma(0.5, 0.5);            // Prior for λ_d
    beta ~ normal(0, 1);                 // Prior for β_d

    // Compute the scaled coefficients γ_d = τ * λ_d * β_d
    vector[D] gamma = tau * (lambda .* beta);

    // Compute linear predictor η_n = x_n * γ
    vector[N] eta = x * gamma;

    // Likelihood
    y ~ bernoulli_logit(eta);            // y_n ~ Bernoulli(σ(η_n))
}

Explanation:

  • Data Block:

    • N and D represent the number of observations and features, respectively.
    • y is the array of N binary labels.
    • x[N, D] is the feature matrix.
  • Parameters Block:

    • tau is the global scale parameter (τ) and is constrained to be positive.
    • lambda[D] is the vector of local scale parameters (λ_d), each constrained to be positive.
    • beta[D] is the vector of unscaled coefficients (β_d).
  • Model Block:

    • Priors are assigned to tau, lambda, and beta as specified in your model.
    • gamma is tau times the element-wise (Hadamard) product of lambda and beta, i.e., the scaled coefficients.
    • eta computes the linear predictor by multiplying the feature matrix x with the scaled coefficients gamma.
    • The likelihood is specified using bernoulli_logit, which applies the logistic function internally.

Notes:

  • The Hadamard product is implemented using .* in Stan.
  • The logistic function σ is handled by bernoulli_logit, which expects the log-odds (η) as input.
  • Vectorization is used where possible for computational efficiency.

Feel free to adjust the code according to your specific data and requirements!
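
One answer to the efficiency question: the likelihood here is a logistic GLM, so the matrix-vector product and the logit link can be fused into Stan’s built-in bernoulli_logit_glm, which has efficient analytic gradients. A minimal sketch of the model block rewritten that way (data and parameters blocks unchanged):

model {
    // Priors
    tau ~ gamma(0.5, 0.5);
    lambda ~ gamma(0.5, 0.5);
    beta ~ normal(0, 1);

    // y_n ~ Bernoulli(σ(0 + x_n * (τ λ ⊙ β))), with intercept fixed at 0
    y ~ bernoulli_logit_glm(x, 0, tau * (lambda .* beta));
}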

1 Like

As @mitzimorris says, she and @bgoodri have been working on something like this. Their goal is more to help scientists formulate and fit models. So one of the things to look at is how well it can move from a generative formulation of a model in R or Python code to a Stan program. Mitzi showed the formulation in math, which also works. It can also translate things from other systems, so it’s probably fine with BUGS/JAGS because those have been around so long.
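
As an illustration, a hedged sketch of that kind of translation: a hypothetical generative description (shown as comments) and one way to express it in Stan:

// Generative formulation (pseudocode):
//   mu ~ normal(0, 5)
//   sigma ~ half-normal(0, 1)
//   y[n] ~ normal(mu, sigma) for n in 1..N
data {
    int<lower=0> N;
    vector[N] y;
}
parameters {
    real mu;
    real<lower=0> sigma;
}
model {
    mu ~ normal(0, 5);
    sigma ~ normal(0, 1);   // half-normal via the <lower=0> constraint
    y ~ normal(mu, sigma);
}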

Just for the record, I didn’t put that in my query. I saw that it coded like a physicist and asked it to use built-ins if possible. That’d be good to put in a system prompt or project prompt for ChatGPT—to use built-ins wherever possible.
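
As a concrete instance of coding like a physicist versus using a built-in, consider the two-component normal mixture contribution below (theta, mu, sigma, y, and N assumed declared elsewhere); the hand-rolled density exponentiates and re-logs, which can underflow, while log_mix is the numerically stable built-in equivalent:

model {
    // hand-rolled "physicist" version, prone to underflow:
    // for (n in 1:N)
    //     target += log(theta * exp(normal_lpdf(y[n] | mu[1], sigma[1]))
    //                   + (1 - theta) * exp(normal_lpdf(y[n] | mu[2], sigma[2])));

    // built-in version:
    for (n in 1:N)
        target += log_mix(theta, normal_lpdf(y[n] | mu[1], sigma[1]),
                          normal_lpdf(y[n] | mu[2], sigma[2]));
}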

Overall, though, I haven’t found it to be nearly as good at Stan programming as it is at Python programming. ChatGPT o1 (not pro) falls over when you try to get it to code for JAX. It can tell you what it needs to do in terms of making things static or binding with closures; it just can’t seem to write code that JAX can jit.

I haven’t tried Claude Sonnet on Stan. I’m curious if other people have and how well it works.

For coding, I use Claude almost exclusively. Not so much for Stan, but a couple of times I have generated a skeleton Stan model and the R backend for it, given a (sample) data frame, and Claude has been fine for these. Its R is surprisingly good, sometimes using metaprogramming from rlang, and its R code is elegant in general, especially if I ask it to be functional. (It has also told me about correlation structures available for brms random effects, which I was happy about over a lunch break, until I afterwards realized they don’t exist at all. But they could!)

I have asked o1 for complete Stan models from somewhat hazy definitions. These have been ok as well, except for using the old array syntax. o1 was also of help in reparametrizing an exponentially modified Gaussian in a way that is numerically stable and practical for electricity price forecasting.
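
For reference, the syntax issue in question (the old declaration form has been deprecated since Stan 2.26 and removed in recent releases):

data {
    int<lower=1> N;
    // old array syntax, no longer accepted: real y[N];
    array[N] real y;    // current syntax
}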

3 Likes