Incorporating pre-computed grid of models into Stan model

I’m working on a model where the general procedure is looking for the parameters that best minimize the difference between a model based on those parameters and data.

My code is the following:

       real get_model(real logage, real zh) {
            return -1.0;
            }

        real chi2_lpdf(matrix obs, real logage, real zh) {
            vector[num_elements(obs[:,1])] tmp;
            vector[num_elements(obs[:,1])] holder;
            real ln_like;

            for (i in 1:num_elements(obs[:,1])) {
                holder[i] = 1;
            }

            for (i in 1:num_elements(obs[:,1])) {
                tmp[i] = (obs[i,1]-holder[i])^2/obs[i,2];
            }

            ln_like = -0.5*sum(tmp);
            return ln_like;
        }
    }

    data {
        int num;
        matrix[num, 2] obs;
    }

    parameters {
        real<lower=1.0, upper=1.1> logage;
        real<lower=-0.02, upper=0.00> zh;
    }

    model {
        real mflux;
        logage ~ normal(0.5, 1);
        zh ~ normal(0.5, 1);
        obs ~ chi2(logage, zh);
    }

As you can see, get_model is just a dummy function right now. What it eventually needs to be is a function that can take the zh and logage parameters, find and return the model associated with those parameters from a pre-computed grid so that it can be used to compare to the observations (obs variable). I’ve searched through the manual looking for any information on how I might go about this but I didn’t find anything. Has anyone dealt with a similar problem? I’d appreciate any suggestions or if there’s a tutorial you can point me to. Thank you!

If the difference is measured in log density, you’re talking about maximum likelihood estimation.

To answer your question, you can pass in any kind of data you want as data then pass it to a function as one or more arguments.

Usually, we just build a statistical model, which is why there’s nothing in the manual. I don’t quite see what your’e doing with that function.

Thanks for replying, Bob. I’m new to both Stan and C++ so I appreciate any help as I get started.

To answer your question, you can pass in any kind of data you want as data then pass it to a function as one or more arguments.

Usually, we just build a statistical model, which is why there’s nothing in the manual. I don’t quite see what your’e doing with that function.

I’ve worked more on the model I’m trying to implement and so hopefully it’s clearer what I’m trying to do –

First I generate a lookup table, where the parameters zh and logage vary and for each combination of zh and logage there’s a physically motivated model called ssp

def generate_lookup_table():

logzgrid   = [-1.0, -0.1]
logagegrid = np.log10([1.0, 11.0])

ssp1 = np.loadtxt('ssp_age01.0_zm1.0.dat')
ssp2 = np.loadtxt('ssp_age01.0_zm0.1.dat')
ssp3 = np.loadtxt('ssp_age11.0_zm1.0.dat')
ssp4 = np.loadtxt('ssp_age11.0_zm0.1.dat')

sspgrid    = [ssp1[:,1], ssp2[:,1], ssp3[:,1], ssp4[:,1]]

return [logzgrid, logagegrid, sspgrid]

I pass those grids and the data I’m trying to fit to Stan,

mock = np.loadtxt('mock.dat')
lookup = generate_lookup_table()

norm_data = {
             'num':        len(mock[:,0]),
             'obs':        mock[:,1:3],
             'logzgrid':   lookup[0],
             'logagegrid': lookup[1],
             'sspgrid':    lookup[2]
            }

I’m trying to find the logage' and zh` that best describes the data,

    parameters {
        real<lower=1.0, upper=1.1> logage;
        real<lower=-0.02, upper=0.01> zh;
    }

    model {
        logage ~ normal(0.5, 1);
        zh ~ normal(0.5, 1);
        obs ~ chi2(logzgrid, logagegrid,
                   sspgrid, num, logage, zh);
    }

To find the best values for logage and zh I need to retrieve the ssp that most closely matches the logage and zh at any given step.

    functions {

        vector get_ssp(vector logzgrid, vector logagegrid,
                        matrix sspgrid, int num,
                        real logage, real zh) {

            vector[num] ssp;

            /*
            Find the ssp for a given logage and zh -
            look up which elements in the logzgrid
            and logagegrid are the closest match to
            the given logage and zh and return the
            SSP associated with those values.
            */


            ssp =
            return ssp;
        }

        real chi2_lpdf(matrix obs, vector logzgrid,
                       vector logagegrid, matrix sspgrid,
                       int num, real logage, real zh) {

            vector[num_elements(obs[:,1])] tmp;
            real ln_like;
            vector[num] ssp;

            ssp = get_ssp(logzgrid, logagegrid,
                            sspgrid, num, logage, zh);

            for (i in 1:num) {
                tmp[i] = (obs[i,1]-ssp[i])^2/obs[i,2];
            }

            ln_like = -0.5*sum(tmp);
            return ln_like;
        }
    }

What I really need help with, is how to do the lookup in the get_ssp function. Do you have any suggestions about how I can retrieve the indices in the logzgrid and logagegrid arrays for the input logage and zh values such that I can use them to find the ssp associated with those values?

I’m still just not sure why you’re trying to do this kind of optiization within Stan. Usually, stan programs define parametric log density function (the parameter block defines the parameters). We can then optimize the parameters w.r.t. the data.

I’m not sure how you define closest. If there’s a real value, you can use the built-in sorting functions, or if you only need the best, just search. If I want the index of the max element of an array, I’d use:

// assumes size(y) > 0
int index_of_max(real[] y) {
  int max_idx = 1;
  for (n in 2:size(y))
    if (y[n] > y[max_idx])
      max_idx = n;
  return max_idx;
}

Works the same way for other comparisons than >. You may want to check that there’s at least one element in the array if you want to reuse this.