PARSER EXPECTED: <distribution and parameters>

jmoravec · December 6, 2022, 5:03am

I am trying to do some deconvolution of tissue.

I have a tissue sample, which is a combination of normal tissue and cancer.
I am modelling it as a mixture of normal distributions, assuming that the methylation profile of cancer and normal tissue follows relatively narrow normal distributions. That is:

M = \alpha N (\mu_{\text{cancer}}, \sigma_{\text{cancer}}) + (1 - \alpha) N (\mu_{\text{normal}}, \sigma_{\text{normal}})

where \mu_{\text{cancer}}, \sigma_{\text{cancer}} are the unknown mean and standard deviation of the cancer methylation profile, and \mu_{\text{normal}}, \sigma_{\text{normal}} are the known parameters of the normal healthy tissue. The mixture proportion parameter \alpha is unknown.

Note that this is different from the example mixture distribution, where each sample belongs to one distribution or the other. Here each observation is a weighted sum of both.

My STAN code is this:

    data {
    int<lower=1> S; // number of samples
    int<lower=1> P; // number of probes
    matrix[S, P] mixed_tissue; // mixed tissue
    vector[P] mu_lung; // mean of lung, estimated from data
    vector[P] sd_lung; // sd of lung, estimated from data
    }

    parameters {
    real<lower=0, upper=1> alpha; // cancer cell fraction
    vector[P] mu_cancer; // mean of cancer
    vector[P] sd_cancer; // sd of cancer
    }

    model {
    // using for cycle:
    for (s in 1:S) {
        for(p in 1:P) {
            mixed_tissue[s, p] ~ alpha[s] * normal(mu_cancer[p], sd_cancer[p]) + (1 - alpha[s]) * normal(mu_lung[p], sd_lung[p]);
            }
        }
    }

Yet, I get a non-explanatory error “PARSER EXPECTED: <distribution and parameters>” pointing at mu_lung. I don’t understand what it means: mu_lung is a fixed value, nothing interesting, nothing that needs to be sampled or estimated. In other examples, fixed values are explicitly passed.

Thanks for help.
P.S. This is my first time using Stan, and I am not yet familiar with the target notation.

ahartikainen · December 6, 2022, 7:00am

Was there also something else given with the error msg?

Did you give the data in correct format?

jmoravec · December 6, 2022, 7:09am

Full error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model433c049566996_entex' at line 20, column 33
  -------------------------------------------------
    18:     for (s in 1:S) {
    19:         for(p in 1:P) {
    20:             mixed_tissue[s, p] ~ alpha[s] * normal(mu_cancer[p], sd_cancer[p]) + (1 - alpha[s]) * normal(mu_lung[p], sd_lung[p]);
                                        ^
    21:             }
  -------------------------------------------------

PARSER EXPECTED: <distribution and parameters>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  :

Haven’t thought about the format of the data. I am passing some data.table columns instead of R objects. I converted it into a native R object matrix, but this didn’t solve the issue.

Is the notation correct? I am looking at the documentation and the notation ~ is typically used as:

y ~ distribution(parameters)

not as

y ~ distribution(parameters) + distribution(parameters) + values

(see section 1.1, Linear regression, in the Stan User’s Guide)

For more complex models, the target += notation is used instead, but I am not yet comfortable specifying models this way. Might that be the issue?

WardBrian · December 6, 2022, 3:50pm

I think this is your problem. The ~ syntax is used only when a distribution is on the right, not a complex expression.

You can write your own l[u]pdf function, or replace it with an equivalent target +=.

jmoravec · December 6, 2022, 6:35pm

Thank you Brian.

This seems like something documentation should mention explicitly.

Could you please suggest how I should rewrite the model using the target += notation? I am still unfamiliar with it, as well as with the f(y|x) notation (I am still not sure whether it is standard conditional probability or some special Stan notation).

WardBrian · December 6, 2022, 7:29pm

A statement like a ~ normal(b, c) is perfectly equivalent to target += normal_lupdf(a | b, c)

So, I think your above statement would be best translated as something like

target += alpha[s] * normal_lupdf(mixed_tissue[s, p] | mu_cancer[p], sd_cancer[p]);
target += (1 - alpha[s]) * normal_lupdf(mixed_tissue[s, p] | mu_lung[p], sd_lung[p]);

This still fails because you’re trying to index into alpha, which is not an array/vector, but if you fix that it compiles (make sure it means what you want, though!)
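
For readers new to the two notations, here is a minimal sketch of the equivalence described above, using a plain normal model. The names N, y, mu and sigma are illustrative and not from the original model. It assumes a Stan version recent enough to support the `_lupdf` suffix (2.25 or later, as far as I know); on older versions `normal_lpdf` can be used in the same position. The vertical bar inside `normal_lupdf(y | mu, sigma)` separates the outcome from the parameters, much like the conditioning bar in f(y|x).

```stan
data {
  int<lower=1> N;        // number of observations (illustrative)
  vector[N] y;           // observed values (illustrative)
}
parameters {
  real mu;               // unknown mean
  real<lower=0> sigma;   // unknown standard deviation
}
model {
  // Sampling-statement form: exactly one distribution on the right-hand side.
  y ~ normal(mu, sigma);

  // Equivalent log-density form. Use one form or the other, not both,
  // otherwise the likelihood is counted twice:
  // target += normal_lupdf(y | mu, sigma);
}
```

Because `target` accumulates a log density, the right-hand side of `target +=` can be any real-valued expression, which is why it accepts what `~` rejects. Whether a given expression is a sensible log density is a separate question.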

jmoravec · December 6, 2022, 7:43pm

Thanks Brian

I will certainly fix the alpha, but the way you wrote it is essentially the model mentioned in the chapter:

which reads to me as saying that an individual data point in mixed_tissue comes from the cancer tissue with probability alpha OR from the normal tissue with probability 1 - alpha, which is not what I want (and is the reason I wrote it with ~ the way I did in the first place).

The issue is that the normal distributions are not conditional on the mixed_tissue data directly, but their sum is (i.e., it is not N(m | mu_1, sd_1) + N(m | mu_2, sd_2) but m = a + b, with N(a | mu_1, sd_1) and N(b | mu_2, sd_2)). At least that is how I understand the | notation.

Am I understanding this correctly?

Thanks.
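
For comparison, here is a hedged sketch of the "one component or the other" mixture being contrasted here, written with Stan's built-in `log_mix`. The data names (N, y, mu_normal, sd_normal) are illustrative and a single probe is used to keep it short. Note that `log_mix` mixes on the density scale; multiplying each log density by alpha or 1 - alpha, as in the two `target +=` lines earlier in the thread, weights the log densities instead and is not the same model.

```stan
data {
  int<lower=1> N;               // number of observations (illustrative)
  vector[N] y;                  // observed values for one probe (illustrative)
  real mu_normal;               // known mean of normal tissue
  real<lower=0> sd_normal;      // known sd of normal tissue
}
parameters {
  real<lower=0, upper=1> alpha; // probability that a point comes from cancer
  real mu_cancer;               // unknown cancer mean
  real<lower=0> sd_cancer;      // unknown cancer sd
}
model {
  for (n in 1:N) {
    // log(alpha * N(y | cancer) + (1 - alpha) * N(y | normal))
    target += log_mix(alpha,
                      normal_lpdf(y[n] | mu_cancer, sd_cancer),
                      normal_lpdf(y[n] | mu_normal, sd_normal));
  }
}
```

In this model each y[n] is drawn from exactly one of the two distributions, which is the interpretation the post above says it does not want.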

WardBrian · December 6, 2022, 11:33pm

I am not sure how you could express that model (not saying it’s impossible, I just don’t know!)

jmoravec · December 6, 2022, 11:35pm

Welp, I restated the model through Z = aX + bY, so that Z ~ N(a*mu_x + b*mu_y, sqrt( (a*sd_x)^2 + (b*sd_y)^2 )); see, e.g., Sum of normally distributed random variables - Wikipedia

and now it runs, but it runs out of memory. I guess I need to reduce the probes from 850 000 to something a bit smaller. :)

At least maximum likelihood optimization runs well.

Thanks Brian
– Jirka
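
The restated model was not posted in the thread, so the following is only a sketch of what it might look like, reusing the data names from the original program. It assumes a single alpha shared by all samples (as declared in the original parameters block), independence between the cancer and normal contributions, and a lower bound of zero on the standard deviations; no priors are given, mirroring the original.

```stan
data {
  int<lower=1> S;                  // number of samples
  int<lower=1> P;                  // number of probes
  matrix[S, P] mixed_tissue;       // observed mixed-tissue methylation
  vector[P] mu_lung;               // known mean of normal tissue
  vector<lower=0>[P] sd_lung;      // known sd of normal tissue
}
parameters {
  real<lower=0, upper=1> alpha;    // cancer cell fraction (shared, an assumption)
  vector[P] mu_cancer;             // unknown cancer mean per probe
  vector<lower=0>[P] sd_cancer;    // unknown cancer sd per probe
}
model {
  // Mean and sd of alpha * X_cancer + (1 - alpha) * X_lung,
  // assuming X_cancer and X_lung are independent normals.
  vector[P] mu_mix = alpha * mu_cancer + (1 - alpha) * mu_lung;
  vector[P] sd_mix = sqrt(square(alpha * sd_cancer)
                          + square((1 - alpha) * sd_lung));

  for (s in 1:S) {
    // Row s of the matrix is a row_vector; transpose it to match mu_mix.
    mixed_tissue[s]' ~ normal(mu_mix, sd_mix);
  }
}
```

Since the right-hand side is now a single normal distribution, the `~` statement parses. The per-probe parameters still scale with P, so this sketch does nothing about the memory use reported with 850 000 probes.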
