Hi guys,
Thanks for the response. Here’s a more specific example to hopefully make the problem more tangible.
measurement_error_data.csv (2.7 KB)
When we measure microbial biomass, we report respiration–after adding a carbon substrate–per dry weight of soil. We collect replicated measurement on soil moisture (Gravimetric Moisture Content = GMC) as well as replicated measurement on soil respiration (Substrate Induced Respiration = SIR). We then normalize respiration by soil moisture content to get our final microbial biomass. Currently, almost everyone in the field does something that looks like this:
library(tidyverse)
# Read in data on soil moisture content
gmc <- read_csv("measurement_error_data.csv",
col_types = cols(`CO2_flux_ug.C.h-1` = col_skip(),
sample = col_character(), soil_mass_ambient_g = col_skip()))
# Calculate percent moisture
gmc$gmc <- (1-(gmc$oven_dry_g/gmc$ambient_mass_g))*100
# Read in data on microbial biomass (respiration, after adding C substrate)
sir <- read_csv("measurement_error_data.csv",
col_types = cols(ambient_mass_g = col_skip(),
oven_dry_g = col_skip(), sample = col_character()))
# Calculate average moisture for each sample
avg.gmc <- aggregate(gmc$gmc~gmc$sample, FUN=mean)
names(avg.gmc) <- c('sample','gmc')
# Merge mean moisture into respiration data set
merged <- merge(sir,avg.gmc,by='sample')
# Calculate soil moisture per dry weight of soil, based on average moisture value for each sample
merged$`CO2_flux_ug.C.gsoil-1h-1` <- merged$`CO2_flux_ug.C.h-1`/
(merged$soil_mass_ambient_g - (merged$soil_mass_ambient_g*(merged$gmc/100)))
# Average two microbial biomass readings for each sample and report as single value
final.data <- aggregate(merged$`CO2_flux_ug.C.gsoil-1h-1`~merged$sample,FUN=mean)
names(final.data) <- c('sample','microbial_biomass')
I’m not quite sure of the right Stan syntax, but it seems like I’d want something like:
a ~ normal(0,1);
tau ~ normal(0,1);
rho ~ normal(0,1);
sigma ~ normal(0,1);
gmc ~ normal(a * gmc_measured, tau);
mu = (sir/(soil.mass - (soil.mass*(gmc/100))));
y_measured ~ normal(b*mu, rho);
y ~ normal(y_measured, sigma);
Am I way off base with logic & syntax? How would I account for the fact that each sample should be getting its own distributions, for both moisture content and respiration, that are defined by the observed data? The way that I understand what I just wrote (and what was written by @sakredja) is that both x and y are defined by some universal distribution from which intermediate calculations are made, not that each distribution will be specific to each sample.