I’m developing a statistical model to understand how changes in website clicks (which are necessary for later bookings) relate to actual bookings in various experiments. My goal is to create a generative model.
To start, I’ve been simulating data in R to reflect possible outcomes from different experimental actions on clicks and bookings. This is an attempt to mirror real-life variations in these metrics. Here is the initial code for my simulation:
```r
library(tidyverse)

set.seed(42) # Make the simulation reproducible

# Simulation parameters
n <- 1000                            # Number of observations (simulated experiments)
se_click <- 50                       # Standard error of the observed click effect
se_book <- 40                        # Standard error of the observed booking effect
conversion_from_click_to_book <- 0.5 # Average conversion from clicks to bookings
sd_book_given_click <- 30            # Spread of the true booking effect around 0.5 * real_click

# Simulated data
simulated_data <- tibble(
  real_click = rnorm(n, 0, 100),                   # True change in clicks (normally distributed, can be negative)
  observed_click = rnorm(n, real_click, se_click), # Observed clicks = true value + noise
  real_book = rnorm(n, conversion_from_click_to_book * real_click, sd_book_given_click), # True bookings, driven by true clicks
  observed_book = rnorm(n, real_book, se_book)     # Observed bookings = true value + noise (independent of the click noise)
)
```
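As a rough check of what this simulation implies: because the click noise is drawn independently of everything else, the two measurement errors should be essentially uncorrelated, and regressing observed bookings on observed clicks should give a slope attenuated by the reliability of the click measure, roughly 0.5 × 100² / (100² + 50²) = 0.4 instead of the true 0.5. The sketch below re-runs the same process with an arbitrary seed and computes both quantities. The names `checks`, `noise_cor`, `slope_real` and `slope_observed` are made up for illustration, and the OLS slope is computed directly as `cov / var`.

```r
library(tidyverse)

set.seed(1) # arbitrary seed, only so the check is reproducible

n <- 1000
se_click <- 50
se_book <- 40
conversion_from_click_to_book <- 0.5

# Same generative process as the simulation above
simulated_data <- tibble(
  real_click = rnorm(n, 0, 100),
  observed_click = rnorm(n, real_click, se_click),
  real_book = rnorm(n, conversion_from_click_to_book * real_click, 30),
  observed_book = rnorm(n, real_book, se_book)
)

checks <- simulated_data %>%
  summarise(
    # correlation between the click error and the booking error (expected ~0 here)
    noise_cor = cor(observed_click - real_click, observed_book - real_book),
    # OLS slope of true bookings on true clicks (expected ~0.5)
    slope_real = cov(real_click, real_book) / var(real_click),
    # OLS slope of observed bookings on observed clicks (expected ~0.4, attenuated)
    slope_observed = cov(observed_click, observed_book) / var(observed_click)
  )

print(checks)
```

In other words, the current model already produces a correlation between observed clicks and observed bookings, but only through the true effects, never through the noise.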
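In my current simulation, the noise in observed clicks and the noise in observed bookings are drawn independently. I think this misses something: the observed click and booking results should be correlated through their noise, even when an experiment has no direct impact on either metric (for example, because bookings require clicks, random variation that inflates an experiment's measured clicks would tend to inflate its measured bookings as well). This idea is supported by this paper. I'm looking for advice on how to improve my model so that it reflects these relationships. Any feedback on my approach, and on whether my current model is correct, would be very helpful. Thanks!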
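One possible way to build this in, sketched under assumptions: draw the two measurement errors jointly from a bivariate normal distribution instead of calling `rnorm()` separately, so that their correlation is an explicit parameter. The correlation `rho_noise = 0.6` is a hypothetical value chosen only for illustration, not something taken from the paper or from real data; `sd_book_given_click` is just the hard-coded 30 from the original code given a name. `MASS::mvrnorm()` is called with the `MASS::` prefix to avoid masking `dplyr::select()`.

```r
library(tidyverse)

set.seed(1) # arbitrary seed for reproducibility

n <- 1000
se_click <- 50
se_book <- 40
conversion_from_click_to_book <- 0.5
sd_book_given_click <- 30 # the hard-coded 30 from the original simulation
rho_noise <- 0.6          # HYPOTHETICAL correlation between click and booking noise

# Covariance matrix of the (click, booking) measurement errors
noise_cov <- matrix(
  c(se_click^2,                     rho_noise * se_click * se_book,
    rho_noise * se_click * se_book, se_book^2),
  nrow = 2
)

# One joint draw of both errors per experiment (n x 2 matrix)
noise <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = noise_cov)

simulated_data <- tibble(
  real_click = rnorm(n, 0, 100),
  real_book = rnorm(n, conversion_from_click_to_book * real_click, sd_book_given_click),
  observed_click = real_click + noise[, 1], # true value + correlated noise
  observed_book = real_book + noise[, 2]    # true value + correlated noise
)

checks_correlated <- simulated_data %>%
  summarise(
    # expected ~rho_noise
    noise_cor = cor(observed_click - real_click, observed_book - real_book),
    # expected ~(0.5 * 100^2 + rho_noise * 50 * 40) / (100^2 + 50^2) = 0.496
    slope_observed = cov(observed_click, observed_book) / var(observed_click)
  )

print(checks_correlated)
```

With positively correlated noise, the naive slope is pushed back up (here to roughly 0.496), which can mask the attenuation from the previous check, so the two effects need to be modelled separately. Setting `conversion_from_click_to_book` to 0 would still leave observed clicks and bookings correlated through the noise alone, which matches the "correlated even without a direct impact" idea. In a later Stan version, the analogous likelihood would be a bivariate normal on the observed pair (for example via `multi_normal`), with the noise correlation either supplied from the experiments' own estimates or given a prior; this is one option, not the only one.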
P.S. I hope it's okay to post this here even though I'm not using Stan yet. Let me know if I should move or remove the post. Thanks!