--- title: "Introduction to Multivariable Functional Mendelian Randomization" author: "Nicole Fontana" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to Multivariable Functional Mendelian Randomization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Overview This vignette introduces **Multivariable Functional Mendelian Randomization (MV-FMR)**, a method to estimate time-varying causal effects of multiple correlated longitudinal exposures on health outcomes. ### Key Features - **Joint estimation** of multiple time-varying exposures - Automatic component selection via cross-validation - Support for continuous and binary outcomes - Bootstrap inference for confidence intervals - Instrument strength diagnostics ### When to Use MV-FMR Use joint multivariable estimation (`mvfmr()`) when: - You have **multiple exposures** that may be correlated - Exposures may share **horizontal pleiotropy** (common genetic variants) - There may be **mediation** between exposures - You want to estimate **independent causal effects** of each exposure ## Installation ```{r install, eval=FALSE} # Install from GitHub devtools::install_github("NicoleFontana/mvfmr") ``` ```{r load} library(mvfmr) library(fdapace) library(ggplot2) ``` ## Example: Joint Estimation of Two Exposures ### Step 1: Simulate Data We generate genetic instruments and two longitudinal exposures: ```{r simulate_data} set.seed(12345) # Generate exposure data sim_data <- getX_multi_exposure( N = 300, # Sample size J = 25, # Number of genetic instruments nSparse = 10 # Observations per subject ) # Check dimensions cat("Sample size:", nrow(sim_data$details$G), "\n") cat("Number of instruments:", ncol(sim_data$details$G), "\n") ``` ### Step 2: Generate Outcome We create an outcome with **linear effect** from X1 and **quadratic effect** from X2: ```{r generate_outcome} outcome_data <- getY_multi_exposure( sim_data, X1Ymodel = "2", # Linear: β₁(t) = 0.02 * t X2Ymodel = "5", # Late-age effect: β₂(t) = 0.1 * (t > 30) X1_effect = TRUE, X2_effect = TRUE, outcome_type = "continuous" ) cat("Outcome summary:\n") summary(outcome_data$Y) ``` ### Step 3: Functional Principal Component Analysis We apply FPCA to both exposures to extract principal components: ```{r fpca} # FPCA for exposure 1 fpca1 <- FPCA( sim_data$X1$Ly_sim, sim_data$X1$Lt_sim, list(dataType = 'Sparse', error = TRUE, verbose = FALSE) ) # FPCA for exposure 2 fpca2 <- FPCA( sim_data$X2$Ly_sim, sim_data$X2$Lt_sim, list(dataType = 'Sparse', error = TRUE, verbose = FALSE) ) cat("FPCA completed:\n") cat(" Exposure 1:", fpca1$selectK, "components selected\n") cat(" Exposure 2:", fpca2$selectK, "components selected\n") ``` ### Step 4: Joint Multivariable Estimation Now we perform joint estimation using `mvfmr()`: ```{r joint_estimation} result_joint <- mvfmr( G = sim_data$details$G, fpca_results = list(fpca1, fpca2), Y = outcome_data$Y, outcome_type = "continuous", method = "gmm", max_nPC1 = 4, max_nPC2 = 4, improvement_threshold = 0.001, bootstrap = FALSE, n_cores = 1, true_effects = list(model1 = "2", model2 = "5"), X_true = list(X1_true = sim_data$details$X1, X2_true = sim_data$details$X2), verbose = FALSE ) # View results print(result_joint) ``` ### Step 5: Visualize Time-Varying Effects The estimated time-varying causal effects can be visualized using the built-in plot method: ```{r plot_effects, fig.width=10, fig.height=4} # Plot both effects plot(result_joint) ``` The solid colored lines show the estimated time-varying causal effects, with shaded bands representing 95% confidence intervals. The dashed red lines (when present) indicate the true time-varying effects used in the simulation. ### Step 6: Extract Coefficients ```{r coefficients} # Estimated beta coefficients for basis functions coef(result_joint) # Time-varying effects at each time point head(result_joint$effects$effect1) head(result_joint$effects$effect2) ``` ### Step 7: Performance Metrics Since we simulated data with known truth, we can evaluate performance: ```{r performance} cat("Performance Metrics:\n") cat("\nExposure 1 (Linear effect):\n") cat(" MISE:", round(result_joint$performance$MISE1, 6), "\n") cat(" Coverage:", round(result_joint$performance$Coverage1, 3), "\n") cat("\nExposure 2 (Quadratic effect):\n") cat(" MISE:", round(result_joint$performance$MISE2, 6), "\n") cat(" Coverage:", round(result_joint$performance$Coverage2, 3), "\n") ``` - **MISE** (Mean Integrated Squared Error): measures accuracy of effect estimation - **Coverage**: proportion of time points where true effect falls within 95% CI ## Comparison: Joint vs. Separate Estimation We can compare joint (multivariable) with separate (univariable) estimation: ```{r separate_estimation} result_separate <- mvfmr_separate( G1 = sim_data$details$G, G2 = sim_data$details$G, fpca_results = list(fpca1, fpca2), Y = outcome_data$Y, outcome_type = "continuous", method = "gmm", max_nPC1 = 4, max_nPC2 = 4, n_cores = 1, true_effects = list(model1 = "2", model2 = "5"), verbose = FALSE ) print(result_separate) ``` ### Performance Comparison ```{r comparison_table} comparison <- data.frame( Method = rep(c("Joint (MV-FMR)", "Separate (U-FMR)"), each = 2), Exposure = rep(c("X1", "X2"), times = 2), MISE = c( result_joint$performance$MISE1, result_joint$performance$MISE2, result_separate$exposure1$performance$MISE, result_separate$exposure2$performance$MISE ), Coverage = c( result_joint$performance$Coverage1, result_joint$performance$Coverage2, result_separate$exposure1$performance$Coverage, result_separate$exposure2$performance$Coverage ) ) print(comparison) ``` **Key Insight:** Joint estimation (MV-FMR) typically performs better when exposures are correlated or share pleiotropic instruments. ## Instrument Strength Diagnostics Check instrument strength using F-statistics: ```{r diagnostics} # Calculate F-statistics K_total <- result_joint$nPC_used$nPC1 + result_joint$nPC_used$nPC2 fstats <- IS( J = ncol(sim_data$details$G), K = K_total, PC = 1:K_total, datafull = cbind( sim_data$details$G, cbind(fpca1$xiEst[, 1:result_joint$nPC_used$nPC1], fpca2$xiEst[, 1:result_joint$nPC_used$nPC2]) ), Y = outcome_data$Y ) print(fstats) ``` - **FF**: Standard F-statistic for instrument strength - **cFF**: Conditional F-statistic (accounts for other exposures) - **Qvalue**: Hansen's J overidentification test statistic - **pvalue**: P-value for Q-test (high p-value = instruments are valid) **Rule of thumb:** cFF > 10 indicates strong instruments. ## Binary Outcomes MV-FMR also supports binary outcomes using the control function approach: ```{r binary_outcome, eval=FALSE} # Generate binary outcome outcome_binary <- getY_multi_exposure( sim_data, X1Ymodel = "2", X2Ymodel = "5", outcome_type = "binary" ) # Estimate with control function result_binary <- mvfmr( G = sim_data$details$G, fpca_results = list(fpca1, fpca2), Y = outcome_binary$Y, outcome_type = "binary", method = "cf", # Control function for binary max_nPC1 = 3, max_nPC2 = 3, n_cores = 1, verbose = FALSE ) print(result_binary) ``` ## Next Steps ### Learn More - **Univariable FMR**: See `vignette("univariable-fmr")` for single exposure analysis - **Advanced examples**: Check `inst/examples/test_MV-FMR.R` for complete scenarios - **Manuscript**: See paper for methodological details ## Citation If you use this package, please cite: > Fontana, N., Ieva, F., Zuccolo, L., Di Angelantonio, E., & Secchi, P. (2025). Unraveling time-varying causal effects of multiple exposures: integrating Functional Data Analysis with Multivariable Mendelian Randomization. *arXiv preprint arXiv:2512.19064*. ## Session Info ```{r session_info} sessionInfo() ```