dnaEPICO 0.99.14
This vignette describes the file-based route in dnaEPICO. In this mode, the
same wrapper functions still return structured result objects, but they also
write files to disk when saveOutputs = TRUE. This route is appropriate when
you want a reproducible folder structure, Makefile-driven execution, or a
pipeline that can be resumed on a local workstation or an HPC system.
The dnaEPICO package uses methods from several external
packages. Because no single manuscript describes all components, the guidance
below explains how to cite dnaEPICO depending on the functions you use.
This citation guidance is adapted from the vignettes and user guides of minfi, ENmix, and wateRmelon.
make f3 MODEL=model1, please cite Aryee et al. (2014); Fortin, Triche Jr., and Hansen (2017); Xu, Niu, and Taylor (2021);
Pidsley et al. (2013); Maksimovic, Gordon, and Oshlack (2012); Fortin et al. (2014); Triche et al. (2013); Touleimat and Tost (2012); Murat et al. (2020).make f4 MODEL=model1, please cite Aryee et al. (2014); Fortin, Triche Jr., and Hansen (2017); Xu, Niu, and Taylor (2021);
Pidsley et al. (2013); Maksimovic, Gordon, and Oshlack (2012); Fortin et al. (2014); Triche et al. (2013); Touleimat and Tost (2012); Murat et al. (2020); Marschner (2011).make f3lme MODEL=model1, please cite Aryee et al. (2014); Fortin, Triche Jr., and Hansen (2017);
Xu, Niu, and Taylor (2021); Pidsley et al. (2013); Maksimovic, Gordon, and Oshlack (2012); Fortin et al. (2014); Triche et al. (2013); Touleimat and Tost (2012); Murat et al. (2020); Bates et al. (2015).make all MODEL=model1, please cite Aryee et al. (2014); Fortin, Triche Jr., and Hansen (2017); Xu, Niu, and Taylor (2021);
Pidsley et al. (2013); Maksimovic, Gordon, and Oshlack (2012); Fortin et al. (2014); Triche et al. (2013); Touleimat and Tost (2012); Murat et al. (2020); Marschner (2011); Bates et al. (2015).dnaEPICO is built on core Bioconductor infrastructure for high-dimensional genomic data, with a focus on Illumina DNA methylation arrays. To read this vignette, we assume that you are already familiar with the general DNA methylation workflow. If not, we recommend first reading this tutorial, which provides a practical introduction to the main concepts and analysis steps.
Preprocessing and quality control are performed using established Bioconductor tools, including minfi, ENmix, and wateRmelon. Downstream statistical modelling relies on base R and CRAN frameworks, including generalised linear models and linear mixed-effects models. Users are expected to have basic familiarity with R, command-line execution, and Illumina IDAT file structures.
If you are asking yourself the question “Where do I start using Bioconductor?”, you might be interested in this blog post.
The dnaEPICO package depends on minfi, ENmix, and wateRmelon, among others. Install dependencies with BiocManager:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("dnaEPICO")
BiocManager::valid()
library("dnaEPICO")
The package includes a template Makefile that can be copied into a project directory and adapted to local paths and model definitions.
makefilePath <- extractMake(destDir = tempdir(), overwrite = TRUE)
basename(makefilePath)
#> [1] "Makefile"
After extraction, the file should be edited so that the project-specific paths, model labels, and cluster settings match the target analysis environment.
Arguments:
destDir = tempdir() writes the template to a temporary directory so the
example does not modify the working tree.overwrite = TRUE allows the template to be replaced if it already exists in
that temporary location.The file-based route usually starts with preprocessingMinfiEwasWater(). This
step writes the filtered RGSet, normalized metric matrices, quality-control
figures, and the phenoLC.csv file that will be consumed by svaEnmix() and
preprocessingPheno().
The next chunk shows how to recreate the temporary files used in this example. It writes a small phenotype table to a temporary directory, copies a small set of IDAT files, and records the cross-reactive probe reference used during probe filtering.
preprocessing_inputs <- dnaEPICO:::exampleMinfiIdatInputsDnaEpico(n = 6L)
names(preprocessing_inputs)
#> [1] "tempDir" "idatFolder" "phenoFile"
#> [4] "targets" "arrayType" "annotationVersion"
#> [7] "crossReactivePath"
preprocessing_inputs$tempDir
#> [1] "/tmp/Rtmp6rWi3o/dnaEPICO-idat-example-3a4cfb42a656eb"
basename(preprocessing_inputs$phenoFile)
#> [1] "pheno.csv"
head(basename(list.files(preprocessing_inputs$idatFolder, full.names = TRUE)), 4)
#> [1] "5723646052_R02C02_Grn.idat" "5723646052_R02C02_Red.idat"
#> [3] "5723646052_R04C01_Grn.idat" "5723646052_R04C01_Red.idat"
basename(preprocessing_inputs$crossReactivePath)
#> [1] "12864_2024_10027_MOESM8_ESM.csv"
preprocessing_inputs is a small list returned by the internal example helper.
names(preprocessing_inputs) shows its main elements: tempDir is the
temporary working directory, idatFolder contains the copied example IDAT
files, phenoFile is the phenotype table used by the wrapper, targets is the
same phenotype information already loaded into memory, arrayType and
annotationVersion describe the array platform, and crossReactivePath points
to the cross-reactive probe reference used during probe filtering.
preprocessPipelineResult <- preprocessingMinfiEwasWater(
phenoFile = preprocessing_inputs$phenoFile,
idatFolder = preprocessing_inputs$idatFolder,
outputLogs = file.path(preprocessing_inputs$tempDir, "logs"),
nSamples = 6,
SampleID = "Sample_Name",
arrayType = preprocessing_inputs$arrayType,
annotationVersion = preprocessing_inputs$annotationVersion,
scriptLabel = "preprocessingMinfiEwasWater",
baseDataFolder = file.path(preprocessing_inputs$tempDir, "rData"),
figureBaseDir = file.path(preprocessing_inputs$tempDir, "figures"),
detPThreshold = 1,
normMethods = "quantile",
sexColumn = "Sex",
pvalThreshold = 1,
chrToRemove = "",
snpsToRemove = "SBE",
mafThreshold = 1,
crossReactivePath = preprocessing_inputs$crossReactivePath,
plotGroupVar = "Sex",
lcRef = "saliva",
phenoOrder = "Sample_Name;Sex;Basename;Sentrix_ID;Sentrix_Position",
lcPhenoDir = file.path(
preprocessing_inputs$tempDir,
"data",
"preprocessingMinfiEwasWater"
),
display = FALSE,
verbose = FALSE,
logs = FALSE,
saveOutputs = TRUE
)
#> Plotting STAINING .jpg
#> Plotting EXTENSION .jpg
#> Plotting HYBRIDIZATION .jpg
#> Plotting TARGET_REMOVAL .jpg
#> Plotting BISULFITE_CONVERSION_I .jpg
#> Plotting BISULFITE_CONVERSION_II .jpg
#> Plotting SPECIFICITY_I .jpg
#> Plotting SPECIFICITY_II .jpg
#> Plotting NON-POLYMORPHIC .jpg
#> Plotting NEGATIVE .jpg
#> Plotting NORM_A .jpg
#> Plotting NORM_C .jpg
#> Plotting NORM_G .jpg
#> Plotting NORM_T .jpg
#> Plotting NORM_ACGT .jpg
preprocess_paths <- c(
phenoLC = file.path(
preprocessing_inputs$tempDir,
"data",
"preprocessingMinfiEwasWater",
"phenoLC.csv"
),
rgset = file.path(
preprocessing_inputs$tempDir,
"rData",
"preprocessingMinfiEwasWater",
"objects",
"RGSet.RData"
),
beta = file.path(
preprocessing_inputs$tempDir,
"rData",
"preprocessingMinfiEwasWater",
"metrics",
"beta_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData"
),
qc = file.path(
preprocessing_inputs$tempDir,
"figures",
"preprocessingMinfiEwasWater",
"qc",
"quality_control(MSet).tiff"
)
)
class(preprocessPipelineResult)
#> [1] "dnaEPICO_preprocessingMinfiEwasWater"
basename(preprocess_paths)
#> [1] "phenoLC.csv"
#> [2] "RGSet.RData"
#> [3] "beta_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData"
#> [4] "quality_control(MSet).tiff"
file.exists(preprocess_paths)
#> [1] TRUE TRUE TRUE TRUE
The returned object still has class dnaEPICO_preprocessingMinfiEwasWater, but
the printed vectors in this chunk focus on the written outputs. In
preprocess_paths, phenoLC is the phenotype table with estimated cell
proportions, rgset is the filtered RGSet, beta is the saved beta matrix
used later by phenotype preparation, and qc is an example quality-control
figure. basename(preprocess_paths) shows the file names, while
file.exists(preprocess_paths) confirms that each one was written successfully.
Arguments:
phenoFile and idatFolder define the phenotype table and the IDAT folder
used to start the pipeline.SampleID = "Sample_Name" is the key used to align phenotype rows with the
methylation objects.arrayType and annotationVersion define the array manifest and annotation
used by minfi.baseDataFolder, figureBaseDir, and lcPhenoDir define where serialized
objects, figures, and phenoLC.csv are written.detPThreshold, normMethods, pvalThreshold, chrToRemove,
snpsToRemove, mafThreshold, and crossReactivePath define the main
preprocessing and filtering choices.plotGroupVar, lcRef, and phenoOrder control QC grouping, cell-type
estimation, and the structure of the exported phenotype table.saveOutputs = TRUE is what turns this wrapper into the first file-producing
step of the pipeline.The next example shows svaEnmix() in file-writing mode. The function still
returns a structured result object, but the savedFiles element now records the
main output files written to disk.
In the file-based route, svaEnmix() consumes the phenoLC.csv and
RGSet.RData files written by preprocessingMinfiEwasWater(). The next chunk
shows how those temporary file paths are reconstructed locally.
sva_targets_file <- file.path(
preprocessing_inputs$tempDir,
"data",
"preprocessingMinfiEwasWater",
"phenoLC.csv"
)
sva_rgset_file <- file.path(
preprocessing_inputs$tempDir,
"rData",
"preprocessingMinfiEwasWater",
"objects",
"RGSet.RData"
)
basename(c(sva_targets_file, sva_rgset_file))
#> [1] "phenoLC.csv" "RGSet.RData"
file.exists(c(sva_targets_file, sva_rgset_file))
#> [1] TRUE TRUE
Unlike the previous helper objects, sva_targets_file and sva_rgset_file are
plain file paths reconstructed from preprocessing_inputs$tempDir. They point
to the concrete outputs written by Step 1: phenoLC.csv, which is the
phenotype table augmented with cell-composition estimates, and RGSet.RData,
which is the filtered methylation object saved to disk.
svaPipelineResult <- svaEnmix(
phenoFile = sva_targets_file,
rgsetData = sva_rgset_file,
outputLogs = file.path(preprocessing_inputs$tempDir, "logs"),
SampleID = "Sample_Name",
arrayType = "IlluminaHumanMethylation450k",
annotationVersion = "ilmn12.hg19",
SentrixIDColumn = "Sentrix_ID",
SentrixPositionColumn = "Sentrix_Position",
ctrlSvaPercVar = 0.90,
ctrlSvaFlag = 1,
scriptLabel = "svaEnmix",
dataBaseDir = file.path(preprocessing_inputs$tempDir, "data"),
rBaseDir = file.path(preprocessing_inputs$tempDir, "rData"),
figureBaseDir = file.path(preprocessing_inputs$tempDir, "figures"),
display = FALSE,
verbose = FALSE,
logs = FALSE,
saveOutputs = TRUE
)
#> 3 surrogate variables explain 91.17398 % of
#> data variation
class(svaPipelineResult$savedFiles)
#> [1] "dnaEPICO_svaEnmix_paths"
names(svaPipelineResult$savedFiles)
#> [1] "svaRData" "svaCSV" "phenoWithSva" "dataDir" "rDir"
basename(unlist(svaPipelineResult$savedFiles, use.names = FALSE))
#> [1] "svaMatrix.RData" "svaMatrix.csv" "phenoLC.csv" "svaEnmix"
#> [5] "svaEnmix"
The returned savedFiles object records the main paths written by this step.
names(svaPipelineResult$savedFiles) identifies the output groups: svaRData
is the serialized surrogate-variable matrix, svaCSV is the CSV version of
that matrix, phenoWithSva is the updated phenotype table with appended SVA
columns, and dataDir / rDir are the parent directories used for the saved
outputs. The basename(...) call shows the file names generated from those
paths.
Arguments:
phenoFile and rgsetData identify the phenotype table and saved RGSet
produced by the earlier preprocessing step.SampleID, SentrixIDColumn, and SentrixPositionColumn specify how
samples and array positions are identified in the phenotype table.ctrlSvaPercVar = 0.90 keeps enough control-derived surrogate variables to
explain 90% of the control-probe variance.ctrlSvaFlag = 1 enables the ENmix control-based SVA workflow.dataBaseDir and rBaseDir define where file outputs are written when
saveOutputs = TRUE.display = FALSE, verbose = FALSE, and logs = FALSE keep the example
quiet while still demonstrating file creation.preprocessingPheno() is the step that usually prepares the largest number of
pipeline-ready files because it writes timepoint-specific tables, combined
longitudinal tables, and the Clock Foundation export inputs.
The next chunk shows how to recreate the temporary phenotype and matrix files used in this example. The helper writes a phenotype table plus aligned beta, M-value, and copy-number objects to a temporary directory.
pheno_inputs <- dnaEPICO:::examplePreprocessingPhenoStateDnaEpico()
names(pheno_inputs)
#> [1] "tempDir" "pheno" "phenoPath"
#> [4] "betaPath" "mPath" "cnPath"
#> [7] "metricsData" "timepointData" "combinedData"
#> [10] "clockFoundation" "preprocessingData"
pheno_inputs$tempDir
#> [1] "/tmp/Rtmp6rWi3o/dnaEPICO-preprocessingPheno-example-3a4cfb6cbe96f0"
basename(c(
pheno_inputs$phenoPath,
pheno_inputs$betaPath,
pheno_inputs$mPath,
pheno_inputs$cnPath
))
#> [1] "phenoLC.csv" "beta.RData" "m.RData" "cn.RData"
pheno_inputs is a list that bundles the temporary files and the aligned
in-memory objects used in this stage. names(pheno_inputs) shows that it
contains the temporary directory, the phenotype table, the saved beta, M-value,
and copy-number files, and precomputed helper objects such as metricsData,
timepointData, combinedData, and clockFoundation.
phenoPipelineResult <- preprocessingPheno(
phenoFile = pheno_inputs$phenoPath,
betaPath = pheno_inputs$betaPath,
mPath = pheno_inputs$mPath,
cnPath = pheno_inputs$cnPath,
SampleID = "Sample_Name",
timeVar = "Timepoint",
timepoints = "1,2",
combineTimepoints = "1,2",
outputPheno = file.path(pheno_inputs$tempDir, "data", "preprocessingPheno"),
outputRData = file.path(
pheno_inputs$tempDir,
"rData",
"preprocessingPheno",
"metrics"
),
outputRDataMerge = file.path(
pheno_inputs$tempDir,
"rData",
"preprocessingPheno",
"mergeData"
),
sexColumn = "Sex",
outputLogs = file.path(pheno_inputs$tempDir, "logs"),
outputDir = file.path(pheno_inputs$tempDir, "clockFoundation"),
verbose = FALSE,
logs = FALSE,
saveOutputs = TRUE
)
names(phenoPipelineResult$savedFiles)
#> [1] "timepoints" "combinedPheno" "combinedPhenoBeta"
#> [4] "betaCSV" "betaZIP" "phenoCF"
names(phenoPipelineResult$savedFiles$timepoints)
#> [1] "1" "2"
basename(unlist(phenoPipelineResult$savedFiles$timepoints[["1"]]))
#> [1] "phenoT1.csv" "betaT1.RData" "mT1.RData"
#> [4] "phenoBetaT1.RData"
basename(unlist(phenoPipelineResult$savedFiles[c(
"combinedPheno",
"combinedPhenoBeta",
"betaCSV",
"phenoCF"
)]))
#> [1] "phenoT1T2.csv" "phenoBetaT1T2.RData" "beta.csv"
#> [4] "phenoCF.csv"
These outputs are the files most commonly consumed by downstream modeling
functions. names(phenoPipelineResult$savedFiles) shows the high-level output
groups returned by the wrapper. names(phenoPipelineResult$savedFiles$timepoints)
lists the available timepoint-specific subsets. The basename(...) calls then
show examples of the concrete files written for one timepoint and for the
combined outputs. In this structure, combinedPhenoBeta is the merged
phenotype-plus-beta object that becomes the direct input to the GLM and GLMM
wrappers, while betaCSV, betaZIP, and phenoCF are the Clock Foundation
exports.
Arguments:
phenoFile, betaPath, mPath, and cnPath point to the phenotype table
and the aligned methylation matrices from the previous workflow stages.SampleID = "Sample_Name" defines the sample key used during alignment.timeVar = "Timepoint", timepoints = "1,2", and
combineTimepoints = "1,2" determine the timepoint-specific outputs and the
combined longitudinal object.outputPheno, outputRData, outputRDataMerge, and outputDir define the
directories used for phenotype exports, metric objects, merged objects, and
Clock Foundation inputs.saveOutputs = TRUE is what makes this step useful in a file-based pipeline,
because later modeling steps consume these written files.The modeling wrappers also separate in-memory results from file outputs. The next example runs a small GLM analysis and prints the names of the files written to disk.
The next chunk shows how to recreate the temporary merged phenotype-plus-beta input file used by the GLM example.
glm_inputs <- dnaEPICO:::exampleMethylationGLMStateDnaEpico()
names(glm_inputs)
#> [1] "tempDir" "inputPath" "preparedData" "modelResults"
#> [5] "modelSummaries"
glm_inputs$tempDir
#> [1] "/tmp/Rtmp6rWi3o/dnaEPICO-glm-example-3a4cfb6af5ad13"
basename(glm_inputs$inputPath)
#> [1] "phenoBT1.RData"
file.exists(glm_inputs$inputPath)
#> [1] TRUE
glm_inputs is a list returned by the GLM example helper. names(glm_inputs)
shows that it contains the temporary directory, the saved merged
phenotype-plus-beta input file (inputPath), and the corresponding in-memory
objects produced during helper construction: preparedData, modelResults, and
modelSummaries.
glmPipelineResult <- methylationGLM_T1(
inputPheno = glm_inputs$inputPath,
phenotypes = "status",
covariates = "sex,age",
factorVars = "status,sex",
cpgLimit = 2,
nCores = 1,
outputLogs = file.path(glm_inputs$tempDir, "logs"),
outputRData = file.path(glm_inputs$tempDir, "rData", "models"),
outputPlots = file.path(glm_inputs$tempDir, "figures"),
significantCpGDir = file.path(glm_inputs$tempDir, "significant"),
summaryTxtDir = file.path(glm_inputs$tempDir, "summaries"),
summaryPval = 1,
significantCpGPval = 1,
annotationPackage = "IlluminaHumanMethylation450kanno.ilmn12.hg19",
annotationCols = "Name,chr,pos",
annotatedGLMOut = file.path(glm_inputs$tempDir, "annotated"),
display = FALSE,
verbose = FALSE,
logs = FALSE,
saveOutputs = TRUE
)
class(glmPipelineResult$savedFiles)
#> [1] "dnaEPICO_methylationGLM_T1_paths"
names(glmPipelineResult$savedFiles)
#> [1] "modelFiles" "summaryFiles" "summaryTxtFiles"
#> [4] "significantCpGFiles" "annotatedGLM"
The returned savedFiles object records the groups of outputs written by the
GLM step. names(glmPipelineResult$savedFiles) typically includes the model
files, summary files, diagnostic plot files, significant-CpG exports, summary
text files, and the annotated results table. This is the object to inspect when
you want to confirm which modeling outputs were written and how they are
grouped.
Arguments:
inputPheno points to the merged phenotype-plus-beta object generated by
preprocessingPheno().phenotypes lists the outcome variables that will be modeled one at a time.covariates lists the adjustment variables included in every model.factorVars identifies the categorical variables that should be treated as
factors before model fitting.cpgLimit = 2 keeps the example fast by fitting only two CpG models.nCores = 1 keeps the example deterministic and lightweight; production HPC
runs typically use higher values.annotationPackage and annotationCols control which array annotation is
merged into the final summary tables.outputRData, outputPlots, significantCpGDir, summaryTxtDir, and
annotatedGLMOut define where model objects, plots, summaries, and annotated
results are written.saveOutputs = TRUE enables the file-based pipeline behaviour expected by
Makefile and HPC use.The longitudinal modeling step follows the same pattern as the GLM wrapper, but the saved outputs now correspond to mixed-effects models, longitudinal summary tables, and interaction-specific result files.
The next chunk shows how to recreate the temporary combined longitudinal phenotype-plus-beta input file used by the LME example.
glmm_inputs <- dnaEPICO:::exampleMethylationGLMMStateDnaEpico()
names(glmm_inputs)
#> [1] "tempDir" "inputPath" "preparedData" "modelResults"
#> [5] "modelSummaries"
glmm_inputs$tempDir
#> [1] "/tmp/Rtmp6rWi3o/dnaEPICO-glmm-example-3a4cfb12508696"
basename(glmm_inputs$inputPath)
#> [1] "phenoBT1T2.RData"
file.exists(glmm_inputs$inputPath)
#> [1] TRUE
glmm_inputs plays the same role for the longitudinal example. The helper
returns a list with the temporary directory, the saved combined longitudinal
input file (inputPath), and the in-memory objects used to build that example:
preparedData, modelResults, and modelSummaries.
glmmPipelineResult <- methylationGLMM_T1T2(
inputPheno = glmm_inputs$inputPath,
outputLogs = file.path(glmm_inputs$tempDir, "logs"),
outputRData = file.path(glmm_inputs$tempDir, "rData", "models"),
outputPlots = file.path(glmm_inputs$tempDir, "figures"),
personVar = "person",
timeVar = "Timepoint",
phenotypes = "score",
covariates = "sex",
factorVars = "sex,Timepoint",
cpgLimit = 2,
nCores = 1,
summaryPval = 1,
saveSignificantInteractions = TRUE,
significantInteractionDir = file.path(
glmm_inputs$tempDir,
"results",
"cpgs",
"methylationGLMM_T1T2"
),
significantInteractionPval = 1,
saveTxtSummaries = TRUE,
summaryTxtDir = file.path(
glmm_inputs$tempDir,
"results",
"summary",
"methylationGLMM_T1T2"
),
annotationPackage = "IlluminaHumanMethylation450kanno.ilmn12.hg19",
annotationCols = "Name,chr,pos",
annotatedLMEOut = file.path(glmm_inputs$tempDir, "annotated"),
display = FALSE,
verbose = FALSE,
logs = FALSE,
saveOutputs = TRUE
)
class(glmmPipelineResult$savedFiles)
#> [1] "dnaEPICO_methylationGLMM_T1T2_paths"
names(glmmPipelineResult$savedFiles)
#> [1] "modelFiles" "summaryFiles"
#> [3] "summaryTxtFiles" "significantInteractionFiles"
#> [5] "annotatedLME"
The returned savedFiles object records the groups of outputs written by the
LME step. names(glmmPipelineResult$savedFiles) typically includes the model
files, summary files, diagnostic plot files, summary text files, significant
interaction exports, and the annotated longitudinal results table. This is the
object to inspect when you want to confirm which longitudinal modeling outputs
were written and how they are grouped.
Arguments:
inputPheno points to the combined longitudinal phenotype-plus-beta object.personVar defines the participant identifier used for the random effect.timeVar defines the repeated-measures time variable.phenotypes, covariates, and factorVars define the fixed-effect portion
of the mixed model.cpgLimit = 2 keeps the example short by fitting only two CpG models.nCores = 1 keeps the example lightweight; production HPC runs usually use
higher parallel settings.saveSignificantInteractions, significantInteractionDir, and
significantInteractionPval control the export of interaction-specific
results.summaryTxtDir defines where plain-text model summaries are written.annotationPackage, annotationCols, and annotatedLMEOut define the
annotation resources and output location for the final longitudinal summary
table.saveOutputs = TRUE enables the file-based longitudinal workflow expected by
Makefile and HPC use.For each CpG, methylationGLM_T1() fits a regression model of the form:
\[ \text{Methylation}_{CpG_i} = \beta_0 + \beta_1 \cdot \text{Phenotype} + \sum_{k = 1}^{K} \beta_{k + 1} \cdot \text{Covariate}_k + \varepsilon \]
where \(\text{Methylation}_{CpG_i}\) is the methylation value at CpG \(i\), \(\beta_0\) is the intercept, the phenotype term is the variable of interest, and the remaining fixed effects represent the listed covariates.
If no polygenic risk score is included, a model can be read schematically as:
\[ \begin{aligned} \text{Methylation}_{CpG_i} =\;& \beta_0 + \beta_1 \cdot \text{Phenotype} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Sex} + \beta_4 \cdot \text{Ethnicity} \\ &+ \varepsilon \end{aligned} \]
When prsMap is supplied, a phenotype-specific PRS term is appended only for
the matching phenotype. For example, the mapping
"Pheno1:PRS_1,Pheno2:PRS_2" means that:
Pheno1 include PRS_1Pheno2 include PRS_2For a phenotype mapped to a PRS, the model becomes:
\[ \text{Methylation}_{CpG_i} = \beta_0 + \beta_1 \cdot \text{Phenotype} + \sum_{k = 1}^{K} \beta_{k + 1} \cdot \text{Covariate}_k + \beta_{\text{PRS}} \cdot \text{PRS} + \varepsilon \]
If interactionTerm is supplied, the same framework is extended by adding the
requested interaction to the fixed-effect portion of the model.
The longitudinal wrapper methylationGLMM_T1T2() follows the same file-based
pattern, but uses a mixed-effects model with a participant-level random
intercept. In schematic form:
\[ \begin{aligned} \text{Methylation}_{CpG_i} =\;& \beta_0 + \beta_1 \cdot \text{Phenotype} + \beta_2 \cdot \text{Timepoint} + \beta_3 \cdot (\text{Phenotype} \times \text{Timepoint}) \\ &+ \sum_{k = 1}^{K} \beta_{k + 3} \cdot \text{Covariate}_k + \beta_{\text{PRS}} \cdot \text{PRS} + b_{\text{person}} + \varepsilon \end{aligned} \]
where \(b_{\text{person}}\) is the subject-specific random intercept. As in the
GLM workflow, the PRS term is included only when the current phenotype is
matched in prsMap.
Common arguments in this longitudinal stage are:
inputPheno for the combined longitudinal phenotype-plus-beta objectpersonVar for the participant identifier used in the random effecttimeVar for the repeated-measures variablephenotypes, covariates, and factorVars for the fixed-effect structureprsMap for phenotype-specific PRS termscpgLimit and nCores for computational controlannotationPackage and annotationCols for annotated summariesIn a project directory, the Makefile route is usually simpler than calling each wrapper by hand. A minimal project layout looks like this:
my_project/
Makefile
metadata/
pheno_model1.csv
data/
preprocessingMinfiEwasWater/
idats/
<IDAT files>
The exported template should then be edited so that at least the model name, phenotype path, and directory variables match the project. A minimal example is:
MODEL = model1
PHENO_FILE = metadata/pheno_model1.csv
LOGS_DIR := $(abspath logs)
DATA_DIR = data
RDATA_DIR = rData
RESULTS_DIR = results
FIGURES_DIR = figures
With that structure in place, a typical sequence of commands looks like this:
make step1 MODEL=model1
make step2 MODEL=model1
make step3 MODEL=model1
make f3 MODEL=model1
make f4 MODEL=model1
make f3lme MODEL=model1
make all MODEL=model1
make status MODEL=model1
make clean MODEL=model1
These targets correspond to the grouped outputs defined in the exported rules:
make step1 MODEL=model1 runs preprocessing only.make step2 MODEL=model1 runs the SVA step using the files written by Step 1.make step3 MODEL=model1 runs phenotype preparation using the files written by Step 1.make f3 MODEL=model1 runs preprocessing, SVA, phenotype preparation, and the report target defined by the current rules.make f4 MODEL=model1 runs the same route, adds the GLM step, and refreshes the report target.make f3lme MODEL=model1 runs the same route, adds the longitudinal LME step, and refreshes the report target.make all MODEL=model1 runs the full pipeline, including both modeling steps and the report.make status MODEL=model1 checks which expected output files are already present.make clean MODEL=model1 removes the files generated for the selected model while keeping the shared raw input area.If several models are declared in MODELS, the grouped multi-model targets can
be used instead:
make f3_models
make f4_models
make f3lme_models
make models
These grouped targets loop over every model listed in MODELS:
make f3_models runs the f3 route for each declared model.make f4_models runs the f4 route for each declared model.make f3lme_models runs the f3lme route for each declared model.make models runs the full all pipeline for each declared model.Two maintenance targets are also useful during routine work:
make status MODEL=model1
make clean MODEL=model1
make clean MODEL=all
make status MODEL=model1 reports which expected outputs are already present for one model.make clean MODEL=model1 removes the generated outputs for one model.make clean MODEL=all removes the generated outputs for every model listed in MODELS.The exact target names and variables depend on the Makefile settings extracted
for the project. In practice, saveOutputs = TRUE is what makes the individual
steps compatible with this route because later steps consume the files written by
earlier ones.
The full Makefile exported by extractMake() is shown below. This is the
template that should be adapted for a project before running the pipeline on a
workstation or HPC system.
# ===============================================
# USER CONFIGURATION
# ===============================================
MAKEFLAGS += --output-sync
# ===============================================
# MODELS SELECTION PARALLEL RUN
# ===============================================
MODEL ?= model1 # Single model
MODELS = modelA modelB modelC # Multiple models for parallel run
# ===============================================
# PER-MODEL OVERRIDES
# ===============================================
ifeq ($(MODEL), modelA)
PHENO_FILE = $(DATA_DIR)/$(STEP1)/phenoA.csv
else ifeq ($(MODEL), modelB)
PHENO_FILE = $(DATA_DIR)/$(STEP1)/phenoB.csv
else ifeq ($(MODEL), modelC)
PHENO_FILE = $(DATA_DIR)/$(STEP1)/phenoC.csv
else # default (model1)
PHENO_FILE = $(DATA_DIR)/$(STEP1)/pheno.csv
endif
# ===============================================
# DIRECTORIES
# ===============================================
LOGS_DIR := $(abspath logs)
DATA_DIR = data
RDATA_DIR = rData
RESULTS_DIR = results
FIGURES_DIR = figures
STEP1 = preprocessingMinfiEwasWater
STEP2 = svaEnmix
STEP3 = preprocessingPheno
STEP4 = methylationGLM_T1
STEP5 = methylationGLMM_T1T2
STEP6 = reports
METRICS_DIR = metrics
IDAT_DIR = idats
OBJ_DIR = objects
MERGE_DIR = mergeData
MODEL_DIR = models
CPG_DIR = cpgs
SUMMARY_DIR = summary
ENMIX_DIR = enmix
SVA_DIR = sva
QC_DIR = qc
R_DIR = ~/R/x86_64-pc-linux-gnu-library/4.4 # NULL is default R library path
# ===============================================
# GLOBAL PARAMETERS
# ===============================================
# STEP 1-5 PARAMETERS
PHENO_FILE = $(DATA_DIR)/$(STEP1)/phenoEMD.csv
SEP_TYPE = ""
SAMPLE_ID = Sample_Name
N_SAMPLES = 30
ARRAY_TYPE = IlluminaHumanMethylationEPICv2
ANNOTATION_VERSION = 20a1.hg38
TIFF_WIDTH = 2000
TIFF_HEIGHT = 1000
TIFF_RES = 150
SEX_COLUMN = Sex
SENTRIX_ID_COLUMN = Sentrix_ID
SENTRIX_POSITION_COLUMN = Sentrix_Position
BASENAME_COLUMN = Basename
TIME_VAR = Timepoint
PHENO_ORDER = $(SAMPLE_ID);$(TIME_VAR);$(SEX_COLUMN);PredSex;$(BASENAME_COLUMN);$(SENTRIX_ID_COLUMN);$(SENTRIX_POSITION_COLUMN)
# STEP 4-5 PARAMETERS
PHENOTYPES = TreatmentGroup
COVARIATES = Sex
FACTOR_VARS = Sex,TreatmentGroup
CPG_PREFIX = cg
CPG_LIMIT = NA
PRS_MAP = NULL
SUMMARY_PVAL = NA
N_CORES = 64
SAVE_TXT_SUMMARIES = TRUE
CHUNK_SIZE = 10000
FDR_THRESHOLD = 0.05
PADJ_METHOD = fdr
ANNOTATION_PACKAGE = IlluminaHumanMethylationEPICv2anno.20a1.hg38
ANNOTATION_COLS = Name,chr,pos,UCSC_RefGene_Group,UCSC_RefGene_Name,Relation_to_Island,GencodeV41_Group
# ===============================================
# STEP 1 PARAMETERS
# ===============================================
QC_CUTOFF = 10.5
DET_PTYPE = m+u
DET_PTHRESHOLD = 0.05
PVAL_THRESHOLD = 0.01
CHR_TO_REMOVE = chrX,chrY
SNPS_TO_REMOVE = SBE,CpG
CROSS_REACTIVE_FILE = $(DATA_DIR)/$(STEP1)/12864_2024_10027_MOESM8_ESM.csv
MAF_THRESHOLD = 0.1
PLOT_GROUP_VAR = TreatvControl_Time1_vs_Time2
LC_REF = salivaEPIC
# ===============================================
# STEP 2 PARAMETERS
# ===============================================
CTRL_SVA_PERC_VAR = 0.90
CTRL_SVA_FLAG = 1
# ===============================================
# STEP 3 PARAMETERS
# ===============================================
TIMEPOINTS = 1,2
COMBINE_TIMEPOINTS = 1,2
# ===============================================
# STEP 4 PARAMETERS
# ===============================================
GLM_LIBS = glm2
INTERACTION_GLM = NULL
SUMMARY_RESIDUAL_SD = TRUE
SAVE_SIGNIFICANT_CPGS = TRUE
SIGNIFICANT_CPG_PVAL = 0.1
# ==============================================
# STEP 5 PARAMETERS
# ==============================================
PERSON_VAR = person
LME_LIBS = lme4,lmerTest
INTERACTION_LME = TreatvControl_Time1_vs_Time2
SAVE_SIGNIFICANT_INTERACTIONS = TRUE
SIGNIFICANT_INTERACTION_PVAL = 0.1
# ===============================================
# ARGUMENT NORMALISATION
# ===============================================
ifeq ($(PRS_MAP),NULL)
PRS_MAP_ARG = NULL
else
PRS_MAP_ARG = '$(PRS_MAP)'
endif
ifeq ($(R_DIR),NULL)
R_DIR_ARG = NULL
else
R_DIR_ARG = '$(R_DIR)'
endif
ifeq ($(INTERACTION_GLM),NULL)
INTERACTION_GLM_ARG = NULL
else
INTERACTION_GLM_ARG = '$(INTERACTION_GLM)'
endif
ifeq ($(INTERACTION_LME),NULL)
INTERACTION_LME_ARG = NULL
else
INTERACTION_LME_ARG = '$(INTERACTION_LME)'
endif
# ===============================================
# INCLUDE PIPELINE FROM dnaEPICO
# ===============================================
DNAPIPE_MK := $(shell Rscript -e "cat(system.file('extdata','make','Makefile.rules.pipeline',package='dnaEPICO'))")
include $(DNAPIPE_MK)
The exported Makefile is organised into sections so that project-specific choices can be edited without changing the pipeline rules themselves.
MAKEFLAGS += --output-sync keeps parallel Make output grouped by recipe,
which makes logs easier to read.MODEL defines a single pipeline configuration to run.MODELS defines a set of configurations that can be launched in parallel by
the grouped Make targets.In single-model use, commands such as make f4 MODEL=model1 run one workflow.
In multi-model use, commands such as make f4_models evaluate the same template
for each name listed in MODELS.
PHENO_FILE is reassigned according to the selected MODEL.This block is useful when different cohorts, case definitions, or phenotype encodings should reuse the same analysis pipeline without duplicating the full Makefile.
LOGS_DIR, DATA_DIR, RDATA_DIR, RESULTS_DIR, and FIGURES_DIR define
the top-level folder structure.STEP1 to STEP6 define the canonical names of the major pipeline stages.METRICS_DIR, IDAT_DIR, OBJ_DIR, MERGE_DIR, MODEL_DIR, CPG_DIR,
SUMMARY_DIR, ENMIX_DIR, SVA_DIR, and QC_DIR define the subdirectory
names used by the rules file.R_DIR optionally points to a custom R library path on shared systems. When
set to NULL, the default library path is used.These variables control where files are written and how paths are composed across all stages of the pipeline.
These parameters are shared across multiple workflow steps.
QC_CUTOFF: sample-quality threshold applied during preprocessing.DET_PTYPE: detection p-value method.DET_PTHRESHOLD: detection p-value cutoff.PVAL_THRESHOLD: probe-level p-value filter applied after preprocessing.CHR_TO_REMOVE: chromosomes to exclude from downstream matrices.SNPS_TO_REMOVE: SNP-related probe categories to remove.CROSS_REACTIVE_FILE: cross-reactive probe reference file.MAF_THRESHOLD: minor-allele-frequency threshold used during SNP filtering.PLOT_GROUP_VAR: phenotype variable used for grouped QC plots.LC_REF: reference panel used for cell-type estimation.These values control the main preprocessing and filtering choices in
preprocessingMinfiEwasWater().
CTRL_SVA_PERC_VAR: target proportion of control-probe variance explained by
the retained surrogate variables.CTRL_SVA_FLAG: enables the control-based ENmix SVA workflow.These values determine how svaEnmix() estimates and retains surrogate
variables.
TIMEPOINTS: comma-separated timepoints exported separately.COMBINE_TIMEPOINTS: comma-separated timepoints combined into the
longitudinal object.These values determine how preprocessingPheno() splits and recombines the
phenotype-methylation data.
GLM_LIBS: GLM implementation used by methylationGLM_T1().INTERACTION_GLM: optional interaction term added to the fixed-effect part
of the GLM.SUMMARY_RESIDUAL_SD: whether residual standard deviations are reported in
summaries.SAVE_SIGNIFICANT_CPGS: whether tables of significant CpGs are exported.SIGNIFICANT_CPG_PVAL: p-value threshold used for those significant-CpG
exports.PERSON_VAR: participant identifier used as the random effect grouping
factor.LME_LIBS: mixed-effects libraries used by methylationGLMM_T1T2().INTERACTION_LME: optional interaction term used in the longitudinal model.SAVE_SIGNIFICANT_INTERACTIONS: whether significant interaction tables are
exported.SIGNIFICANT_INTERACTION_PVAL: p-value threshold for those exported
interaction results.PRS_MAP_ARG, R_DIR_ARG, INTERACTION_GLM_ARG, and INTERACTION_LME_ARG
convert Makefile string values into arguments that can be passed safely to R.This section is especially important for values such as NULL, because it
prevents the literal string "NULL" from being interpreted as a real model
value inside the wrapper functions.
DNAPIPE_MK resolves the packaged Makefile.rules.pipeline file.include $(DNAPIPE_MK) imports the actual pipeline rules after the user
configuration has been defined.This separation is what allows the template Makefile to stay compact while the package keeps the rule implementation in one maintained location.
The file-based route is the correct choice when you need:
The interactive route and the file-based route are therefore complementary: local use is best for understanding the returned objects, while pipeline use is best for reproducible multi-step execution.
Date the vignette was generated.
#> [1] "2026-04-22 21:03:53 EDT"
Wallclock time spent generating the vignette.
#> Time difference of 1.766 mins
R session information.
#> R version 4.6.0 RC (2026-04-17 r89917)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] parallel stats4 stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
#> [2] IlluminaHumanMethylation450kmanifest_0.4.0
#> [3] minfi_1.57.0
#> [4] bumphunter_1.53.0
#> [5] locfit_1.5-9.12
#> [6] iterators_1.0.14
#> [7] foreach_1.5.2
#> [8] Biostrings_2.79.5
#> [9] XVector_0.51.0
#> [10] SummarizedExperiment_1.41.1
#> [11] Biobase_2.71.0
#> [12] MatrixGenerics_1.23.0
#> [13] matrixStats_1.5.0
#> [14] GenomicRanges_1.63.2
#> [15] Seqinfo_1.1.0
#> [16] IRanges_2.45.0
#> [17] S4Vectors_0.49.2
#> [18] BiocGenerics_0.57.1
#> [19] generics_0.1.4
#> [20] dnaEPICO_0.99.14
#> [21] BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.6.0 BiocIO_1.21.0
#> [3] bitops_1.0-9 filelock_1.0.3
#> [5] tibble_3.3.1 preprocessCore_1.73.0
#> [7] XML_3.99-0.23 lifecycle_1.0.5
#> [9] httr2_1.2.2 Rdpack_2.6.6
#> [11] doParallel_1.0.17 lattice_0.22-9
#> [13] MASS_7.3-65 base64_2.0.2
#> [15] scrime_1.3.7 magrittr_2.0.5
#> [17] limma_3.67.3 sass_0.4.10
#> [19] rmarkdown_2.31 jquerylib_0.1.4
#> [21] yaml_2.3.12 otel_0.2.0
#> [23] doRNG_1.8.6.3 askpass_1.2.1
#> [25] minqa_1.2.8 DBI_1.3.0
#> [27] RColorBrewer_1.1-3 abind_1.4-8
#> [29] quadprog_1.5-8 purrr_1.2.2
#> [31] RCurl_1.98-1.18 rappdirs_0.3.4
#> [33] ggrepel_0.9.8 irlba_2.3.7
#> [35] rentrez_1.2.4 genefilter_1.93.0
#> [37] annotate_1.89.0 DelayedMatrixStats_1.33.0
#> [39] codetools_0.2-20 DelayedArray_0.37.1
#> [41] xml2_1.5.2 tidyselect_1.2.1
#> [43] glm2_1.2.1 farver_2.1.2
#> [45] lme4_2.0-1 beanplot_1.3.1
#> [47] BiocFileCache_3.1.0 dynamicTreeCut_1.63-1
#> [49] illuminaio_0.53.0 GenomicAlignments_1.47.0
#> [51] jsonlite_2.0.0 multtest_2.67.0
#> [53] survival_3.8-6 tools_4.6.0
#> [55] Rcpp_1.1.1-1 glue_1.8.1
#> [57] SparseArray_1.11.13 xfun_0.57
#> [59] dplyr_1.2.1 HDF5Array_1.39.1
#> [61] withr_3.0.2 numDeriv_2016.8-1.1
#> [63] BiocManager_1.30.27 fastmap_1.2.0
#> [65] boot_1.3-32 rhdf5filters_1.23.3
#> [67] openssl_2.4.0 caTools_1.18.3
#> [69] digest_0.6.39 R6_2.6.1
#> [71] RPMM_1.25 gtools_3.9.5
#> [73] dichromat_2.0-0.1 RSQLite_2.4.6
#> [75] cigarillo_1.1.0 h5mread_1.3.3
#> [77] minfiData_0.57.0 tidyr_1.3.2
#> [79] data.table_1.18.2.1 rtracklayer_1.71.3
#> [81] httr_1.4.8 S4Arrays_1.11.1
#> [83] pkgconfig_2.0.3 gtable_0.3.6
#> [85] blob_1.3.0 S7_0.2.2
#> [87] siggenes_1.85.0 impute_1.85.0
#> [89] htmltools_0.5.9 bookdown_0.46
#> [91] geneplotter_1.89.0 scales_1.4.0
#> [93] png_0.1-9 reformulas_0.4.4
#> [95] knitr_1.51 tzdb_0.5.0
#> [97] rjson_0.2.23 nloptr_2.2.1
#> [99] nlme_3.1-169 curl_7.1.0
#> [101] cachem_1.1.0 rhdf5_2.55.16
#> [103] BiocVersion_3.23.1 KernSmooth_2.23-26
#> [105] AnnotationDbi_1.73.1 restfulr_0.0.16
#> [107] GEOquery_2.79.0 pillar_1.11.1
#> [109] grid_4.6.0 reshape_0.8.10
#> [111] vctrs_0.7.3 gplots_3.3.0
#> [113] ENmix_1.47.3 dbplyr_2.5.2
#> [115] xtable_1.8-8 cluster_2.1.8.2
#> [117] evaluate_1.0.5 readr_2.2.0
#> [119] GenomicFeatures_1.63.2 cli_3.6.6
#> [121] compiler_4.6.0 Rsamtools_2.27.2
#> [123] rlang_1.2.0 crayon_1.5.3
#> [125] rngtools_1.5.2 labeling_0.4.3
#> [127] nor1mix_1.3-3 mclust_6.1.2
#> [129] plyr_1.8.9 BiocParallel_1.45.0
#> [131] lmerTest_3.2-1 Matrix_1.7-5
#> [133] ExperimentHub_3.1.0 hms_1.1.4
#> [135] sparseMatrixStats_1.23.0 bit64_4.8.0
#> [137] ggplot2_4.0.3 Rhdf5lib_1.99.6
#> [139] KEGGREST_1.51.1 statmod_1.5.1
#> [141] AnnotationHub_4.1.0 rbibutils_2.4.1
#> [143] memoise_2.0.1 bslib_0.10.0
#> [145] bit_4.6.0
As package developers, we try to explain clearly how to use our packages and in
which order to use the functions. But R and Bioconductor have a steep
learning curve, so it is critical to learn where to ask for help. We would like
to highlight the Bioconductor support site
as the main resource for getting help. Please remember to use the dnaEPICO
tag and check the
older posts. If you want to
receive help, please provide a small reproducible example and your session
information so the source of the problem can be tracked efficiently.
Aryee, Martin J, Andrew E Jaffe, Hector Corrada Bravo, Christine Ladd-Acosta, Andrew P Feinberg, Kasper D Hansen, and Rafael A Irizarry. 2014. “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays.” Bioinformatics 30 (10): 1363–9. https://doi.org/10.1093/bioinformatics/btu049.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Fortin, Jean-Philippe, Aurélie Labbe, Mathieu Lemire, Brent W Zanke, Thomas J Hudson, Elana J Frtig, Celia MT Greenwood, and Kasper D Hansen. 2014. “Functional normalization of 450k methylation array data improves replication in large cancer studies.” Genome Biology 15 (11): 503. https://doi.org/10.1186/s13059-014-0503-2.
Fortin, Jean-Philippe, Timothy Triche Jr., and Kasper D Hansen. 2017. “Preprocessing, Normalization and Integration of the Illumina HumanMethylationEPIC Array with Minfi.” Bioinformatics 33 (4): 558–60. https://doi.org/10.1093/bioinformatics/btw691.
Maksimovic, Jovana, Lavinia Gordon, and Alicia Oshlack. 2012. “SWAN: Subset quantile Within-Array Normalization for Illumina Infinium HumanMethylation450 BeadChips.” Genome Biology 13 (6): R44. https://doi.org/10.1186/gb-2012-13-6-r44.
Marschner, Ian. 2011. “Glm2: Fitting Generalized Linear Models with Convergence Problems.” The R Journal 3 (2): 12–15. https://doi.org/10.32614/RJ-2011-012.
Murat, Kubra, Björn Grüning, Pawel W Poterlowicz, Gareth Westgate, Desmond J Tobin, and Krzysztof Poterlowicz. 2020. “Ewastools: Infinium Human Methylation BeadChip pipeline for population epigenetics integrated into Galaxy.” GigaScience 9 (5): giaa049. https://doi.org/10.1093/gigascience/giaa049.
Pidsley, Ruth, Ching Ching Y Wong, Matteo Volta, Katie Lunnon, Jonathan Mill, and Leonard C Schalkwyk. 2013. “A data-driven approach to preprocessing Illumina 450K methylation array data.” BMC Genomics 14: 293. https://doi.org/10.1186/1471-2164-14-293.
Touleimat, Nizar, and Jörg Tost. 2012. “Complete Pipeline for Infinium() Human Methylation 450K BeadChip Data Processing Using Subset Quantile Normalization for Accurate DNA Methylation Estimation.” Epigenomics 4 (3): 325–41. https://doi.org/10.2217/epi.12.21.
Triche, Timothy J, Daniel J Weisenberger, David Van Den Berg, Peter W Laird, and Kimberly D Siegmund. 2013. “Low-level processing of Illumina Infinium DNA Methylation BeadArrays.” Nucleic Acids Research 41 (7): e90. https://doi.org/10.1093/nar/gkt090.
Xu, Zongli, Li Niu, and Jack A Taylor. 2021. “The ENmix DNA methylation analysis pipeline for Illumina BeadChip and comparisons with seven other preprocessing pipelines.” Clinical Epigenetics 13 (1): 216. https://doi.org/10.1186/s13148-021-01207-1.