RFLOMICS is a R package that provides a complete workflow for the
analysis of multi-omics data within complex experimental designs
framework. The package was originally designed to be used through a user
interface (see RFLOMICS vignette), but has been developed
in an object oriented programming with a limited number of classes and
methods. This allow to run data analysis as command line in a workflow
with the “|>” operator, each step being a method of the extended MultiAssayExperiment
or SummarizedExperiment
classes.
To ensure efficient data management the MultiAssayExperiment
and SummarizedExperiment
classes were extended to fix an organization for the metadata slot. The
statistical design and analysis parameters of the multi-omics data
integration are stored in the metadata slot of the
RflomicsMAE class, while the parameters and results of the
single-omics are stored in the metadata slot of
RflomicsSE objects of the ExperimentList.
This organization can be illustrated by presenting the workflow of the RFLOMICS package.
The workflow is structured in three main parts:
The second part allows a complete single-omics analysis for each dataset. The raw and processed omic datasets are stored in the ExperimentList as RflomicsSE. For each analysis step, settings and results are stored (in green in Fig 2)
The DataProcessing list in the metadata slot contains the parameters and results of pre-processing including filtering, normalization and transformation
The DiffExpAnal list in the metadata slot contains the parameters and results of the differential expression analysis.
The CoExpAnal list in the metadata slot stores the parameters and results of the coexpression analysis on differentially expressed entities.
DiffExpEnrichAnal and CoExpEnrichAnal lists in the metadata slot store the results and parameters of the functional enrichment analysis for the differential expression and the coexpression analysis, respectively.
As you will see, this analysis workflow is implemented as methods of
the two classes (RflomicsMAE and RflomicsSE),
providing a generic interface whatever the type of omics data being
analyzed (RNAseq, proteomics, metabolomics, etc.). These methods return
the same class as the input object (RflomicsMAE and
RflomicsSE) with the metadata slot enriched with parameters
and results. Getter and Setter methods have been defined to access the
metadata objects.
In this example we will instantiate a RflomicsMAE object from an
existing MAE object and a design data.frame. We will use the ecoseed
dataset from the RFLOMICS package (see data(ecoseed)).
ecoseed is a multi-omics dataset composed of three data
matrices: transcriptomics (raw RNAseq read count data matrix),
metabolomics and proteomics (relative abundance matrix as XIC) obtained
from a study examining the effect of seed production temperature on the
germination potential of Arabidopsis thaliana.
The experimental design includes three factors:
- Repeat: batch factor: rep1, rep2, rep3
- temperature: biological factor with three conditions: Low, Medium, Elevated
- imbibition: biological factor with three conditions: DS (Dry Deed), EI (Early Imbibition), LI (Late Imbibition)
Crucial information is needed to complete the design and create the RflomicsMAE object. Experimental factors must be defined with their type (batch or biological) and a reference level. These information are used to automatically generate the statistical design.
PCA on raw data is performed during the instantiation of the RflomicsMAE object.
data(ecoseed.mae)
factorInfo <- data.frame(
"factorName" = c("Repeat", "temperature", "imbibition"),
"factorType" = c("batch", "Bio", "Bio"),
"factorRef" = c("rep1", "Low", "DS")
)
# create rflomicsMAE object with ecoseed data
MAE <- createRflomicsMAE(
projectName = "Tests",
omicsData = ecoseed.mae,
omicsTypes = c("RNAseq", "proteomics", "metabolomics"),
factorInfo = factorInfo
)
names(metadata(MAE))
#> [1] "omicList" "projectName" "design"
#> [4] "IntegrationAnalysis" "date" "sessionInfo"
#> [7] "rflomicsVersion" "color"
head(getDesignMat(MAE))
#> Repeat groups temperature imbibition samples
#> Low_DS_1 rep1 Low_DS Low DS Low_DS_1
#> Low_DS_2 rep2 Low_DS Low DS Low_DS_2
#> Low_DS_3 rep3 Low_DS Low DS Low_DS_3
#> Medium_DS_1 rep1 Medium_DS Medium DS Medium_DS_1
#> Medium_DS_2 rep2 Medium_DS Medium DS Medium_DS_2
#> Medium_DS_3 rep3 Medium_DS Medium DS Medium_DS_3Omics data is heterogeneous and each omics requires specific methods
and parameters to take into account of the nature of data. RFLOMICS has
a set of generic methods than can be applied to the
RflomicsMAE or RflomicsSE objects and which
adapt the methods and parameters to the type of omics data: RNAseq,
proteomics, metabolomics.
The main methods are:
| Methods | Description |
|---|---|
createRflomicsMAE |
instantiate a RflomicsMAE object from a
MAE object and a design data.frame |
runDataProcessing |
Process data: filtering: features or samples, normalization and/or transformation, PCA |
runDiffAnalysis |
perform a Differential features analysis on omics data |
runCoExpression |
perform a CoExpression analysis from the union/intersection of differentially expressed features list. |
runAnnotationEnrichment |
perform an annotation enrichment analysis (GO, KEGG, custom) on the list of differentially or co-expressed features |
runOmicsIntegration |
perform an integration analysis (MOFA or DIABLO) |
generateReport |
generate a report from the RflomicsMAE object and optionally export the report and results |
Table 1: Main methods of the RflomicsMAE and RflomicsSE classes for running a multi-omics analysis workflow.
When applied to a RflomicsMAE object, the name of the
RflomicsSE object must be specified with the
SE.name parameter.
The next step is to define the model and associated contrasts by
translating the biological question into statistical terms. The
generateModelFormulae function is used to find all possible
models given the experimental design settings.
Rflomics accepts up to three biological factors and one batch effect. Only second-order interaction terms between biological factors will appear in the model formulae. Batch factors will never appear in the interaction terms. For the rest of the example, we choose the second formula. The model without the interaction term.
form <- generateModelFormulae(MAE)
form
#> $`~Repeat + temperature + imbibition + temperature:imbibition`
#> ~Repeat + temperature + imbibition + temperature:imbibition
#> <environment: 0x5636e4c13200>
#>
#> $`~Repeat + temperature + imbibition`
#> ~Repeat + temperature + imbibition
#> <environment: 0x5636e4c13200>
MAE <- setModelFormula(MAE, form[2])
getModelFormula(MAE)
#> [1] "~Repeat + temperature + imbibition"Given the formula chosen, all the hypotheses (contrasts) can be
obtained using the generateExpressionContrast function.
Three types of contrast are expressed: pairwise comparison, average
expression and interaction expression.
In this example, we will select the first three contrasts from the averaged contrasts.
possibleContrasts <- generateExpressionContrast(MAE)
possibleContrasts$averaged$contrastName
#> [1] "(temperatureMedium - temperatureLow) in mean"
#> [2] "(temperatureElevated - temperatureLow) in mean"
#> [3] "(temperatureElevated - temperatureMedium) in mean"
#> [4] "(imbibitionEI - imbibitionDS) in mean"
#> [5] "(imbibitionLI - imbibitionDS) in mean"
#> [6] "(imbibitionLI - imbibitionEI) in mean"
MAE <- setSelectedContrasts(MAE, contrastList = possibleContrasts$averaged[1:3])
getSelectedContrasts(MAE)[, -c(1, 3)]
#> contrastName type tag
#> <char> <char> <char>
#> 1: (temperatureMedium - temperatureLow) in mean mean H1
#> 2: (temperatureElevated - temperatureLow) in mean mean H2
#> 3: (temperatureElevated - temperatureMedium) in mean mean H3This statistical framework will be applied on all loaded datasets.
Data processing, including normalization and transformation
parameters and results, can be carried out in a single step using the
runDataProcessing method. Or in several steps using the
methods called by runDataProcessing, which
are:filterLowAbundance (RNAseq),
runNormalization and runTransformData.
In this example, the three omics data sets are processed in a single step.
MAE <- runDataProcessing(
object = MAE,
SE.name = "RNAtest",
samples = colnames(MAE[["RNAtest"]])[-1],
filterStrategy = "NbReplicates",
cpmCutoff = 1,
normMethod = "TMM"
) |>
runDataProcessing(
SE.name = "metatest", normMethod = "median",
transformMethod = "log2"
) |>
runDataProcessing(
SE.name = "protetest", normMethod = "none",
transformMethod = "log2"
) |>
runOmicsPCA(SE.name = "metatest")
# Access to the normalization settings for metabolomics data
getNormSettings(object = MAE[["metatest"]])
#> $method
#> [1] "median"
#>
#> $suppInfo
#> NULL
# Obtain the list of filtered features for the RNAseq data
getFilteredFeatures(object = MAE[["RNAtest"]])[1:10]
#> [1] "AT1G07680" "AT1G11920" "AT1G23210" "AT1G31670" "AT1G33540" "AT1G34510"
#> [7] "AT1G44050" "AT1G47600" "AT1G50060" "AT1G52050"It is important to note that none of the data is directly transformed
in the assay, unless the user specifically requests it by setting the
modifyAssay argument to TRUE. This way, you can always
change your mind and test several normalisation or transformation
processes.
You can access to processing results and plot them.
plotDataDistribution(MAE[["metatest"]], raw = TRUE)
plotDataDistribution(MAE[["metatest"]], raw = FALSE)Differential analysis on each contrasts can be performed using the
runDiffAnalysis method. This method is specialized for each
omics type and uses the limma package for proteomics and
metabolomics data and the edgeR package for RNAseq data.
The method returns the same object with the metadata slot enriched with
the parameters and results of the analysis.
MAE <- runDiffAnalysis(MAE,
SE.name = "RNAtest", p.adj.method = "BH",
method = "edgeRglmfit", p.adj.cutoff = 0.05, logFC.cutoff = 0
) |>
runDiffAnalysis(
SE.name = "protetest", p.adj.method = "BH",
method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
) |>
runDiffAnalysis(
SE.name = "metatest", p.adj.method = "BH",
method = "limmalmFit", p.adj.cutoff = 0.05, logFC.cutoff = 0
)
# access to the settings
getDiffSettings(MAE, SE.name = "RNAtest")
#> $method
#> [1] "edgeRglmfit"
#>
#> $p.adj.method
#> [1] "BH"
#>
#> $p.adj.cutoff
#> [1] 0.05
#>
#> $abs.logFC.cutoff
#> [1] 0
#>
#> $contrastCoef
#> Intercept Repeatrep2
#> (temperatureMedium - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 0
#> Repeatrep3 temperatureMedium
#> (temperatureMedium - temperatureLow) in mean 0 1
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 -1
#> temperatureElevated
#> (temperatureMedium - temperatureLow) in mean 0
#> (temperatureElevated - temperatureLow) in mean 1
#> (temperatureElevated - temperatureMedium) in mean 1
#> imbibitionEI imbibitionLI
#> (temperatureMedium - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureLow) in mean 0 0
#> (temperatureElevated - temperatureMedium) in mean 0 0
# Summary of the differential analysis
getDiffStat(MAE[["RNAtest"]])
#> All Up Down
#> (temperatureMedium - temperatureLow) in mean 4879 2387 2492
#> (temperatureElevated - temperatureLow) in mean 9648 4868 4780
#> (temperatureElevated - temperatureMedium) in mean 4141 2096 2045plotDiffAnalysis(MAE,
SE.name = "RNAtest",
contrastName = "(temperatureMedium - temperatureLow) in mean",
typeofplots = "volcano"
)
#> $Volcano.plotCo-expression analysis on the list of differentially expressed
features can be performed using the runCoExpression method.
This method uses methods from the coseq package with a set
of optimal parameters tailored to each omics. In this example, we will
perform a co-expression analysis on the proteomics
data.
The results can be plot using the
getCoExpAnalysesSummary method.
A co-expression profile can be plotted for a given cluster using the
plotCoExpressionProfile function.
MAE <- runCoExpression(MAE,
SE.name = "protetest",
K = 2:5, replicates = 5,
merge = "union"
)
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> ****************************************
#> coseq analysis: Normal approach & none transformation
#> K = 2 to 5
#> Use seed argument in coseq for reproducible results.
#> ****************************************
getCoExpAnalysesSummary(MAE)Annotation enrichment analysis can be performed using the
runAnnotationEnrichment method. This method uses the
clusterProfiler package. In this example, the
org.At.tair.db database for Arabidopsis thaliana will be
used to perform the GO enrichment analysis of the DE protein list.
The results can be viewed using the getEnrichRes method,
a summary of the number of enriched terms by the sumORA
function and plotted, by example, using the plotEnrichComp
method.
MAE <- runAnnotationEnrichment(MAE,
SE.name = "protetest",
OrgDb = "org.At.tair.db",
keyType = "TAIR",
pvalueCutoff = 0.05,
from = "DiffExp",
database = "GO",
domain = "CC"
)
results <- getEnrichRes(MAE[["protetest"]],
from = "DiffExp", database = "GO"
)
sumORA(MAE[["protetest"]], from = "DiffExp", database = "GO")
#> Contrast
#> (temperatureMedium - temperatureLow) in mean (temperatureMedium - temperatureLow) in mean
#> (temperatureElevated - temperatureLow) in mean (temperatureElevated - temperatureLow) in mean
#> (temperatureElevated - temperatureMedium) in mean (temperatureElevated - temperatureMedium) in mean
#> CC
#> (temperatureMedium - temperatureLow) in mean 0
#> (temperatureElevated - temperatureLow) in mean 1
#> (temperatureElevated - temperatureMedium) in mean 1
plotEnrichComp(MAE[["protetest"]],
from = "DiffExp", database = "GO", domain = "CC",
matrixType = "FC"
)The multi-omics step in RFLOMICS takes two steps: the preparation of
the object, based on the user-selected method, and the actual run of the
selected method. This is done using prepareForIntegration
and runOmicsIntegration. In this example, we will use only
the proteomics and metabolomics dataset with the mixOmics package
(block.plsda function).
prepareForIntegration does not return a RflomicsMAE
object, it returns the corresponding type to be an input for either
MOFA2 or mixOmics functions.
listIntegration <- prepareForIntegration(MAE,
omicsNames = c("protetest", "metatest"),
variableLists = rownames(MAE),
method = "mixOmics"
)
MAE <- runOmicsIntegration(
object = MAE,
preparedObject = listIntegration,
method = "mixOmics",
selectedResponse = "temperature",
ncomp = 3
)Visualizing results is made through the original package functions.
The generateReport function is used to generate a report
from a RflomicsMAE object. The report is generated in HTML format.
The MAE object can be saved (export = TRUE), exporting the object in a .RData, the report in html format, the graphs in png format and the data in txt (tabulated) format.
It’s use is straightforward, using the
generateReport(object = MAE, fileName = "ecoseed_report.html", export = FALSE)
command. It can take a long time, depending on the amount of analyses
performed and the data size.
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] RFLOMICS_1.3.0 coseq_1.33.0
#> [3] knitr_1.50 htmltools_0.5.8.1
#> [5] ggplot2_4.0.0 dplyr_1.1.4
#> [7] shinyBS_0.61.1 MultiAssayExperiment_1.35.9
#> [9] SummarizedExperiment_1.39.2 Biobase_2.69.1
#> [11] GenomicRanges_1.63.0 Seqinfo_0.99.4
#> [13] IRanges_2.45.0 S4Vectors_0.47.6
#> [15] BiocGenerics_0.55.4 generics_0.1.4
#> [17] MatrixGenerics_1.21.0 matrixStats_1.5.0
#> [19] BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.6 enrichplot_1.31.0 httr_1.4.7
#> [4] RColorBrewer_1.1-3 doParallel_1.0.17 tools_4.5.1
#> [7] backports_1.5.0 R6_2.6.1 DT_0.34.0
#> [10] HDF5Array_1.37.0 lazyeval_0.2.2 uwot_0.2.3
#> [13] rhdf5filters_1.21.4 GetoptLong_1.0.5 withr_3.0.2
#> [16] gridExtra_2.3 bayesm_3.1-6 cli_3.6.5
#> [19] flashClust_1.01-2 labeling_0.4.3 sass_0.4.10
#> [22] mvtnorm_1.3-3 S7_0.2.0 robustbase_0.99-6
#> [25] proxy_0.4-27 systemfonts_1.3.1 yulab.utils_0.2.1
#> [28] gson_0.1.0 DOSE_4.5.0 R.utils_2.13.0
#> [31] plotrix_3.8-4 limma_3.65.7 org.At.tair.db_3.22.0
#> [34] RSQLite_2.4.3 gridGraphics_0.5-1 shape_1.4.6.1
#> [37] vroom_1.6.6 car_3.1-3 HTSCluster_2.0.11
#> [40] GO.db_3.22.0 leaps_3.2 Matrix_1.7-4
#> [43] abind_1.4-8 R.methodsS3_1.8.2 lifecycle_1.0.4
#> [46] scatterplot3d_0.3-44 yaml_2.3.10 edgeR_4.9.0
#> [49] carData_3.0-5 rhdf5_2.53.6 qvalue_2.41.0
#> [52] SparseArray_1.9.2 Rtsne_0.17 grid_4.5.1
#> [55] blob_1.2.4 promises_1.4.0 shinydashboard_0.7.3
#> [58] crayon_1.5.3 dir.expiry_1.17.0 ggtangle_0.0.7
#> [61] lattice_0.22-7 cowplot_1.2.0 KEGGREST_1.49.2
#> [64] HTSFilter_1.51.0 capushe_1.1.3 sys_3.4.3
#> [67] maketools_1.3.2 pillar_1.11.1 ComplexHeatmap_2.25.2
#> [70] fgsea_1.35.8 rjson_0.2.23 estimability_1.5.1
#> [73] corpcor_1.6.10 mixOmics_6.33.0 codetools_0.2-20
#> [76] fastmatch_1.1-6 glue_1.8.0 ggiraph_0.9.2
#> [79] ggfun_0.2.0 fontLiberation_0.1.0 data.table_1.17.8
#> [82] vctrs_0.6.5 png_0.1-8 treeio_1.33.0
#> [85] gtable_0.3.6 cachem_1.1.0 xfun_0.53
#> [88] S4Arrays_1.9.3 mime_0.13 pheatmap_1.0.13
#> [91] iterators_1.0.14 statmod_1.5.1 nlme_3.1-168
#> [94] ggtree_3.99.2 bit64_4.6.0-1 fontquiver_0.2.1
#> [97] filelock_1.0.3 UpSetR_1.4.0 tensorA_0.36.2.1
#> [100] bslib_0.9.0 otel_0.2.0 colorspace_2.1-2
#> [103] DBI_1.2.3 DESeq2_1.51.0 tidyselect_1.2.1
#> [106] emmeans_2.0.0 bit_4.6.0 compiler_4.5.1
#> [109] Rmixmod_2.1.10 compositions_2.0-9 h5mread_1.1.1
#> [112] fontBitstreamVera_0.1.1 DelayedArray_0.37.0 plotly_4.11.0
#> [115] scales_1.4.0 DEoptimR_1.1-4 multcompView_0.1-10
#> [118] rappdirs_0.3.3 stringr_1.5.2 digest_0.6.37
#> [121] MOFA2_1.19.0 rmarkdown_2.30 basilisk_1.21.5
#> [124] XVector_0.49.3 pkgconfig_2.0.3 FactoMineR_2.12
#> [127] fastmap_1.2.0 rlang_1.1.6 GlobalOptions_0.1.2
#> [130] htmlwidgets_1.6.4 shiny_1.11.1 farver_2.1.2
#> [133] jquerylib_0.1.4 jsonlite_2.0.0 BiocParallel_1.43.4
#> [136] GOSemSim_2.35.2 R.oo_1.27.1 magrittr_2.0.4
#> [139] Formula_1.2-5 ggplotify_0.1.3 patchwork_1.3.2
#> [142] Rhdf5lib_1.31.1 Rcpp_1.1.0 ape_5.8-1
#> [145] gdtools_0.4.4 reticulate_1.44.0 stringi_1.8.7
#> [148] MASS_7.3-65 plyr_1.8.9 parallel_4.5.1
#> [151] ggrepel_0.9.6 forcats_1.0.1 Biostrings_2.77.2
#> [154] splines_4.5.1 circlize_0.4.16 locfit_1.5-9.12
#> [157] igraph_2.2.1 ggpubr_0.6.2 ggsignif_0.6.4
#> [160] buildtools_1.0.0 reshape2_1.4.4 evaluate_1.0.5
#> [163] BiocManager_1.30.26 tzdb_0.5.0 foreach_1.5.2
#> [166] httpuv_1.6.16 tidyr_1.3.1 purrr_1.1.0
#> [169] clue_0.3-66 BiocBaseUtils_1.11.2 broom_1.0.10
#> [172] xtable_1.8-4 e1071_1.7-16 RSpectra_0.16-2
#> [175] tidytree_0.4.6 rstatix_0.7.3 later_1.4.4
#> [178] viridisLite_0.4.2 class_7.3-23 rARPACK_0.11-0
#> [181] tibble_3.3.0 clusterProfiler_4.17.0 aplot_0.2.9
#> [184] memoise_2.0.1 AnnotationDbi_1.71.2 ellipse_0.5.0
#> [187] cluster_2.1.8.1 corrplot_0.95 shinyWidgets_0.9.0