---
title: "Mass Spectrometry Data on ExperimentHub"
author:
- name: Laurent Gatto
package: MsDataHub
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Mass Spectrometry Data on ExperimentHub}
  %\VignetteEngine{knitr::rmarkdown}
  %%\VignetteKeywords{Mass Spectrometry, MS, MSMS, Proteomics, Metabolomics}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

```{r env, echo = FALSE, message = FALSE}
library(Spectra)
library(PSMatch)
library(QFeatures)
```

# Introduction

The `MsDataHub` package provides example mass spectrometry data,
peptide spectrum matches or quantitative data from proteomics and
metabolomics experiments. The data are served through the
`ExperimentHub` infrastructure, which allows downloading them only
once and caching them for further use. Currently available data are
summarised in the table below and detailed in the next section.

```{r data}
library("MsDataHub")
DT::datatable(MsDataHub())
```

# Installation

To install the package:

```{r install1, eval = FALSE}
if (!require("BiocManager"))
    install.packages("BiocManager")

BiocManager::install("MsDataHub")
```

# Available data

## TripleTOF

- Type: Raw MS data
- Files: `PestMix1_DDA.mzML` and `PestMix1_SWATH.mzML`
- More details: `?TripleTOF`

Load with

```{r, eval = TRUE}
f <- PestMix1_DDA.mzML()
library(Spectra)
Spectra(f)
```

```{r, eval = TRUE}
f <- PestMix1_SWATH.mzML()
Spectra(f)
```

## sciex

- Type: Raw MS data
- Files: `20171016_POOL_POS_1_105-134.mzML` and
  `20171016_POOL_POS_3_105-134.mzML`
- More details: `?sciex`

Load with

```{r, eval = TRUE}
f <- X20171016_POOL_POS_1_105.134.mzML()
Spectra(f)
```

```{r, eval = TRUE}
f <- X20171016_POOL_POS_3_105.134.mzML()
Spectra(f)
```

## PXD000001

- Type: Raw MS data and peptide spectrum matches
- Files:
  `TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz`
  and
  `TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid`
- More details: `?PXD000001`

Load
with

```{r, eval = TRUE}
f <- TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.20141210.mzML.gz()
Spectra(f)
```

```{r, eval = TRUE}
f <- TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.20141210.mzid()
library(PSMatch)
PSM(f)
```

## CPTAC

- Type: tab-delimited quantitative proteomics data tables (as produced
  by MaxQuant)
- Files: `cptac_a_b_c_peptides.txt`, `cptac_a_b_peptides.txt` and
  `cptac_peptides.txt`
- More details: `?cptac`

Load with

```{r, eval = TRUE}
library(QFeatures)
f <- cptac_peptides.txt()
ecols <- grep("Intensity\\.", names(read.delim(f)))
readSummarizedExperiment(f, ecols, sep = "\t")
```

```{r, eval = TRUE}
cptac_a_b_c_peptides.txt()
cptac_a_b_peptides.txt()
```

## FAAH KO

- Type: Raw MS data, in netCDF format.
- File: `ko15.CDF`
- More details: `?cdf`

Load with

```{r, eval = TRUE}
f <- ko15.CDF()
Spectra(f)
```

## DIA-NN software outputs

- Type: tab-delimited DIA quantitative proteomics data tables produced
  by [DIA-NN](https://github.com/vdemichev/DiaNN).
- Files:
    - Label-free DIA: `benchmarkingDIA.tsv`
    - mTRAQ plexDIA: `Report.Derks2022.plexDIA.tsv`
- More details: `?benchmarkingDIA.tsv` and
  `?Report.Derks2022.plexDIA.tsv`

Load with

```{r lfdia, eval = TRUE, message = FALSE}
library(QFeatures)
lfdia <- read.delim(MsDataHub::benchmarkingDIA.tsv())
readQFeaturesFromDIANN(lfdia)
```

```{r pledia, eval = TRUE, message = FALSE}
plexdia <- read.delim(MsDataHub::Report.Derks2022.plexDIA.tsv())
readQFeaturesFromDIANN(plexdia, multiplexing = "mTRAQ")
```

## DIA-NN single-cell proteomics reports

- Type: tab-delimited DIA quantitative proteomics data tables produced
  by [DIA-NN](https://github.com/vdemichev/DiaNN).
- Files:
    - Single-cell label-free: `Ai2025_aCMs_report.tsv`
    - Single-cell label-free: `Ai2025_iCMs_report.tsv`
- More details: `?Ai2025`.

## Proteomics contaminant databases

- Type: fasta files, as documented in `camprotR`'s [cRAP
  databases](https://cambridgecentreforproteomics.github.io/camprotR/articles/crap.html)
  vignette.
- Files:
    - `crap_gpm.fasta`: the common Repository of Adventitious Proteins
      (cRAP) from the Global Proteome Machine (GPM) organisation.
    - `crap_ccp.fasta`: Cambridge Centre for Proteomics' own cRAP
      fasta database.
    - `crap_maxquant.fasta.gz`: MaxQuant's contaminant database.
- More details: `?cRAP`.

## FTICR-MS direct injection MS data

Example files for direct injection Fourier-transform ion cyclotron
resonance (FTICR) mass spectrometry data.

- Type: raw MS data in mzML file format.
- Files: 5 replicates from sample *HAM004*, 5 replicates from sample
  *HAM005*, i.e., 10 mzML files.
- More details: `?FTICR`.

Example of how to load one of the available files:

```{r}
f <- MsDataHub::HAM004_641fE_14.11.07..Exp1.extracted.mzML()
Spectra(f)
```

## MRM data file

Example file in mzML format for multiple reaction monitoring (MRM)
data. The file does not contain mass spectra, but chromatographic
data. The data can be imported and represented with the
*Chromatograms* Bioconductor package.

- Type: raw (chromatographic) MS data in mzML file format.
- Files:
    - `MRM-standmix-5.mzML`: sample from mouse brain acquired by HILIC
      ESI-QqQ/MS in dynamic multiple reaction monitoring (MRM)
      mode. The HPLC system was a 1290 Infinity (Agilent Technologies)
      coupled to an ion-Funnel Triple quadrupole 6490 mass
      spectrometer (Agilent Technologies). This file was contributed
      by Xavi Domingo-Almenara from The Scripps Research Institute,
      San Diego, CA.
- More details: `?MRM`.

Load with

```{r}
f <- MsDataHub::MRM.standmix.5.mzML()
```

## CE-MS data

The CE-MS test data consist of two files, `"CEMS_10ppm.mzML"` and
`"CEMS_25ppm.mzML"`. The data contain CE-MS runs of a standard mixture
that includes, among other compounds, lysine (at 10 ppm and 25 ppm,
respectively) and the neutral EOF marker paracetamol (50 ppm). The
data were acquired on a 7100 capillary electrophoresis system from
Agilent Technologies, coupled to an Agilent 6560 IM-QToF-MS.
CE separation was performed using an 80 cm fused silica capillary with
an internal diameter of 50 µm and an external diameter of 365 µm. The
background electrolyte was 10 % acetic acid and separation was
performed at +30 kV and a constant pressure of 50 mbar. MS detection
was performed in positive ionization mode. The raw data were then
converted to the open-source *.mzML* format (ProteoWizard). To reduce
data size, the test data were subset to a retention time range from
400 to 900 seconds and an *m/z* range from 147.1 to 152.0.

- Type: raw MS data in mzML file format.
- Files:
    - `CEMS_10ppm.mzML`: sample with lysine added at 10 ppm.
    - `CEMS_25ppm.mzML`: sample with lysine added at 25 ppm.
- More details: `?CEMS`.

Load with

```{r}
f <- MsDataHub::CEMS_25ppm.mzML()
s <- Spectra(f)
```

## TMT MS3 SPS data

Example MS3 SPS TMT data.

- `MS3TMT10_01022016_32917-33481.mzML.gz` is an mzML file containing
  565 spectra from an MS3 SPS TMT 10-plex experiment.
- `MS3TMT11.mzML` is an mzML file containing 994 scans from an MS3 SPS
  TMT 11-plex experiment.
- `fdms3tmt11.rda` contains a data.frame with identification data for
  `MS3TMT11.mzML`.

# Adding data to `MsDataHub`

1. If you would like to add a dataset to `MsDataHub`, start by opening
   an [issue](https://github.com/rformassspectrometry/MsDataHub/issues)
   in the package's GitHub repository and describe the new data. In
   particular, provide information about its provenance, its use and
   its format(s), and acknowledge that the data may be shared freely
   with the community without any restrictions. You may provide an
   open licence specifying the terms under which the data can be
   re-used, typically a CC-BY-SA licence.

2. By contributing to the package, you acknowledge that you will
   comply with the R for Mass Spectrometry project [code of
   conduct](https://rformassspectrometry.github.io/RforMassSpectrometry/articles/RforMassSpectrometry.html#code-of-conduct).

3. A maintainer of the package will reply to your issue, confirming
   that the data can be added.

4. At this point, if you are familiar with the development of
   `ExperimentHub` packages and GitHub *pull requests*, you may
   directly send one that adds your data to the package. Make sure to
   (1) add appropriate references in the manual page and (2) add
   yourself as a contributor of the package in the DESCRIPTION file.

5. Alternatively, a maintainer will add the dataset to the package and
   may require your input to make sure the documentation file is
   complete.

# Session information {-}

```{r sessioninfo, echo=FALSE}
sessionInfo()
```