1 Introduction

Koina is a repository of machine learning (ML) models that can be executed remotely. Predictions are returned in response to HTTP/S requests, the standard protocol used for nearly all web traffic. Because HTTP/S requests can be generated from any programming language, users can access ML and deep learning (DL) models that would normally require specialized hardware from any device, without needing that hardware locally. Serving the models centrally also makes more efficient use of the hardware and allows easy horizontal scaling as the demand of the user base changes.
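To make the HTTP/S point concrete, here is a minimal base-R sketch that builds the JSON body of such a request by hand. It assumes that Koina exposes each model through an inference endpoint following the KServe v2 protocol used by NVIDIA Triton. The endpoint URL, the input datatypes (BYTES, FP32, INT32), and the helper names `to_json_array`, `koina_input`, and `build_koina_request` are assumptions for illustration and are not part of KoinaR; the authoritative request format for each model is in the documentation at https://koina.wilhelmlab.org/docs.

```r
# Minimal sketch, assumptions labelled: build a KServe v2 style JSON request
# body for a Koina model using base R only. The helpers are hypothetical.
# No JSON escaping is done; this assumes sequences contain no quotes or
# backslashes.
to_json_array <- function(x, quote = FALSE) {
  if (quote) x <- paste0('"', x, '"')
  paste0("[", paste(x, collapse = ","), "]")
}

# One entry of the "inputs" array; datatypes are assumed, not verified
koina_input <- function(name, datatype, values, quote = FALSE) {
  sprintf('{"name":"%s","shape":[%d,1],"datatype":"%s","data":%s}',
          name, length(values), datatype, to_json_array(values, quote))
}

build_koina_request <- function(peptides, energies, charges) {
  stopifnot(length(peptides) == length(energies),
            length(peptides) == length(charges))
  inputs <- c(
    koina_input("peptide_sequences", "BYTES", peptides, quote = TRUE),
    koina_input("collision_energies", "FP32", energies),
    koina_input("precursor_charges", "INT32", as.integer(charges))
  )
  paste0('{"id":"0","inputs":', to_json_array(inputs), "}")
}

body <- build_koina_request(c("LGGNEQVTR", "GAGSSEPVTGLDAK"),
                            c(25, 25), c(1, 2))
# Assumed endpoint; sending the body requires network access, e.g.
# curl -X POST -d "$BODY" \
#   https://koina.wilhelmlab.org/v2/models/Prosit_2019_intensity/infer
```

Any language that can send an HTTP POST with a JSON body could issue the same request; KoinaR hides these details behind the client object shown in the Basic usage section.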

To lower the barrier to entry and “democratize” access to ML models, we provide a public network of Koina instances at koina.wilhelmlab.org. The computational workload is automatically distributed across processing nodes hosted by research institutions and spin-offs across Europe. Each processing node contributes computational resources to the network, with the aim of delivering results just in time.

In the spirit of open and collaborative science, we envision that research groups and institutions will dedicate hardware so that the public Koina network can scale to meet the community’s needs. Servers located geographically close to users can also substantially reduce latency. Alternatively, if data security is a concern, private instances can easily be deployed within a local network using the provided Docker image.

Koina is a community-driven, fully open-source project. We welcome all contributions and feedback! Feel free to reach out to us or open an issue on our GitHub repository.

At the moment Koina focuses mostly on the Proteomics domain, but the design can easily be extended to any machine learning model. Active development to expand it to Metabolomics is underway. If you are interested in using Koina to interface with a machine learning model that is not currently available, feel free to create a request.

Here we take a look at KoinaR, the R package that simplifies getting predictions from Koina.

2 Install

if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("koinar")

3 Basic usage

Here we show the basic usage principles of KoinaR. The first step in interacting with Koina is to pick the model and server you wish to use. Here we use the model Prosit_2019_intensity published by Gessulat et al. (2019) and the public Koina network available via koina.wilhelmlab.org. For a complete overview of the models available on Koina, have a look at the documentation at https://koina.wilhelmlab.org/docs.

# Create a client tied to a specific server & model
# Here we use the model published by Gessulat et al. (2019)
# and the public server available at koina.wilhelmlab.org
# All available models are listed at https://koina.wilhelmlab.org/docs
prosit2019 <- koinar::Koina(
  model_name = "Prosit_2019_intensity",
  server_url = "koina.wilhelmlab.org"
)
prosit2019
## Koina Model class:
##  Model name:  Prosit_2019_intensity 
##  Server URL:  koina.wilhelmlab.org

After creating the model, you need to prepare your inputs. Here we build a simple data.frame with three input columns: peptide_sequences, collision_energies, and precursor_charges.

# Create example inputs
# Here we look at two different peptide sequences with charge 1 and 2 respectively
input <- data.frame(
  peptide_sequences = c("LGGNEQVTR", "GAGSSEPVTGLDAK"),
  collision_energies = c(25, 25),
  precursor_charges = c(1, 2)
)
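Before sending a request, it can be worth checking the input locally. The sketch below uses a hypothetical helper, `check_koina_input`, which is not a KoinaR function. It verifies that the three columns used in this vignette are present, that sequences contain only the 20 standard one-letter amino acid codes, and that charges are whole numbers. The amino acid pattern is an assumption that only covers unmodified peptides; modified sequences would need a wider pattern, and the limits each model actually accepts are described in the Koina documentation.

```r
# Hypothetical helper (not part of koinar): sanity-check an input data.frame
# before requesting predictions. The allowed alphabet is an assumption for
# unmodified peptides.
check_koina_input <- function(input,
                              required = c("peptide_sequences",
                                           "collision_energies",
                                           "precursor_charges")) {
  # All required columns must be present
  missing_cols <- setdiff(required, names(input))
  if (length(missing_cols) > 0) {
    stop("Missing input column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Only the 20 standard amino acids, upper case
  bad_seq <- !grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", input$peptide_sequences)
  if (any(bad_seq)) {
    stop("Unexpected characters in: ",
         paste(input$peptide_sequences[bad_seq], collapse = ", "))
  }
  # Charges must be whole numbers
  if (any(input$precursor_charges != round(input$precursor_charges))) {
    stop("precursor_charges must be whole numbers")
  }
  invisible(TRUE)
}

example_input <- data.frame(
  peptide_sequences = c("LGGNEQVTR", "GAGSSEPVTGLDAK"),
  collision_energies = c(25, 25),
  precursor_charges = c(1, 2)
)
check_koina_input(example_input) # passes silently
```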

After preparing the input, you can start predicting by calling prosit2019$predict(input).

# Fetch the predictions by calling `$predict` of the model you want to use
# A progress bar shows you how much of the predictions are already done
# In this case this should complete instantly
prediction_results <- prosit2019$predict(input)

# Display the predictions
# The output varies depending on the chosen model
# For the intensity model we get a data.frame with 6 columns:
# the three inputs we provided (peptide_sequences, collision_energies, precursor_charges)
# and, for each predicted fragment ion, its annotation, mz, and intensities
head(prediction_results)
##      peptide_sequences collision_energies precursor_charges annotation       mz
## 1            LGGNEQVTR                 25                 1       y1+1 175.1190
## 1.3          LGGNEQVTR                 25                 1       b1+1 114.0913
## 1.6          LGGNEQVTR                 25                 1       y2+1 276.1666
## 1.9          LGGNEQVTR                 25                 1       b2+1 171.1128
## 1.12         LGGNEQVTR                 25                 1       y3+1 375.2350
## 1.15         LGGNEQVTR                 25                 1       b3+1 228.1343
##      intensities
## 1    0.246388033
## 1.3  0.006869315
## 1.6  0.467457831
## 1.9  0.157304466
## 1.12 0.648119807
## 1.15 0.081853107
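Because the output is in long format, with one row per fragment ion, annotations such as y1+1 can be split into ion type, ion number, and fragment charge for filtering or grouping. The helper `parse_annotation` below is hypothetical and not part of KoinaR; it assumes the `<type><number>+<charge>` pattern seen in the printed output and is demonstrated on the six rows printed above.

```r
# Hypothetical helper (not part of koinar): split annotations such as "y1+1"
# into ion type, ion number and fragment charge. The pattern is inferred
# from the printed Prosit output.
parse_annotation <- function(annotation) {
  pattern <- "^([a-z]+)([0-9]+)\\+([0-9]+)$"
  stopifnot(all(grepl(pattern, annotation)))
  data.frame(
    ion_type = sub(pattern, "\\1", annotation),
    ion_number = as.integer(sub(pattern, "\\2", annotation)),
    fragment_charge = as.integer(sub(pattern, "\\3", annotation)),
    stringsAsFactors = FALSE
  )
}

# The six rows shown by head(prediction_results) above
example_pred <- data.frame(
  annotation = c("y1+1", "b1+1", "y2+1", "b2+1", "y3+1", "b3+1"),
  mz = c(175.1190, 114.0913, 276.1666, 171.1128, 375.2350, 228.1343),
  intensities = c(0.246388033, 0.006869315, 0.467457831,
                  0.157304466, 0.648119807, 0.081853107)
)
annotated <- cbind(example_pred, parse_annotation(example_pred$annotation))

# Keep only the y ions
y_ions <- annotated[annotated$ion_type == "y", ]
```

With live results, `cbind(prediction_results, parse_annotation(prediction_results$annotation))` would add the same three columns, provided every annotation follows this pattern.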

Alternatively, if you prefer pass-by-value semantics to calling the model's $predict method, you can use the predictWithKoinaModel function, which takes the model as its first argument.

prediction_results <- koinar::predictWithKoinaModel(prosit2019, input)
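The difference between the two calling styles can be illustrated with a toy reference class. The class `ToyModel` and the function `predictWithToyModel` below are hypothetical and only mirror the shape of `model$predict(input)` and `predictWithKoinaModel(model, input)`; whether KoinaR's Koina class is implemented with `methods::setRefClass` is an assumption. The sketch shows that both styles give the same result, and that assigning a reference object to a new name shares it rather than copying it.

```r
# Toy illustration of R reference semantics (hypothetical class, not koinar's)
ToyModel <- methods::setRefClass(
  "ToyModel",
  fields = list(weight = "numeric"),
  methods = list(
    # Method-call style, analogous to model$predict(input)
    predict = function(x) {
      x * weight
    }
  )
)

# Function style, analogous to predictWithKoinaModel(model, input)
predictWithToyModel <- function(model, input) {
  model$predict(input)
}

m <- ToyModel$new(weight = 2)
r_method <- m$predict(3)                # 6
r_function <- predictWithToyModel(m, 3) # 6

snapshot <- m$copy()   # an independent copy of the object
same_object <- m       # a second name for the same object, not a copy
same_object$weight <- 10
# m$weight is now 10, while snapshot$weight is still 2
```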

4 Example 1: Reproducing Fig. 1d from Gessulat et al. (2019)

One common use case for most of the models available through Koina is predicting peptide properties to improve peptide identification rates in Proteomics experiments. One of the most beneficial properties in this context is the fragment ion intensity pattern of a peptide.

In this example we look at the model published by Gessulat et al. (2019) and attempt to reproduce Figure 1d of the manuscript. In this figure the authors illustrate the prediction accuracy of their model by comparing experimentally acquired mass spectra with the predictions of their model.

Figure 1: Screenshot of Fig. 1d (Gessulat et al. 2019), https://www.nature.com/articles/s41592-019-0426-7

We prepare the inputs for the model; all of them can be found in the header of the figure.

input <- data.frame(
  peptide_sequences = c("LKEATIQLDELNQK"),
  collision_energies = c(35),
  precursor_charges = c(3)
)

We reuse the model instance (prosit2019) created in the previous section and fetch the predictions by calling its predict method.

prediction_results <- prosit2019$predict(input)

Here we plot a simple mass spectrum to compare visually against Figure 1d of Gessulat et al. (2019). The predicted spectrum we just generated matches the predicted spectrum shown in the publication.

# Plot the spectrum of the prediction fetched above
# y ions are drawn in red, all other (b) ions in blue
yIdx <- grepl("^y", prediction_results$annotation)
plot(prediction_results$intensities ~ prediction_results$mz,
  type = "n",
  ylim = c(0, 1.1),
  xlab = "m/z",
  ylab = "Predicted relative intensity"
)
points(prediction_results$mz[yIdx], prediction_results$intensities[yIdx],
  col = "red", type = "h", lwd = 2
)
points(prediction_results$mz[!yIdx], prediction_results$intensities[!yIdx],
  col = "blue", type = "h", lwd = 2
)

# Label each peak with its fragment annotation
text(prediction_results$mz, prediction_results$intensities,
  labels = prediction_results$annotation,
  cex = 0.8, pos = 3
)
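When comparing against a published figure peak by peak, a short table of the most intense predicted fragments can be easier to read than the plot. The helper `top_fragments` below is hypothetical and not part of KoinaR; it only assumes the `annotation`, `mz`, and `intensities` columns shown earlier, and is demonstrated on the six rows printed in the Basic usage section.

```r
# Hypothetical helper (not part of koinar): the n most intense predicted
# fragments, sorted by decreasing intensity.
top_fragments <- function(pred, n = 5) {
  ord <- order(pred$intensities, decreasing = TRUE)
  head(pred[ord, c("annotation", "mz", "intensities")], n)
}

# The six rows shown by head(prediction_results) in the Basic usage section
example_pred <- data.frame(
  annotation = c("y1+1", "b1+1", "y2+1", "b2+1", "y3+1", "b3+1"),
  mz = c(175.1190, 114.0913, 276.1666, 171.1128, 375.2350, 228.1343),
  intensities = c(0.246388033, 0.006869315, 0.467457831,
                  0.157304466, 0.648119807, 0.081853107)
)
top3 <- top_fragments(example_pred, n = 3) # y3+1, y2+1, y1+1
```

For the peptide in this example, `top_fragments(prediction_results, n = 10)` would list the ten strongest predicted peaks.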

5 Example 2: Compare spectral similarity between models

Fragment ion prediction models can differ substantially in the predictions they generate, which affects peptide identification performance. We show this here by predicting the spectra of the Biognosys iRT peptides, a commonly used set of synthetic spike-in reference peptides, with the Prosit_2019_intensity and ms2pip_HCD2021 models.

We follow the same steps as before. (1) Prepare the input:

input <- data.frame(
  peptide_sequences = c(
    "LGGNEQVTR", "YILAGVENSK", "GTFIIDPGGVIR", "GTFIIDPAAVIR",
    "GAGSSEPVTGLDAK", "TPVISGGPYEYR", "VEATFGVDESNAK",
    "TPVITGAPYEYR", "DGLDAASYYAPVR", "ADVTPADFSEWSK",
    "LFLQFGAQGSPFLK"
  ),
  collision_energies = 35,
  precursor_charges = 2
)
(2) Predict with both models:
# Predict with Prosit and tag the rows with the model name
pred_prosit <- prosit2019$predict(input)
pred_prosit$model <- "Prosit_2019_intensity"

# Create a second client for the MS2PIP model on the same server
ms2pip <- koinar::Koina(
  model_name = "ms2pip_HCD2021",
  server_url = "koina.wilhelmlab.org"
)

pred_ms2pip <- ms2pip$predict(input)
pred_ms2pip$model <- "ms2pip_HCD2021"

After plotting the predictions for all iRT peptides, we can observe that the predicted mass spectra differ noticeably between the two models. Which model performs better depends on the data set being analyzed.

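# One panel per model and peptide; y ions and b ions form separate groups
# pred_prosit is subset to the columns of pred_ms2pip so that rbind can stack them
lattice::xyplot(intensities ~ mz | model * peptide_sequences,
  groups = grepl("^y", annotation),
  data = rbind(
    pred_prosit[, names(pred_ms2pip)],
    pred_ms2pip
  ),
  type = "h"
)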
Figure 2: iRT peptide fragment ion predictions using ms2pip_HCD2021 and Prosit_2019_intensity
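Besides plotting, it can help to line up the two predictions fragment by fragment. The helper `align_predictions` below is hypothetical and not part of KoinaR. It merges two prediction tables by peptide and annotation, assuming both models report fragments in the same annotation format; a fragment predicted by only one model gets intensity 0 for the other.

```r
# Hypothetical helper (not part of koinar): side-by-side intensities of two
# models, matched by peptide sequence and fragment annotation.
align_predictions <- function(a, b, label_a = "a", label_b = "b") {
  stopifnot(label_a != label_b)
  keys <- c("peptide_sequences", "annotation")
  a <- a[, c(keys, "intensities")]
  b <- b[, c(keys, "intensities")]
  names(a)[3] <- label_a
  names(b)[3] <- label_b
  # Full outer join keeps fragments predicted by only one model
  merged <- merge(a, b, by = keys, all = TRUE)
  # A fragment missing from one model counts as intensity 0 there
  for (col in c(label_a, label_b)) {
    merged[[col]][is.na(merged[[col]])] <- 0
  }
  merged
}
```

With the predictions from above, `aligned <- align_predictions(pred_prosit, pred_ms2pip, "prosit", "ms2pip")` would give one row per fragment with a `prosit` and an `ms2pip` intensity column, ready for per-peptide comparisons.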

We can also use the SpectrumSimilarity function of the OrgMassSpecR package to generate a mirror spectrum. This not only provides a better visualization for comparing spectra but also calculates a similarity score.

peptide_sequence <- "ADVTPADFSEWSK"

# Select the m/z and intensity columns of both predictions for this peptide
spec_prosit <- pred_prosit[pred_prosit$peptide_sequences == peptide_sequence, c("mz", "intensities")]
spec_ms2pip <- pred_ms2pip[pred_ms2pip$peptide_sequences == peptide_sequence, c("mz", "intensities")]

# Draw the mirror plot and compute the similarity score
sim <- OrgMassSpecR::SpectrumSimilarity(spec_prosit, spec_ms2pip,
  top.lab = "Prosit",
  bottom.lab = "MS2PIP",
  b = 25
)
title(main = paste(peptide_sequence, "| Spectrum similarity", round(sim, 3)))
Figure 3: Spectral similarity of ms2pip vs. Prosit predictions, plotted with OrgMassSpecR
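To see roughly what such a score measures, the sketch below computes a plain cosine similarity (normalized dot product) between two peak lists in base R, matching peaks greedily by m/z within an absolute tolerance. It is an illustration of the general idea, not a reimplementation of SpectrumSimilarity: `cosine_similarity` is a hypothetical helper, it applies no baseline filter (the role of `b` above), and `tol = 0.25` is an arbitrary illustrative default, so its values will generally differ from the score in Figure 3.

```r
# Minimal sketch of a cosine spectral similarity (hypothetical helper).
# top, bottom: data.frames with columns mz and intensities.
cosine_similarity <- function(top, bottom, tol = 0.25) {
  u <- top$intensities
  v <- numeric(nrow(top))           # matched bottom intensity per top peak
  used <- logical(nrow(bottom))     # bottom peaks already matched
  for (i in seq_len(nrow(top))) {
    d <- abs(bottom$mz - top$mz[i])
    d[used] <- Inf                  # each bottom peak is matched at most once
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= tol) {
      v[i] <- bottom$intensities[j]
      used[j] <- TRUE
    }
  }
  # Unmatched bottom peaks are paired with zero intensity in the top spectrum
  u_all <- c(u, numeric(sum(!used)))
  v_all <- c(v, bottom$intensities[!used])
  sum(u_all * v_all) / (sqrt(sum(u_all^2)) * sqrt(sum(v_all^2)))
}
```

Because the cosine is invariant to scaling, two predictions that differ only by an overall intensity factor score 1, while spectra with no peaks in common within the tolerance score 0.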

References

Gessulat, Siegfried, Tobias Schmidt, Daniel Paul Zolg, Patroklos Samaras, Karsten Schnatbaum, Johannes Zerweck, Tobias Knaute, et al. 2019. “Prosit: Proteome-Wide Prediction of Peptide Tandem Mass Spectra by Deep Learning.” Nature Methods 16 (6): 509–18. https://doi.org/10.1038/s41592-019-0426-7.

Appendix

Session information

sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] curl_7.1.0       httptest_4.2.3   testthat_3.3.2   BiocStyle_2.40.0
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.8          cli_3.6.6           knitr_1.51         
##  [4] magick_2.9.1        rlang_1.2.0         xfun_0.57          
##  [7] koinar_1.6.0        otel_0.2.0          jsonlite_2.0.0     
## [10] htmltools_0.5.9     tinytex_0.59        sass_0.4.10        
## [13] brio_1.1.5          rmarkdown_2.31      grid_4.6.0         
## [16] OrgMassSpecR_0.5-4  evaluate_1.0.5      jquerylib_0.1.4    
## [19] fastmap_1.2.0       yaml_2.3.12         lifecycle_1.0.5    
## [22] bookdown_0.46       BiocManager_1.30.27 compiler_4.6.0     
## [25] Rcpp_1.1.1-1.1      lattice_0.22-9      digest_0.6.39      
## [28] R6_2.6.1            magrittr_2.0.5      bslib_0.10.0       
## [31] tools_4.6.0         cachem_1.1.0