proteasy 1.6.0
Protease cleavages affect many vital physiological processes, and dysregulation of proteolytic activity is associated with a variety of pathological conditions. The field of degradomics is committed to provide insights about proteolytic events by incorporating techniques for identification and functional characterization of proteases and their substrates and inhibitors.
proteasy allows for batch identification of possible proteases for a set of substrates (protein IDs and peptide sequences), and may be an important tool in peptide-centric analyses of endogenously cleaved peptides.
This package utilizes data derived from the MEROPS database. The database is a manually curated knowledgebase with information about proteolytic enzymes, their inhibitors and substrates.
This document illustrates the functionality of proteasy through some use cases.
Similarly to existing tools such as TopFind, and Proteasix, proteasy exists for the purpose of retrieving data about proteases by mapping peptide termini positions to known sites where a protease cleaves. Unlike the cleaver package, which allows for in-silico cleavage of amino acid sequences by specifying an enzyme, the functions in proteasy relies only on experimentally derived data to find proteases annotated to cleavage sites.
The proteasy package is limited to entries annotated in MEROPS. Moreover, proteasy currently does not allow for retrieval of proteolytic details for multiple organisms at once, or inter-organism events. The package does however provide annotation data for all organisms available in MEROPS.
The function findProtease will automatically map a peptide’s start- and end- positions in its host-protein’s sequence. When a protein sequence has repeated instances of a peptide sequence, proteasy will only match to the first instance in the protein sequence.
The package should be installed using as described below:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("proteasy")The package can then be loaded with:
library("proteasy")## Warning in fun(libname, pkgname): Package 'proteasy' is deprecated and will be removed from Bioconductor
##   version 3.20A fast way to find which possible proteases, if any, are annotated as cleaving actors for a substrate is by using the function searchSubstrate. Setting the parameter summarize = TRUE will return only a vector of reviewed proteases, and summarize = FALSE will return a table with details about each cleaving event. We will explore the two options for kininogen-1 (P01042).
# Returns vector of reviewed proteases that cleave P01042
searchSubstrate(protein = "P01042", summarize = TRUE)##  [1] "P07339" "P09668" "P07384" "P17655" "Q07075" "Q9UIQ6" "Q6P179" "Q6Q4G3"
##  [9] "Q9BYF1" "P52888" "Q99797" "P22894" "Q16819" "P08473" "P42892" "Q495T6"
## [17] "P15169" "P14384" "Q96IY4" "P14735" "Q5JRX3" "O43895" "Q9NQW7" "Q9Y337"
## [25] "Q9P0G3" "Q14520" "P08246" "P23946" "Q15661" "P20231" "P06870" "P20151"
## [33] "P03952" "P48147" "P42785" "Q9UHL4"# Returns data.table with details on cleaving events
searchSubstrate(protein = "P01042", summarize = FALSE) |>
    head()##    Protease (Uniprot) Protease status Protease organism Protease (MEROPS)
##                <char>          <char>            <char>            <char>
## 1:             P07339        reviewed      Homo sapiens           A01.009
## 2:             P07339        reviewed      Homo sapiens           A01.009
## 3:             P07339        reviewed      Homo sapiens           A01.009
## 4:             P09668        reviewed      Homo sapiens           C01.040
## 5:             P09668        reviewed      Homo sapiens           C01.040
## 6:             P07384        reviewed      Homo sapiens           C02.001
##    Cleaved residue   Substrate name Substrate (Uniprot) Residue number
##             <char>           <char>              <char>         <char>
## 1:               L      kininogen-1              P01042            289
## 2:               K      kininogen-1              P01042            316
## 3:               F      kininogen-1              P01042            321
## 4:               F       Bradykinin              P01042            385
## 5:               F Lysyl-bradykinin              P01042            385
## 6:               R      kininogen-1              P01042            389
##    Substrate organism Protease name     Cleavage type
##                <char>        <char>            <char>
## 1:       Homo sapiens   cathepsin D     physiological
## 2:       Homo sapiens   cathepsin D     physiological
## 3:       Homo sapiens   cathepsin D     physiological
## 4:       Homo sapiens   cathepsin H non-physiological
## 5:       Homo sapiens   cathepsin H non-physiological
## 6:       Homo sapiens     calpain-1     physiological# The function also accepts multiple proteins as input. Let's inspect
# the last rows in the returned data.table.
searchSubstrate(protein = c("P01042", "P02461"), summarize = FALSE) |>
    tail()##    Protease (Uniprot) Protease status Protease organism Protease (MEROPS)
##                <char>          <char>            <char>            <char>
## 1:             Q59EM4      unreviewed      Homo sapiens           S28.002
## 2:             B4DPD6      unreviewed      Homo sapiens           S28.002
## 3:             R4GNE8      unreviewed      Homo sapiens           S28.002
## 4:             R4GMR2      unreviewed      Homo sapiens           S28.002
## 5:             Q59EM4      unreviewed      Homo sapiens           S28.002
## 6:             B4DPD6      unreviewed      Homo sapiens           S28.002
##    Cleaved residue   Substrate name Substrate (Uniprot) Residue number
##             <char>           <char>              <char>         <char>
## 1:               P bradykinin (1-3)              P01042            382
## 2:               P bradykinin (1-3)              P01042            382
## 3:               P bradykinin (1-5)              P01042            382
## 4:               P bradykinin (1-5)              P01042            382
## 5:               P bradykinin (1-5)              P01042            382
## 6:               P bradykinin (1-5)              P01042            382
##    Substrate organism           Protease name     Cleavage type
##                <char>                  <char>            <char>
## 1:       Homo sapiens dipeptidyl-peptidase II non-physiological
## 2:       Homo sapiens dipeptidyl-peptidase II non-physiological
## 3:       Homo sapiens dipeptidyl-peptidase II non-physiological
## 4:       Homo sapiens dipeptidyl-peptidase II non-physiological
## 5:       Homo sapiens dipeptidyl-peptidase II non-physiological
## 6:       Homo sapiens dipeptidyl-peptidase II non-physiological# With summarize = FALSE we get both reviewed and unreviewed proteases.A corresponding function, searchProtease, exists to find which (if any) substrates a protease cleaves. The function is demonstrated with MMP-12 (P39900) as an example.
# Returns vector of substrates that MMP-12 cleave
searchProtease(protein = "P39900", summarize = TRUE)##  [1] "P00748"   "P15502"   "P02686"   "P15502-2" "P10646"   "P02751"  
##  [7] "P08519"   "Q06828"   "P51888"   "P10909"   "P02647"   "P01009"  
## [13] "P02671"   "P02778"   "P02458"   "P02461"   "P35556"   "Q07325"  
## [19] "P21810"   "P07585"   "P15502-1" "O14625"   "P01857"   "P35555"  
## [25] "Q03405"# Returns data.table with details on cleaving events that involve MMP-12
searchProtease(protein = "P39900", summarize = FALSE) |> head()## Key: <Protease (MEROPS)>
##    Cleaved residue Protease (MEROPS)         Substrate name Substrate (Uniprot)
##             <char>            <char>                 <char>              <char>
## 1:               A           M10.009 coagulation factor XII              P00748
## 2:               A           M10.009                elastin              P15502
## 3:               A           M10.009                elastin              P15502
## 4:               A           M10.009                elastin              P15502
## 5:               A           M10.009                elastin              P15502
## 6:               A           M10.009                elastin              P15502
##    Residue number Substrate organism              Protease name Cleavage type
##            <char>             <char>                     <char>        <char>
## 1:            379       Homo sapiens matrix metallopeptidase-12 physiological
## 2:            670       Homo sapiens matrix metallopeptidase-12 physiological
## 3:            102       Homo sapiens matrix metallopeptidase-12 physiological
## 4:            687       Homo sapiens matrix metallopeptidase-12 physiological
## 5:             90       Homo sapiens matrix metallopeptidase-12 physiological
## 6:            688       Homo sapiens matrix metallopeptidase-12 physiological
##    Protease (Uniprot) Protease status Protease organism
##                <char>          <char>            <char>
## 1:             P39900        reviewed      Homo sapiens
## 2:             P39900        reviewed      Homo sapiens
## 3:             P39900        reviewed      Homo sapiens
## 4:             P39900        reviewed      Homo sapiens
## 5:             P39900        reviewed      Homo sapiens
## 6:             P39900        reviewed      Homo sapiensThe function findProtease automatically maps the peptide sequences to the full-length protein sequence and obtains the start- and end-positions for the peptide. Then, the positions are searched against the MEROPs database and matches are returned.
# Create a vector of Uniprot IDs and corresponding peptide sequences
protein <- c("P02671", "P02671", "P68871",
             "P01011", "P68133", "P02461",
             "P0DJI8", "P0DJI8", "P0DJI8")
peptide <- c("FEEVSGNVSPGTR", "FVSETESR", "LLVVYPW",
             "ITLLSAL", "DSYVGDEAQS", "AGGFAPYYG",
             "FFSFLGEAFDGAR", "EANYIGSDKY", "GGVWAAEAISDAR")
# If we do not specify start position (start_pos) and end position
# (end_pos), the function will automatically assign these values by
# matching the provided peptide sequence against the full-length
# protein sequence.
res <- findProtease(protein = protein,
                    peptide = peptide,
                    organism = "Homo sapiens")
# The resulting S4 object can be inspected in three ways;
# (to save space we show only the first six rows)
# 1. Display sequence data for the provided input:
substrates(res) |> head()##              Substrate name Substrate (Uniprot)
##                      <char>              <char>
## 1:   fibrinogen alpha chain              P02671
## 2:   fibrinogen alpha chain              P02671
## 3:  hemoglobin subunit beta              P68871
## 4: alpha-1-antichymotrypsin              P01011
## 5:  Serum amyloid A protein              P0DJI8
## 6:  Serum amyloid A protein              P0DJI8
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Substrate sequence
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <char>
## 1: MFSMRIVCLVLSVVGTAWTADSGEGDFLAEGGGVRGPRVVERHQSACKDSDWPFCSDEDWNYKCPSGCRMKGLIDEVNQDFTNRINKLKNSLFEYQKNNKDSHSLTTNIMEILRGDFSSANNRDNTYNRVSEDLRSRIEVLKRKVIEKVQHIQLLQKNVRAQLVDMKRLEVDIDIKIRSCRGSCSRALAREVDLKDYEDQQKQLEQVIAKDLLPSRDRQHLPLIKMKPVPDLVPGNFKSQLQKVPPEWKALTDMPQMRMELERPGGNEITRGGSTSYGTGSETESPRNPSSAGSWNSGSSGPGSTGNRNPGSSGTGGTATWKPGSSGPGSTGSWNSGSSGTGSTGNQNPGSPRPGSTGTWNPGSSERGSAGHWTSESSVSGSTGQWHSESGSFRPDSPGSGNARPNNPDWGTFEEVSGNVSPGTRREYHTEKLVTSKGDKELRTGKEKVTSGSTTTTRRSCSKTVTKTVIGPDGHKEVTKEVVTSEDGSDCPEAMDLGTLSGIGTLDGFRHRHPDEAAFFDTASTGKTFPGFFSPMLGEFVSETESRGSESGIFTNTKESSSHHPGIAEFPSRGKSSSYSKQFTSSTSYNRGDSTFESKSYKMADEAGSEADHEGTHSTKRGHAKSRPVRDCDDVLQTHPSGTQSGIFNIKLPGSSKIFSVYCDQETSLGGWLLIQQRMDGSLNFNRTWQDYKRGFGSLNDEGEGEFWLGNDYLHLLTQRGSVLRVELEDWAGNEAYAEYHFRVGSEAEGYALQVSSYEGTAGDALIEGSVEEGAEYTSHNNMQFSTFDRDADQWEENCAEVYGGGWWYNNCQAANLNGIYYPGGSYDPRNNSPYEIENGVVWVSFRGADYSLRAVRMKIRPLVTQ
## 2: MFSMRIVCLVLSVVGTAWTADSGEGDFLAEGGGVRGPRVVERHQSACKDSDWPFCSDEDWNYKCPSGCRMKGLIDEVNQDFTNRINKLKNSLFEYQKNNKDSHSLTTNIMEILRGDFSSANNRDNTYNRVSEDLRSRIEVLKRKVIEKVQHIQLLQKNVRAQLVDMKRLEVDIDIKIRSCRGSCSRALAREVDLKDYEDQQKQLEQVIAKDLLPSRDRQHLPLIKMKPVPDLVPGNFKSQLQKVPPEWKALTDMPQMRMELERPGGNEITRGGSTSYGTGSETESPRNPSSAGSWNSGSSGPGSTGNRNPGSSGTGGTATWKPGSSGPGSTGSWNSGSSGTGSTGNQNPGSPRPGSTGTWNPGSSERGSAGHWTSESSVSGSTGQWHSESGSFRPDSPGSGNARPNNPDWGTFEEVSGNVSPGTRREYHTEKLVTSKGDKELRTGKEKVTSGSTTTTRRSCSKTVTKTVIGPDGHKEVTKEVVTSEDGSDCPEAMDLGTLSGIGTLDGFRHRHPDEAAFFDTASTGKTFPGFFSPMLGEFVSETESRGSESGIFTNTKESSSHHPGIAEFPSRGKSSSYSKQFTSSTSYNRGDSTFESKSYKMADEAGSEADHEGTHSTKRGHAKSRPVRDCDDVLQTHPSGTQSGIFNIKLPGSSKIFSVYCDQETSLGGWLLIQQRMDGSLNFNRTWQDYKRGFGSLNDEGEGEFWLGNDYLHLLTQRGSVLRVELEDWAGNEAYAEYHFRVGSEAEGYALQVSSYEGTAGDALIEGSVEEGAEYTSHNNMQFSTFDRDADQWEENCAEVYGGGWWYNNCQAANLNGIYYPGGSYDPRNNSPYEIENGVVWVSFRGADYSLRAVRMKIRPLVTQ
## 3:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
## 4:                                                                                                                                                                                                                                                                                                                                                                                                                                                            MERMLPLLALGLLAAGFCPAVLCHPNSPLDEENLTQENQDRGTHVDLGLASANVDFAFSLYKQLVLKAPDKNVIFSPLSISTALAFLSLGAHNTTLTEILKGLKFNLTETSEAEIHQSFQHLLRTLNQSSDELQLSMGNAMFVKEQLSLLDRFTEDAKRLYGSEAFATDFQDSAAAKKLINDYVKNGTRGKITDLIKDLDSQTMMVLVNYIFFKAKWEMPFDPQDTHQSRFYLSKKKWVMVPMMSLHHLTIPYFRDEELSCTVVELKYTGNASALFILPDQDKMEEVEAMLLPETLKRWRDSLEFREIGELYLPKFSISRDYNLNDILLQLGIEEAFTSKADLSGITGARNLAVSQVVHKAVLDVFEEGTEASAATAVKITLLSALVETRTIVRFNRPFLMIIVPTDTQNIFFMSKVTNPKQA
## 5:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         MKLLTGLVFCSLVLGVSSRSFFSFLGEAFDGARDMWRAYSDMREANYIGSDKYFHARGNYDAAKRGPGGAWAAEVITDARENIQRFFGHGAEDSLADQAANEWGRSGKDPNHFRPAGLPEKY
## 6:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         MKLLTGLVFCSLVLGVSSRSFFSFLGEAFDGARDMWRAYSDMREANYIGSDKYFHARGNYDAAKRGPGGAWAAEVITDARENIQRFFGHGAEDSLADQAANEWGRSGKDPNHFRPAGLPEKY
##    Substrate length       Peptide Start position End position
##               <int>        <char>         <char>       <char>
## 1:              866 FEEVSGNVSPGTR            413          425
## 2:              866      FVSETESR            540          547
## 3:              147       LLVVYPW             32           38
## 4:              423       ITLLSAL            380          386
## 5:              122 FFSFLGEAFDGAR             21           33
## 6:              122    EANYIGSDKY             44           53# 2. Show which known proteases may have cleaved this protein:
proteases(res) |> head()##                        Protease name Protease (Uniprot) Protease status
##                               <char>             <char>          <char>
## 1:                       cathepsin D             P07339        reviewed
## 2:                       cathepsin B             P07858        reviewed
## 3:             legumain, animal-type             Q99538        reviewed
## 4:           procollagen C-peptidase             P13497        reviewed
## 5: vertebrate tolloid-like 1 protein             O43897        reviewed
## 6:                       cathepsin D             V9HWI3      unreviewed
##    Protease (MEROPS)                                           Protease URL
##               <char>                                                 <char>
## 1:           A01.009 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=A01.009
## 2:           C01.060 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=C01.060
## 3:           C13.004 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=C13.004
## 4:           M12.005 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=M12.005
## 5:           M12.016 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=M12.016
## 6:           A01.009 https://www.ebi.ac.uk/merops/cgi-bin/pepsum?id=A01.009# 3. Display details of associated proteolytic events:
cleavages(res) |> head()##    Substrate (Uniprot)       Peptide Protease (Uniprot) Protease status
##                 <char>        <char>             <char>          <char>
## 1:              P02671 FEEVSGNVSPGTR             P07339        reviewed
## 2:              P02671      FVSETESR             P07339        reviewed
## 3:              P68871       LLVVYPW             P07339        reviewed
## 4:              P01011       ITLLSAL             P07339        reviewed
## 5:              P0DJI8 FFSFLGEAFDGAR             P07858        reviewed
## 6:              P0DJI8    EANYIGSDKY             P07858        reviewed
##    Cleaved residue Cleaved terminus Cleavage type
##             <char>           <char>        <char>
## 1:               F                N physiological
## 2:               F                N physiological
## 3:               L                N  pathological
## 4:               L                C physiological
## 5:               F                N physiological
## 6:               E                N physiological# We can find out what proportion of matching cleaving events by reviewed
# proteases occur at N- versus C-terminus
cl <- cleavages(res)[`Protease status` == "reviewed"]
cl$`Cleaved terminus` |> table() |> prop.table() |> round(digits = 2)## 
##   C   N 
## 0.3 0.7# And inspect the distribution of cleaved amino acids
cl$`Cleaved residue` |> table() |> barplot(cex.names = 2)# Find which protease is involved in the greatest number of cleaving events
cl[!duplicated(Peptide), .(count = .N), by = `Protease (Uniprot)`]##    Protease (Uniprot) count
##                <char> <int>
## 1:             P07339     4
## 2:             P07858     2
## 3:             Q99538     1
## 4:             P13497     1# If start- and end-positions are specified, the function will not attempt
# to automatically look up sequence data for the specified protein/peptide.
cl_by_pos <- findProtease(
    protein = "P02671",
    peptide = "FEEVSGNVSPGTR",
    start_pos = 413,
    end_pos = 425
)
# However, this means sequence details for substrates is not available.
substrates(cl_by_pos)##            Substrate name Substrate (Uniprot) Substrate sequence
##                    <char>              <char>             <lgcl>
## 1: fibrinogen alpha chain              P02671                 NA
##    Substrate length       Peptide Start position End position
##               <int>        <char>         <char>       <char>
## 1:               NA FEEVSGNVSPGTR            413          425We may want to read up on the details of an entry directly in MEROPS. The function browseProtease takes a UniProt or MEROPS ID and opens the MEROPS summary page which corresponds to that ID in a web browser.
browseProtease("P07339", keytype = "UniprotID") # (opens web browser)Here we visualize the cleaving events of two substrates as a protein-protein interaction network between substrates (red) and associated proteases (blue).
library(igraph)
library(data.table)
# Make a vector of substrates we want to investigate
proteins <- c('P01011','P02671')
# Look up known proteases which cleave these substrates, and associated data
# annotated to the cleaving events
res <- searchSubstrate(protein = proteins, summarize = FALSE)
# Filter to keep proteases with Uniprot status "reviewed"
res <- res[`Protease status` == "reviewed"]
# To create a network, we only need two columns of interactors
# i.e. the proteases with cleaving action on the substrates
res <- res[, c("Protease (Uniprot)", "Substrate (Uniprot)", "Cleavage type")]
# Construct the DAG
g <- igraph::graph_from_data_frame(res,
                                   directed = TRUE,
                                   vertices = unique(
                                       c(res$`Protease (Uniprot)`,
                                         res$`Substrate (Uniprot)`)))
# This will allow us to return to a layout we like
# (and where the legend fits nicely)
set.seed(104)
plot(g,
     vertex.label.family = "Helvetica",
     vertex.size = 14,
     vertex.color = ifelse(V(g)$name %in% res$`Protease (Uniprot)`,
                           "#377EB8", "#E41A1C"),
     vertex.label.cex = 1,
     vertex.label.color = "white",
     edge.arrow.size = .6,
     edge.color =  ifelse(E(g)$`Cleavage type` == "physiological",
                          "#01665E", "#8E0152"),
     layout = igraph::layout.davidson.harel)
legend(title = "Nodes", cex = 2, horiz = FALSE,
       title.adj = 0.0, inset = c(0.0, 0.2),
       "bottomleft", bty = "n",
       legend = c("Protease", "Substrate"),
       fill = c("#377EB8", "#E41A1C"), border = NA)
legend(title = "Edges", cex = 2, horiz = FALSE,
       title.adj = 0.0, inset = c(0.0, 0.0),
       "bottomleft", bty = "n",
       legend = c("Physiological", "Non-physiological"),
       fill = c("#01665E", "#8E0152"), border = NA)proteasy automatically finds protein sequences from protein IDs (thanks to ensembldb and Rcpi). We can use substrates() to access them. Here, we look study similarity matrices of some protein- and peptide-level sequences and plot them as heatmaps, which we annotate with cleavage data.
# Load additional libraries
library(Rcpi)
library(viridis)
suppressPackageStartupMessages(library(ComplexHeatmap))
# Prepare input: protein and associated peptide sequences
protein <- c('P01011','P01011','P01034','P01034',
             'P01042','P02671','P02671','P68871',
             'P68871','P01042')
peptide <- c('LVETRTIVRFNRPFLMIIVPTDTQNIFFMSKVTNPK','ITLLSAL',
             'KAFCSFQIY','AFCSFQIY','DIPTNSPELEETLTHTITKL','FEEVSGNVSPGTR',
             'FVSETESR','LLVVYPW','VDEVGGEALGR','KIYPTVNCQPLGMISLM')
# Find cleaving data associated with above substrates
res <- findProtease(protein = protein,
                    peptide = peptide,
                    organism = "Homo sapiens")
# Get substrate info
ss <- substrates(res)
# Show only unique sequences
ss_uniq <- ss[!duplicated(`Substrate sequence`)]
# Calculate protein (substrate) sequence similarity
psimmat = Rcpi::calcParProtSeqSim(ss_uniq$`Substrate sequence`,
                                  type = 'global',
                                  submat = 'BLOSUM62')
rownames(psimmat) <- colnames(psimmat) <- ss_uniq$`Substrate (Uniprot)`
# Plot as a heatmap
ComplexHeatmap::Heatmap(psimmat, col = viridis::mako(100))# We can do the same thing for peptide sequences,
# and annotate each row (cleaved peptide) with
# information about cleaved residue and termini
# Get cleavage details
cl <- cleavages(res)
# Calculate peptide sequence similarity
pep_psimmat = Rcpi::calcParProtSeqSim(cl$Peptide, type = 'global',
                                      submat = 'BLOSUM62')
# Right side annotation: cleaved residue
rsd <- cl$`Cleaved residue`
cols <- c("#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072")
names(cols) <- unique(rsd)
ha1 <- ComplexHeatmap::columnAnnotation(`cleaved residue` = rsd,
                         col = list(`cleaved residue` = cols))
# Right side annotation: Terminus
tn <- cl$`Cleaved terminus`
cols <- c("#B3E2CD", "#FDCDAC")
names(cols) <- unique(tn)
ha2 <- ComplexHeatmap::columnAnnotation(terminus = tn,
                     col = list(terminus = cols))
rownames(pep_psimmat) <- cl$`Substrate (Uniprot)`
# Plot as a heatmap
ComplexHeatmap::Heatmap(
    pep_psimmat,
    name = "sequence\nsimilarity",
    col = viridis::mako(100),
    show_column_names = FALSE,
    row_names_gp = grid::gpar(fontsize = 6.5),
    top_annotation = c(ha1, ha2)
)## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ComplexHeatmap_2.20.0 viridis_0.6.5         viridisLite_0.4.2    
## [4] Rcpi_1.40.1           data.table_1.15.4     igraph_2.0.3         
## [7] proteasy_1.6.0        BiocStyle_2.32.1     
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3                   bitops_1.0-8               
##   [3] gridExtra_2.3               httr2_1.0.2                
##   [5] rlang_1.1.4                 magrittr_2.0.3             
##   [7] clue_0.3-65                 GetoptLong_1.0.5           
##   [9] matrixStats_1.3.0           compiler_4.4.1             
##  [11] RSQLite_2.3.7               GenomicFeatures_1.56.0     
##  [13] png_0.1-8                   vctrs_0.6.5                
##  [15] stringr_1.5.1               ProtGenerics_1.36.0        
##  [17] shape_1.4.6.1               pkgconfig_2.0.3            
##  [19] crayon_1.5.3                fastmap_1.2.0              
##  [21] magick_2.8.4                XVector_0.44.0             
##  [23] utf8_1.2.4                  Rsamtools_2.20.0           
##  [25] rmarkdown_2.28              UCSC.utils_1.0.0           
##  [27] tinytex_0.52                bit_4.0.5                  
##  [29] xfun_0.47                   zlibbioc_1.50.0            
##  [31] cachem_1.1.0                GenomeInfoDb_1.40.1        
##  [33] jsonlite_1.8.8              blob_1.2.4                 
##  [35] highr_0.11                  DelayedArray_0.30.1        
##  [37] BiocParallel_1.38.0         cluster_2.1.6              
##  [39] parallel_4.4.1              R6_2.5.1                   
##  [41] RColorBrewer_1.1-3          bslib_0.8.0                
##  [43] stringi_1.8.4               rtracklayer_1.64.0         
##  [45] GenomicRanges_1.56.1        jquerylib_0.1.4            
##  [47] GOSemSim_2.30.2             Rcpp_1.0.13                
##  [49] bookdown_0.40               SummarizedExperiment_1.34.0
##  [51] iterators_1.0.14            knitr_1.48                 
##  [53] R.utils_2.12.3              IRanges_2.38.1             
##  [55] tidyselect_1.2.1            Matrix_1.7-0               
##  [57] abind_1.4-5                 yaml_2.3.10                
##  [59] doParallel_1.0.17           codetools_0.2-20           
##  [61] curl_5.2.1                  tibble_3.2.1               
##  [63] lattice_0.22-6              Biobase_2.64.0             
##  [65] KEGGREST_1.44.1             evaluate_0.24.0            
##  [67] circlize_0.4.16             Biostrings_2.72.1          
##  [69] pillar_1.9.0                BiocManager_1.30.24        
##  [71] MatrixGenerics_1.16.0       foreach_1.5.2              
##  [73] stats4_4.4.1                generics_0.1.3             
##  [75] RCurl_1.98-1.16             ensembldb_2.28.1           
##  [77] S4Vectors_0.42.1            ggplot2_3.5.1              
##  [79] munsell_0.5.1               scales_1.3.0               
##  [81] glue_1.7.0                  lazyeval_0.2.2             
##  [83] tools_4.4.1                 BiocIO_1.14.0              
##  [85] GenomicAlignments_1.40.0    fs_1.6.4                   
##  [87] XML_3.99-0.17               Cairo_1.6-2                
##  [89] AnnotationDbi_1.66.0        colorspace_2.1-1           
##  [91] GenomeInfoDbData_1.2.12     restfulr_0.0.15            
##  [93] cli_3.6.3                   rappdirs_0.3.3             
##  [95] fansi_1.0.6                 S4Arrays_1.4.1             
##  [97] dplyr_1.1.4                 AnnotationFilter_1.28.0    
##  [99] gtable_0.3.5                R.methodsS3_1.8.2          
## [101] yulab.utils_0.1.6           EnsDb.Hsapiens.v86_2.99.0  
## [103] sass_0.4.9                  digest_0.6.37              
## [105] BiocGenerics_0.50.0         SparseArray_1.4.8          
## [107] rjson_0.2.22                memoise_2.0.1              
## [109] htmltools_0.5.8.1           R.oo_1.26.0                
## [111] lifecycle_1.0.4             httr_1.4.7                 
## [113] GlobalOptions_0.1.2         GO.db_3.19.1               
## [115] bit64_4.0.5