Type: Package
Title: Prediction-Based Kinase-Substrate Enrichment Analysis
Version: 0.0.1
Description: A tool for inferring kinase activity changes from phosphoproteomics data. 'pKSEA' uses kinase-substrate prediction scores to weight observed changes in phosphopeptide abundance to calculate a phosphopeptide-level contribution score, then sums up these contribution scores by kinase to obtain a phosphoproteome-level kinase activity change score (KAC score). 'pKSEA' then assesses the significance of changes in predicted substrate abundances for each kinase using permutation testing. This results in a permutation score (pKSEA significance score) reflecting the likelihood of a similarly high or low KAC from random chance, which can then be interpreted in an analogous manner to an empirically calculated p-value. 'pKSEA' contains default databases of kinase-substrate predictions from 'NetworKIN' (NetworKINPred_db) http://networkin.info Horn et al. (2014) <doi:10.1038/nmeth.2968> and of known kinase-substrate links from 'PhosphoSitePlus' (KSEAdb) https://www.phosphosite.org/ Hornbeck PV et al. (2015) <doi:10.1093/nar/gku1267>.
Depends: R (>= 3.3.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Maintainer: Peter Liao <pll21@case.edu>
NeedsCompilation: no
Packaged: 2017-12-21 19:56:06 UTC; pll21
Author: Peter Liao [aut, cre]
Repository: CRAN
Date/Publication: 2017-12-22 18:46:08 UTC

KSEAdb

Description

A data table containing all kinase-substrate links known in PhosphoSitePlus.

Usage

KSEAdb

Format

An object of class data.frame with 240749 rows and 6 columns.

Source

https://www.phosphosite.org/staticDownloads.action

References

Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43:D512-20.


Filter matched data to remove positive IDs from KSEA

Description

Filter matched data to remove positive IDs from KSEA

Usage

KSEAfilter(matched_data, kseadb, reverse = FALSE)

Arguments

matched_data

Results of get_matched_data function

kseadb

KSEA database containing substrate gene name "SUB_GENE" and phosphorylated residue "SUB_MOD_RSD" in standard form (e.g. T302).

Value

Result of get_matched_data function with features existing in KSEA database removed.
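
The filtering described above can be pictured as an anti-join on gene name and residue. The following is a minimal base R sketch on toy data, not the package's implementation. The matched-data column names `GN` and `site` are assumptions made for illustration; `SUB_GENE` and `SUB_MOD_RSD` follow the kseadb description.

```r
# Toy matched data: one row per phosphosite (column names GN and site are assumed)
matched_toy <- data.frame(
  GN   = c("AKT1", "MAPK1", "GSK3B"),
  site = c("T308", "Y187", "S9"),
  t    = c(2.1, -0.4, 1.3),
  stringsAsFactors = FALSE
)

# Toy KSEA-style database using the documented SUB_GENE / SUB_MOD_RSD columns
ksea_toy <- data.frame(
  SUB_GENE    = c("AKT1", "GSK3B"),
  SUB_MOD_RSD = c("T308", "S21"),
  stringsAsFactors = FALSE
)

# Build gene:residue keys and drop rows whose key appears in the database
matched_key  <- paste(matched_toy$GN, matched_toy$site, sep = ":")
ksea_key     <- paste(ksea_toy$SUB_GENE, ksea_toy$SUB_MOD_RSD, sep = ":")
filtered_toy <- matched_toy[!(matched_key %in% ksea_key), ]
```

Matching on the combined key keeps a site such as GSK3B S9 when only a different residue of the same gene is listed in the database.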


NetworKINPred_db

Description

A data table containing all precalculated NetworKIN predictions performed on known Ensembl sequences.

Usage

NetworKINPred_db

Format

An object of class data.frame with 450418 rows and 4 columns.

Source

http://networkin.info/download.shtml

References

Horn et al., KinomeXplorer: an integrated platform for kinome biology studies. Nature Methods 2014 Jun;11(6):603–4.


Running pKSEA::compare() on multiple files

Description

Runs compare() on multiple CSV data files in the same directory and writes the results to a folder in the designated data directory. Additional arguments are passed on to downstream functions. Output is written to tempdir() unless the outputpath argument is specified by the user (it is passed on to results_write).

Usage

batchrun(summaryfiledir, commonfilestring = ".csv",
predictionDB, results_folder = NULL, ...)

Arguments

summaryfiledir

Directory containing summary statistic CSV files. Required data file columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (e.g. "T102,S105"), pval = pairwise test p-value, fc = mean fold change, t = pairwise test t-statistic. pval and fc are used for results reporting only; all others are used for database searching, calculation, and permutation testing.

commonfilestring

Common string identifying all files to be included in analysis

predictionDB

Input database whose prediction scores will be used for calculations. Required columns: substrate_name = name of substrate corresponding to GN in summary_data, kinase_id = identifiers for kinase predictors, position = phosphorylated residue number, score = numeric score for strength of prediction.

results_folder

optional name of a single output folder for all runs. If not provided, the run performed on each file gets a separate output folder identified by run initiation time.

...

parameters to be passed on to downstream functions, including (default): outputpath (tempdir()), n_permutations (1000), seed (123), kseadb (NULL), kin_ens_table (NULL). See run_on_matched and compare for details.

Examples

#point to data directory that contains summary .csv files
datapath <- system.file("extdata", package = "pKSEA")

#run batchrun function to analyze all files in that folder, with options
batchrun(datapath, predictionDB=NetworKINPred_db, kseadb = KSEAdb, n_permutations = 5)
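
The file selection step can be illustrated without the package. This is a rough sketch, assuming the common string is matched literally against file names; how batchrun() actually matches files is not stated here. The directory and file names are made up for the example.

```r
# Create a throwaway directory with two CSV files and one unrelated file
toy_dir <- file.path(tempdir(), "pksea_batch_sketch")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1), file.path(toy_dir, "run_a.csv"), row.names = FALSE)
write.csv(data.frame(x = 2), file.path(toy_dir, "run_b.csv"), row.names = FALSE)
writeLines("notes", file.path(toy_dir, "readme.txt"))

# Select files whose names contain the common string (treated literally here)
common_string <- ".csv"
all_files <- list.files(toy_dir)
selected  <- all_files[grepl(common_string, all_files, fixed = TRUE)]

# Loop over the selected files, as a batch run would before calling the analysis
row_counts <- vapply(
  file.path(toy_dir, selected),
  function(path) nrow(read.csv(path)),
  integer(1)
)
```

Using `fixed = TRUE` avoids the `.` in ".csv" being read as a regular-expression wildcard.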

Calculate score contributions by phosphorylation site

Description

Calculate score contributions by phosphorylation site

Usage

calc_contribution(matched_data)

Arguments

matched_data

Input data (results of get_matched_data())

Value

matched_data with contribution scores calculated

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Calculate contributions
calc_ex <- calc_contribution(matched_data_ex)
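
To make the weighting idea concrete, here is a small sketch that does not use the package. It assumes the contribution is the product of the prediction score and the t-statistic; the exact formula used by calc_contribution() is not given in this manual, so treat the product as an illustrative assumption.

```r
# Toy matched data: each row pairs one phosphosite with one predicted kinase
matched_toy <- data.frame(
  GN        = c("AKT1", "AKT1", "GSK3B"),
  kinase_id = c("PDK1", "MTOR", "AKT1"),
  score     = c(0.8, 0.5, 0.9),
  t         = c(2.0, 2.0, -1.5),
  stringsAsFactors = FALSE
)

# Assumed contribution: prediction score weighting the observed t-statistic
matched_toy$contribution <- matched_toy$score * matched_toy$t
```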

Running analyses on known substrates, predicted substrates, and both

Description

Performs up to three run_on_matched() runs on summary-prediction matched data from get_matched_data(), returning permutation significance score results. If a KSEA database is provided for filtering and comparison, one full analysis will be performed on all phosphosites, one on data with all known kinase substrates removed according to the provided KSEA database, and one on known kinase substrates only.

Usage

compare(matched_data, predictionDB, kseadb, ...)

Arguments

matched_data

Summary-prediction matched data returned by get_matched_data(). The summary statistic data it was built from must have an entry for each phosphopeptide and the following columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (e.g. "T102,S105"), pval = pairwise test p-value, fc = mean fold change, t = pairwise test t-statistic. pval and fc are used for results reporting only; all others are used for database searching, calculation, and permutation testing.

predictionDB

Input database whose prediction scores will be used for calculations. Required columns: substrate_name = name of substrate corresponding to GN in summary_data, kinase_id = identifiers for kinase predictors, position = phosphorylated residue number, score = numeric score for strength of prediction.

kseadb

Optional KSEA database for filtering purposes, containing substrate gene name "SUB_GENE" and phosphorylated residue "SUB_MOD_RSD" in standard form (e.g. T302).

...

optional parameters to be passed on to downstream functions, including (default): n_permutations (1000), seed (123), kin_ens_table (NULL). See run_on_matched for details.

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Perform comparative analysis using provided KSEAdb as filter
## Not run: 
compare_results_ex <- compare(matched_data_ex, predictionDB = NetworKINPred_db,
  kseadb = KSEAdb, n_permutations = 10)

## End(Not run)

Filtering data to matched predictions

Description

This function reformats summary statistic phosphoproteomic data to single observations for each phosphorylation site, duplicating other fields for multiple sites on the same peptide. Next, it attempts to find predictions for each phosphorylation site in the provided database. It returns observations (phosphorylation sites) for which a prediction is detected in the database, matching based on HUGO gene name and phosphorylated residue.

Usage

get_matched_data(datafull, predictionDB)

Arguments

datafull

Statistical summary data with an entry for each phosphopeptide. Required columns: GN = gene name identifier that will be matched with prediction database, Peptide = unique peptide identifier (for example, sequence with modifications), Phosphosites = comma-separated phosphorylation sites (e.g. "T102,S105"), pval = pairwise test p-value, fc = mean fold change, t = pairwise test t-statistic. pval and fc are used for results reporting only; all others are used for database searching, calculation, and permutation testing.

predictionDB

Input database whose prediction scores will be used for calculations. Required columns: substrate_name = name of substrate corresponding to GN in datafull, kinase_id = identifiers for kinase predictors, position = phosphorylated residue number, score = numeric score for strength of prediction.

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)
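
The reshaping and matching steps described for this function can be sketched in base R on toy data. This is an illustration under assumptions, not the package's code: the helper columns `site` and `position` added to the expanded table are hypothetical names, and the toy prediction table only mimics the documented predictionDB columns.

```r
# Toy summary data with the documented GN, Peptide, Phosphosites, and t columns
summary_toy <- data.frame(
  GN           = c("AKT1", "GSK3B"),
  Peptide      = c("pepA", "pepB"),
  Phosphosites = c("T308,S473", "S9"),
  t            = c(2.0, -1.5),
  stringsAsFactors = FALSE
)

# Split the comma-separated sites and repeat the other fields for each site
site_list <- strsplit(summary_toy$Phosphosites, ",", fixed = TRUE)
expanded  <- summary_toy[rep(seq_len(nrow(summary_toy)), lengths(site_list)), ]
expanded$site     <- trimws(unlist(site_list))
expanded$position <- as.integer(sub("^[A-Za-z]", "", expanded$site))

# Toy prediction table with the documented predictionDB columns
prediction_toy <- data.frame(
  substrate_name = c("AKT1", "AKT1", "MAPK1"),
  kinase_id      = c("PDK1", "MTOR", "MEK1"),
  position       = c(308L, 473L, 187L),
  score          = c(0.8, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# Keep only sites that have a prediction, matching on gene name and position
matched_toy <- merge(
  expanded, prediction_toy,
  by.x = c("GN", "position"), by.y = c("substrate_name", "position")
)
```

Because merge() performs an inner join by default, the GSK3B site with no prediction is dropped, which mirrors the "returns observations for which a prediction is detected" behaviour.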

Sum score contributions for each kinase across all phosphopeptides

Description

Sum score contributions for each kinase across all phosphopeptides

Usage

getscores(matched_data)

Arguments

matched_data

Input with calculated contributions

Value

A dataframe with each kinase as a row and raw kinase activity change score (KAC) calculated

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Calculate contributions
calc_ex <- calc_contribution(matched_data_ex)

#Calculate kinase activity change scores
kac_ex <- getscores(calc_ex)
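
Summing contributions by kinase is a plain grouped sum. A minimal sketch on toy data follows; the column names `kinase_id`, `contribution`, and `KAC` are assumed for illustration and may not match the package's output.

```r
# Toy per-site contributions for two kinases
contrib_toy <- data.frame(
  kinase_id    = c("PDK1", "AKT1", "PDK1", "AKT1"),
  contribution = c(1.6, -1.35, 0.4, 0.35),
  stringsAsFactors = FALSE
)

# One row per kinase, holding the sum of its contribution scores
kac_toy <- aggregate(contribution ~ kinase_id, data = contrib_toy, FUN = sum)
names(kac_toy)[2] <- "KAC"
```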


Extract summary table with pertinent columns related to included substrates

Description

Extract summary table with pertinent columns related to included substrates

Usage

getsubs(matched_data)

Arguments

matched_data

Input with calculated contributions


mk_runlabel()

Description

Utility function for generating a new identifier for each run, labeled with the time the run was initiated and a custom suffix

Usage

mk_runlabel(parentdir = tempdir(), customsuffix)

Arguments

parentdir

parent directory

customsuffix

additional suffix to run identifier
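
A time-stamped run label of this kind could be built as follows. The function name `make_label_sketch` and the timestamp format are hypothetical; the actual label format produced by mk_runlabel() is not documented here.

```r
# Hypothetical helper: build a path under parentdir from a timestamp and suffix
make_label_sketch <- function(parentdir = tempdir(), customsuffix = "") {
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  file.path(parentdir, paste0(stamp, customsuffix))
}

label_toy <- make_label_sketch(customsuffix = "_example")
```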


Get percentile ranks across permutations

Description

Get percentile ranks across permutations

Usage

perc.permutation(results, permutations)

Arguments

results

Results of kinase scoring

permutations

Results of permutations


Obtain percentile rank comparing a single value to set

Description

Obtain percentile rank comparing a single value to set

Usage

perc.rank(set, value)

Arguments

set

Set of values to which given value will be compared

value

Value for which percentile score will be calculated
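
One common way to define a percentile rank is the percentage of values in the set that are less than or equal to the given value. The sketch below uses that definition as an assumption; the package may handle ties or scaling differently. The function name is hypothetical.

```r
# Assumed definition: percentage of set values at or below the given value
percentile_rank_sketch <- function(set, value) {
  100 * sum(set <= value) / length(set)
}

# Ten toy permutation scores and one observed score
null_scores <- c(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7)
rank_toy <- percentile_rank_sketch(null_scores, 6.5)
```

With these toy values, nine of the ten scores fall at or below 6.5, so the rank is 90.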


Perform permutation test

Description

Permutes the relationship between phosphopeptides and summary statistics (i.e. fold change, t-statistic) and returns a table of the scores calculated from each permutation

Usage

permtest(matched_data, perms = 1000, seed = 1)

Arguments

matched_data

Input with calculated contributions

perms

Number of permutations to run, default = 1000

seed

Seed used for permutation, default = 1

Value

dataframe with kinases as rows, each column as KAC scores calculated from one permutation

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Calculate contributions
calc_ex <- calc_contribution(matched_data_ex)

#Perform 5 permutations
perm_ex <- permtest(calc_ex, perms= 5, seed= 123)
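
The idea of the permutation step can be sketched on toy data without the package. This sketch shuffles the t-statistics across rows and recomputes a per-kinase sum of score times t for each shuffle. Both the product-based score and shuffling at the row level are assumptions made for illustration; the package may permute at the peptide level.

```r
# Toy matched data: prediction scores and t-statistics for two kinases
matched_toy <- data.frame(
  kinase_id = c("PDK1", "PDK1", "AKT1", "AKT1"),
  score     = c(0.8, 0.2, 0.9, 0.4),
  t         = c(2.0, -0.5, -1.5, 1.0),
  stringsAsFactors = FALSE
)

# Per-kinase sum of score-weighted t-statistics (assumed scoring rule)
kac_by_kinase <- function(score, t, kinase_id) {
  tapply(score * t, kinase_id, sum)
}

# Shuffle the t-statistics and recompute the per-kinase sums for each permutation
set.seed(123)
n_perms <- 5
perm_toy <- sapply(seq_len(n_perms), function(i) {
  shuffled_t <- sample(matched_toy$t)
  kac_by_kinase(matched_toy$score, shuffled_t, matched_toy$kinase_id)
})
```

The result has one row per kinase and one column per permutation, the same shape as the value described for permtest().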

Output writing of pKSEA compare() results

Description

Output only: writes the results of compare() to up to three files, with full.csv, no_ksea.csv, and ksea_only.csv appended to the output name. The KSEA-filtered files are written only if a KSEA database was provided to compare().

Usage

results_write(full_ksea.results, outputpath = tempdir(), outputname,
  singlefolder = NULL)

Arguments

full_ksea.results

results from compare() including full and optional KSEA excluded and exclusive results

outputpath

parent directory for output, defaults to tempdir() unless defined by user

outputname

file name of output

singlefolder

if desired, name of output folder within output directory. Default is separate folders for each compare() run

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Perform single run of pKSEA analysis
single_run_results_ex <- run_on_matched(matched_data_ex, n_permutations = 10)

#Export results to R session temporary directory
## Not run: 
results_write(single_run_results_ex, outputpath= tempdir(), outputname= "example")

## End(Not run)
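
Writing a set of result tables to labeled CSV files can be sketched as below. The list structure of `results_toy`, the folder name, and the underscore used to join the output name and label are assumptions for illustration; only the three labels come from the description above.

```r
# Toy results: a named list of data frames standing in for compare() output
results_toy <- list(
  full      = data.frame(kinase_id = c("AKT1", "PDK1"), KAC = c(-1, 2)),
  no_ksea   = data.frame(kinase_id = "PDK1", KAC = 2),
  ksea_only = data.frame(kinase_id = "AKT1", KAC = -1)
)

# Write into a subfolder of the session temporary directory
out_dir <- file.path(tempdir(), "pksea_write_sketch")
dir.create(out_dir, showWarnings = FALSE)
output_name <- "example"

# One CSV per result table, named by output name plus label
written <- character(0)
for (part in names(results_toy)) {
  out_file <- file.path(out_dir, paste0(output_name, "_", part, ".csv"))
  write.csv(results_toy[[part]], out_file, row.names = FALSE)
  written <- c(written, out_file)
}
```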

Run pKSEA analysis on a dataset returned by get_matched_data()

Description

Calculates score contributions from summary statistics (t-statistic) and prediction scores, and sums contribution scores by kinase to calculate raw kinase activity change scores (KAC scores). Performs permutation test on summary statistic data to assess significance of kinase activity change scores, and reports significance as a percentile score (pKSEA significance score).

Usage

run_on_matched(matched_data, n_permutations = 1000, seed = 123,
  kin_ens_table = NULL)

Arguments

matched_data

data after filtering against predictions (results from get_matched_data())

n_permutations

number of permutations to perform (default 1000)

seed

seed used for permutation testing

kin_ens_table

optional table for inclusion of matched Ensembl IDs for kinases, with columns: ens = Ensembl ID, kinases = kinase_id as otherwise used

Examples

#Read in example summary statistics dataset from csv
summarydata_ex <- read.csv(system.file("extdata", "example_data1.csv", package="pKSEA"))

#Get matched data using predictions from NetworKIN
matched_data_ex <- get_matched_data(summarydata_ex, NetworKINPred_db)

#Perform single run of pKSEA analysis
single_run_results_ex <- run_on_matched(matched_data_ex, n_permutations = 10)