Type: Package
Title: Machine Learning Foundations
Version: 1.2.1
Date: 2018-06-21
Maintainer: Kyle Peterson <petersonkdon@gmail.com>
Description: Offers a gentle introduction to machine learning concepts for practitioners with a statistical pedigree: decomposition of model error (bias-variance trade-off), nonlinear correlations, information theory, and functional permutation/bootstrap simulations. Székely GJ, Rizzo ML, Bakirov NK. (2007). <doi:10.1214/009053607000000505>. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. (2011). <doi:10.1126/science.1205438>.
Imports: stats, utils
URL: http://mlf-project.us/
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2018-06-22 18:50:01 UTC; Admin
Author: Kyle Peterson [aut, cre]
Repository: CRAN
Date/Publication: 2018-06-25 08:01:20 UTC
Bootstrap Confidence Intervals via Resampling
Description
Provides nonparametric confidence intervals via percentile-based resampling for a given mlf function.
Usage
boot(x, y, func, reps, conf.int)
Arguments
x, y      numeric vectors of data values
func      the mlf function to resample (e.g., mic)
reps      (optional) number of resamples. Defaults to 500.
conf.int  (optional) numeric value indicating the level of confidence; a default is used if omitted.
Examples
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
mlf::boot(a, b, mic)
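The percentile approach can be sketched in base R. This is an illustrative re-implementation, not the package's internal code, and `cor` stands in for an mlf statistic:

```r
# Percentile bootstrap sketch: resample pairs with replacement, recompute the
# statistic, and take empirical quantiles of the resampled statistics.
boot_ci <- function(x, y, func, reps = 500, conf.int = 0.95) {
  stats <- replicate(reps, {
    idx <- sample(seq_along(x), replace = TRUE)  # resample pairs
    func(x[idx], y[idx])
  })
  alpha <- 1 - conf.int
  quantile(stats, c(alpha / 2, 1 - alpha / 2))   # percentile interval
}

set.seed(42)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
boot_ci(a, b, cor)  # 95% percentile interval for Pearson correlation
```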
Bias-Variance Trade-Off
Description
Provides estimated error decomposition from model predictions (mse, bias, variance).
Usage
bvto(truth, estimate)
Arguments
truth     test data vector or baseline accuracy to test against.
estimate  predicted vector
Examples
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::bvto(test, predicted)
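The decomposition rests on the algebraic identity mean(d^2) = mean(d)^2 + var_pop(d) for the error vector d. A sketch of the check (mlf's exact internal definitions of bias and variance may differ):

```r
# Verify the error-decomposition identity mse = bias^2 + variance
set.seed(1)
test      <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)

d     <- predicted - test
mse   <- mean(d^2)                 # mean squared error
bias2 <- mean(d)^2                 # squared bias (squared mean error)
vard  <- mean((d - mean(d))^2)     # population variance of the errors

isTRUE(all.equal(mse, bias2 + vard))  # TRUE: the identity holds exactly
```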
Distance Correlation
Description
Provides pairwise correlation via distance covariance normalized by the product of distance standard deviations. Allows for non-linear dependencies.
Usage
distcorr(x, y)
Arguments
x, y      numeric vectors of data values
References
Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007. 35(6):2769-2794.
Examples
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::distcorr(a, b)
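The sample statistic from Székely et al. (2007) can be written from scratch with double-centred Euclidean distance matrices. A sketch; distcorr's internals are assumed, not verified, to follow this form:

```r
# Sample distance correlation via double-centred distance matrices
dcor <- function(x, y) {
  center <- function(m) {  # A_jk - rowmean_j - colmean_k + grand mean
    sweep(sweep(m, 1, rowMeans(m)), 2, colMeans(m)) + mean(m)
  }
  A <- center(as.matrix(dist(x)))
  B <- center(as.matrix(dist(y)))
  dcov2  <- mean(A * B)            # squared distance covariance
  dvar_x <- mean(A * A)            # squared distance variances
  dvar_y <- mean(B * B)
  sqrt(dcov2 / sqrt(dvar_x * dvar_y))
}

set.seed(42)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
dcor(a, b)
dcor(a, a)  # identical inputs give exactly 1
```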
Entropy
Description
Estimates uncertainty in univariate probability distribution.
Usage
entropy(x, bins)
Arguments
x         numeric or discrete data vector
bins      number of bins to use when x is of numeric or integer class.
Examples
# Sample numeric vector
a <- rnorm(25, 80, 35)
mlf::entropy(a, bins = 2)
# Sample discrete vector
b <- as.factor(c(1,1,1,2))
mlf::entropy(b)
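Under the hood this is Shannon entropy over estimated cell probabilities. A minimal sketch, using the natural log (the base mlf actually uses is an assumption):

```r
# Shannon entropy of a probability vector (natural log -> nats)
shannon <- function(p) -sum(p[p > 0] * log(p[p > 0]))

# Discrete data: probabilities are the observed class proportions
b <- as.factor(c(1, 1, 1, 2))
shannon(table(b) / length(b))   # -(0.75*log(0.75) + 0.25*log(0.25)) ~ 0.562

# Numeric data: discretize into bins first
set.seed(42)
a <- rnorm(25, 80, 35)
shannon(table(cut(a, breaks = 2)) / length(a))
```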
Bias
Description
Estimates squared bias by decomposing model prediction error.
Usage
get_bias(truth, estimate)
Arguments
truth     test data vector or baseline accuracy to test against.
estimate  predicted vector
Examples
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_bias(test, predicted)
Mean Squared Error
Description
Estimates mean squared error from model predictions.
Usage
get_mse(truth, estimate)
Arguments
truth     test data vector or baseline accuracy to test against.
estimate  predicted vector
Examples
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_mse(test, predicted)
Variance
Description
Estimates the variance of model predictions as part of the prediction-error decomposition.
Usage
get_var(estimate)
Arguments
estimate  predicted vector
Examples
# Sample data
test <- rnorm(25, 80, 35)
predicted <- rnorm(25, 80, 50)
mlf::get_var(predicted)
Joint Entropy
Description
Estimates the uncertainty of the joint probability distribution of two data vectors.
Usage
jointentropy(x, y, bins)
Arguments
x, y      numeric or discrete data vectors
bins      specify number of bins
Examples
# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::jointentropy(a, b, bins = 2)
# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::jointentropy(a, b)
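For discrete data the computation is Shannon entropy over the joint contingency table (numeric data would be binned first). A sketch, using the natural log:

```r
# Joint entropy: Shannon entropy of the joint cell probabilities of (x, y)
joint_entropy <- function(x, y) {
  p <- table(x, y) / length(x)   # joint probability estimates
  -sum(p[p > 0] * log(p[p > 0]))
}

a <- as.factor(c(1, 1, 2, 2))
b <- as.factor(c(1, 1, 1, 2))
joint_entropy(a, b)   # cells {(1,1): .5, (2,1): .25, (2,2): .25} -> 1.5*log(2)
```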
Kullback-Leibler Divergence
Description
Provides the estimated difference between the cross-entropy of two probability distributions and the entropy of the first, i.e. the relative entropy of x with respect to y.
Usage
kld(x, y, bins)
Arguments
x, y      numeric or discrete data vectors
bins      specify number of bins
Examples
# Sample numeric vector
a <- rnorm(25, 80, 35)
b <- rnorm(25, 90, 35)
mlf::kld(a, b, bins = 2)
# Sample discrete vector
a <- as.factor(c(1,1,2,2))
b <- as.factor(c(1,1,1,2))
mlf::kld(a, b)
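In discrete form the estimate is the sum of p*log(p/q) over shared categories. A sketch with the natural log (mlf's binning and log base are assumptions); note q must be positive wherever p is:

```r
# KL divergence D(p || q) for probability vectors over the same categories
kl_div <- function(p, q) sum(p[p > 0] * log(p[p > 0] / q[p > 0]))

a <- as.factor(c(1, 1, 2, 2))        # p = (0.50, 0.50)
b <- as.factor(c(1, 1, 1, 2))        # q = (0.75, 0.25)
p <- as.numeric(table(a) / length(a))
q <- as.numeric(table(b) / length(b))
kl_div(p, q)   # 0.5*log(0.5/0.75) + 0.5*log(0.5/0.25) ~ 0.144
```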
Mutual Information
Description
Estimates the Kullback-Leibler divergence between the joint distribution and the product of the two respective marginal distributions. Roughly speaking, it measures the amount of information one variable provides about another.
Usage
mi(x, y)
Arguments
x, y      numeric or discrete data vectors
Examples
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mi(a, b)
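For discrete data this reduces to the entropy identity H(x) + H(y) - H(x, y). A sketch (natural log; numeric inputs would be binned first):

```r
# Mutual information from marginal and joint entropies (discrete data)
H <- function(p) -sum(p[p > 0] * log(p[p > 0]))
mi_disc <- function(x, y) {
  n <- length(x)
  H(table(x) / n) + H(table(y) / n) - H(table(x, y) / n)
}

a <- as.factor(c(1, 1, 2, 2))
b <- as.factor(c(1, 1, 1, 2))
mi_disc(a, b)   # shared information between a and b, in nats
mi_disc(a, a)   # self-information equals H(a) = log(2)
```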
Maximal Information Coefficient
Description
Information-theoretic approach for detecting non-linear pairwise dependencies. Employs heuristic discretization to achieve highest normalized mutual information.
Usage
mic(x, y)
Arguments
x, y      numeric or discrete data vectors
References
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. Detecting novel associations in large data sets. Science. 2011. 334(6062):1518-1524.
Examples
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
Permutation Test
Description
Provides nonparametric statistical significance via sample randomization.
Usage
perm(x, y, func, reps)
Arguments
x, y      numeric vectors of data values
func      the mlf function to permute (e.g., mic)
reps      (optional) number of resamples. Defaults to 500.
Examples
# Sample data
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
mlf::mic(a, b)
mlf::perm(a, b, mic)
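The randomization scheme can be sketched in base R; `cor` stands in for an mlf statistic, and the package's internal details may differ:

```r
# Permutation test: shuffle y to break any pairing with x, rebuild the null
# distribution of the statistic, and compare the observed value against it.
perm_test <- function(x, y, func, reps = 500) {
  observed   <- func(x, y)
  null_stats <- replicate(reps, func(x, sample(y)))
  mean(abs(null_stats) >= abs(observed))   # two-sided p-value
}

set.seed(7)
a <- rnorm(25, 80, 35)
b <- rnorm(25, 100, 50)
perm_test(a, b, cor)          # a and b are independent, so likely non-significant
perm_test(a, 2 * a + 1, cor)  # perfect linear association, so near 0
```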