Author: Martin Morgan
Date: 22 July, 2019
1 + 2
## [1] 3
x = c(1, 2, 3)
1:3             # sequence of integers from 1 to 3
## [1] 1 2 3
x + c(4, 5, 6)  # vectorized
## [1] 5 7 9
x + 4           # recycling
## [1] 5 6 7

Vectors
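Here is a short sketch of the vector types, missing values and factors summarized in this section. The variable names (`v_num`, `v_int`, `f`) are illustrative only. Each constructor creates a vector of its type filled with a default value. `c()` combines values, `NA` marks a missing value, and `factor()` records the allowed levels.

```r
# Constructors create typed vectors of a given length, filled with defaults
numeric(3)             # 0 0 0
character(2)           # "" ""
logical(2)             # FALSE FALSE
integer(2)             # 0 0 (stored as integers)

# c() combines values; class() reports the vector type
v_num <- c(1.5, 2, NA) # NA: a missing ('not available') value
class(v_num)           # "numeric"
v_int <- 1:3
class(v_int)           # "integer"

# factor(): values drawn from a restricted set of levels
f <- factor(c("low", "high", "high"), levels = c("low", "high"))
levels(f)              # "low" "high"
table(f)               # low: 1, high: 2
```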
numeric(), character(), logical(), integer(), complex(), …
NA: ‘not available’
factor(): values from a restricted set of ‘levels’

Operations
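Here is a minimal sketch of the operations listed in this section, applied to small vectors. The names `v` and `y` are illustrative only. Comparison and logical operators work element-wise and return logical vectors. `[` extracts elements, `[<-` replaces them, and `is.na()` flags missing values.

```r
v <- c(1, 2, 3)

# Comparison operators are vectorized and return logical vectors
v > 1                  # FALSE TRUE TRUE
v == 2                 # FALSE TRUE FALSE

# Logical operators combine logical vectors element-wise
(v > 1) & (v < 3)      # FALSE TRUE FALSE
(v < 2) | (v > 2)      # TRUE FALSE TRUE
!(v > 1)               # TRUE FALSE FALSE

# `[` extracts elements; `[<-` replaces them
v[c(2, 3)]             # 2 3
v[c(1, 3)] <- c(10, 30)
v                      # 10 2 30

# is.na() tests for missing values
y <- c(1, NA, 3)
is.na(y)               # FALSE TRUE FALSE
y[!is.na(y)]           # 1 3
```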
==, <, <=, >, >=, …
| (or), & (and), ! (not)
[, e.g., x[c(2, 3)]
[<-, e.g., x[c(1, 3)] = x[c(1, 3)]
is.na()

Functions
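Functions take arguments by position or by name. `rnorm()`, used below, draws random numbers, so your values will differ from the output printed in this tutorial. Calling `set.seed()` first makes a run reproducible. The sketch below assumes an arbitrary seed of 123. The `standardize()` helper is hypothetical and only shows how to define a function.

```r
# Arguments match by name (in any order) or by position
set.seed(123)                         # arbitrary seed, for reproducibility
a <- rnorm(5, mean = 10, sd = 2)
set.seed(123)
b <- rnorm(n = 5, sd = 2, mean = 10)
identical(a, b)                       # TRUE: same seed, same arguments

# Unsupplied arguments take their defaults (rnorm: mean = 0, sd = 1)
args(rnorm)

# A user-defined function returns its last evaluated expression
standardize <- function(v) (v - mean(v)) / sd(v)   # hypothetical helper
z <- standardize(c(1, 2, 3))
z                                     # -1 0 1
```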
x = rnorm(100)
y = x + rnorm(100)
plot(x, y)

data.frame
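A data.frame is a list of equal-length column vectors. It therefore supports both matrix-style `[row, column]` indexing and list-style `[[` / `$` indexing. The sketch below uses a small hypothetical data frame, `toy`, rather than the `df` built in this section. It checks that the usual ways of extracting a column agree.

```r
toy <- data.frame(Independent = c(-1, 0.5, 2), Dependent = c(0.2, -0.3, 1.1))

dim(toy)                          # 3 rows, 2 columns
names(toy)                        # "Independent" "Dependent"

# Equivalent ways to extract the first column as a vector
col1 <- toy[, 1]
stopifnot(
    identical(col1, toy[, "Independent"]),
    identical(col1, toy[[1]]),
    identical(col1, toy[["Independent"]]),
    identical(col1, toy$Independent)
)

# Leaving the column index empty keeps all columns
toy[1:2, ]
```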
df <- data.frame(Independent = x, Dependent = y)
head(df)
##   Independent  Dependent
## 1   0.8658385  0.8357491
## 2  -1.2530897 -2.3004453
## 3   0.6287058  1.8726218
## 4  -0.4357103 -1.7256617
## 5  -0.9183898 -0.8309443
## 6  -0.1622652 -1.0660857
df[1:5, 1:2]
##   Independent  Dependent
## 1   0.8658385  0.8357491
## 2  -1.2530897 -2.3004453
## 3   0.6287058  1.8726218
## 4  -0.4357103 -1.7256617
## 5  -0.9183898 -0.8309443
df[1:5, ]
##   Independent  Dependent
## 1   0.8658385  0.8357491
## 2  -1.2530897 -2.3004453
## 3   0.6287058  1.8726218
## 4  -0.4357103 -1.7256617
## 5  -0.9183898 -0.8309443
plot(Dependent ~ Independent, df)  # 'formula' interface

df[, 1], df[, "Independent"], df[[1]],
df[["Independent"]], df$Independent

Exercise: plot only values with Dependent > 0, Independent > 0
Select rows
ridx <- (df$Dependent > 0) & (df$Independent > 0)

Plot subset
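The `ridx` vector computed above holds one TRUE or FALSE per row of `df`. Indexing with `df[ridx, ]` keeps the rows where it is TRUE. Here is a sketch on a small hypothetical data frame, `toy`, whose values are chosen by hand:

```r
toy <- data.frame(Independent = c(-1, 0.5, 2, 1), Dependent = c(0.2, -0.3, 1.1, 0.4))

# One TRUE/FALSE per row; & requires both conditions to hold
keep <- (toy$Dependent > 0) & (toy$Independent > 0)
keep                     # FALSE FALSE TRUE TRUE
sum(keep)                # 2 rows satisfy both conditions

toy[keep, ]              # rows 3 and 4, all columns
```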
plot(Dependent ~ Independent, df[ridx, ])

Skin the cat another way
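Base R's `subset()` function is a further option that the original does not use. It evaluates the condition with the data frame's columns in scope and returns the matching rows as a new data frame. Here is a sketch with the same kind of hypothetical `toy` data:

```r
toy <- data.frame(Independent = c(-1, 0.5, 2, 1), Dependent = c(0.2, -0.3, 1.1, 0.4))

# Column names can be used directly inside the condition
pos <- subset(toy, (Dependent > 0) & (Independent > 0))
nrow(pos)                # 2

# The result could then be plotted with the formula interface, e.g.
# plot(Dependent ~ Independent, pos)
```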
plot(
    Dependent ~ Independent, df,
    subset = (Dependent > 0) & (Independent > 0)
)

fit <- lm(Dependent ~ Independent, df)  # linear model -- regression
anova(fit)                              # summary table
## Analysis of Variance Table
## 
## Response: Dependent
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## Independent  1 89.009  89.009  97.886 < 2.2e-16 ***
## Residuals   98 89.113   0.909                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(Dependent ~ Independent, df)
abline(fit)

lm(): plain-old function
fit: an object of class “lm”
anova(): a generic with a specific method for class “lm”

class(fit)
## [1] "lm"
methods(class="lm")
##  [1] add1           alias          anova          case.names    
##  [5] coerce         confint        cooks.distance deviance      
##  [9] dfbeta         dfbetas        drop1          dummy.coef    
## [13] effects        extractAIC     family         formula       
## [17] hatvalues      influence      initialize     kappa         
## [21] labels         logLik         model.frame    model.matrix  
## [25] nobs           plot           predict        print         
## [29] proj           qr             residuals      rstandard     
## [33] rstudent       show           simulate       slotsFromS3   
## [37] summary        variable.names vcov          
## see '?methods' for accessing help and source code

?"plot"          # plain-old-function or generic
?"plot.formula"  # method
?"plot.lm"       # method for object of class 'lm', plot(fit)

library(ggplot2)
ggplot(df, aes(x = Independent, y = Dependent)) +
    geom_point() + geom_smooth(method = "lm")

(Attach ggplot2 with library(ggplot2) once per session.)

Bioconductor started in 2002 as a platform for understanding the analysis of microarray data.
1,750 packages. Domains of expertise:
Important themes
Resources
A distinctive feature of Bioconductor is its use of objects to represent data.
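Bioconductor objects such as `DNAStringSet` are S4 classes, defined with the methods package that ships with R. Below is a minimal sketch of the idea. The `Sequences` class and `seqWidth()` generic are hypothetical and do not belong to any Bioconductor package. The class bundles data with a validity check, and the generic dispatches to a method written for that class.

```r
library(methods)

# Hypothetical class: a character vector of DNA sequences, validated on creation
setClass("Sequences", slots = c(seq = "character"),
    validity = function(object) {
        ok <- grepl("^[ACGT]*$", object@seq)
        if (all(ok)) TRUE else "sequences may contain only A, C, G, T"
    }
)

# Hypothetical generic, plus a method specialized for class 'Sequences'
setGeneric("seqWidth", function(x) standardGeneric("seqWidth"))
setMethod("seqWidth", "Sequences", function(x) nchar(x@seq))

s <- new("Sequences", seq = c("AACTCC", "CTGCA"))
seqWidth(s)              # 6 5

# Invalid data is rejected when the object is created
bad <- tryCatch(new("Sequences", seq = "AXC"), error = function(e) NULL)
is.null(bad)             # TRUE
```

The validity function runs when `new()` is given slot values, so malformed data never becomes a `Sequences` object. This mirrors, in simplified form, why Bioconductor favors classes over plain vectors.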
library(Biostrings)
dna <- DNAStringSet(c("AACTCC", "CTGCA"))
dna
##   A DNAStringSet instance of length 2
##     width seq
## [1]     6 AACTCC
## [2]     5 CTGCA
reverseComplement(dna)
##   A DNAStringSet instance of length 2
##     width seq
## [1]     6 GGAGTT
## [2]     5 TGCAG

Web site, https://bioconductor.org
1750 ‘software’ packages, https://bioconductor.org/packages
Discovery and use, e.g., DESeq2
Also:
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19
## 
## Matrix products: default
## BLAS:   /home/msmith/Applications/R/R-3.6.0/lib/libRblas.so
## LAPACK: /home/msmith/Applications/R/R-3.6.0/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] Biostrings_2.52.0   XVector_0.24.0      IRanges_2.18.1     
## [4] S4Vectors_0.22.0    BiocGenerics_0.30.0 ggplot2_3.2.0      
## [7] BiocStyle_2.12.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         pillar_1.4.2       compiler_3.6.0    
##  [4] BiocManager_1.30.4 zlibbioc_1.30.0    tools_3.6.0       
##  [7] digest_0.6.20      evaluate_0.13      tibble_2.1.3      
## [10] gtable_0.3.0       pkgconfig_2.0.2    rlang_0.4.0       
## [13] yaml_2.2.0         xfun_0.7           withr_2.1.2       
## [16] stringr_1.4.0      dplyr_0.8.1        knitr_1.23        
## [19] grid_3.6.0         tidyselect_0.2.5   glue_1.3.1        
## [22] R6_2.4.0           rmarkdown_1.12     bookdown_0.10     
## [25] purrr_0.3.2        magrittr_1.5       scales_1.0.0      
## [28] codetools_0.2-16   htmltools_0.3.6    assertthat_0.2.1  
## [31] colorspace_1.4-1   labeling_0.3       stringi_1.4.3     
## [34] lazyeval_0.2.2     munsell_0.5.0      crayon_1.3.4

Research reported in this tutorial was supported by the National Human Genome Research Institute and the National Cancer Institute of the National Institutes of Health under award numbers U41HG004059 and U24CA180996.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 633974).