---
title: "Importing Genetic Data into Coala"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Importing Genetic Data into Coala}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

While `coala` primarily focuses on simulating data, it can also calculate
summary statistics from real data. This is particularly useful when comparing
the statistics of real and simulated data.

Rather than offering functions to import data directly, coala can convert
the internal formats of other R packages into its own format. Currently,
the `PopGenome` package is supported, but we plan to support `ape` and `pegas`
in the future.

Importing Data from PopGenome
-----------------------------

`PopGenome` provides functions for reading various data formats, including
`vcf` and `fasta`. Please refer to its documentation for detailed instructions.
As an example, we will read sequence data from a short fasta file that is
included in coala:

```{r import_data, results="hide", eval=FALSE}
suppressPackageStartupMessages(library(PopGenome))

# Directory with the example fasta files shipped with coala
fasta <- system.file("example_fasta_files", package = "coala")
data_pg <- readData(fasta, progress_bar_switch = FALSE)

# Mark the two outgroup sequences
data_pg <- set.outgroup(data_pg, c("Individual_Out-1", "Individual_Out-2"))

# Assign five sequences to each of the two populations
individuals <- list(paste0("Individual_1-", 1:5), paste0("Individual_2-", 1:5))
data_pg <- set.populations(data_pg, individuals)
```

Here, the sequences originate from two populations and an outgroup. The
outgroup is required for most summary statistics.

We can now convert `data_pg` using the `as.segsites` function:

```{r, eval=FALSE}
library(coala)
segsites <- as.segsites(data_pg)
```

Next, we calculate summary statistics using `calc_sumstats_from_data`:

```{r calc_sumstats_1, eval=FALSE}
# Sample sizes: 5 + 5 individuals plus an outgroup of 2 (population 3);
# one locus of length 25
model <- coal_model(c(5, 5, 2), 1, 25) +
  feat_mutation(5) +
  feat_outgroup(3) +
  sumstat_sfs(population = 1)
stats <- calc_sumstats_from_data(model, segsites)
stats$sfs
```

Alternatively, you can pass the `data_pg` object directly to
`calc_sumstats_from_data`:

```{r calc_sumstats_2, eval=FALSE}
stats <- calc_sumstats_from_data(model, data_pg)
stats$sfs
```

Please refer to the help pages for `as.segsites` and `calc_sumstats_from_data`
for additional information.
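
To make the statistic requested by `sumstat_sfs` above more concrete, the following base R sketch counts a site frequency spectrum by hand. It is only an illustration under stated assumptions, not coala's implementation: the matrix `snps` is made-up toy data, and the names `snps`, `derived_counts` and `toy_sfs` are hypothetical. It assumes that each site has already been polarised with the help of an outgroup, so that `0` is the ancestral and `1` the derived allele.

```{r toy_sfs, eval=FALSE}
# Toy segregating sites: rows are sampled sequences, columns are sites
# (assumed coding: 0 = ancestral allele, 1 = derived allele)
snps <- matrix(c(1, 0, 0, 1,
                 1, 0, 1, 1,
                 0, 0, 0, 1,
                 0, 1, 0, 1,
                 0, 0, 0, 0),
               nrow = 5, byrow = TRUE)

# Number of derived copies observed at each site
derived_counts <- colSums(snps)

# Unfolded SFS: how many sites carry 1, 2, ..., n - 1 derived copies
n <- nrow(snps)
toy_sfs <- tabulate(derived_counts, nbins = n - 1)
toy_sfs
# 2 1 0 1
```

In this toy data, two sites are singletons, one site has two derived copies, and one has four. Without an outgroup, ancestral and derived alleles cannot be told apart in this way.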