--- title: "Tutorial of R package gpyramid" output: rmarkdown::html_vignette author: Shoji Taniguchi vignette: > %\VignetteIndexEntry{Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## 1. Set up ```{r setup} library(gpyramid) library(ape) library(dplyr) ``` ## 2. Prepare data ### 2.1 Gene data ```{r r1} line_df <- data.frame(line = c("x1", "x2", "x3", "x4", "x5", "x6"), gene1 = c("H", "H", "A", "B", "B", "B"), gene2 = c("H", "H", "A", "B", "B", "A"), gene3 = c("H", "H", "B", "A", "A", "A"), gene4 = c("H", "B", "B", "B", "A", "B"), gene5 = c("H", "H", "B", "A", "B", "B"), gene6 = c("B", "A", "B", "A", "B", "B"), gene7 = c("B", "B", "B", "B", "B", "A")) line_df ``` ### 2.2 Position data ```{r} position_df <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6", "gene7"), Chr = c("1", "2", "3", "4", "5", "6", "7"), cM = c(20, 0, 40, 20, 10, 0, 0)) position_df ``` ### 2.3 Preprosessing #### Generate haplotype dataframe from row data ```{r r3} gene_dat <- util_haplo(line_df, target = "A", non_target = "B", hetero = "H", line_cul = "line") gene_df1 <- gene_dat[[1]] gene_df2 <- gene_dat[[2]] line_id <- gene_dat[[3]] colnames(gene_df1) <- line_id colnames(gene_df2) <- line_id gene_df1 gene_df2 ``` #### Generate recombination probability matrix from raw data ```{r} recom_mat <- util_recom_mat(position_df, "cM") recom_mat ``` ## 3. Find parent sets from candidate lines (cultivars) From candidate lines, `findPset` function returns the parent sets for gene pyramidding. In this example, there are 4 sets for gene pyramidding. ```{r} line_comb_lis <- findPset(gene_df1, gene_df2, line_id) line_comb_lis ``` ## 4. Calculate the number of necessary individuals and generations ### 4.1 Calculate cost of all the crossing schemes `calcCostAll` function calculates the number of necessary individuals and generations as the crossing cost for all the crossing schemes. Given parent sets for gene pyramidding, `calCostAll` function simulates all the crossing schemes and calculates the number of necessary individuals and generations as the cost of gene pyramidding. `calcCostAll` function returns the `gpyramid_all` object, which contains information of all the crossing schemes. Here, `getFromAll` function get one crossing scheme from `gpyramid_all` object. ```{r} cost_rslt <- calcCostAll(line_comb_lis, gene_df1, gene_df2, recom_mat, prob_total = 0.99) cost_rslt ``` ### 4.2 Plot cost of each crossing scheme The output of `calCostAll` function contains the cost of each crossing scheme. Here, we generate a plot of the number of necessary plants and generations. ```{r} cost_all <- cost_rslt$cost_all plot(x = cost_all$n_parent, y = cost_all$N_total, log = "y", xlab = "Number of generations", ylab = "Number of individuals") ``` ### 4.3 Select the most cost-effective crossing strategy ```{r} cost_all[cost_all$N_total < 200,] ``` ```{r} rslt_one <- getFromAll(cost_rslt, cross_id = 13) summary(rslt_one) plot(rslt_one$topolo) nodelabels() ``` ### 4.4 Another example ```{r} rslt_one <- getFromAll(cost_rslt, cross_id = 7) summary(rslt_one) plot(rslt_one$topolo) nodelabels() ```