---
title: "The Linear Phenotypic Selection Index Theory"
author: "Zankrut Goyani"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Linear Phenotypic Selection Index Theory}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

In plant and animal breeding, quantitative traits (QTs) are the expression of genes distributed across the genome interacting with the environment. The phenotypic value of a QT ($y$) can be partitioned into a genotypic component ($g$) and an environmental component ($e$):

$$ y = g + e $$

The primary goal in breeding is to maximize an individual's **net genetic merit**. The net genetic merit ($H$) is a linear combination of the unobservable true breeding values ($\mathbf{g}$) weighted by their respective economic values ($\mathbf{w}$):

$$ H = {\mathbf{w}}^{\prime}\mathbf{g} $$

Because the net genetic merit cannot be observed in field trials, breeders construct a **Linear Phenotypic Selection Index (LPSI)** to predict it. The LPSI ($I$) is a linear combination of the observable phenotypic trait values ($\mathbf{y}$) weighted by index coefficients ($\mathbf{b}$):

$$ I = {\mathbf{b}}^{\prime}\mathbf{y} $$

The objective of the LPSI is to predict the net genetic merit and maximize the multi-trait selection response.

## Optimizing the LPSI

To identify the best parents for the next selection cycle, the correlation between the net genetic merit ($H$) and the LPSI ($I$) must be maximized. The vector $\mathbf{b}$ that minimizes the mean squared difference between $I$ and $H$, and at the same time maximizes this correlation, is:

$$ \mathbf{b} = {\mathbf{P}}^{-1}\mathbf{Gw} $$

where:

* $\mathbf{P}$ is the phenotypic variance-covariance matrix.
* $\mathbf{G}$ is the genotypic variance-covariance matrix.
* $\mathbf{w}$ is the vector of economic weights defining relative trait importance.

Once these optimal coefficients are derived, we can evaluate two fundamental parameters:

1. **The Maximized Selection Response ($R_I$)**: The expected mean improvement in the net genetic merit due to indirect selection on the index.
   $$ {R}_I = {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} $$
2. **The Expected Genetic Gain Per Trait ($\mathbf{E}$)**: The multi-trait selection response broken down per individual trait.
   $$ \mathbf{E} = {k}_I\frac{\mathbf{Gb}}{\sigma_I} $$

where $k_I$ is the standardized selection intensity and $\sigma_I = \sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}$ is the standard deviation of the index.

## Practical Implementation in R

The theory above can be put into practice with the `selection.index` package. The examples use its built-in synthetic datasets: `maize_pheno` (multi-environment phenotypic records for 100 genotypes) and `maize_geno` (500 SNP markers).

### 1. Estimating Covariance Matrices

First, we estimate the genotypic ($\mathbf{G}$) and phenotypic ($\mathbf{P}$) variance-covariance matrices from the raw phenotypic dataset.

```{r matrices}
library(selection.index)

# Load the synthetic phenotypic multi-environment dataset
data("maize_pheno")

# In maize_pheno: Traits are columns 4:6.
# Genotypes are in column 1, and Block/Replication is in column 3.
gmat <- gen_varcov(data = maize_pheno[, 4:6], genotypes = maize_pheno[, 1], replication = maize_pheno[, 3])
pmat <- phen_varcov(data = maize_pheno[, 4:6], genotypes = maize_pheno[, 1], replication = maize_pheno[, 3])
```

### 2. Defining Economic Weights

Next, we set the relative economic priority of each trait. The economic weights ($\mathbf{w}$) encode the breeding objective: a positive weight rewards higher values of a trait, and a negative weight rewards lower values.

```{r weights}
# Define the economic weights for the 3 continuous traits
# (e.g., Yield, PlantHeight, DaysToMaturity)
weights <- c(10, -5, -5)
```

### 3. Calculating the LPSI

With the covariance matrices and economic weights specified, we pass them to the primary `lpsi()` function, which evaluates the selection index for the requested trait combinations.

```{r lpsi}
# Calculate the Optimal Combinatorial Linear Phenotypic Selection Index (LPSI)
index_results <- lpsi(
  ncomb = 3,
  pmat = pmat,
  gmat = gmat,
  wmat = as.matrix(weights),
  wcol = 1
)
```

### 4. Evaluating Outcomes and Selecting Genotypes

Finally, we evaluate the theoretical gains. The `lpsi()` function returns a data frame containing the theoretical selection response ($R_I$) and other parameter estimates for all requested trait combinations.

```{r gains}
# View the top combinatorial indices, including their selection response (R_A)
head(index_results)

# Extract the phenotypic selection scores to rank the parental candidates
# using the top evaluated combinatorial index
scores <- predict_selection_score(
  index_results,
  data = maize_pheno[, 4:6],
  genotypes = maize_pheno[, 1]
)

# View the top performing candidates designated for the next breeding cycle
head(scores)
```

### 5. Extension: Linear Marker Selection Index

Classical linear selection index theory extends to marker-assisted genomic selection. If you have genome-wide marker profiles for your genotypes, you can incorporate them to compute the Linear Marker Selection Index (LMSI).

```{r marker_data, eval=FALSE}
# Load the associated synthetic genomic dataset (500 SNPs for the 100 genotypes)
data("maize_geno")

# Calculate the marker-assisted index combining our matrices and raw SNP profiles
marker_index_results <- lmsi(
  pmat = pmat,
  gmat = gmat,
  marker_scores = maize_geno,
  wmat = weights
)

summary(marker_index_results)
```

### 6. The Base Index and Index Efficiency

When the phenotypic ($\mathbf{P}$) and genotypic ($\mathbf{G}$) matrices are poorly estimated (e.g., due to limited data), the estimated index coefficients ($\mathbf{b}$) can be biased. The **Base Index** provides a robust, non-optimized alternative in which the coefficients are set equal to the economic weights ($I_B = \mathbf{w}'\mathbf{y}$).

```{r base_index}
# Calculate the Base Index and automatically compare its efficiency to the LPSI
base_results <- base_index(
  pmat = pmat,
  gmat = gmat,
  wmat = weights,
  compare_to_lpsi = TRUE
)

# Observe the expected genetic gains and efficiency comparison
base_results$summary
```

### 7. Heritability of the LPSI

The theory shows that the squared correlation between the net genetic merit ($H$) and the index ($I$) is, in general, not equal to the heritability of the index ($h^2_I \neq \rho^2_{HI}$). The `lpsi()` function estimates both statistics:

```{r heritability}
# Extract the top combinatorial index results
top_index <- index_results[1, ]

# h^2_I: Heritability of the optimal index
top_index$hI2

# \rho_HI: Correlation between the LPSI and the true underlying Net Genetic Merit
top_index$rHI
```
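
As a cross-check on the formulas used throughout this vignette, the following base R sketch computes the index coefficients, the selection response, the per-trait gains, the index heritability, and $\rho_{HI}$ directly from a pair of matrices. The matrices `P_demo` and `G_demo`, the weights `w_demo`, and the selection intensity `k_I = 2.063` (roughly the top 5% selected) are made-up illustrative values, not estimates from `maize_pheno`, and the code does not reproduce the internals of `lpsi()`. It assumes the standard expressions $h^2_I = \mathbf{b}'\mathbf{Gb} / \mathbf{b}'\mathbf{Pb}$ and $\rho_{HI} = \mathbf{w}'\mathbf{Gb} / \sqrt{\mathbf{w}'\mathbf{Gw}\,\mathbf{b}'\mathbf{Pb}}$.

```{r lpsi_by_hand}
# Illustrative (made-up) covariance matrices for three traits; not package output
P_demo <- matrix(c(10,   2,   1,
                    2,   8,   1.5,
                    1,   1.5, 6), nrow = 3, byrow = TRUE)
G_demo <- matrix(c(4,   1,   0.5,
                   1,   3,   0.6,
                   0.5, 0.6, 2), nrow = 3, byrow = TRUE)
w_demo <- c(10, -5, -5)  # economic weights (same values as in section 2)
k_I <- 2.063             # assumed selection intensity (about 5% selected)

# Index coefficients: b = P^{-1} G w (solve() avoids forming the inverse explicitly)
b <- solve(P_demo, G_demo %*% w_demo)

# Standard deviation of the index: sigma_I = sqrt(b' P b)
sigma_I <- sqrt(drop(t(b) %*% P_demo %*% b))

# Selection response and expected genetic gain per trait
R_I <- k_I * sigma_I
E <- k_I * drop(G_demo %*% b) / sigma_I

# Heritability of the index and its correlation with the net genetic merit
h2_I <- drop(t(b) %*% G_demo %*% b) / sigma_I^2
sigma_H <- sqrt(drop(t(w_demo) %*% G_demo %*% w_demo))
rho_HI <- drop(t(w_demo) %*% G_demo %*% b) / (sigma_H * sigma_I)

c(R_I = R_I, h2_I = h2_I, rho_HI = rho_HI, rho_HI_squared = rho_HI^2)
E
```

With the optimal $\mathbf{b}$, $\mathbf{w}'\mathbf{Gb} = \mathbf{b}'\mathbf{Pb}$, so $\rho_{HI}$ reduces to $\sigma_I / \sigma_H$ and the weighted sum of per-trait gains, $\mathbf{w}'\mathbf{E}$, equals $R_I$. Both identities are convenient sanity checks when comparing hand calculations against package output.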