%\VignetteIndexEntry{polySegratio} %\VignetteDepends{polySegratio} %\VignetteKeywords{polyploids,segregation ratio} %\VignettePackage{polySegratio} \documentclass[a4paper]{article} %\VignetteIndexEntry{polySegratio overview} %\VignettePackage{polySegratio} \usepackage{fancyhdr} \pagestyle{fancy} %%%\usepackage{palatino} \usepackage[T1]{fontenc} \usepackage[sc]{mathpazo} \linespread{1.05} % Palatino needs more leading (space between lines) \usepackage[round]{natbib} \usepackage{graphicx} \usepackage{url} \renewcommand{\sectionmark}[1]{\markright{\thesection.\#1}} \fancyhead{} \lhead{polySegratio} %%%\rhead{\date{\today}} \rhead{\date{January 9, 2008}} \cfoot{R library for autopolyploid segregation analysis} \lfoot{} \rfoot{\thepage} \renewcommand{\headrulewidth}{0.4pt} \renewcommand{\footrulewidth}{0.4pt} \title{polySegratio: An R library for autopolyploid segregation analysis} \author{Peter Baker} \date{January 9, 2008} \begin{document} \maketitle \SweaveOpts{prefix.string=tmp/tmp} It is well known that the dosage level of markers in autopolyploids and allopolyploids can be characterised by their observed segregation ratios. On the other hand, contrary to methods employed in several studies, segregation ratios are not a good indicator of polyploid type \citep{qu02b}. The \texttt{polySegratio} package provides standard approaches to assess marker dosage in autopolyploids although the functions could equally well be applied to allopolyploids with specified expected segregation ratios. In addition, simulated sets of markers may be generated with specified dosages, ploidy and levels of oversidpersion. To use the library, you need to attach it with <<>>= library(polySegratio) @ <>= op <- options() options(width=70, digits=4) @ \section{Expected segregation ratios} \label{sec:expected-segratios} \cite{haldane30} outlined the derivation of the expected numbers and ratios of offspring for various parental configurations of autopolyploids. Expected gametic series for polyploids of various sizes were produced, along with expected ratios of gametic series for crosses and selfing and the equilibrium distribution under random mating. \citeauthor{haldane30} provides expected gametic series when one parent is nulliplex for polyploids up to order 16 (heccaidecaploid). \begin{table}[htbp] \begin{center} \caption {The gametic segregation in an autooctaploid of a heterozygous cross $(A^sa^{8-s}, s=1\ldots7)$ with a nulliplex $(a^8)$ assuming bivalent pairing and no double reduction. The ratio is of dominants to recessives and $\omega_k$ is the proportion of dominants.} \begin{tabular}{|l|ccccc|r@{:}lc|} \hline Heterozygous & \multicolumn{5}{c|}{Gametes} & \multicolumn{3}{c|}{Segregation Ratio}\\ Parent & $A^4$ & $A^3a$ & $A^2a^2$ & $Aa^3$ & $a^4$ & $A^sa^{8-s}$ & $a^8$ & $\omega_k$ \\ \hline $Aa^7$ & & & & 1 & 1 & 1&1 & 0.500 \\ $A^2a^6$ & & & 3 & 8 & 3 & 11&3 & 0.786 \\ $A^3a^5$ & & 1& 6& 6& 1 & 13&1 & 0.929 \\ $A^4a^4$ & 1 & 16 & 36 & 16 & 1 & 69&1 & 0.986 \\ $A^5a^3$ & 1 & 6 & 6 & 1 & & \multicolumn{2}{c}{ }&\\ $A^6a^2$ & 3 & 8 & 3 & & & \multicolumn{2}{c}{ }& \\ $A^7a$ & 1 & 1 & & & & \multicolumn{2}{c}{ }& \\ \hline \end{tabular} \label{tab:seg-ratio-1} \end{center} \end{table} For an autooctaploid with bivalent pairing and in the absence of double reduction \footnote{Double reduction: if separation for any locus is equational the two chromatids from one chromosome may be present together in one interphase nucleus but joined to separate centromeres allowing them to enter the same gamete. Sister chromatids in the same gamete, reducing the genetic content of a gamete twice, instead of once. Normally, two of the four chromosomes end up together in a gamete, reducing the genetic content in half. With double reduction gametes, the two chromosomes in the gamete are the same, at least at some loci; i.e., they are sister chromatids, and genetic content is reduced to 1/4 when compared to the parental plant. See \cite{mather36}} with $A$ being the dominant allele and $a$ the recessive, then the expected gametic series formed are outlined in Table~\ref{tab:seg-ratio-1}. Employing the notation that $A^s$ represents $s$ copies of allele $A$, then if a heterozygous parent $A^ra^{8-r}$ is crossed with a recessive nulliplex ($a^8$) octaploid then the results of crossing can be calculated by symbolic manipulation. For instance, if a parent with a single dose marker $Aa^7$ is crossed with a nulliplex parent $a^8$ then $Aa^7 \times a^8$ yields $(1.Aa^3 + 1.a^4) \times (a^4)$ or zygotes $(1.Aa^7 + 1.a^8)$ with ratios $1.Aa^7 : 1.a^8$. Although published previously in slightly different forms, the general formula of \cite{ripol99} is employed for $p(k)$ or the expected segregation proportion given dosage $k$ which is \begin{equation} \label{eq:ripol1} p(k| m, x) = 1 - {{m-k \choose mx} \over {m \choose mx}} , k=0 \ldots m/2 \end{equation} where $m$ is the ploidy level or number of homologous chromosomes and the monoploid number $x$ is the number of chromosomes in a basic set. Note that for diploids $m=2$, tetraploids $m=4$ , octaploids then $m=8$ and so on. To obtain such theoretical segregation proportions or probabilities using \texttt{expected.segRatio} is straightforward by specifying the ploidy level either numerically or by name. The function \texttt{expected.segRatio} employs Equations~\ref{eq:ripol1} and \ref{eq:homog} to compute expected segregation proportions. For instance <<>>= ## obtain expected segregation ratios ## default is one nulliplex parent so type.parents="heterogeneous" print(unlist(expected.segRatio(2))) print(unlist(expected.segRatio("Tetraploid"))) print(expected.segRatio("Octa")$ratio) @ In the case where, an AFLP band is present in both parents but not in all offspring, there must be less than four copies of the dominant allele in both parents. For instance, crossing the two genetically similar autooctoploid lines $Aa^7$ results in 1 nulliplex in 4 since $(1.Aa^3 + 1.a^4)^2$ is simply $(1.A^2a^6 + 2.Aa^7 + 1.a^8)$. For alternate autooctoploid parental configurations result in segregation proportions of around 0.9 or above and would apparently therefore be indistinguishable via segregation ratios alone. Similarly to Equation~\ref{eq:ripol1} we deduce that if both parents contain at least one copy of the dominant marker than a general equation for then for the dosage $j$ in the first parent and dosage $k$ in the second parent then the expected segregation proportion $p(j,k)$ is \begin{equation} \label{eq:homog} p(j, k | m, x) = 1 - { {m-k \choose mx} {m-j \choose mx} \over {m \choose mx}^2 }, j,k=0 \ldots m/2 \end{equation} where $m$ and $x$ are defined in Equation~\ref{eq:ripol1}, noting that neither parent is nulliplex. Such segregation ratios may be computed using \texttt{expected.segRatio} as follows: <<>>= ## obtain expected segregation ratios with type.parents="homozygous" print(unlist(expected.segRatio("tetra",type="homoz"))) print(expected.segRatio("Octa",type="homoz")$ratio) @ Note that Equations~\ref{eq:ripol1} and \ref{eq:homog} are defined for $m$ even but that a warning is issued and results still calculated if $m$ is odd. As an example <<>>= ## obtain expected segregation ratios with odd ploidy level a <- expected.segRatio(9) print(a$ratio) @ \section{Simulating a set of markers} \label{sec:simulate} Functions \texttt{sim.autoMarkers} and \texttt{sim.autoCross} may be used to simulate marker data for a collection of markers where either one of the parents is nulliplex or where both parents contain at least one dose of a marker. The data are only simulated to produce appropriate segregation ratios but other genetic parameters such as recombination, degree of preferential pairing or a genetic map are not considered. The proportions in each marker dosage need to be specified. \texttt{sim.autoMarkers} may be used to simulate dominant markers from an autopolyploid cross given the ploidy level, specified parental marker alleles, the expected segregation ratios and the proportions in each dosage marker class. The ploidy level may be chosen from tetraploid to heccaidecaploid and the segregation ratios may be specified explicitly or generated automatically. \texttt{sim.autoCross} is a wrapper to \texttt{sim.autoMarkers} which is used to generate markers for parents with markers that are 10, 01 or 11. The proportions of markers for each of these three parental types must be specified. Both functions return S3 class objects (class \texttt{simAutoCross} and class \texttt{simAutoMarkers}) which have associated print and plot methods. For instance, to generate and plot the segregation proportions for 200 markers for 100 progeny from a tetraploid cross where one of the parents is nulliplex and there are 70\% single dose markers and 30\% dose markers then use <<>>= mark.sim4 <- sim.autoMarkers(4, dose.proportion=c(0.7,0.3), n.markers=200, n.individuals = 200) print(mark.sim4) @ \begin{figure} [ht] \begin{center} <>= plot(mark.sim4) @ \caption{Segregation ratios from simulated marker data for 200 markers for a autotetraploid cross with 100 offspring} \label{fig:sim1} \end{center} \end{figure} Figure~\ref{fig:sim1} shows a histogram of segregation proportions for a tetraploid cross. Other plots, may be produced. For instance, the number of missing values is useful when looking at real data to determine if some markers are not well measured (See Figure~\ref{fig:sim2}). Often in molecular marker studies, a small percentage of markers may be missing or misclassified. The functions \texttt{addMissing} and \texttt{addMisclass} allow marker data to be modified accordingly. The rate may be specified either as a proportion of missing at random or a proportion of columns and rows with specified proportions of missings or misclassified. Not that if markers are randomly misclassified then the expected segregations ratios are still the same and so we may not expect to see much difference to perfectly classified markers. \texttt{addMissing} adds missing data at random to objects of class \texttt{autoMarker} or \texttt{autoCross}. \texttt{addMisclass} misclassifies marker data in objects of class \texttt{autoMarker} or \texttt{autoCross} at a specified rate. Parental marker data may also be misclassified. An example might be <<>>= miss.sim4 <- addMisclass(mark.sim4, misclass = 0.1) miss.sim4 <- addMissing(miss.sim4, na.proportion = 0.2) print(miss.sim4, col=c(1:6)) @ \begin{figure} [ht] \begin{center} <>= plot(miss.sim4, type="all") @ \caption{Histograms of the number of markers labelled 1, numbers of missing values per marker and segregation ratios} \label{fig:sim2} \end{center} \end{figure} \subsection{Overdispersion} \label{sec:overdispersion} Since markers are correlated and may be subject to different types of measurement errors, then the segregation ratios may follow an overdispersed Binomial distribution. Such markers may be simulated with \texttt{sim.autoMarkers} by setting the parameter \texttt{overdispersion} to \texttt{TRUE}. The amount of overdispersion or extra--binomial variation may be specified by setting the \texttt{shape1} parameter. Larger values imply less overdispersion. Typically, the \texttt{R} command would be like \texttt{sim.autoMarkers(4,c(0.8,0.2), overdisp=TRUE, shape1=20)} Overdispersed marker data are simulated from the Beta--Binomial distribution where the Binomial proportion $p$ is generated from a Beta distribution. Note that if $p$ is generated from a $\beta(a,b)$ distribution, then $E(p)=a/(a+b)$ and Var$(p)=ab/((a+b)^2(a+b+1))$. Thus constraining $E(p)$ to be the appropriate segregation proportion and setting the first shape parameter $a$ implies that $b = a(1-p)/p$. Tetraploid marker data generated for a range of \texttt{shape1} or $a$ values is shown in Figure~\ref{fig:overdisp1}. \begin{figure} [htb] \begin{center} <>= op <- par(mfrow = c(2, 2)) cmain <- 1.7 plot(sim.autoMarkers(4,c(0.8,0.2)), main="No overdispersion", cex.main=cmain) plot(sim.autoMarkers(4,c(0.8,0.2), overdisp=TRUE), main="Shape1 = 50", cex.main=cmain) plot(sim.autoMarkers(4,c(0.8,0.2), overdisp=TRUE, shape1=15), main="Shape1 = 15", cex.main=cmain) plot(sim.autoMarkers(4,c(0.8,0.2), overdisp=TRUE, shape1=5), main="Shape1 = 5", cex.main=cmain) par(op) @ \caption{Histograms of the number of dominant markers simulated for 500 overdispersed markers from 200 autotetraploids. Data were generated from the Beta--Binomial distribution with a range of shape parameters. Overdispersion increases as \texttt{shape1} decreases.} \label{fig:overdisp1} \end{center} \end{figure} \section{Standard approaches for assessing marker dosage} \label{sec:tests} The most widely used test for assessing marker dosage is the standard $\chi^2$ test. Following \cite{mather51}, this test is often employed to compare the observed segregation ratio against its expected value. More recently, \cite{ripol99} proposed that the observed segregation proportion be compared to the appropriate Binomial confidence interval given the sample size and the expected segregation proportion. Both tests may be carried out by means of the function \texttt{test.segRatio}. Note that if the tests reveal that a marker may be more than one dosage then it is not allocated a marker dosage. \subsection{$\chi^2$ tests} \label{sec:chisq} The default method of assessing marker dosage in \texttt{test.segRatio} is the $\chi^2$ test. The function requires that the segregation proportions are given in the form of object of S3 class \texttt{segRatio}. These are automatically produced for simulated data created with functions \texttt{sim.autoMarkers} and \texttt{sim.autoCross} and may be calculated from observed marker data either manually or by applying \texttt{segregationRatios} to a matrix of observed marker data. For instance, to calculate $\chi^2$ tests and allocate dosage for an autooctoploid then <<>>= ## simulated data a <- sim.autoMarkers(ploidy = 8, c(0.7,0.2,0.09,0.01), n.markers=200, n.individuals=100) print(a) @ Note that \texttt{a} is an object of S3 class \texttt{simAutoMarkers} and that the segregation ratios may be obtained as the list component \texttt{seg.ratios}. Since \texttt{a} is simulated we can also extract the true dosage obtain the number of correctly classified markers. <<>>= ## summarise chi-squared test vs true ac <- test.segRatio(a$seg.ratios, ploidy=8, method="chi.squared") print(ac) print(addmargins(table(a$true.doses$dosage, ac$dosage, exclude=NULL))) @ Note that for segregation ratios near to one the $\chi^2$ approximation may not hold and so \texttt{R} will produce a warning. \subsection{Binomial confidence intervals} \label{sec:binomial} The Binomial confidence interval approach of \cite{ripol99} is obtained by setting the \texttt{method} parameter to ``\texttt{binomial}''. The $\alpha$ level may be set in either method by setting the parameter \texttt{alpha}. For instance, <<>>= ## summarise binomial CI vs true ab <- test.segRatio(a$seg.ratios, ploidy=8, method="bin", alpha=0.01) print(ab) print(addmargins(table(a$true.doses$dosage, ab$dosage, exclude=NULL))) @ \section{Utility functions} \label{sec:util} Several utility functions are included for use with real or simulated data. When marker data are stored in spreadsheets repetitive parts of marker names may be left blank or columns containing parts of names may need to be combined. To aid the process of constructing unique marker labels, \texttt{autoFill} automatically fills out blanks of a vector with the preceding label and \texttt{makeLabel} generates labels from two columns where blanks in first column are replaced by preceding non-blank label. <<>>= ## imaginary data frame representing ceq marker names read in from ## spreadsheet x <- data.frame( col1 = c("agc","","","","gct5","","ccc","",""), col2 = c(1,3,4,5,1,2,2,4,6)) print(x) print(makeLabel(x)) print(cbind(x,lab=makeLabel(x, sep="."))) @ \texttt{divide.autoMarkers} will split up a set of markers depending on the parental alleles. This is useful when extracting markers to be used in constructing a marker map for one parent say or in obtaining those markers present in both parents but segregating in the offspring. <<>>= p2 <- sim.autoCross(4, dose.proportion=list(p01=c(0.7,0.3),p10=c(0.7,0.3), p11=c(0.6,0.2,0.2))) print(p2, row=c(1:5)) ss <- divide.autoMarkers(p2$markers) print(ss, row=c(1:5)) @ \bibliographystyle{apalike} %%\bibliography{polyploid,qtl,Mybooks2-stat,thesis-eqtl,genomic,thesis}%,my_mcmc} % just include bibfile here \bibliography{polySeg} \subsection{Acknowledgments} \label{sec:acknowledgments} Karen Aitken, given her experience in tetraploids and sugarcane marker maps, has provided many valuable insights into marker dosage in autopolyploids. David Lovell, Andrew George and Phil Jackson provided useful comments and discussions. \end{document}