\name{read.snps.pedfile} \alias{read.snps.pedfile} \title{Read genotype data from a LINKAGE "pedfile"} \description{ This function reads data arranged as a LINKAGE "pedfile" with some restrictions and returns a list of three objects: a data frame containing the initial 6 fields giving pedigree structure, sex and disease status, a vector or a data frame containing snp assignment and possibly other snp information, and an object of class \code{"snp.matrix"} or \code{"X.snp.matrix"} containing the genotype data } \usage{ read.snps.pedfile(file, snp.names=NULL, assign=NULL, missing=NULL, X=FALSE, sep=".", low.mem = FALSE) } \arguments{ \item{file}{The file name for the input pedfile} \item{snp.names}{A character vector giving the SNP names. If an accompanying map file or an info file is present, it will be read and the information used for the SNP names, and also the information merged with the result. If absent, the SNPs will be named numerically ("1", "2", ...)} \item{assign}{A list of named mappings for which letter maps to which Allele; planned for the future, not currently used } \item{missing}{Meant to be a single character giving the code recorded for alleles of missing genotypes ; not used in the current code} \item{X}{If \code{TRUE} the pedfile is assumed to describe loci on the X chromosome} \item{sep}{The character separating the family and member identifiers in the constructed row names; not used} \item{low.mem}{Switch over to input with a routine which requires less memory to run, but takes a little longer. This option also has the disadvantage that assignment of A/B genotype is somewhat non-deterministic and depends the listed order of samples.} } \details{ Input variables are assumed to take the usual codes, with the restriction that the family (or pedigree) identifiers will be held as strings, but identifiers for members within families must be coded as integers. Genotype should be coded as pairs of single character allele codes (which can be alphanumeric or numeric), from either 'A', 'C', 'G', 'T' or '1', '2', '3', '4', with 'N', '-' and '0' denoting a missing; everything else is considered invalid and would invalidate the whole snp; also more than 2 alleles also cause the snp to be marked invalid. Row names of the output objects are constructed by concatenation of the pedigree and member identifiers, "Family", "Individual" joined by ".", e.g. "Family.Adams.Individual.0". It has been called to the authors' attention that there are LINKAGE ped files out there of case-control-study origin which encode the entire collection as one big family. This is wrong because in a case-control study evey sample is supposed to be unrelated, and the PED file should be a few thousand families, each with 1 member, instead of 1 family with a few thousand members. To fix such faulty ped files, run this perl snipplet below, which joins the family field with the member field with an underscore as the new family field, and put "1" in the member field: \preformatted{ cat input.ped | perl -n -e \ '($a, $b, $c) = split /\s/,$_, 3; print $a, "_", $b, " 1 ", $c;' > output.ped } Also, note that the member field is limited to numeric. This is a documented limitation above. } \value{ \item{snps}{The output \code{"snp.matrix"} or \code{"X.snp.matrix"}} \item{subject.support}{A data frame containing the first six fields of the pedfile} } \author{Hin-Tak Leung} \seealso{\code{\link{snp.matrix-class}}, \code{\link{X.snp.matrix-class}}, \code{\link{read.snps.long}}, \code{\link{read.HapMap.data}}, \code{\link{read.pedfile.info}}, \code{\link{read.pedfile.map}}} \keyword{IO} \keyword{file}