\name{featurePSSM}
\alias{featurePSSM}
\title{Feature Coding}
\description{
  A set of functions for extracting features from biological sequences and coding them as numeric vectors.
}
\usage{
featurePSSM(seq, start.pos, stop.pos, psiblast.path, database.path)
}
\arguments{
  \item{seq}{a string vector of protein, DNA, or RNA sequences.}
  \item{start.pos}{an integer vector denoting the start position of the fragment window.}
  \item{stop.pos}{an integer vector denoting the stop position of the fragment window.}
  \item{psiblast.path}{a string giving the path to the blastpgp program. blastpgp is used to run PSI-BLAST and obtain the Position-Specific Scoring Matrix.}
  \item{database.path}{a string giving the path to a formatted reference database. A database can be formatted with the "formatdb" program.}
}
\details{
  \code{\link{featurePSSM}} returns a matrix with 20*N+N columns, N being the number of positions in the fragment window.
  Each row represents one sequence, coded as a (20*N+N)-dimensional numeric vector derived from PSI-BLAST output.
  The vector contains two kinds of features: the normalized position-specific scores of the PSSM (Position-Specific Scoring Matrix), which account for 20 values per position, and the Shannon entropy of the WOP (weighted observed percentages), which accounts for one value per position.
  The PSI-BLAST program (blastpgp) and a formatted NCBI non-redundant protein database are required.
}
\author{Hong Li}
\examples{
if(interactive()){
  ## Read the example protein sequences shipped with BioSeqClass.
  file = file.path(path.package("BioSeqClass"), "example", "acetylation_K.fasta")
  tmp = readFASTA(file)
  proteinSeq = sapply(tmp, function(x){x[["seq"]]})
  names(proteinSeq) = sapply(tmp, function(x){x[["desc"]]})

  ## Requires the "blastpgp" program and a formatted database.
  ## A database can be formatted with the "formatdb" program.
  PSSM1 = featurePSSM(proteinSeq[1:2], start.pos=rep(1,2), stop.pos=rep(10,2),
    psiblast.path="blastpgp", database.path="./result1.fasta")
}
}
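\section{Illustrative sketch of the per-position entropy}{
  The Shannon entropy feature mentioned in Details can be sketched as follows.
  This is a minimal illustration under stated assumptions, not the code used by \code{featurePSSM}: the helper name \code{wopEntropy}, the base-2 logarithm, and the toy WOP matrix are all hypothetical and chosen only for this example.
  \preformatted{
## Hypothetical helper (not part of BioSeqClass): Shannon entropy of each
## row of a WOP matrix (one row per position, 20 columns of percentages).
wopEntropy <- function(wop) {
  apply(wop, 1, function(p) {
    if (sum(p) == 0) return(0)   # position with no observations
    p <- p / sum(p)              # percentages -> probabilities
    p <- p[p > 0]                # treat 0 * log(0) as 0
    -sum(p * log2(p))
  })
}

## Toy input (assumed values): 2 positions x 20 amino acids.
wop <- rbind(c(100, rep(0, 19)),   # fully conserved position
             rep(5, 20))           # uniform position
wopEntropy(wop)
  }
  In this sketch a fully conserved position has entropy 0 and a uniform position has entropy log2(20), the maximum for 20 amino acids. The logarithm base and any scaling applied by the package may differ.
}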