\documentclass[a4paper]{article}
%\VignetteIndexEntry{An R package for vegetation data access, formatting and taxonomic unification}
%\VignetteDepends{vegan, labdsv, interp}
%\VignetteKeyword{Vegetation}
%\VignetteEngine{knitr::knitr}
\usepackage[utf8]{inputenc}
% \SweaveSyntax{SweaveSyntaxLatex}
\usepackage{textcomp}
% \DeclareUnicodeCharacter{00D7}{\texttimes}
%\usepackage[T1]{fontenc}
% \usepackage[warning]{selinput}
% \SelectInputMappings{
%    adieresis={ä},
%    germandbls={ß},
%    Euro={€},
% %   multiply={}
%    }
\usepackage[sort&compress]{natbib}
\bibliographystyle{unsrtnat}
\usepackage[english]{babel}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage[left=25mm,right=20mm,top=35mm,bottom=35mm]{geometry}
\usepackage{tikz}
\usetikzlibrary{shapes}

\author{Florian Jansen}
\title{Vegetation data access and taxonomic harmonization \\
version \Sexpr{packageDescription("vegdata", field="Version")}}

\begin{document}

<<prep, echo=FALSE, results='hide'>>=
library(knitr)
options(stringsAsFactors=FALSE)
opts_chunk$set(concordance = TRUE, comment = "", warning = FALSE, message = TRUE, echo = TRUE, results = 'tex', size="footnotesize")
tmp <- tempdir(check = T)
suppressPackageStartupMessages(library(vegdata))
options(tv_home = tmp)
dir.create(file.path(tmp, 'Species'))
dir.create(file.path(tmp, 'Popup'))
dir.create(file.path(tmp, 'Data'))
file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Popup'), to = tmp, recursive = TRUE)
file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Species'), to = tmp, recursive = TRUE)
file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Data'), to = tmp, recursive = TRUE)
# dim(tax('all'))
@
%Sys.setlocale("LC_TIME", "de_DE.iso8859")

\maketitle

\begin{abstract}

\noindent
An example session to show functionality and usage of R library \texttt{vegdata}. \\
After installation of  \texttt{vegdata} you can invoke this PDF with

vignette('vegdata')


\end{abstract}

\section{Preliminary notes}
    Some \texttt{vegdata} functions expect an installation, or more precisely the main directory structure, of the vegetation database program Turboveg for Windows (see \url{'http://www.synbiosys.alterra.nl/turboveg/'} and \cite{Hennekens2001}. If the package can not find a Turboveg installation it will use the directory within the package installation path. If you want to use function \texttt{taxval} for taxonomic harmonization you will need to have GermanSL or an equally structured reference list. If you do not specify any, the most recent version of GermanSL will be used and if it can not be found within the specified path, it will be downloaded from \url{https://germansl.infinitenature.org/GermanSL/latest/GermanSL.zip}.

    Turboveg uses dBase database format for storage. The package tries to deal with the limitations of that format but it is essential, that you use "Database -> Reindex" in Turboveg every time you delete something in your Turboveg database. Otherwise it will not be deleted immediately in the dBase file, instead it is only marked for deletion, i.e. it is still there when you access this file with R and will not be recognized as deleted until you reindex your Turboveg database.

\section{Provided functionality}

 \subsection{Taxonomic harmonisation}
    One of the most important steps in using vegetation data (from different sources) for statistical analysis is to take care about the taxonomic content of the names existing in the database. That is, to make sure, that exactly one (correct and valid) name defines one biological entity. Most researchers remember to convert synonyms to valid names but in many cases the care about e.g. monotypic subspecies or ambiguous taxonomic levels is lacking \citep{Jansen2010}.
    The package offers the function \texttt{taxval} with different options for the adjustment of synonyms, monotypic taxa, taxonomic levels, members of aggregates and undetermined species.

\subsection{Database access}
    vegdata (will) provide direct access to different vegetation database formats:
\begin{description}
 \item[Turboveg] is a desktop program, written in VisualBasic. It provides basic functions to enter, import, maintain and export vegetation data. From the 2 000 000 vegetation plots registered in \url{https://www.GIVD.info} approximately 1.5 million are stored in Turboveg databases format.
\item[vegetweb] is the German national vegetation database. vegetweb is accessible at \url{https://www.vegetweb.de}. Data can be selected, open access data can be downloaded directly, other data after clearance from the owners.
\item VegX is an international exchange standard. An R package with a S4 implementation of the standard is in development
 \end{description}

\subsection{Cover standardization}
    Turboveg provides different abundance codes and all kinds of user defined cover codes can easily be added. For vegetation analysis a unique species performance platform is needed which will in most cases be the percentage cover of the observed plot area. Therefore, for every abundance code class the mean cover percentage is defined in Turboveg.
    Since different scales can occur in a database and the storage format of the code table in Turboveg is somewhat strange, the function \texttt{tv.coverperc} provides automatic conversion for convenience.

  \subsection{Layer aggregation}
    The most frequently used sample unit in vegetation science is a plot based vegetation relev\'{e} \citep{Dengleretal2011}. A Braun-Blanquet relev\'{e} is a sample of names and coverage (abundance) of species in a specified area (usually between 1 and 1000 $m^2$) at a specific time. It contains (at least is intended to contain) a \emph{complete} list of photo-autotrophic plants (or a defined subset) in that plot.
    This information can be stored in a three-column list of relev\'{e} ID, Taxon ID and performance measure (e.g. cover code).

    Often additional information about the kind of occurrence is wanted. In Turboveg one additional column for the most widespread attribute is included by default: growth height classes. E.g. in a forest it is of interest, if a woody species reaches full height (tree layer) or occurs only as a small individual (herb layer).
    Other attributes like micro location (hummock or depression, rock or dead wood), development stage (juvenile or not, flowering status etc.) or the month of survey in a multi-seasonal survey could also be of interest and can be added in Turboveg.
    For analysis you may want to differentiate species with different species-plot attributes (e.g. growing in different layers). Function \texttt{tv.veg} provides possibilities for species-plot attribute handling.

  \subsection{Vegetation matrix}
    Turboveg stores relev\'{e}s as a dataframe of occurrences (s. below) but almost all functions and programs for vegetation analyses use plot-species cross-tables with a 0 value for non-occurrence = observed absence.
    Function \texttt{tv.veg} inflates the Turboveg list to matrix format with plots in rows and species in columns. Column names can be either species numbers, species letter-codes (default) or full names (with underscores instead of blanks to match the R naming conventions).

\section{Preparations}

The best way to introduce the functionalities of the package is a session with example code.

We load the library as usual into our R environment.
<<load, results='hide'>>=
library(vegdata)
@


Several functions of this package use the directory structure of Turboveg. The first time such a function is called, the internal function \texttt{tv.home} tries to find your Turboveg installation path. Depending on whether you have Turboveg installed on your computer or not, it will give you a message (and an invisible return) about the Turboveg installation path or the path to the Turboveg directory structure of package vegdata.
<<eval=FALSE>>=
tv_home <- tv.home()
@


If you want to change this, declare manually by setting option "tv\_home":
<<eval=FALSE>>=
options(tv_home="path_to_your_Turboveg_root_directory")
options(tv_home="/home/jansen/aGitRepos/vegdata/vegdata/inst/tvdata/")
@

\section{Service functions}
<<dblisting>>=
tv.db()
@
will give you a list of available Turboveg database names (directories within the Turboveg \textit{Data directory}).

<<>>=
tv.refl()
@

GermanSL is the default Taxonomic reference list in package \texttt{vegdata}.
However, whenever you use a Turboveg database name in a function, the Reference list will be read from the database configuration file "tvwin.set" if possible.
If you want to change the default reference list:
<<eval=FALSE>>=
tv.refl('your_preferred_list')
@
will change option \textit{tv.refl} which will be used whenever db or refl is not given.

Package vegdata contains several service functions to query the taxonomic information contained in the reference list.

<<tax, eval =TRUE>>=
tax('Quercus robur')
@

The GermanSL is not included in vegdata to keep the R package small. Instead the reference list will be automatically downloaded into the tv\_home directory (see \texttt{tv.home()}) or a temporary folder, if it is not installed but needed. If you want to use a different list, specify \texttt{refl=<Name of your list>} according to the directory name in the Turboveg directory \emph{Species}.
Function \texttt{tax} can use the given species name (with option strict=FALSE also name parts), or 7 letter abbreviation or the TaxonUsageID (called SPECIES\_NR in Turboveg) to look for all (partially) matching species names within the reference list.

In GermanSL versions 1.1 to 1.3 additional information for every taxon is stored in an extra file (tax.dbf) which can be used with option \textit{detailed = TRUE}.
Since version 1.4 all information is included in the normal Turboveg file \textit{species.dbf}.  \\
\textit{tax} will give you all matching names by default. If you set option \textit{strict=TRUE}, only the species with exact match to the given character string will be returned. \\
\textit{syn} will give you all taxon names within the swarm of synonyms. The valid name is marked in column SYNONYM with FALSE.

%\inputencoding{utf8}

<<syn>>=
tax('Elytrigia repens')$TaxonName
syn('Elytrigia repens')
@

The reference list contains information about the taxonomic hierarchy which can be used with \textit{child} or \textit{parent}.
<<childs, eval=FALSE>>=
child(27, quiet=TRUE)$TaxonName
parent(32)
parent(32, rank = 'FAM')
@
If you want to learn more about the taxonomic reference list \emph{GermanSL} for Germany, please look at \cite{Jansen2008}.

\subsection{Name manipulation}

We often need to adapt spelling differences and changes of species names not belonging to species concepts. Package vegdata contains several service functions to help with that.

\textbf{taxname.abbr} is for standardisation of names, mostly for rank spellings.

\textbf{taxname.simplify} can be used to compare two name vectors of species names regarding "taxonomic fuzziness". Taxon names will differ through time and opportunity. But not all parts of the names are equally affected. Case endings change much more frequently than the beginning of names, ii and y are exchangeable, etc. Those name parts and diacritic marks, double consonants, "th", and other frequent differences in writing style will be eliminated.

\textbf{parse.taxa}: parse genus and epitheta from name strings.

\textbf{taxname.removeAuthors} Remove name authors from full scientific name strings.


\section{Load data from Turboveg}

Care about the taxonomic content of the datasets is crucial for every analysis. Some of these steps can be automated with an appropriate taxonomic reference. For background and details see \citep{Jansen2010}.

<<db>>=
db <- 'taxatest'
@
Defines the vegetation database name according to the name of the Turboveg database directory name

<<meta, eval=FALSE>>=
tv.metadata(db)
@

Metainformation, i.e. information about the kind of available information should always be given for every database. Since Turboveg does not ask and provide such information, write a simple text file called metainfo.txt and save it within the database folder.
Turboveg does not provide any metadata handling. \\
Database \texttt{taxatest} is an artificial dataset to show functionalities and necessary steps for taxonomic harmonization.

Let's have a look at the Turboveg data structure.
<<obs>>=
getOption('tv_home')
obs <- tv.obs(db)
# Adding species names
species <- tax('all', refl=tv.refl(db=db))
obs$TaxonName <-  species$TaxonName[match(obs$TaxonUsageID, species$TaxonUsageID)]
head(obs[,c('PlotObservationID','TaxonUsageID','COVER_CODE','LAYER','TaxonName')])
@

This condensed format shows only presences of species observations. Every species observation is stored in one row and the membership to a specific vegetation plot is given in column \textit{RELEVE\_NR}.


\begin{figure}
\begin{tikzpicture}
%\draw[step=0.5cm,color=gray](-2,-2) grid (10,2);
\draw (3.5,1.5) node [anchor = north east] {Field survey \ \ \ \ \ \ } -- (3.5,0.7);
\draw (7,1.5) node [anchor = north east] {Database \ \ \ \ \ \ \ \ } -- (7,0.7);
\draw (8.5,1.2) node {Analyses};

\pgftext[right]{\includegraphics[width=10mm]{veronica}};
\path (1.75,0) node (field) [draw, text width=1.5cm, text centered] {\small Field names};
\path (5.5,0) node (db) [draw, text width=2cm, text centered] {\small Referenced taxa};
\path (9,0) node (export) [draw, text width = 2cm, text centered] {\small Appropriate taxa};

\draw [->, very thick] (0,0) -- (field);
\draw [->, very thick] (field) -- (db);
\draw [->, very thick] (db) -- (export);

\draw (0.5,0) [below, opacity=0] -- (0.5,0);
\draw (0.5,-1) [->, below] -- (0.5,-0.2);
\draw (3.5,-1) [->, below] -- (3.5,-0.2);
\draw (7,-1) [->, below] -- (7,-0.2);
\end{tikzpicture}

\begin{enumerate}

\item Field interpretation
\begin{itemize}
	\item document your source(s) of taxonomic interpretation (Flora)
	\item specify determination certainty
	\item collect herbarium specimen
\end{itemize}

\item Database entry
\begin{itemize}
	\item document field records / original literature
	\item reference as conservative as possible to a taxonomic reference list with all relevant taxa (synonyms, field aggregates, horticultural plants, ...)
	\item document your interpretations
\end{itemize}

\item Preparation for analyses
\begin{itemize}
  \item standardize taxon name parts (rank abbreviations, genus sex, y versus i etc.)
  \item convert synonyms
  \item summarize monotypic taxa
  \item clean up nested taxa
  \item clean up taxonomic ranks
  \item \dots
\end{itemize}

\paragraph{Three steps of taxonomic interpretation}
\begin{itemize}
  \item need of appropriate tools (software, reference lists)
  \item standards
  \item threefold attention
\end{itemize}
\end{enumerate}

\caption{Steps of taxonomic interpretation}
\end{figure}


\subsection{Taxonomic harmonisation - function taxval}


Prerequisits for taxonomic harmonisation in vegdata are
\begin{enumerate}
 \item a taxonomic tree, i.e. a taxon list with parent-child relationships, given in a column called \textbf{IsChildTaxonOfID}
 \item ordered taxonomic levels, given as a data.frame, see \texttt{taxlevels} with rank abbreviation and numerical values giving the lowest number for the highest level. The rank abbreviation must correlate to of column \textbf{TaxonRank} in the taxonomic reference list. Type \textit{taxlevels} to see the implemented table for GermanSL.
\end{enumerate}

We are using the taxonomic reference list GermanSL \citep{Jansen2008}, which contains not only information about synonymy of species names, but also a taxonomic hierarchy. This enables several semi-automatic enhancements of the taxonomic information stored in your vegetation database. If your database is not referenced to GermanSL (and can not be converted), you have to care for a column IsChildTaxonOfID and an appropriate taxlevels data.frame to use function \texttt{taxval}. \textbf{Otherwise please use option tax=FALSE in \texttt{tv.veg}) and do the taxonomic harmonization by hand} (e.g. with function \texttt{comb.species}).

Dataset \textit{taxatest} is used to evaluate package code and to show the functionality especially for the taxonomic harmonization.
We have Achillea species with a subspecies sudetica, a synonym subspecies collina, two observations on species level (A. atrata and millefolium) and even one on genus level.

<<data>>=
library(vegdata)
obs <- tv.obs('taxatest')
sort(tax(unique(obs$TaxonUsageID))$TaxonName)
@

The database contains 20 different names in the beginning.

\paragraph{Synonyms} First the number of species names which are synonyms are given. They are transferred to accepted taxon names, respectively numbers (see option syn='adapt'). If you want to preserve synonyms, choose option syn = 'conflict' or 'preserve'.

\paragraph{Monotypic species within the area}
Monotypic taxa are valid taxa which are the only child of their next higher taxonomic rank within the survey area. By default they will be converted by \texttt{taxval} to the higher rank. For instance \emph{Poa trivialis} is in Germany only represented by \emph{Poa trivialis subspecies trivialis}. Both taxa are valid, but for most analysis only one name for these identical entities must be used. By default a list of monotypic taxa within the GermanSL (whole Germany) is considered (see \texttt{tv.mono('GermanSL 1.5')}). The default is to set all monotypic species to the higher rank (because many monotypic subspecies can occur in vegetation databases). \\
If necessary, the procedure has to be repeated through the taxonomic

\paragraph{Critical Pseudonyms}

Taxon misapplication is maybe the greatest danger in using survey data. Known misapplications of names (.auct) are embedded within GermanSL. Please pay attention, if these might also be relevant for your dataset. \\
Completely independent from the questions of correct taxonomic naming of a specific specimen, the boundary of a taxon interpretation can differ much, see \citet{Jansen2010}. This should be adequately solved during data entry. Nevertheless the "Warning: Critical species" and "Warning: Potential pseudonyms" give you a last chance to rethink the correctness of your taxon assignments.


\paragraph{Coarsening to a specific taxonomic level}

For most analyses it will be necessary to analyse only one taxonomic level, e.g. not to confuse species numbers per plot, community means etc. For this you normally have to raise the lower observations to a coarser level. If you want the species level, you can specify ag='adapt' and rank='SPE'. All hierarchical levels below the species level (including the above specified monotypic subspecies) are set to species level in this case.

<<adapt>>=
obs.tax <- taxval(obs, db='taxatest', ag='adapt', rank='SPE', check.critical = FALSE, taxlevels = taxlevels, mono = 'pre')
sort(tax(unique(obs.tax$TaxonUsageID), db=db)$TaxonName)
@


The subspecies have been raised to the species level. However, there is still a conflict between the Achillea millefolium observations and the aggregate. The same for Hieracium pilosella and H. subg. Pilosella.  To solve these issues we can us ag='conflict':

<<conflict>>=
obs.tax <- taxval(obs, db='taxatest', ag='conflict', check.critical = FALSE)
sort(tax(unique(obs.tax$TaxonUsageID), db=db)$TaxonName)
@

The yarrow taxa are all subsumed to the highest occurring taxon in this taxon tree, i.e. a genus observation. For the two Armeria species the subspecies level was kept because there is no conflicting higher taxon in the dataset.

The result might be disappointing, e.g. think of a single coarse determination only on the family level in a huge dataset. This will lead to raising all species of that family to the family level. This would impede many analyses and you might want to choose a maximum taxonomic level to remain in the dataset. This can be done with option "maxtaxlevel".
If you use interactive=TRUE, you will be able to see the number of occurrences for the different levels in file "taxvalDecisionTable.csv" which might help to decide which way to go.

Combining "maxtaxlevel" and the two options of "ag" might come closest to what seems scientifically reasonable in many cases.

<<maxtax, eval=FALSE>>=
obs.tax <- taxval(obs, db='taxatest', ag='conflict', maxtaxlevel = 'AGG', check.critical = FALSE, interactive = TRUE)
obs.tax <- taxval(obs.tax, db='taxatest', ag='adapt', rank='SPE', check.critical = FALSE)
sort(tax(unique(obs.tax$TaxonUsageID), db=db)$TaxonName)
@


Check \texttt{?taxval} and \texttt{args(taxval)} for more options.


\section{Vegetation data}

At the moment there exists no widely accepted formal class for vegetation data in R. But most functions in \texttt{vegan}, \texttt{ade4} or other packages expect vegetation data to be stored in a matrix with species in columns and plots in rows. Therefore, we need to inflate the Turboveg format (where zero occurrences are missing) to such a matrix.

\texttt{tv.veg} is a wrapper for the above mentioned functions and produces a vegetation matrix with releves as rows and species as columns.
Additionally care about species-plot attribute differentiation and combination, and the handling of species codes is provided.

\subsection{XML}


\subsection{Performance measures}

At least in Europe most vegetation plots have information about the performance of a species within the survey area, often given in some kind of alphanumeric code for cover percentage within the survey plot. Different code systems are combined by using the mean cover percentage per cover code class. Function \texttt{tv.coverperc} will do this job according to the definitions in \emph{Turboveg/Popup/tvscale.dbf} and the entries in the header data column \texttt{COVERSCALE}.

<<coverperc, echo=2:4, eval=TRUE>>=
options(width=120)
obs <- tv.obs(db)
# obs <- tv.coverperc(db, obs)
tail(obs)
options(width=110)
@

A few simple possibilities for percentage cover transformations are directly included in the \texttt{tv.veg} code, e.g. to use only presence-absence information you can choose option \texttt{cover.transform = 'pa'}.


\subsection{Pseudospecies}
How to account for different vegetation layers or other kinds of species differentiation?

The next step is the separation of pseudo-species. "Pseudo-species" are all kind of taxa split according to species-plot information beyond the performance measure which will be used within the matrix. At this point you have to decide which information should be preserved and which should be aggregated. For instance layer separation must be defined at this step. The default is to differentiate tree, shrub and herb layers but to combine finer layer specifications within them. \\
If you have more than one occurrence of the same species in a plot, e.g. because tree species growing as young stands and adult specimens were differentiated according to growth height classes, you have to create either pseudo-species which differentiate the occurrences in the resulting vegetation matrix or to combine species occurrences from different layers. For the latter you can use different calculations e.g. to sum up all cover percentages of different layers \texttt{lc='sum'} or the maximum value (\texttt{lc='max'}), mean value (\texttt{lc='mean'}). If you assume an independent occurrence of a species in different vertical layers, you can do the calculations with option lc = 'layer' (the default). This results in a probability sum: A species covering 50\% in tree layer 1 and 50\% in herb layer will get a combined cover of 75\% because both layers will overlap 50\% (1 - 0.5*0.5).\\

If you want to specify pseudo-species by other species-plot differentiation you can define a combination dataframe. Two example dataframes are included in the package (\texttt{lc.0} and \texttt{lc.1}). Option comb has to be given as a list with first element naming the column name holding the grouping variable and as second element the name of the combination dataframe. Try \\
<<pseudo1, eval=FALSE>>=
data(lc.0)
obs <- tv.obs(db)
tv.veg(db, pseudo = list(lc.0, c("LAYER")), lc = "layer")
@

and check the column names:

<<lc0, warning=FALSE>>=
tmp <- tv.veg(db, tax=FALSE, pseudo = list(lc.0, "LAYER"), lc = "layer", quiet=TRUE)
names(tmp)
@

Separated by dots and layer numbers you can see the preserved layers. For meaning of layer numbers see Turboveg help.\\
Check \texttt{(data(lc.1))} for the default layer combination.

Beside layers you can use any kind of species-plot attributes to distinguish between occurrences, for instance in a multi-temporal survey.

<<Season>>=
comb <- list(data.frame(SEASON=0:4, COMB=c(0,'Spring','Summer','Autumn','Winter')),'SEASON')
names(tv.veg(db, tax=FALSE, pseudo=comb, quiet=TRUE))
@

<<layer, results='hide', warning=FALSE>>=
data(lc.1)
veg <- tv.veg(db, lc = "sum", pseudo = list(lc.1, 'LAYER'), dec = 1, check.critical = FALSE)
veg[,1:10]
@


\subsection{Combine species manually}

Beside semi-automatic taxon harmonization with function \texttt{taxval} there are two possibilities to change Taxonomy manually. If you decide to interpret a certain species name in your database different than stored in the standard view of the taxonomic reference you can replace species numbers within the observational dataframe and run \texttt{taxval} later on.

<<>>=
obs.tax$TaxonUsageID[obs.tax$TaxonUsageID == 27] <- 31
@

will replace all occurrences of \emph{Achillea millefolium agg.} with \emph{Achillea millefolium} which might be adequate for your survey and will prevent a too coarse taxon grouping. For a longer list of replacements you can use a dataframe.

<<replace>>=
taxon.repl <- data.frame(old=c(27), new=c(31))
obs.tax$TaxonUsageID <- replace(obs.tax$TaxonUsageID,
                                match(taxon.repl$old, obs.tax$TaxonUsageID), taxon.repl$new)
@

The second possibility is to use function \texttt{comb.species} on vegetation matrices.

<<comb.spec, eval=FALSE>>=
comb.species(veg, sel=c('QUERROB','QUERROB.Tree'))
@

will use the first name ('QUERROB') for the replacement column with the cover sums of the selected columns.


\section{Site conditions}

Vegetation data should come with information about date, place, plot size as well as overarching properties of the plant community and site conditions, plot measurements etc. \texttt{tv.site} will load the Turboveg site (header) data and does some basic checks necessary because of the Turboveg dBase format.

<<site.echo, eval=TRUE>>=
site <- tv.site(db)
@

The function is quite straightforward. After loading the file \emph{tvhabita.dbf} from the specified database folder, warnings are given for plots without specified relev\'{e} area or date and fields are checked if they are empty (a lot of predefined header fields in Turboveg are often unused) or contain probably mistakable 0 values in numerical fields, due to dBase deficiencies (dBase can not handle NA = not available values reliably).
It is stated in the output, if you have to check and possibly correct 0 values.


\section{Additional functions}

Use help(package='vegdata') for a complete list of available functions and data sets in vegdata.

\subsection{Frequency tables}

\texttt{syntab} produces a relative or absolute frequency table of a classified vegetation table with the possibility to filter according to threshold values. To exemplify the function we use the second dataset implemented in the package. It is the demonstration dataset from \cite{Leyer2007}, a selection of grassland relev\'{e}s from the floodplains of the river Elbe.

<<elbaue, results='hide'>>=
elbaue <- tv.veg('elbaue', check.critical = FALSE)
elbaue.env <- tv.site('elbaue')
@

<<cluster>>=
clust <- vector('integer', nrow(elbaue.env))
clust[elbaue.env$MGL < -50 & elbaue.env$SDGL < 50] <- 1		# dry sites, low deviation
clust[elbaue.env$MGL < -50 & elbaue.env$SDGL >= 50] <- 2	# dry sites, high deviation
clust[elbaue.env$MGL >= -50 & elbaue.env$SDGL >= 50] <- 3	# wet sites, high deviation
clust[elbaue.env$MGL >= -50 & elbaue.env$SDGL < 50] <- 4	# wet sites, low deviation
#levels(clust) <- c('dry.ld','dry.hd', 'wet.hd','wet.ld')
@

We can e.g. look at the relative frequency of all species with more than 40\% at least in one column, according to the height of the groundwater table (low or high) and the amplitude of the groundwater table fluctuations (high or low deviations from the mean).
Additionally you can use the affiliation of species to abiotic clusters with the help of package \texttt{indicspecies}, which calculates species indicator values for one or several cluster \citep{DeCaceres2010} to order the syntaxonical table. Together with Ellenberg indicator values with will get a comprehensive view into our data.

<<syntab.mupa>>=

synt <- syntab(elbaue, clust, mupa = TRUE)
@


\section{Vegetation analyses}

The package \emph{vegdata} serves mostly as a helper for the analysis of vegetation data. Several powerful R packages like \emph{vegan} and others exist, to provide a very broad range of possibilities.

\subsection{Multivariate Ordinations}

With the functions shown above we are now ready to do some example analyses in the wide area of vegetation analyses.

We can do, for instance, a ``Nonmetric Multidimensional Scaling with Stable Solution from Random Starts Axis Scaling and Species Scores''
which is a wrapper for Kruskal's Non-metric Multidimensional Scaling \citep{Cox19942001} from Jari Oksanen \citep{Oksanen2008}.

<<nmds, quiet=TRUE, results='hide'>>=
## Data analyses
if (requireNamespace('vegan', quietly = TRUE) ) {
  library(vegan)
veg.nmds <- metaMDS(elbaue, distance = "bray", trymax = 5, autotransform =FALSE,
                    noshare = 1, expand = TRUE, trace = 2)
# eco <- tv.traits()
F <- cwm(veg = elbaue, trait.db = 'ecodbase.dbf', ivname = 'OEK_F', method = 'mean')
N <- cwm(veg = elbaue, trait.db = 'ecodbase.dbf', ivname = 'OEK_N', method = 'mean')
env <- envfit(veg.nmds, env = data.frame(F, N))
} else
  message("package vegan not available")
@

To show the result in comparison with environmental measurements in a nice graphic we do some plotting magic.


\begin{figure}
\begin{center}
<<nmdsplot, quiet=TRUE, results='hide', warning=FALSE, eval=TRUE>>=
if (requireNamespace('interp', quietly = TRUE) & requireNamespace('labdsv', quietly = TRUE) & requireNamespace('vegan', quietly = TRUE) ) {
  suppressPackageStartupMessages(library(labdsv))
  library(interp)
color = function(x)rev(topo.colors(x))
nmds.plot <- function(ordi, site, var1, var2, disp, plottitle =  'NMDS', env = NULL, ...) {
lplot <- nrow(ordi$points);  lspc <- nrow(ordi$species)
filled.contour(interp(ordi$points[, 1], ordi$points[, 2], site[, var1], duplicate = 'strip'),
           ylim = c(-1, 1.1), xlim = c(-1.4, 1.4),
           color.palette = color, xlab = var1, ylab = var2, main = plottitle,
           key.title = title(main = var1, cex.main = 0.8, line = 1, xpd = NA),
           plot.axes = { axis(1);  axis(2)
             points(ordi$points[, 1], ordi$points[, 2], xlab = "", ylab = "", cex= .5, col = 2, pch = '+')
             points(ordi$species[, 1], ordi$species[, 2], xlab = "", ylab = "", cex=.2, pch = 19)
             ordisurf(ordi, site[, var2], col = 'black', choices = c(1, 2), add = TRUE)
             orditorp(ordi, display = disp, pch = " ")
             legend("topright", paste("GAM of ", var2), col = 'black', lty = 1)
             if(!is.null(env)) plot(env, col='red')
           }
           ,...)
}

nmds.plot(veg.nmds, elbaue.env, disp='species', var1="MGL", var2="SDGL", env=env, plottitle = 'Elbaue floodplain dataset')
} else {
  message("packages interp and/or labdsv not available")
}
@
\caption{Non-metric multidimensional scaling of the elbaue vegetation data with an overlay of mean groundwater table (colors) and standard deviation of groundwater level fluctuations (GAM lines). Arrows show direction of increasing mean Ellenberg F (soil water) resp. N (nutrient availability).}
\end{center}
\end{figure}

The first axis of our NMDS plot show the influence of mean groundwater level on the patterns of the dataset. \emph{Glyceria maxima} is marking the wet side of the gradient, whereas \emph{Cnidium dubium} \emph{Agrostis capillaris} or \emph{Galium verum agg,} occur only at low mean groundwater level. The second axis can be assigned to the fluctuation of water levels measured as standard deviation of mean groundwater level. Species indicating high water fluctuation are \emph{Agrostis stolonifera} or \emph{Alopecurus geniculatus} whereas \emph{Carex vesicaria} occurs only in more balanced situations.

\newpage
\bibliography{lib}

\end{document}