Type: Package
Title: Diversity Dynamics using Fossil Sampling Data
Version: 0.8.3
Maintainer: Adam T. Kocsis <adam.t.kocsis@gmail.com>
Description: Functions to describe sampling and diversity dynamics of fossil occurrence datasets (e.g. from the Paleobiology Database). The package includes methods to calculate range- and occurrence-based metrics of taxonomic richness, extinction and origination rates, along with traditional sampling measures. A powerful subsampling tool is also included that implements frequently used sampling standardization methods in a multiple-bin framework. Further functions simplify the plotting of time series and occurrence data, and carry out other calculations, such as environmental affinities and extinction selectivity testing. Details can be found in: Kocsis, A.T.; Reddin, C.J.; Alroy, J. and Kiessling, W. (2019) <doi:10.1101/423780>.
License: CC BY 4.0
Date: 2024-11-21
BugReports: https://github.com/divDyn/r-package/issues
Encoding: UTF-8
LazyData: false
Depends: R (≥ 3.5.0)
Imports: Rcpp, stats, graphics, grDevices, methods
NeedsCompilation: yes
RoxygenNote: 7.3.0
LinkingTo: Rcpp
VignetteBuilder: knitr
Suggests: knitr, rmarkdown, vegan, icosa
Packaged: 2024-11-21 18:05:42 UTC; root
Author: Adam T. Kocsis
Repository: CRAN
Date/Publication: 2024-11-21 18:30:09 UTC
Environmental affinities of taxa
Description
This function will return the preferred environment of the taxa, given the distribution of occurrences.
Usage
affinity(
x,
tax,
bin,
env,
coll = NULL,
method = "binom",
alpha = 1,
reldat = NULL,
na.rm = FALSE,
bycoll = FALSE,
output = "levels"
)
Arguments
x
tax
bin
env
coll
method
alpha
reldat
na.rm
bycoll
output
Details
Sampling patterns have an overprinting effect on the frequency of taxon occurrences in different environments. The environmental affinity (Foote, 2006; Kiessling and Aberhan, 2007; Kiessling and Kocsis, 2015) expresses whether the taxa are more likely to occur in an environment, given the sampling patterns of the dataset at hand. The function returns the likely preferred environment for each taxon as a vector. NA outputs indicate that the environmental affinity is equivocal based on the selected method.
The following methods are implemented:
'majority'
: Environmental affinity will be assigned based on the number of occurrences of the taxon in the different environments, without taking sampling of the entire dataset into account. If the taxon has more occurrences in environment 1, the function will return environment 1 as the preferred habitat.
'binom'
: The proportion of occurrences of a taxon in environment 1 and environment 2 will be compared to a null model, which is based on the distribution of all occurrences from the stratigraphic range of the taxon (in x or, if provided, in reldat). Then a binomial test is run on the number of occurrences in the most likely preferred environment (against all else). The alpha value sets the significance level of the binomial tests; setting alpha to 1 effectively switches the testing off: if the ratio of occurrences for the taxon is different from the ratio observed in the dataset, an affinity will be assigned. This is the default method. If an environment is not sampled at all in the dataset to which the taxon's occurrences are compared, the binomial method returns NA for the taxon's affinity.
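To make the 'binom' logic concrete, here is a minimal base-R sketch for a single taxon in two environments. The helper binomAffinity() is hypothetical and not part of divDyn; it assumes the reference counts have already been restricted to the taxon's stratigraphic range, and it uses a one-sided binom.test() from the stats package, which may differ from the direction of the test the package runs internally.

```r
# Hypothetical helper (not part of divDyn): binomial affinity of one taxon.
# taxCounts: named counts of the taxon's occurrences in the two environments
# refCounts: counts of all occurrences in the same two environments
binomAffinity <- function(taxCounts, refCounts, alpha = 1) {
  # an unsampled environment leaves no usable null model
  if (any(refCounts == 0)) return(NA_character_)
  pTax <- taxCounts / sum(taxCounts)
  pRef <- refCounts / sum(refCounts)
  # no preference if the taxon mirrors the sampling of the dataset
  if (isTRUE(all.equal(unname(pTax), unname(pRef)))) return(NA_character_)
  # environment in which the taxon is over-represented
  pref <- which.max(pTax - pRef)
  # one-sided test: more occurrences there than the null proportion predicts
  test <- binom.test(taxCounts[pref], sum(taxCounts), p = pRef[pref],
                     alternative = "greater")
  if (test$p.value <= alpha) names(taxCounts)[pref] else NA_character_
}

binomAffinity(c(shal = 8, deep = 2), c(shal = 500, deep = 500))
binomAffinity(c(shal = 8, deep = 2), c(shal = 500, deep = 500), alpha = 0.05)
```

Because a p-value can never exceed 1, alpha = 1 assigns an affinity to any taxon whose ratio differs from the reference ratio, which matches the described behavior of switching the test off.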
References
Foote, M. (2006). Substrate affinity and diversity dynamics of Paleozoic marine animals. Paleobiology, 32(3), 345-366.
Kiessling, W., & Aberhan, M. (2007). Environmental determinants of marine benthic biodiversity dynamics through Triassic–Jurassic time. Paleobiology, 33(3), 414-434.
Kiessling, W., & Kocsis, Á. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
Value
If output="levels", a named vector, with values corresponding to the affinities.
Examples
data(corals)
# omit values where no occurrence environment entry is present, or where unknown
fossils<-subset(corals, stg!=95)
fossilEnv<-subset(fossils, bath!="uk")
# calculate affinities
aff<-affinity(fossilEnv, env="bath", tax="genus", bin="stg", alpha=1, coll="collection_no")
Sampling statistics and diversity indices in every bin
Description
This function returns the basic sampling summaries of a dataset.
Usage
binstat(
x,
tax = "genus",
bin = "stg",
coll = NULL,
ref = NULL,
noNAStart = FALSE,
duplicates = NULL,
xexp = NULL,
indices = FALSE
)
Arguments
x
tax
bin
coll
ref
noNAStart: (logical) Useful when the dataset does not start from bin no. 1, but positive integer bin numbers are provided. Then
duplicates
xexp
indices
Details
Secondary function of the package that calculates a number of sampling related variables and diversity estimators for each bin.
In contrast to the divDyn function, the bins are treated independently in this function.
The function also returns the maximum subsampling quota for OxW subsampling (subtrialOXW) with a given xexp value.
By setting total to FALSE (default), the following results are output:
occs
: The number of occurrences in each time bin.
colls
: The number of collections in each time bin.
xQuota
: The maximum quota for OxW subsampling (subtrialOXW) with the given xexp value. The number of occurrences in each collection is tabulated and raised to the power of xexp. The xQuota value is the sum of these values across all collections in a time slice.
refs
: The number of references in each time bin.
SIBs
: The number of Sampled-In-Bin taxa in each time bin.
occ1
: The number of taxa in each time bin that occur in only 1 collection.
ref1
: The number of taxa in each time bin that occur in only 1 reference.
occ2
: The number of taxa in each time bin that occur in exactly 2 collections.
ref2
: The number of taxa in each time bin that occur in exactly 2 references.
u
: Good's u, coverage estimator based on the number of single-collection taxa (occ1).
uPrime
: Good's u, coverage estimator based on the number of single-reference taxa (ref1).
chao1occ
: Chao1 extrapolation estimator, based on the number of single-collection and two-collection taxa (occ1 and occ2).
chao1ref
: Chao1 extrapolation estimator, based on the number of single-reference and two-reference taxa (ref1 and ref2).
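The collection-based counts, the two estimators and xQuota can be sketched for a single bin in base R. samplingSummary() is a hypothetical helper, not binstat() itself. It assumes the textbook forms u = 1 - occ1/occs and Chao1 = SIBs + occ1^2/(2 * occ2), returning NA when occ2 is zero; the package may use bias-corrected variants or handle edge cases differently.

```r
# Hypothetical helper (not part of divDyn): sampling summaries for one bin.
# taxa and colls are parallel vectors, one entry per occurrence.
samplingSummary <- function(taxa, colls, xexp = 1) {
  # count each taxon only once per collection
  keep <- !duplicated(data.frame(taxa, colls))
  taxa <- taxa[keep]
  colls <- colls[keep]
  collCount <- table(taxa)            # number of collections per taxon
  occs <- length(taxa)
  sibs <- length(collCount)           # sampled-in-bin taxa
  occ1 <- sum(collCount == 1)         # single-collection taxa
  occ2 <- sum(collCount == 2)         # two-collection taxa
  u <- 1 - occ1 / occs                # Good's u (textbook form)
  chao1 <- if (occ2 > 0) sibs + occ1^2 / (2 * occ2) else NA
  # occurrences per collection, raised to xexp and summed
  xQuota <- sum(as.numeric(table(colls))^xexp)
  c(occs = occs, SIBs = sibs, occ1 = occ1, occ2 = occ2,
    u = u, chao1occ = chao1, xQuota = xQuota)
}

samplingSummary(c("A", "A", "A", "B", "B", "C", "D", "A"),
                c(1, 2, 3, 1, 2, 1, 3, 1), xexp = 1.4)
```

The reference-based counterparts (ref1, ref2, uPrime, chao1ref) follow by passing reference identifiers instead of collection identifiers.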
Value
A data.frame with rows corresponding to bin entries.
Examples
data(corals)
# slice-specific sampling
basic <- binstat(corals, tax="genus", bin="stg")
# subsampling diagnostic
subStats <- subsample(corals, method="cr", tax="genus", FUN=binstat,
bin="stg", q=100,noNAStart=FALSE)
# maximum quota with xexp
more <- binstat(corals, tax="genus", bin="stg", coll="collection_no", xexp=1.4)
Mapping multiple entries to categories
Description
This basic function replaces groups of values in a vector with single values with the help of a key object.
Usage
categorize(x, key, incbound = "lower")
Arguments
x
key
incbound
Details
Online datasets usually contain overly detailed information, as data enterers aim to preserve as much information as possible during the entry process. In analyses, however, several values are often treated as representing the same, less detailed information, which is then used in further procedures. The categorize function allows users to do this type of multiple replacement using a specific object called a 'key'.
A key is an informal class and is essentially a list of vectors. In the case of character vectors as x, each vector element in the list corresponds to a set of entries in x. These will be replaced by the name of the vector in the list, to indicate their assumed identity.
In the case of numeric x vectors, if the list elements of the key are numeric vectors with 2 values, then this vector will be treated as an interval. The same value will be assigned to the entries that are in this interval (Example 2). If x contains values that form the boundary of an interval, then only one of the two boundary values can be considered to be in the interval (see the incbound argument to set which of the two).
The elements of key are looped through in sequence. If values of x occur in multiple elements of key, then the last one will be used (Example 3).
Examples of this data type have been included (keys) to help process Paleobiology Database occurrences.
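The replacement rules can be illustrated with a few lines of base R. categorizeSketch() is a hypothetical helper that only mirrors the rules described above (an optional 'default' element, two-element numeric vectors treated as intervals, later key elements overriding earlier ones); it is not the package's implementation, and it assumes incbound takes the values "lower" or "upper".

```r
# Hypothetical helper (not part of divDyn): key-based replacement.
categorizeSketch <- function(x, key, incbound = "lower") {
  # start from the 'default' element if present, otherwise NA
  start <- if ("default" %in% names(key)) key[["default"]] else NA
  out <- rep(start, length(x))
  for (nm in setdiff(names(key), "default")) {
    vals <- key[[nm]]
    if (is.numeric(x) && is.numeric(vals) && length(vals) == 2) {
      # interval: only one boundary is included, chosen by incbound
      lo <- min(vals)
      hi <- max(vals)
      hit <- if (incbound == "lower") x >= lo & x < hi else x > lo & x <= hi
    } else {
      hit <- x %in% vals
    }
    # later key elements overwrite earlier matches
    out[which(hit)] <- nm
  }
  out
}

categorizeSketch(1:16, list(default = "other", more = c(5, 10), eleven = 11))
```

With the default incbound = "lower", the interval c(5, 10) captures 5 to 9, consistent with the comment in Example 2.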
Value
A vector with replacements.
Examples
# Example 1
# x, as character
set.seed(1000)
toReplace <- sample(letters[1:6], 15, replace=TRUE)
# a and b should mean 'first', c and d 'second' others: NA
key<-list(first=c("a", "b"), second=c("c", "d"), default=NA)
# do the replacement
categorize(toReplace, key)
# Example 2 - numeric entries and mixed types
# basic vector to be grouped
toReplace2<-1:16
# replacement rules: 5,6,7,8,9 should be "more", 11 should be "eleven" the rest: "other"
key2<-list(default="other", more=c(5,10),eleven=11)
categorize(toReplace2, key2)
# Example 3 - multiple occurrences of same values
# a and b should mean 'first', a and d should mean 'second', others: NA
key3<-list(first=c("a", "b"), second=c("a", "d"), default=NA)
# do the replacement (all "a" entries will be replaced with "second")
categorize(toReplace, key3)
Cleanse Species Name Vector
Description
This function takes a vector of binomial names with various qualifiers of open nomenclature and removes these qualifiers from the vector entries. Only the genus and species names remain.
Usage
cleansp(
x,
debug = FALSE,
collapse = "_",
subgenera = TRUE,
misspells = TRUE,
stems = TRUE
)
Arguments
x
debug
collapse
subgenera
misspells
stems
Details
This version will keep subgenera, and will not assign species to the base genus. The following qualifiers will be omitted: "n.", "sp.", "?", "gen.", "aff.", "cf.", "ex gr.", "subgen.", "spp" and informal species designated with letters. Entries with "informal" and "indet." in them will also be invalidated.
Value
A data.frame or character vector.
Author(s)
Adam T. Kocsis, Gwen Antell. Adam T. Kocsis wrote the main body of the function; the subroutines called by the misspells and stems arguments are the modified work of Gwen Antell.
Examples
examp <- c("Genus cf. species", "Genus spp.", "Family indet.",
"Mygenus yourspecies", "Okgenus ? questionsp",
"Genus (cf. Subgenus) aff. species")
cleansp(examp)
Fossil occurrences of scleractinian (stony) corals from the Paleobiology Database
Description
Example dataset to illustrate the package's basic functionalities.
Usage
data(corals)
Format
A data.frame with 29775 observations and 38 variables:
genus
Genus names of the occurrences. Cross-referenced with a compiled table; a simplified version of this table can be found in the supplementary material of Kiessling and Kocsis (2015).
collection_no
The number of the collection of the occurrence in the PaleoDB.
family
Family name of the occurrence.
abund_value
Abundance value.
abund_unit
Unit of abundance values.
reference_no
The reference number of the occurrence in the PaleoDB.
life_habit
The lifestyle of the occurring taxon.
diet
The diet of the occurring taxon.
country
Country of occurrence.
geoplate
Plate id of the occurrence.
lat
Present day latitude of the occurrence.
lng
Present day longitude of the occurrence.
paleolat
Reconstructed paleolatitude of the occurrence.
paleolng
Reconstructed paleolongitude of the occurrence.
period
Period of origin.
epoch
Epoch of origin.
subepoch
Subepoch of origin.
stage
Geologic stage of the embedding rocks.
early_interval
Early interval name registered in the PaleoDB dynamic time scale.
late_interval
Late interval name registered in the PaleoDB dynamic time scale.
max_ma
Maximum estimated age based on the PaleoDB dynamic time scale.
min_ma
Minimum estimated age based on the PaleoDB dynamic time scale.
stg
Bin number in the stage-level timescale stages.
ten
Bin number in the PaleoDB 10 million year resolution timescale tens.
env
Environment of the occurrence: reefal (r), non-reefal (nr) or unknown (uk), based on keys.
lith
Substrate of the occurrence: carbonate (c), siliciclastic (s) or unknown (uk), based on keys.
latgroup
Latitude of the occurrence: tropical (t) or non-tropical (nt).
bath
Inferred depth of the occurrence: deep (deep), shallow (shal) or unknown (uk), based on keys.
gensp
The binomen of the occurrence.
ecology
Symbiotic status of the occurring coral: zooxanthellate (z) or azooxanthellate (az, including apozooxanthellates).
ecologyMostZ
Symbiotic status of the occurring coral, incorporating the uncertainty of inferred symbiotic status. This variable includes assignment with the maximum likely number of zooxanthellate genera.
ecologyMostAZ
Symbiotic status of the occurring coral, incorporating the uncertainty of inferred symbiotic status. This variable includes assignment with the maximum likely number of azooxanthellate genera.
growth
Growth type of the coral: colonial or solitary.
integration
Integration of corallites on a scale from 0 to 4. Solitary corals are marked with 0s.
Details
This particular dataset was used in a study by Kiessling and Kocsis (2015). All occurrences of Scleractinia were downloaded from the Paleobiology Database (PaleoDB, https://paleobiodb.org/) on 23 September 2014, originally comprising 32420 occurrences. They were then cross-checked with data from Corallosphere (formerly accessible at http://corallosphere.org). See the article text for details.
References
Kiessling, W., & Kocsis, Á. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
Source
Time series from metrics of diversity dynamics
Description
This function calculates various metrics from occurrence datasets in the form of time series.
Usage
divDyn(
x,
tax,
bin = NULL,
age = NULL,
revtime = FALSE,
breaks = NULL,
coll = NULL,
ref = NULL,
om = NULL,
noNAStart = FALSE,
data.frame = TRUE,
filterNA = FALSE
)
Arguments
x
tax
bin
age
revtime
breaks
coll
ref
om
noNAStart: (logical) Useful when the entries in the
data.frame
filterNA
Details
The following variables are produced:
bin
: Bin number, or the numeric identifier of the bin.
tThrough
: Number of through-ranging taxa, taxa that have first occurrences before, and last occurrences after the focal bin.
tOri
: Number of originating taxa, taxa that have first occurrences in the focal bin, and last occurrences after it.
tExt
: Number of taxa getting extinct. These are taxa that have first occurrences before the focal bin, and last occurrences in it.
tSing
: Number of stratigraphic singleton (single-interval) taxa, taxa that only occur in the focal bin.
t2d
: Number of lower two timers (Alroy, 2008; 2014), taxa that are present in the i-1th and the ith bin (focal bin).
t2u
: Number of upper two timers (Alroy, 2008; 2014), taxa that are present in the ith (focal) and the i+1th bin.
tGFu
: Number of upper gap-fillers (Alroy, 2014), taxa that occur in bins i+2 and i-1, but were not found in i+1.
tGFd
: Number of lower gap-fillers (Alroy, 2014), taxa that occur in bins i-2 and i+1, but were not found in i-1.
t3
: Number of three-timer taxa (Alroy, 2008; 2014), present in bins i-1, i, and i+1.
tPart
: Part-timer taxa (Alroy, 2008; 2014), present in bins i-1 and i+1, but not in bin i.
extProp
: Proportional extinctions including single-interval taxa: (tExt + tSing) / (tThrough + tOri + tExt + tSing).
oriProp
: Proportional originations including single-interval taxa: (tOri + tSing) / (tThrough + tOri + tExt + tSing).
extPC
: Per capita extinction rates of Foote (1999). -log(tThrough/(tExt + tThrough)). Values are not normalized with bin lengths. Similar equations were used by Alroy (1996) but without taking the logarithm.
oriPC
: Per capita origination rates of Foote (1999). -log(tThrough/(tOri + tThrough)). Values are not normalized with bin lengths. Similar equations were used by Alroy (1996) but without taking the logarithm.
ext3t
: Three-timer extinction rates of Alroy (2008). log(t2d/t3).
ori3t
: Three-timer origination rates of Alroy (2008). log(t2u/t3).
extC3t
: Corrected three-timer extinction rates of Alroy (2008). ext3t[i] + log(samp3t[i+1]).
oriC3t
: Corrected three-timer origination rates of Alroy (2008). ori3t[i] + log(samp3t[i-1]).
divSIB
: Sampled-in-bin diversity (richness), the number of genera sampled in the focal bin.
divCSIB
: Corrected sampled-in-bin diversity (richness). divSIB/samp3t*totSamp3t, where totSamp3t is total three-timer sampling completeness of the dataset (Alroy, 2008).
divBC
: Boundary-crosser diversity (richness), the number of taxa with ranges crossing the boundaries of the interval. tExt + tOri + tThrough.
divRT
: Range-through diversity (richness), all taxa in the interval, based on the range-through assumption. (tSing + tOri + tExt + tThrough).
sampRange
: Range-based sampling probability, without observed range end-points (Foote): (divSIB - tExt - tOri - tSing)/tThrough.
samp3t
: Three-timer sampling completeness of Alroy (2008). t3/(t3+tPart)
extGF
: Gap-filler extinction rates of Alroy (2014). log((t2d + tPart)/(t3+tPart+tGFu))
oriGF
: Gap-filler origination rates of Alroy (2014). log((t2u + tPart)/(t3+tPart+tGFd))
E2f3
: Second-for-third extinction proportions of Alroy (2015). As these metrics are based on an algorithmic approach, please refer to Alroy (2015, p. 634, right column and Eq. 4) for the equations. See the source code (https://github.com/divDyn/r-package) for the exact implementation, found in the Metrics function in the diversityDynamics.R file.
O2f3
: Second-for-third origination proportions of Alroy (2015). Please see E2f3.
ext2f3
: Second-for-third extinction rates (based on Alroy, 2015). Transformed to the usual rate form with log(1/(1-E2f3)).
ori2f3
: Second-for-third origination rates (based on Alroy, 2015). Transformed to the usual rate form with log(1/(1-O2f3)).
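Several of the counts and rates listed above can be reproduced from a taxon-by-bin presence matrix. The sketch below uses a hypothetical helper, rateSketch(), not the package's Metrics function; it assumes consecutive bins, that every taxon occurs at least once, and it covers only tThrough, tExt, tOri, tSing, the two- and three-timer counts, the per-capita rates of Foote (1999) and the three-timer rates of Alroy (2008), following the definitions given above.

```r
# Hypothetical helper (not part of divDyn): selected counts and rates from a
# logical taxon-by-bin presence matrix (rows = taxa, columns = consecutive bins).
rateSketch <- function(pres) {
  nb <- ncol(pres)
  # first and last bin of each taxon; assumes every taxon occurs at least once
  fad <- apply(pres, 1, function(r) min(which(r)))
  lad <- apply(pres, 1, function(r) max(which(r)))
  res <- data.frame(bin = seq_len(nb), tThrough = NA, tExt = NA, tOri = NA,
                    tSing = NA, t2d = NA, t2u = NA, t3 = NA)
  for (i in seq_len(nb)) {
    # range-based categories relative to the focal bin i
    res$tThrough[i] <- sum(fad < i & lad > i)
    res$tExt[i] <- sum(fad < i & lad == i)
    res$tOri[i] <- sum(fad == i & lad > i)
    res$tSing[i] <- sum(fad == i & lad == i)
    # sampling-based categories need the neighboring bins
    if (i > 1) res$t2d[i] <- sum(pres[, i - 1] & pres[, i])
    if (i < nb) res$t2u[i] <- sum(pres[, i] & pres[, i + 1])
    if (i > 1 && i < nb) res$t3[i] <- sum(pres[, i - 1] & pres[, i] & pres[, i + 1])
  }
  # per-capita rates of Foote (1999), not normalized by bin length
  res$extPC <- -log(res$tThrough / (res$tExt + res$tThrough))
  res$oriPC <- -log(res$tThrough / (res$tOri + res$tThrough))
  # three-timer rates of Alroy (2008)
  res$ext3t <- log(res$t2d / res$t3)
  res$ori3t <- log(res$t2u / res$t3)
  res
}

pres <- rbind(A = c(TRUE, TRUE, TRUE, TRUE), B = c(TRUE, TRUE, FALSE, FALSE),
              C = c(FALSE, TRUE, TRUE, FALSE), D = c(FALSE, FALSE, TRUE, FALSE))
rateSketch(pres)
```

Bins at the edges of the series yield NA or NaN, because the required neighboring bins or through-ranging taxa are missing.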
References:
Foote, M. (1999). Morphological diversity in the evolutionary radiation of Paleozoic and post-Paleozoic crinoids. Paleobiology, 25, 1–115. doi:10.1017/S0094837300020236
Alroy, J. (2008). Dynamics of origination and extinction in the marine fossil record. Proceedings of the National Academy of Sciences, 105, 11536–11542. doi:10.1073/pnas.0802597105
Alroy, J. (2014). Accurate and precise estimates of origination and extinction rates. Paleobiology, 40, 374–397. doi:10.1666/13036
Alroy, J. (2015). A more precise speciation and extinction rate estimator. Paleobiology, 41, 633–639. doi:10.1017/pab.2015.26
Value
A data.frame object, with every row corresponding to a time bin.
Examples
# import data
data(corals)
data(stages)
# calculate metrics of diversity dynamics
dd <- divDyn(corals, tax="genus", bin="stg")
# plotting
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="range-through diversity (genera)", ylim=c(0,230))
lines(stages$mid, dd$divRT, lwd=2)
# with omission of single reference taxa
ddNoSing <- divDyn(corals, tax="genus", bin="stg", om="ref", ref="reference_no")
lines(stages$mid, ddNoSing$divRT, lwd=2, col="red")
# using the estimated ages (less robust) - 10 million years
# mean ages
corals$me_ma <- apply(corals[, c("max_ma", "min_ma")], 1, mean)
# ages reverse the direction of time (see the revtime argument)
ddRadio10 <- divDyn(corals, tax="genus", age="me_ma",
breaks=seq(250,0,-10))
lines(ddRadio10$me_ma, ddRadio10$divRT, lwd=2, col="green")
# legend
legend("topleft", legend=c("all", "no single-ref. taxa", "all, estimated ages"),
col=c("black", "red", "green"), lwd=c(2,2,2), bg="white")
FAD - LAD matrix from occurrence data
Description
Function to generate range data from an occurrence dataset.
Usage
fadlad(
x,
tax,
bin = NULL,
age = NULL,
revtime = FALSE,
na.rm = TRUE,
diffbin = TRUE
)
Arguments
x
tax
bin
age
revtime
na.rm
diffbin
Details
The function will output First and Last Appearance Dates of the taxa in the dataset. Keep in mind that incomplete sampling will influence these data and will make the ranges appear shrunken.
The following variables are produced:
row.names attribute: The names of the taxa.
FAD
: First appearance dates in time bin numbers or ages.
LAD
: Last appearance dates in time bin numbers or ages.
duration
: The durations of taxa in bin numbers or ages.
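For binned data, the three variables can be sketched with tapply() in base R. fadladSketch() is a hypothetical helper, not fadlad() itself; it drops records with missing taxon names or bins, and it assumes that duration counts the number of bins spanned (LAD - FAD + 1). The package's own convention, including the effect of diffbin and the handling of ages, may differ.

```r
# Hypothetical helper (not part of divDyn): first and last appearances of
# taxa from binned occurrences, one row per taxon.
fadladSketch <- function(tax, bin) {
  ok <- !is.na(tax) & !is.na(bin)   # drop incomplete records
  fad <- tapply(bin[ok], tax[ok], min)
  lad <- tapply(bin[ok], tax[ok], max)
  data.frame(FAD = as.numeric(fad), LAD = as.numeric(lad),
             # number of bins spanned (assumed convention)
             duration = as.numeric(lad - fad + 1),
             row.names = names(fad))
}

fadladSketch(c("a", "a", "b", "c", "c", "c", NA), c(3, 5, 4, 2, 2, 7, 1))
```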
Value
A data.frame, with rows corresponding to tax entries.
Examples
data(corals)
# binned data
flBinned <- fadlad(corals, tax="genus", bin="stg")
# using basic bin lengths
flDual <- fadlad(corals, tax="genus", age=c("max_ma", "min_ma"))
# single age estimate
data(stages)
corals$mid <- stages$mid[corals$stg]
flSingle <- fadlad(corals, tax="genus", age="mid")
Filling of missing values in a vector, based on the marginal values of the gaps
Description
The function will loop through a vector and will substitute NA values with the value it last encountered or replaced.
Usage
fill(x, forward = TRUE, inc = 0)
Arguments
x
forward
inc
Details
NAs won't be substituted when they are the first values the loop encounters.
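The looping logic can be sketched as follows. fillSketch() is a hypothetical helper meant to mirror the behavior described above and in the examples, not the package's implementation; it assumes that, with a non-zero inc, each filled value becomes the new reference for the next gap element, and that the backward pass is simply the forward pass applied to the reversed vector.

```r
# Hypothetical helper (not part of divDyn): fill NA gaps with the last value
# seen in the loop direction, optionally incremented by inc at each step.
fillSketch <- function(x, forward = TRUE, inc = 0) {
  if (!forward) return(rev(fillSketch(rev(x), forward = TRUE, inc = inc)))
  last <- NA
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      # leading NAs stay NA because no value has been seen yet
      if (!is.na(last)) {
        # inc = 0 keeps the original type (e.g. logical vectors)
        x[i] <- if (inc == 0) last else last + inc
        last <- x[i]
      }
    } else {
      last <- x[i]
    }
  }
  x
}

fillSketch(c(1, NA, 3, 1, 2, NA, NA, 9, NA, 3), inc = 1)
```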
Value
A vector of the same length as x, with the NA gaps filled.
Examples
# forward, replace with previous
dummy<- c(TRUE, FALSE, NA, TRUE, FALSE, NA)
fill(dummy)
# forward, replace with previous+1
dummy2 <- c(1,NA, 3, 1, 2, NA, NA, 9, NA,3)
fill(dummy2, inc=1)
# backward, replace with previous in loop direction
fill(dummy2, inc=0, forward=FALSE)
Estimation of geographic ranges from occurrence data
Description
Geographic range as a function of a set of coordinates or sample/site/cell memberships.
Usage
georange(x, lng = NULL, lat = NULL, loc = NULL, method = "co")
Arguments
x
lng
lat
loc
method
Details
Multiple estimators of geographic ranges are implemented based on coordinates or cell identifiers. The function outputs a vector of the results based on the calculation methods specified in method.
Value
A numeric vector with geographic ranges (multiple methods).
Examples
data(corals)
# select a taxon from a certain time slice
bitax <- corals[corals$stg==69 & corals$genus=="Microsolena",]
georange(bitax, lng="paleolng", lat="paleolat", method="co")
Scalar indices of diversity
Description
This function includes some indices that characterize a species-abundance/occurrence distribution.
Usage
indices(x, samp = NULL, method = NULL)
Arguments
x: either a
samp
method
Details
This set is not complete and does not intend to supersede additional R packages (e.g. vegan). However, some metrics are presented here because they are not implemented elsewhere or because they are invoked more frequently. The following entries can be added to the method argument of the function, and are also named accordingly in the output table/vector.
"richness"
: The number of sampled species.
"shannon"
: The Shannon entropy.
"dom"
: The Berger-Parker dominance index, the proportion of occurrences in the time bin that belong to the most frequent taxon.
"hill2"
: The second-order Hill number (Jost, 2006; q=2), which is calculated by default. Additional Hill numbers can be specified by adding "hillXX" to the method argument, such as "hill3" for q=3. The first-order Hill number is defined as the exponential of Shannon entropy (Eq. 3 in Jost, 2006).
"squares"
: The 'squares' richness estimator of J. Alroy (2018).
"chao2"
: The Chao2 estimator for incidence-based data.
"SCOR"
: The Sum Common Species Occurrence rate of Hannisdal et al. (2012). This method is only calculated if, in addition to the occurrence entries (vector), a collection vector is provided (see examples).
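The occurrence-based indices above follow directly from the relative frequencies of taxa. diversitySketch() is a hypothetical helper, not indices() itself; it computes richness, Shannon entropy (natural logarithm), Berger-Parker dominance and a Hill number of order q using Jost's (2006) formula (sum of p^q)^(1/(1-q)), with q = 1 taken as the limit exp(Shannon).

```r
# Hypothetical helper (not part of divDyn): simple indices from a vector of
# taxon names, one entry per occurrence.
diversitySketch <- function(taxa, q = 2) {
  counts <- table(taxa)
  p <- as.numeric(counts) / sum(counts)   # relative frequencies
  shannon <- -sum(p * log(p))              # Shannon entropy (natural log)
  # Hill number of order q (Jost, 2006); q = 1 is the limit exp(shannon)
  hill <- if (q == 1) exp(shannon) else sum(p^q)^(1 / (1 - q))
  c(richness = length(p), shannon = shannon, dom = max(p), hill = hill)
}

diversitySketch(c("a", "a", "b", "c"))
```

Order q = 0 returns the richness, and larger q gives more weight to the most frequent taxa.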
Value
A named numeric vector.
References
Alroy, J. 2018. Limits to species richness in terrestrial communities. Ecology Letters.
Hannisdal, B., Henderiks, J., & Liow, L. H. (2012). Long-term evolutionary and ecological responses of calcifying phytoplankton to changes in atmospheric CO2. Global Change Biology, 18(12), 3504–3516. https://doi.org/10.1111/gcb.12007
Jost, L. (2006). Entropy and diversity. Oikos, 113, 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
Examples
# the coral data
data(corals)
# Pleistocene subset
plei <- corals[corals$stg==94,]
# calculate everything
pleiIndex <- indices(plei$genus, plei$collection_no)
Keys to process stratigraphic, environmental and lithological information from the Paleobiology Database
Description
Lists of entries treated as indicators of similar characteristics
Usage
data(keys)
Format
A list of 7 lists:
tenInt
A list of vectors. Entries in the early_interval and late_interval variables of PaleoDB downloads indicate the collections' positions in the dynamic time scale. These entries were linked to the 10 million year-resolution time scale stored in tens. These links were compiled using a download from the FossilWorks website (used to be http://www.fossilworks.org/), on 08 June, 2018. You can check the lookup table stratkeys here. This is version 0.9.2.
stgInt
A list of vectors. Entries in the early_interval and late_interval variables of PaleoDB downloads indicate the collections' positions in the dynamic time scale. These entries were linked to the stage-resolution time scale stored in stages. See binInt for version information.
These entries are reliable only in the Post-Ordovician!
reefs
A list of vectors. Entries in the environment field of the PaleoDB download indicate information regarding the likely reefal origin of carbonatic rocks. See the vignette ('§PhaneroCurve') on the exact use of these data. v0.9.
lith
A list of vectors. Entries in the lithology1 field of the PaleoDB download indicate information regarding the substrate of the embedding rocks. This key maps the entries to "siliciclastic", "carbonate" or "unknown" substrates. v0.9.
lat
A list of vectors. Entries in the paleolat field of the PaleoDB download indicate information regarding the paleolatitude of the occurrences. This key maps the entries to "tropical" or "non-tropical" latitudes. v0.9.
grain
A list of vectors. Entries in the lithology1 field of the PaleoDB download indicate information regarding the grain sizes of the depositional environment. This key maps the entries to "coarse", "fine" or "unknown" grain sizes. v0.9.
depenv
A list of vectors. Entries in the environment field of the PaleoDB download indicate information regarding the onshore-offshore nature of the depositional environment. This key maps the entries to "onshore", "offshore" or "unknown" environment. v0.9.3
Details
Entries in the stratigraphic, lithological and environment fields of current Paleobiology Database downloads are too numerous to form the basis of analyses without transformations.
This variable includes potential groupings of entries that represent similar characteristics. These objects can be used by the categorize function to create new variables of stratigraphic, environmental and lithological information.
Source
Stratigraphic assignments are based on the download of collection data from Fossilworks (used to be http://www.fossilworks.org/) and the dynamic time scale of the Paleobiology Database, written by J. Alroy. The assignment of numeric values was done by A. Kocsis. Environmental variables were grouped by W. Kiessling.
Match the dates of a time-dependent variable with a predefined vector
Description
The function takes a variable x (e.g. a vector or a list object), and reorders it to best match the dates provided in a vector y.
Usage
matchtime(x, y, ...)
## S4 method for signature 'numeric'
matchtime(x, y, index = FALSE, ...)
## S4 method for signature 'character'
matchtime(x, y, index = FALSE, ...)
## S4 method for signature 'list'
matchtime(x, y, index = FALSE, ...)
Arguments
x: Object to be reordered to match
y
...: Additional arguments passed to class-specific methods.
index
Value
An object of the same class as x, or a numeric vector.
Examples
# original vector
orig <- 1:10
# target values
targ <- c(5.1,4.2, 3.4, 2.7, 2.3)
# how do the two series match the best?
matchtime(orig, targ)
Origination/extinction response table for statistical modelling.
Description
This function takes an occurrence dataset and reformats it to a table that can be used as input for logistic models.
Usage
modeltab(
x,
tax,
bin,
taxvars = NULL,
rt = FALSE,
singletons = FALSE,
probs = NULL
)
Arguments
x
tax
bin
taxvars
rt
singletons
probs
Details
Every entry in the output table corresponds to one cell in the bin/tax matrix. This function omits duplicates and appends two logical vectors (response variables) to the occurrence dataset:
The ori vector is TRUE in the interval in which the taxon first appeared, and FALSE in all others. The ext vector is TRUE in the interval in which the taxon appeared for the last time, and FALSE in the rest.
The true dates of extinction and origination are unknown; therefore, these events can only be expressed as probabilities. The argument probs allows the replacement of a binary response with two probability values, which are based on the apparent sampling patterns. For extinctions, when probs is set to "samp3t", the response parameter for extinctions in the last bin of appearance is set to the three-timer sampling completeness of the following bin. Assuming that the taxon's range offset is not larger than a whole bin, if the taxon did not go extinct in the bin in which it appeared for the last time, it is assumed to have gone extinct in the following bin, and the remainder (1 - sampling completeness) is assigned to that bin. The pattern is reversed for originations. For probs="sampRange", the range-based completeness measures are applied in a similar fashion. For Phanerozoic-scale analyses, a whole-bin difference between the apparent event and the actual event is reasonable. See more in Reddin et al. (2021). Note that the response probabilities are set to missing values (NAs) when the probabilities cannot be calculated. The variable ext is also set to NaN for the early virtual extension of the range, and ori is treated the same way for the late extension.
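The basic binary table (without the rt, singletons, taxvars or probs options) can be sketched in base R. modelTabSketch() is a hypothetical helper, not modeltab() itself; it keeps one row per taxon-bin cell and marks the first and last bin of each taxon, so a taxon known from a single bin gets TRUE in both columns.

```r
# Hypothetical helper (not part of divDyn): binary origination/extinction
# responses, one row per observed taxon-bin cell.
modelTabSketch <- function(tax, bin) {
  # duplicate occurrences within the same cell are dropped
  cells <- unique(data.frame(tax = tax, bin = bin, stringsAsFactors = FALSE))
  fad <- tapply(cells$bin, cells$tax, min)
  lad <- tapply(cells$bin, cells$tax, max)
  # TRUE in the first (ori) and in the last (ext) bin of each taxon
  cells$ori <- cells$bin == as.numeric(fad[cells$tax])
  cells$ext <- cells$bin == as.numeric(lad[cells$tax])
  cells <- cells[order(cells$tax, cells$bin), ]
  rownames(cells) <- NULL
  cells
}

modelTabSketch(c("a", "a", "a", "b", "b"), c(1, 3, 3, 2, 2))
```

Only observed cells appear here; filling in the unsampled bins between first and last appearances is what a range-through option would add.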
References:
Reddin, C. J., Kocsis, Á. T., Aberhan, M., & Kiessling, W. (2021). Victims of ancient hyperthermal events herald the fates of marine clades and traits under global warming. Global Change Biology, 27(4), 868–878. https://doi.org/10.1111/gcb.15434
Value
A data.frame with binary response variables.
Examples
# load necessary data
data(corals)
# simple table
modTab<-modeltab(corals, bin="stg", tax="genus", taxvars=c("ecology", "family"))
# probabilities for extinction modeling
modTab2 <- modeltab(corals, bin="stg", tax="genus", probs="samp3t")
# only extinction response (omit virtual origination extensions)
extTab <- modTab2[!is.nan(modTab2$ext), ]
# only origination response (omit virtual extinction extensions)
oriTab <- modTab2[!is.nan(modTab2$ori), ]
Omission of taxa that have a poor occurrence record
Description
Function to quickly omit single-collection and single-reference taxa.
Usage
omit(
x,
om = "ref",
tax = "genus",
bin = "bin",
coll = NULL,
ref = NULL,
filterNA = FALSE
)
Arguments
x
om
tax
bin
coll
ref
filterNA
Details
The function returns a logical vector, with a value for each row. TRUE values indicate rows to be omitted, FALSE values indicate rows to be kept. The function is embedded in the divDyn function, but can be called independently.
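The idea behind single-reference omission can be sketched with tapply(). omitSingleRef() is a hypothetical helper similar in spirit to om = "ref", not omit() itself; it ignores bins (unlike "binref") and counts a missing reference as one distinct value, which is one of the cases the filterNA argument addresses in the package.

```r
# Hypothetical helper (not part of divDyn): TRUE for occurrences of taxa
# that are documented by a single reference (rows to omit).
omitSingleRef <- function(tax, ref) {
  nRef <- tapply(ref, tax, function(r) length(unique(r)))
  as.vector(nRef[as.character(tax)] == 1)
}

toDrop <- omitSingleRef(c("a", "a", "b", "c", "c"), c(1, 2, 3, 4, 4))
toDrop
```

As with omit(), the result can index the data directly, e.g. x[!toDrop, ].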
Value
A logical vector.
Examples
# omit single-reference taxa
data(corals)
data(stages)
toOmit <- omit(corals, bin="stg", tax="genus", om="ref", ref="reference_no")
x <- corals[!toOmit,]
# within divDyn
# plotting
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="range-through diversity (genera)", ylim=c(0,230))
# multiple ref/slice required
ddNoSing <- divDyn(corals, tax="genus", bin="stg", om="binref", ref="reference_no")
lines(stages$mid, ddNoSing$divRT, lwd=2, col="red")
# with the recent included (NA reference value)
ddNoSingRec <- divDyn(corals, tax="genus", bin="stg",
om="binref", filterNA=TRUE,ref="reference_no")
lines(stages$mid, ddNoSingRec$divRT, lwd=2, col="blue")
# legend
legend("topleft", legend=c("no single-ref. taxa",
"no single-ref. taxa,\n with recent"),
col=c("red", "blue"), lwd=c(2,2))
Plot time series counts or proportions as polygons
Description
This function plots the changing shares of categories in association with an independent variable.
Usage
parts(
x,
b = NULL,
ord = "up",
prop = FALSE,
plot = TRUE,
col = NULL,
xlim = NULL,
border = NULL,
ylim = c(0, 1),
na.valid = FALSE,
labs = TRUE,
labs.args = NULL,
vertical = FALSE
)
Arguments
x |
|
b |
|
ord |
|
prop |
|
plot |
|
col |
|
xlim |
|
border |
|
ylim |
|
na.valid |
|
labs |
|
labs.args |
|
vertical |
|
Details
This function is useful for displaying the changing proportions of categories as time progresses. See the examples for the most frequent use cases.
To be added: missing portions are omitted in this version, but should be represented as gaps in the polygons.
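The quantities behind such a plot can be computed in base R. The sketch below tabulates the categories within each value of the independent variable (the same vectors as in the examples), converts them to within-bin shares, and stacks them into lower and upper band edges. We assume this is roughly what prop=TRUE displays; it is an illustration, not the internals of parts().

```r
# Shares of categories within each value of the independent variable,
# using the same vectors as the examples of parts().
slc <- c(rep(1, 5), rep(2, 7), rep(3, 6))
va <- c("a", "a", "b", "c", "c",
        "a", "b", "b", "b", "c", "d", "d",
        "a", "a", "a", "c", "c", "d")
counts <- table(slc, va)                  # rows: slc values, columns: categories
props <- prop.table(counts, margin = 1)   # each row sums to 1
# stacking the categories in column order gives the upper edge of each band
upper <- t(apply(props, 1, cumsum))
lower <- upper - props                    # lower edge = upper edge minus own share
```

Each band could then be drawn with polygon() between its lower and upper edges.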
Value
The function has no return value.
Examples
# dummy examples
# independent variable
slc<-c(rep(1, 5), rep(2,7), rep(3,6))
# the categories as they change
v1<-c("a", "a", "b", "c", "c") # 1
v2<-c("a", "b", "b", "b", "c", "d", "d") # 2
v3<-c("a", "a", "a", "c", "c", "d") #3
va<-c(v1, v2,v3)
# basic function
plot(NULL, NULL, ylim=c(0,1), xlim=c(0.5, 3.5))
parts(slc, va, prop=TRUE)
# vertical plot
plot(NULL, NULL, xlim=c(0,1), ylim=c(0.5, 3.5))
parts(slc, va, col=c("red" ,"blue", "green", "orange"), xlim=c(0.5,3.5),
labs=TRUE, prop=TRUE, vertical=TRUE)
# extensive use of optional arguments
plot(NULL, NULL, ylim=c(0,10), xlim=c(0.5, 3.5))
parts(slc, va, ord=c("b", "c", "d", "a"), col=c("red" ,"blue", "green", "orange"),
xlim=c(0.5,3.5), labs=TRUE, prop=FALSE,
labs.args=list(cex=1.3, col=c("black", "orange", "red", "blue")))
# just the values
parts(slc, va, prop=TRUE,plot=FALSE)
# real example
# the proportion of coral occurrences through time in terms of bathymetry
data(corals)
data(stages)
# time scale plot
tsplot(stages, shading="series", boxes="sys", xlim=c(250,0),
ylab="proportion of occurrences", ylim=c(0,1))
# plot of proportions
cols <- c("#55555588","#88888888", "#BBBBBB88")
types <- c("uk", "shal", "deep")
parts(x=stages$mid[corals$stg], b=corals$bath,
ord=types, col=cols, prop=TRUE,border=NA, labs=FALSE)
# legend
legend("left", inset=c(0.1,0), legend=c("unknown", "shallow", "deep"), fill=cols,
bg="white", cex=1.4)
Plotting ranges and occurrence distributions through time
Description
Visualization of occurrence data
Usage
ranges(
dat,
bin = NULL,
tax = NULL,
xlim = NULL,
ylim = c(0, 1),
total = "",
filt = "include",
occs = FALSE,
labs = FALSE,
decreasing = TRUE,
group = NULL,
gap = 0,
labels.args = NULL,
ranges.args = NULL,
occs.args = NULL,
total.args = NULL
)
Arguments
dat |
|
bin |
|
tax |
|
xlim |
|
ylim |
|
total |
|
filt |
|
occs |
|
labs |
|
decreasing |
|
group |
|
gap |
|
labels.args |
|
ranges.args |
|
occs.args |
|
total.args |
|
Details
This function draws a visual representation of the occurrence dataset: the interpolated ranges of the taxa and, if occs=TRUE, the occurrence points.
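The interpolated range of a taxon runs from its oldest to its youngest occurrence. As a minimal sketch, assuming single age estimates in Ma (larger values are older, like the est column in the examples), the range endpoints can be computed per taxon with tapply(). The toy data are hypothetical and this is not the code of ranges().

```r
# Hypothetical occurrence table with single age estimates in Ma
occ <- data.frame(
  genus = c("A", "A", "A", "B", "B", "C"),
  est   = c(120, 100, 80, 95, 70, 66),
  stringsAsFactors = FALSE
)
fad <- tapply(occ$est, occ$genus, max)  # oldest occurrence = first appearance
lad <- tapply(occ$est, occ$genus, min)  # youngest occurrence = last appearance
rng <- data.frame(genus = names(fad), fad = as.vector(fad),
                  lad = as.vector(lad), stringsAsFactors = FALSE)
# order from the earliest-originating taxon downwards
rng <- rng[order(rng$fad, decreasing = TRUE), ]
# each range could then be drawn as a horizontal segment, e.g.
# segments(rng$fad, seq_len(nrow(rng)), rng$lad, seq_len(nrow(rng)))
```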
Value
The function has no return value.
Examples
# import
data(stages)
data(corals)
# all ranges - using the age uncertainties of the occurrences
tsplot(stages, boxes="sys", xlim=c(250,0))
ranges(corals, bin=c("max_ma", "min_ma"), tax="genus", occs=FALSE)
# or use single estimates: assign age estimates to the occurrences
corals$est<-stages$mid[corals$stg]
# all ranges (including the recent)
tsplot(stages, boxes="sys", xlim=c(250,0))
ranges(corals, bin="est", tax="genus", occs=FALSE)
# closing on the Cretaceous, with occurrences
tsplot(stages, boxes="series", xlim=c(145,65), shading="short")
ranges(corals, bin="est", tax="genus", occs=TRUE, ranges.args=list(lwd=0.1))
# z and az separately
tsplot(stages, boxes="series", xlim=c(145,65), shading="short")
ranges(corals, bin="est", tax="genus", occs=FALSE, group="ecology",
ranges.args=list(lwd=0.1))
# same, show only taxa that originate within the interval
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(corals, bin="est", tax="genus", occs=TRUE, group="ecology", filt="orig" ,
labs=TRUE, labels.args=list(cex=0.5))
# same using the age uncertainties of the occurrence age estimates
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(corals, bin=c("max_ma", "min_ma"), tax="genus", occs=TRUE, group="ecology", filt="orig" ,
labs=TRUE, labels.args=list(cex=0.5))
# fully customized/ annotated
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(
corals, # dataset
bin="est", # bin column
tax="genus", # taxon column
occs=TRUE, # occurrence points will be plotted
group="growth", # separate ranges based on growth types
filt="orig" , # show only taxa that originate in the interval
ranges.args=list(
lwd=1, # set range width to 1
col=c("darkgreen", "darkred") # set color of the ranges (by groups)
),
total.args=list(
cex=2, # set the size of the group identifier labels
col=c("darkgreen", "darkred") # set the color of the group identifier labels
),
occs.args=list(
col=c("darkgreen", "darkred"),
pch=3
),
labs=TRUE, # taxon labels will be plotted
labels.args=list(
cex=0.4, # the sizes of the taxon labels
col=c("darkgreen", "darkred") # set the color of the taxon labels by group
)
)
Test of rate split (selectivity)
Description
This function will determine whether there are meaningful differences between the taxonomic rates in the individual time bins of two subsets of an occurrence database.
Usage
ratesplit(
x,
sel,
tax = "genus",
bin = "stg",
rate = "pc",
method = "AIC",
AICc = TRUE,
na.rm = TRUE,
alpha = NULL,
output = "simple"
)
Arguments
x |
|
sel |
|
tax |
|
bin |
|
rate |
|
method |
|
AICc |
|
na.rm |
|
alpha |
|
output |
|
Details
Splitting an occurrence database into subsets decreases the amount of information passed to the rate calculations, and therefore the precision of the individual estimates. As a consequence, our ability to tell apart two similar values decreases as the number of sampled taxa decreases. In order to assess the subsets individually and compare them, it is advised to test whether the split into two subsets is meaningful, given the total data. Examples of this use can be found in Kiessling and Simpson (2011) and Kiessling and Kocsis (2015).
The meaningfulness of the split depends on the accuracy of the estimates and the magnitude of the difference. Two different methods are implemented: binom and combine.
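The logic of an information-criterion test of a split can be sketched for a single time bin. The helper below is hypothetical and is not ratesplit(): it treats the rate as a binomial proportion (e.g. k originating taxa out of n taxa in each subset), which is a simplification of per capita rates, and compares a one-parameter model (one shared proportion) with a two-parameter model (one proportion per subset) using plain AIC.

```r
# Hypothetical helper (not divDyn's ratesplit()): for one time bin, compare a
# model with one shared proportion against a model with one proportion per
# subset, using binomial likelihoods and AIC.
split_favoured <- function(k1, n1, k2, n2) {
  pAll <- (k1 + k2) / (n1 + n2)
  # log-likelihood of the one-rate model (1 parameter)
  ll1 <- dbinom(k1, n1, pAll, log = TRUE) + dbinom(k2, n2, pAll, log = TRUE)
  # log-likelihood of the two-rate model (2 parameters)
  ll2 <- dbinom(k1, n1, k1 / n1, log = TRUE) + dbinom(k2, n2, k2 / n2, log = TRUE)
  aic1 <- 2 * 1 - 2 * ll1
  aic2 <- 2 * 2 - 2 * ll2
  aic2 < aic1   # TRUE: the split into two rates is supported
}
split_favoured(30, 100, 32, 100)   # similar proportions
split_favoured(10, 100, 40, 100)   # clearly different proportions
```

The extra parameter of the split model has to buy more than one unit of log-likelihood before the split is preferred, which is why small differences estimated from few taxa are not considered meaningful.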
References
Foote, M. (1999). Morphological diversity in the evolutionary radiation of Paleozoic and post-Paleozoic crinoids. Paleobiology, 25, 1–115. https://doi.org/10.1017/S0094837300020236
Kiessling, W., & Simpson, C. (2011). On the potential for ocean acidification to be a general cause of ancient reef crises. Global Change Biology, 17(1), 56-67.
Kiessling, W., & Kocsis, A. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
Value
A list of two numeric vectors.
Examples
# example with the coral dataset of Kiessling and Kocsis (2015)
data(corals)
data(stages)
# split by ecology
z<-corals[corals$ecology=="z",]
az<-corals[corals$ecology=="az",]
# calculate diversity dynamics
ddZ<-divDyn(z, tax="genus", bin="stg")
ddAZ<-divDyn(az, tax="genus", bin="stg")
# origination rate plot
tsplot(stages, boxes="sys", shading="series", xlim=54:95,
ylab="raw per capita originations")
lines(stages$mid, ddZ$oriPC, lwd=2, lty=1, col="blue")
lines(stages$mid, ddAZ$oriPC, lwd=2, lty=2, col="red")
legend("topright", inset=c(0.1,0.1), legend=c("z", "az"),
lwd=2, lty=c(1,2), col=c("blue", "red"), bg="white")
# The ratesplit function
rs<-ratesplit(rbind(z, az), sel="ecology", tax="genus", bin="stg")
rs
# display selectivity with points
# select the higher rates
selIntervals<-cbind(ddZ$oriPC[rs$ori], ddAZ$oriPC[rs$ori])
groupSelector<-apply(selIntervals, 1, function(w) w[1]<w[2])
# draw the points
points(stages$mid[rs$ori[groupSelector]], ddAZ$oriPC[rs$ori[groupSelector]],
pch=16, col="red", cex=2)
points(stages$mid[rs$ori[!groupSelector]], ddZ$oriPC[rs$ori[!groupSelector]],
pch=16, col="blue", cex=2)
Replicate matching and merging
Description
This pseudo-generic function iterates a function on the subelements of a list of objects that have the same class and matching dimensions/names and reorganizes the result to match the structure of the replicates or a prototype template.
Usage
repmatch(x, FUN = NULL, proto = NULL, direct = c("dim", "name"), ...)
Arguments
x |
|
FUN |
|
proto |
|
direct |
|
... |
arguments passed to |
Details
The function is designed to unify/merge objects that result from the same function applied to different source data (e.g. the results of subsample()). In its current form, the function supports vectors (including one-dimensional tables and arrays), matrix and data.frame objects.
Value
If FUN is a function, the output is a vector for vector-like replicates, a matrix when x is a list of matrix objects, and a data.frame for data.frame replicates. If FUN=NULL: if x is a list of vectors, the function returns a matrix; if x is a list of matrix objects, an array is returned; if x is a list of data.frame objects, the function returns a data.frame.
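The name-based matching of replicates against a prototype can be illustrated in base R. The helper align_and_apply() below is hypothetical and not repmatch(): it aligns each named vector on the names of the prototype (missing names become NA), binds the replicates as columns and applies FUN across them, with extra arguments such as na.rm passed through.

```r
# Hypothetical helper, not divDyn's repmatch(): align named vectors on
# the names of a prototype, then apply FUN across the replicates.
align_and_apply <- function(x, proto, FUN = mean, ...) {
  # one column per replicate, one row per prototype name (NA if missing)
  mat <- sapply(x, function(v) v[names(proto)])
  rownames(mat) <- names(proto)
  apply(mat, 1, FUN, ...)
}
a <- c(a = 1, b = 2, d = 4)
b <- c(b = 4, c = 6)
proto <- setNames(rep(NA, 4), letters[1:4])
res <- align_and_apply(list(a, b), proto, na.rm = TRUE)
```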
Examples
# basic example
vect <- rnorm(100)
# make 50 replicates
repl <- rep(list(vect), 50)
repmatch(repl, FUN=mean, direct="dim")
# named input
# two vectors
# a
a<- 1:10
names(a) <- letters[1:length(a)]
a[c(3,5,8)] <- NA
a <- a[!is.na(a)]
#b
b<- 10:1
names(b) <- letters[length(b):1]
b[c(1, 3,6, length(b))]<- NA
b <- b[!is.na(b)]
# list
x2 <- rep(c(list(a),list(b)), 3)
# simple match - falling through "dim" to "name" directive
repmatch(x2, FUN=NULL)
# prototyped
prot <- 1:10
names(prot) <-letters[1:10]
repmatch(x2, FUN=mean, proto=prot, na.rm=TRUE)
Determination and omission of consecutive duplicates in a vector.
Description
seqduplicated(): determines which elements of a vector are duplicates (similarly to duplicated) in consecutive rows.
collapse(): omits duplicates similarly to unique, but only in consecutive rows, so the sequence of state changes remains, but without duplicates.
Usage
seqduplicated(x, na.rm = FALSE, na.breaks = TRUE)
collapse(x, na.rm = FALSE, na.breaks = TRUE)
Arguments
x |
|
na.rm |
|
na.breaks |
|
Details
These functions essentially check whether the value at a given index of a vector is the same as the value at the previous index. This seemingly primitive task had to be rewritten with Rcpp for speed and for the appropriate handling of NA values.
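The core comparison can be written in vectorized base R by comparing the vector with itself shifted by one position. The sketch below is hypothetical and simpler than seqduplicated(): it treats NA as never being a duplicate (so NA always breaks a sequence), which may differ from the package's NA handling controlled by na.rm and na.breaks.

```r
# Hypothetical base-R sketch: an element is a consecutive duplicate if it
# equals the previous element; comparisons involving NA count as FALSE.
seqdup_sketch <- function(x) {
  if (length(x) < 2) return(rep(FALSE, length(x)))
  same <- x[-1] == x[-length(x)]   # compare each element with its predecessor
  same[is.na(same)] <- FALSE       # NA never counts as a duplicate here
  c(FALSE, same)                   # the first element has no predecessor
}
examp <- c(4, 3, 3, 3, 2, 2, 1, NA, 3, 3, 1, NA, NA, 5, NA, 5)
dup <- seqdup_sketch(examp)
collapsed <- examp[!dup]   # collapse()-like result under this NA rule
```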
Value
A logical vector (seqduplicated()) or the collapsed vector (collapse()).
Examples
# example vector
examp <- c(4,3,3,3,2,2,1,NA,3,3,1,NA,NA,5, NA, 5)
# seqduplicated()
seqduplicated(examp)
# contrast with
duplicated(examp)
# with NA removal
seqduplicated(examp, na.rm=TRUE)
# the same with collapse()
collapse(examp)
# contrast with
unique(examp)
# with NA removal
collapse(examp, na.rm=TRUE)
# with NA removal, no breaking
collapse(examp, na.rm=TRUE, na.breaks=FALSE)
Quantile plot of time series
Description
This intermediate-level function will plot a time series with the quantiles shown with transparency values.
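The idea of quantiles shown with transparency can be sketched in base R: compute nested, symmetric quantile bands for every time point and draw each as a semi-transparent polygon, so that overlapping layers get darker towards the median. The data, bin midpoints and band probabilities below are hypothetical, and this is not the implementation of shades().

```r
set.seed(3)
mid <- seq(100, 10, by = -10)   # hypothetical bin midpoints (Ma)
# one row per bin, one column per replicate value
vals <- t(sapply(mid, function(m) rnorm(150, 0, 1)))
# lower-tail probabilities of nested symmetric bands (0-1, 0.1-0.9, ...)
probs <- c(0, 0.1, 0.2, 0.3, 0.4)
plot(NULL, xlim = rev(range(mid)), ylim = c(-4, 4),
     xlab = "Age (Ma)", ylab = "value")
for (p in probs) {
  lo <- apply(vals, 1, quantile, probs = p)
  hi <- apply(vals, 1, quantile, probs = 1 - p)
  # overlapping semi-transparent polygons get darker towards the median
  polygon(c(mid, rev(mid)), c(lo, rev(hi)),
          col = adjustcolor("blue", alpha.f = 0.2), border = NA)
}
```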
Usage
shades(
x,
y,
col = "black",
res = 10,
border = NA,
interpolate = FALSE,
method = "symmetric",
na.rm = FALSE
)
Arguments
x |
|
y |
|
col |
|
res |
|
border |
|
interpolate |
|
method |
|
na.rm |
|
Value
The function has no return value.
Examples
# some random values across the Phanerozoic
data(stages)
tsplot(stages, boxes="sys", shading="series", ylim=c(-5,5), ylab=c("normal distributions"))
randVar <- t(sapply(1:95, FUN=function(x){rnorm(150, 0,1)}))
shades(stages$mid, randVar, col="blue", res=10,method="symmetric")
# a bottom-bounded distribution (log normal)
tsplot(stages, boxes="sys", shading="series", ylim=c(0,30), ylab="log-normal distributions")
randVar <- t(sapply(1:95, FUN=function(x){rlnorm(150, 0,1)}))
shades(stages$mid, randVar, col="blue", res=c(0,0.33, 0.66, 1),method="decrease")
List of singleton taxa
Description
The function returns lists of taxa that occur with only one particular entry in a given variable.
Usage
singletons(
dat,
tax = "clgen",
var = NULL,
bin = NULL,
bybin = FALSE,
na.rm = TRUE
)
Arguments
dat |
|
tax |
|
var |
|
bin |
|
bybin |
|
na.rm |
|
Details
Singletons are defined in a number of ways in the literature. True singletons are species that are represented by only one specimen, but one can also talk about single-occurrence, single-interval, single-reference or single-collection taxa. These can be returned with this function.
As the time bin has particular importance, it is possible to filter singleton taxa in the context of a single bin. These can be returned with the bybin argument, which constrains and iterates the filtering to every bin.
If this argument is set to TRUE and the variable in question is a reference, then single-reference taxa will be taxa that occurred in only one reference within each bin. This does not necessarily mean that only one reference describes the taxon in the total database!
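The difference between the total-dataset and the per-bin definitions can be shown with a small base-R sketch. The helper single_in() and the toy data are hypothetical and not the code of singletons(): it counts the distinct values of a variable per taxon, and the per-bin variant simply repeats this within every bin.

```r
occ <- data.frame(
  genus = c("A", "A", "B", "B", "C"),
  stg   = c(1, 2, 1, 1, 2),
  ref   = c(10, 11, 12, 12, 13),
  stringsAsFactors = FALSE
)
# hypothetical helper: taxa recorded in only one distinct value of var
single_in <- function(dat, tax, var) {
  n <- tapply(dat[[var]], dat[[tax]], function(v) length(unique(v)))
  names(n)[n == 1]
}
singInt <- single_in(occ, "genus", "stg")   # single-interval taxa
singRef <- single_in(occ, "genus", "ref")   # single-reference taxa (total)
# per-bin variant: apply the same filter within every bin
perBin <- lapply(split(occ, occ$stg), single_in, tax = "genus", var = "ref")
```

Genus A has two references overall, so it is not a single-reference taxon in the total dataset, yet it appears in perBin for both bins because each bin holds only one of its references.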
Value
A vector of character entries in tax
.
Examples
# load example dataset
data(corals)
# Example 1. single-occurrence taxa
singOcc <- singletons(corals, tax="genus", bin="stg")
# Example 2. output for every bin
singOccBin <- singletons(corals, tax="genus", bin="stg", bybin=TRUE)
# Example 3. single-interval taxa (all)
singInt <- singletons(corals, tax="genus", var="stg")
# Example 4. single interval taxa (for every bin)
singIntBin <- singletons(corals, tax="genus", var="stg", bin="stg", bybin=TRUE)
# Example 5. single reference taxa (total dataset)
singRef <- singletons(corals, tax="genus", var="reference_no")
# Example 6. single reference taxa (see Details for differences)
singRefBin <- singletons(corals, tax="genus", var="reference_no", bin="stg", bybin=TRUE)
Discretization of continuous time dimension - slicing
Description
The function slices time with a given set of boundaries and produces a time scale object if desired.
Usage
slice(x, breaks, offset = 0, ts = TRUE, revtime = TRUE)
Arguments
x |
|
breaks |
|
offset |
|
ts |
|
revtime |
|
Details
Due to stratigraphic constraints, deep time data can only be processed when they are sliced into discrete bins. It is suggested that you do this separately for most of your analyses. This function is also used by the divDyn function when age entries are provided.
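Slicing continuous ages into bins can be sketched with base R's findInterval(). The sketch below numbers the bins so that bin 1 is the oldest (as in the stg numbering of the stages table) and builds a matching time scale table with bottom, top, mid and dur columns. It is an illustration under these assumptions, not the code of slice(), and we do not claim it reproduces the exact output of revtime or ts.

```r
set.seed(1)
y <- runif(200, 0, 100)          # hypothetical ages in Ma
breaks <- seq(0, 100, 10)
nBins <- length(breaks) - 1
# findInterval() numbers the intervals from the youngest (0-10 Ma) upwards
young <- findInterval(y, breaks, rightmost.closed = TRUE)
# reverse so that bin 1 is the oldest interval
bin <- nBins + 1 - young
# a matching time scale table: one row per bin
tscale <- data.frame(bin = seq_len(nBins),
                     bottom = rev(breaks[-1]),
                     top = rev(breaks[-length(breaks)]))
tscale$mid <- (tscale$bottom + tscale$top) / 2
tscale$dur <- tscale$bottom - tscale$top
```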
Value
Either the new entries and their levels, or a time scale object.
Examples
y<- runif(200, 0,100)
au <- slice(y, breaks=seq(0, 100, 10))
withOut <- slice(y, breaks=seq(0, 100, 10), ts=FALSE)
95-bin Phanerozoic time scale based on the stratigraphic stages of Gradstein et al. (2020).
Description
Stage-level (age-level) timescale used in some analyses.
Usage
data(stages)
Format
A data.frame with 95 observations and 13 variables:
sys
Abbreviations of geologic systems.
system
Geologic periods.
series
Geologic series.
stage
Names of geologic stages.
short
Abbreviations of geologic stages.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric age midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
stg
Integer number identifiers of the bins.
systemCol
Hexadecimal color code of the systems.
seriesCol
Hexadecimal color code of the series.
col
Hexadecimal color code of the stages.
Details
This is an example time scale object that can be used in Phanerozoic-scale analyses. Example occurrence datasets related to the package use the variable stg when referring to this timescale. This version uses the longer Rhaetian option.
References
Gradstein, F. M., Ogg, J. G., & Schmitz, M. D. (2020). The geologic time scale 2020. Elsevier.
Source
Based on Gradstein et al. (2020).
95-bin Phanerozoic time scale based on the stratigraphic stages of Ogg et al. (2016), with updated dates in some intervals (2018).
Description
Stage-level (age-level) timescale used in some analyses.
Usage
data(stages2018)
Format
A data.frame with 95 observations and 13 variables:
sys
Abbreviations of geologic systems.
system
Geologic periods.
series
Geologic series.
stage
Names of geologic stages.
short
Abbreviations of geologic stages.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric age midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
stg
Integer number identifiers of the bins.
systemCol
Hexadecimal color code of the systems.
seriesCol
Hexadecimal color code of the series.
col
Hexadecimal color code of the stages.
Details
This is an example time scale object that can be used in Phanerozoic-scale analyses. Example occurrence datasets related to the package use the variable stg when referring to this timescale.
This is the stages object used until divDyn version 0.8.1.
References
Ogg, J. G., G. Ogg, and F. M. Gradstein. 2016. A concise geologic time scale: 2016. Elsevier.
Source
Based on Ogg et al. (2016), compiled by Wolfgang Kiessling.
The FossilWorks-based lookup table for the stratigraphic assignments of collections in the Paleobiology Database
Description
Table including the user-chosen interval data and the stratigraphic units of the dynamic timescale.
Usage
data(stratkeys)
Format
A data.frame with 761 observations of 8 variables:
interval
The names of the registered intervals in the early_interval/max_interval and late_interval/min_interval columns.
period
The period containing the interval.
epoch
The epoch containing the interval.
X10_my_bin
The 10 million year time scale interval containing the interval.
ten
Numeric identifier of the 10 million year interval in the tens object.
stage
The stage containing the interval.
stg
Numeric identifier of the interval in the stage-level time scale provided as the stages object.
Details
Since the separation of the FossilWorks portal (formerly http://www.fossilworks.org/) from the Paleobiology Database (https://paleobiodb.org/), access to the stratigraphic information in the database has been problematic. This table includes groupings of early_interval/max_interval entries of the dynamic timescale that users can choose during collection entry. The table assigns these intervals to corresponding stratigraphic units of different time scales.
These entries were distilled from those collections that only have a max_interval value. As there is a mismatch between the data of the Paleobiology Database and FossilWorks, this list is not comprehensive and a couple of entries are probably missing. For this reason, this dataset is expected to be updated in the future.
This particular version (v0.9.2) is based on a download of all collections in FossilWorks between the Ediacaran and the Holocene. The download took place on 22 June 2018. The entries were transformed to keys to be used with the categorize function. Some entries were corrected manually.
Source
Formerly http://www.fossilworks.org/.
Utility functions for slicing gappy time series
Description
The function returns where the continuous streaks start and how long they are, which can be used for efficient and flexible subsetting.
Usage
streaklog(x)
whichmaxstreak(x, which = -1)
Arguments
x |
|
which |
|
Details
The output list of streaklog contains the following elements:
starts: the indices where the streaks start.
streaks: the lengths of the individual streaks (number of values).
runs: the number of streaks.
The function whichmaxstreak() returns the indices of those values that are in the longest continuous streak.
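The same bookkeeping can be sketched with base R's run-length encoding, rle(), applied to the non-missing positions. The helpers streaklog_sketch() and longest_streak() are hypothetical, not the package functions; they assume that NA values are the gaps and that ties for the longest streak are resolved by taking the first one.

```r
# Hypothetical rle()-based sketch: streaks are runs of non-NA values.
streaklog_sketch <- function(x) {
  r <- rle(!is.na(x))              # runs of present (TRUE) / missing (FALSE)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- r$values                 # keep only the runs of present values
  list(starts = starts[keep], streaks = r$lengths[keep], runs = sum(keep))
}
longest_streak <- function(x) {
  s <- streaklog_sketch(x)
  i <- which.max(s$streaks)        # first streak of maximal length
  seq(s$starts[i], length.out = s$streaks[i])
}
b <- 40:1
b[c(1:4, 15, 19, 23:27)] <- NA
s <- streaklog_sketch(b)
longest <- longest_streak(b)
```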
Value
A list (streaklog) or a numeric vector (whichmaxstreak).
Examples
# generate a sequence of values
b<-40:1
# add some gaps
b[c(1:4, 15, 19, 23:27)] <- NA
# the functions
streaklog(b)
whichmaxstreak(b)
Subsampling wrapper function
Description
The function takes a function that has an occurrence dataset as an argument, and reruns it iteratively on subsets of the dataset.
Usage
subsample(
x,
q,
tax = NULL,
bin = NULL,
FUN = divDyn,
coll = NULL,
iter = 50,
type = "cr",
keep = NULL,
rem = NULL,
duplicates = TRUE,
output = "arit",
useFailed = FALSE,
FUN.args = NULL,
na.rm = FALSE,
counter = TRUE,
...
)
Arguments
x |
|
q |
|
tax |
|
bin |
|
FUN |
|
coll |
|
iter |
|
type |
|
keep |
|
rem |
|
duplicates |
|
output |
|
useFailed |
|
FUN.args |
|
na.rm |
|
counter |
|
... |
arguments passed to |
Details
The subsample function implements the iterative framework of the sampling standardization procedure. The function
1. takes the dataset x,
2. runs FUN on the dataset and creates a container for the results of the trials,
3. runs one of the subsampling trial functions (e.g. subtrialCR) to get a subsampled 'trial dataset',
4. runs FUN on the trial dataset (steps 3 and 4 are repeated iter times), and
5. averages the results of the trials for simple outputs of step 4, such as vectors, matrices and data.frames.
For averaging, the vectors and matrices have to have the same output dimensions in the subsampling as in the original object. For data.frames, the bin-specific information has to be in rows, and the bin numbers have to be given in a variable bin in the output of FUN.
For a detailed treatment of what the function does, please see the vignette ('Handout to the R package 'divDyn' v0.5.0 for diversity dynamics from fossil occurrence data'). Currently, Classical Rarefaction ("cr", Raup, 1975), occurrence-weighted by-list subsampling ("oxw", Alroy et al., 2001) and Shareholder Quorum Subsampling ("sqs", Alroy, 2010) are implemented.
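The iterate-subsample-average loop described above can be sketched in base R for the simplest case. The helper rarefy_sib() below is hypothetical and is not subsample(): it uses classical rarefaction (q occurrences drawn per bin without replacement), sampled-in-bin richness as the metric, leaves bins with fewer than q occurrences as NA, and reports the arithmetic mean of the trials.

```r
# Hypothetical sketch (not divDyn's subsample()): classical rarefaction
# with sampled-in-bin (SIB) richness, averaged over iter trials.
rarefy_sib <- function(x, tax, bin, q, iter = 50) {
  bins <- sort(unique(x[[bin]]))
  res <- matrix(NA_real_, nrow = length(bins), ncol = iter,
                dimnames = list(bins, NULL))
  for (it in seq_len(iter)) {
    for (b in seq_along(bins)) {
      rows <- which(x[[bin]] == bins[b])
      if (length(rows) < q) next   # bin fails the quota: left as NA
      pick <- rows[sample.int(length(rows), q)]
      res[b, it] <- length(unique(x[[tax]][pick]))
    }
  }
  rowMeans(res)   # arithmetic mean of the trials
}
set.seed(2)
occ <- data.frame(
  genus = c(rep("A", 10), "B", "C", "D", letters[1:6]),
  stg   = c(rep(1, 10), rep(2, 3), rep(3, 6)),
  stringsAsFactors = FALSE
)
r <- rarefy_sib(occ, tax = "genus", bin = "stg", q = 5, iter = 10)
```

sample.int() is used on indices instead of sample(rows, q) to avoid R's special behaviour of sample() when given a single number.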
References:
Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.
Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910
Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135
Value
Either a list of replicates or an object matching the class of the output of FUN.
Examples
data(corals)
data(stages)
# Example 1-calculate metrics of diversity dynamics
dd <- divDyn(corals, tax="genus", bin="stg")
rarefDD<-subsample(corals,iter=30, q=50,
tax="genus", bin="stg", output="dist", keep=95)
# plotting
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="range-through diversity (genera)", ylim=c(0,230))
lines(stages$mid, dd$divRT, lwd=2)
shades(stages$mid, rarefDD$divRT, col="blue")
legend("topleft", legend=c("raw","rarefaction"),
col=c("black", "blue"), lwd=c(2,2), bg="white")
# Example 2-SIB diversity
# draft a simple function to calculate SIB diversity
sib<-function(x, bin, tax){
calc<-tapply(INDEX=x[,bin], X=x[,tax], function(y){
length(levels(factor(y)))
})
return(calc[as.character(stages$stg)])
}
sibDiv<-sib(corals, bin="stg", tax="genus")
# calculate it with subsampling
rarefSIB<-subsample(corals,iter=25, q=50,
tax="genus", bin="stg", output="arit", keep=95, FUN=sib)
rarefDD<-subsample(corals,iter=25, q=50,
tax="genus", bin="stg", output="arit", keep=95)
# plot
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="SIB diversity (genera)", ylim=c(0,230))
lines(stages$mid, rarefDD$divSIB, lwd=2, col="black")
lines(stages$mid, rarefSIB, lwd=2, col="blue")
# Example 3 - different subsampling types with default function (divDyn)
# compare different subsampling types
# classical rarefaction
cr<-subsample(corals,iter=25, q=20,tax="genus", bin="stg", output="dist", keep=95)
# by-list subsampling (unweighted) - 3 collections
UW<-subsample(corals,iter=25, q=3,tax="genus", bin="stg", coll="collection_no",
output="dist", keep=95, type="oxw", xexp=0)
# occurrence weighted by list subsampling
OW<-subsample(corals,iter=25, q=20,tax="genus", bin="stg", coll="collection_no",
output="dist", keep=95, type="oxw", xexp=1)
SQS<-subsample(corals,iter=25, q=0.4,tax="genus", bin="stg", output="dist", keep=95, type="sqs")
# plot
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="range-through diversity (genera)", ylim=c(0,100))
shades(stages$mid, cr$divRT, col="red")
shades(stages$mid, UW$divRT, col="blue")
shades(stages$mid, OW$divRT, col="green")
shades(stages$mid, SQS$divRT, col="cyan")
legend("topleft", bg="white", legend=c("CR (20)", "UW (3)", "OW (20)", "SQS (0.4)"),
col=c("red", "blue", "green", "cyan"), lty=c(1,1,1,1), lwd=c(2,2,2,2))
Subsampling trial functions
Description
These functions create one subsampling trial dataset with a desired subsampling method.
Usage
subtrialCR(
x,
q,
bin = NULL,
unit = NULL,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE
)
subtrialOXW(
x,
q,
bin = NULL,
coll = NULL,
xexp = 1,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE
)
subtrialSQS(
x,
tax,
q,
bin = NULL,
coll = NULL,
ref = NULL,
singleton = "occ",
excludeDominant = FALSE,
largestColl = FALSE,
fcorr = "good",
byList = FALSE,
keep = NULL,
useFailed = FALSE,
showFailed = FALSE,
appr = "under"
)
Arguments
x |
|
q |
|
bin |
|
unit |
|
keep |
|
useFailed |
|
showFailed |
|
coll |
|
xexp |
|
tax |
|
ref |
|
singleton |
|
excludeDominant |
|
largestColl |
|
fcorr |
|
byList |
|
appr |
|
Details
The essence of these functions is present within the subsampling wrapper function subsample. Each function implements a certain subsampling type.
By default, the return value of the functions is a logical vector indicating which rows of the original dataset should be present in the subsample.
The inexact method for SQS is implemented here, as it is computationally less demanding.
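A single trial that returns a logical row selector can be sketched for the unweighted by-list case (the xexp=0 setting used in the subsample() examples): draw q collections in every bin and keep all rows of the chosen collections. The helper trial_bylist() is hypothetical and not subtrialOXW(); it assumes no NA bins or collections, and simply drops bins with fewer than q collections.

```r
# Hypothetical sketch of one unweighted by-list trial: draw q collections
# per bin and return a logical row selector.
trial_bylist <- function(x, bin, coll, q) {
  keep <- logical(nrow(x))                 # all FALSE to start with
  for (b in unique(x[[bin]])) {
    inBin <- x[[bin]] == b
    colls <- unique(x[[coll]][inBin])
    if (length(colls) < q) next            # too few collections: bin dropped
    chosen <- colls[sample.int(length(colls), q)]
    keep <- keep | (inBin & x[[coll]] %in% chosen)
  }
  keep
}
set.seed(4)
occ <- data.frame(
  stg  = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  coll = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5)
)
sel <- trial_bylist(occ, bin = "stg", coll = "coll", q = 2)
```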
References:
Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.
Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910
Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135
Value
A logical vector.
Examples
#one classical rarefaction trial
data(corals)
# return 5 references for each stage
bRows<-subtrialCR(corals, bin="stg", unit="reference_no", q=5)
# control
unCor<-unique(corals[bRows,c("stg", "reference_no")])
table(unCor$stg)
Occurrence database summary
Description
The function calculates global statistics of the entire database.
Usage
sumstat(
x,
tax = "genus",
bin = "stg",
coll = NULL,
ref = NULL,
duplicates = NULL
)
Arguments
x |
|
tax |
|
bin |
|
coll |
|
ref |
|
duplicates |
|
Details
The function returns the following values.
bins: The total number of bins sampled.
occs: The total number of sampled occurrences.
colls: The total number of sampled collections.
refs: The total number of sampled references.
taxa: The total number of sampled taxa.
gappiness: The proportion of sampling gaps in the ranges of the taxa (without the range endpoints).
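Most of these statistics are simple counts of distinct values. For gappiness, the sketch below assumes one plausible reading of the definition: for every taxon, take the bins strictly inside its range (excluding the first and last bin), count how many of them lack occurrences, and divide the total number of such gaps by the total number of inner bins. The helper gappiness_sketch() is hypothetical and may differ from what sumstat() computes.

```r
occ <- data.frame(
  genus = c("A", "A", "A", "B", "B", "C"),
  stg   = c(1, 2, 5, 2, 4, 3),
  coll  = c(1, 2, 3, 2, 4, 5),
  stringsAsFactors = FALSE
)
counts <- c(bins = length(unique(occ$stg)), occs = nrow(occ),
            colls = length(unique(occ$coll)), taxa = length(unique(occ$genus)))
# hypothetical gappiness: unsampled inner bins / all inner bins of the ranges
gappiness_sketch <- function(x, tax, bin) {
  per <- lapply(split(x[[bin]], x[[tax]]), function(s) {
    s <- unique(s)
    # bins strictly between the first and last occurrence
    inner <- setdiff(seq(min(s), max(s)), c(min(s), max(s)))
    c(inner = length(inner), gaps = length(setdiff(inner, s)))
  })
  tot <- Reduce(`+`, per)   # sum over taxa
  unname(tot["gaps"] / tot["inner"])
}
gap <- gappiness_sketch(occ, tax = "genus", bin = "stg")
```

Here genus A (bins 1 to 5) has inner bins 2, 3 and 4, of which 3 and 4 are unsampled; genus B (bins 2 to 4) has one unsampled inner bin; genus C has no inner bins. That gives 3 gaps out of 4 inner bins.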
Value
A named numeric vector.
Examples
data(corals)
sumstat(corals, tax="genus", bin="stg", coll="collection_no", ref="reference_no")
Proportions of survivorship
Description
This function will calculate both forward and backward survivorship proportions from a given occurrence dataset or FAD-LAD matrix.
Usage
survivors(
x,
tax = "genus",
bin = "stg",
method = "forward",
noNAStart = FALSE,
fl = NULL
)
Arguments
x |
|
tax |
|
bin |
|
method |
|
noNAStart |
|
fl |
|
Details
Proportions of survivorship are great tools to visualize changes in the composition of a group over time (Raup, 1978). The curves show how a once-coexisting set of taxa, called a cohort, loses its participants as time progresses (forward survivorship), or gains its elements as time is analyzed backwards. Each value corresponds to a cohort in a bin (a) and one other bin (b): it expresses what proportion of the analyzed cohort (present together in bin a) is present in bin b.
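Forward survivorship can be sketched from first and last appearance bins. The helper below is hypothetical and not survivors(): it assumes range-through presence (a taxon counts as present in every bin between its FAD and LAD), bin numbers that increase towards the present, and a layout with one column per cohort bin a and one row per bin b, which matches how the example below plots the columns.

```r
forward_survivorship <- function(fad, lad, nBins) {
  # rows: bin b, columns: cohort bin a; NA where b precedes a
  res <- matrix(NA_real_, nrow = nBins, ncol = nBins)
  for (a in seq_len(nBins)) {
    cohort <- fad <= a & lad >= a          # taxa ranging through bin a
    if (!any(cohort)) next
    for (b in a:nBins) {
      res[b, a] <- mean(lad[cohort] >= b)  # share of the cohort still present in b
    }
  }
  res
}
fl <- data.frame(FAD = c(1, 1, 1, 2, 3), LAD = c(1, 2, 4, 4, 3))
surv <- forward_survivorship(fl$FAD, fl$LAD, nBins = 4)
```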
References:
Raup, D. M. (1978). Cohort analysis of generic survivorship. Paleobiology, 4(1), 1-15.
Value
A numeric matrix of survivorship probabilities.
Examples
data(corals)
surv<-survivors(corals, tax="genus", bin="stg", method="forward")
# plot
data(stages)
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
ylab="proportion of survivors present", ylim=c(0.01,1),plot.args=list(log="y"))
for(i in 1:ncol(surv)) lines(stages$mid, surv[,i])
Apply function to TAxon/BIN subset of occurrences and iterATE
Description
The function takes another function and reruns it on every taxon- and/or bin-specific subset of an occurrence dataset.
Usage
tabinate(x, bin = NULL, tax = NULL, FUN = NULL, ...)
Arguments
x |
|
bin |
|
tax |
|
FUN |
|
... |
arguments passed to |
Details
The main tabinate function acts as a wrapper for any type of function that requires a subset of the occurrence dataset that represents either one bin or one tax entry, or both.
For example, the iterator can be used to calculate geographic ranges from occurrence coordinates (georange).
The output structure of FUN should be independent from the input subset, or the function will return an error.
If bin=NULL and tax=NULL, FUN is run on the entire dataset (no effect). Providing either bin or tax and keeping the other NULL will iterate FUN for every bin or tax entry (whichever is provided).
The function returns a vector of values if the return value of FUN is a single value. If it is a vector, the final output will be a matrix.
When both bin and tax are provided, the function output will be a matrix (one output value for a taxon/bin subset) or an array (3d, when FUN returns a vector). Setting FUN to NULL will return the occurrence dataset as lists.
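The same split-apply pattern can be written with base R's split() and tapply(). The per-subset function nLat() and the toy data are hypothetical, and this is not the code of tabinate(); it only shows the output shapes described above: a named vector when iterating over bins, and a taxon by bin matrix (NA where a taxon is absent from a bin) when iterating over both.

```r
occ <- data.frame(
  genus = c("A", "A", "B", "B", "C"),
  stg   = c(1, 1, 1, 2, 2),
  lat   = c(10, 12, 10, -5, -5),
  stringsAsFactors = FALSE
)
# hypothetical per-subset function: number of distinct latitudes
nLat <- function(d) length(unique(d$lat))
# bin only: one value per bin
perBin <- sapply(split(occ, occ$stg), nLat)
# taxon and bin: taxon x bin matrix, NA where a taxon is absent from a bin
perCell <- tapply(seq_len(nrow(occ)), list(occ$genus, occ$stg),
                  function(i) nLat(occ[i, , drop = FALSE]))
```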
Value
The return object depends on the output of FUN, as well as the bin and tax input.
Examples
data(corals)
# the number of different coordinate pairs in every time slice
tabinate(corals, bin="stg", FUN=georange, lat="paleolat",
lng="paleolng", method="co")
# geographic range (site occupancy) of every taxon in every bin
tabinate(corals, bin="stg", tax="genus", FUN=georange,
lat="paleolat", lng="paleolng", method="co")
The 10 million year resolution timescale of the Paleobiology Database
Description
Roughly 10 million year timescale used in some analyses.
Usage
data(tens)
Format
A data.frame with 49 observations and 9 variables:
X10
The name of the bin: Period and number.
Ocean
The primary state of the oceans with respect to carbonate precipitation: ar indicates aragonitic, cc indicates calcitic conditions.
Climate
Primary climatic characteristic: w denotes warm, c denotes cold.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric age midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
ten
Integer number identifiers of the bins.
Details
This is an example time scale object that can be used in Phanerozoic-scale analyses. This time scale comprises 49 bins, each roughly 10 million years long, that result from the combination of certain standard stages.
Source
Executive committee meeting (2015) of old Paleobiology Database. Additional variables were added by Wolfgang Kiessling.
Function to plot a series of values with bars that have variable widths
Description
Function to use bars for time series.
Usage
tsbars(x, y, width = "max", yref = 0, gap = 0, vertical = TRUE, ...)
Arguments
x |
|
y |
|
width |
|
yref |
|
gap |
|
vertical |
|
... |
Arguments passed to |
Details
People often present time series as connected points, although this visual depiction implies a certain process that describes how the values change between the points.
Instead of simple scatter plots, barplots can be used to depict series in which a single value best describes each discrete time bin. The tsbars() function draws such series as rectangles of different widths with the rect function.
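The geometry behind such a plot can be sketched with base graphics: every bin becomes one rectangle centred on its midpoint, as wide as its duration, and spanning from a reference level to the value. This is a minimal sketch, not the tsbars() source. The helper bar_coords() is hypothetical, and the handling of gap (removed evenly from both sides of each bar) is an assumption made here for illustration.

```r
# Hypothetical helper, not part of divDyn: computes the corners of one
# rectangle per time bin, centred on x, 'width' wide, from 'yref' to y.
# 'gap' is assumed to be taken evenly from both sides of every bar.
bar_coords <- function(x, y, width, yref = 0, gap = 0) {
  stopifnot(length(x) == length(y), length(width) %in% c(1L, length(x)))
  half <- width / 2 - gap / 2
  data.frame(
    xleft = x - half,
    ybottom = yref,
    xright = x + half,
    ytop = y
  )
}

# Three made-up bins (midpoints and durations in Myr, values arbitrary)
coords <- bar_coords(x = c(100, 90, 75), y = c(5, 8, 3),
                     width = c(10, 10, 20), gap = 1)

# graphics::rect() is vectorised over its corner arguments, so one call
# draws all bars; the reversed xlim mimics an age axis.
plot(NA, xlim = c(110, 60), ylim = c(0, 10), xlab = "Age (Ma)", ylab = "value")
rect(coords$xleft, coords$ybottom, coords$xright, coords$ytop, col = "gray80")
```

Because the bars are centred on the bin midpoints, adjacent bins with contiguous boundaries end up separated by exactly gap units.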
Value
The function has no return value.
Examples
# an occurrence-based example
# needed data
data(stages)
data(corals)
# calculate diversities
dd <- divDyn(corals, tax="genus", bin="stg")
# plot range-through diversities
tsplot(stages, xlim=51:94, ylim=c(0,250), boxes="sys")
tsbars(x=stages$mid, y=dd$divRT, width=stages$dur, gap=1, col=stages$col)
Time series plotting using a custom time scale
Description
This function allows the user to quickly plot a time scale data table.
Usage
tsplot(
tsdat,
ylim = c(0, 1),
xlim = NULL,
prop = 0.05,
gap = 0,
bottom = "bottom",
top = "top",
xlab = "Age (Ma)",
ylab = "",
boxes = NULL,
boxes.col = NULL,
shading = NULL,
shading.col = c("white", "gray80"),
plot.args = NULL,
boxes.args = NULL,
labels = TRUE,
labels.args = NULL,
lplab = TRUE,
rplab = TRUE
)
Arguments
tsdat |
|
ylim |
|
xlim |
|
prop |
|
gap |
|
bottom |
|
top |
|
xlab |
|
ylab |
|
boxes |
|
boxes.col |
|
shading |
|
shading.col |
|
plot.args |
|
boxes.args |
|
labels |
|
labels.args |
|
lplab |
|
rplab |
|
Details
As most analyses use an individually compiled time scale object, the time scale table used for the analysis can be plotted directly, rather than a standardized table, to ensure compatibility between the analyzed and plotted values. Two example tables are included in the package (stages and tens) that can serve as templates.
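A custom table only needs the columns that tsplot() is pointed at: by default, bottom and top for the bin boundaries, plus a column of labels for boxes. The sketch below builds such a table following the column conventions documented for tens, derives mid and dur, and checks that the bins are contiguous. The helper make_timescale() is hypothetical, the boundaries are rounded approximations of the Mesozoic periods, and the tsplot() call only runs if divDyn is installed.

```r
# Hypothetical helper, not part of divDyn: builds a minimal custom time
# scale table (bottom = older boundary, top = younger boundary, in Ma)
# and derives 'mid' and 'dur' as in the 'tens' table.
make_timescale <- function(name, bottom, top) {
  stopifnot(length(name) == length(bottom), length(bottom) == length(top),
            all(bottom > top))
  ts <- data.frame(name = name, bottom = bottom, top = top,
                   stringsAsFactors = FALSE)
  ts$mid <- (ts$bottom + ts$top) / 2
  ts$dur <- ts$bottom - ts$top
  # bins are assumed to run from oldest to youngest without gaps
  if (nrow(ts) > 1) {
    stopifnot(isTRUE(all.equal(ts$top[-nrow(ts)], ts$bottom[-1])))
  }
  ts
}

# Rounded, approximate Mesozoic period boundaries
meso <- make_timescale(
  name = c("Triassic", "Jurassic", "Cretaceous"),
  bottom = c(252, 201, 145),
  top = c(201, 145, 66)
)

# 'boxes' names the column holding the bin labels, as in the package examples
if (requireNamespace("divDyn", quietly = TRUE)) {
  divDyn::tsplot(meso, boxes = "name", ylim = c(0, 10))
}
```

Using all.equal() for the contiguity check avoids spurious failures when boundaries are stored as non-integer ages.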
Value
The function has no return value.
Examples
data(stages)
tsplot(stages, boxes="sys", shading="series")
# same with colours
tsplot(stages, boxes="sys", shading="series", boxes.col="systemCol")
# only the Mesozoic, custom axes
tsplot(stages, boxes="system", shading="stage", xlim=52:81,
plot.args=list(axes=FALSE, main="Mesozoic"))
axis(1, at=seq(250, 75, -25), labels=seq(250, 75, -25))
axis(2)
# only the Triassic, use the supplied abbreviations
tsplot(stages, boxes="short", shading="stage", xlim=c(250,199),
ylab="variable", labels.args=list(cex=1.5, col="blue"),
boxes.args=list(col="gray95"))
# colourful plot with two levels of hierarchy
tsplot(stages, boxes=c("short", "system"), shading="series",
boxes.col=c("col", "systemCol"), xlim=c(52:69))