sars R Package

This vignette is heavily based on three papers that accompany the package. To cite this vignette, please cite the relevant corresponding paper(s):

Matthews, T.J., Triantis, K., Whittaker, R.J. and Guilhaumon, F. (2019) sars: an R package for fitting, evaluating and comparing species–area relationship models. Ecography, 42, 1446-1455.

Matthews, T. J., and F. Rigal. 2021. Thresholds and the species–area relationship: a set of functions for fitting, evaluating and plotting a range of commonly used piecewise models in R. Frontiers of Biogeography, 13, e49404.

Version 1.1.1 of the package has been archived on the Zenodo research data repository (DOI: 10.5281/zenodo.2573067).

BACKGROUND

The species–area relationship (SAR) describes the near universally observed pattern whereby the number of species increases with the area sampled, and it has been described as one of ecology’s few laws (Rosenzweig (1995)). The SAR is a fundamental component of numerous ecological and biogeographical theories, such as the equilibrium theory of island biogeography (MacArthur and Wilson (1967)). In addition, SAR models have been widely used in applied ecology and conservation biogeography: for example, to predict the number of extinctions due to habitat loss. Numerous types of SAR have been described, and one primary dichotomy employed is the split of SARs into island SARs (ISARs), whereby each data point is an individual island or isolated sample, and species accumulation curves (SACs) that represent cumulative counts of increased species number with sampling area (Gray, Ugland, and Lambshead (2004)). Whilst the remainder of the paper and the described R package are focused on ISARs, the models and the model fitting procedure can equally be applied to SACs (see T. J. Matthews, Triantis, et al. (2016)), although it should be noted that in SACs the data points are not independent of one another.

Over 20 SAR models have been described in the literature (Dengler (2009), Tjørve (2003), Tjørve (2009), Triantis, Guilhaumon, and Whittaker (2012)). However, despite this wide range of models, the majority of SAR studies are still based exclusively on the power model (Arrhenius (1921)), which if fitted in its non-linear (untransformed) form generally takes a convex form. Often, the log–log representation of the power model is used as it can be fitted using standard linear regression, and its parameters are more easily interpretable (Rosenzweig (1995)). However, whilst the power model has been found to provide a reasonable fit to a wide range of datasets (Dengler (2009), T. J. Matthews, Guilhaumon, et al. (2016)), it is not universally the best model, and a number of studies have reported other models to provide better fits to empirical data (e.g. Triantis, Guilhaumon, and Whittaker (2012), T. J. Matthews, Guilhaumon, et al. (2016)). The possibility of scale dependency of the form of the SAR has long been of interest, with, for example, a theoretical case being made for SARs at intermediate spatial scales being approximated by a power model, whilst at larger spatial scales the form of the SAR has been theorised to be sigmoidal. Additionally, it is only recently that the SAR for archipelagos as units of analysis and not just islands has started to be studied, and thus we know little about the form of archipelago SARs.

Due to the increased recognition of model uncertainty in SAR research, a number of recent studies have employed a multi-model inference approach (Burnham and Anderson (2002)) in the analysis of SARs, whereby either (1) multiple SAR models are compared using various criteria (e.g. AIC) and a best model is chosen (e.g. Dengler (2009)), or (2) multiple SAR models are fitted and a multi-model averaged curve is calculated using, for example, AIC weights (e.g. Guilhaumon et al. (2008)). We are not aware of any published software package that enables users to fit, and create multi-model averaged curves using more than eight SAR models. Considering currently available software, the BAT R package provides functions to fit three SAR models (linear, power and logarithmic); however, this package is focused on general biodiversity assessment and thus does not provide any additional SAR functionality. The mmSAR R package (Guilhaumon, Mouillot, and Gimenez (2010)) is focussed on SARs and while it allows users to fit eight SAR models using an information theoretic framework, it does not include several models that have been found to provide the best fits to several empirical datasets. To provide a set of tools to fill these gaps, we have developed the R package ‘sars’. The package provides functions to fit 20 SAR models using non-linear and linear regression, calculate multi-model averaged curves using various information criteria, and generate confidence intervals using bootstrapping. Novel features compared with mmSAR include (i) user-friendly functions for plotting (the user can now plot weighted multimodel SAR curves along with the individual SAR model curves) and (ii) determining the observed shape of the model fit (i.e. linear, convex up, convex down or sigmoidal) and (iii) presence or not of an asymptote, and (iv) functions to fit, plot and evaluate Coleman (1981)’s random placement model using a species-site abundance matrix, and (v) to fit the general dynamic model (GDM) of island biogeography (Whittaker, Triantis, and Ladle (2008)). In addition, the mmSAR package (which has been deprecated) no longer complies with recognised programming good practice , is not on CRAN (the main repository of R packages), and is not user friendly (e.g. it requires the user to load individual models prior to fitting). There was therefore a need to design a new package from scratch.

METHODS AND FEATURES

The ‘sars’ (species–area relationships) package has been programmed using standard S3 methods and is available on CRAN (version 1.2.1) and the development version on GitHub (txm676/sars), meaning researchers can easily add in their own models and functions and integrate these into the multi-model inference framework. Thus, the package represents a resource for future SAR work that can be built on and expanded by workers in the field.

Fitting individual SAR models and a set of SAR models The package provides functions to fit each of the 20 SAR models (see also Triantis, Guilhaumon, and Whittaker (2012)). The ‘sar_models’ function can be used to bring up a list of the 20 model names and a Table can be generated in the package using the ‘display_sars_models’ function. With the exception of the linear model (which is fitted using standard linear regression), all models are fitted using non-linear regression and the model parameters are estimated by minimizing the residual sum of squares with an unconstrained Nelder-Mead optimization algorithm and the ‘optim’ R function.

Choosing starting parameter values for non-linear regression optimisation algorithms is not always straight forward, depending on the data at hand. The starting values for the parameter estimates used as defaults in the package are carefully chosen to avoid numerical problems and to speed up the convergence process. Custom starting values can be provided for any of the 20 models using the ‘start’ argument in the model fit functions. In addition, we also use a grid search / brute force process (see Archontoulis & Miguez, 2014). ‘grid_start’ creates 100 starting values for each parameter within sensible bounds and then creates a large matrix storing all possible combinations of starting values across the n parameters in the model. For example, for a three parameter model this matrix contains one million rows. A random number of these rows are then selected and used as starting parameter values. There are three options for the grid_start argument to control this process. The default (grid_start = “partial”) randomly samples 500 different sets of starting parameter values for each model, adds these to the model’s default starting values and tests all of these. A more comprehensive set of starting parameter estimates can be used (grid_start = “exhaustive”) - this option allows the user to choose the number of starting parameter sets to be tested (using the grid_n argument) and includes a range of additional starting parameter estimates, e.g. very small values and particular values we have found to be useful for individual models. A large ‘grid_n’ will ensure a more extensive search of starting parameter value space, but will increase the time taken to fit the model. More generally, using grid_start = “exhaustive” in combination with a large grid_n can be very time consuming; however, we would recommend it as it makes it more likely that the optimal model fit will be found, particularly for the more complex models. This is particularly true if any of the model fits does not converge, returns a singular gradient at parameter estimates, or the plot of the model fit does not look optimum. The grid start procedure can also be turned off (grid_start = “none”), meaning just the default starting parameter estimates are used. It should be noted that for certain models (e.g. Weibull 3 par.) ‘grid_start’ has been found to occasionally push model fits into strange parameter space (e.g. step curves) and so for these models ‘grid_start’ just reverts back to our provided parameter starting values. However, for these models the ‘start’ argument can still be used. For certain models, there are also checks to ensure that the returned parameters are logical. For example, the optimisation procedure can sometimes return a negative z-value for the asymptotic model, meaning in the term z^A, the base is negative. Thus, if a fitting process returns a negative z-value we return that the model could not be fitted. Remember that, as grid_start has a random component, when grid_start != “none”, you can get slightly different results each time you fit a model or run sar_average().

Each individual model fit returns an object of class ‘sars’, which is a list of 22 elements containing relevant model fit information, such as the model parameter estimates, the fitted values, the residuals, model fit statistics (e.g. AIC, R2), the observed model shape (linear, convex or sigmoidal), whether or not the fit is asymptotic, and convergence information. The returned object can easily be plotted using the ‘plot.sars’ generic function; as this function is based on the base R plotting framework, the plot aesthetics can be edited using standard plotting arguments (see the ‘plot.sars’ documentation in the package). Summary and print generic functions are also provided for class ‘sars’; these functions follow the output of the standard ‘lm’ function in the ‘stats’ R package. Multiple SAR models can be fitted to the same dataset using the ‘sar_multi’ function and the resultant n model fit objects stored together as a ‘fit_collection’ object. This object is a list of class ‘sars’, where each of the n elements contains an individual SAR model fit. Using the ‘plot.sars’ generic function on a ‘fit_collection’ object generates a grid of the n individual model fit plots (Fig. 1).

#load an example dataset (Preston, 1962), fit the logarithmic SAR model using
#the grid_start method of selecting starting parameter values, return a model 
#fit summary and plot the model fit. 
data(galap) 
fit <- sar_loga(data = galap, grid_start = "partial") 
summary(fit)

## 
## Model:
## Logarithmic
## 
## Call: 
## S == c + z * log(A)
## 
## Did the model converge: TRUE
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -115.68753  -38.74990  -11.36383   25.67070  163.95994 
## 
## Parameters:
##      Estimate  Std. Error     t value    Pr(>|t|)        2.5%  97.5%
## c   0.2915341  34.5985767   0.0084262   0.9933958 -68.9056193 69.489
## z  30.2802630   8.3203931   3.6392827   0.0026812  13.6394768 46.921
## 
## R-squared: 0.49, Adjusted R-squared: 0.41
## AIC: 187.19, AICc: 189.19, BIC: 189.51
## Observed shape: convex up, Asymptote: FALSE
## 
## 
## Warning: The fitted values of the model contain negative values (i.e. negative species richness values)

plot(fit)

#Create a fit_collection object containing multiple SAR model fits, and 
#plot all fits. 
fitC <- sar_multi(data = galap, obj = c("power", "loga", "monod"))

## 
##  Now attempting to fit the 3 SAR models: 
## 
## --  multi_sars ---------------------------------------------- multi-model SAR --
## > power : v
## > loga  : v
## > monod : v

plot(fitC) #see Fig.1

Model fit validation

Model fits can be evaluated through tests of the normality and homoscedasticity of the residuals. In certain cases this is advised, given that the maximum likelihood parameter estimates and least-squares estimates are equivalent only under the assumption of normal errors with constant variance (see the Potential issues section, below). Any of three tests can be selected to test the normality of the residuals: 1) the Lilliefors extension of the Kolmogorov normality test (normaTest = “lillie”), 2) the Shapiro-Wilk test of normality (to be preferred when sample size is small; “shapiro”), and 3) the Kolmogorov-Smirnov test (“kolmo”). As a default, the option to omit a residuals normality test (“none”) is used - thus it is up to the user to select a test if appropriate. Three options are provided to check for the homogeneity of the residuals: 1) a correlation of the squared residuals with the model fitted values (“cor.fitted”), 2) a correlation of the squared residuals with the area values (“cor.area”), or 3) no homogeneity test (“none”; the default). If a test is selected and is significant at the 5% level a warning is provided in the model summary; alternatively, the full results of the three tests can be accessed in the model fit output. A third model validation check for negative predicted richness values (i.e. when at least one of the fitted values is negative) can be undertaken and a warning is provided in the model summary if negative values are predicted. It should be noted that in earlier versions of the package (pre Version 1.3.1) there was a bug in the test checking for homogeneity of the residual variance - the correlation used the raw residuals rather than the squared residuals. This has now been corrected. Finally, the convergence code from the optim algorithm can be checked manually. Occasionally the optim algorithm may not fully converge (e.g. due to degeneracy of the Nelder–Mead simplex) but still return a fitted model. As such, for each model fit we provide the optim model convergence code (anything other than 0 can indicate an issue) and a logical value (verge) specifying whether or not the code was 0.

#load an example dataset, fit the linear SAR model whilst running residual
#normality and homogeneity tests, and return the results of the residual
#normality test 
data(galap) 
fit <- sar_linear(data = galap, normaTest ="lillie", homoTest = "cor.fitted") 
summary(fit) #a warning is provided  indicating the normality test failed

## 
## Model:
## Linear model
## 
## Call: 
## S == c + m*A
## 
## Did the model converge: TRUE
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -107.94802  -52.11499  -24.16226   31.72842  217.95711 
## 
## Parameters:
##    Estimate Std. Error   t value  Pr(>|t|)     2.5 %   97.5 %
## c 70.314003  25.743130  2.731370  0.016228 15.100481 125.5275
## m  0.185382   0.072884  2.543504  0.023410  0.029060   0.3417
## 
## R-squared: 0.32, Adjusted R-squared: 0.21
## AIC: 191.77, AICc: 193.77, BIC: 194.08
## Observed shape: linear, Asymptote: FALSE
## 
## 
## Warning: The normality test selected indicated the model residuals are not normally distributed (i.e. P < 0.05)
## 
## Warning: The homogeneity test selected indicated a  signifiant correlation between the residuals and the fitted values (i.e. P < 0.05)

fit$normaTest

## $test
## [1] "lillie"
## 
## [[2]]
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  res
## D = 0.2146, p-value = 0.04725

Observed model shape and identifying an asymptote

Whilst each of the 20 models has a general shape (Triantis, Guilhaumon, and Whittaker (2012)), the actual observed shape of the model fit can be different, for some models, depending on the parameter estimates. This is important as the shape of the curve has significant implications for conservation applications and the testing of macroecological theory (Rosenzweig (1995)). In ‘sars’, the observed shape of a model fit is determined using the sequential algorithm outlined in Triantis, Guilhaumon, and Whittaker (2012). The shape is calculated using the model fit within the observed range of area values. Briefly, the algorithm works by first determining whether the fit is a straight line. Then, if the fit is classified as not being linear, the observed shape is classified as either convex (upwards or downwards) or sigmoidal by analysis of the second derivative (with respect to area) of the model fit (the full algorithm is detailed in Triantis, Guilhaumon, and Whittaker (2012), p. 220). It should be noted that a small number of models (e.g. Extended Power 2) are listed in previous papers (e.g. Tjørve (2009)) as being sigmoidal without an upper asymptote (and we have followed these classifications). However, a recent study (Godeau et al. (2020)) has argued that sigmoidal models must have an upper asymptote by definition; thus, the classification of these models is currently uncertain.

There has also been considerable debate in the SAR literature as to whether or not the SAR is asymptotic. In the ‘sars’ package, to determine whether a fit is asymptotic, for the relevant models the fitted model parameters are analysed to check whether the estimated asymptote is within the range of the sample data (Triantis, Guilhaumon, and Whittaker (2012)). As of January 2025, this procedure has been updated slightly, such that an asymptote is recorded as being present if the estimated asymptote is within the range of the sample data + a small buffer equal to 10% of the max (i.e., the max richness value + 10% of its value) and min (i.e., the minimum richness value - 10% of its value) species richness values in the dataset. This is because we noticed that the estimated asymptotes can sometimes be a small amount outside the sample data range (e.g., estimated asymptote of 11.2 where the max richness value is 11). Alternatively, for a more liberal definition, users can just consider any model with an asymptote as providing an asymptotic fit.

Multimodel SAR curve

As well as fitting individual models, the package provides a function (‘sar_average’) to fit up to 20 models, compare the resultant fits using information criteria, and construct a multimodel-averaged SAR curve based on information criteria weight (see Guilhaumon, Mouillot, and Gimenez (2010)). The multimodel average curve is constructed as a linear combination of individual model fits by multiplying the predicted richness values of each of the successfully fitted models by the model’s information criterion weight, and then summing the resultant values across all models (Burnham and Anderson (2002)). Three information criteria are available in the package: AIC, AICc and BIC. Confidence intervals around the multimodel averaged curve can be calculated using a non-parametric bootstrap algorithm described in Guilhaumon, Mouillot, and Gimenez (2010). Briefly, each of the SAR models used in the ‘sar_average’ function is fitted to the data, and the fitted values and residuals stored. The residuals are then transformed using the approach in Davison and Hinkley (1997) (p.259). For each bootstrap sample, an individual model fit is selected with the probability of selection being equal to that model’s information criterion weight. The transformed residuals from this fit are then sampled with replacement and added to the model’s fitted richness values. The ‘sar_average’ function is then used to fit all candidate SAR models to this bootstrapped set of response values, and the multimodel averaged fitted values stored. Percentile confidence intervals are then calculated using all bootstrapped fitted values.

The ‘sar_average’ function can be used without specifying any models, in which case the package attempts to fit each of the 20 models in Table 1; alternatively, a vector of model names or a ‘fit_collection’ object (generated using the ‘sar_multi’ function) can be provided using the ‘obj’ argument. The three model validation tests listed above (normality and homogeneity of residuals, and negative predicted values) can be selected (by default none are selected, so it is up to the user to select any that are appropriate); if any model fails one or more of the tests during the fitting process it is removed from the resultant multimodel SAR curve. The output of the ‘sar_average’ function is a list of class ‘multi’ and class ‘sars’, with two elements. The first element (‘mmi’) contains the multi model inference (fitted values of the multimodel SAR curve), and the second element (‘details’) contains a range of information regarding the fitting process, including the successfully fitted models, the models removed due to failing any of the validation tests, and the information criterion values, delta values and weights for the successfully fitted models. The returned object can easily be plotted using the ‘plot.multi’ generic function, and multiple plot options are available (using the ‘type’ argument; see Fig. 2). The fits of all the successfully fitted models and the multimodel SAR curve (with or without confidence intervals) can be plotted together (‘type’ = “multi”), and a barplot of the information criterion weights of each model can also be produced (‘type’ = “bar”).

#load an example dataset (Niering, 1963), run the ‘sar_average’ function
#using a vector of model names and with no model validation tests, and
#produce the plots in Figure 2 of the paper 
data(niering) 

#run the ‘sar_average’ function using a vector of model names, and with simply
#using the default model starting parameter estimates (grid_start = "none")
fit <- sar_average(data= niering, obj =c("power","loga","koba","logistic","monod",
                                         "negexpo","chapman","weibull3","asymp"),
grid_start = "none", normaTest = "none", homoTest = "none", neg_check = FALSE, 
confInt = TRUE, ciN = 50) #a message is provided indicating that one model

## 
##  Now attempting to fit the 9 SAR models: 
## 
## --  multi_sars ---------------------------------------------- multi-model SAR --
## > power    : v
## > loga     : v
## > koba     : v
## > logistic : v
## > monod    : v
## > negexpo  : v
## > chapman  : v
## > weibull3 : v
## > asymp    : v
## 
## No model validation checks selected
## 
## 9 remaining models used to construct the multi SAR:
##  Power, Logarithmic, Kobayashi, Logistic(Standard), Monod, Negative exponential, Chapman Richards, Cumulative Weibull 3 par., Asymptotic regression 
## --------------------------------------------------------------------------------
## 
## Calculating sar_multi confidence intervals - this may take  some time:

## Warning in sar_conf_int(fit = res, n = ciN, crit = IC, obj_all = obj_all, : The following model(s) has been removed from the  confidence interval calculations as NAs/Infs are
##                 present in the transformed residuals: asymp

##   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   2%  |                                                                              |===                                                                   |   4%  |                                                                              |====                                                                  |   6%  |                                                                              |======                                                                |   8%  |                                                                              |=======                                                               |  10%  |                                                                              |========                                                              |  12%  |                                                                              |==========                                                            |  14%  |                                                                              |===========                                                           |  16%  |                                                                              |=============                                                         |  18%  |                                                                              |==============                                                        |  20%  |                                                                              |===============                                                       |  22%  |                                                                              |=================                                                     |  24%  |                                                                              |==================                                                    |  26%  |                                                                              |====================                                                  |  28%  |                                                                              |=====================                                                 |  30%  |                                                                              |======================                                                |  32%  |                                                                              |========================                                              |  34%  |                                                                              |=========================                                             |  36%  |                                                                              |===========================                                           |  38%  |                                                                              |============================                                          |  40%  |                                                                              |=============================                                         |  42%  |                                                                              |===============================                                       |  44%  |                                                                              |================================                                      |  46%  |                                                                              |==================================                                    |  48%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================                                  |  52%  |                                                                              |======================================                                |  54%  |                                                                              |=======================================                               |  56%  |                                                                              |=========================================                             |  58%  |                                                                              |==========================================                            |  60%  |                                                                              |===========================================                           |  62%  |                                                                              |=============================================                         |  64%  |                                                                              |==============================================                        |  66%  |                                                                              |================================================                      |  68%  |                                                                              |=================================================                     |  70%  |                                                                              |==================================================                    |  72%  |                                                                              |====================================================                  |  74%  |                                                                              |=====================================================                 |  76%  |                                                                              |=======================================================               |  78%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  82%  |                                                                              |===========================================================           |  84%  |                                                                              |============================================================          |  86%  |                                                                              |==============================================================        |  88%  |                                                                              |===============================================================       |  90%  |                                                                              |================================================================      |  92%  |                                                                              |==================================================================    |  94%  |                                                                              |===================================================================   |  96%  |                                                                              |===================================================================== |  98%  |                                                                              |======================================================================| 100%

#(asymp) could not be used in the confidence interval calculation

par(mfrow = c(3,1)) #plot all model fits and the multimodel SAR curve as a separate curve on top
plot(fit, ModTitle = "a) Multimodel SAR", mmSep = TRUE)

#plot the multimodel SAR curve (with confidence intervals; see explanation
#in the main text, above) on its own 
plot(fit, allCurves = FALSE, ModTitle =
      "c) Multimodel SAR with confidence intervals", confInt = TRUE)

#Barplot of the information criterion weights of each model 
plot(fit, type = "bar", ModTitle = "b) Model weights", cex.lab = 1.3)

One thing to note is how the sar_average() function deals with model non-convergence. There are different types of non-convergence and these are dealt with differently. If the optimisation algorithm fails to return any solution, the model fit is defined as NA and is then removed, and so does not appear in the model summary table or multi-model curve etc. However, the optimisation algorithm (e.g. Nelder-Mead) can also return non-NA model fits but where the solution is potentially non-optimal (e.g. degeneracy of the Nelder–Mead simplex) - these cases are identified by any optim convergence code that is not zero. We have decided not to remove these fits (i.e. they are kept in the model summary table and multimodel curve), but any instances can be checked using the returned ‘details$converged’ vector and then the model fitting re-run without these models, if preferred. Increasing the starting parameters grid search (see above) may also help avoid this issue.

Additional functions

In addition to the main functions used to fit and compare the 20 SAR models, the sars package provides additional functions for specific SAR-based analyses. First, a function is provided to fit the log–log version of the power model (a function that is often fitted in SAR studies; Rosenzweig 1995) and compare parameter values with those generated using the non-linear power model. The log–log version of the power model is not equivalent to its non-linear counterpart because of non-equivalence in the study of the variation in a variable and in its transformation, and bias of back-transformed results obtained on a logarithmic scale. Second, a function has been added that enables the fitting of Coleman (1981)’s random placement model to a species/sites abundance matrix. According to this model, the number of species occurring on an island depends on the relative area of the island and the regional relative species abundances. The fit of the random placement model can be determined through use of a diagnostic plot (which can be generated from the function output) of island area (log transformed) against species richness, alongside the model’s predicted values (see Wang et al. (2010)). Following Wang et al. (2010), the model is rejected if more than a third of the observed data points fall beyond one standard deviation from the expected curve. Finally, a function is provided to fit the general dynamic model of island biogeography (Whittaker, Triantis, and Ladle (2008)) using five different SAR models (linear, logarithmic, power(area), power(area and time), linear power).

#load an example dataset, fit the log-log power model, return a model fit
#summary and plot the model fit. When ‘compare’ == TRUE, the non-linear
#power model is also fitted and the resultant parameter values compared. 
#If any islands have zero species, a constant (‘con’) is added to all 
#species richness values. 
data(galap) 
fit <- lin_pow(dat = galap, compare = TRUE, con = 1) 
summary(fit)

## Model = Log-log power
## 
## Log-transformation function used: log()
## 
## Call:
## lm(formula = logT(S) ~ logT(A), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3591 -0.7584  0.1177  0.6009  1.0739 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)    
## LogC  3.01865    0.35442   8.517 6.56e-07 ***
## z     0.33854    0.08523   3.972  0.00139 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7626 on 14 degrees of freedom
## Multiple R-squared:  0.5298, Adjusted R-squared:  0.4962 
## F-statistic: 15.78 on 1 and 14 DF,  p-value: 0.001391
## 
## Power (non-linear) parameters:
##  c = 33.18 
##  z = 0.28

plot(fit)

#load an example dataset, fit the random placement model and plot the 
#model fit and standard deviation. The ‘data’ argument requires a species-
#site abundance matrix: rows are species and columns are sites. The area 
#argument requires a vector of site (island) area values. 
data(cole_sim) 
fit <- coleman(data = cole_sim[[1]], area = cole_sim[[2]]) 
plot(fit, ModTitle = "Hetfield")

#load an example dataset, fit the GDM using the logarithmic SAR model, and
#compare the GDM with three alternative (nested) models: area and time 
#(age of each island), area only, and intercept only. 
data(galap) 
galap$t <- rgamma(16, 5, scale = 2)#add a random time variable 
gdm(data = galap, model = "loga", mod_sel = TRUE)

## 
##  GDM fit using the logarithmic SAR model 
## 
## GDM model summary:
## 
## Nonlinear regression model
##   model: SR ~ Int + A * log(Area) + Ti * Time + Ti2 * Time^2
##    data: data
##      Int        A       Ti      Ti2 
## -177.278   29.534   35.957   -1.503 
##  residual sum-of-squares: 67312
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 5.808e-08
## 
## All model summaries:
## 
##                 RSE    R2      AIC     AICc Delta.AIC Delta.AICc
## A          74.44350 0.486 187.1908 189.1908  0.000000   0.000000
## GDM        74.89556 0.554 188.9181 194.9181  1.727322   5.727322
## A + T      77.22652 0.486 189.1796 192.8159  1.988752   3.625116
## Intercept 100.32744 0.000 195.8435 196.7666  8.652648   7.575725

SAR extrapolation

Extrapolation (e.g. predicting the richness of areas too large to be sampled) is one of the primary uses of the SAR. Using an individual SAR model fit, extrapolation simply involves using the model function in combination with the parameter values estimated using the observed area / richness data to predict the richness on a larger area. An alternative extrapolation approach to simply using an individual model l is to use multi-model inference and model averaging, whereby a larger number of n models is fitted to a set of islands, the models ranked according to some criterion (e.g. Akaike’s information criterion, AIC) and the criterion values converted into model weights (i.e. the conditional probabilities for each of the n models). The n models are then each used to predict the richness of a larger area and these predictions are multiplied by the respective model weights and summed to provide a multi-model averaged prediction. Both of these options are available inside the package.

We would recommend using grid_start = “exhaustive” when fitting models for use with sar_pred(), as this is more likely to find the optimum fit for a given model. However, remember that, as grid_start has a random component, when grid_start != “none”, you can get slightly different results each time you fit a model or run sar_average, and then run sar_pred().

#fit the power model and predict richness on an island of area = 5000
data(galap)
p <- sar_power(data = galap)
sar_pred(p, area = 5000)

## 
## This is a sar_pred object:
## 
##   Model Area Prediction
## 1 Power 5000   370.1389

#fit three SAR models and predict richness on islands of area = 5000 & 10000
p2 <- sar_multi(galap, obj = c("power", "loga", "koba"))

## 
##  Now attempting to fit the 3 SAR models: 
## 
## --  multi_sars ---------------------------------------------- multi-model SAR --
## > power : v
## > loga  : v
## > koba  : v

sar_pred(p2, area = c(5000, 10000))

## 
## This is a sar_pred object:
## 
##         Model  Area Prediction
## 1       Power  5000   370.1389
## 2       Power 10000   450.4146
## 3 Logarithmic  5000   258.1944
## 4 Logarithmic 10000   279.1831
## 5   Kobayashi  5000   290.7020
## 6   Kobayashi 10000   318.4400

#calculate a multi-model curve and predict richness on islands of area = 5000 & 10000
#grid_start set to "none" for speed
p3 <- sar_average(data = galap, grid_start = "none")

## 
##  Now attempting to fit the 20 SAR models: 
## 
## --  multi_sars ---------------------------------------------- multi-model SAR --
## > power    : v
## > powerR   : v
## > epm1     : v
## > epm2     : v
## > p1       : v
## > p2       : v
## > loga     : v
## > koba     : v
## > monod    : v
## > negexpo  : v
## > chapman  : v
## > weibull3 : v
## > asymp    : v
## > ratio    : v
## > gompertz : v
## > weibull4 : v
## > betap    : v
## > logistic : v
## > heleg    : v
## > linear   : v
## 
## No model validation checks selected
## 
## 20 remaining models used to construct the multi SAR:
##  Power, PowerR, Extended Power model 1, Extended Power model 2, Persistence function 1, Persistence function 2, Logarithmic, Kobayashi, Monod, Negative exponential, Chapman Richards, Cumulative Weibull 3 par., Asymptotic regression, Rational function, Gompertz, Cumulative Weibull 4 par., Beta-P cumulative, Logistic(Standard), Heleg(Logistic), Linear model 
## --------------------------------------------------------------------------------

sar_pred(p3, area = c(5000, 10000))

## 
## This is a sar_pred object:
## 
##   Model  Area Prediction
## 1 Multi  5000   227.2078
## 2 Multi 10000   243.0279

SAR thresholds

An increasing number of studies have focused on identifying thresholds in the ISAR, including studies i) focused on identifying the small island effect, ii) determining whether the ISAR has an upper asymptote and iii) identifying thresholds in habitat ISARs, which are often systems of conservation concern. The most common approach in such studies is to use piecewise regression. As such, the package provide a set of functions for fitting six piecewise regression models to ISAR data, calculating confidence intervals around the breakpoint estimates (for certain models), comparing the models using various information criteria, and plotting the resultant model fits. The six piecewise models are a selection of 6 models out of the 14 listed by Gao et al. (2019) and comprise the continuous one-threshold and two-threshold, discontinuous one-threshold and two-threshold, and left-horizontal one-threshold and two-threshold models (see Gao et al. (2019) for further information).

The main function is ‘sar_threshold’, which fits the six piecewise models, in addition to a linear model and an intercept only model for comparison (using the ‘non_th_models’ argument). Summary and plot generic functions are available which return user-friendly output. As the coefficients in the fitted breakpoint regression models do not all represent the intercepts and slopes of the different segments (for these it is necessary to add different coefficients together), a separate function (‘get_coef’) can be used to calculate these. A separate function (‘threshold_ci’) generates the confidence intervals around the breakpoints of the one-threshold continuous and left-horizontal models.

There has been considerable debate regarding the number of parameters that are included in different piecewise models. Here (and thus in our calculation of AIC, AICc, BIC etc) we consider ContOne to have five parameters, ZslopeOne - 4, DiscOne - 6, ContTwo - 7, ZslopeTwo - 6, DiscTwo - 9. The standard linear model and the intercept model are considered to have 3 and 2 parameters, respectively. The raw ‘lm’ model fits are provided in the output, which are also of class ‘thresholdInt’; this class has a method for the generic function ‘logLik’ which adjusts the degrees of freedom associated with the log likelihood to match the number of parameters listed above.

Detailed information on fitting and plotting the threshold models, as well as the additional functions for calculating model slopes and intercepts, and generating confidence intervals around the threshold estimates, can be found in T. J. Matthews and Rigal (2021). See also the documentation for ‘sar_threshold’.

#load an example dataset, and fit the continuous two-threshold model 
#to the data (with area transformed using log to the base 10), using an 
#interval of 0.1 (for speed)
data(aegean2)
fit <- sar_threshold(data = aegean2, mod = c("ContTwo"), interval = 0.1, 
                     non_th_models = FALSE, logAxes = "area", con = 1,
                     logT = log10, nisl = NULL)

#generate model fitting summary table (generally more useful when fitting multiple models)
summary(fit)

## 
## Sar_threshold object summary:
## 
## 1 models fitted
## 
## Models fitted with area log-transformed
## 
## Models ranked using BIC
## 
##               LL Pars     AIC    AICc     BIC   R2  R2a    Th1   Th2 seg1 seg2
## ContTwo -1002.74    7 2019.48 2020.16 2041.55 0.95 0.95 -0.122 1.478   86   37
##         seg3
## ContTwo   50

#Plot the resultant model fit 
plot(fit, cex = 0.8, cex.main = 1, cex.lab = 0.8, pcol = "grey")

Multi-habitat and countryside species–area relationship models

Detailed information on fitting and plotting the multi-habitat and countryside species–area relationship models can be found in T. J. Matthews (2025). Further details on the models can be found in Furness et al. (2023), Pereira and Daily (2006) and Proenca and Pereira (2013).

The last twenty years has seen an increased focus on developing SAR models that explicitly account for habitat diversity (i.e., ‘multi-habitat’ models). These models can be coarsely be placed into two groups: those that (i) model species richness simply as some function of area and habitat diversity (here, termed ‘habitat-heterogeneity’ SAR models), and (ii) account for species affinities to different habitat types (often called ‘countryside’ SAR models).

Habitat-heterogeneity SAR models

Functionality has been provided to fit three habitat-heterogeneity SAR models: the (i) choros model, (ii) Kallimanis model, and (iii) jigsaw model.

The main function for fitting the habitat-heterogeneity models is ‘sar_habitat’, which takes as its first argument a dataset in the form of a data frame with three columns: the first with island/site areas, the second with island/site habitat diversity, and the third with the species richness of each island/site. The models can be fitted in three different forms (selected using the modType argument): a power form (modType = “power”), a logarithmic form (“logarithmic”) or a log–log power form (“power_log”). sar_habitat fits all three habitat-heterogeneity models (i.e., the choros, Kallimanis and jigsaw models), in addition to a standard SAR model, the identity of which depends on the ‘modType’ argument (i.e., an Arrhenius power model, a logarithmic model, or a log–log power model).

The non-linear regression models are fitted using non-linear regression and the ‘nlsLM’ function in the ‘minpack.lm’ R package. This function was used rather than the standard ‘nls’ function that is used elsewhere in the sars package, as we found that this resulted in better searches of the parameter space for the habitat models (and fewer convergence errors), particularly for the logarithmic models. It returns a standard ‘nls’ object and thus all the normal associated nls R functions can be used. The linear models are all fitted using standard linear regression.

The output of ‘sar_habitat’ is a list of class ‘habitat’ and class ‘sars’, containing the ‘nls’ and/or ‘lm’ model fit objects. Summary and plot generic method functions are provided, with the former generating a model summary table containing all parameter values, as well as model fit metrics (R2 and adjusted R2 [pseudo values are calculated for the non-linear models], and AIC, BIC and AICc), enabling the models to be easily compared. The plot function generates a bar plot of user-selected information criteria weights.

To illustrate the ‘sar_habitat’ function, we use a dataset describing the number of habitats and land snail species (richness range = 2 – 33) on 12 islands of varying size (area range = 0.002 km2 – 208 km2) in the Skyros Archipelago, Greece (Triantis et al., 2005). The dataset has been saved within the package as a data object.

#Load the dataset and fit the three habitat-heterogeneity models
#(power model form) alongside the Arrhenius power model, and view
#the raw model fit objects
data(habitat)
s <- sar_habitat(data = habitat, modType = "power")
s

## 
## The individual model fit objects from sar_habitat, use summary() for the model comparison results
## 
## $choros
## Nonlinear regression model
##   model: S ~ c1 * choros^z
##    data: data
##      c1       z 
## 10.3645  0.1555 
##  residual sum-of-squares: 35.77
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 1.49e-08
## 
## $jigsaw
## Nonlinear regression model
##   model: S ~ (c1 * H^d) * ((A/H)^z)
##    data: data
##      c1       z       d 
## 14.0867  0.1771  0.2151 
##  residual sum-of-squares: 32.92
## 
## Number of iterations to convergence: 7 
## Achieved convergence tolerance: 1.49e-08
## 
## $Kallimanis
## Nonlinear regression model
##   model: S ~ c1 * A^(z + d * H)
##    data: data
##        c1         z         d 
## 1.539e+01 1.715e-01 4.741e-04 
##  residual sum-of-squares: 32.59
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 1.49e-08
## 
## $power
## Nonlinear regression model
##   model: S ~ c1 * A^z
##    data: data
##      c1       z 
## 15.5505  0.1841 
##  residual sum-of-squares: 33.24
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 1.49e-08
## 
## attr(,"failedMods")
## [1] "none"
## attr(,"type")
## [1] "habitat"
## attr(,"modType")
## [1] "power"

#Fit the three habitat-heterogeneity models in log–log power
#model form alongside the linear (log–log) power model, and
#compare the model fits using information criteria
s2 <- sar_habitat(data = habitat, modType = "power_log")
summary(s2)

## 
## sar_habitat model fit summary (modType = power_log). All models could be fitted. Models ranked by AICc:
## 
##        Model     z        d   d-z    R2 adjR2    AIC    BIC   AICc
## 1     choros 0.147       NA    NA 0.914 0.905 -0.772  0.682  2.228
## 2     jigsaw 0.086  0.50100 0.414 0.940 0.926 -3.100 -1.160  2.614
## 4      power 0.177       NA    NA 0.877 0.865  3.444  4.898  6.444
## 3 Kallimanis 0.179 -0.00019    NA 0.877 0.850  5.436  7.376 11.150

#Generate barplot of model AICc weights
plot(s2)

Countryside SAR models

Countryside SAR models are an extension of habitat-heterogeneity models, built on the idea that species respond differently to land-use change due to interspecific differences in the affinity to different land-use types. Countryside SAR models aim to incorporate the differential affinities of species to different habitat / land-use types (here, these two terms are used interchangeably), in addition to the existence of different habitats in a landscape (Pereira & Daily, 2006). To achieve this, species are typically grouped on the basis of habitat affinity (e.g., forest species, shrubland species).

The main function is ‘sar_countryside’, which takes as its first argument a dataset, where rows are sites, and columns contain the areas of different habitats (e.g., forest, farmland, grassland) and then the number of species in each species group (e.g., forest species, farmland species, grassland species). The function has been developed to automatically work with any number of habitats and species groups, but the user must ensure that the order of the area and species richness columns are correct (see the package documentation for further information). Typically, the species groups match the sampled habitats, occasionally with a ‘ubiquitous’ species group also included (i.e., true habitat generalists). However, the model is flexible and species can be classified into any number of groups. Note that species must be classified into groups prior to fitting the model, either using a priori knowledge (e.g., from habitat preference studies), and/or information in the sample data (see Proenca and Pereira (2013)).

As default, ‘sar_countryside’ fits the power form of the countryside SAR model, although functionality has been added to enable the logarithmic form to be fitted instead (controlled using the ‘modType’ argument). The model is fitted using the same non-linear regression approach as outlined for sar_habitat. As with any non-linear regression model, the convergence of the fitting algorithm is dependent on the use of sensible starting parameter estimates. Within the ‘sar_countryside’ function, we have incorporated a set of internal functions to select reasonable starting parameter estimates based on the results of previous studies (e.g., Proenca and Pereira (2013)). This is controlled through the ‘gridStart’ argument. This argument has three options which are, in order of decreasing speed but increasing sampling of parameter space, “none”, “partial” (the default) and “exhaustive”. Alternatively, the user can provide a set of starting parameters using the ‘startPar’ argument.

The ‘sar_countryside’ function returns a list (of class ‘habitat’ and ‘sars’; and with a ‘type’ attribute of ‘countryside’) with eight elements, including: (i) a list of the non-linear regression model fits for each of the i species groups; (ii) the normalized habitat affinity values for each of the models in (i); (iii) the c-parameter values for each of the models in (i); and (iv) the predicted total richness values (calculated by summing the predictions for each constituent countryside model) for each site in the dataset.

The ‘countryside_extrap’ function can be used with the output of sar_countryside to predict the species richness of landscapes with varying areas of the analysed habitats (provided by the user using the ‘area’ argument).

An associated generic plot method has been provided that works with the output of ‘sar_countryside’, allowing the user to generate four different plots to help model validation and interpretation. The first (generated by setting the ‘type’ argument to 1) plots the predicted total richness values (from both countryside and Arrhenius power [or logarithmic] SAR models) against the observed total richness values, with a regression line (intercept = 0, slope = 1) to aid interpretation. The second (‘type’ = 2) uses ‘countryside_extrap’ internally to generate separate fitted SAR curves for each of the modelled species groups, for each habitat individually, using a set of hypothetical sites in which the percentage of the site covered by a given habitat is always 100%. For some habitats, depending on the sampling design used, this means the associated model(s) may be extrapolating outside of the area range in the data. To see how this (i.e., Type 2 plots) works, we can imagine a dataset comprising a set of sites sampled in a two-habitat landscape (habitats A and B) with three species groups. Each site will thus have given amounts of habitat A and B that sum to the site area. The ‘sar_countryside’ function fits three component models, one for each species group. The plot method function (type = 2) then creates a vector of hypothetical sites with area values ranging from zero to the maximum observed site area value (i.e., the summed habitat areas for each site) across all sites. For all of these hypothetical sites, the proportion of habitat represented by habitat A is set to 100% (and by extension the proportion represented by habitat B is always 0%). This vector is then used alongside our fitted countryside model in ‘countryside_extrap’ to estimate the number of species in the three species groups in each hypothetical site, with these values used to plot the fitted curves for each species group on the same plot (i.e., the plot for habitat A). This process is then repeated for habitat B, and so on.

The third (‘type’ = 3) follows a similar approach as for Type 2 plots, but instead varies the proportion of a given habitat while fixing site area. The area of the largest site is used, and for a given (focal) habitat, the proportion of the site represented by the focal habitat is varied from zero up to one. As site area is fixed, as the proportion of the focal habitat increases, the proportions of the other habitats decrease. This decrease is fixed to occur at an equal rate. For example, if there are three habitats and the focal habitat is at 50% of the site, the other two habitats will have values of 25%. This process is then repeated using the next habitat as the focal habitat, and so on.

Finally, the fourth (‘type’ = 4) generates the “effective area” plots presented in Merckx, Dantas de Miranda, and Pereira (2019). The “effective area” for species group i in a site comprising j habitats is given by Ai = Sum over j of hij*Aj, where hij is the affinity of species group i to habitat j and is taken from the fitted ‘sar_countryside’ model.

To illustrate the ‘sar_countryside’ function and associated plots, we use a dataset of vascular plants (excluding adult trees) sampled in a multi-habitat landscape in Portugal (from Proenca and Pereira (2013)). A nested sampling scheme was used and the resultant SAR is similar to a Type IIIA curve. Sites were divided into three main land use types: agricultural land, shrub land, and oak forest. Species were grouped based on their affinity to those three land use types, with a fourth group containing ubiquitous species. The dataset has been saved within the package as a data object.

# Fit the countryside SAR model (power form) to the data, and use the function’s
# starting parameter value selection procedure (setting 'gridStart' here to
# "none" just for speed). Abbreviations: AG = agricultural land, SH = shrubland,
# F = oak forest, UB = ubiquitous species.data(countryside)
data(countryside)
s3 <- sar_countryside(data = countryside, modType = "power",
                      gridStart = "none", 
                      habNam = c("AG", "SH", "F"), 
                      spNam = c("AG_Sp", "SH_Sp", "F_Sp", "UB_Sp"))

# Use the fitted models to predict the richness of a site with given amounts of
# agricultural land, shrub land and forest. The ‘Indiv_mods’ element shows the
# predicted number of species (based on the given model) in a given group in the
# site, while the ‘Total’ element is the predicted total number of species
# across all groups.
countryside_extrap(s3, area = c(1000, 1000, 1000))

## $Indiv_mods
##     AG_Sp     SH_Sp      F_Sp     UB_Sp 
## 25.440637  9.022428 15.626832  3.457593 
## 
## $Total
## [1] 53.54749
## 
## $Failed_mods
## [1] FALSE

#Plot the fitted individual SAR curves for each species group (Type 2 plot),
#including a total species richness curve, modifying various aspects of the
#plot (e.g., legend position). Use the ‘which’ argument to only generate the
#second plot (i.e., the plot for shrubland)
par(mar=c(5.1, 4.1, 4.1, 7.5), xpd=TRUE)
plot(s3, type = 2, totSp = TRUE,
     lcol = c("black", "aquamarine4", "#CC661AB3",
              "darkblue", "darkgrey"), pLeg = TRUE,
     legPos ="topright", legInset = c(-0.27,0.3), 
     lwd = 1.5, ModTitle = c("Agricultural land", 
                             "Shrubland", "Forest"),
     which = 2)

# For a given habitat and a fixed site area, plot the number of species in each
# species group as a function of the proportion of the given habitat in the site
# (Type 3 plot).  Use the ‘which’ argument to only generate the first plot
# (i.e., the plot for agricultural land)
plot(s3, type = 3, totSp = TRUE, 
     lcol = c("black", "aquamarine4","#CC661AB3",
              "darkblue", "darkgrey"), pLeg = TRUE, 
     legPos ="topright", legInset = c(-0.27,0.3), 
     lwd = 1.5, ModTitle = c("Agricultural land", 
                             "Shrubland", "Forest"), 
     which = 1)

# Generate an effective area plot (Type 4 plot), customising point type and size
# etc 
par(mar=c(5.1, 4.1, 4.1, 2.1), xpd=FALSE)
plot(s3, type = 4, pch = 16, lwd = 2, cex  = 0.75, cex.axis = 0.75, 
     cex.lab = 0.9, ModTitle = c("Agricultural species"), which = 1)

On the calculation of information criteria

There are different ways to calculate the various information criteria (IC) used for model comparison (e.g. AIC, BIC). One difference relates to whether the residual sum of squares (rss) or the log-likelihood (ll) is used. Under standard assumptions (e.g. independence of data points, homoscedasticity and normality of the residuals), the two approaches produce identical parameter estimates for regression models. However, the formulas are different and thus can produce different IC values for the same model. For example, historically in the ‘sars’ package we have calculated IC values using formulas based on the rss (Burnham & Anderson, 2002; Guilhaumon et al. 2008). This meant that the IC values generated in ‘sars’ were not comparable with values generated in packages using different formulas. For example, in the nls (the main function for non-linear regression in R) and lm functions in the stats R package, a ll approach is used, meaning IC values from models fitted using nls could not be compared with IC values from sars models. To re-iterate, the parameter estimates are comparable, it is simply that the IC values will differ. In sars version 1.2.2 we changed our IC formulas to match those in nls and lm. Thus, if users have built their own models using nls (or lm) and wish to compare IC values with models fitted in ‘sars’, this is now straightforward. To re-create IC values from previous studies (i.e. those using a version of sars pre 1.2.2), it will be necessary to download sars Version 1.2.1 or earlier (either from CRAN or GitHub; version 1.1.1 was published as a release on GitHub). It is important to note that these are not the only two ways of calculating ICs for regression models, and other formulas exits. Thus, if building models using other functions and packages (i.e. other than nls or lm), users should make sure to check how these packages calculate IC values before comparing with models fitted in sars. In sars, as in nls and lm, we include an additional parameter for estimation of the variance. For example, the power model is considered to have three parameters when calculating IC values. Finally, if users are comparing models fitted in sars with their own models fitted using other packages, it is essential that IC values are all calculated using the same dependent variable (i.e. non-transformed richness values and not logged richness, as sars only uses the former currently).

Potential issues with using information criteria in conjunction with

non-linear regression models

Before starting this section, it is worth pointing out that we are not statisticians and the following discussion should be interpreted with this point in mind. We have however reviewed a lot of the literature (including a number of statistics forums) on the use of information criteria (e.g. AIC) in the context of comparing non-linear regression models. Based on this review, we have concluded that there are three potential issues with this approach.

First, many information criteria (including AIC) are theoretically applicable only when a model is fitted through the approach of maximum likelihood. In the case of linear and non-linear regression (e.g. the nls function in R, and the approach we use in ‘sars’), a least-squares estimator (minimising the residual sum of squares) is generally used rather than likelihood maximisation, for various reasons, such as analytical tractability. This is not necessarily a problem, as the least-squares parameter estimates and the maximum likelihood parameter estimates are equivalent in the case of normal and independent errors with a mean of zero and constant variance (Bates and Watts (1988)). This holds for both linear and non-linear regression models (Bates and Watts (1988); Forum1 (n.d.)). However, if these assumptions are not met this equivalence may break down and thus the use of AIC etc would not necessarily be appropriate. The use of the normality and homogeneity of variance checks within the package may be useful in this regard, but note that it is now up to the user to manually turn these checks on.

Second, most of the commonly used information criteria (e.g. AIC) include a term that penalises for the number of parameters. In the case of linear regression models, counting the number of parameters is straight forward. However, non-linear models are so called because they are non-linear in regard to their parameters (e.g. a quadratic model is a linear model but with a non-linear shaped curve). As such, it is not always as straight forward to count parameters in the same way, particularly in the case of very complex non-linear and machine learning type models (i.e. not those used in ‘sars’). “That is, a single parameter in the model may count as more or less than one parameter, in some sense” (Forum2 (n.d.)). However, in the absence of any other information, or indeed alternative model comparison approaches, all one can do is use the number of estimable parameters (sensu Burnham & Anderson) in a model, which is what we follow in the ‘sars’ package, and is what a large number of studies do (and advise) in the ecological and wider literature (e.g. Archontoulis and Miguez (2015); Banks and Joyner (2017); see also Forum3 (n.d.)’s response to Forum2 (n.d.)).

Finally, some have argued that you should only use AIC etc to compare models within the same ‘model complex’, and even only in the case of nested models (e.g. Ripley (no date)). From this it could be concluded that we should not compare the suite of 19 non-linear models with the linear model. However, Burnham and Anderson (2002) and Anderson and Burnham (2006) have argued that this is not the case and AIC for example can be used to compare non-nested models. In addition, again many studies use AIC to compare non-nested models from different ‘complexes’. We have worked on the assumption that, as long as the response variable is the same and models are comparable in terms of parameter number calculation (e.g. all include or not a parameter for the variance), it is appropriate to compare different types of model (at least within the limited selection used in the package).

CONCLUSIONS

The SAR has been a cornerstone of ecological and biogeographical science for almost a century and its form and fit are still of great significance in both theoretical and applied contexts. The development of the ‘sars’ R package should aid future SAR research by providing a comprehensive set of tools that enable in-depth exploration of SARs and SAR-related patterns. In addition, the package has been designed in such a way as to allow other SAR researchers to add (e.g. via GitHub) new functions and models in the future to ensure the package is of lasting value. For example, future additions to the package could include the suite of countryside biogeography SAR models that have recently been published, standard SAR functions not so far incorporated, or functions specifically intended for analysis of species accumulation curves or endemics–area relationships. Finally, whilst the focus of this paper has been on classic SARs, there is no reason that the functionality in the ‘sars’ package cannot be used to analyse other diversity–area relationships (e.g. functional or phylogenetic diversity–area relationships). Application of the full set of 20 models, in addition to the multimodel SAR framework, included in the ‘sars’ package to a wider range and type of data (e.g. trait and phylogenetic data) will likely be revealing and will help in improving our understanding of SARs, and diversity–area relationships more generally.

References

Anderson, D., and K. Burnham. 2006. “AIC Myths and Misunderstandings.” https://sites.warnercnr.colostate.edu/kenburnham/wp-content/uploads/sites/25/2016/08/AIC-Myths-and-Misunderstandings.pdf.

Archontoulis, S. V., and F. E. Miguez. 2015. “Nonlinear regression models and applications in agricultural research.” Agronomy Journal 107: 786–98. https://doi.org/10.2134/agronj2012.0506.

Arrhenius, O. 1921. “Species and Area.” The Journal of Ecology 9 (1): 95. https://doi.org/10.2307/2255763.

Banks, H. T., and M. L. Joyner. 2017. “AIC under the framework of least squares estimation.” Applied Mathematics Letters 74: 33–45. https://doi.org/10.1016/j.aml.2017.05.005.

Bates, D. M., and D. J. Watts. 1988. Nonlinear Regression Analysis and Its Applications. Book. New-York: John Wiley & Sons.

Burnham, K. P., and D. R Anderson. 2002. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. Book. 2nd ed. New-York: Springer.

Coleman, B. D. 1981. “On Random Placement and Species-Area Relations.” Journal Article. Mathematical Biosciences 54 (3–4): 191–215. https://doi.org/10.1016/0025-5564(81)90086-9.

Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Book. Cambridge: Cambridge University Press.

Dengler, J. 2009. “Which Function Describes the Species–Area Relationship Best? A Review and Empirical Evaluation.” Journal Article. Journal of Biogeography 36 (4): 728–44. https://doi.org/10.1111/j.1365-2699.2008.02038.x.

Forum1. n.d. https://stats.stackexchange.com/questions/492670/is-aic-appropriate-for-comparing-non-linear-models.

Forum2. n.d. https://stat.ethz.ch/pipermail/r-help/2010-August/250742.html.

Forum3. n.d. https://stat.ethz.ch/pipermail/r-help/2010-August/250839.html.

Furness, E. N., E. E. Saupe, R. J. Garwood, P. D Mannion, and M. D Sutton. 2023. “The jigsaw model: a biogeographic model that partitions habitat heterogeneity from area.” Frontiers of Biogeography 15: e58477. https://doi.org/10.21425/F5FBG58477.

Gao, D., Z. Cao, P. Xu, and G. Perry. 2019. “On piecewise models and species–area patterns.” Ecology and Evolution 9: 8351–61.

Godeau, U., C. Bouget, J. Piffady, T. Pozzi, and F. Gosselin. 2020. “Lack of definition of mathematical terms in ecology: The case of the sigmoid class of functions in macro-ecology.” Ecology and Evolution, 1–12. https://doi.org/10.1002/ece3.7016.

Gray, J. S., K. I. Ugland, and J. Lambshead. 2004. “On Species Accumulation and Species–Area Curves.” Journal Article. Global Ecology and Biogeography 13 (6): 567–68. https://doi.org/10.1111/j.1466-822X.2004.00138.x.

Guilhaumon, F., O. Gimenez, K. J. Gaston, and D. Mouillot. 2008. “Taxonomic and Regional Uncertainty in Species-Area Relationships and the Identification of Richness Hotspots.” Journal Article. Proceedings of the National Academy of Sciences USA 105 (40): 15458–63. https://doi.org/10.1073/pnas.0803610105.

Guilhaumon, F., D. Mouillot, and O. Gimenez. 2010. “mmSAR: An r-Package for Multimodel Species–Area Relationship Inference.” Journal Article. Ecography 33 (2): 420–24. https://doi.org/10.1111/j.1600-0587.2010.06304.x.

MacArthur, R. H, and E. O Wilson. 1967. The Theory of Island Biogeography. Book. Princeton: Princeton University Press.

Matthews, T. J. 2025. “An R package for fitting multi-habitat and countryside species–area relationship models.” In Prep. https://doi.org/10.21425/F5FBG49404.

Matthews, T. J., F. Guilhaumon, K. A. Triantis, M. K. Borregaard, and R. J. Whittaker. 2016. “On the Form of Species–Area Relationships in Habitat Islands and True Islands.” Journal Article. Global Ecology and Biogeography 25: 847–58. https://doi.org/10.1111/geb.12269.

Matthews, T. J., and F. Rigal. 2021. “Thresholds and the species–area relationship: a set of functions for fitting, evaluating and plotting a range of commonly used piecewise models in R.” Frontiers of Biogeography. https://doi.org/10.21425/F5FBG49404.

Matthews, T. J., K. A. Triantis, F. Rigal, M. K. Borregaard, F. Guilhaumon, and R. J. Whittaker. 2016. “Island Species–Area Relationships and Species Accumulation Curves Are Not Equivalent: An Analysis of Habitat Island Datasets.” Journal Article. Global Ecology and Biogeography 25 (5): 607–18. https://doi.org/10.1111/geb.12439.

Merckx, T., M. Dantas de Miranda, and H. M. Pereira. 2019. “Habitat amount, not patch size and isolation, drives species richness of macro-moth communities in countryside landscapes.” Journal of Biogeography 46: 956–67. https://doi.org/10.1111/jbi.13544.

Pereira, H. M., and G. C. Daily. 2006. “Modelling biodiversity dynamics in countryside landscapes.” Ecology 87: 1877–85. https://doi.org/10.1890/0012-9658(2006)87[1877:MBDICL]2.0.CO;2.

Proenca, V., and H. M. Pereira. 2013. “Species–area models to assess biodiversity change in multi-habitat landscapes: The importance of species habitat affinity.” Basic and Applied Ecology 14: 102–14. https://doi.org/10.1016/j.baae.2012.10.010.

Ripley, B. D. no date. “Selecting Amongst Large Classes of Models.” http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf.

Rosenzweig, M. L. 1995. Species Diversity in Space and Time. Book. Cambridge: Cambridge University Press.

Tjørve, E. 2003. “Shapes and functions of species-area curves: a review of possible models.” Journal of Biogeography 30 (6): 827–35. http://doi.wiley.com/10.1046/j.1365-2699.2003.00877.x.

———. 2009. “Shapes and functions of species-area curves (II): a review of new models and parameterizations.” Journal of Biogeography 36 (8): 1435–45. https://doi.org/10.1111/j.1365-2699.2009.02101.x.

Triantis, K. A., F. Guilhaumon, and R. J. Whittaker. 2012. “The Island Species–Area Relationship: Biology and Statistics.” Journal Article. Journal of Biogeography 39 (2): 215–31. https://doi.org/10.1111/j.1365-2699.2011.02652.x.

Wang, Y., Y. Bao, M. Yu, G. Xu, and P. Ding. 2010. “Nestedness for Different Reasons: The Distributions of Birds, Lizards and Small Mammals on Islands of an Inundated Lake.” Journal Article. Diversity and Distributions 16 (5): 862–73. https://doi.org/10.1111/j.1472-4642.2010.00682.x.

Whittaker, R. J, K. A Triantis, and R. J Ladle. 2008. “A General Dynamic Theory of Oceanic Island Biogeography.” Journal Article. Journal of Biogeography 35 (6): 977–94.