Species In Space: August 2020

Friday, August 28, 2020

ENMTools version 1.0.1 is on CRAN! Clamping, variable importance, and progress bars!

Enhancements

Added variable importance tests via interface with the vip package
Added clamping for the predict functions, including plots of where clamping is happening
Added clamping for model construction functions, with a TRUE/FALSE switch defaulting to TRUE
Changed naming conventions for predict functions so that the suitability raster is in the $suitability slot, just as with modeling functions
Added progress bars for a lot of tests
Added “verbose” option for a lot of functions, defaulting to FALSE

Bug fixes

Fixed interactive.plot generic and moved the function to its own file to make it easier to extend
Temporarily suppressing some warnings coming out of leaflet that are being produced by the recent rgdal changes
Fixed background sampling code to resample when necessary
Changed enmtools.ranger demo code to actually use ranger instead of rf
Fixed code for calculating p values for some of the hypothesis tests, the old code was getting wrong answers when there were repeated value

Thursday, August 27, 2020

How do deal with recalibration errors

Due to changes to the CalibratR package, some of the recalibration methods don't work properly on some systems. This is due to the way CalibratR is addressing the parallel package, which Mac OS (and maybe others????) doesn't seem to like. There is a workaround, though; just copy the following code and run it before you run enmtools.calibrate, and all should be well!

if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM")) && Sys.info()["sysname"] == "Darwin" && getRversion() >= "4.0.0") {

parallel:::setDefaultClusterOptions(setup_strategy = "sequential")

}

Tuesday, August 11, 2020

Hacking together the Bohl et al. test

A recent paper by Bohl et al. suggested a new method for testing statistical significance of ENM predictions. It's similar to a test by Raes and ter Steege and existing tests in ENMTools, but as I understand it the difference is as follows:

Raes and ter Steege choose random points from the study area to build a data set equivalent to the size of the empirical data set, and compare the performance of the models on training data to the performance on random training data. This is what you get in ENMTools if you set rts.reps > 0 and test.prop = 0.

ENMTools' implementation of the Raes and ter Steege test added the ability (via setting test.prop > 0) to split the randomly drawn spatial data into training and test subsets and compare your empirical model's ability to predict your empirical test data to the ability of random training data to predict random test data.

The Bohl et al. test compares the ability of your model to predict your empirical test data to the ability of randomly drawn training points to predict your empirical test data. As such the data for the replicate models would be the same as in ENMTools for test.prop > 0, but the data the models are evaluated on would be test data from the empirical data set instead of test data that was randomly drawn from the study area.

At this point I would not venture to say which of these approaches is better, as I don't feel that I fully understand it myself. They each reflect different null hypotheses, and so perhaps the answer to "which is better" is a question of which one reflects the null you're most interested in rejecting. I think there's a lot more work to be done in this area, and I'm not sure there's going to be a one-size-fits-all answer.

All of that aside, at some point we need to implement the Bohl et al. test in ENMTools. Until then, it's fairly easy to hack together as is. You can use the existing rts.reps argument to generate the reps, and then just evaluate those models on your empirical test data. Here's a quick and dirty example using some of the built-in data from ENMTools.

library(ENMTools)
library(dplyr)
library(ggplot2)

monticola.gam <- enmtools.gam(iberolacerta.clade$species$monticola,
euro.worldclim,
test.prop = 0.3,
rts.reps = 10)

test.pres <- monticola.gam$test.data
test.bg <- monticola.gam$analysis.df %>%
filter(presence == 0) %>%
select(Longitude, Latitude)

bohl.test <- function(thismodel){
dismo::evaluate(test.pres, test.bg, thismodel, euro.worldclim)
}

null.dist <-sapply(monticola.gam$rts.test$rts.models,
FUN = function(x) bohl.test(x$model)@auc)
null.dist <- c(monticola.gam$test.evaluation@auc, null.dist)
names(null.dist)[1] <- "empirical"

qplot(null.dist, geom = "histogram", fill = "density", alpha = 0.5) +
geom_vline(xintercept = null.dist["empirical"], linetype = "longdash") +
xlim(0,1) + guides(fill = FALSE, alpha = FALSE) + xlab("AUC") +
ggtitle(paste("Model performance in geographic space on test data")) +
theme(plot.title = element_text(hjust = 0.5))

Ta da!!!!