Friday, August 28, 2020

ENMTools version 1.0.1 is on CRAN! Clamping, variable importance, and progress bars!


Enhancements

  • Added variable importance tests via interface with the vip package
  • Added clamping for the predict functions, including plots of where clamping is happening
  • Added clamping for model construction functions, with a TRUE/FALSE switch defaulting to TRUE
  • Changed naming conventions for predict functions so that the suitability raster is in the $suitability slot, just as with modeling functions (see the sketch after this list)
  • Added progress bars for a lot of tests
  • Added “verbose” option for a lot of functions, defaulting to FALSE
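
Here's a minimal sketch of how the new pieces fit together. The argument names are illustrative; in particular I'm assuming the clamping switch is called clamp, since the notes above only say it's a TRUE/FALSE argument:

library(ENMTools)

# clamp = TRUE is an assumed argument name for the new switch (it defaults
# to TRUE anyway, per the notes above)
monticola.gam <- enmtools.gam(iberolacerta.clade$species$monticola,
                              euro.worldclim,
                              test.prop = 0.2,
                              clamp = TRUE)

# Predict functions now return the suitability raster in the $suitability
# slot, just like the modeling functions
pred <- predict(monticola.gam, euro.worldclim)
plot(pred$suitability)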

Bug fixes

  • Fixed interactive.plot generic and moved the function to its own file to make it easier to extend
  • Temporarily suppressing some warnings coming out of leaflet that are being produced by the recent rgdal changes
  • Fixed background sampling code to resample when necessary
  • Changed enmtools.ranger demo code to actually use ranger instead of rf
  • Fixed code for calculating p values for some of the hypothesis tests; the old code was getting wrong answers when there were repeated values

Thursday, August 27, 2020

How to deal with recalibration errors


Due to changes in the CalibratR package, some of the recalibration methods don't work properly on some systems. The problem comes from the way CalibratR is addressing the parallel package, which Mac OS (and maybe others?) doesn't seem to like. There is a workaround, though: just copy the following code and run it before you run enmtools.calibrate, and all should be well!


# Tell parallel to use the "sequential" cluster setup strategy, which
# sidesteps the cluster startup problem in the RStudio console on Mac OS
# under R >= 4.0.0
if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM")) &&
    Sys.info()["sysname"] == "Darwin" && getRversion() >= "4.0.0") {
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
}

Tuesday, August 11, 2020

Hacking together the Bohl et al. test

A recent paper by Bohl et al. suggested a new method for testing the statistical significance of ENM predictions. It's similar to a test by Raes and ter Steege and to existing tests in ENMTools, but as I understand it the difference is as follows:

Raes and ter Steege draw random points from the study area to build a data set the same size as the empirical data set, and compare the performance of the empirical model on its training data to the performance of models built from the random data. This is what you get in ENMTools if you set rts.reps > 0 and test.prop = 0.

ENMTools' implementation of the Raes and ter Steege test added the ability (via setting test.prop > 0) to split the randomly drawn spatial data into training and test subsets, so you can compare your empirical model's ability to predict your empirical test data to the ability of models built on random training data to predict random test data.
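
To make that concrete, here are the two setups as calls. This is just the enmtools.gam call from the example below, with test.prop switching between the two versions of the test:

library(ENMTools)

# Original Raes and ter Steege setup: compare performance on training data
rts.train <- enmtools.gam(iberolacerta.clade$species$monticola,
                          euro.worldclim,
                          test.prop = 0,
                          rts.reps = 100)

# ENMTools extension: also split both empirical and random data into
# training and test subsets
rts.split <- enmtools.gam(iberolacerta.clade$species$monticola,
                          euro.worldclim,
                          test.prop = 0.3,
                          rts.reps = 100)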

The Bohl et al. test compares the ability of your model to predict your empirical test data to the ability of models built from randomly drawn training points to predict that same empirical test data. The data for the replicate models would therefore be the same as in ENMTools with test.prop > 0, but the models would be evaluated on test data from the empirical data set instead of test data randomly drawn from the study area.

At this point I would not venture to say which of these approaches is better, as I don't feel that I fully understand the tradeoffs myself. They each reflect different null hypotheses, and so perhaps the answer to "which is better" is a question of which null you're most interested in rejecting. I think there's a lot more work to be done in this area, and I'm not sure there's going to be a one-size-fits-all answer.

All of that aside, at some point we need to implement the Bohl et al. test in ENMTools.  Until then, it's fairly easy to hack together as is.  You can use the existing rts.reps argument to generate the reps, and then just evaluate those models on your empirical test data.  Here's a quick and dirty example using some of the built-in data from ENMTools.


library(ENMTools)
library(dplyr)
library(ggplot2)

# Fit the empirical model, holding out 30% of the presence data for testing
# and building 10 Raes & ter Steege replicates from random points
monticola.gam <- enmtools.gam(iberolacerta.clade$species$monticola,
                              euro.worldclim,
                              test.prop = 0.3,
                              rts.reps = 10)

# Empirical test presences, plus the background points used for evaluation
test.pres <- monticola.gam$test.data
test.bg <- monticola.gam$analysis.df %>%
  filter(presence == 0) %>%
  select(Longitude, Latitude)

# Evaluate a model on the empirical test data, as in the Bohl et al. test
bohl.test <- function(thismodel){
  dismo::evaluate(test.pres, test.bg, thismodel, euro.worldclim)
}

# Score each rts replicate model on the empirical test data, then put the
# empirical model's test AUC at the front of the null distribution
null.dist <- sapply(monticola.gam$rts.test$rts.models,
                    FUN = function(x) bohl.test(x$model)@auc)
null.dist <- c(monticola.gam$test.evaluation@auc, null.dist)
names(null.dist)[1] <- "empirical"

# Plot the null distribution with the empirical AUC marked
qplot(null.dist, geom = "histogram", fill = "density", alpha = 0.5) +
  geom_vline(xintercept = null.dist["empirical"], linetype = "longdash") +
  xlim(0, 1) + guides(fill = FALSE, alpha = FALSE) + xlab("AUC") +
  ggtitle("Model performance in geographic space on test data") +
  theme(plot.title = element_text(hjust = 0.5))
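
If you want a p value to go with the histogram, the usual randomization-test convention works here. Since the empirical AUC is already the first element of null.dist, it counts itself once in both the numerator and the denominator:

# One-tailed p value: proportion of AUCs (empirical included) at least as
# large as the empirical AUC
p.value <- mean(null.dist >= null.dist["empirical"])
p.value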


Ta da!!!!