Tuesday, August 11, 2020

Hacking together the Bohl et al. test

 A recent paper by Bohl et al. suggested a new method for testing statistical significance of ENM predictions.  It's similar to a test by Raes and ter Steege and existing tests in ENMTools, but as I understand it the difference is as follows:

Raes and ter Steege choose random points from the study area to build a data set equivalent to the size of the empirical data set, and compare the performance of the models on training data to the performance on random training data.  This is what you get in ENMTools if you set rts.reps > 0 and test.prop = 0.

ENMTools' implementation of the Raes and ter Steege test added the ability (via setting test.prop > 0) to split the randomly drawn spatial data into training and test subsets and compare your empirical model's ability to predict your empirical test data to the ability of random training data to predict random test data.

The Bohl et al. test compares the ability of your model to predict your empirical test data to the ability of randomly drawn training points to predict your empirical test data.  As such the data for the replicate models would be the same as in ENMTools for test.prop > 0, but the data the models are evaluated on would be test data from the empirical data set instead of test data that was randomly drawn from the study area.

At this point I would not venture to say which of these approaches is better, as I don't feel that I fully understand it myself.  They each reflect different null hypotheses, and so perhaps the answer to "which is better" is a question of which one reflects the null you're most interested in rejecting.  I think there's a lot more work to be done in this area, and I'm not sure there's going to be a one-size-fits-all answer.

All of that aside, at some point we need to implement the Bohl et al. test in ENMTools.  Until then, it's fairly easy to hack together as is.  You can use the existing rts.reps argument to generate the reps, and then just evaluate those models on your empirical test data.  Here's a quick and dirty example using some of the built-in data from ENMTools.


library(ENMTools)

library(dplyr)

library(ggplot2)


monticola.gam <- enmtools.gam(iberolacerta.clade$species$monticola,

                              euro.worldclim,

                              test.prop = 0.3,

                              rts.reps = 10)


test.pres <- monticola.gam$test.data

test.bg <- monticola.gam$analysis.df %>%

  filter(presence == 0) %>%

  select(Longitude, Latitude)


bohl.test <- function(thismodel){

  dismo::evaluate(test.pres, test.bg, thismodel, euro.worldclim)

}


null.dist <-sapply(monticola.gam$rts.test$rts.models, 

                   FUN = function(x) bohl.test(x$model)@auc)

null.dist <- c(monticola.gam$test.evaluation@auc, null.dist)

names(null.dist)[1] <- "empirical"


qplot(null.dist, geom = "histogram", fill = "density", alpha = 0.5) +

  geom_vline(xintercept = null.dist["empirical"], linetype = "longdash") +

  xlim(0,1) + guides(fill = FALSE, alpha = FALSE) + xlab("AUC") +

  ggtitle(paste("Model performance in geographic space on test data")) +

  theme(plot.title = element_text(hjust = 0.5))


Ta da!!!!


1 comment: