Friday, August 28, 2020

ENMTools version 1.0.1 is on CRAN! Clamping, variable importance, and progress bars!

 

Enhancements

  • Added variable importance tests via interface with the vip package
  • Added clamping for the predict functions, including plots of where clamping is happening
  • Added clamping for model construction functions, with a TRUE/FALSE switch defaulting to TRUE (see the usage sketch after this list)
  • Changed naming conventions for predict functions so that the suitability raster is in the $suitability slot, just as with modeling functions
  • Added progress bars for many of the tests
  • Added a “verbose” option for many functions, defaulting to FALSE
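
Here's a minimal sketch of what the new options look like in use.  This is illustrative rather than canonical - the argument names below follow the notes above, so check the function documentation for the exact interface:

library(ENMTools)

# Hypothetical usage based on the release notes; argument names may
# differ slightly from the released interface.
monticola.glm <- enmtools.glm(iberolacerta.clade$species$monticola,
                              euro.worldclim,
                              clamp = TRUE,    # clamping switch, defaults to TRUE
                              verbose = FALSE) # "verbose" option, defaults to FALSE

# Predict functions now store the suitability raster in $suitability,
# matching the modeling functions.
monticola.pred <- predict(monticola.glm, euro.worldclim)
plot(monticola.pred$suitability)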

Bug fixes

  • Fixed interactive.plot generic and moved the function to its own file to make it easier to extend
  • Temporarily suppressed some warnings from leaflet that are produced by the recent rgdal changes
  • Fixed background sampling code to resample when necessary
  • Changed enmtools.ranger demo code to actually use ranger instead of rf
  • Fixed code for calculating p values for some of the hypothesis tests; the old code was getting wrong answers when there were repeated values

Thursday, August 27, 2020

How to deal with recalibration errors


Due to changes in the CalibratR package, some of the recalibration methods don't work properly on some systems.  The problem comes from the way CalibratR uses the parallel package, which macOS (and possibly other systems) doesn't seem to like.  There is a workaround, though: just copy the following code and run it before you run enmtools.calibrate, and all should be well!


# Only apply the workaround where the problem occurs: R >= 4.0.0 running
# inside the RStudio GUI (not its terminal pane) on macOS.
if (Sys.getenv("RSTUDIO") == "1" &&
    !nzchar(Sys.getenv("RSTUDIO_TERM")) &&
    Sys.info()["sysname"] == "Darwin" &&
    getRversion() >= "4.0.0") {
  # Make parallel set up cluster workers sequentially, which avoids
  # the cluster startup failures on affected systems.
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
}

Tuesday, August 11, 2020

Hacking together the Bohl et al. test

A recent paper by Bohl et al. suggested a new method for testing the statistical significance of ENM predictions.  It's similar to a test by Raes and ter Steege and to existing tests in ENMTools, but as I understand it the difference is as follows:

Raes and ter Steege draw random points from the study area to build a data set the same size as the empirical data set, and compare the performance of the empirical model on its training data to the performance of models built from random training data.  This is what you get in ENMTools if you set rts.reps > 0 and test.prop = 0.

ENMTools' implementation of the Raes and ter Steege test added the ability (via setting test.prop > 0) to split the randomly drawn spatial data into training and test subsets, so that you can compare your empirical model's ability to predict your empirical test data to the ability of models built from random training data to predict random test data.

The Bohl et al. test compares the ability of your model to predict your empirical test data to the ability of models built from randomly drawn training points to predict that same empirical test data.  As such, the data for the replicate models are the same as in ENMTools with test.prop > 0, but the models are evaluated on test data from the empirical data set instead of test data that was randomly drawn from the study area.

At this point I would not venture to say which of these approaches is better, as I don't feel that I fully understand it myself.  They each reflect different null hypotheses, and so perhaps the answer to "which is better" is a question of which one reflects the null you're most interested in rejecting.  I think there's a lot more work to be done in this area, and I'm not sure there's going to be a one-size-fits-all answer.

All of that aside, at some point we need to implement the Bohl et al. test in ENMTools.  Until then, it's fairly easy to hack together as is.  You can use the existing rts.reps argument to generate the reps, and then just evaluate those models on your empirical test data.  Here's a quick and dirty example using some of the built-in data from ENMTools.


library(ENMTools)
library(dplyr)
library(ggplot2)

# Fit a GAM with 30% of the presence points withheld for testing and
# ten Raes and ter Steege replicates.
monticola.gam <- enmtools.gam(iberolacerta.clade$species$monticola,
                              euro.worldclim,
                              test.prop = 0.3,
                              rts.reps = 10)

# Empirical test presences and the background points from the analysis.
test.pres <- monticola.gam$test.data
test.bg <- monticola.gam$analysis.df %>%
  filter(presence == 0) %>%
  select(Longitude, Latitude)

# Evaluate any model on the empirical test data.
bohl.test <- function(thismodel){
  dismo::evaluate(test.pres, test.bg, thismodel, euro.worldclim)
}

# Evaluate each rts replicate model on the empirical test data, then
# prepend the empirical model's own test AUC.
null.dist <- sapply(monticola.gam$rts.test$rts.models,
                    FUN = function(x) bohl.test(x$model)@auc)
null.dist <- c(monticola.gam$test.evaluation@auc, null.dist)
names(null.dist)[1] <- "empirical"

# Plot the null distribution with the empirical AUC marked.
qplot(null.dist, geom = "histogram", fill = "density", alpha = 0.5) +
  geom_vline(xintercept = null.dist["empirical"], linetype = "longdash") +
  xlim(0, 1) + guides(fill = FALSE, alpha = FALSE) + xlab("AUC") +
  ggtitle("Model performance in geographic space on test data") +
  theme(plot.title = element_text(hjust = 0.5))
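
If you want a number to go with the picture, a one-tailed permutation-style p value is just the proportion of the null distribution at or above the empirical value.  This is a quick sketch, not necessarily the exact calculation ENMTools uses internally:

# Proportion of AUC values (including the empirical one) at least as
# large as the empirical AUC.
p.value <- mean(null.dist >= null.dist["empirical"])
p.value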


Ta da!!!!


Wednesday, July 29, 2020

install.packages("ENMTools") - ENMTools is now on CRAN!

Hey everybody! I'm happy to announce that we've finally put ENMTools onto CRAN!  It took a while, but everything seems to be working.  From now on you can install ENMTools just by typing:

install.packages("ENMTools")

After which, to get all of the dependencies, you should go ahead and do:

library(ENMTools)
install.extras()

After that, everything should work!  You might still need to put the maxent.jar file in the right place for dismo if you haven't done that already; see the dismo maxent help file for details.
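
As an aside, dismo expects maxent.jar to live in its own "java" folder; you can find that path with:

# Directory where dismo looks for maxent.jar:
system.file("java", package = "dismo")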

Tuesday, June 2, 2020

Introductory tutorials for the R version of ENMTools

Hey everybody!  I've started recording quick tutorials on the most important bits of ENMTools.  Here's one on how to install ENMTools and all of its dependencies:

[Embedded video: installing ENMTools and its dependencies]

And here's one on how to build ENMTools species objects and some quick models:

[Embedded video: building ENMTools species objects and quick models]

Sunday, May 24, 2020

Code snippet from Tyler Smith for fast plotting of ENMTools models

Over on the ENMTools GitHub page, Tyler Smith asked a question about plotting ENMTools models.  He pointed out that large models are very slow to plot, largely because of our use of ggplot2.  We like ggplot for this because it allows us to store plots in objects easily, and makes it possible for users to modify plots after the fact using all of the features of ggplot and the extensions people have written for it.  That said, it's probably frustrating to have a long draw time if you just want to take a quick peek at your model's predictions.  Tyler provided a code chunk that does a nice quick plot using base graphics.  The end result looks a lot like the standard ggplot plots we've been returning, but takes a fraction of the time to display.  Here's that code:

library(viridis) # to match ENMTools color palette

plotTWS <- function(x, ...) {
  plot(x$suitability, col = viridis(100, option = "B"),
       xlab = "Longitude", ylab = "Latitude",
       main = paste("Maxent model for", x$species.name),
       bty = "l", box = FALSE)
  points(subset(x$analysis.df, presence == 1), pch = 21,
         bg = "white")
  points(x$test.data, pch = 21, bg = "green")
}
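
Usage is just a one-liner; here monticola.gam stands in for any fitted ENMTools model object (the name is borrowed from the Bohl et al. example above):

plotTWS(monticola.gam)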

We're going to see if we can work out a quicker way to do our built-in plots using ggplot, but for now this is a nice workaround!

Friday, August 9, 2019

Calibration metrics added to ENMTools R package

Given our recent paper on how badly discrimination accuracy performs at selecting models that estimate ecological phenomena, the obvious next question is whether there are existing alternatives that do a better job.  The answer is an emphatic YES; calibration seems to be a much more useful method of selecting models for most purposes.

This is obviously not the first time that calibration has been recommended for species distribution models; the Continuous Boyce Index (CBI) was developed for this purpose (Boyce et al. 2002, Hirzel et al. 2006), Phillips and Elith (2010) demonstrated the utility of POC plots for assessing and recalibrating models, and Jiménez-Valverde et al. (2013) argued convincingly that calibration is more useful than discrimination for model selection for many purposes.  However, discrimination accuracy still seems to be the primary method people use for evaluating species distribution models.

A forthcoming simulation study by myself, Marianna Simões, Russell Dinnage, Linda Beaumont, and John Baumgartner demonstrates exactly how stark the difference between discrimination and calibration is when it comes to model selection: calibration metrics generally perform well, while discrimination accuracy is actively misleading for many aspects of model performance.  The differences we're seeing are stark enough that I would go so far as to recommend against using discrimination accuracy for model selection for most practical purposes.

We're writing this study up right now, and in the interests of moving things forward as quickly as possible we'll be submitting it to bioRxiv ASAP - likely within the next week or two.  As part of that study, though, I've implemented a number of calibration metrics for ENMTools models, including expected calibration error (ECE), maximum calibration error (MCE), and CBI.  We did not implement Hosmer-Lemeshow (HL) largely because ECE is calculated in a very similar way to HL and can be used in statistical tests in much the same way, but scales more naturally (a perfect model has an ECE of 0).  In our simulation study we found that ECE was the best performing metric for model selection by far, so for now that's what I personally will be using.
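
To make the idea concrete, here's a rough sketch of an equal-width-bin ECE: the mean absolute gap between predicted probability and observed frequency, weighted by bin occupancy.  CalibratR's actual implementation differs in its details, so treat this purely as illustration:

# Sketch of expected calibration error with equal-width bins.
ece <- function(pred, obs, bins = 10) {
  b <- cut(pred, breaks = seq(0, 1, length.out = bins + 1),
           include.lowest = TRUE)
  gaps <- abs(tapply(obs, b, mean) - tapply(pred, b, mean))
  w <- table(b) / length(pred)  # fraction of points in each bin
  sum(w * gaps, na.rm = TRUE)   # empty bins drop out as NA
}

A perfectly calibrated model matches observed frequencies in every bin, so its ECE is 0.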

Fairly soon we'll integrate the calibration metrics into all of the model construction functions (e.g., enmtools.glm, enmtools.maxent, etc.).  For now though, you can get calibration metrics and plots just by using the enmtools.calibrate function.
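
Something along these lines should work - a minimal sketch, where the model and arguments are just examples (see ?enmtools.calibrate for the exact interface):

# Fit a model with some data withheld for testing, then check calibration.
monticola.glm <- enmtools.glm(iberolacerta.clade$species$monticola,
                              euro.worldclim,
                              test.prop = 0.3)
enmtools.calibrate(monticola.glm)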



That will give you two different flavors of ECE and MCE as well as CBI and some diagnostic plots.  ECE and MCE calculations are done using the CalibratR package (Schwarz and Heider 2018), and CBI is calculated using functions from ecospat (DiCola et al. 2017).
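
If you'd rather compute CBI directly, ecospat exposes the underlying function.  A hedged sketch, assuming a fitted ENMTools model with withheld test data (the object names here are just examples):

library(ecospat)

# Predicted suitability across the study area, and at the test presences.
fit <- na.omit(raster::getValues(monticola.glm$suitability))
obs <- raster::extract(monticola.glm$suitability, monticola.glm$test.data)

# Continuous Boyce Index: values near 1 indicate predictions consistent
# with the test presences; values near 0 indicate no better than random.
ecospat.boyce(fit = fit, obs = obs)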

Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modelling 157:281-300.
Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling 199:142-152.