Friday, August 9, 2019

Calibration metrics added to ENMTools R package

Given our recent paper on how poorly discrimination accuracy performs at selecting models that accurately estimate ecological phenomena, the obvious next question is whether there are existing alternatives that do a better job.  The answer is an emphatic YES: calibration seems to be a much more useful basis for selecting models for most purposes.

This is obviously not the first time that calibration has been recommended for species distribution models; the Continuous Boyce Index (CBI) was developed for this purpose (Boyce et al. 2002, Hirzel et al. 2006), Phillips and Elith (2010) demonstrated the utility of presence-only calibration (POC) plots for assessing and recalibrating models, and Jiménez-Valverde et al. (2013) argued convincingly that calibration is more useful than discrimination for model selection for many purposes.  Even so, discrimination accuracy remains the primary method people use for evaluating species distribution models.

A forthcoming simulation study by myself, Marianna Simões, Russell Dinnage, Linda Beaumont, and John Baumgartner demonstrates exactly how stark the difference between discrimination and calibration is when it comes to model selection: calibration metrics perform well across the board, while discrimination accuracy is actively misleading for many aspects of model performance.  The differences we're seeing are stark enough that I would go so far as to recommend against using discrimination accuracy for model selection for most practical purposes.

We're writing this study up right now, and in the interest of moving things forward as quickly as possible we'll be submitting it to bioRxiv ASAP - likely within the next week or two.  As part of that study, though, I've implemented a number of calibration metrics for ENMTools models, including expected calibration error (ECE), maximum calibration error (MCE), and CBI.  We did not implement Hosmer-Lemeshow (HL), largely because ECE is calculated in much the same way as HL and can be used in statistical tests similarly, but scales more naturally (a perfect model has an ECE of 0).  In our simulation study we found that ECE was by far the best-performing metric for model selection, so for now that's what I personally will be using.
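To make the "scales more naturally" point concrete, here is an illustrative sketch of how ECE is typically computed - bin the predictions, then take the frequency-weighted average of the gap between mean prediction and observed frequency in each bin.  This is my own toy version for exposition, not the CalibratR implementation that ENMTools actually calls:

```r
# Toy ECE for illustration only (the package uses CalibratR's implementation).
# pred: predicted probabilities in [0, 1]; obs: observed 0/1 outcomes.
ece <- function(pred, obs, bins = 10) {
  cut.points <- cut(pred, breaks = seq(0, 1, length.out = bins + 1),
                    include.lowest = TRUE)
  # Per-bin absolute gap between mean prediction and observed frequency,
  # weighted by the number of points in the bin
  per.bin <- tapply(seq_along(pred), cut.points, function(i) {
    abs(mean(pred[i]) - mean(obs[i])) * length(i)
  })
  sum(per.bin, na.rm = TRUE) / length(pred)
}

# A well-calibrated model should give an ECE close to 0
set.seed(1)
p <- runif(1000)
y <- rbinom(1000, 1, p)
ece(p, y)
```

MCE is the same idea but takes the maximum per-bin gap instead of the weighted average, so it flags the worst-calibrated region of the prediction range.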

Fairly soon we'll integrate the calibration metrics into all of the model construction functions (e.g., enmtools.glm, enmtools.maxent, etc.).  For now, though, you can get calibration metrics and plots just by calling the enmtools.calibrate function on a fitted model.
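Something like the following should work - note that the exact arguments may differ slightly from the released version, and the iberolacerta.clade and euro.worldclim demo data ship with ENMTools:

```r
library(ENMTools)

# Demo data bundled with ENMTools
data(iberolacerta.clade)
data(euro.worldclim)

# Fit a model as usual, withholding some data for evaluation
monticola.glm <- enmtools.glm(iberolacerta.clade$species$monticola,
                              euro.worldclim, test.prop = 0.3)

# Calibration metrics and diagnostic plots for the fitted model
monticola.calib <- enmtools.calibrate(monticola.glm)
monticola.calib
```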

That will give you two different flavors of ECE and MCE as well as CBI and some diagnostic plots.  The ECE and MCE calculations are done using the CalibratR package (Schwarz and Heider 2018), and CBI is calculated using functions from ecospat (Di Cola et al. 2017).

Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modelling 157:281-300.
Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling 199:142-152.