Species In Space: Calibration metrics added to ENMTools R package

Friday, August 9, 2019


Given our recent paper on how badly discrimination accuracy performs at selecting models that estimate ecological phenomena, the obvious next question is whether there are existing alternatives that do a better job. The answer is an emphatic YES; calibration seems to be a much more useful method of selecting models for most purposes.

This is obviously not the first time that calibration has been recommended for species distribution models: the Continuous Boyce Index (CBI) was developed for this purpose (Boyce et al. 2002, Hirzel et al. 2006), Phillips and Elith (2010) demonstrated the utility of presence-only calibration (POC) plots for assessing and recalibrating models, and Jiménez-Valverde et al. (2013) argued convincingly that for many purposes calibration is more useful for model selection than discrimination. However, discrimination accuracy still seems to be the primary method people use to evaluate species distribution models.

A forthcoming simulation study by Marianna Simões, Russell Dinnage, Linda Beaumont, John Baumgartner, and me demonstrates exactly how stark the difference between discrimination and calibration is when it comes to model selection: calibration metrics generally perform fairly well, while discrimination accuracy is actively misleading for many aspects of model performance. The differences are large enough that I would go so far as to recommend against using discrimination accuracy for model selection for most practical purposes.
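One well-known reason discrimination metrics can't speak to calibration is that AUC depends only on how predictions are ranked: any strictly increasing transformation of a model's output leaves AUC unchanged, even though it can change calibration substantially. The toy Python sketch below shows this with made-up numbers (not data from the simulation study). `auc` and `ece` are hypothetical helpers written for this example; AUC is computed as the probability that a randomly chosen presence outscores a randomly chosen absence, and ECE uses equal-frequency bins, which is an assumption rather than a claim about how any particular package bins predictions.

```python
from typing import Sequence


def auc(actual: Sequence[int], predicted: Sequence[float]) -> float:
    """AUC = probability that a random presence outscores a random absence (ties count 1/2)."""
    pos = [p for p, a in zip(predicted, actual) if a == 1]
    neg = [p for p, a in zip(predicted, actual) if a == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ece(actual: Sequence[int], predicted: Sequence[float], n_bins: int = 10) -> float:
    """Expected calibration error using equal-frequency bins (an assumed binning scheme)."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    if n == 0 or n_bins < 1:
        raise ValueError("need data and at least one bin")
    n_bins = min(n_bins, n)  # never use more bins than cases
    total, start = 0.0, 0
    for b in range(n_bins):
        # bin sizes differ by at most one case
        size = n // n_bins + (1 if b < n % n_bins else 0)
        group = pairs[start:start + size]
        start += size
        expected = sum(p for p, _ in group) / size  # mean prediction in bin
        observed = sum(a for _, a in group) / size  # fraction of presences in bin
        total += (size / n) * abs(observed - expected)
    return total


if __name__ == "__main__":
    # Hypothetical toy data, not results from any study.
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    scores = [0.25] * 4 + [0.75] * 4
    cubed = [s ** 3 for s in scores]  # strictly increasing transform: same ranking
    print(auc(labels, scores), auc(labels, cubed))        # 0.75 0.75
    print(ece(labels, scores, 2), ece(labels, cubed, 2))  # 0.0 0.28125
```

Cubing the scores keeps every presence/absence comparison the same, so AUC cannot tell the two versions apart, while ECE shows that only the original scores match the observed frequency of presences in each bin.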

We're writing this study up right now, and to move things forward as quickly as possible we'll be submitting it to bioRxiv, likely within the next week or two. In the meantime, as part of that study, I've implemented a number of calibration metrics for ENMTools models, including expected calibration error (ECE), maximum calibration error (MCE), and CBI. We did not implement the Hosmer-Lemeshow (HL) test, largely because ECE is calculated in a very similar way to HL and can be used in statistical tests in much the same way, but scales more naturally (a perfectly calibrated model has an ECE of 0). In our simulation study, ECE was by far the best-performing metric for model selection, so for now that's what I personally will be using.
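For anyone curious about what ECE and MCE actually measure, here is a minimal sketch in Python (used purely for illustration; ENMTools itself is an R package). It sorts cases by predicted value, splits them into equal-frequency bins, and compares the mean prediction in each bin with the observed fraction of presences. ECE is the average of those gaps weighted by bin size, and MCE is the largest single gap, so a perfectly calibrated model scores 0 on both. The function names `equal_frequency_bins` and `calibration_errors`, the equal-frequency binning, and the default of 10 bins are all assumptions for this example, not a transcription of CalibratR's code. Likewise, treating `actual` as 1 for presence points and 0 for background points, with `predicted` as the model's suitability at those points, is an assumed presence/background setup rather than a description of what `enmtools.calibrate` does internally.

```python
from typing import List, Sequence, Tuple


def equal_frequency_bins(actual: Sequence[int], predicted: Sequence[float],
                         n_bins: int) -> List[List[Tuple[float, int]]]:
    """Sort (prediction, label) pairs by prediction and split them into
    n_bins groups whose sizes differ by at most one (assumed binning scheme)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if n_bins < 1:
        raise ValueError("need at least one bin")
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    n_bins = min(n_bins, n)  # never use more bins than cases
    bins, start = [], 0
    for b in range(n_bins):
        # spread the remainder one extra case at a time over the first bins
        size = n // n_bins + (1 if b < n % n_bins else 0)
        bins.append(pairs[start:start + size])
        start += size
    return bins


def calibration_errors(actual: Sequence[int], predicted: Sequence[float],
                       n_bins: int = 10) -> Tuple[float, float]:
    """Return (ECE, MCE) for 0/1 labels and predicted values in [0, 1]."""
    n = len(actual)
    ece, mce = 0.0, 0.0
    for group in equal_frequency_bins(actual, predicted, n_bins):
        expected = sum(p for p, _ in group) / len(group)  # mean prediction in bin
        observed = sum(a for _, a in group) / len(group)  # fraction of presences in bin
        gap = abs(observed - expected)
        ece += (len(group) / n) * gap  # weight each bin by its share of cases
        mce = max(mce, gap)            # MCE keeps only the worst bin
    return ece, mce


if __name__ == "__main__":
    # Assumed presence/background coding: 1 = presence point, 0 = background point.
    labels = [1, 0, 0, 1, 0]
    suitability = [0.9, 0.1, 0.2, 0.8, 0.3]  # hypothetical model predictions
    print(calibration_errors(labels, suitability, n_bins=2))
```

Weighting by bin size makes ECE a summary of calibration across the whole range of predictions, whereas MCE reports only the worst-calibrated bin.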

Fairly soon we'll integrate the calibration metrics into all of the model construction functions (e.g., enmtools.glm, enmtools.maxent). For now, though, you can get calibration metrics and plots simply by using the enmtools.calibrate function.

That will give you two different flavors of ECE and MCE, as well as CBI and some diagnostic plots. ECE and MCE are calculated using the CalibratR package (Schwarz and Heider 2018), and CBI is calculated using functions from ecospat (Di Cola et al. 2017).
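The CBI itself is a moving-window calculation (Hirzel et al. 2006): slide a window along the range of suitability values, compute for each window the ratio of the share of presences falling in it (P) to the share of background points falling in it (E), and take the Spearman rank correlation between window position and that P/E ratio. Values near 1 mean the P/E ratio rises consistently with predicted suitability. The Python sketch below is only an illustration under stated assumptions: `continuous_boyce`, `spearman`, and `average_ranks` are hypothetical names, background points stand in for the study area, and the window width (a tenth of the suitability range) and the 100 window positions are assumed settings rather than a copy of ecospat's implementation, which offers options this sketch leaves out.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined if either variable is constant
    return cov / (vx * vy) ** 0.5


def continuous_boyce(presence: Sequence[float], background: Sequence[float],
                     n_windows: int = 100, width_fraction: float = 0.1) -> float:
    """Moving-window Continuous Boyce Index (after Hirzel et al. 2006).
    Assumed settings: window width = 10% of the range, 100 window positions."""
    if n_windows < 2 or len(presence) == 0 or len(background) == 0:
        raise ValueError("need >= 2 windows and non-empty presence/background")
    lo, hi = min(background), max(background)
    width = (hi - lo) * width_fraction
    step = (hi - lo - width) / (n_windows - 1)
    mids, ratios = [], []
    for w in range(n_windows):
        start = lo + w * step
        end = start + width
        p = sum(start <= s <= end for s in presence) / len(presence)      # P: share of presences
        e = sum(start <= s <= end for s in background) / len(background)  # E: share of background
        if e > 0:  # P/E is undefined where no background falls in the window
            mids.append((start + end) / 2)
            ratios.append(p / e)
    return spearman(mids, ratios)


if __name__ == "__main__":
    # Hypothetical toy data: background spread evenly over [0, 1],
    # presences concentrated at high suitability.
    background = [i / 100 for i in range(101)]
    presence = [0.8, 0.85, 0.9, 0.95, 1.0, 0.9, 0.88]
    print(round(continuous_boyce(presence, background), 3))
```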

Boyce, M. S., Vernier, P. R., Nielsen, S. E., & Schmiegelow, F. K. A. (2002). Evaluating resource selection functions. Ecological Modelling, 157, 281–300.

Di Cola, V., Broennimann, O., Petitpierre, B., Breiner, F. T., D’Amen, M., Randin, C., … Guisan, A. (2017). ecospat: an R package to support spatial analyses and modeling of species niches and distributions. Ecography, 40(6), 774–787.

Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199, 142–152.

Jiménez-Valverde, A., Acevedo, P., Márcia Barbosa, A., Lobo, J. M., & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography. doi:10.1111/geb.12007

Phillips, S. J., & Elith, J. (2010). POC plots: calibrating species distribution models with presence-only data. Ecology, 91(8), 2476–2484.

Schwarz, J., & Heider, D. (2018). GUESS: Projecting Machine Learning Scores to Well-Calibrated Probability Estimates for Clinical Decision Making. Bioinformatics. doi:10.1093/bioinformatics/bty984

10 comments:

SeanZ, September 22, 2019 at 6:33 PM
Hi Dan, this is all fantastic work and I cannot wait for the next paper to come out. I would like to ask, though, how do selection criteria fit into all this? Are AICc or BIC still relevant in today's context of model selection (for example, when tuning MaxEnt parameters)? I ask especially because a couple of papers have recently raised issues with using AICc (Peterson et al., 2018; Velasco & González-Salazar, 2019).

TL;DR: what are your thoughts on AICc?

Thank you!
Dan Warren, October 2, 2019 at 1:19 AM
Hi there! Sorry for the slow reply; I was on holiday.

My feelings about information criterion-based model selection for Maxent are a bit complicated. We know it's wrong at some level, because of the disconnect between the number of parameters and the effective degrees of freedom, and yet simulation studies show that it can work better than Maxent's default behavior at selecting optimal model complexity. These concerns don't apply as much to GLM/GAM models, though, so I think AICc in general is still pretty solid.

It's worth noting that many of the other studies in this area (including Velasco & González-Salazar; I can't recall whether this applies to Peterson et al.) evaluate how well the selected models predict binary presence/absence, while Seifert and I looked at how well the selected models predicted continuous suitability scores. The fact that we get different answers on how well AICc performs at model selection may be entirely due to how each of us defines a "good" model in our simulations; as my recent paper with Matzke and Iglesias reinforces, there may be little connection between a model that makes good distribution predictions and one that actually predicts the relative suitability of habitat well.
Israel Moreno-Contreras, December 16, 2020 at 11:53 PM
Hi Dan, very interesting post!
I have a couple of questions.
The getMCE function of the CalibratR library asks for two arguments: actual (a vector of observed class labels, 0/1) and predicted (a vector of uncalibrated predictions).
How do I get the vector of observed class labels (0/1)?
How do I get the vector of uncalibrated predictions?

Best regards,
Israel
suz, March 5, 2021 at 2:00 AM
I guess this function is not yet available in the standalone ENMTools software. Will the ENMTools R package be able to use the outputs from Maxent runs in the future?