Sunday, August 11, 2024

Bug fix: plots for RTS-style tests

In the midst of doing some modeling work I discovered that the plots for the RTS-style tests were incorrect - the training AUC was used for the vertical dashed line instead of the metric that was actually used for the replicates.  The p-values reported by the test were still correct, as were the values in the reps table; it was only the plots that were affected.

This issue has now been fixed on GitHub, and the fix will be part of the next upload to CRAN.

Thursday, March 2, 2023

Big Changes in ENMTools 1.1.0!

ENMTools version 1.1.0 is now on the master branch on GitHub.  We're working on getting ENMTools back on CRAN, but doing so has necessitated some big changes.  A lot of stuff has had to change under the hood to adjust to the fact that the raster package is being replaced by terra.  ENMTools depends very heavily on raster, so making this transition happen is quite a job.  A lot of other package authors are dealing with this right now as well, so the whole ecosystem is going to be a bit chaotic for the next little while.

We did create a branch on GitHub that contains the last version of ENMTools that was designed for raster.  To install it, just type:

devtools::install_github("danlwarren/ENMTools", ref = "raster")

Many of the changes we're making to deal with this are going to be invisible to users - just internal changes to the way data is handled.  In many cases it will speed up tests a bit, which is great.  However, a few of these changes will involve inevitable changes to the way that you set up your data for analysis in ENMTools.  Chief among these is that raster data will now need to be stored in terra's SpatRaster format and point data in terra's SpatVector format.  For the most part this is great - it allows these objects to have a CRS and projection attached to them, which is something we've been meaning to do for a while.  The one downside is that saving and loading these objects can be a bit of a hassle.

For that reason we're adding some new save and load functionality that should make this a little easier to navigate.  The functions are "save.enmtools.species", "load.enmtools.species", "save.enmtools.clade", and "load.enmtools.clade".
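For example, assuming the save functions take an object plus a file path and the load functions take the path (check the help files for the exact arguments - the file name here is just a placeholder):

```r
# Save a species object to disk, then read it back in later
save.enmtools.species(my.species, "monticola.rds")
my.species <- load.enmtools.species("monticola.rds")
```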

When you read in your own data from a .csv file, you'll want to convert it to a SpatVector object and specify a CRS, like so:

library(terra)

my.data <- read.csv(path.to.file)

# Replace "longitude"/"latitude" with the names of your coordinate columns
my.data <- vect(my.data, geom = c("longitude", "latitude"))

crs(my.data) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"

Where you replace "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs" with the appropriate information for your data.  Loading raster data is much the same, except you use terra::rast.
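As a quick sketch of the raster side (the file names here are placeholders - point rast at your own files):

```r
library(terra)

# Read one or more raster files into a single SpatRaster
env <- rast(c("bio1.tif", "bio12.tif"))

# As with point data, check the CRS and set it if needed
crs(env)
```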

Unfortunately we've also had to disable a few functions.  The enmtools.hypervolume function is temporarily disabled, but we hope this won't be the case for long.  The authors of that package have adopted the terra framework in the version that's on GitHub, but the version on CRAN isn't yet compatible.  We've also had to disable the GUESS recalibration model, because that package is spitting an error that we can't diagnose or work around.  The other recalibration methods still work, however.

Russell Dinnage and I have spent a lot of time hunting down bugs due to this changeover and planning even more big changes for ENMTools 2.0.  If you run into anything that doesn't work correctly, please do let us know - there's always the possibility that something has snuck by us.




Tuesday, November 15, 2022

ENMTools is temporarily off of CRAN

ENMTools has been temporarily removed from CRAN due to an issue with another package.  The other package has now been fixed, but we have to re-upload ENMTools as a result.  We're going to take that opportunity to fix a couple of minor issues first.  For now, you can still install it from GitHub using devtools.

Thursday, August 11, 2022

New preprint out: Incorporating sampling bias into permutation tests for niche and distribution models

I'm happy to announce that I have a new preprint out in collaboration with Jamie Kass, Ale Casadei-Ferreira, and Evan Economo.  I think this one is pretty important for anyone who's using ENMTools for hypothesis tests.

Using simulations, we demonstrate that hypothesis tests that use geographic randomization can be severely misled if the real occurrence data are subject to spatial sampling biases that are not included in the replicates used for constructing null distributions.  This reinforces a recent result by Osborne et al. showing that failing to include spatial autocorrelation in geographic randomization tests could have serious negative effects.  In both cases the result is an unacceptably high level of Type I error, leading users to conclude that test results were significant when they should not be.  In our case we showed that spatial sampling bias could be enough to generate statistically significant results on its own, even when the species being modeled was completely unaffected by the environmental predictors used for modeling.  This affects the ENMTools background test, the Raes and ter Steege-style (RTS) tests of model significance (and presumably the Bohl et al. 2019 test as well), and the bias assessment method demonstrated in our recent paper about Australian Pokémon (Warren et al. 2021).

To be completely frank this is one of those situations where we were in the middle of a study and were somewhat scooped - the excellent Osborne et al. study already demonstrates many of the same points we're making and was published right as we were getting our final simulation results back.  However, we think there are some important differences that make publishing this study still quite valuable.  

Most importantly, the Osborne et al. study includes all spatial autocorrelation in occurrence data into the pseudoreplicate data sets, while we focus on spatial sampling bias alone.  That means the Osborne et al. approach would be most useful in cases where it can be assumed that all spatial autocorrelation is a statistical artifact, while ours may be more useful in cases where you only want to correct for spatial autocorrelation due to spatial sampling bias while leaving the rest intact.  To the extent that spatial autocorrelation reflects individuals responding independently to a spatially autocorrelated environment, the spatial autocorrelation in the data set should arguably not be considered a statistical artifact as doing so would result in increased Type II error (erroneously failing to reject the null). Which one of these is a better idea will therefore vary by data set and requires a decent amount of thought (see Kühn and Dormann 2012 and Hawkins 2012 for a detailed discussion).  

However, our method requires a raster or set of points representing spatial sampling bias.  Any use of a sampling bias estimate comes with a large number of assumptions, and an inaccurate bias estimate will probably not fully correct for the inflated Type I error rate we demonstrate here (depending on exactly how wrong it is and exactly how it's wrong, if you get the distinction).  We discuss these issues in detail in the manuscript, and they are not trivial.

One thing is clear from both studies, however: using uniformly distributed points for pseudoreplicates makes a very strong assumption about spatial sampling bias and autocorrelation, and if those assumptions aren't met the result is an unacceptable rate of false positives.  I would highly recommend that people consider using one or the other of these methods (or both) for any geographic randomization tests in the future.
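To see the problem in miniature, here's a toy simulation in base R (my own illustration, not from either paper): the species is placed completely at random with respect to an environmental gradient, but sampling effort increases along that gradient.  Testing against a uniform null "detects" an association that isn't there, while building the null with the same biased sampling does not.

```r
set.seed(42)

# A one-dimensional landscape: an environmental gradient across 1000 cells
env <- seq(0, 1, length.out = 1000)

# Sampling effort increases with the gradient (e.g., more collectors near roads)
bias <- env^2

# The species is genuinely indifferent to the environment, but our 100
# "observed" occurrences are filtered through the biased sampling effort
occ <- sample(1000, 100, replace = TRUE, prob = bias)
obs.stat <- mean(env[occ])

# Null distribution built the usual way: uniform random points
null.stat <- replicate(1000, mean(env[sample(1000, 100, replace = TRUE)]))

# One-tailed p-value: sampling bias alone produces a "significant" result
p.value <- mean(null.stat >= obs.stat)

# Building the null with the same biased sampling removes the artifact
null.biased <- replicate(1000,
                         mean(env[sample(1000, 100, replace = TRUE, prob = bias)]))
p.corrected <- mean(null.biased >= obs.stat)
```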

The good news from the ENMTools front is that this can already be done without modifying ENMTools.  You only need to provide the background points manually for the species objects, because the geographic randomization tests will sample from those automatically if they're available.  

I've had issues sampling in proportion to grid cell values using the functionality in the raster package before, so here's a quick function to get you started:


library(raster)

bias.sample <- function(bias.raster, npoints, replace = TRUE, biased = TRUE){

  # Convert the raster to a data frame of cell coordinates and values
  bias.points <- data.frame(rasterToPoints(bias.raster))

  if(biased){
    # Sample cells with probability proportional to the bias raster values
    bias.points <- bias.points[sample(nrow(bias.points),
                                      size = npoints,
                                      prob = bias.points[,3],
                                      replace = replace),]
  } else {
    # Sample cells uniformly
    bias.points <- bias.points[sample(nrow(bias.points),
                                      size = npoints,
                                      replace = replace),]
  }

  # Return just the x/y coordinates
  return(bias.points[,1:2])
}
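If you want to sanity-check the weighted sampling at the heart of that function without building a raster, the key ingredient is just base R's sample() with a prob argument:

```r
set.seed(1)

# Three hypothetical grid cells with relative sampling weights 1, 2, and 7
weights <- c(1, 2, 7)

# Draw cell indices in proportion to the weights, as bias.sample does internally
draws <- sample(length(weights), 10000, replace = TRUE, prob = weights)

# Observed frequencies should be close to 0.1, 0.2, and 0.7
freqs <- as.vector(table(factor(draws, levels = 1:3))) / length(draws)
```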

To perform an RTS-style test of model significance with a bias layer you'd just sample your background manually from that layer and stuff it into the background points in the species object.  Let's say you already have a species object and have a bias layer named "bias.layer".  To do an RTS test that incorporates spatial sampling bias, you'd just do this:

this.species$background.points <- bias.sample(bias.layer, npoints)
this.species <- check.species(this.species)
species.mx <- enmtools.maxent(this.species, env,
                              bg.source = "points",
                              rts.reps = 100)

Applying the correction to the background test is essentially the same, but for both species.  You'll want to crop your bias layer to the range of each species before sampling background for it, though, otherwise the background will be drawn from the entire area of the bias raster.  
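Here's a rough sketch of how that might look for two species, using crop and mask from the raster package along with the bias.sample function above (the species names, layer name, and background.test arguments are illustrative - check the documentation for your ENMTools version):

```r
# Restrict the bias layer to each species' range before sampling background
bias.A <- mask(crop(bias.layer, species.A$range), species.A$range)
bias.B <- mask(crop(bias.layer, species.B$range), species.B$range)

species.A$background.points <- bias.sample(bias.A, npoints)
species.B$background.points <- bias.sample(bias.B, npoints)

species.A <- check.species(species.A)
species.B <- check.species(species.B)

# The background test replicates will now draw from the bias-weighted points
bg.test <- background.test(species.A, species.B, env, type = "mx",
                           test.type = "asymmetric")
```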


Bohl, Corentin L., Jamie M. Kass, and Robert P. Anderson. 2019. “A New Null Model Approach to Quantify Performance and Significance for Ecological Niche Models of Species Distributions.” Journal of Biogeography 46 (6): 1101–11.

Hawkins, Bradford A. 2012. “Eight (and a Half) Deadly Sins of Spatial Analysis.” Journal of Biogeography 39 (1): 1–9.

Kühn, Ingolf, and Carsten F. Dormann. 2012. “Less than Eight (and a Half) Misconceptions of Spatial Analysis.” Journal of Biogeography 39 (5): 995–98.

Osborne, Owen G., Henry G. Fell, Hannah Atkins, Jan van Tol, Daniel Phillips, Leonel Herrera-Alsina, Poppy Mynard, et al. 2022. “Fauxcurrence: Simulating Multi-species Occurrences for Null Models in Species Distribution Modelling and Biogeography.” Ecography, April. https://doi.org/10.1111/ecog.05880.

Warren, Dan L., Alex Dornburg, Katerina Zapfe, and Teresa L. Iglesias. 2021. “The Effects of Climate Change on Australia’s Only Endemic Pokémon: Measuring Bias in Species Distribution Models.” Methods in Ecology and Evolution 12 (6): 985–95.

Warren, Dan L., Jamie M. Kass, Alexandre Casadei-Ferreira, and Evan P. Economo. 2022 (preprint). “Incorporating Sampling Bias into Permutation Tests for Niche and Distribution Models.” bioRxiv. https://doi.org/10.1101/2022.08.08.503252.

Thursday, May 26, 2022

Youtube tutorial on implementing the bias test from Warren et al. 2021


I've just posted a Youtube tutorial on how to implement the bias test from last year's paper "The effects of climate change on Australia’s only endemic Pokémon: Measuring bias in species distribution models".  It's a really neat way to see what sort of methodological biases might exist in a given study design, and can even tell you where in geographic space to trust the predictions your models make!

https://www.youtube.com/watch?v=HtURN7HaSLg 


Code is here:


library(ENMTools)

present.files <- list.files("~/Dropbox/Ongoing Projects/Brazilian Ants and Bias/wc/",
                            pattern = ".gri", full.names = TRUE)
env.present <- stack(present.files)

future.files <- list.files("~/Dropbox/Ongoing Projects/Brazilian Ants and Bias/wc.future/",
                           pattern = ".gri", full.names = TRUE)
env.future <- stack(future.files)

adlerzi <- enmtools.species()
adlerzi$presence.points <- read.csv("~/Dropbox/Ongoing Projects/Brazilian Ants and Bias/Real Data Analyses/ant_spp/pt_data/Procryptocerus.adlerzi.csv")
adlerzi$species.name <- "Procryptocerus adlerzi"
adlerzi <- check.species(adlerzi)
adlerzi$range <- background.buffer(adlerzi$presence.points, 500000, mask = env.present[[1]])
adlerzi <- check.species(adlerzi)
interactive.plot(adlerzi)

raster.cor.plot(env.present)

# Based on that I'm going to go with bio1, bio12, and bio15
env.present <- env.present[[c("bio1", "bio12", "bio15")]]
env.future <- env.future[[c("bio1", "bio12", "bio15")]]
env.future
env.present

# Different resolutions so we need to resample
env.present <- resample(env.present, env.future)

adlerzi.gam <- enmtools.gam(adlerzi, env.present, test.prop = 0.2, rts.reps = 100)

adlerzi.future <- predict(adlerzi.gam, env.future)
plot(adlerzi.future$suitability.plot)
plot(adlerzi.gam)

predicted.change <- adlerzi.future$suitability - adlerzi.gam$suitability
plot(predicted.change)

bias.rasters <- lapply(adlerzi.gam$rts.test$rts.models,
                       function(x) predict(env.future, x$model, type = "response") -
                         predict(env.present, x$model, type = "response"))

bias.stack <- stack(bias.rasters)
mean.pred <- calc(bias.stack, fun = mean)
plot(mean.pred)
sd.pred <- calc(bias.stack, fun = sd)
plot(sd.pred)

# Get upper and lower bounds of 95% CI
pred.95.upper <- calc(bias.stack, fun = function(x) quantile(x, probs = 0.975, na.rm = TRUE))
plot(pred.95.upper)
pred.95.lower <- calc(bias.stack, fun = function(x) quantile(x, probs = 0.025, na.rm = TRUE))
plot(pred.95.lower)

# Find which predictions fall within the CI of expected predictions under bias-only
plot(predicted.change > pred.95.lower & predicted.change < pred.95.upper)



Thursday, June 24, 2021

Problems with background test in ENMTools R package

At some point, I think fairly recently, something got messed up in the background tests of the ENMTools R package, and it was building the models for the permutation tests using incorrect presence points.  I think I've found the issue and am working on a fix right now, but if you're using a recent version from CRAN or GitHub to do background tests, I would not trust those results until a fix gets released.  I will do my damnedest to get that done this afternoon, Japan time.  Really sorry for any inconvenience this might have caused.


edit: This issue has now been fixed and the fixed version is live on CRAN.  If you've got any studies underway using the background test (similarity test), please reinstall ENMTools and run them again.

Sunday, May 16, 2021

Removing points from suitability plots

This is a question I got via email: how do you remove occurrence points from the suitability plots that are spit out by ENMTools modeling functions?

Luckily this is quite easy, as these are just ggplot objects.

library(ENMTools)
library(ggedit)

mont.mx <- enmtools.maxent(iberolacerta.clade$species$monticola,
                           euro.worldclim, test.prop = 0.3)

with.points <- plot(mont.mx)
with.points

without.points <- remove_geom(with.points, "point", 1:2)
without.points

For some reason ggedit's remove_geom function gets mad when you pass it two layer numbers (1:2), but it still works.  It doesn't work if you only pass it one number.  Go figure.