Monday, February 18, 2019

Low memory options for rangebreak tests are now available on develop branch

There are now options available for writing replicate models to disk for the rangebreak tests, instead of storing them in the output object.  This works in exactly the same way as it does for the identity and background tests, as outlined here:

http://enmtools.blogspot.com/2019/02/low-memory-usage-options-for.html

Again it's currently only on the develop branch, but we'll move it to main before too terribly long.

RWTY 1.0.2 now on CRAN

The newest version of RWTY, version 1.0.2 is now on CRAN.  This is a relatively minor release except for one significant bug fix: there was an issue causing one of the plotting functions to fail when it was called with a single MCMC chain (as opposed to multiple MCMC chains).  That's fixed now, and all is well.

The other major change (for the worse, IMO) is that we had to remove the "Plot Comparisons" vignette to get the package in under CRAN's size restrictions.  That's a really useful vignette, so it sucks to have to remove it.  There are just too many images to make it fit the size requirements, though, and since the entire point of that vignette is comparing those images it doesn't make sense to remove them.  You can still get the vignette installed with RWTY by installing from GitHub or just check it out here: http://danwarren.net/plot-comparisons.html

Wednesday, February 13, 2019

Low memory usage options for identity.test and background.test

This is something I've been meaning to do for a while, but just finally got around to because it was screwing someone's analysis up. 

Originally, the ENMTools R package was designed to store all replicate models in the output object for the identity and background tests.  While that's fine for low resolution or small extent studies, it got to be a real problem for people working with high-resolution data over larger geographic extents.

To deal with this, I've created options for background.test and identity.test that allow you to save the replicate models to .Rda files instead of storing them in the output object.  When called using this option, the replicate.models entries in the identity.test and background.test objects contain paths to the saved model files instead of containing the models themselves.

By default these functions just store models in the working directory, but you can specify a directory to save them to instead if you prefer. 

To run these tests using the low memory options, just pass the argument low.memory = TRUE.  If you want to pass it a directory to save to, just add rep.dir = "PATH", where PATH is your directory name.

Be warned that replicate models WILL be overwritten if they exist.  It's a good idea to make a separate directory for the reps from each analysis.

This new functionality is currently only implemented on the "develop" branch, but we'll move it over to the main branch soon.

Monday, January 21, 2019

Fun fact: you can run a whole bunch of models at once using ENMTools quite easily

The ENMTools R package contains a function called "species.from.file". This takes a .csv file and creates a list of species objects, one for each value in the "species" column in your .csv file. So you can do:

species.list <- species.from.file("myspecies.csv")

and you'll get back a list with ENMTools species in it. If you wanted to run a bunch of models using those species with the same settings, you could then do:

my.models <- lapply(species.list, function(x) enmtools.gam(x, climate.layers, test.prop = 0.3))

where "climate.layers" is your raster stack. That would create a list of ENMTools GAM objects, setting aside 30% of the data from each for testing.

Friday, October 5, 2018

Why add correlations for suitability scores?

Hey y'all!  After a conversation with some colleagues, I realized that I sort of added Spearman rank correlation as a measure of overlap to ENMTools without really explaining why.  Here is a quick and dirty sketch of my thinking on that.

Previous measures of overlap between suitability rasters that were implemented in ENMTools were based on measures of similarity between probability distributions.  That's fine as far as it goes, but my feeling is that it's more useful as a measure of species' potential spatial distributions than as a measure of similarity of the underlying model.   Here's a quick example:

# Perfect positive correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(0.1, 1.0, 0.001)


You can only see one line there because the two species are the same.  I wrote a function called olaps to make these plots and to calculate three metrics of similarity that are implemented in ENMTools.

olaps(sp1, sp2)

$D
[1] 1

$I
[1] 1

$cor
[1] 1

Okay, that's all well and good - perfectly positively correlated responses get a 1 from all metrics.  Now what if they're perfectly negatively correlated?

# Perfect negative correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(1.0, 0.1, 0.001)
olaps(sp1, sp2)
$D
[1] 0.590455

$I
[1] 0.8727405

$cor
[1] -1

What's going on here?  Spearman rank correlation tells us that they are indeed negatively correlated, but D and I both have somewhat high values!  The reason is that the values of the two functions are fairly similar across a fairly broad range of environments, even though the functions themselves are literally as different as they could possibly be.  Thinking about what this means in terms of species occurrence is quite informative; if the threshold for suitability for a species to occur is low (e.g., .0005 in this cartoon example), they might co-occur across a fairly broad range of environments; both species would find env values from 250 to 750 suitable and might therefore overlap across about 2/3 of their respective ranges.  That's despite them having completely opposite responses to that environmental gradient, strange though that may seem.

So do you want to measure the potential for your species to occupy the same environments, or do you want to measure the similarity in their estimated responses to those environments?  That's entirely down to what question you're trying to answer!

Okay, one more reason I kinda like correlations:

# Random
sp1 = abs(rnorm(1000))
sp2 = abs(rnorm(1000))


Here I've created a situation where we've got complete chaos; the relationship of both species to the environment is completely random.  Now let's measure overlaps:

olaps(sp1, sp2)

$D
[1] 0.5641885

$I
[1] 0.829914

$cor
[1] -0.04745993

Again we've got fairly high overlaps between species using D and I, but Spearman rank correlation is really close to zero.  That's exactly what we'd expect if there's no signal at all.  Of course the fact that species distributions and the environment are both spatially autocorrelated means that we'll likely have higher-than-zero (at least in absolute value) correlations even if there is no causal relationship between the environment and species distributions, but at least it's nice to know that we do have a clear expected value when chaos reigns.



Code for this is here:

https://gist.github.com/danlwarren/5509c6a45d4e534c3e0c0ecf1702bbdd

Tuesday, July 31, 2018

Predict functions for all model types

I've finally gotten around to adding predict() functions for all of the ENMTools model types.  You can now project your model onto a new time period or geographic extent, and it gives you back two things: a raster of the predicted suitability of habitats, and a threespace plot (see description here).  I've also added some more stuff under the hood that should catch some data formatting issues that were causing errors.

Wednesday, July 25, 2018

Minor fixes, new features

Hey all!  I've been bashing away at ENMTools for the past couple of days, just doing a bunch of bug fixes and adding some new features.  If you want to see everything you can go here, but I'll outline the highlights.

1. Added a "bg.source" argument to all modeling functions that allows you to specify whether you want to draw background from the background points or range raster stored in your enmtools.species object, or the geographic area outlined by your environment raster.  If you don't specify any bg.source argument it will prioritize them in the order it did previously: background points > range raster > environment layers.

2. Changed the raster.pca function to return the actual pca object along with the raster layers. 

3. Fixed a persistent error with ggplot2 plots from hypothesis tests excluding some values.  The fix for this isn't perfect yet (and I'm not entirely sure why), but in my experiments with it yesterday the issue is MUCH reduced.

4.  Added a plot to the output of enmtools.aoc that shows the averaged overlap values on each node in the phylogeny.  The old plots for the hypothesis tests are still there, but if you display the enmtools.aoc object those are now the second plot.  The first one is the tree, and it looks like this:



I've also added some plots I've been meaning to put in for a while.  These are called "three space plots", because they're meant to visualize the environment spaces representing the presence and background data from a model along with the environment space represented by a set of environment layers.

For these you just type threespace.plot(model, env), where your model is an enmtools.model object and your env is a set of raster layers.  That gives you something like this:



The goal here it to visualize how much of that environment space represents sets of conditions your model never saw when it was being built.  This is just a first pass and I do have more stuff planned in this direction, but I reckon this alone might be useful for some of you out there.