Wednesday, February 13, 2019

Low memory usage options for identity.test and background.test

This is something I've been meaning to do for a while, but just finally got around to because it was screwing someone's analysis up. 

Originally, the ENMTools R package was designed to store all replicate models in the output object for the identity and background tests.  While that's fine for low resolution or small extent studies, it got to be a real problem for people working with high-resolution data over larger geographic extents.

To deal with this, I've created options for background.test and identity.test that allow you to save the replicate models to .Rda files instead of storing them in the output object.  When called using this option, the replicate.models entries in the identity.test and background.test objects contain paths to the saved model files instead of containing the models themselves.

By default these functions just store models in the working directory, but you can specify a directory to save them to instead if you prefer. 

To run these tests using the low memory options, just pass the argument low.memory = TRUE.  If you want to pass it a directory to save to, just add rep.dir = "PATH", where PATH is your directory name.

Be warned that replicate models WILL be overwritten if they exist.  It's a good idea to make a separate directory for the reps from each analysis.

This new functionality is currently only implemented on the "develop" branch, but we'll move it over to the main branch soon.

Monday, January 21, 2019

Fun fact: you can run a whole bunch of models at once using ENMTools quite easily

The ENMTools R package contains a function called "species.from.file". This takes a .csv file and creates a list of species objects, one for each value in the "species" column in your .csv file. So you can do:

species.list <- species.from.file("myspecies.csv")

and you'll get back a list with ENMTools species in it. If you wanted to run a bunch of models using those species with the same settings, you could then do:

my.models <- lapply(species.list, function(x) enmtools.gam(x, climate.layers, test.prop = 0.3))

where "climate.layers" is your raster stack. That would create a list of ENMTools GAM objects, setting aside 30% of the data from each for testing.

Friday, October 5, 2018

Why add correlations for suitability scores?

Hey y'all!  After a conversation with some colleagues, I realized that I sort of added Spearman rank correlation as a measure of overlap to ENMTools without really explaining why.  Here is a quick and dirty sketch of my thinking on that.

Previous measures of overlap between suitability rasters that were implemented in ENMTools were based on measures of similarity between probability distributions.  That's fine as far as it goes, but my feeling is that it's more useful as a measure of species' potential spatial distributions than as a measure of similarity of the underlying model.   Here's a quick example:

# Perfect positive correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(0.1, 1.0, 0.001)

You can only see one line there because the two species are the same.  I wrote a function called olaps to make these plots and to calculate three metrics of similarity that are implemented in ENMTools.

olaps(sp1, sp2)

[1] 1

[1] 1

[1] 1

Okay, that's all well and good - perfectly positively correlated responses get a 1 from all metrics.  Now what if they're perfectly negatively correlated?

# Perfect negative correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(1.0, 0.1, 0.001)
olaps(sp1, sp2)
[1] 0.590455

[1] 0.8727405

[1] -1

What's going on here?  Spearman rank correlation tells us that they are indeed negatively correlated, but D and I both have somewhat high values!  The reason is that the values of the two functions are fairly similar across a fairly broad range of environments, even though the functions themselves are literally as different as they could possibly be.  Thinking about what this means in terms of species occurrence is quite informative; if the threshold for suitability for a species to occur is low (e.g., .0005 in this cartoon example), they might co-occur across a fairly broad range of environments; both species would find env values from 250 to 750 suitable and might therefore overlap across about 2/3 of their respective ranges.  That's despite them having completely opposite responses to that environmental gradient, strange though that may seem.

So do you want to measure the potential for your species to occupy the same environments, or do you want to measure the similarity in their estimated responses to those environments?  That's entirely down to what question you're trying to answer!

Okay, one more reason I kinda like correlations:

# Random
sp1 = abs(rnorm(1000))
sp2 = abs(rnorm(1000))

Here I've created a situation where we've got complete chaos; the relationship of both species to the environment is completely random.  Now let's measure overlaps:

olaps(sp1, sp2)

[1] 0.5641885

[1] 0.829914

[1] -0.04745993

Again we've got fairly high overlaps between species using D and I, but Spearman rank correlation is really close to zero.  That's exactly what we'd expect if there's no signal at all.  Of course the fact that species distributions and the environment are both spatially autocorrelated means that we'll likely have higher-than-zero (at least in absolute value) correlations even if there is no causal relationship between the environment and species distributions, but at least it's nice to know that we do have a clear expected value when chaos reigns.

Code for this is here:

Tuesday, July 31, 2018

Predict functions for all model types

I've finally gotten around to adding predict() functions for all of the ENMTools model types.  You can now project your model onto a new time period or geographic extent, and it gives you back two things: a raster of the predicted suitability of habitats, and a threespace plot (see description here).  I've also added some more stuff under the hood that should catch some data formatting issues that were causing errors.

Wednesday, July 25, 2018

Minor fixes, new features

Hey all!  I've been bashing away at ENMTools for the past couple of days, just doing a bunch of bug fixes and adding some new features.  If you want to see everything you can go here, but I'll outline the highlights.

1. Added a "bg.source" argument to all modeling functions that allows you to specify whether you want to draw background from the background points or range raster stored in your enmtools.species object, or the geographic area outlined by your environment raster.  If you don't specify any bg.source argument it will prioritize them in the order it did previously: background points > range raster > environment layers.

2. Changed the raster.pca function to return the actual pca object along with the raster layers. 

3. Fixed a persistent error with ggplot2 plots from hypothesis tests excluding some values.  The fix for this isn't perfect yet (and I'm not entirely sure why), but in my experiments with it yesterday the issue is MUCH reduced.

4.  Added a plot to the output of enmtools.aoc that shows the averaged overlap values on each node in the phylogeny.  The old plots for the hypothesis tests are still there, but if you display the enmtools.aoc object those are now the second plot.  The first one is the tree, and it looks like this:

I've also added some plots I've been meaning to put in for a while.  These are called "three space plots", because they're meant to visualize the environment spaces representing the presence and background data from a model along with the environment space represented by a set of environment layers.

For these you just type threespace.plot(model, env), where your model is an enmtools.model object and your env is a set of raster layers.  That gives you something like this:

The goal here it to visualize how much of that environment space represents sets of conditions your model never saw when it was being built.  This is just a first pass and I do have more stuff planned in this direction, but I reckon this alone might be useful for some of you out there.

Monday, June 4, 2018

More installation issues: "descendants" not exported from phyloclim, ecospat not found

Well the fun just keeps rolling in with the newest R update and ensuing package updates.  Two new issues have just been brought to my attention that prevent new installations of ENMTools: first, the newest version of phyloclim no longer exports the "descendants" function, which ENMTools uses.  There's a hot fix for that on the develop branch of ENMTools on Github, where I basically just copied the code over from phyloclim for the time being so it no longer has to be imported.

The second issue is a bit more of a hassle: ecospat has been removed from CRAN, and I'm not sure why.  Given that ENMTools requires ecospat to run, this is a bit of an issue.  For now, all you can really do is install ecospat manually from the Github mirror of the CRAN repository, and then install ENMTools after that.  Hopefully ecospat will get updated and re-uploaded to CRAN soon, but failing that we might need to find a more permanent workaround eventually.

Thursday, May 17, 2018

Issues with installing ENMTools under newest versions of R

During the course I co-taught with Matt Fitzpatrick in Glasgow a few weeks ago, it came out that a number of people were having trouble installing ENMTools.  A lot of this seems to stem from the newest version of R, and in particular how it interacts with your Java installation.

Some Mac users (including me) were able to fix this just by going to the console and typing "R CMD javareconf" and restarting R.

Windows users seemed to have much more significant issues, however.  There seemed to be several issues, but here's a solution that helped several of the students, courtesy of Hirzi Luqman:

* installing *source* package ‘ENMTools’ ...
** R
** data
*** moving datasets to lazyload DB
** tests
** byte-compile and prepare package for lazy loading
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/':
dlopen(/Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/, 6): Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/
  Reason: image not found
ERROR: lazy loading failed for package ‘ENMTools’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/ENMTools’
Installation failed: Command failed (1)

ENMtools/rJava was looking for (and couldn't find) the library "libjvm.dylib" in the directory "/Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/", because no such directory existed (this refers to a Java 9 installation); I don't know why it looked for the file there...

Solution: I just copied the same file from the Java 8 directory (or whatever Java version is installed on the computer; in my case the directory was "/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/lib/server/") to the R library directory "/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/", and it worked

Update: GitHub user Sandy had a different issue, detailed here.  The fix in that case was:

I removed all PATH related to "java", then Add jvm.dll to PATH (adding "%JAVA_HOME%\jre\bin\server;" to PATH).
After that, I set up the JAVA_HOME in R:
Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk1.8.0_171\\").