Friday, October 5, 2018

Why add correlations for suitability scores?

Hey y'all!  After a conversation with some colleagues, I realized that I sort of added Spearman rank correlation as a measure of overlap to ENMTools without really explaining why.  Here is a quick and dirty sketch of my thinking on that.

Previous measures of overlap between suitability rasters that were implemented in ENMTools were based on measures of similarity between probability distributions.  That's fine as far as it goes, but my feeling is that it's more useful as a measure of species' potential spatial distributions than as a measure of similarity of the underlying model.   Here's a quick example:

# Perfect positive correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(0.1, 1.0, 0.001)


You can only see one line there because the two species are the same.  I wrote a function called olaps to make these plots and to calculate three metrics of similarity that are implemented in ENMTools.

olaps(sp1, sp2)

$D
[1] 1

$I
[1] 1

$cor
[1] 1

Okay, that's all well and good - perfectly positively correlated responses get a 1 from all metrics.  Now what if they're perfectly negatively correlated?

# Perfect negative correlation
sp1 = seq(0.1, 1.0, 0.001)
sp2 = seq(1.0, 0.1, 0.001)
olaps(sp1, sp2)
$D
[1] 0.590455

$I
[1] 0.8727405

$cor
[1] -1

What's going on here?  Spearman rank correlation tells us that they are indeed negatively correlated, but D and I both have somewhat high values!  The reason is that the values of the two functions are fairly similar across a fairly broad range of environments, even though the functions themselves are literally as different as they could possibly be.  Thinking about what this means in terms of species occurrence is quite informative; if the threshold for suitability for a species to occur is low (e.g., .0005 in this cartoon example), they might co-occur across a fairly broad range of environments; both species would find env values from 250 to 750 suitable and might therefore overlap across about 2/3 of their respective ranges.  That's despite them having completely opposite responses to that environmental gradient, strange though that may seem.

So do you want to measure the potential for your species to occupy the same environments, or do you want to measure the similarity in their estimated responses to those environments?  That's entirely down to what question you're trying to answer!

Okay, one more reason I kinda like correlations:

# Random
sp1 = abs(rnorm(1000))
sp2 = abs(rnorm(1000))


Here I've created a situation where we've got complete chaos; the relationship of both species to the environment is completely random.  Now let's measure overlaps:

olaps(sp1, sp2)

$D
[1] 0.5641885

$I
[1] 0.829914

$cor
[1] -0.04745993

Again we've got fairly high overlaps between species using D and I, but Spearman rank correlation is really close to zero.  That's exactly what we'd expect if there's no signal at all.  Of course the fact that species distributions and the environment are both spatially autocorrelated means that we'll likely have higher-than-zero (at least in absolute value) correlations even if there is no causal relationship between the environment and species distributions, but at least it's nice to know that we do have a clear expected value when chaos reigns.



Code for this is here:

https://gist.github.com/danlwarren/5509c6a45d4e534c3e0c0ecf1702bbdd

Tuesday, July 31, 2018

Predict functions for all model types

I've finally gotten around to adding predict() functions for all of the ENMTools model types.  You can now project your model onto a new time period or geographic extent, and it gives you back two things: a raster of the predicted suitability of habitats, and a threespace plot (see description here).  I've also added some more stuff under the hood that should catch some data formatting issues that were causing errors.

Wednesday, July 25, 2018

Minor fixes, new features

Hey all!  I've been bashing away at ENMTools for the past couple of days, just doing a bunch of bug fixes and adding some new features.  If you want to see everything you can go here, but I'll outline the highlights.

1. Added a "bg.source" argument to all modeling functions that allows you to specify whether you want to draw background from the background points or range raster stored in your enmtools.species object, or the geographic area outlined by your environment raster.  If you don't specify any bg.source argument it will prioritize them in the order it did previously: background points > range raster > environment layers.

2. Changed the raster.pca function to return the actual pca object along with the raster layers. 

3. Fixed a persistent error with ggplot2 plots from hypothesis tests excluding some values.  The fix for this isn't perfect yet (and I'm not entirely sure why), but in my experiments with it yesterday the issue is MUCH reduced.

4.  Added a plot to the output of enmtools.aoc that shows the averaged overlap values on each node in the phylogeny.  The old plots for the hypothesis tests are still there, but if you display the enmtools.aoc object those are now the second plot.  The first one is the tree, and it looks like this:



I've also added some plots I've been meaning to put in for a while.  These are called "three space plots", because they're meant to visualize the environment spaces representing the presence and background data from a model along with the environment space represented by a set of environment layers.

For these you just type threespace.plot(model, env), where your model is an enmtools.model object and your env is a set of raster layers.  That gives you something like this:



The goal here it to visualize how much of that environment space represents sets of conditions your model never saw when it was being built.  This is just a first pass and I do have more stuff planned in this direction, but I reckon this alone might be useful for some of you out there.

Monday, June 4, 2018

More installation issues: "descendants" not exported from phyloclim, ecospat not found

Well the fun just keeps rolling in with the newest R update and ensuing package updates.  Two new issues have just been brought to my attention that prevent new installations of ENMTools: first, the newest version of phyloclim no longer exports the "descendants" function, which ENMTools uses.  There's a hot fix for that on the develop branch of ENMTools on Github, where I basically just copied the code over from phyloclim for the time being so it no longer has to be imported.

The second issue is a bit more of a hassle: ecospat has been removed from CRAN, and I'm not sure why.  Given that ENMTools requires ecospat to run, this is a bit of an issue.  For now, all you can really do is install ecospat manually from the Github mirror of the CRAN repository, and then install ENMTools after that.  Hopefully ecospat will get updated and re-uploaded to CRAN soon, but failing that we might need to find a more permanent workaround eventually.

Thursday, May 17, 2018

Issues with installing ENMTools under newest versions of R

During the course I co-taught with Matt Fitzpatrick in Glasgow a few weeks ago, it came out that a number of people were having trouble installing ENMTools.  A lot of this seems to stem from the newest version of R, and in particular how it interacts with your Java installation.

Some Mac users (including me) were able to fix this just by going to the console and typing "R CMD javareconf" and restarting R.

Windows users seemed to have much more significant issues, however.  There seemed to be several issues, but here's a solution that helped several of the students, courtesy of Hirzi Luqman:

Error:
* installing *source* package ‘ENMTools’ ...
** R
** data
*** moving datasets to lazyload DB
** tests
** byte-compile and prepare package for lazy loading
Error : .onLoad failed in loadNamespace() for 'rJava', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/rJava.so':
dlopen(/Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/rJava.so, 6): Library not loaded: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/libjvm.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rJava/libs/rJava.so
  Reason: image not found
ERROR: lazy loading failed for package ‘ENMTools’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/ENMTools’
Installation failed: Command failed (1)


Problem:
ENMtools/rJava was looking for (and couldn't find) the library "libjvm.dylib" in the directory "/Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/lib/server/", because no such directory existed (this refers to a Java 9 installation); I don't know why it looked for the file there...

Solution: I just copied the same file from the Java 8 directory (or whatever Java version is installed on the computer; in my case the directory was "/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/lib/server/") to the R library directory "/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/", and it worked


Update: GitHub user Sandy had a different issue, detailed here.  The fix in that case was:

I removed all PATH related to "java", then Add jvm.dll to PATH (adding "%JAVA_HOME%\jre\bin\server;" to PATH).
After that, I set up the JAVA_HOME in R:
Sys.setenv(JAVA_HOME="C:\\Program Files\\Java\\jdk1.8.0_171\\").

Thursday, April 5, 2018

Course Reminder: Quantitative geographic ecology using R. Glasgow, April 30 to May 4.

Hey everybody!  There are still open slots in my upcoming course with Matt Fitzpatrick entitled "Quantitative geographic ecology using R: modelling genomes, niches, and communities".  Sign up now, it's coming up fast!

The course will include introductory lectures, guided computer coding in R, and exercises for the participants, with an emphasis on visualization and reproducible workflows. All modelling and data manipulation will be performed with R. Attendees will learn to use niche modelling algorithms including Maxent, GLM, GAM, and others, and will learn both new and existing methods for conducting comparative studies using ENMs in the new ENMTools R package.  Generalized Dissimilarity Modelling (GDM) and Gradient Forest (GF) will be taught for modelling genomic and community-level data. The course is intended for intermediate R users with interest in quantitative geographical ecology.

https://www.prstatistics.com/course/quantitative-geographic-ecology-using-r-modelling-genomes-niches-and-communities-qger01/

Thursday, March 22, 2018

BITC presentation on the past, present, and future of ENMTools

In case you missed it, I gave a talk for the Biodiversity Informatics Training Curriculum the other day where I talked about the history of ENMTools, what's happening now, and where it's going in the near future.  It's a whirlwind tour, but for those of you who are curious about what's coming this is a good overview. 

Thanks to Town for having me on, and I apologize to y'all for the technical problems.  Everything worked right up until the very second we started recording.