Tuesday, November 3, 2015

Hey what's the deal with the ENMTools R package?

It has come to my attention that at least one person is actually using the ENMTools R package I sorta half-made a couple of years ago, for which I would like to express my deepest condolences.

Seriously, though, I did want to at least acknowledge its existence and the absolutely massive caveats that should come with any attempt to use it in its current state.  

The package exists because I needed a project in order to learn R; I've found that reading a book and doing examples is one thing, but to really assimilate a new language I need to have a project that makes me sit down and work on it every day.  When I started my postdoc at ANU a few years ago, I said to myself, "I am going to do everything in R from this day forward, and in order to learn R I will rewrite as much of ENMTools as it takes to feel like I've mastered it."

So that's what I did.  I wrote bits to generate reps for most of the major tests in ENMTools, including the background, identity, and rangebreak tests.  I also wrote code to measure breadth and overlap using the metrics in ENMTools, and a couple of other little utility functions.  That helped me get comfortable with the basics in R, and at that point I got busy enough with my actual postdoc work that I had to drop it.  

And that's pretty much where it stands today, a couple of years later.  It mostly works, but it ain't exactly pretty or well documented - it was my first R project, after all.  While some of its functionality has already been duplicated elsewhere (e.g., the identity and background tests in phyloclim), some of it hasn't (e.g., the rangebreak tests).  Now that I've been writing R pretty much daily for the past three years, I see a million things I did sub-optimally, and a bunch of areas where I could have taken advantage of existing functionality to do things more quickly, more cleanly, and with a lot more cool bells and whistles.

So why do I bring this up?  First, as I mentioned, because apparently some people are actually using it.  I'm not sure whether that's due to masochism or desperation, but they are.  Second, and more importantly, because I'm going to try to bash it into a somewhat more useful form over the next however-long.  It's probably not going to duplicate all of the functionality of the original ENMTools, but the eventual goal is to include a lot of very cool stuff that the old version didn't have.  If you want to contribute or are brave enough to muck around with it in its current state, it's here:

Wednesday, October 28, 2015

Handy little snippet of R code for thinning occurrence data

I came up with this a few months back.  I was using the R package spThin, by Aiello-Lammens et al., but found that it didn't quite do what I wanted it to do.  The purpose of that package is to return the maximum number of records for a given thinning distance, which is obviously very valuable in situations where you (1) don't have a ton of data and (2) are concerned about spatial autocorrelation.

However, it wasn't quite what I needed for two reasons.  First, I didn't need to maximize the number of points given a specified thinning distance; I needed to grab a fixed number of points that did the best possible job of spanning the variation (spatial or otherwise) in my initial data set.  Second, the spThin algorithm, because it's trying to optimize sample size, can take a very long time to run for larger data sets.

Here's the algorithm I fudged together:

1. Pick a single random point from your input data set X and move it to your output set Y.

Then, while the number of points in Y is less than the number you want:
2. Calculate the distance between the points in X and the points in Y.
3. Pick the point from X that has the highest minimum distance to points in Y (i.e., is the furthest away from any of the points you've already decided to keep), and move it from X to Y.

Lather, rinse, repeat.  Stop when you've got the number of points you want.
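For anyone who wants to see the steps above as code, here's a minimal Python sketch of the same greedy "farthest point" idea.  To be clear, this is not my R thin.max code; the function name thin_max and its arguments are hypothetical, and it assumes each record is just a tuple of numbers you're happy to compare with Euclidean distance.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def thin_max(points: Sequence[Point], n_out: int, seed: Optional[int] = None) -> List[int]:
    """Hypothetical sketch: return indices of n_out points chosen by greedy max-min distance."""
    if not 0 < n_out <= len(points):
        raise ValueError("n_out must be between 1 and the number of input points")
    rng = random.Random(seed)

    # Step 1: move a single randomly chosen point into the output set.
    first = rng.randrange(len(points))
    chosen = [first]

    # For every point, remember its distance to the nearest point kept so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    min_dist[first] = -math.inf  # never pick a kept point again

    while len(chosen) < n_out:
        # Steps 2-3: take the point whose nearest kept neighbour is farthest away.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist[nxt] = -math.inf
        # Only the newly kept point can shrink anyone's nearest-kept distance.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen
```

Rather than recalculating every X-to-Y distance on each pass, this caches each point's distance to its nearest kept point and updates it using only the newest addition, which picks the same points but keeps the cost to roughly (number of input points) × (number of output points) distance calculations.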

Just so you can get an idea of what's going on, here's some data on Leucadendron discolor from a project I'm working on with Haris Saslis-Lagoudakis:

That's a lot of points!  4,874, to be exact.  Now let's run it through thin.max and ask for 100 output points:

Lovely!  We've got a pretty good approximation of the spatial coverage of the initial data set, but with a lot fewer points.  What's extra-nice about this approach is that you can use it for pretty much any set of variables that you can use to calculate Euclidean distances.  That means you can thin data based on geographic distance, environmental distance, or a combination of both!  Just pass it the right column names, and it should work.  Of course, since it uses Euclidean distances, you might run into scaling issues if you combine geographic and environmental variables, because whichever variables are measured on the largest scale will dominate the distances.  You could probably fix that by rescaling the axes in some useful way before selecting points.
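As one example of "rescaling in some useful way", here's a small Python sketch that converts each column to z-scores (mean 0, standard deviation 1) before you hand the rows to a thinning function.  The function name standardize_columns is hypothetical, and z-scoring is just one reasonable choice on my part; deciding how much weight geography should get relative to climate is still up to you.

```python
import statistics
from typing import List, Sequence


def standardize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: rescale every column to z-scores so no variable dominates."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    # A column with zero spread can't separate points, so map it to 0 everywhere.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

After this, a one-degree difference in longitude and a 100 mm difference in annual precipitation are both measured in units of "how unusual is this relative to the rest of the data set", rather than in their raw units.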

Also, since it starts from a randomly chosen point, you will get somewhat different solutions each time you run it.  That could be useful for some sort of spatially overdispersed jackknife, if you want to do something like that for some reason.

There's no R package as such for this, but it will probably be folded into the R version of ENMTools when I get around to working on that again.  For now, you can get the code here:


Tuesday, September 29, 2015

Big Changes Coming!

Hey all,

Increasingly, I've been thinking that this blog would be a good place for some more general-purpose biogeography talk, and open to more people than just me and Rich.  I've talked to the authors of the old Species in Space blog that I contributed to, and we've decided we're going to move all of that content (and people) over here.

Some time in the next few weeks, you will see a bunch of posts from Species in Space show up here, and soon after that you will start seeing new content from some very cool people doing very cool biogeography things.  If any of you out there are looking for a place to blog some biogeography stuff, give me a shout!

In order to fit the new focus, we'll also gradually rebrand this to be the Species in Space blog, rather than strictly ENMTools.  That'll mean a new look, a new logo, and pointing the Species in Space URL over here.  No worries, though, I'll keep the old enmtools.com URL pointed here too, in case people have bookmarks to that.

I'm busy as can be at the moment, so this is all going to happen in fits and starts.  I think it will be pretty cool when we get it together, though, and hopefully you'll all enjoy it.


Tuesday, August 4, 2015

Tutorials on Maxent and ENMTools

A few months ago, I gave a workshop at James Cook University in Townsville, Australia.  We covered some basic concepts for niche modeling and ecological biogeography, and then ran through some sample exercises in Maxent, ENMTools, and ecospat.

First, here's a tutorial on basic ideas and procedures for niche modeling:

Then a very brief foray into the different assumptions that need to be made when using niche models for different purposes in ecological biogeography:

The main event was a two-hour demo of ENMTools.  Unfortunately we had a serious issue with the recording: there was no audio at all!  I'll try to replicate the talk on my own and upload it one of these days.

For those of you who want to play along at home, the demo data is here:


I really want to thank JCU for having me out, and in particular Megan Higgie and Conrad Hoskin for their hospitality.

Tuesday, September 3, 2013

Fixed error in resampling in ENMTools 1.4.3

I've fixed a bug in 1.4.3 that kept the "resample from raster" command from printing results to the output file.  It was a very silly error; basically I had disabled printing for debugging and forgot to turn it back on!

Anyway, it's fixed now and should be working fine.  While I was at it, I fixed it so that the resample command now uses the output directory set in the ENMTools Options tab, instead of printing to the directory where the layers you're resampling from are located.  The new version is here:


Saturday, July 6, 2013

ENMTools 1.4.3

While trying (unsuccessfully) to iron out Perl's weirdness with Mac line endings in .csv files, I added some bits of code that seem to have caused the model selection functions in ENMTools to stop working on some input files.  Here's a fixed version.

Sunday, June 16, 2013

Version 1.4.2, adding sampling without replacement to "Resample From Raster" function

By request, I have added a radio button for sampling with or without replacement to the "resample from raster" function.  This function was initially intended for simulating data for methodological studies, but can also be used to sample random points for conducting significance tests for AUC values à la Raes and ter Steege (2007) (using the "constant" setting).  The initial setup was to always resample with replacement.  This isn't ideal for the Raes and ter Steege test, but was unlikely to have any real impact except on models built over very small geographic regions and/or those with very coarse resolution (i.e., study areas with a very small number of grid cells).
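If you're wondering what the with/without replacement switch actually changes, here's a rough Python sketch of the idea, not the ENMTools Perl code.  It assumes the raster has already been read into a list of rows, that nodata cells hold -9999 (a common ASCII grid default, but check your own file), and that "constant" means every data cell is equally likely to be drawn.  The function name sample_cells is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def sample_cells(grid: Sequence[Sequence[float]], n: int, replace: bool,
                 nodata: float = -9999, seed: Optional[int] = None) -> List[Cell]:
    """Hypothetical sketch: draw n (row, col) cells uniformly from cells that hold data."""
    rng = random.Random(seed)
    valid = [(r, c) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v != nodata]
    if not valid:
        raise ValueError("the grid has no data cells to sample from")
    if replace:
        # Independent draws: the same cell can come up more than once.
        return rng.choices(valid, k=n)
    if n > len(valid):
        raise ValueError("can't draw more cells than the study area contains without replacement")
    # Each cell can be drawn at most once.
    return rng.sample(valid, n)
```

On a study area with thousands of cells the two options give nearly identical samples, which is why the old with-replacement behaviour rarely mattered; on a tiny or coarse grid, sampling with replacement can stack several "occurrences" in the same cell.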

I'll post a detailed tutorial eventually, once I get a spare moment to breathe.  Long story short: if you have N data points and want to do X replicates, you load up a raster file that has data in grid cells for your study area and nodata values outside the study area.  This can even be the .asc file for your model itself.  Use the "resample from raster" tool with the "constant" sampling function to sample N data points for each of X replicates.  Then build a single model for each of those replicates using the same study area, model construction settings, and environmental predictors as in your model for your empirical data.  Collect all of the AUC train and test scores from those replicate models, and use those as the null distribution against which to compare your empirical values for AUC train and test.  Guidance on how to do that is here:

Species In Space
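For the final comparison step, here's one simple way to turn those replicate AUCs into a p-value, sketched in Python.  This is an assumption on my part (a standard one-tailed Monte Carlo p-value), not necessarily the exact procedure Raes and ter Steege used, and the function name null_model_p_value is hypothetical.  Run it once with your training AUCs and once with your test AUCs.

```python
from typing import Sequence


def null_model_p_value(empirical_auc: float, null_aucs: Sequence[float]) -> float:
    """Hypothetical sketch: share of random-point models that do at least as well as yours."""
    if not null_aucs:
        raise ValueError("need at least one null replicate")
    as_good = sum(1 for a in null_aucs if a >= empirical_auc)
    # The +1 counts the empirical model as one more draw from the null,
    # so the p-value can never be exactly zero with a finite number of replicates.
    return (as_good + 1) / (len(null_aucs) + 1)
```

One practical consequence: with 99 null replicates the smallest p-value you can get is 0.01, so if you want to claim anything stronger than that you need more replicates.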

The new version is here:

ENMTools 1.4.2

Perl version only; see my previous kvetching about ActiveState if you want to know why.

Thanks to Marie-France Ostrowski for the suggestion and Renee Catullo for testing it.