Species In Space: 2009

Monday, October 5, 2009

How to run ENMTools tests on a cluster, or using a modeling method other than Maxent

ENMTools tests can take a long time to run, due to the number of Maxent runs necessary to construct a distribution of expected overlaps. For that reason, people occasionally ask whether there's a way to break up analyses and run them on a cluster, or just manually spread them across several computers. People also occasionally ask whether it's possible to use ENMTools with non-Maxent methods of ENM construction. There's no built-in way to do either of these at present, and given the number of different ways that people might want to do this I'm not sure there will be. However, it CAN be done! It requires a bit of extra work, but it's not too bad, particularly with a few simple new tools that I'm presenting here.

Let's say you want to do an identity test, and split it up so that replicates are submitted to a cluster. For starters, we'll just set up the test in ENMTools as usual but uncheck the box marked "Run Maxent". This will cause ENMTools to generate the data sets necessary for the test, but the data will not be sent to Maxent to analyze.

Okay, now we have a file that has a whole bunch of replicates in it. What we would like to do is analyze each of those separately. I've got a really simple no-frills script here that simplifies this process. All you need to do is drop this file into the directory where your file of replicates is, then go to a command prompt in that directory. Type "splitcsv.pl INFILE", where INFILE is the name of your .csv file of replicates. In short order, you will have a new .csv file in that directory for each replicate in the input file. Note: you can also use this little tool to split a large file up by species. It will spit out a file for each unique species name in the input file. Another note: this script will overwrite files without asking, so it's best to run it in its own directory!
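
For anyone without Perl handy, the idea behind the split can be sketched in a few lines of Python. This is not the splitcsv.pl script itself, just a rough illustration of the same idea. It assumes the file has a header row and that the first column holds the species (or replicate) name, and the function name split_by_first_column is made up for this example. Like the Perl script, it overwrites files without asking.

```python
import csv
import os
import re
from collections import defaultdict


def split_by_first_column(infile, outdir="."):
    """Write one CSV per unique value in the first column of infile.

    Assumes a header row and names in column one (an assumption about the
    file layout, not something checked here). Returns the paths written.
    """
    with open(infile, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        groups = defaultdict(list)
        for row in reader:
            if row:  # skip blank lines
                groups[row[0]].append(row)

    os.makedirs(outdir, exist_ok=True)
    written = []
    for name, rows in groups.items():
        # Replace characters that are awkward in file names.
        safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", name.strip())
        path = os.path.join(outdir, safe + ".csv")
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(path)
    return written
```

Something like split_by_first_column("replicates.csv", "split") would then leave one file per name in a "split" folder. Two names that differ only in punctuation would end up in the same file name, so check the count of files against what you expect.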

At this point it's up to you to figure out how to submit jobs to your cluster using the Maxent command line options. One thing to keep in mind: Maxent by default writes to a file called maxentResults.csv. If you've got a shared file system, or just a directory on a single multi-core computer that's being shared by two different simultaneous Maxent jobs, both of those jobs will by default try to write to the same results file. Java will happily allow multiple instances of Maxent to fight over the same filehandle, but what comes out in that results file will be incomplete and quite possibly not useful. I've just set my stuff up so that it writes to different directories, but it's possible that using the "perspeciesresults" option will fix this issue as well. On a side note: if you're doing jackknife or bootstrap on a cluster, you pretty much can't have multiple instances of Maxent writing to the same directory. The perspeciesresults option doesn't work for resampling, and having multiple instances writing to the same output file simultaneously will cause Maxent to fail when calculating summary grids. Just FYI.
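
One way to keep simultaneous jobs out of each other's way is to give every replicate its own output directory. Here's a rough Python sketch that builds one Maxent command line per replicate file. It only builds the commands; handing them to your scheduler is still up to you. The flags follow the batch example in the Maxent tutorial (environmentallayers, samplesfile, outputdirectory, redoifexists, autorun), but the paths, the memory setting, and the function name are placeholders made up for illustration, so check everything against the help for your version of Maxent.

```python
import os


def maxent_commands(replicate_files, layers_dir, out_root,
                    maxent_jar="maxent.jar", memory_mb=512):
    """Return (output_dir, command) pairs, one per replicate file.

    Each command is a list of arguments that can be passed to
    subprocess.run or joined into a line of a cluster submission script.
    Assumes every replicate file has a unique base name.
    """
    jobs = []
    for path in replicate_files:
        name = os.path.splitext(os.path.basename(path))[0]
        out_dir = os.path.join(out_root, name)  # one directory per replicate
        command = [
            "java", "-mx%dm" % memory_mb, "-jar", maxent_jar,
            "environmentallayers=" + layers_dir,
            "samplesfile=" + path,
            "outputdirectory=" + out_dir,
            "redoifexists", "autorun",
        ]
        jobs.append((out_dir, command))
    return jobs


if __name__ == "__main__":
    # Hypothetical file names, just to show what the commands look like.
    for out_dir, command in maxent_commands(
            ["reps/species1_rep1.csv", "reps/species1_rep2.csv"],
            layers_dir="layers", out_root="maxent_out"):
        print(" ".join(command))
```

Since each job gets its own directory, each one also gets its own maxentResults.csv. It's safest to create the output directories before launching the jobs rather than counting on Maxent to make them.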

All right, now you've gotten all of your runs finished and want to bring your data together to construct a distribution of overlaps. This is where a handy new half-developed tool in ENMTools becomes useful: scripting mode. You'll need the newest test version of ENMTools for this. I'll admit that it's a little weird to have a scripting interface for a program that is itself just an elaborate scripting interface for Maxent, but that doesn't mean it's not useful. At present the scripting interface is only hooked up for a few functions, and I'm not even close to writing a comprehensive manual entry for it. However, it works for our purposes.

What you need to do is build a .csv file that has a line for each comparison to be made. The script command is "measureOverlap". Capitalization is not important. A typical line looks like this:

measureOverlap,c:/sample data/species1_rep1.asc,c:/sample data/species2_rep1.asc,testing_measureOverlap1

The first entry is the command, the next two are the two files to compare, and the last entry is the name for the analysis. If you're doing a comparison between two species using 100 replicates, you need 100 lines in your script file, each with its own name for generating output files. In cases like this, the "concatenate" function in Excel is your best friend. Once you've got your script file, just go to Options->Run Script File in ENMTools and let it do its thing.
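
If Excel isn't your thing, the same script file can be generated with a short Python sketch. The line layout follows the example above. The file-name pattern (species1_repN.asc, species2_repN.asc) and the function name are assumptions for illustration, so adjust them to match whatever your Maxent runs actually produced.

```python
def write_overlap_script(script_path, grid_dir, species_a, species_b,
                         n_reps, prefix="testing_measureOverlap"):
    """Write one measureOverlap line per replicate and return the lines."""
    lines = []
    for rep in range(1, n_reps + 1):
        file_a = "%s/%s_rep%d.asc" % (grid_dir, species_a, rep)
        file_b = "%s/%s_rep%d.asc" % (grid_dir, species_b, rep)
        # command, first grid, second grid, unique name for this analysis
        lines.append(",".join(["measureOverlap", file_a, file_b,
                               "%s%d" % (prefix, rep)]))
    with open(script_path, "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines
```

Calling write_overlap_script("overlap_script.csv", "c:/sample data", "species1", "species2", 100) writes 100 lines, the first of which matches the sample line above. Paths containing commas would break this simple layout, so keep them out of your directory names.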

Now you've got two output files per replicate (I and D), and you'd really like to have all of those in one file. In Windows, the easy way to do this is to go to a command prompt and use the "copy" command with some clever wildcards. In the above example, let's say I've got 100 files named "testing_measureOverlap1_I_output.csv" to "testing_measureOverlap100_I_output.csv", and likewise for D. The appropriate command would be something like:

copy testing_measureOverlap*_I_output.csv collected_I_scores.csv

This would concatenate all of those output files for I into one csv file that can then be edited in Excel. There's going to be a bit of cleanup to do, since each of those files had its own header line and two copies of each score. That's all fairly easy, though, and should only take a couple of minutes with some clever sorting.
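
If you'd rather skip some of that cleanup, here's a rough Python sketch that does the concatenation while keeping only the first header line. It assumes each output file starts with exactly one header line, which is inferred from the description above rather than from a format spec, and the function name is made up. It leaves the duplicated scores alone, since how to spot them depends on the layout of your files.

```python
import glob
import os
import re


def collect_scores(pattern, outfile):
    """Concatenate files matching pattern into outfile, keeping one header.

    Files are ordered by the first number in their names, so replicate 2
    comes before replicate 10. Returns the number of data lines written.
    """
    def rep_number(path):
        found = re.search(r"\d+", os.path.basename(path))
        return int(found.group()) if found else 0

    # Leave the output file out in case it happens to match the pattern.
    target = os.path.abspath(outfile)
    paths = [p for p in glob.glob(pattern) if os.path.abspath(p) != target]
    paths.sort(key=lambda p: (rep_number(p), p))

    header_written = False
    data_lines = 0
    with open(outfile, "w") as out:
        for path in paths:
            with open(path) as handle:
                lines = [ln for ln in handle.read().splitlines() if ln.strip()]
            if not lines:
                continue
            if not header_written:
                out.write(lines[0] + "\n")  # header from the first file only
                header_written = True
            for line in lines[1:]:
                out.write(line + "\n")
                data_lines += 1
    return data_lines
```

For the example above, that would be collect_scores("testing_measureOverlap*_I_output.csv", "collected_I_scores.csv"), and the same again with D in place of I.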

Everything I've said here goes for other modeling methods as well - you can generate data sets and analyze them in whatever software you like, and then use the scripting interface to build your distributions. When I get a chance I plan to make an interface that will make the comparison of multiple files easier (i.e., send everything to one formatted outfile). Seeing as I'm currently in a blind rush to finish my dissertation before my postdoc starts, though, I don't think that's going to happen too soon.

Friday, September 11, 2009

New manual!

We've finally bashed together a first take on the full manual for ENMTools. You can find it here, or in the zip file with the new version of ENMTools.

The new manual outlines for the first time the niche breadth, range breaking, spatial cross-validation, and jackknife/bootstrap functionality of ENMTools. Papers are in progress discussing all of these tools in more detail, but there should be enough information in the manual to get the general idea now.

Now I'm almost immediately going to start adding new features so that the manual is obsolete again. We'll try to keep it more up to date, though. Scout's honor.

Wednesday, August 5, 2009

New version of ENMTools up

There's a new version of the software up now that improves compatibility with Maxent. There's a new setting in the "ENMTools options" for setting the Maxent version (3.2 and below versus 3.3 and higher), which is necessary due to some changes in the command line arguments for Maxent.

A few things are worth mentioning right off the bat:

-We're going to go Tkx-only from now on due to massive improvements in appearance, compatibility, and ease of updating the software.

-Due to all of the recent changes to the software, the manual has gone from "incomplete" to "almost useless". We'll work up a new version shortly. One thing to mention immediately is that we've turned off the Kullback-Leibler (KL) statistics for niche overlap. They weren't really useful for much, and were causing errors. Because of this, the niche overlap function now only spits out two files, not five as the manual suggests.

-The new zip file is much larger than the previous ones for two reasons: it contains a Mac executable and a sample data set. The Mac executable is still a little sketchy, and Rich has been having some problems that I (Dan) can't seem to duplicate. If anyone has any feedback on this it would be much appreciated. The sample data comes from the Knouft et al. (2006) study of Cuban anoles, and consists of occurrences for Anolis allogus (east) and Anolis ahli as well as some environmental data.

-At the moment the newest version is the release version, so the testing and release links above point to the same place.

Friday, July 3, 2009

ENMTools Used in Bigfoot Study

Who would have guessed that the first published study using the test for niche identity implemented in ENMTools would be about bigfoot? The study in question is a guest editorial in the Journal of Biogeography that uses niche modeling of bigfoot localities (both observations and footprints) as part of a tongue-in-cheek critique of contemporary niche modeling practices. The main point of the study is that dubious observational data can be used to generate niche models that produce "visually convincing distributions." Using the test for niche identity, the authors show that the niche model projection for bigfoot is indistinguishable from the niche model projection for the black bear, bolstering their case that putative bigfoot observations may be cases of mistaken identity. The authors' general suggestion that point locality data extracted from public databases (including databases that are considered more reputable than the Bigfoot Field Researchers Organization) requires greater scrutiny is well-taken; nobody should be niche modeling from publicly available databases without expert validation of the localities being considered.

Friday, April 24, 2009

New Version of Maxent Available

Steven Phillips just posted a new version of Maxent (v. 3.3.0). I've already used this new version successfully with ENMTools, so hopefully there won't be any unforeseen compatibility issues. One of the coolest features of the new version is the capability to do replicated runs, allowing "cross-validation, bootstrapping and repeated subsampling."

Tuesday, April 21, 2009

New executables of ENMTools for Windows and OSX! Hooray!

Thanks to Activestate's PerlApp (which is awesome), we now have executable versions of the testing build of ENMTools. You can download the new Mac OS X executable here, or the Windows executable here. For some reason the OS X executable is way bigger than the Windows version, and it only launches from the console on my Mac (double-clicking doesn't work). If anyone has insight into either of those issues, please let me know. I should also mention that neither of these builds has been tested extensively yet, as they're hot off the compiler. Please check your results and let me know if anything comes out weird.

Sunday, April 19, 2009

OSX users of the Tkx version - possible bug

Rich gave the Tkx version a spin yesterday, and had a bit of an odd bug come up that I've never seen before. It seems that the program wouldn't let him set the path to maxent.jar, no matter what he did. It works fine on my Mac, though. If anyone else runs into this problem, please let me know. If you do see this happen, you can fix it by opening the Perl script in a text editor and changing line 3329 from this:

$maxent_path = Tkx::tk___getOpenFile(-initialfile=>"maxent.jar");

to this:

$maxent_path = Tkx::tk___getOpenFile();

Friday, April 17, 2009

Tkx version ready for testing!

I'm excited to announce that there's already a test version of the new Tkx ENMTools. And here's the big news - IT WORKS ON OSX! Porting from Tk to Tkx was a bit of a hassle, but once that was done it turned out to be trivial to make it work on a Mac. It also has the side effect of making ENMTools look considerably more modern on a PC than it did before. Contrary to my earlier statements, though, I think we're going to keep the retro look of the web page. If you're as deeply in love with Sparklee logos as I am, you can actually save the above logo into the same folder as the new Tkx ENMTools and it will show up when you start up the software.

Anyway, HERE is the perl script for the new version. In order to keep your browser from trying to interpret that link as a web script, Windows users will need to right click and "save as" to download it. Mac users will have to do the Mac equivalent, whatever that is. I'll have a Windows executable version ready as soon as my new license for Perl Dev Kit gets here. For now, you'll need the very, very newest version of ActivePerl from activestate.com. Mac users will need to go to a console and type the path to the Activestate installation of ActivePerl, because ENMTools won't run using the default OSX perl installation. That'll look something like this:

/usr/local/ActivePerl-5.10/bin/perl "./ENMTools TkxTest 4-17-09.pl"

...assuming you're in the directory where you've dropped the perl script.

Now in addition to looking considerably sexier and running on OSX, there are a couple of new features that will be of interest to many users. Here are a few:

-Change the amount of memory that is allotted to Maxent (make sure you put in -mx####m, where #### is the number of Mb to assign)
-Ability to turn on/off response curves, pictures, and ROC plots for pseudoreplicates
-New flavors of occurrence point jackknife (I'll write up an explanation of what these are soon)
-Generate data sets for random spatial cross-validation (ditto)
-A new rangebreak test that we haven't told anyone about, which is even cooler than the other ones that we haven't explained
-Measuring niche breadth on ENMs using Levins' measures of niche breadth

Now this stuff is all very, very new. We've done some testing, and things seem to be working correctly. Please email Dan (danwarren@ucdavis.edu) if you hit any snags. Oh, and there's a slight bit of weirdness on OSX in that it seems to want to draw the window slightly smaller than it needs to be, no matter how large I tell the program to make it. Just drag the bottom right corner out a bit and everything's fine. If anyone happens to know what to do about that little glitch, I'd appreciate the info.

Thursday, April 9, 2009

We're starting to work on the new Tkx version!

Hooray! In case you don't know why you should be excited about this, I'll give you a couple of reasons. First, it's going to make it so that it works with the default install of ActivePerl on Windows, which means no more downloading Tk to get it to run as a script. Second, it will take us much, much closer to having a Mac-friendly version, which I'm told a few people want. Finally, it looks a lot less like something you'd expect to see scratched on a wall at Lascaux. Just take a look:

Okay, so it's not Rembrandt, but it at least looks like something that was written in the last twenty years.

I'm hoping to have an alpha version of it up on the web site in a few days. The tabs are gone, replaced by a more normal-looking menu system. I'm also slowly but surely going to start adding more flexibility in how Maxent runs are conducted, with the ultimate goal being that the user can set any Maxent option from within ENMTools and apply that to all runs automatically. That's a bit of a ways away, though.

Friday, April 3, 2009

ENMTools is about to get a "testing" version

As it stands, ENMTools basically is a testing version. We're about to crank out a version that only has the bits in that we've worked with extensively, and that already have some associated published paper that we can point to and say "here, this is what this tool does". At the moment that's just the overlap, identity, and background tabs.

Tuesday, March 31, 2009

Like an infant, or possibly a drunken hobo...

...the ENMTools website is slowly lurching its way onto its feet. The site is intended primarily as a distribution point for new builds of the ENMTools software. This software was designed to make the tests used in Warren et al. (2008) accessible to those who want to use them. In its current state, "accessible" might be a bit overly generous, but the software at the very least makes these tests possible. In addition, there are other analyses and tools that are being added all the time. Most of these are being developed for papers in progress by Dan Warren, Rich Glor, Michael Turelli, or some combination of those three.

At present, the software only works on Windows systems. A Mac version will be available as soon as possible, but it may take a while. The whole thing basically needs to be ported from Tk to Tkx, which is a bit of an undertaking. When that happens, though, it will have the added bonus of making ENMTools look like it was at least programmed some time after 1995. Until then, we've decided to keep the web site's appearance consistent with that of the software.

Along with ENMTools updates, we'll probably post some other handy little tools or observations from time to time.