Tuesday, June 8, 2010

New test version of ENMTools with model selection

There's a new test version of ENMTools up here. In addition to fixing a few minor annoyances from previous versions, there's a new function that allows criterion-based model selection using AICc and BIC. The user interface for the function is almost non-existent - all it really does is ask for a script file. Here's a quick-and-dirty rundown of how to use it. In order to correctly calculate likelihoods, the data must be formatted appropriately. For that reason we suggest that users pay very close attention to the requirements below.

1. Build a set of models to compare. It is absolutely crucial that suitability scores be output in RAW format! You will need both the .asc file and the .lambdas file associated with each model.

2. Make sure that each set of occurrence points to be compared is in its own independent file. You do not want to load an occurrence file that has points for multiple species. You also need to eliminate duplicate occurrence points from the file, particularly if you have Maxent set to ignore duplicate occurrences.

3. Build a script. A script is simply a .csv file with the paths to the files you want to analyze. Each line of the script should consist of a .csv file, a .asc file, and a .lambdas file. Relative paths will not work, you need fully qualified path names. A typical line will look like this:


You need one line per analysis. Also note that ENMTools will output results into a file with a name based on your script file. If your script file is named myscript.csv, the output will be named myscript_model_selection.csv. At present it will overwrite that output file (if it already exists) without asking, so BE CAREFUL.

4. In ENMTools, choose the "Model Selection" tool under "ENM Measurements". A file dialog will pop up. At this point you should choose the script file that you just made. ENMTools will chug along for a while, and will tell you when it's finished. The process is fairly simple: ENMTools uses your raw suitability scores (after standardization) and occurrence points to calculate the likelihood of observing your data under that model. It then counts the number of parameters from your lambdas file, counting any parameter with nonzero weight. Finally, it uses these values to calculate AICc and BIC.

Preliminary studies (Warren and Seifert, in review) indicate that AICc outperforms BIC in selecting models on simulated data. I'll be talking about this study at Evolution this year, for those who are interested (shameless plug).

Keep in mind that this is a test build and may be buggy. Feedback is appreciated.