Tuesday, June 8, 2010

New test version of ENMTools with model selection

There's a new test version of ENMTools up here. In addition to fixing a few minor annoyances from previous versions, there's a new function that allows criterion-based model selection using AICc and BIC. The user interface for the function is almost non-existent - all it really does is ask for a script file. Here's a quick-and-dirty rundown of how to use it. In order to correctly calculate likelihoods, the data must be formatted appropriately. For that reason we suggest that users pay very close attention to the requirements below.

1. Build a set of models to compare. It is absolutely crucial that suitability scores be output in RAW format! You will need both the .asc file and the .lambdas file associated with each model.

2. Make sure that each set of occurrence points to be compared is in its own independent file. You do not want to load an occurrence file that has points for multiple species. You also need to eliminate duplicate occurrence points from the file, particularly if you have Maxent set to ignore duplicate occurrences.

3. Build a script. A script is simply a .csv file with the paths to the files you want to analyze. Each line of the script should consist of a .csv file, a .asc file, and a .lambdas file. Relative paths will not work; you need fully qualified path names. A typical line will look like this:

c:\mydata\points.csv,c:\mydata\species.asc,c:\mydata\species.lambdas

You need one line per analysis. Also note that ENMTools will output results into a file with a name based on your script file. If your script file is named myscript.csv, the output will be named myscript_model_selection.csv. At present it will overwrite that output file (if it already exists) without asking, so BE CAREFUL.

4. In ENMTools, choose the "Model Selection" tool under "ENM Measurements". A file dialog will pop up. At this point you should choose the script file that you just made. ENMTools will chug along for a while, and will tell you when it's finished. The process is fairly simple: ENMTools uses your raw suitability scores (after standardization) and occurrence points to calculate the likelihood of observing your data under that model. It then counts the number of parameters from your lambdas file, counting any parameter with nonzero weight. Finally, it uses these values to calculate AICc and BIC.
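The calculation described in step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not ENMTools' own code: the function names are made up for this example, "standardization" is assumed to mean dividing each raw score by the sum of raw scores over all grid cells with data, and the .lambdas file is assumed to list each feature as four comma-separated fields (name, weight, min, max), with shorter lines holding constants that are not counted as parameters. The AIC, AICc and BIC formulas are the standard textbook ones.

```python
import math


def log_likelihood(raw_scores, occurrence_scores):
    """Sum of log standardized scores at the occurrence points.

    raw_scores: raw suitability of every grid cell with data (NODATA excluded).
    occurrence_scores: raw suitability of the cells holding occurrence points.
    Assumption: standardizing means rescaling the grid so it sums to 1.
    """
    total = sum(raw_scores)
    return sum(math.log(score / total) for score in occurrence_scores)


def count_parameters(lambdas_lines):
    """Count features with nonzero weight.

    Assumption: feature lines look like 'name, weight, min, max'; lines with
    fewer fields are constants and are skipped.
    """
    count = 0
    for line in lambdas_lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 4 and float(fields[1]) != 0.0:
            count += 1
    return count


def information_criteria(log_lik, k, n):
    """Return (AIC, AICc, BIC) for log-likelihood log_lik, k parameters, n points.

    AICc is returned as None when n - k - 1 <= 0, because the small-sample
    correction term is then undefined or negative.
    """
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    if n - k - 1 <= 0:
        aicc = None
    else:
        aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    return aic, aicc, bic


if __name__ == "__main__":
    grid = [1.0, 1.0, 2.0]      # toy raw scores for a three-cell grid
    at_points = [2.0, 1.0]      # raw scores at two occurrence points
    print(log_likelihood(grid, at_points))
    print(information_criteria(-10.0, 2, 10))
```

Lower AICc or BIC values indicate the preferred model. The `None` case reflects the correction term in AICc, which divides by n - k - 1 and so breaks down once the number of parameters approaches the number of occurrence points.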

Preliminary studies (Warren and Seifert, in review) indicate that AICc outperforms BIC in selecting models on simulated data. I'll be talking about this study at Evolution this year, for those who are interested (shameless plug).

Keep in mind that this is a test build and may be buggy. Feedback is appreciated.
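For the file preparation in steps 2 and 3, a small helper script can take care of the repetitive parts. The sketch below is a hypothetical convenience, not part of ENMTools: the function names are invented, occurrence rows are assumed to be (species, longitude, latitude), and duplicates are detected by exact coordinate match only, so two distinct points that fall in the same grid cell are not caught. It also refuses to run if the results file named after the script already exists, since that file would be overwritten.

```python
import csv
import os


def drop_duplicate_points(rows):
    """Keep the first copy of each occurrence row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(field.strip() for field in row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def output_name(script_path):
    """Results file name implied by the script name (myscript.csv ->
    myscript_model_selection.csv), following the rule described in step 3."""
    root, _ext = os.path.splitext(script_path)
    return root + "_model_selection.csv"


def write_script(script_path, analyses):
    """Write one 'points,asc,lambdas' line per analysis with absolute paths.

    analyses: iterable of (points_csv, asc_file, lambdas_file) tuples.
    Relative paths are resolved against the current working directory.
    """
    results = output_name(script_path)
    if os.path.exists(results):
        raise FileExistsError(results + " already exists and would be overwritten")
    with open(script_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for triple in analyses:
            writer.writerow([os.path.abspath(path) for path in triple])
    return results
```

Calling `write_script("myscript.csv", [("points.csv", "species.asc", "species.lambdas")])` from the directory holding those files would produce a one-line script with fully qualified paths.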

13 comments:

  1. Dear Dan:
    I am trying to use ENMTools to test niche identity for some cryptic species (phylogroups). I installed the new version of ENMTools (1.1) and the latest version of Perl, and I am using the latest version of MaxEnt (3.3.1). Everything apparently runs well when I calculate I and D; however, when I try to run the identity or background test I always get the same error:

    Can't open C:/Users/Jonathan/Documents/Trabajo/Docto/Analyses/ENMTools/Rglaucostigma_rep0.asc!!


    while executing
    "::perl::CODE(0x23baecc)"
    invoked from within
    ".b5 invoke "
    invoked from within
    ".b5 instate {pressed !disabled} { .b5 state !pressed; .b5 invoke } "
    (command bound to event)


    Any idea what I am doing wrong?

  2. Could you email me your .csv file? Send it to dan.l.warren@gmail.com, thanks!

  3. I sent you the files, but I also tried with the example files and got the same error.

  4. Dear Dan:
    I was just wondering whether you found any problem in my .csv file, or whether you have any idea what is producing the error.

  5. Oh goodness, I'm sorry! Somebody else emailed me a .csv file at the same time and I got the two confused - I thought that I had fixed your problem! I'll look at it straight away.

  6. Oh wait, I just dug through my email and found that I did write back to you with a question - perhaps you didn't see it?

  7. Dear Dan,

    I'm using ENMTools to compare different Maxent models. I've followed your instructions and it seems to work correctly. In fact, I can get the AICc and BIC for most of the models. However, the output file doesn't provide them for some of the models, giving just "x"s instead. For example, the output says this:

    C:\models\species.csv,P:\models\species.asc,-151.356067919256,20,17,x,x,x

    Any idea what is going wrong?

    Thanks in advance,

    Pedro

  8. That occurs when your model has more parameters than you have occurrence points, which violates the assumptions of AIC.

  9. Thanks Dan for your reply. Now I understand. So, should I discard those models with more parameters than occurrence points?

    Cheers,

    Pedro

  10. Hi again Dan,

    One new (but quick) question: when obtaining the suitability scores (in RAW format) with Maxent in order to compare different models, the "add samples to background" option should be disabled, shouldn't it? I read that in your Ecol Appl paper, but it isn't in the "protocol" above.

    Cheers,

    Pedro

  11. We disabled that option on the advice of Steven Phillips, specifically because we were using simulated data, which has no spatial sampling bias. As that is not the case with real data, I don't think it is generally necessary to disable this option.
