Tuesday, June 8, 2010

New test version of ENMTools with model selection

There's a new test version of ENMTools up here. In addition to fixing a few minor annoyances from previous versions, there's a new function that allows criterion-based model selection using AICc and BIC. The user interface for the function is almost non-existent - all it really does is ask for a script file. Here's a quick-and-dirty rundown of how to use it. In order to correctly calculate likelihoods, the data must be formatted appropriately. For that reason we suggest that users pay very close attention to the requirements below.

1. Build a set of models to compare. It is absolutely crucial that suitability scores be output in RAW format! You will need both the .asc file and the .lambdas file associated with each model.

2. Make sure that each set of occurrence points to be compared is in its own independent file. You do not want to load an occurrence file that has points for multiple species. You also need to eliminate duplicate occurrence points from the file, particularly if you have Maxent set to ignore duplicate occurrences.

3. Build a script. A script is simply a .csv file with the paths to the files you want to analyze. Each line of the script should consist of a .csv file, a .asc file, and a .lambdas file, in that order. Relative paths will not work; you need fully qualified path names. A typical line will look like this:

c:\mydata\points.csv,c:\mydata\species.asc,c:\mydata\species.lambdas

You need one line per analysis. Also note that ENMTools will output results into a file with a name based on your script file. If your script file is named myscript.csv, the output will be named myscript_model_selection.csv. At present it will overwrite that output file (if it already exists) without asking, so BE CAREFUL.

4. In ENMTools, choose the "Model Selection" tool under "ENM Measurements". A file dialog will pop up. At this point you should choose the script file that you just made. ENMTools will chug along for a while, and will tell you when it's finished. The process is fairly simple: ENMTools uses your raw suitability scores (after standardization) and occurrence points to calculate the likelihood of observing your data under that model. It then counts the number of parameters from your lambdas file, counting any parameter with nonzero weight. Finally, it uses these values to calculate AICc and BIC.
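For readers who want to see the arithmetic behind step 4, here is a minimal Python sketch of the calculation. It is not the actual ENMTools code. It assumes the textbook definitions of AIC, AICc and BIC, that "standardization" means rescaling the raw scores so they sum to 1 over the grid, and that a feature line in a .lambdas file has four comma-separated fields (name, weight, min, max). All function names are hypothetical.

```python
import math


def count_parameters(lambdas_lines):
    """Count features with nonzero weight.

    Assumption: feature lines have four comma-separated fields
    (name, weight, min, max); shorter lines are treated as constants.
    """
    count = 0
    for line in lambdas_lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 4 and float(fields[1]) != 0.0:
            count += 1
    return count


def log_likelihood(grid_scores, scores_at_points):
    """Log-likelihood of the occurrence points under the model.

    Assumption: grid_scores holds the raw score of every valid (non-nodata)
    cell; scores are standardized by dividing by their sum over the grid.
    """
    total = sum(grid_scores)
    return sum(math.log(s / total) for s in scores_at_points)


def information_criteria(log_lik, k, n):
    """Return (AIC, AICc, BIC) for k parameters and n occurrence points.

    AICc is returned as None when n - k - 1 <= 0, where the small-sample
    correction is undefined.
    """
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    if n - k - 1 <= 0:
        aicc = None
    else:
        aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    return aic, aicc, bic
```

In this sketch a model with at least as many parameters as occurrence points (minus one) gets no AICc value, because the correction term would divide by zero or change sign. A raw score that is zero or negative at an occurrence point makes `math.log` raise a `ValueError`, so nodata values would need to be filtered out beforehand.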

Preliminary studies (Warren and Seifert, in review) indicate that AICc outperforms BIC in selecting models on simulated data. I'll be talking about this study at Evolution this year, for those who are interested (shameless plug).

Keep in mind that this is a test build and may be buggy. Feedback is appreciated.
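Since a malformed script file is an easy mistake to make, it may help to check each line against the rules in step 3 before running the tool. The following Python sketch is one hypothetical way to do that; it is not part of ENMTools. It assumes Windows-style paths like the example above (hence `ntpath`; on a Mac, `posixpath` would be the equivalent), and the function names are made up for illustration.

```python
import ntpath  # Windows path rules, usable on any platform


def check_script_line(line):
    """Return a list of problems found in one line of a model selection script."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 3:
        return [f"expected 3 comma-separated paths, found {len(fields)}"]
    problems = []
    # Step 3 asks for a .csv, a .asc and a .lambdas file, in that order.
    for path, ext in zip(fields, (".csv", ".asc", ".lambdas")):
        if not path.lower().endswith(ext):
            problems.append(f"{path!r} should end in {ext}")
        if not ntpath.isabs(path):
            problems.append(f"{path!r} is not a fully qualified path")
    return problems


def output_name(script_path):
    """Name of the results file that would be (over)written for this script."""
    root, _ = ntpath.splitext(script_path)
    return root + "_model_selection.csv"
```

For example, `check_script_line` returns an empty list for the sample line in step 3, and `output_name` applied to a script named `myscript.csv` gives `myscript_model_selection.csv`, which is the file to back up if you do not want it overwritten.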

59 comments:

  1. Dear Dan:
    I am trying to use ENMTools to test niche identity for some cryptic species (phylogroups). I installed the new version of ENMTools (1.1) and the latest version of Perl, and I am using the latest version of MaxEnt, 3.3.1. Everything appears to run well when I calculate I and D; however, when I try to run the identity or background test I get the same error:

    Can't open C:/Users/Jonathan/Documents/Trabajo/Docto/Analyses/ENMTools/Rglaucostigma_rep0.asc!!




    Can't open C:/Users/Jonathan/Documents/Trabajo/Docto/Analyses/ENMTools/Rglaucostigma_rep0.asc!!


    while executing
    "::perl::CODE(0x23baecc)"
    invoked from within
    ".b5 invoke "
    invoked from within
    ".b5 instate {pressed !disabled} { .b5 state !pressed; .b5 invoke } "
    (command bound to event)


    Any idea what I am doing wrong?

    1. I had a similar error. In my case I had specified the wrong folder for the climatic layers.

  2. Could you email me your .csv file? Send it to dan.l.warren@gmail.com, thanks!

  3. I have sent you the files. I also tried with the example files, and I got the same error.

  4. Dear Dan:
    I was just wondering whether you found any problem in my .csv file, or whether you have any idea what is producing the error?

  5. Oh goodness, I'm sorry! Somebody else emailed me a .csv file at the same time and I got the two confused - I thought that I had fixed your problem! I'll look at it straight away.

  6. Oh wait, I just dug through my email and found that I did write back to you with a question - perhaps you didn't see it?

  7. Dear Dan,

    I'm using ENMTools to compare different Maxent models. I've followed your instructions and it seems to work correctly; in fact, I can get the AICc and BIC for most of the models. However, for some of the models the output file doesn't provide them and gives just "x"s instead. For example, the output says this:

    C:\models\species.csv,P:\models\species.asc,-151.356067919256,20,17,x,x,x

    Any idea what is going wrong?

    Thanks in advance,

    Pedro

  8. That occurs when your model has more parameters than you have occurrence points, which violates the assumptions of AIC.

  9. Thanks Dan for your reply. Now, I understand. So, I should discard those models with more parameters than occurrence points?

    Cheers,

    Pedro

  10. Hi again Dan,

    One new (but quick) question: when obtaining the suitability scores (in RAW format) with Maxent in order to compare different models, the "add samples to background" option should be disabled, shouldn't it? I've read that in your Ecol Appl paper but not in the "protocol" above.

    Cheers,

    Pedro

  11. We disabled that option on the advice of Steven Phillips, specifically because we were using simulated data, which has no spatial sampling bias. As that is not the case with real data, I don't think that it is generally necessary to disable this option.

  12. Hey Dan - is there any reason why I would get X's instead of likelihood values when I *know* that my sample size is greater than the number of model parameters? I have 595 sample points (none are duplicated) and 87 model parameters. I'd love to use this tool but I can't figure out what's going on! Any help would be GREATLY appreciated! :)

    ~Elizabeth

    1. Hi Elizabeth,

      I hope I'm not too late. I recently faced a similar issue, and I solved it by adding an extra column to the front of my csv file with occurrence data. My extra column was filled with "junk" data (the header was "Status" and values were all "Africa"). It seems to be a bug; without this extra column, the mac version of ENMtools doesn't recognize some points in the csv file given, and so the sample size of points used to calculate AICc/AIC/BIC is smaller than that given in the input csv file. Hope this helps.

      -Bi Wei

  13. p.s. I am running ENMTools_1.3 for OSX.

  14. Dear Dan:
    I am trying to use ENMTools to test niche identity for a few plant species. I have installed the new version of ENMTools (1.3) and the latest version of Perl, and I am using MaxEnt 3.3.1. I was able to calculate I and D; however, when I try to run the identity or background test I get the same error:

    Can't open H:/ENMTools_1.3/Phyllanthus debilis_rep0.asc!!




    Can't open H:/ENMTools_1.3/Phyllanthus debilis_rep0.asc!!


    while executing
    "::perl::CODE(0x5642e4c)"
    invoked from within
    ".b5 invoke "
    invoked from within
    ".b5 instate !disabled { .b5 invoke } "
    invoked from within
    ".b5 instate pressed { .b5 state !pressed; .b5 instate !disabled { .b5 invoke } } "
    (command bound to event)

    Can you please help me solve this problem?

    1. My guess is that it's due to the spaces in the species names. Try replacing those with underscores or dashes in your .csv file, and see if that fixes it.

  15. Dear Dan,

    I am using the ENMTools model selection tool to compare Maxent habitat suitability models, and wondered whether it is ok to use an occurrence file that has duplicate occurrence points in it (as I did not remove duplicates in my Maxent models, for reasons I won't go into here!). I know the instructions say to remove them, but I'd like to know if the tool will run ok and produce sensible AIC/BIC results with them kept in? Your advice would be much appreciated.

    Thanks,
    Anna

  16. If you constructed your models without removing duplicates, it's probably better to evaluate them without using duplicates. Cheers!

  17. Hi Dan,

    Thanks for your quick reply. Just to clarify, are you saying that I should evaluate the models without using duplicates (ie remove them), even though I kept them in to build the models? Thanks!

  18. Yes, but I'm assuming that you kept them in because you have some compelling reason to believe that the duplicate occurrences are not simply due to sampling bias.

  19. Dear Dan, I am using ENMTools to compare models run with Maxent, but I have one question. Sorry if it is too obvious.
    Is the set of occurrence points to be compared the set of points used to build the model (training data), or do you use all the data (training + test data)?
    And do you use only presence data, or presence + absence?
    Thanks a lot!
    Elena

  20. Dear Dan,
    I'm starting to use ENMTools to evaluate my Maxent models and I have some questions about the process. Is it necessary to standardize all the .asc files before running the Model Selection tool, or is the tool able to do it during the process? I ask because I'm very green in this area and need to be certain of my final scores and how to interpret them correctly. All my final scores seem to be extremely high (AICc ~106820793909), but perhaps that is normal. Sorry for this basic question and many thanks in advance.
    Leonel

  21. ENMTools standardizes the models. However, the AICc score you present is a bit weird. Can you send me one of your models, or are they too big?

  22. Hi again Darren! when running ENM with my models I get this error:
    Can't take log of -9.23358e-054 at ENMTools_3-17-2011.pl line 2116.

    Can't take log of -9.23358e-054 at ENMTools_3-17-2011.pl line 2116.

    while executing
    "::perl::CODE(0x13233a4)"
    (menu invoke)

    The output of my models is RAW.
    what can be wrong? thanks a lot!

  23. The error simply means you can't take the log of a negative number. I can't really tell from this why your models might be producing negative probabilities, but obviously that's a problem regardless.

  24. Hi Dan,

    First, let me say you are an absolute gem for troubleshooting in your blog comments for years. Much appreciated.

    Second, I cannot convince the Model Selection tool to find any of the files needed to compare models--the points, the prediction raster, or the lambdas. I get this message for the files in each of the six models (Auto features used as an example):

    Can't find c:\temp\camas_trimmed.csv c:\temp\auto.asc c:\temp\auto.lambdas!
    Can't find !
    Can't find !

    I started out with all my files in places with quite complicated paths and, as I got more and more errors, copied and renamed them until they sit in this little temp folder on my C drive. I have gotten the same error with ENM 1.4 and 1.3 independently. A ...model_selection.csv file is written, but it only contains the column headers--no data. I am happy to send you my script or whatever you need to diagnose the problem.

  25. My guess is that there's something up with the .csv script file. Could you send it to me?

  26. Hi Dan, I am using ENMTools to compare my Maxent models and select the best model. I am getting this message in the black command screen: "Found probability of -9999". I was wondering whether that indicates an error or is just a normal message. I am using raw output.
    Thanks in advance.

  27. That just means that there is one of your points that has a nodata value in the raster for some reason. Not anything to worry about unless you have a whole lot of them.

    1. Dear Darren
      Would you please give me a brief explanation of the steps needed to prepare my data for use in ENMTools? Which Maxent outputs go into ENMTools, and how should I prepare them?
      Regards
      Nega

  28. Hello,

    I've done some searching and haven't found a specific answer to this question. I know I need a line in the model selection script for each csv/asc/lambda. So if I'm running multiple replicates (say 10) for one model, will my input model look like

    c:\data\points.csv\,c:\data\species_0.asc,c:\data\species_0.lambda
    c:\data\points.csv\,c:\data\species_1.asc,c:\data\species_1.lambda
    c:\data\points.csv\,c:\data\species_2.asc,c:\data\species_2.lambda

    etc for each replicate on this model?
    Will I then include similar lines for a separate model that also has 10 replicates?

    Thanks!

    1. Hi Dr. Warren,

      I would also like to compare Models A, B, and C each of which have five replicates. To clarify, I should put the three arguments (or line with the csv, asc, and lambda file) for each replicate in say Model A, in one row of the excel (csv) input file. Then all of the arguments for all replicates in Model B on the second row and all for Model C in the third row, correct? Rows being the actual numbered rows as in excel, not just lines of addresses entered on top of each other for ease of visualizing.

      Does it matter how the model replicates themselves are spaced? The person above seemed to keep each replicate on a separate "line" does this mean a distinct row in the excel (csv) file or was each replicate in the same row technically?

      Thanks so much and thanks for the tools...and the tunes!
      Ian

    2. Each replicate would have to be on a separate line and treated as a separate model. The next question is of course how you use AIC values from all of those replicates to get some sort of overall AIC value for your model, and the answer is that there's not really an accepted way to do that. One of several reasons that AIC is a bit of an odd approach for Maxent models as typically constructed.

    3. Okay, great, thanks! That makes sense and confirms some suspicions I had about AIC. I suppose if all replicates from model A have lower AIC than all replicates from model B, I could conclude that A has better performance than B as measured by AIC. I need to research this more, but since I'm here: would calculating in this manner be AICc or normal AIC?

    4. Certainly it would suggest that modeling approach A was doing a better job than B.

      I'm not sure what you mean when you say "in this manner", but in general AICc is the way to go. It corrects for small sample size when your sample sizes are small, but if sample size is large it converges on regular AIC anyway.

  30. Hello,

    I am trying to use the background test in ENMTools, but keep getting the same error (below). It appears that the program is creating a number of empty files in my output folder and then not being able to reopen them. My input files are .csv files with two columns with the headers "LAT" and "LONG". Any idea what the problem might be? Thanks in advance for any help that you can give!


    Can't open C:/Users/rtelemeco/Documents/Alligator Lizards/Niche Modelling/ENMTools_ElgariaOutput/34.23442.asc!!

    Can't open C:/Users/rtelemeco/Documents/Alligator Lizards/Niche Modelling/ENMTools_ElgariaOutput/34.23442.asc!!


    while executing
    "::perl::CODE(0x36b5578)"
    invoked from within
    ".b5 invoke "
    invoked from within
    ".b5 instate !disabled { .b5 invoke } "
    invoked from within
    ".b5 instate pressed { .b5 state !pressed; .b5 instate !disabled { .b5 invoke } } "
    (command bound to event)

    1. Could you email me your .csv file?

    2. I sent the files, but not sure if they made it to you...

    3. Nope, haven't seen them. Did you send them to dan.l.warren@gmail.com?

  31. Hi, I know it is a simple question, but I just started to use ENMTools. When I try to build a null model using resample from raster, what should I put in the number of points per replicate? I have one species (different wolves) and 25 000 data points (GPS coordinates).

  32. What are you building the null model for?

    1. I am using wolf movement data (telemetry data) and plan to develop a habitat suitability model using Maxent (and later use that to draw corridors in GIS). After I run it, I would like to check my results in ENMTools by developing a null model.

    2. Do you mean you're looking for statistical significance of the model predictions, as in Raes and ter Steege 2007?

      If so you should put in the number of points that you have in your empirical data set.

    3. Yes, I am looking for statistical significance of my model's predictions, so I just put 25 000? Will ENMTools work with that much data?

    4. Probably, but it depends on your computer. It may take a while!

      If it doesn't run in ENMTools, there are functions in dismo for doing this as well.

  33. Hi Dan
    For some reason I don't know yet, I can't get a niche identity test using enmtools v1.3 on windows. I could do it last year, but now when I add two files for 25 replicates, I get the message "Niche identity test are finished" very quickly but I don't get any result. Can you please help me and let me know what I might have done wrong? Thanks!

  34. Hard to diagnose just from this, but my guess is that Maxent isn't actually being run. Do you see the GUI pop up?

  36. I have a question about the application of the ENMTools model selection tool. Does it make sense to compare models with different numbers of variables, or is it only for comparing models with different parameters? Are there any limitations on model comparisons with this tool?

    1. AIC is able to compare models with different numbers of variables and parameters. As for there being limitations: the application of AIC to Maxent models is still a bit of a hack despite being quite widely used. Maxent's use of lasso regularization means that there's a disconnect between the number of parameters and the effective degrees of freedom, and we don't really understand fully how that affects model selection for species distribution modeling. The simulations that have been done largely support the utility of AIC, but we still know that it might be "wrong" to some unknown level - over-penalizing model complexity to some extent. I'm pretty comfortable with that based on my general preference for very simple models, but there are definitely some very smart people who have a different perspective on it.
