Comments on Species In Space: Handy little snippet of R code for thinning occurrence data

2024-08-06T02:31:54.846-07:00

Hi, Dan. Thank you for your reply sample.grid can ...

2015-11-18T19:09:37.906-08:00

Hi, Dan. Thank you for your reply.
sample.grid can also be extended to n dimensions by splitting the space into cubes or hypercubes if necessary. The disadvantages are that spatial autocorrelation within a grid cell is not considered, and one cannot determine the number of output samples.
But it is possible to use your method, or a similar one, within each grid cell instead of the random sampling in sample.grid. In this way, we can find a balance between handling spatial autocorrelation and computing time.
Anyway, this is not my focus right now either. Just something to keep in mind. Cheers!
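
A rough sketch of the hybrid idea in this comment, written in Python rather than R and using only the standard library. It bins points of any dimension into hypercube cells and then thins within each cell by distance instead of sampling at random. The function name `thin_within_cells` and its parameters are hypothetical, and a simple random minimum-distance rule stands in for the post's method, so this is an illustration under those assumptions and not how sample.grid works.

```python
import math
import random
from collections import defaultdict

def thin_within_cells(points, cell_size, min_dist, seed=None):
    """Bin n-dimensional points into hypercubes, then thin inside each cell.

    Hypothetical helper: within a cell, points are visited in random order
    and kept only if they are at least min_dist from every point already
    kept in that same cell.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for p in points:
        # The cell key is the integer index of the hypercube along each axis.
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(tuple(p))

    kept = []
    for members in cells.values():
        rng.shuffle(members)
        cell_kept = []
        for p in members:
            if all(math.dist(p, q) >= min_dist for q in cell_kept):
                cell_kept.append(p)
        kept.extend(cell_kept)
    return kept
```

Because each cell is thinned independently, two kept points in neighbouring cells can still be closer than `min_dist`. That is the price of the speed-up: distances are only computed between points that share a cell.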

Yeah that definitely seems like it's going to ...

2015-11-18T18:04:22.013-08:00

Yeah that definitely seems like it's going to be way faster when you've got a whole bunch of points - even though this method is miles faster than spThin, it still gets quite time-consuming when you get more than a few tens of thousands of points. One thing that this method can do that I don't think sample.grid can do (unless I misunderstand) is thin points based on distance in any number of dimensions, not just spatial. You could use this algorithm (and the earlier one you posted, I think) to thin points in environment space as well as (or instead of) geographic space.

Anyway, I don't have time to muck about with it much at the moment, I have a class to prepare. Cheers!

Hi, Dan I am not sure which is better. My guess is...

2015-11-18T17:13:39.844-08:00

Hi, Dan
I am not sure which is better. My guess is that the results may be similar. As far as I can see, both your algorithm and their thinning algorithm have O(N^3) complexity, although theirs should be faster than yours.
https://en.wikipedia.org/wiki/Computational_complexity_theory
Could you tell me how long it takes in your case? It looks like you also tried spThin and found it slower?
My application involves millions of data points, so I wrote a program that is much faster but less strict:
Spatial points are overlaid with a spatial grid of a specified cell size, and then a subset of at most a specified number of points is taken from each grid cell. If a cell has fewer points than the specified number, all of its points are taken. If a cell has more points than the specified number, only that many points are taken by random sampling. This function can be used when there are too many point observations to handle, especially for spatially clustered observations.

I put it into the GSIF package.
http://gsif.r-forge.r-project.org/sample.grid.html
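
For readers who want to see the grid idea spelled out, here is a minimal sketch in Python (standard library only) of the procedure described above. It is not the GSIF implementation; the function name `sample_grid` and the parameters `cell_size` and `n_max` are made up for this illustration.

```python
import math
import random
from collections import defaultdict

def sample_grid(points, cell_size, n_max, seed=None):
    """Keep at most n_max randomly chosen points per square grid cell.

    Hypothetical helper illustrating the description in the comment;
    points is an iterable of (x, y) pairs.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for x, y in points:
        # Assign each point to the grid cell that contains it.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    sampled = []
    for members in cells.values():
        if len(members) <= n_max:
            # Sparse cell: take every point.
            sampled.extend(members)
        else:
            # Crowded cell: take a random subset of size n_max.
            sampled.extend(rng.sample(members, n_max))
    return sampled
```

Each point is touched once and no pairwise distances are computed, which is why this kind of approach stays practical for very large point sets. The trade-off, as the comment says, is that it is less strict: points inside a cell, or on either side of a cell border, may still be close together.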

Definitely seems applicable, but I can't immed...

2015-11-18T11:57:50.507-08:00

Definitely seems applicable, but I can't immediately see what advantages it might have over this method. Do you have an idea?

Maybe try this method in the following paper. htt...

2015-11-18T06:49:06.493-08:00

Maybe try the method in the following paper.
http://www.sciencedirect.com/science/article/pii/0377042796000350
Which one is better?

Funny enough, I had thought of doing essentially t...

2015-11-17T18:41:53.848-08:00

Funny enough, I had thought of doing essentially the same thing when I was originally writing this bit of code. You could get the same results from this code just by making the stop condition a minimum distance instead of a fixed number of points.
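
To make the point about stop conditions concrete, here is a small Python sketch (standard library only). The post's code is not reproduced in this thread, so the sketch assumes a farthest-first selection: start from a random point and repeatedly add the point whose nearest already-chosen neighbour is farthest away. That selection rule, the function name `thin_farthest_first`, and its parameters are assumptions made for illustration.

```python
import math
import random

def thin_farthest_first(points, n_points=None, min_dist=None, seed=None):
    """Farthest-first thinning with a choice of stop condition.

    Hypothetical helper: stops when n_points have been chosen, or when the
    best remaining candidate is closer than min_dist to the chosen set.
    """
    rng = random.Random(seed)
    remaining = [tuple(p) for p in points]
    if not remaining:
        return []
    chosen = [remaining.pop(rng.randrange(len(remaining)))]
    # nearest[i] is the distance from remaining[i] to its closest chosen point.
    nearest = [math.dist(p, chosen[0]) for p in remaining]

    while remaining:
        if n_points is not None and len(chosen) >= n_points:
            break  # fixed-number stop condition
        i = max(range(len(remaining)), key=nearest.__getitem__)
        if min_dist is not None and nearest[i] < min_dist:
            break  # minimum-distance stop condition
        new = remaining.pop(i)
        nearest.pop(i)
        chosen.append(new)
        nearest = [min(d, math.dist(p, new)) for d, p in zip(nearest, remaining)]
    return chosen
```

With `min_dist` set and `n_points` left as `None`, the loop keeps adding points until no candidate is at least `min_dist` from everything already chosen, so every returned pair respects the minimum distance.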

Hi Dan, I was just working on a similar problem. ...

2015-11-17T18:38:11.915-08:00

Hi Dan, I was just working on a similar problem, though I wanted to randomly sample a set of points while enforcing a minimum distance between them. And like you, I didn't care about maximizing the number of points; I just wanted it to be random. I deviate somewhat from your method, maybe because we had different motivations for doing this. Basically I did the following:

1) randomly choose one point from the original set and move it to a new set (like you)
2) calculate the distance between this point and all of the points left in the original set (like you)
3) delete from the original set those points that fall within the minimum distance of the point moved to the new set
4) repeat until there is nothing left in the original set
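
The four steps above can be sketched in a few lines. This is a minimal illustration in Python (standard library only) and not the commenter's actual code; the function name `thin_min_distance` and the choice of Euclidean distance are assumptions.

```python
import math
import random

def thin_min_distance(points, min_dist, seed=None):
    """Randomly thin points so that no two kept points are closer than min_dist.

    Hypothetical helper following the four steps listed above; points is an
    iterable of coordinate tuples of any dimension.
    """
    rng = random.Random(seed)
    remaining = [tuple(p) for p in points]
    kept = []
    while remaining:  # 4) repeat until the original set is empty
        # 1) move one randomly chosen point to the new set
        chosen = remaining.pop(rng.randrange(len(remaining)))
        kept.append(chosen)
        # 2) and 3) measure distances to the chosen point and drop
        # everything closer than the minimum distance
        remaining = [p for p in remaining if math.dist(p, chosen) >= min_dist]
    return kept
```

Each pass computes one distance per remaining point, so the full pairwise distance matrix is never needed. The number of points returned depends on the random draws and is not known in advance.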