County Polygons
Posted: March 31st, 2012, 10:10 am
So... I started looking into adding support for Counties as regions and have run into some data accuracy issues. I'm 99% sure I know how we should approach this but I wanted to gather some opinions.
First some background. Counties have shapes (duh). Sometimes those shapes are very irregular/complex. I downloaded the coordinates from the National Atlas of the US (most official source I know of) and got some pretty detailed polygons! One county in Arizona, for example, is defined by a perimeter of ~1300 coordinates. That's not bad! But guess what, it's not perfect...
(SIDE NOTE: I saw the coordinate set that some people use in GSAK for Utah counties (for example) and they only have ~15 points. That now seems horribly inaccurate in comparison.)
I've found instances where the state assigned by Groundspeak is actually wrong (by a few feet). There is a cache listed as Arizona which is actually in New Mexico. When I check it against my AZ polygons, that cache doesn't fall inside any of them, so it doesn't get assigned a county.
Another problem is that even these highly complex polygons aren't entirely accurate. Some county perimeters are actually way more complex than that (think winding rivers). However, the coordinate set I have is the most accurate I know of. I'll add 2 pictures to illustrate this point. My assumption is that no matter how hard we try to find a more accurate polygon definition per state, we'll still eventually have some areas of inaccuracy.
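For anyone wondering what "checking against the polygons" means in practice, here is a minimal sketch using the standard ray-casting (even-odd) test. This is not necessarily the code the site runs; the function names and the two rectangular "counties" are made up for illustration, and real county polygons would have hundreds of vertices.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that line
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
        j = i
    return inside


def assign_county(lon, lat, counties):
    """Return the name of the first county containing the point, or None."""
    for name, polygon in counties.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical data: two adjacent rectangles standing in for counties.
counties = {
    "West": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "East": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

inside_west = assign_county(0.5, 0.5, counties)
inside_east = assign_county(1.5, 0.5, counties)
outside_all = assign_county(2.5, 0.5, counties)  # like the AZ cache that sits in NM
```

A cache that lands outside every polygon comes back as None, which is exactly the "no county assigned" case described above.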
So... that now comes down to the question "What do we do about it?"
There are a few options that I see but would love input.
1. Ignore it. Out of the small sample I compared against (13922 caches) only ~8 failed to get a county assignment (because the error pushed them outside of AZ). However, there may well have been other caches assigned to the wrong county within the boundaries of the state. Being generous, let's estimate that at 100-200 caches out of 13922. That's an error rate of roughly 0.7-1.4%, so call it ~1%. That might be "good enough".
2. Find some other super accurate coordinate set. As far as I know, the best source is what I'm already using. It is the National Atlas after all.
3. Have a process where we can "appeal" county designations. This should be ***INFREQUENT*** as it would place a burden on Corfman Clan and me to police. We could potentially add a mechanism to grant privileges to others to help w/ that burden but I definitely would NOT want the general public to be able to edit that (since it would affect the integrity of leaderboards).
WRT #3, it's easy for Corfman Clan and me to notice if something failed around the edge of the state, because the cache ends up with no county at all. It's WAY less obvious if something was assigned to the wrong county within the state.
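One possible way to make the "within the state" cases less invisible would be to flag any cache that sits very close to a county line, since those are the ones a slightly-off polygon could misassign. The sketch below is only an idea, not something that exists on the site: the function names, the 50 m tolerance, and the sample square are all assumptions, and the degrees-to-metres conversion is a rough flat-earth approximation that is only reasonable over short distances.

```python
import math

M_PER_DEG = 111_320.0  # rough metres per degree of latitude (approximation)


def _to_local_m(lon, lat, lon0, lat0):
    """Project (lon, lat) to metres on a flat plane centred on (lon0, lat0)."""
    x = (lon - lon0) * M_PER_DEG * math.cos(math.radians(lat0))
    y = (lat - lat0) * M_PER_DEG
    return x, y


def distance_to_boundary_m(lon, lat, polygon):
    """Approximate distance in metres from the point to the nearest polygon edge."""
    best = float("inf")
    n = len(polygon)
    for i in range(n):
        # Edge endpoints relative to the cache, which sits at the origin.
        ax, ay = _to_local_m(*polygon[i], lon, lat)
        bx, by = _to_local_m(*polygon[(i + 1) % n], lon, lat)
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            t = 0.0  # degenerate edge (repeated vertex)
        else:
            # Closest point on the segment to the origin, clamped to the segment.
            t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len_sq))
        best = min(best, math.hypot(ax + t * dx, ay + t * dy))
    return best


def needs_review(lon, lat, polygon, tolerance_m=50.0):
    """True if the cache is within tolerance_m of the county boundary.

    tolerance_m is a hypothetical value; it would need tuning against real data.
    """
    return distance_to_boundary_m(lon, lat, polygon) <= tolerance_m


# Hypothetical one-degree square standing in for a county.
square = [(-110.0, 33.0), (-109.0, 33.0), (-109.0, 34.0), (-110.0, 34.0)]

near_edge = needs_review(-109.0001, 33.5, square)  # roughly 9 m from the east edge
well_inside = needs_review(-109.5, 33.5, square)   # tens of km from any edge
```

Only the flagged caches would be candidates for a manual look or an appeal, which might keep the policing burden small.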
Thoughts?