How things are updating...
Posted: June 24th, 2012, 1:12 pm
Previously:
The other night, I was having some beers with jcanyoneer & azcampbell and we got to talking about this. Jcanyoneer mentioned something about updating the caches with fewer logs first - the caches with lots of logs really don't matter as much anyway (they're less lonely). I mulled it over and thought why not, it's easy enough to change. So now the site will pick caches sorted first by how long it's been since the last update (never is at the top of the list), and next by how many logs we have, fewest first (this will favor recently added caches). This should make the statistics converge to something meaningful sooner rather than later.

Corfman Clan wrote:
Things aren't going to be changing as quickly as one might think. I set up the database in such a way that every cache is ready for a full update. That is, all the logs will be retrieved for every cache. Doing this just happens to pick older caches first. (They aren't being sorted, so I'm not sure why; it just happens. Maybe because the primary key is the cache ID.) We already have most of the logs for the old caches, so those won't be changing much. Also, Groundspeak limits the API call for fetching logs to 200 calls per hour with up to 30 logs per call. With 4 accounts, I'm getting 800 calls an hour. Some of those old caches have a lot of logs too. So after a day, a bit over 4000 caches are fully updated. New caches will be added with pocket queries, but those caches won't be updated until all the older caches have been. So, it's going to take a while. The system also has a lot to do, so I have my doubts about whether the stats will be updated too, though I can manually start that whenever I want.

rocketsciguy wrote:
That's awesome!
Thanks Russell (and Redfist) for all your hard work to date, and thanks greenskeeper and bikephotog for being so quick to sponsor! It's exciting to know that it's just around the corner.
Since you're building data from the Live API now, I think it would be pretty interesting to see the site flesh itself out as the database builds on its skeleton base. Each day you'd have radical changes in Point value and Leaderboards. (hint hint) It would be really interesting to see it.
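For anyone curious what the new pick order looks like, here is a rough Python sketch. It is not the site's actual code: the Cache class, its field names, and the GC_A style codes are all made up for illustration. It only shows the ordering described above: never-updated caches first, then the longest since the last update, then the fewest stored logs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Cache:
    # Hypothetical record; the real site's schema is not shown in this thread.
    code: str
    last_updated: Optional[datetime]  # None means the cache was never updated
    log_count: int                    # number of logs already stored for it


def pick_order(caches):
    """Sort: never updated first, then oldest update, then fewest logs."""
    def key(c):
        # False sorts before True, so never-updated caches come first.
        return (c.last_updated is not None,
                c.last_updated or datetime.min,
                c.log_count)
    return sorted(caches, key=key)


caches = [
    Cache("GC_A", datetime(2012, 6, 20), 500),
    Cache("GC_B", None, 120),
    Cache("GC_C", None, 3),
    Cache("GC_D", datetime(2012, 6, 1), 40),
]
print([c.code for c in pick_order(caches)])
# ['GC_C', 'GC_B', 'GC_D', 'GC_A']
```

The two never-updated caches lead, with the one holding fewer logs ahead, followed by the cache whose last update is oldest.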
I also adjusted the algorithm that decides when the stats are updated. Previously it would only run when all caches were up to date. In theory, I think that's a good idea, but in practice there are some problems. Namely, if the accounts that access the Geocaching.com Live API to update caches have reached their daily quota, there will be caches that are out of date but cannot be updated. The API also has call limits, so the site may need to throttle back its access. So again, there may be close to an hour where there are caches that are out of date but cannot be updated. So now, when it is time to update the stats, they are updated if either all the caches are up to date or the out-of-date caches cannot be updated.
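In rough Python terms, the new rule boils down to something like the sketch below. The function and parameter names are invented for illustration and are not from the site's code; can_update_caches stands for "no account has quota left or the site is throttling" being false.

```python
def should_update_stats(stats_due, stale_caches, can_update_caches):
    """Decide whether to run the stats update right now.

    stats_due         -- True when the schedule says it is time for stats
    stale_caches      -- number of caches that are currently out of date
    can_update_caches -- False when quotas are used up or the API is throttled
    """
    if not stats_due:
        return False
    # Old rule: only stale_caches == 0 counted.
    # New rule: also run when the stale caches cannot be refreshed anyway.
    return stale_caches == 0 or not can_update_caches


print(should_update_stats(True, 0, True))     # True: everything is current
print(should_update_stats(True, 500, False))  # True: stale, but blocked
print(should_update_stats(True, 500, True))   # False: keep updating caches
```

The point of the change is the middle case: waiting on caches that cannot be refreshed no longer holds up the stats.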
As of right now:
The site has 95,953 caches. Of those, 18,366 have been updated. We also have 5,332,322 found logs and 144,256 geocachers.