Expansion & Those Pesky Pocket Queries

Posted: April 20th, 2014, 11:16 pm
by Corfman Clan
LonelyCache uses pocket queries to discover new caches that have been placed within its territory. We have a set of pocket queries for each of the six states that currently make up the LonelyCache territory. It's easy enough to make a set of pocket queries that cover a whole state but it's a manual process that needs regular attention due to new caches being placed and others being archived. Sometimes I'm behind in this or make mistakes and problems occur (for example, see here).

As mentioned, creating a set of pocket queries for a state is easy enough. Defining a set that covers only part of a state is more difficult. This is the main reason why the DGP regions in California, Texas, and Mexico are not a part of LonelyCache.

Last summer I asked, "What's next?" One big response was expansion - to add the CA DGP Regions, part of Idaho, and Wyoming and Montana. We have since added all of Wyoming because it was easy to do. We could have added Montana too, but I would prefer to include the traditional DGP territory first. That is, add the CA DGP Regions (Death Valley, Mojave, and Salton Sea) and the Texas DGP Region, Trans-Pecos.

So that gets back to those pesky pocket queries; they gotta go the way of the dodo. :P

After taking a hiatus of more than a couple of months from site development, I started tackling this pocket query dependence last Friday. The Geocaching Live API will be used to search a predefined grid for geocaches, one piece at a time. Newly found geocaches will be added to LonelyCache if applicable and other geocaches will be updated. The first phase will be to use this mechanism to add the Trans-Pecos region of Texas. If all goes well, then the three California regions will be added. Finally, this mechanism will be used for all of the LonelyCache territory and the current pocket queries will be retired.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 6th, 2014, 12:08 pm
by rocketsciguy
Cool! Good luck with the transition! I think I said the same with regard to PQs, that you should provide a way for users to donate their premium membership perks to help keep the database maintained. It would be hard to donate pocket queries, but I can imagine the mechanisms are already in place with the Live API to facilitate this, and that it would "just" be a matter of putting the appropriate functionality in place on the front- and backends.

From a UI perspective, I could see after a registered LonelyCache user logs in (I would recommend only Premium Members on GC.com since basic members have such limited API use), he/she is presented with a prompt (a pop-up dialog box or on their Cacher stats home page):
Please help keep our database maintained.
Click here to donate Geocaching.com Live API updates. You have XXXX remaining for today.
50, 100, 200, 500, 1000, 2000, 5000
Donating costs you nothing, but may limit your ability to use other Live API programs and tools.
Users could opt out of receiving pestering messages like this with a checkbox on their account Profile page.

Each cache stat page could also have a pair of links visible only to registered, logged-in LC members, perhaps at the bottom of the page next to the Last Updated dates:
Update this cache's recent finds NOW | Completely refresh this cache's history NOW
You could have people "adopt" their favorite lonely caches on a first come first served basis to make sure they're refreshed on a regular basis. Encourage people with a monthly Top Sponsors board, have a pledge drive to commit to a minimum number of donations per month. People do this enough, and you could eliminate the need for the LonelyCache01 - 04 users.

May need to check Live API Terms and Conditions about whether something like this is permissible or not. You may have to do something like make the data retrieved through the API available to the user, or make it a positive action that must be taken each time rather than a passive "sign me up for 2000 a day".

Just an idea...

Re: Expansion & Those Pesky Pocket Queries

Posted: May 12th, 2014, 12:29 am
by Corfman Clan
Tonight I was able to test this for the first time. After making a few corrections, I was able to let it run and collect all the caches in the Trans-Pecos region of Texas. I'm really happy with the results. The collection seemed to work very smoothly.

First I created a grid that covers the continental US. Each "square" of the grid covers approximately 10x10 miles. This varies plus or minus a couple miles east/west since the earth isn't flat. In total, there are 56,027 pieces to the grid. Next, I activated the 367 pieces of the grid that intersect the Trans-Pecos region. Only the activated pieces are searched for caches.
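To illustrate why the east/west size of a piece drifts by a couple of miles, here is a minimal sketch. It assumes the grid uses fixed steps in degrees, sized so a piece is 10x10 miles at 37 degrees north; the post does not say how the grid was actually built, and the reference latitude and all names here are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
MILES_PER_DEG_LAT = math.pi * EARTH_RADIUS_MILES / 180  # about 69.1

def cell_size_miles(lat_deg, lat_step_deg, lon_step_deg):
    """Return (north-south, east-west) size in miles of a cell centred at lat_deg."""
    height = lat_step_deg * MILES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    width = lon_step_deg * MILES_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    return height, width

# Assumption: steps chosen so a cell is 10x10 miles at 37 degrees north.
REF_LAT = 37.0
LAT_STEP = 10 / MILES_PER_DEG_LAT
LON_STEP = 10 / (MILES_PER_DEG_LAT * math.cos(math.radians(REF_LAT)))

# Roughly the southern edge, middle, and northern edge of the continental US.
for lat in (25.0, 37.0, 49.0):
    h, w = cell_size_miles(lat, LAT_STEP, LON_STEP)
    print(f"{lat:4.1f}N: {h:.1f} mi north-south, {w:.1f} mi east-west")
```

Under these assumptions the north-south size stays at 10 miles while the east-west size runs from a bit over 8 miles in the north to a bit over 11 miles in the south.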

After all the activated pieces were searched, 1077 caches were found. Of those, 815 are in Trans-Pecos, 195 are in New Mexico, and the remaining 67 are either in Mexico or parts of Texas outside the Trans-Pecos region.

So far, the results are very encouraging, but there is still some development left and there are plenty of tests that need to be performed before this is moved to production.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 13th, 2014, 12:46 am
by Corfman Clan
I had another successful evening of testing the updates. Tonight, I activated the 356 grid pieces that cover the three California DGP regions of Death Valley, Mojave, and Salton Sea. I also re-searched the grid pieces covering the Trans-Pecos region using the "Lite" search provided by the API instead of the regular search used yesterday and for the California regions.

After all 723 activated pieces were searched, including those covering the Trans-Pecos, I ended up with:
  • 5,887 caches in the three California regions (There were 3,233 the last time DGP was updated).
  • 815 caches in the Texas region (There were 508 the last time DGP was updated).
  • 685 Arizona caches updated.
  • 611 Nevada caches updated.
  • 195 New Mexico caches updated.
  • 10,726 total caches retrieved.
  • 8,193 of the total are in the expanded LonelyCache territory
  • 2,533 of the total are outside of the expanded LonelyCache territory.
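The tallies above add up, which a quick check confirms. This sketch assumes the updated Arizona, Nevada, and New Mexico caches count as inside the expanded territory; the variable names are only for illustration.

```python
# Cache counts reported after searching the activated grid pieces.
inside_counts = {
    "California regions": 5887,
    "Texas region": 815,
    "Arizona": 685,
    "Nevada": 611,
    "New Mexico": 195,
}
outside = 2533

inside = sum(inside_counts.values())
total = inside + outside
pieces = 367 + 356  # Trans-Pecos pieces plus California pieces

print(inside, total, pieces)  # 8193 10726 723
```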
I'm pretty happy with how things are working now. I have some more testing to perform but I've covered most all the changes now. I also need to review the code changes and differences along with the database schema changes. Once that's done, then I'll be able to deploy.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 13th, 2014, 6:53 am
by rocketsciguy
Encouraging news! Hope the additional testing and implementation continue as successfully!

Re: Expansion & Those Pesky Pocket Queries

Posted: May 14th, 2014, 9:15 pm
by Corfman Clan
This has now been implemented. See the release notes for more information.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 16th, 2014, 11:43 am
by Corfman Clan
Continuing forward with this. Today I have activated the grid sections covering New Mexico and have turned off the New Mexico pocket queries.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 21st, 2014, 12:16 am
by Corfman Clan
All grid sections LonelyCache wide have been activated and all pocket queries have been turned off. What a nice feeling :D

Re: Expansion & Those Pesky Pocket Queries

Posted: May 21st, 2014, 5:36 am
by rocketsciguy
Very cool! Glad to hear the transition has worked so smoothly. What does this mean now in terms of update frequency, API usage vs. current capacity? Do you anticipate needing to buy more API calls (dummy Premium accounts, or the contribution scheme I had above)? Will you be growing the service area again anytime soon?

Re: Expansion & Those Pesky Pocket Queries

Posted: May 21st, 2014, 10:44 am
by Corfman Clan
rocketsciguy wrote:Very cool! Glad to hear the transition has worked so smoothly. What does this mean now in terms of update frequency, API usage vs. current capacity? Do you anticipate needing to buy more API calls (dummy Premium accounts, or the contribution scheme I had above)? Will you be growing the service area again anytime soon?
Each section of the search grid should be searched once a week. When a section is searched, all the caches in that section are updated and the time the search completed is saved. This "last search" time is used to add the grid section to the grid search queue. When I mention activating portions of the grid, what I actually do is change the last search time from way in the future (12/31/9999) to a time in the past. To help spread the work out, when I activated the grid, I set each section's last search time to somewhere between one and seven days ago.
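The queueing described above can be sketched roughly as follows. This is an illustration under assumptions, not the site's actual code: the dictionary of last-search times, the function names, and the random spread are all hypothetical. Only the 12/31/9999 sentinel, the weekly interval, and the one-to-seven-day backdating come from the description.

```python
from datetime import datetime, timedelta
import random

NEVER = datetime(9999, 12, 31)       # sentinel: section is inactive
SEARCH_INTERVAL = timedelta(days=7)  # each active section is searched weekly

def activate(last_search, section_ids, now, rng=random):
    """Activate sections by backdating their last-search time 1 to 7 days."""
    for sid in section_ids:
        last_search[sid] = now - timedelta(days=rng.uniform(1, 7))

def due_sections(last_search, now):
    """Return sections last searched at least a week ago, oldest first."""
    cutoff = now - SEARCH_INTERVAL
    due = [sid for sid, t in last_search.items() if t <= cutoff]
    return sorted(due, key=lambda sid: last_search[sid])

def mark_searched(last_search, sid, now):
    """Record the completion time after a section's caches are updated."""
    last_search[sid] = now

# Example: ten sections, three of them activated.
now = datetime(2014, 5, 21)
last_search = {sid: NEVER for sid in range(10)}
activate(last_search, [2, 5, 7], now)
print(due_sections(last_search, now + timedelta(days=6)))
```

Because inactive sections carry a date far in the future, they never pass the cutoff and need no separate "active" flag. Backdating by a random one to seven days means the newly activated sections come due at different times over the following six days instead of all at once.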

In the short term, there may be a delay with some new caches being added to LonelyCache due to the transition from pocket queries to traversing the search grid. This is because the pocket queries are now turned off and all the active sections of the grid have yet to be searched. Now a new cache will be added to LonelyCache when the section of the grid it is in is searched. This means new caches may be added to LonelyCache at any time whereas before all new caches for a state were added in one fell swoop, once a week.

As far as API usage vs. current capacity, we will have dropped a small amount. This is due to a couple of factors. First, LonelyCache will find caches that are within a grid section but not within LonelyCache territory. This occurs at the boundary sections, for example the western sections in California, the northern sections of Wyoming, and the eastern sections of New Mexico. Second, LonelyCache may now search for a cache before it is due to be updated. For example, this will occur when a full update is performed for a cache, followed by the cache being searched for in the grid section. This shouldn't be a problem, though. Currently we have four accounts and each can search for 16,000 caches per day for a total of 448,000 cache searches per week. LonelyCache currently has 132,377 caches so we're well within capacity. Now that LonelyCache is no longer dependent on pocket queries we could probably move to three accounts instead of four but I'd have to verify some things first.
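For reference, the capacity arithmetic works out as below. This is a back-of-the-envelope sketch only: it assumes one search per cache per week and ignores the extra searches for out-of-territory caches and early updates mentioned above.

```python
SEARCHES_PER_ACCOUNT_PER_DAY = 16_000
CACHES = 132_377

def weekly_capacity(accounts):
    """Total cache searches available per week across all accounts."""
    return accounts * SEARCHES_PER_ACCOUNT_PER_DAY * 7

for accounts in (4, 3):
    capacity = weekly_capacity(accounts)
    print(f"{accounts} accounts: {capacity:,} searches/week, "
          f"{CACHES / capacity:.1%} used")
# 4 accounts: 448,000 searches/week, 29.5% used
# 3 accounts: 336,000 searches/week, 39.4% used
```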

I'm not sure when we may expand the LonelyCache territory more. There are requests for the DGP region of Mexico and for the southern portion of Idaho. My big concern now is the amount of resources we are using with the web hosting company. The database is increasing in size and expansion adds to that size. As I mentioned, LonelyCache currently has 132,377 caches. It also has 13,482,191 found logs and 217,572 geocachers. That's a big increase from when we went live in August 2012; off the top of my head it's something like 30% growth in caches and cachers and about 60% growth in found logs. I'll need to continue the discussion with the web hosting company on growth and server resources before any more expansion takes place.

Good questions, thanks for asking.

Dead Animals is what we'll become... Great song by The Young Evils.

Re: Expansion & Those Pesky Pocket Queries

Posted: May 22nd, 2014, 10:10 am
by rocketsciguy
Great reply; really appreciate the effort you've put into this site, and your willingness to be open about what's going on behind the scenes and under the hood, to mix metaphors. Some follow ups:
Corfman Clan wrote:In the short term, there may be a delay with some new caches being added to LonelyCache due to the transition from pocket queries to traversing the search grid. This is because the pocket queries are now turned off and all the active sections of the grid have yet to be searched. Now a new cache will be added to LonelyCache when the section of the grid it is in is searched. This means new caches may be added to LonelyCache at any time whereas before all new caches for a state were added in one fell swoop, once a week.
Would there be any value in continuing to use Pocket Queries, but only for the purpose of harvesting data on new caches? As a test, I just set up a PQ for myself to return:
  • Up to 1000 caches
  • Traditional, Earthcache, Wherigo, Multi, Letterbox, Mystery/Unknown (Virtual and Webcam are grandfathered - no new caches; other types are events or unique and not cataloged by LC)
  • Any container
  • Within AZ, CO, NV, NM, UT, WY
  • Placed during "the last week"
It returned 236 new caches in those six states. I added California and Texas, and it increased to 630 (the great majority of which are not in the LC service area). Adding Idaho and Montana (both candidates for expansion) returned 709. Don't know my Mexican states well enough, but I expect a low rate of new cache publication -- the whole nation only had 5 new caches in the last week, 2 of which are events.

This is just a sample at a point in time, but it would be reasonable to break this into different PQs per state, or multiple small/sparse states in one PQ and large/dense states in their own, and expect them not to exceed 1000 new caches per week. Our volunteer cache reviewers have finite capacity, after all. The next ET Highway mass publication will of course break that assumption, but that's a 3-sigma event -- otherwise this would work ~99.5% of the time. ;) (There might also be trouble if the "placed on" date is significantly different from the publication date. There might be brand new caches that don't show on the PQ, but the API scan will get it within a week.)

So you could set up those PQs to run daily, and LonelyCache will always have the newest caches up-to-date. I think there is some value in doing that. If a random cache has 20 finds in its first week, its future as a low CP cache is foreordained. But if it goes a week with one or zero finds, it might be destined to be a high point lonely cache, and I might be more interested in going out of my way to get FTF or STF on that one. I would be very interested in that kind of information, but it requires sufficient and consistent data. Just an idea. :D
Corfman Clan wrote:I'm not sure when we may expand the LonelyCache territory more. There are requests for the DGP region of Mexico and for the southern portion of Idaho. My big concern now is the amount of resources we are using with the web hosting company. The database is increasing in size and expansion adds to that size. As I mentioned, LonelyCache currently has 132,377 caches. It also has 13,482,191 found logs and 217,572 geocachers. That's a big increase from when we went live in August 2012; off the top of my head it's something like 30% growth in caches and cachers and about 60% growth in found logs. I'll need to continue the discussion with the web hosting company on growth and server resources before any more expansion takes place.
Hosting is a real expense, and it is more likely to grow rather than shrink with time. Please let us know how we can defray your expenses (merchandise is great; I'd like to get some, but I'd send a donation too if I knew you had a fundraising goal), even if it's "Hey guys and gals, we need some more clicks on banner ads, or maybe use our Amazon or [insert sponsoring caching supply store here].com referral link and buy something to earn LonelyCache some extra cash [pun intended]." Whatever is within your limits as agreed with Groundspeak to keep the site non-commercial and permit use of the API. You might not be able to or want to make money off this, but it shouldn't be a money pit either. You already contribute so much time, let some of us help how we can too.

Thanks for all you do! Keep up the great work!

Re: Expansion & Those Pesky Pocket Queries

Posted: May 23rd, 2014, 5:55 pm
by Corfman Clan
rocketsciguy wrote:Would there be any value in continuing to use Pocket Queries, but only for the purpose of harvesting data on new caches? As a test, I just set up a PQ for myself to return:
  • Up to 1000 caches
  • Traditional, Earthcache, Wherigo, Multi, Letterbox, Mystery/Unknown (Virtual and Webcam are grandfathered - no new caches; other types are events or unique and not cataloged by LC)
  • Any container
  • Within AZ, CO, NV, NM, UT, WY
  • Placed during "the last week"
It returned 236 new caches in those six states. I added California and Texas, and it increased to 630 (the great majority of which are not in the LC service area). Adding Idaho and Montana (both candidates for expansion) returned 709. Don't know my Mexican states well enough, but I expect a low rate of new cache publication -- the whole nation only had 5 new caches in the last week, 2 of which are events.
I hadn't thought about doing that. I think it's a good idea and I set up three pocket queries as you describe: one for AZ & CO, one for NV & NM, and one for UT & WY. Each is set to run every day of the week. Code changes would need to be made to do this for CA and TX, as any cache returned in a PQ will be added to LonelyCache; there is no check whether the cache is actually in LonelyCache territory or not since that is assumed from the PQ itself.
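One way the missing territory check could look is a point-in-polygon filter applied to each PQ result before it is added. This is only a sketch: the post does not say how region boundaries are stored, so the polygon representation, the dictionary keys, and the function names are all assumptions, and the rectangle in the example is made up rather than a real region outline.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude where the edge crosses that latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside

def filter_to_territory(caches, polygon):
    """Keep only caches whose coordinates fall inside the polygon."""
    return [c for c in caches if point_in_polygon(c["lat"], c["lon"], polygon)]

# Made-up rectangle and cache codes, for illustration only.
region = [(30.0, -105.0), (30.0, -102.0), (32.0, -102.0), (32.0, -105.0)]
pq_results = [
    {"code": "GC_EXAMPLE_1", "lat": 31.0, "lon": -103.0},  # inside
    {"code": "GC_EXAMPLE_2", "lat": 33.0, "lon": -103.0},  # outside
]
print([c["code"] for c in filter_to_territory(pq_results, region)])
```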

Thanks for the suggestion.