rocketsciguy wrote:So, because I noticed, I have to ask: Why the one letter source tables/queries names? "r", "g", "s", "p"? Or is this some syntax with which I am unfamiliar?
I'm no SQL export so I usually run to my brother Robert for help. The problem there is he's absolutely amazing with Oracle SQL and we're using SQL Server. A lot transfers but there are differences. I also run to Redfist too since he's much better with SQL than I. So take what follows with a grain of salt.
For Select statements, there's the select part, that is, what you want to select, and the predicate (the from part, that is, what you are selecting from). If the predicate is another select, then it has to be named. So,
SELECT a FROM (SELECT c FROM d) b the "b" is giving a name to the select within the parenthesis. This is the case with the "r" and "s". I picked "r" for selecting the rank, and "s" for selecting the sum.
I'm not positive but I believe the "g" is only needed for shorthand. That is, it is a lot easier to just type g.Name than dbo.Geocacher.Name. Wow, I just noticed I used "g" to alias two different tables too, though they are scoped differently so it is not an issue.
rocketsciguy wrote:Are the webpages static HTML generated en masse with each update (hope not), or generated on demand from a set of template pages which use, e.g., PHP fields passed through the URL to query the database?
The website is using Microsoft's .NET so the pages are essentially templates with server side code that runs when they are requested.
For the LeaderBoard page, the above query is generated and then run and the results are utilized in creating the HTML for the page. Let's take a look at the above query. It is using RegionID=1. RegionID=1 is LonelyCache wide. For some other region (such as AZ, CO, etc), the RegionID would be different. The query also has Job=1 or Job=2. Job specifies what type of statistics are stored in the GeocacherPoints table. 1 is for find points and 2 is for hide points. We'll have different job value for the backcountry boards. You can also see use of the
OverallPoints column in the table. We have other columns for the different cache types too. So for the Multi-cache board, we'll replace the
OverallPoints column with the
MultiPoints column. At the bottom of the query, you'll also see
WHERE Row BETWEEN 1 AND 100. By adjusting the 1 and 100, we can, for example, allow paging through the leaderboard.
So, why did I just write all the above? Well, when the leaderboard page is requested, there will be some parameters passed in the URL that specify the region and board and the page of the board to display. The server side code will evaluate all that and modify the query as needed then execute the query. When the results come back, the server side code will then generate the HTML for the page and that is then sent by the web server to your webbrowser and displayed.
rocketsciguy wrote:
Looks like you have the point totals pre-computed and stored in tables, probably with the heavy crunching performed offline then pushed to the server, is that right?
Yes, things are set up to update the statistics once a day once things are up to date. We decided to not spend the time updating the statistics if data gathering is taking place. On the test site, that is set to 4:00 am Eastern time. Usually they're done a few minutes after 4, other times they don't run until a few hours later. Updating the statistics seems to take a few minutes of pretty intense number crunching on the database server. That time will only grow as the number of caches, cachers, and logs increase and as we add more statistics. The updates are done online at the server though. That is, all statistic updates are actually SQL update or insert queries.
rocketsciguy wrote:
It would be cool if the Live API data were fed directly into your live database, and all the supporting queries update with new data as it comes in. But I imagine that would be difficult to keep from breaking and might be "unfair" because of the round-robin way the Pocket Queries bring in data, and would likely bog down the server enormously. Just displaying this Top 10 rankings would require that every cache's points be recomputed, LCP-wide, and every cacher's total hide and find scores be summed -- looks like you're taking the sane route of doing those computations off-line.
Like I mentioned above, the statistics updates are being done online, not offline. They are only being run once a day though and only then if no data gathering is taking place. We may update a cache's points as we gather data, but we will not be updating the statistics used to generate the leaderboards. Our feeling is that updating the statistics in realtime would be less meaningful than once a day and we took that into account when we came up with the current design. One problem with this is that while the statistics are being updated, anyone browsing the leaderboards will probably see a lot of garbage and have poor response. I'm thinking we may put up some banner or disable disable the leaderboard pages while this is going on.