Bing bot aggressively spidering LCP?
Posted: November 19th, 2012, 10:35 pm
by Corfman Clan
This morning I received an email from the admin of the company hosting LonelyCache and thought it was rather interesting:
We run an automated process to attempt to detect and block data harvesting from some of our websites. The process watches for "aggressive" web browsing activity from individual IP addresses. In general, any IP that requests more than 30 pages per minute averaged over a 5-minute period triggers an alert. In addition to monitoring our own website, this system monitors all sites we host.
For some time I've noticed that the Bing bot has been aggressively spidering your Lonely Cache site. Bing bot is often visiting your site from multiple IP addresses simultaneously requesting upwards of 100-150 pages per minute combined. What I thought was odd is that the Bing bot is aggressively spidering your site almost daily. I haven't seen this type of aggressive spidering on any other sites we host. Do you have any idea why Bing bot is so interested in the Lonely Cache Project?
What are your ideas on why Bing bot is so interested in LonelyCache?
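The alert rule quoted in the email (more than 30 pages per minute from one IP, averaged over 5 minutes) can be sketched roughly as below. This is only an illustration of the idea, not the hosting company's actual system; the `RateMonitor` class and the example IP addresses are made up for the sketch.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60      # 5-minute averaging window, from the email
MAX_PAGES_PER_MINUTE = 30    # alert threshold, from the email


class RateMonitor:
    """Hypothetical helper: tracks request times per IP over a sliding window."""

    def __init__(self):
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one page request; return True if this IP now exceeds the threshold."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop requests that have fallen out of the 5-minute window.
        while hits and hits[0] <= timestamp - WINDOW_SECONDS:
            hits.popleft()
        pages_per_minute = len(hits) / (WINDOW_SECONDS / 60)
        return pages_per_minute > MAX_PAGES_PER_MINUTE


monitor = RateMonitor()
# One request per second for 5 minutes averages 60 pages per minute.
fast = [monitor.record("203.0.113.5", t) for t in range(300)]
# One request every 3 seconds averages 20 pages per minute.
slow = [monitor.record("203.0.113.9", t) for t in range(0, 300, 3)]
print(any(fast), any(slow))
```

Note that a rule like this looks at each IP separately, so a bot spreading its requests over several addresses is harder to catch than the combined 100-150 pages per minute would suggest.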
Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 7:19 am
by skeeper
What is Bing bot?
Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 8:29 am
by Corfman Clan
the greenskeeper wrote:What is Bing bot?
Oops, I sometimes forget not everyone knows what these techie terms are.
The internet search facilities, such as Google, Yahoo, Bing, etc., have web bots (or spiders) that basically traverse (crawl) the entire World Wide Web, gathering information that allows them to return search results quickly and (hopefully) usefully. So "Bing Bot" is Bing's web bot.
Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 4:38 pm
by chris geertsen
I am not a computer expert, so I don't know a lot of these terms, nor do I know how they work.

But why is this a bad thing for the site?
Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 4:44 pm
by chris geertsen
So these bots basically help Bing, Google, etc. by giving their search pages more links to sites. Maybe because this site is so new, they're trying to gather as much information as they can so that it will show up when someone searches for it. I have been to at least one search engine where I searched for this name and it did not come up at all. There is my wisdom. Doubt it's that good.

Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 6:34 pm
by Corfman Clan
Of course I want the search engines to know about LonelyCache and include it in search results. That is a good thing. The question really is why the Bing bot is hitting LonelyCache as much as it is (way more than any other site the company is hosting).
My response to the email was:
I don’t know why the Bing bot would be spidering LonelyCache that much. Perhaps it’s because all the pages in LonelyCache tend to have a lot of hyperlinks to other LonelyCache pages. With the dynamic nature of the site and the number of geocaches & geocachers in the LonelyCache territory, there are essentially millions of pages to navigate through.
That may be the reason; I don't know. For example, a typical points leaderboard page has over 400 hyperlinks, and there are over 425,000 such pages just for the LonelyCache Wide region.
Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be

Re: Bing bot aggressively spidering LCP?
Posted: November 20th, 2012, 6:36 pm
by Corfman Clan
Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be

Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
Re: Bing bot aggressively spidering LCP?
Posted: November 21st, 2012, 1:42 pm
by Team Tierra Buena
Corfman Clan wrote:Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be

Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
Or maybe the bots have a website where they get points for visiting lonely websites!
Happy Thanksgiving, everyone!
Re: Bing bot aggressively spidering LCP?
Posted: November 21st, 2012, 6:50 pm
by rocketsciguy
Corfman Clan wrote:Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be

Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
That's funny!
I think your response to the hosting company is probably right... tons of hyperlinks on every dynamically-generated page, and every page is updated every day. Even if the content of a particular page doesn't change, the time stamp at the bottom of the page changes every update cycle, so if the Bing-Bot is doing a text-comparison of the HTML, it will find differences. Those changes probably tell the Bot to dig deeper. Blame Microsoft for having an overly aggressive, poorly designed web-crawler algorithm.
But please keep all those hyperlinks! They make the site very useful!
I think I remember from somewhere that there's a way to prevent or inhibit spiders from crawling your domain. A "policy" stored as a specially formatted 'spider.txt' file in the root directory or something like that. I bet your hosting company would be happier if the spiders only did their thing once every week or month, or not at all.
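The time stamp point can be illustrated with a small sketch. Nothing here is known about how the Bing bot really compares pages; this only shows why a plain text comparison would flag a page as changed when only its footer time stamp moved. The footer format, the sample pages, and the `fingerprint` function are invented for the example.

```python
import hashlib
import re

# Hypothetical footer format; the real LonelyCache time stamp may look different.
TIMESTAMP_RE = re.compile(r"Updated: \d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def fingerprint(html, ignore_timestamp=False):
    """Hash the page text, optionally blanking the footer time stamp first."""
    if ignore_timestamp:
        html = TIMESTAMP_RE.sub("Updated: <ignored>", html)
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


monday = "<p>Leaderboard</p><footer>Updated: 2012-11-19 03:00</footer>"
tuesday = "<p>Leaderboard</p><footer>Updated: 2012-11-20 03:00</footer>"

# A naive comparison sees two different pages.
naive_same = fingerprint(monday) == fingerprint(tuesday)
# Ignoring the time stamp shows the content is unchanged.
content_same = fingerprint(monday, True) == fingerprint(tuesday, True)
print(naive_same, content_same)
```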
Re: Bing bot aggressively spidering LCP?
Posted: November 28th, 2012, 9:35 pm
by Ranger Alpha
Does LonelyCache have a robots.txt file?
Re: Bing bot aggressively spidering LCP?
Posted: November 29th, 2012, 9:28 am
by Corfman Clan
Ranger Alpha wrote:Does LonelyCache have a robots.txt file?
No, it doesn't and at this time I see no compelling reason to add one.
- A web bot may honor a robots.txt file or completely ignore it, so its utility is limited.
- We do want the search engines to know about LonelyCache, so we don't want to direct those web bots to stay away.
- The web hosting company isn't concerned about any adverse effects (performance or otherwise) from the Bing Bot spider. The admin was mostly just curious about what might be going on with it.
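For anyone curious what such a file could look like, here is a rough sketch using Python's standard `urllib.robotparser` module to read one. The robots.txt content is purely hypothetical, since LonelyCache does not have one. `Crawl-delay` is a non-standard directive that asks a bot to slow down rather than stay away, and, as the first point above says, a bot is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; LonelyCache does not actually have this file.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Bing bot is asked to wait 10 seconds between requests...
bing_delay = parser.crawl_delay("bingbot")
# ...but it is still allowed to fetch every page.
bing_allowed = parser.can_fetch("bingbot", "http://www.lonelycache.com/")
# Other bots get no delay request at all.
other_delay = parser.crawl_delay("SomeOtherBot")
print(bing_delay, bing_allowed, other_delay)
```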
With that said, this did highlight a deficiency in our configuration, which we have since changed and which should make things better in the future. We have two domains: lonelycache.com and lonelycacheproject.com. We changed things so that http://www.lonelycache.com is our primary web site and http://www.lonelycacheproject.com has a permanent redirect to http://www.lonelycache.com. Before this was done, they appeared to the search engines as two different web sites; now they will appear as just one. This should help you not get duplicate search results.
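As a rough illustration of what a permanent redirect does (this is not the actual server configuration), the sketch below maps a request for the secondary domain to an HTTP 301 response pointing at the primary one. Keeping the page path and query in the new address is an assumption, and the `redirect_for` function and the `/leaderboard?region=1` path are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "www.lonelycache.com"
SECONDARY_HOSTS = {"lonelycacheproject.com", "www.lonelycacheproject.com"}


def redirect_for(url):
    """Return (301, new_url) for secondary-domain URLs, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.hostname in SECONDARY_HOSTS:
        # Assumption: the path and query are carried over unchanged.
        return 301, urlunsplit(parts._replace(netloc=PRIMARY_HOST))
    return None


old = redirect_for("http://www.lonelycacheproject.com/leaderboard?region=1")
new = redirect_for("http://www.lonelycache.com/leaderboard?region=1")
print(old)
print(new)
```

A 301 status tells a crawler that the move is permanent, so over time it should index only the primary address.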