July 8, 2005

Engine Listing Delays - Playing in Google's Sandbox

Playing in Googlebot's Sandbox with Slurp, Teoma,
& MSNbot
By Mike Valentine

There has been endless webmaster speculation and
worry about the so-called "Google Sandbox" - the
indexing time delay for new domain names - rumored
to last for at least 45 days from the date of
first "discovery" by Googlebot. This recognized
listing delay came to be called the "Google
Sandbox effect."

Ruminations on the algorithmic elements of this
sandbox time delay have ranged widely since the
indexing delay was first noticed in spring of
2004. Some believe it to be an issue of one single
element of good search engine optimization such as
linking campaigns. Link building has been the
focus of most discussion, but others have focused
on the possibility of size of a new site or
internal linking structure or just specific time
delays as most relevant algorithmic elements.

Rather than contribute to this speculation and
further muddy the Sandbox, we'll be looking at a
case study of a site on a new domain name,
established May 11, 2005 and the specific site
structure, submissions activity, external and
internal linking. We'll see how this plays out in
search engine spider activity vs. indexing dates
at the top four search engines.

Ready? We'll give dates and crawler action in
daily lists and see how this all plays out on this
single new site over time.

* May 11, 2005 - Basic text on large site posted on
newly purchased domain name, going live by day's
end. Search-friendly structure implemented with
text linking, making full discovery of all content
possible by robots. Home page updated, with 10 new
text content pages added daily. Submitted site at
Google's "Add URL" submission page.

* May 12 - 14 - No visits by Slurp, MSNbot, Teoma
or Google. (Slurp is Yahoo's spider and Teoma is
from Ask Jeeves.) Posted link on WebSite101 to new
domain at Publish101.com.

* May 15 - Googlebot arrives and eagerly crawls
245 pages on new domain after looking for, but not
finding the robots.txt file. Oooops! Gotta add
that robots.txt file!

* May 16 - Googlebot returns for 5 more pages and
stops. Slurp greedily gobbles 1480 pages and 1892
bad links! Those bad links were caused by our
email masking, meant to keep out bad bots. How
ironic that Slurp likes these.

* May 17 - Slurp finds 1409 more masking links &
only 209 new content pages. MSNbot visits for the
first time and asks for robots.txt 75 times during
the day, but leaves when it finds that file
missing! Finally get around to adding robots.txt by
day's end to stop Slurp crawling email masking links
and let MSNbot know it's safe to come in!

* May 23 - Teoma spider shows up for the first
time and crawls 93 pages. Site gets slammed by
BecomeBot, a spider that hits a page every 5 to 7
seconds and strains our resources with 2409 rapid
fire requests for pages. Added BecomeBot to
robots.txt exclusion list to keep 'em out.

* May 24 - MSNbot has stopped showing up for a
week since finding the robots.txt file missing.
Slurp is showing up every few hours looking at
robots.txt and leaving again without crawling
anything now that it is excluded from the email
masking links. BecomeBot appears to be honoring
the robots.txt exclusion but asks for that file
109 times during the day. Teoma crawls 139 more
pages.

* May 25 - We realize that we need to re-allocate
server resources and rework the database design.
This requires changes to URLs, which means all
previously crawled pages are now bad links!
Implement subdomains and wonder what now? Slurp
shows up and finds thousands of new email masking
links, as the robots.txt was not moved to the new
directory structures. Spiders are getting error
pages upon new visits. Scrambling to put out fires
after wide-ranging changes to the site, we miss this
for a week. Spider action is spotty for 10 days
until we fix robots.txt.

* June 4 - Teoma returns and crawls 590 pages! No
others.

* June 5 - Teoma returns and crawls 1902 pages! No
others.

* June 6 - Teoma returns and crawls 290 pages. No
others.

* June 7 - Teoma returns and crawls 471 pages. No
others.

* June 8-14 - Odd spider behavior, looking at
robots.txt only.

* June 15 - Slurp gets thirsty, gulps 1396 pages!
No others.

* June 16 - Slurp still thirsty, gulps 1379 pages!
No others.
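
Daily counts like the ones listed above can be tallied straight from a
server access log. What follows is only a minimal sketch, assuming the
Apache "combined" log format. The function name, the crawler tokens and
the sample log lines are all made up for illustration and are not taken
from the Publish101.com logs.

```php
<?php
// Sketch: count requests per crawler per day from access-log lines.
// Assumes Apache "combined" format; the crawler tokens are assumptions.

function tallyCrawlerHits(array $lines, array $crawlers): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture the date from [15/May/2005:06:25:14 -0700] and the
        // last quoted field on the line, which holds the User-Agent.
        $pattern = '/\[(\d{2}\/\w{3}\/\d{4}):[^\]]*\].*"([^"]*)"\s*$/';
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines that do not look like log entries
        }
        $day = $m[1];
        $agent = $m[2];
        foreach ($crawlers as $name => $token) {
            // Case-insensitive substring match on the User-Agent.
            if (stripos($agent, $token) !== false) {
                $counts[$day][$name] = ($counts[$day][$name] ?? 0) + 1;
                break;
            }
        }
    }
    return $counts;
}

// Display name => User-Agent token (illustrative, not authoritative).
$crawlers = [
    'Googlebot' => 'googlebot',
    'Slurp'     => 'slurp',
    'MSNbot'    => 'msnbot',
    'Teoma'     => 'teoma',
    'BecomeBot' => 'becomebot',
];

// Invented sample lines, for demonstration only.
$sample = [
    '192.0.2.10 - - [15/May/2005:06:25:14 -0700] "GET /robots.txt HTTP/1.0" 404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [15/May/2005:06:25:20 -0700] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [16/May/2005:09:10:00 -0700] "GET /page1.html HTTP/1.0" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.30 - - [16/May/2005:09:12:00 -0700] "GET /page1.html HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
];

$hits = tallyCrawlerHits($sample, $crawlers);
print_r($hits);
```

Ordinary browser requests are ignored, so only spider traffic shows up
in the totals. To use it on a real log, something like
file('access.log', FILE_IGNORE_NEW_LINES) could supply the lines.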

So we'll take a break here at the five-week point
and take note of the very different behavior of
the top crawlers. Googlebot visits on two
consecutive days, looks at a substantial number of
pages, but doesn't return for over a month. Slurp
finds bad links and seems addicted to them, as it
stops crawling good pages until it is told to lay
off the bad liquor (er, that is, links) by a
robots.txt that slaps Slurp to its senses. MSNbot
visits looking for that robots.txt and won't crawl
any pages until told what NOT to do by the
robots.txt file. Teoma just crawls like crazy,
takes breaks, then comes back for more.

This behavior may imitate the differing
personalities of the software engineers who
designed them. Teoma is tenacious and hard
working. MSNbot is timid, needs instruction and
some reassurance it is doing the right thing, and
picks up pages slowly and carefully. Slurp has an
addictive personality and performs erratically on
a random schedule. Googlebot takes a good long
look and leaves. Who knows whether it will be back
and when.

Now let's look at indexing by each engine. As of
this writing on July 7, each engine shows
differing indexing behavior as well. Google shows
no pages indexed although it crawled 250 pages
nearly two months ago. Yahoo has three pages
indexed in a clear aging routine that doesn't list
any of the nearly 8,000 pages it has crawled to
date (not all itemized above.) MSN has 187 pages
indexed while crawling fewer pages than any of the
others. Ask Jeeves has crawled more pages to date
than any search engine, yet has not indexed a
single page.

Each of the engines will show the number of pages
indexed if you use the query operator
"site:publish101.com" without the quotes. MSN 187
pages, Ask none, Yahoo 3 pages, Google none.

The daily activity not listed in the three weeks
since June 16 above has not varied dramatically,
with Teoma crawling a bit more than other engines,
Slurp erratically up and down and MSN slowly
gathering 30 to 50 pages daily. Google is absent.

The linking campaign has been minimal, with posts to
discussion lists, a couple of articles and some
blog activity. Looking back over this time, it is
apparent that a listing delay is actually quite
sensible from the view of the search engines. Our
site restructuring and bobbled robots.txt
implementation seem to have abruptly stalled
crawling, but the indexing behavior of each engine
displays a distinctly different policy by each major
player.

The sandbox is apparently not just Google's
playground, but it is certainly tiresome after
nearly two months. I think I'd like to leave for
home, have some lunch and take a nap now.

Back to class before we leave for the day, kiddies.
What did we learn today? Watch early crawler
activity and be certain to implement robots.txt
early and adjust often for bad bots. Oh yes, and
the sandbox belongs to all search engines.
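
One way to avoid repeating our mistake of leaving robots.txt behind
during a restructure is to build the file from a single function and
write the same output to every host or subdomain. This is just a
sketch under assumptions: the function name is made up, and the
"/mailmask/" path is a hypothetical stand-in for wherever the email
masking links live. Only the BecomeBot name comes from the case study.

```php
<?php
// Sketch: build one robots.txt body to reuse on every host/subdomain.
// "/mailmask/" is a hypothetical path, not the site's real structure.

function buildRobotsTxt(array $blockedBots, array $disallowedPaths): string
{
    $lines = [];

    // Shut unwanted crawlers out of the whole site.
    foreach ($blockedBots as $bot) {
        $lines[] = "User-agent: $bot";
        $lines[] = "Disallow: /";
        $lines[] = "";
    }

    // Every other crawler may fetch anything except the listed paths.
    $lines[] = "User-agent: *";
    foreach ($disallowedPaths as $path) {
        $lines[] = "Disallow: $path";
    }
    if (!$disallowedPaths) {
        // An empty Disallow line means nothing is off limits.
        $lines[] = "Disallow:";
    }

    return implode("\n", $lines) . "\n";
}

$robots = buildRobotsTxt(['BecomeBot'], ['/mailmask/']);
echo $robots;
```

Saving that string as robots.txt in the document root of each
subdomain keeps the rules in step when URLs change. Remember that
well-behaved spiders honor the file voluntarily; truly bad bots may
ignore it.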

Mike Banks Valentine is a search engine
optimization specialist who operates http://WebSite101.com
and will continue reports of case study
chronicling search indexing of http://Publish101.com

----------------------------------------------------
Copyright © 2005 Mike Valentine

Bob

March 27, 2005

Add This PHP Code To Email Search Engine Crawler Alerts

Do you want to be automatically informed by email when Google's spider visits your web site?
A search engine spider is an automated software program that locates and collects data from web pages for inclusion in a search engine's database. The name of Google's spider is "Googlebot".

If you have a web site that allows you to use PHP code then your web pages can inform you when Google's spider has visited them.

Ask your web space provider if you can use PHP code on your site. If you can, add the following piece of code at the very beginning of your web page HTML code, before the doctype declaration and before the <html> tag:


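<?php
// Address that receives the crawler alert.
$email = "you@example.com";

// Case-insensitive match on the User-Agent header. stripos() is used
// here because eregi() was removed in PHP 7.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($agent, "googlebot") !== false)
{
    mail($email, "The Googlebot came to call",
        "Google has visited: " . $_SERVER['REQUEST_URI']);
}
?>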

(Source: V. M. Marshalls weblog)
This little piece of PHP code recognizes Googlebot if it visits the web page, and it informs you by email when Googlebot has been there.

Of course, you have to replace "you@example.com" with your own email address. The web page's file name must end with .php, and you must be allowed to use PHP on your web site.

Note that a visit of the Googlebot doesn't mean that Google will index your web site. Google will decide later if the visited page is suitable for its database.

You can also use this code with other search engine spiders. To be informed by email when Yahoo's spider visits your web pages, replace "googlebot" in the example with "yahoo! Slurp":


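<?php
// Address that receives the crawler alert.
$email = "you@example.com";

// Case-insensitive match on the User-Agent header. stripos() is used
// here because eregi() was removed in PHP 7.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($agent, "yahoo! Slurp") !== false)
{
    mail($email, "The Yahoo bot came to call",
        "Yahoo has visited: " . $_SERVER['REQUEST_URI']);
}
?>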


You can also do this with other search engine spiders. You can get a list of search engine spider names here.
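
Rather than pasting one copy of the code per spider, a single lookup
can cover several at once. The following is a rough sketch, not part
of the original tip: the detectSpider() function is a made-up helper,
and the User-Agent tokens in its list are assumptions that should be
checked against each search engine's own documentation.

```php
<?php
// Sketch: map a User-Agent string to a spider name, or null for
// ordinary visitors. The tokens below are assumptions.

function detectSpider(string $userAgent): ?string
{
    // Display name => token expected somewhere in the User-Agent.
    $spiders = [
        'Googlebot'    => 'googlebot',
        'Yahoo! Slurp' => 'yahoo! slurp',
        'MSNbot'       => 'msnbot',
        'Teoma'        => 'teoma',
    ];
    foreach ($spiders as $name => $token) {
        // Case-insensitive substring match.
        if (stripos($userAgent, $token) !== false) {
            return $name;
        }
    }
    return null;
}

$email = "you@example.com";
$spider = detectSpider($_SERVER['HTTP_USER_AGENT'] ?? '');
if ($spider !== null) {
    // Same alert as before, with the spider name filled in.
    mail($email, "$spider came to call",
        "$spider has visited: " . ($_SERVER['REQUEST_URI'] ?? ''));
}
```

Keep in mind that a busy spider can request hundreds of pages in a
day, and this code sends one email per request, so it may be wise to
place it on only a few key pages.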

To invite Google, Yahoo and other major search engines to visit your web site, you have to submit your web site to these search engines. You can either do this manually or you can use IBP to submit your web site to all important search engines.

IBP not only helps you to submit your web site to search engines, it also helps you to optimize your pages so that search engines rank your web pages higher.

"Copyright by Axandra.com. Web site promotion software."

Bob

April 19, 2004

Articles To Live By For Site Design and Promotion

If you take the time to go through this group of articles and absorb the details, you'll have a very thorough understanding of web site design and promotion.

iSiteBuild Articles

Bob

April 15, 2004

Can You Control Your Google "Snippets"? - Guest Article

Jill Whalen offers one of her readers some answers about how to "manage" your pages to control your descriptions on Google search results pages or whether you should even try to...

A Reader Asks Jill:
Now that Google has stopped using [DMOZ descriptions] it is even more important which fragments of your copy Google chooses to display. The Webmaster fully controls the title tag and the URL, but those lines in between are a key factor when a potential customer decides whether to visit your site or the one below it.

Searching for "Search Engine Optimization," I noticed that Google chose to display the description from your index page (highrankings.com), whereas other sites have just fragments of their copy displayed.

I assume lots of your readers would like to know more about how to influence Google's description, so maybe this would be a topic for an upcoming newsletter. Questions I would like answered include:

- How or under what circumstances will Google choose to display the description instead of just fragments of the copy?
- Is Google more likely to choose the description from pages/sites with a high PR (credibility)?
- If Google chooses to display fragments of the copy would it be wise to alter those parts of the copy (slightly) to make them look more appealing in the SERPs? I realize that you shouldn't alter the keywords, but the words around them might not be as appealing as one could wish. Google mostly does a good job at selecting relevant parts of your copy, but still!

Yours sincerely,

Henrik Ranch

++Jill's Response++

Henrik, those are all great questions, and I'm sure others will also be interested in the answers.

First, it's important to note that Google will display a different description in its results pages, depending on the keyword phrase that was searched for. Because of this, it's impossible to completely control what will show up as you have no way of knowing every single keyword phrase that someone might use to find your site. The number of different phrases people find your site with is often quite surprising, which makes it a good idea to check your server logs regularly.

So let's look at what circumstances cause Google to display your Meta description as opposed to a plain old "snippet."

It's actually surprisingly simple, yet elegant:

If the keyword phrase searched upon by the user is included in your Meta description tag, then this is what will show up in Google, instead of the usual snippet.

This is why when you find my site on the first page of Google for the phrase search engine optimization, my Meta description shows up -- that phrase is used within my description tag. As long as the *exact phrase* being searched upon is contained in your Meta description tag, it will show up in the search results.

It's as simple as that, and doesn't have anything to do with the PageRank or the authority of the page in question. The same thing should happen for every site under the same circumstances. This is why Meta description tags are still important to use on all of your pages. They don't actually boost your rankings, but they allow you to have better control of the description that shows up for your optimized keyword phrases.

Does this mean you should create really long Meta descriptions that have every conceivable keyword phrase in them? Of course not! If you have a long Meta description, Google will then just show a snippet of your Meta description that contains the searched-upon phrase. Theoretically, you could create a 10- or 20-sentence Meta description tag that used every conceivable keyword phrase you might be ranked highly with, but you wouldn't get any real benefit from that.

So what happens when the searched-upon phrase *isn't* used within your Meta description tag?

You'll notice that if you do a search for search engine rankings in Google, you'll still find my site in the first page of results, but this time my Meta description tag *doesn't* show up. Instead, you'll see the following snippet: "When performed by a qualified, experienced search engine optimization consultant, optimizing your site for high search engine rankings really does work!"

It's not my Meta description, but it's a darn nice marketing statement nonetheless. So, how'd I do it? Well, I waved my magic wand and said "ala peanut butter sandwiches" and poof, there it was! Don't worry, for those of you who don't have magical powers, you can achieve the same thing through other means. In reality it showed up as it did simply because that sentence was the first (and in this case the only) instance of the phrase "search engine rankings" within the copy of my page. Pretty neat, huh?

I didn't purposely optimize my snippets as I'm really not smart enough to think of such things, but there's no reason why you can't or shouldn't optimize yours. There are definitely things you can do to get the most benefit out of your Google listings, once you realize how it works.

The best (and most obvious) thing to do is to make sure your two or three optimized keyword phrases are being used in your Meta description tag. However, if there's a phrase you happen to be getting found for that you can't or don't want to use in your Meta description tag, then just make sure that the *first instance* of that phrase is part of a nice call-to-action marketing statement within your copy, and voila -- that's pretty much all it takes!
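
For those who like to see a rule written out, here is a small sketch of
the behavior described above. It is only a model of what this article
observes, not Google's actual algorithm: predictSnippet() is a made-up
function, the Meta description and opening and closing sentences in the
sample are invented, and it skips the case where a long Meta
description gets trimmed.

```php
<?php
// Sketch of the snippet rule described in the article.
// This models the observed behavior only; it is not Google's code.

function predictSnippet(string $query, string $metaDescription, string $bodyCopy): ?string
{
    // Rule 1: the exact phrase appears in the Meta description,
    // so the description is what gets displayed.
    if (stripos($metaDescription, $query) !== false) {
        return $metaDescription;
    }

    // Rule 2: otherwise, use the sentence that holds the first
    // instance of the phrase in the page copy.
    $sentences = preg_split('/(?<=[.!?])\s+/', $bodyCopy, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($sentences as $sentence) {
        if (stripos($sentence, $query) !== false) {
            return trim($sentence);
        }
    }

    // The phrase appears nowhere on the page.
    return null;
}

// Hypothetical page data for demonstration.
$meta = "Search engine optimization tips and advice.";
$copy = "Welcome to our site. When performed by a qualified, experienced "
      . "search engine optimization consultant, optimizing your site for "
      . "high search engine rankings really does work! Contact us today.";

$fromMeta = predictSnippet("search engine optimization", $meta, $copy);
$fromCopy = predictSnippet("search engine rankings", $meta, $copy);
$nothing  = predictSnippet("link building", $meta, $copy);
```

Running your own description and copy through a check like this shows
which sentence would need to carry the marketing message for each
phrase you care about.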

On a related note, if you happen to check to see if your page is indexed by Google, and search for it by company name or by the site's URL, you'll often find that this brings up the copyright notice at the bottom of the page, or some other strange, ugly snippet in the search engine results page (SERP).

I've had frantic emails from people concerned about this, but there's really no reason to be worried. Your site's description only shows up that way for that particular search, not for the real searches people do to find your site. Since the copyright notice at the bottom of your page is often the only place that mentions your company name or URL, it becomes the Google snippet by default!

If you're not a big brand that people are searching for by company name, then you don't need to worry about it. However, if people do indeed search for you by name, then simply control the snippet by making sure you put the company name in your Meta description or in a nice marketing sentence closer to the top of the page.

So you can see, although those descriptions at Google never seemed to have any rhyme or reason to them, they actually do! You'll find that the more you learn about search engine marketing and optimization, the less mystifying it becomes. Most of the time, the things that seem really strange on the surface are actually very logical.

Jill Whalen

Visit the High Rankings Advisor website at http://www.highrankings.com/advisor.htm
and don't forget to check out the High Rankings Search Engine Optimization Forum at:
http://www.highrankings.com/forum

Bob

Copyright ©HelpYouToSell.com  All Rights Reserved.
Linking Allowed, Publishing Allowed With Byline Of:
Bob Turpen
Helping You To Sell Online