This SEO test is part of a series I want to write on the effects of the sitemap protocol and whether it really works. Hopefully by the end we will all better understand sitemaps and the core principles behind XML sitemaps. Stay tuned for more sitemap-related posts and the tools used.
I am going to run an experiment testing sitemap.xml and hanging pages, meaning pages that no other page links to. This will not be a flawless test, but it will help me prove my assumptions.
I have been telling clients that if you add a URL to an XML sitemap, it will be discovered. Why do I say this? Often, for a variety of reasons, search engines have a hard time finding pages inside dynamic content. This test is not about accessibility or how content is delivered. The SEO experiment is only about sitemap files.
The test is limited to hard-coding the URL into an XML sitemap and seeing if it can be found. My assumption is not meant as a solution to linking or standard URL structure.
We already know that Google, Yahoo, and MSN will index URLs similar to http://www.domain.com/?=123. We also know that pages with session IDs are usually not indexed because of the session variables. What I am trying to prove is that a URL listed in an XML sitemap will be found by Google, Yahoo, and MSN without backlinks, when it is submitted only through each search engine’s webmaster tools.
I am going to run this test by creating a unique page that is not linked from any other page and adding it to its very own XML sitemap to be discovered. Once I have created the pages, I will add the sitemap to the webmaster tools section of each search engine. I am also going to create another page and add it to a urllist.txt file (an old protocol used by Yahoo). The XML sitemap will contain only one free-floating page, and I will see how long it takes each search engine to index it. The urllist.txt file will list the other page, and I will see whether it gets indexed.
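For readers who want to try a similar setup, here is a minimal sketch in Python of what the two files could look like. It is not the exact method used in this experiment. The page URLs are made-up placeholders, the function names are my own, and the urllist.txt format is assumed to be plain text with one URL per line.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return the text of an XML sitemap listing the given URLs."""
    # Register the namespace as the default so tags are written without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def build_urllist(urls):
    """Return the text of a urllist.txt file: one URL per line (assumed format)."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    # Hypothetical unlinked pages, one per file, mirroring the setup above.
    sitemap_page = "http://www.domain.com/sitemap-test-page.html"
    urllist_page = "http://www.domain.com/urllist-test-page.html"

    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(build_sitemap([sitemap_page]))
    with open("urllist.txt", "w", encoding="utf-8") as f:
        f.write(build_urllist([urllist_page]))
```

Running the script writes sitemap.xml and urllist.txt to the current directory. Each file lists a different page, so whichever page shows up in an index can be traced back to the file that announced it.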
Assumptions/problems/and forecast – Hypothesis
The page listed in the XML sitemap file will be picked up and cached by Yahoo much faster than by Google. The urllist.txt file will probably not be found by MSN, which will be one problem. The other problem with MSN is that they don’t yet have a way to add multiple sitemap files through their portal, even though they read robots.txt. The page from the urllist.txt file will be picked up first by Yahoo, and Google will follow.
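Since robots.txt comes up here, it is worth noting that the sitemaps.org protocol allows sitemap locations to be announced with `Sitemap:` lines in robots.txt. The sketch below shows how such a file could be generated when a portal does not accept more than one sitemap. It is only an illustration: the sitemap URLs and the function name are hypothetical, it is not part of this experiment, and it makes no claim about how any particular search engine handles these lines.

```python
def build_robots_txt(sitemap_urls):
    """Return robots.txt text that allows all crawling and lists each sitemap."""
    lines = [
        "User-agent: *",
        "Disallow:",  # An empty Disallow value blocks nothing.
        "",
    ]
    # One "Sitemap:" line per sitemap file, each with a full URL.
    lines.extend(f"Sitemap: {url}" for url in sitemap_urls)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sitemap locations.
    print(build_robots_txt([
        "http://www.domain.com/sitemap.xml",
        "http://www.domain.com/sitemap-test.xml",
    ]), end="")
```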
2 HTML pages
1 sitemap.xml file
1 urllist.txt file
To be continued
Because I don’t want to spoil the experiment, I won’t give the specific locations of the files or exactly how I am going to do it. The experiment could take 1 day or 2 weeks. For now, we will see. Stay tuned for the results and summary. More experiments are coming.