When I was trying out different CMSs for Webmaster Woman last summer (none of which is being used), I had to nuke one (Drupal) because it was throwing out Session IDs. It seems that’s now creating a problem with Yahoo: they’re usually right on it with picking up new sites and pages and including them in the index, but while the new site is getting hit by the crawler, all there is in the index is two “pages” - one of which doesn’t even exist.

I believe it may be a duplicate content issue: during the test install there was a folder called /search, and Yahoo is now showing that URL with the SessionID (which the crawler hits constantly) - with a duplicate of what’s on the site’s current temporary homepage.

I’m not sure whether the trailing slash (/search/ versus /search) will make a difference, but I’m putting up another temporary page in a directory I’m creating at

http://www.webmasterwoman.com/search/

That URL currently returns a 404, but the homepage is also being picked up with a SessionID appended, and this one returns a 200:

http://www.webmasterwoman.com/?PHPSESSID=7f5131f335eb7c2787a98392a04131d3
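One common way to collapse SessionID URLs like that back into a single address is to strip the session parameter and send a 301 redirect to the clean URL. Here is a minimal sketch in Python of just the stripping step. It is not what the site actually runs: the function name `strip_session_id` and the `SESSION_PARAMS` set are made up for illustration, and it assumes the only session parameter in play is `PHPSESSID`.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: PHPSESSID is the only session parameter that needs removing.
SESSION_PARAMS = {"PHPSESSID"}


def strip_session_id(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the session ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


clean = strip_session_id(
    "http://www.webmasterwoman.com/?PHPSESSID=7f5131f335eb7c2787a98392a04131d3"
)
print(clean)
```

A server-side handler could compare the requested URL with the stripped one and, if they differ, redirect to the stripped version so the crawler only ever ends up on one address for the homepage.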

I’ll see whether it gets picked up with whatever new content is put on it for the time being, and whether that gets rid of the Session ID page being hit daily - a page which doesn’t even exist; Slurp is just still looking for it.

The site was totally crawled in the past day, which is par for the course with Yahoo; they’re not asleep on the job. Not that it matters much right now, because the site isn’t near ready for launch, but it’s a challenge to deal with this issue, which I’ve never had to deal with before.
