I think this site gets hit by more robots than humans. The indexing going on is insane. Does the googlebot ever sleep? Or play with its friends? My god that thing needs a day off.
- jim 9-28-2000 4:06 pm

What do you mean?
- jimlouis 9-28-2000 9:02 pm


Search engines (and other concerns cataloging the net) write programs that "travel" around the web recording certain bits of information. One way they can travel is by following links found on the pages they are searching. Pretty simple program, really. Start here on page A; make note of the URL, the title, any information in the hidden meta tags; maybe remember the first 100 words or so, or all the words marked as headlines in the HTML; plus note the URLs of any links on the page. Report back to home base. Then start over again, using one of the links you found on page A. Even better, have the program (called a "spider" or a "robot") make a copy of itself (called "forking") every time it finds a URL, and have the copy immediately follow the link. Soon you will have crawled (oh yeah, they are also called "crawlers") all over the net. Or at least all over the parts of the net that are somehow connected (via hyperlinks) to each other. There's a rough sketch of one at the bottom of this comment.

Google (the search engine) is constantly crawling the net. That's how they know which pages have information on whatever it is you gave it to search for (i.e., it doesn't go out and search the net when you use google; it's already searched the net, and it's replying from its own store of knowledge, its own model of the semantics of the web.) They'd quickly be useless without current information, so they are always gathering. And it's not just google doing this.

I can see in the referer logs (a file the server generates, adding one line each time someone hits the site) where individual hits come from, as well as what browser and what computer platform was being used. Since these robots are not traditional browsers, they usually show up with names like metacrawler, or websweep, or (for google) googlebot, or (most disturbingly - I saw this one, really) email_siphon. They usually supply a URL to a page with an explanation of what they are doing. Constantly looking through your referer logs is a sickness that infects some bloggers. Assuming someone has followed a link to your page, you can see the URL of the link, and this is how people figure out who is linking to them (as well as what robots are searching the site.)

It is traditional (but not mandatory) for robots to be built in such a way that the first file they look for on a new site is the robots.txt file. If they find it, they read it; in that file the webmaster can list any pages or directories they do not want indexed (example below.) Obviously the robots take this advice on an honor system.

And that's robots (in this context at least.) By the way, one of the first programs designed to do this was poorly designed; it ended up taking down each site it visited, becoming something like the first large-scale web virus in internet history. I'll try to find the link for that - good story.
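
To make that concrete, here's a rough sketch of a crawler in Python. This is just my illustration, not how googlebot actually works - the names (PageScanner, crawl, toy-crawler) are made up, and a real crawler would also check robots.txt, wait politely between requests, and record a lot more than titles and links:

    # A toy crawler: fetch a page, note its title, follow its links.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import Request, urlopen

    class PageScanner(HTMLParser):
        """Collects the <title> text and every href on one page."""
        def __init__(self):
            super().__init__()
            self.title = ""
            self.links = []
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            if tag == "title":
                self._in_title = True
            elif tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, data):
            if self._in_title:
                self.title += data

    def crawl(start_url, max_pages=10):
        """Breadth-first crawl: record each page, then follow its links."""
        queue, seen = [start_url], set()
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                req = Request(url, headers={"User-Agent": "toy-crawler"})
                html = urlopen(req, timeout=10).read().decode("utf-8", "replace")
            except Exception:
                continue  # dead link, server error, whatever - skip it
            scanner = PageScanner()
            scanner.feed(html)
            print(url, "--", scanner.title.strip())  # "report back to home base"
            # queue up every link we found (resolving relative URLs)
            queue.extend(urljoin(url, link) for link in scanner.links)

    crawl("http://example.com/")

Point it at a real starting URL and it prints each page's address and title as it goes, then moves on to every link it found - that's the whole trick.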
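As for the referer logs, here's a single made-up line in the standard Apache-style combined format - address, timestamp, request, status code, bytes sent, then the referring URL and the visitor's user-agent string, which is where the robot names show up:

    127.0.0.1 - - [29/Sep/2000:02:10:00 -0400] "GET /index.html HTTP/1.0" 200 5120 "http://monkeyfist.com/" "Googlebot/2.0 (+http://www.googlebot.com/bot.html)"

A regular browser would report something like "Mozilla/4.7" in that last field; the robots announce themselves instead.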
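And here's what a robots.txt might look like (the directory names are invented for the example) - the first block applies to all robots, the second tells one particular robot to go away entirely:

    User-agent: *
    Disallow: /private/
    Disallow: /drafts/

    User-agent: email_siphon
    Disallow: /

If you're writing a well-behaved robot in Python, the standard library will even read the file for you:

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()
    print(rp.can_fetch("toy-crawler", "http://example.com/private/page.html"))  # False

Honor system, like I said - nothing actually stops a rude robot from ignoring it.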
- jim 9-29-2000 2:10 am


Cool.
- jimlouis 9-29-2000 5:18 am


Kendall (from monkeyfist) has noticed the storm of googlebot activity too.
- jim 10-04-2000 3:00 pm


Now Dave Winer has commented too. Wow, I was actually out in front of this one.
- jim 10-06-2000 5:47 pm




