IBM and AltaVista have teamed up on a web mapping project. One of the more cartoon-like results is featured as today's picture. Here's a short explanation from the IBM site:

Our analysis reveals an interesting picture (Figure 9) of the web's macroscopic structure. Most (over 90%) of the approximately 203 million nodes in our
crawl form a single connected component if hyperlinks are treated as undirected edges. This connected web breaks naturally into four pieces. The first
piece is a central core, all of whose pages can reach one another along directed hyperlinks -- this "giant strongly connected component" (SCC) is at the
heart of the web. The second and third pieces are called IN and OUT. IN consists of pages that can reach the SCC, but cannot be reached from it -
possibly new sites that people have not yet discovered and linked to. OUT consists of pages that are accessible from the SCC, but do not link back to it,
such as corporate websites that contain only internal links. Finally, the TENDRILS contain pages that cannot reach the SCC, and cannot be reached from
the SCC. Perhaps the most surprising fact is that the size of the SCC is relatively small -- it comprises about 56M pages. Each of the other three sets
contain about 44M pages -- thus, all four sets have roughly the same size.
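
For anyone who wants to play with the idea, here is a minimal sketch of how the four pieces described above could be picked out of a small link graph. It is not the method the researchers used on their 203 million page crawl; it is a toy that assumes you already know one page inside the core (the hypothetical `core_page` argument) and uses plain breadth-first search. The function names, the `EDGES` list and the page labels are all made up for illustration.

```python
from collections import defaultdict, deque

def reachable(adj, start):
    """Return every node reachable from start by following adj (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(edges, core_page):
    """Split a directed link graph into SCC, IN, OUT and TENDRILS.

    Assumption: core_page is known to sit inside the giant SCC.
    """
    fwd, rev, undirected = defaultdict(set), defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        fwd[src].add(dst)          # hyperlink as written
        rev[dst].add(src)          # same hyperlink, reversed
        undirected[src].add(dst)   # hyperlink treated as an undirected edge
        undirected[dst].add(src)
        nodes.update((src, dst))

    can_be_reached = reachable(fwd, core_page)   # pages the core links out to
    can_reach = reachable(rev, core_page)        # pages that can get to the core
    connected = reachable(undirected, core_page) # the single connected component

    scc = can_be_reached & can_reach
    return {
        "SCC": scc,
        "IN": can_reach - scc,
        "OUT": can_be_reached - scc,
        # Connected to the rest, but no directed path to or from the core.
        "TENDRILS": connected - can_reach - can_be_reached,
        # Pages outside the connected component altogether.
        "DISCONNECTED": nodes - connected,
    }

# Hypothetical seven-page web.
EDGES = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # a, b, c all reach one another
    ("n", "a"),                          # n links into the core
    ("c", "o"),                          # the core links out to o
    ("n", "t"),                          # t hangs off n
    ("x", "y"),                          # an island
]

parts = bow_tie(EDGES, "a")
for name, pages in parts.items():
    print(name, sorted(pages))
```

Running two searches from one core page (one along the links, one against them) is enough here because every page in the core can reach every other; finding the core in the first place on a real crawl is the harder job.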
- jim 5-16-2000 3:19 pm