Now the word count total for each word is linked to the search engine, which will return the posts in which the word is used, if you're curious about checking (I used 'love' 14 times? Where?). This actually uncovered lots of weaknesses in the search engine. I've patched most of them, but I see now what a big problem it is. What's a word anyway? Is it just any string of characters delimited by spaces? What about punctuation (are 'here' and 'here,' the same word?) or HTML (are 'here' and '<i>here</i>' the same word?) It seems like they should be the same, but if so, you need to teach the system a lot more than just that a word is any string of characters delimited by spaces.
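To make the "what's a word" question concrete, here is a rough Python sketch of one possible answer. It is not the actual word_count script or search engine code; the names `tokenize` and `word_count` and both regular expressions are made up for illustration. It assumes a word is a run of letters or digits (with inner apostrophes allowed), after HTML tags are stripped and everything is lowercased.

```python
import re
from collections import Counter
from html import unescape

# Assumption: anything between < and > is markup, not text.
TAG_RE = re.compile(r"<[^>]+>")
# Assumption: a word is letters/digits, optionally joined by inner apostrophes.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")

def tokenize(text):
    """Split text into normalized words (hypothetical helper)."""
    text = TAG_RE.sub(" ", text)     # drop tags such as <i> and </i>
    text = unescape(text).lower()    # decode entities like &amp;, fold case
    return WORD_RE.findall(text)     # surrounding punctuation is left out

def word_count(text):
    """Count how often each normalized word occurs."""
    return Counter(tokenize(text))

sample = "Here, there, and <i>here</i>. I used 'love' twice? Love!"
counts = word_count(sample)
print(counts["here"], counts["love"])
```

With these rules 'here', 'here,' and '<i>here</i>' all count as the same word. The tradeoff is that tags are replaced by a space, so a word with markup in the middle of it would be split in two.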
- jim 11-03-2000 3:27 pm
Keep in mind that the word_count script only counts word occurrences on the top level, while the search engine finds occurrences on the top level plus any in the discussion pages below. So the search engine might return more hits than word_count. Also, it might well be flaky in other ways.
- jim 11-03-2000 6:33 pm