I'm always trying to clarify just what it is I'm thinking about with these posts. So here goes another try. The problem is about building smarter information storage and retrieval systems. This calls for automated agents (software programs) that can do, at least, some preliminary sorting of data for us. Search engines are examples of what we have built so far. Not very sophisticated. The main problem with search engines is that they have to rely on some variation of matching words (or strings of words, or letter patterns within words). Not meanings, but words. 'Cat' and 'kitten' are only the same to a search engine if it contains a dictionary that explicitly links 'cat' and 'kitten' as synonyms. Humans don't think in such a literal manner (I think). We can infer meaning beyond mere identity. Software programs are not so good at metaphor, or god help us, poetry. With so much information available, we need our software filters (whether they be search engines or something else) to do more than find matching occurrences of words. For them to be really helpful, we need software to understand meaning. That's the problem.

There are two directions toward progress, and both, I think, are being pursued. I guess we'll meet somewhere in the middle. The first direction involves including machine-readable codes inside documents that explain what the document means. This is called metadata, and XML is the format being pushed as the way to encode this metadata into documents. (Although XML doesn't dictate the vocabulary of the metadata itself, so it doesn't address the hard problem; it just provides a non-proprietary form to encode that solution in. We still must all agree to use the same metadata vocabulary, or we'll be back where we started.) I've tried to sketch both of these ideas in a bit of code below.

The other direction involves changes to our languages themselves. These changes revolve around making our symbol systems less ambiguous. Or, in other words, making human languages more machine-like. (It's not that machines are becoming smarter and will eventually be intelligent in a human way; it's that we are becoming more like machines as they become more like us, and we will meet somewhere in the middle.) One example of this sort of language shift is how technical people tend to talk in strings of initials. HTML, XML, PPP, POP3, IMAP, NMAP, RAM, GPL, SQL, etc. This isn't just a time-saving device. It is more machine readable. Searching for 'toast' may not turn up that great reference to 'lightly browned bread', but searching for 'XML' will always turn up articles on 'XML' because that is what we call it. Always. From the start.

Probably this sort of shift in the language is troublesome to some people. Perhaps our humanity (or at least our culture) is located exactly in that ambiguous space between symbol and meaning. This gap allows us to be flexible, and maybe even beautiful. But it makes it really hard to communicate well with our machines.
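Here's a rough sketch of what I mean, in Python. The document format, the <subject> tags, the synonym table, and the function names are all made up for illustration; this isn't any particular standard or product, just the word-matching problem and the two workarounds (a synonym dictionary, and metadata declared by the author) in miniature.

```python
import xml.etree.ElementTree as ET

# A made-up document carrying machine-readable metadata alongside its text.
# XML is just the non-proprietary container; the <subject> vocabulary is
# something we would all still have to agree on.
DOC = """
<document>
  <metadata>
    <subject>cat</subject>
    <subject>pets</subject>
  </metadata>
  <body>My kitten knocked the toast off the counter again.</body>
</document>
"""

# A tiny hand-built synonym table: the "dictionary that explicitly links
# 'cat' and 'kitten'" mentioned above.
SYNONYMS = {
    "cat": {"cat", "kitten", "feline"},
    "toast": {"toast", "lightly browned bread"},
}

def naive_search(query, text):
    # Plain string matching: 'cat' will not find 'kitten'.
    return query.lower() in text.lower()

def synonym_search(query, text):
    # Expand the query through the synonym table before matching.
    terms = SYNONYMS.get(query.lower(), {query.lower()})
    return any(term in text.lower() for term in terms)

def metadata_search(query, xml_doc):
    # Match against the declared <subject> metadata instead of the prose.
    root = ET.fromstring(xml_doc)
    subjects = {s.text.lower() for s in root.findall("./metadata/subject")}
    return query.lower() in subjects

body = ET.fromstring(DOC).findtext("body")
print(naive_search("cat", body))     # False: only 'kitten' appears in the text
print(synonym_search("cat", body))   # True: the dictionary bridges the gap
print(metadata_search("cat", DOC))   # True: the author declared the meaning
```

The point of the toy example is that neither fix comes for free: somebody has to build and maintain the synonym dictionary, or every author has to tag their documents with an agreed-upon vocabulary.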

Another example of this change in language, and what got me started on this today, is the idea of constructed languages. Everybody has heard of Esperanto, the most popular constructed language. But how many know about Lojban? It bills itself as "the logical language." Personally, I feel my time is better spent learning Java if I'm going to pick up a new language, but the description of Lojban is really interesting. It's a sort of extreme case that highlights some of the more subtle shifts in language that are definitely happening already. And man, would it be easier to write smart information systems if we all spoke more logically. Of course, we might not like that world very much. I guess we'll see. AFAIK; FWIW.
- jim 7-07-2000 4:25 pm



