I was thinking about this yesterday: using a cameraphone as a scanner: NEC and the Nara Institute of Science and Technology have developed technology which uses movie recordings to produce high quality images, on par with those of a scanner. This technology will be aimed at cellular phones and video cameras.
The technique involves recording a part of the subject to a movie, while moving the camera; the "Mosaicing Technology" analyzes the moving image and estimates the three-dimensional position of the subject, and under the supervision of the "Ultra Resolution Technology," the joining points of the image are deleted, thereby optimizing it so that even low resolution cameras can produce scanner-like output. In other words, even cellular phones and video cameras can produce high quality images. My guess is this is a little way off, but it will be seriously useful. Even if it could just do business cards (with some OCR) it would rule.
- jim 2-24-2004 6:21 pm
This should be pretty easy if the "scan" is done well. It may be slow, given the puny processors on cell phones, but the mosaicing and OCR are pretty straightforward. The 3D aspects of this -- compensation for tilting, compensation for change in distance from the lens -- that stuff is hard. 3D transformations of image data have been done for at least 20 years, but it's still hard.
- mark 2-25-2004 2:25 am
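For what it's worth, the "compensation for tilting" part is usually modeled as a projective transform (a homography) between the tilted view and a flat, head-on view. Here is a minimal sketch in plain Python, assuming you already know where the four corners of the card landed in the photo. The corner coordinates and the 350x200 target size are made up for illustration, and finding the corners automatically is the hard part this skips.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 8 parameters of the projective map taking 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_h(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical corners of a business card as seen in a tilted photo (pixels),
# and the flat rectangle we want them straightened to.
src = [(12, 30), (310, 8), (330, 190), (5, 170)]
dst = [(0, 0), (350, 0), (350, 200), (0, 200)]
h = homography(src, dst)
```

With `h` in hand, every pixel of the photo can be pushed through `apply_h` to build the straightened image. That per-pixel warp is where the compute cost goes.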
Seriously, could this be the answer to the keyboard problem with mobile devices? (I mean, the problem that you can't fit a full-size keyboard on a tiny mobile device.) Just write what you want on a napkin (or whatever) and then snap a picture of it. Bing. That leverages some pretty well tested technology (handwriting) in a very unobtrusive way (no special pens required).
- jim 2-25-2004 2:32 am
That could work. The user would need some trial-and-error training on how to scan. The device/user would need some trial-and-error training to do OCR on handwriting.
But I've got an interesting anecdote about handwriting. I just had to fill in a credit application to buy a shitload of Xilinx chips. The form was a PDF that I printed, filled in by hand, and faxed back. Anyway, filling out the names and addresses of 5 companies gave me writer's cramp! Hmm, I guess I have largely switched to mouse and keyboard.
- mark 2-25-2004 3:12 am
I had a similar experience while at jury duty. I had to fill out a form. I don't think I had written that much in over a year, and my handwriting was extremely bad. I could barely recognize it as my own.
- jim 2-25-2004 3:14 am
I'm sure there's a simple answer but I don't get it. You take a movie file with the still camera (or phone) and as it records you move the device across the surface you are scanning? Wouldn't you need some sort of track to ensure that the camera doesn't wobble? And wouldn't the speed that the camera moves be hyper-critical? I guess a motorized system could get around these problems, one much like those employed by scanners. So what's the difference?
- steve 2-27-2004 10:02 pm
I guess with enough processing power you can get around those problems. Probably it's key to have a lot of overlap between different frames. They already have handheld scanners, although not in phones, so it's definitely possible.
- jim 2-27-2004 10:07 pm
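To make the overlap idea concrete, here is a toy sketch, not anything NEC has described: two frames are lists of pixel rows, the camera is assumed to have moved sideways only (no tilt, no wobble), and we try every sideways shift and keep the one where the overlapping pixels disagree least. The function names and pixel values are invented for the example.

```python
def best_shift(frame_a, frame_b, min_overlap=3):
    """Return how many columns frame_b is shifted right of frame_a."""
    width = len(frame_a[0])
    best = None
    for shift in range(1, width - min_overlap + 1):
        overlap = width - shift
        err = 0
        for row_a, row_b in zip(frame_a, frame_b):
            for k in range(overlap):
                d = row_a[shift + k] - row_b[k]
                err += d * d
        # Normalize so small overlaps are not favored just for being small.
        score = err / (overlap * len(frame_a))
        if best is None or score < best[0]:
            best = (score, shift)
    return best[1]


def stitch(frame_a, frame_b, shift):
    """Append only the columns of frame_b that frame_a does not already cover."""
    overlap = len(frame_a[0]) - shift
    return [ra + rb[overlap:] for ra, rb in zip(frame_a, frame_b)]


# A made-up 2x13 "page" and two 8-column frames that overlap by 3 columns.
scene = [[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9],
         [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9]]
frame_a = [row[0:8] for row in scene]
frame_b = [row[5:13] for row in scene]
shift = best_shift(frame_a, frame_b)
mosaic = stitch(frame_a, frame_b, shift)
```

The more overlap there is, the more pixels vote on each candidate shift, which is why lots of overlap helps. A real hand-held scan also has rotation and tilt, so a pure sideways search like this is only the starting point.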
A normal scanner (as in a copier, fax or scanner) looks at only a very narrow slice of the image at any one moment, and must carefully scan the image at a set speed, geometry, etc. We've all seen what happens when paper doesn't flow smoothly through a fax machine. All sorts of distortion are possible.
Scanning with a video or still camera is a different sort of thing. Each "snapshot" or "frame" has a bunch of stuff in it rather than being a narrow strip of the image. But each frame will typically have less than full coverage of the item being scanned. An image concatenation (or mosaicing) routine can construct a panorama from a set of frames by looking at the overlap.
To operate without human aid, the mosaicing routine needs to have some machine vision capabilities. It needs to recognize features in the image, and line up those overlapping features without being fooled by things like repeated letters, words, textures. This process is complicated by the inevitable changes in geometry between frames during a hand-held scan. Image warping is a well-known, but somewhat compute-intensive, approach to dealing with that.
It's all doable, but the processing power in a mobile unit would be stressed by the software required to do fool-proof mosaicing of a hand-held scan.
- mark 2-27-2004 10:58 pm
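On the "not being fooled by repeated letters" point: the comment above doesn't name a method, but one commonly used heuristic is Lowe's ratio test. You keep a feature match only if the best candidate is clearly better than the runner-up, so a feature that looks like two different spots in the other frame gets thrown out instead of guessed at. A minimal sketch, with tiny made-up 3-number descriptors standing in for real feature descriptors:

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def match_features(desc_a, desc_b, ratio=0.7):
    """Return (i, j) pairs where desc_a[i] has one clearly best match desc_b[j]."""
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: sq_dist(d, desc_b[j]))
        if len(ranked) < 2:
            continue  # nothing to compare the best match against
        best, second = ranked[0], ranked[1]
        # Distances are squared, so the ratio threshold is squared too.
        if sq_dist(d, desc_b[best]) < (ratio ** 2) * sq_dist(d, desc_b[second]):
            matches.append((i, best))
    return matches


# Hypothetical descriptors: the second feature in frame A sits equally close
# to two features in frame B (think of the same letter printed twice).
desc_a = [(10, 2, 7), (5, 5, 6)]
desc_b = [(10, 2, 8), (5, 5, 5), (5, 5, 7), (0, 9, 1)]
matches = match_features(desc_a, desc_b)
```

Here only the first feature survives; the ambiguous one is dropped. The `ratio=0.7` default is just a typical starting value, not something tuned for text.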
OK, right, a mosaic as opposed to a narrow slice. I get it now. I guess the processing power needed will come eventually. And I didn't know they make handheld scanners, which I think is pretty cool, but the camera/phone scan would be awesome.
- steve 2-28-2004 12:55 am
This doesn't do the "move the camera around to scan a large area" thing, but this about-to-be-released camera phone can take still pictures with its 1.2-megapixel camera and then run OCR on them. So, supposedly, you can snap a picture of, say, a business card and then import the info right into your address book.
If the software is slick that would be really powerful. Could this be a reasonable replacement for a keyboard? If you want to write an email you could just whip out a pen, write your message on a napkin, and then take a photo of it. Not bad. But I doubt the experience would be seamless. At least not yet.
- jim 3-11-2004 5:38 pm
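The step after OCR, getting the text into address book fields, might look roughly like the sketch below. This is a guess at the general idea, not how that phone does it: the regular expressions are deliberately simple, the sample card text is invented, and treating the first leftover line as the name is an assumption that real cards will often break.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def parse_card(ocr_text):
    """Pull a name, email and phone number out of OCR'd business card text."""
    contact = {"name": None, "email": None, "phone": None}
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = EMAIL.search(line)
        if m and contact["email"] is None:
            contact["email"] = m.group(0)
            continue
        m = PHONE.search(line)
        if m and contact["phone"] is None:
            contact["phone"] = m.group(0)
            continue
        if contact["name"] is None:
            # Assumption: the first line that is neither email nor phone is the name.
            contact["name"] = line
    return contact


# Invented OCR output for illustration only.
sample = "Jane Example\nExample Widgets Inc.\nTel: +1 (555) 010-0199\njane@example.com"
contact = parse_card(sample)
```

OCR mistakes (an "l" read as "1", a dropped "@") would break patterns like these, so a usable version needs some way for the user to confirm or fix each field.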