Interesting. Looks like redacted information can often be recovered by counting pixels:
The technology employed is, at first sight, nothing revolutionary. The two researchers measured the inclination of the text, deformed at the time of its digital reproduction - the inclination was an angle of 0.52°.

They then used a character recognition software to determine the width of the Arial-font text which provides the number of letters per unit of length. Simple recourse to an English dictionary then helped establish a list of possible words.

- jim 5-11-2004 7:59 pm

Yikes. Another advantage of courier.
- mark 5-12-2004 7:04 am


Sounds like Cryptonomicon to me... stealing data from the screen. Here's my level of ignorance: was that fact, or fiction? (Oh, I just followed the link, duh... ignore me ignore me).
- sally mckay 5-12-2004 7:25 am


Back in the olden days, there were monospace typewriters. Courier is a good approximation of a typical typewriter typeface. With the IBM Selectric, proportional spacing made it into the hands of the average consumer, but each letter occupied one of three widths: "i", "n" or "m". The last is sometimes called an em space.

But with desktop publishing, truly proportional spacing is commonplace. Any character can be of any width, and the space per letter varies widely in, for example, Arial (Helvetica, Swiss, etc.). So I could see how this technique might work for short redactions.
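
To make the width-matching idea concrete, here is a rough Python sketch. It is not the researchers' actual method, just an illustration: the width table holds approximate Helvetica-style letter widths in thousandths of an em (my assumption), the 600-unit monospace width is likewise assumed, and the function names and the tiny word list are made up for the example.

```python
# Approximate Helvetica/Arial-style advance widths in 1/1000 em.
# These figures are illustrative assumptions, not measured from the document.
WIDTHS = {
    "a": 556, "b": 556, "c": 500, "d": 556, "e": 556, "f": 278, "g": 556,
    "h": 556, "i": 222, "j": 222, "k": 500, "l": 222, "m": 833, "n": 556,
    "o": 556, "p": 556, "q": 556, "r": 333, "s": 500, "t": 278, "u": 556,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 500,
}

MONO_WIDTH = 600  # assumed fixed width per character in a Courier-like face


def proportional_width(word):
    """Total width of a lowercase word in the proportional face."""
    return sum(WIDTHS[ch] for ch in word)


def monospace_width(word):
    """Total width of a word when every character is equally wide."""
    return MONO_WIDTH * len(word)


def candidates(words, gap, tolerance, width_of):
    """Words whose rendered width is within `tolerance` of the blacked-out gap."""
    return [w for w in words if abs(width_of(w) - gap) <= tolerance]


# Hypothetical dictionary of four-letter words.
words = ["mill", "noon", "wimp", "lily", "mama"]

# Proportional face: the gap width singles out one word.
print(candidates(words, 2224, 20, proportional_width))

# Monospace face: every four-letter word fits the same gap.
print(candidates(words, 4 * MONO_WIDTH, 20, monospace_width))
```

With the proportional widths only "noon" fits a 2224-unit gap, while in the monospace case all five words fit equally well, which is the advantage of Courier: the gap gives away the letter count and nothing more.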

If the text is justified, then the white space between words also varies, and it's harder to determine exactly how much space is consumed by the redacted word(s). The sample in the article is ragged on the right margin, meaning spaces are constant width.
- mark 5-12-2004 8:34 am




