Anonymizing the Written Word

Stylometry is a (usually computer-driven) way to de-anonymize a text using nothing but metrics such as word choice and sentence length, along with quite a few more.
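To make "metrics" a little more concrete, here is a minimal sketch of the kind of numbers a stylometry tool might pull out of a text. The function name `stylometric_features`, the tiny `FUNCTION_WORDS` list, and the crude sentence splitter are all my own assumptions for illustration; real tools use far larger feature sets.

```python
import re
from collections import Counter

# A tiny, illustrative set of English function words (an assumption;
# real feature sets are far larger).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "but", "which")


def stylometric_features(text):
    """Return a few simple style metrics for a text as a dict."""
    # Crude sentence split: break on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Lowercased words, allowing one internal straight apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    features = {
        "avg_sentence_len": len(words) / len(sentences),      # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(counts) / len(words),         # vocabulary richness
    }
    # Relative frequency of each function word, a classic authorship signal.
    for fw in FUNCTION_WORDS:
        features["freq_" + fw] = counts[fw] / len(words)
    return features
```

Calling `stylometric_features(some_text)` on two pieces by the same person will tend to give similar numbers, which is exactly the problem.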

I’ve mentioned before that communications have certain parallels with holograms, in reflecting more information about the originator than may be intended. Stylometry is one example of this.

As it happens, stylometry is a huge problem for anonymity: even if you perfectly mask your IP address and other metadata, the message content may be enough to identify you. Indeed, one of the Snowden-leaked slides hinted that Nut Squirrelers’ Anonymous has deployed it on a mass scale to spot given authors anywhere they publish on the Internet. Sadly, the relevant slide was mostly redacted, so this is conjecture.

Here’s one possible solution.

A couple of clever CS students wrote a set of libraries and an accompanying program that let anyone run a text they have just written through an “anonymizing” tool. The program compares the text with a large body of text written by people other than the real author, and then suggests changes that would make it more anonymous.
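The compare-and-suggest idea can be sketched roughly as follows. This is not the program's actual algorithm, just a guess at the general shape: take one feature dict for your own text and one per reference author, then flag the features where you stand out most. The function name `rank_telltale_features` and the z-score ranking are my assumptions.

```python
from statistics import mean, pstdev


def rank_telltale_features(own, others, top=3):
    """Rank features by how far `own` sits from the mean of `others`.

    own    -- dict mapping feature name to value for the text to anonymize
    others -- list of dicts with the same keys, one per reference author
    Returns (feature, z_score, hint) tuples, most distinctive first.
    """
    ranked = []
    for name, value in own.items():
        ref = [o[name] for o in others]
        mu, sigma = mean(ref), pstdev(ref)
        if sigma == 0:
            continue  # no spread among the reference authors; nothing to compare against
        z = (value - mu) / sigma
        hint = "reduce" if z > 0 else "increase"
        ranked.append((name, z, hint))
    # Largest absolute deviation first: these are the biggest giveaways.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top]
```

A feature with a large z-score is one where your text looks unlike everyone else's, so nudging it toward the crowd is the obvious first change to try.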

The program is very much still in development, and (horror!) uses Java. It still looks like a great idea.

By the way, there’s also a low-tech solution. Immerse yourself in the work of an author with a strong, defined style (Douglas Adams…) and even copy out by hand a few pages of their work. The next few pieces you write, if you try a little, will differ greatly from your usual style!

Just check it against existing stylometry tools to be sure it’s different enough.
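If you have no proper tool to hand, a very rough sanity check is to compare word-frequency profiles of your old and new writing. The sketch below uses plain cosine similarity, which is far weaker than real stylometry software and is only my own stand-in; the names `word_profile` and `cosine_similarity` are made up for this example.

```python
import math
import re
from collections import Counter


def word_profile(text):
    """Relative frequency of every word in the text (empty dict for no words)."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity of two sparse frequency profiles (near 1.0 = very alike)."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as sharing nothing
    return dot / (norm_p * norm_q)
```

A score that drops noticeably between "old me versus old me" and "old me versus new me" is a hopeful sign, but passing this check says nothing about whether a serious tool would still spot you.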

“An open source project to combat ‘stylometry’, the study of attributing authorship to documents based only on the linguistic style they exhibit, is proving that it is possible to change writing style so as to evade detection.

Artificial Intelligence techniques are routinely used to detect plagiarism and recently were employed to reveal that Harry Potter author J K Rowling is indeed the author of The Cuckoo’s Calling published under the byline of Robert Galbraith. Now software is tackling the opposite problem – anonymizing writing style to protect the identity of the originator.[…]

The JStylo-Anonymouth (JSAN) framework is work in progress at PSAL under the supervision of assistant professor of computer science, Dr. Rachel Greenstadt. It consists of two parts:

JStylo – authorship attribution framework, used as the underlying feature extraction employing a set of linguistic features
Anonymouth – authorship evasion (anonymization) framework, which suggests changes that need to be made

In the small scale user study (10 participants) reported in the award-winning paper, 80% were able to anonymize their documents to a limited extent. Modifying pre-written documents was found to be difficult and the anonymization did not hold up to more extensive feature sets. However, the students point out:

It is important to note that Anonymouth is only the first step toward a tool to achieve stylometric anonymity with respect to state-of-the-art authorship attribution techniques. The topic needs further exploration in order to accomplish significant anonymity.”

