
'Leaky Documents': A Dangerous Internet Trap

Did you know that when you exchange common Microsoft Word files over the Internet, you may be revealing hidden information from your computer?

Ask Alastair Campbell, British Prime Minister Tony Blair's former top communications aide. He released a file from the Downing Street press office to the House of Commons Foreign Affairs Select Committee, which was investigating the genesis of a plagiarized document involving government justifications for the Iraq war.

The Microsoft Word document, without Campbell's knowledge, revealed the names of the four civil servants who worked on it. Campbell recently resigned his powerful post, and the British government now releases official documents only in Adobe's secure PDF format.

In a surprising report from the BBC, we ordinary computer users recently learned that the most common word processing documents generated by the most common word processing application -- Microsoft Word -- are not necessarily what we think they are. What you see on the screen or the printed page is not exactly what you get.

There is a function in all post-1997 versions of Microsoft Word, Excel and PowerPoint in which fragments of data from other files you deleted or were working on at the same time can be hidden in any document you save. With the right viewing software, this hidden data (Microsoft calls it "metadata") can easily be read by anyone who obtains the document file.
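To give a sense of how little "viewing software" is needed, here is a minimal sketch in Python. It is not the tool Byers or anyone else quoted here used, and it does not parse the Word file format at all. It simply assumes that leftover text sits in the file as plain runs of characters, stored either one byte per character or as two-byte UTF-16 characters, and pulls out every such run. The function name `extract_strings` and the `min_len` parameter are made up for this illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable text found in raw file bytes.

    Assumption (for illustration only): hidden text is stored as plain
    8-bit characters or as UTF-16LE characters. No Word-specific parsing
    is attempted.
    """
    found = []

    # Pass 1: runs of printable ASCII characters, one byte each.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    found.extend(m.group().decode("ascii") for m in ascii_pat.finditer(data))

    # Pass 2: runs of printable ASCII characters stored as UTF-16LE,
    # i.e. each character byte followed by a zero byte.
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found.extend(m.group().decode("utf-16-le") for m in utf16_pat.finditer(data))

    return found
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `extract_strings(open("report.doc", "rb").read())`, where `report.doc` is a placeholder name. Expect a lot of noise alongside anything interesting; a dedicated metadata viewer would do a far cleaner job.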

Computer researcher Simon Byers, the BBC reported, conducted a random survey of 100,000 Microsoft Word documents available from Web sites on the Internet. He found that every single one of them had hidden information.

In his research, Byers learned that about half the documents had up to 50 hidden words and a third up to 500 hidden words. Ten percent of the documents had more than 500 words concealed within them.

The hidden text often revealed the names of the document's authors, their relationship to each other, earlier versions of the text, and information about the internal network through which the document traveled. Occasionally, documents revealed very personal information such as social security numbers.


Byers said the problem of leaky Microsoft Word documents is "pervasive" and wrote that anyone worried about losing personal information should consider using a different word processing program. Alternatively, he recommends using utility programs that can scrub information from Word documents.

In informal conversations with colleagues and friends, I was surprised to hear my most computer-literate acquaintances say that they were fully aware of how Microsoft buries metadata in documents, but had never considered it a security risk. Non-technical users of Microsoft Word, however, were outraged and surprised, considering the phenomenon a huge privacy issue that needed wide public exposure.

The issue certainly caught the Washington, D.C. police department by surprise. During the widely publicized hunt for the Washington sniper, the department allowed "The Washington Post" to publish a letter sent to the police that unintentionally included names and telephone numbers buried in the document file.

"The time when most information tends to leak is when you are using a document that has a number of revisions or a number of people working on it," said Nick Spenceley, founder of the computer forensics firm, Inforenz, in a BBC interview.

One way to protect against the hidden data issue is to send electronic documents only as Portable Document Format (PDF) files, advised Spenceley. "I'm not sure many people check Word documents before they go out or are published," he said.

There is a growing number of privacy horror stories with Word. Spenceley said he knows of a case in which someone found previous versions of an employment contract buried in the Word copy he was sent. Reading the hidden extras gave the person applying for the job a big advantage during negotiations.


It's perhaps no surprise that Microsoft doesn't see leaky documents as a security problem, but as a feature that its customers are responsible for using correctly. "Microsoft is aware of the functionality of metadata being stored within Word 97 documents and would advise users to follow the instructions laid out in the Microsoft Knowledge Base (see URL below)," a Microsoft PR spokesperson said. "However, Microsoft does not wish to comment on how customers use the functionality within our software."

One certain way to ensure document security and save considerable money is to avoid using Microsoft Word's default format to save documents. For most common documents, one can use Word or virtually any other word processor to save documents in the universal rich text format (RTF). Any modern word processor on any type of computer can read formatted RTF files. If sophisticated formatting, graphics or tables are an issue requiring the use of Word, PDF makes a secure display format while accurately replicating the document's style.

Mariner Software, the maker of my preferred word processing application, Mariner Write, quickly answered an e-mail query in which I asked for the company's assurance that no hidden data is embedded in any of my documents after I save them in the RTF format.

"I assure you there is no 'invisible' data retained or embedded," answered Mariner's Logan Ryan. "We do not subscribe to the 'questionable' marketing or technological tactics as Microsoft does. Your files are not tampered with."

Microsoft has published a document in its online Knowledge Base (Article # 223790) that describes various ways metadata is embedded in a Word document. Unfortunately, there's no single, one-button method to "clean" residual data from a Word document.

Click to access Microsoft's article.