The Panama Papers may have been the biggest cross-border investigative journalism project in history, but it’s only the beginning.
Investigations like this are made possible by intelligence analysis and document discovery software that until recent years was only available to intelligence and law enforcement agencies. But as computer power has increased, so has the size of the potential market. These tools are now priced such that even journalists can afford them, and they’re so well-designed that even journalists can use them.
For the Panama Papers, the International Consortium of Investigative Journalists used software from the Australian-born company Nuix. While the company certainly has some strong competitors — including Palantir, i2, and New Zealand’s Wynyard Group — Nuix’s tools are used by the US Secret Service and Department of Homeland Security, INTERPOL and, here in Australia, the Department of Defence, ICAC, and various other agencies.
Such tools ingest all of the documents associated with an investigation — whether they’re emails, reports, spreadsheets, faxes, or phone and data logs such as those kept under mandatory data retention laws — allowing investigators to search them in pretty much any way they like. The results are then linked to documents the system has decided are related.
“We [highlight] people’s names, countries, telephone numbers, email addresses, company names, credit card details, lots of very high-value pieces of information, depending on what you’re looking for,” said Nuix chief executive officer Ed Sheehy.
“[For example with] credit card numbers of interest, we will show you automatically that it was in an email, and there was four documents attached to it, and the same sender has sent 15 emails in the past, these [potentially different] credit card numbers are inside those 15 emails, and these were the images that were inside there.”
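The kind of linking Sheehy describes can be sketched in a few lines. What follows is a minimal illustration only, not Nuix's implementation: the function names (`luhn_ok`, `extract_entities`, `link_by_sender`), the message layout (`id`, `sender`, `body`), and the regular expressions are all assumptions made for the example. It pulls email addresses and plausible credit card numbers out of message text, then groups the card numbers by sender so an investigator can see which documents mention the same number.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration; real tools use far richer rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_entities(text):
    """Find email addresses and Luhn-valid card numbers in a block of text."""
    emails = set(EMAIL_RE.findall(text))
    cards = set()
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):  # discard digit runs that fail the checksum
            cards.add(digits)
    return {"emails": emails, "cards": cards}


def link_by_sender(messages):
    """Map sender -> card number -> sorted ids of documents mentioning it."""
    index = defaultdict(lambda: defaultdict(set))
    for msg in messages:
        for card in extract_entities(msg["body"])["cards"]:
            index[msg["sender"]][card].add(msg["id"])
    return {
        sender: {card: sorted(ids) for card, ids in cards.items()}
        for sender, cards in index.items()
    }


# Made-up sample data; 4111 1111 1111 1111 is a widely used test card number.
messages = [
    {"id": "doc1", "sender": "a@example.com",
     "body": "Use card 4111 1111 1111 1111, cc b@example.org"},
    {"id": "doc2", "sender": "a@example.com",
     "body": "Again 4111-1111-1111-1111 and phone 1234567890123"},
    {"id": "doc3", "sender": "c@example.com", "body": "nothing here"},
]
links = link_by_sender(messages)
```

The checksum step is one simple way to cut down on false hits, since most long digit strings, such as the phone number in the sample, are not valid card numbers.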
Sheehy was speaking at a gathering of some of the nation’s leading investigative journalists, plus me, at Nuix’s Sydney headquarters on Monday. Those who’d used the software on the Panama Papers sang its praises.
The AFR’s Neil Chenoweth described the process as “absolutely exciting”.
Four Corners journalist Marian Wilkinson was “incredibly impressed”, and told Crikey it was a “fantastic system, no doubt about that”.
“You could try and plough away in the server … to try and do the research on the documents, which was bloody hard I have to say, but also you could share information leads, and searching tips with the journalists from around the world working on the project, and that was immensely helpful,” Wilkinson said.
Problems included “lots and lots and lots” of false positives when the system returned every document referring to a particular name, not just the person of interest. That was inevitable, though, and it didn’t blunt Wilkinson’s opinion.
“For me, personally, again not coming from a background of big-data journalism, I was so immensely impressed … In my journalistic life, it was a life-changing experience. Because it showed me the potential of big-data searches, and it showed me something I’ve always believed in in journalism: the human element is critical to match with the data searching, [such as] the work of your colleagues who could share information.”
The 11.5 million documents of the Panama Papers actually provide a tiny dataset when compared to legal discovery cases. Nuix’s biggest effort was “just shy of four billion documents”, said Sheehy, including 3.1 billion emails, 440 million Word documents, 330 million Excel spreadsheets, and “a raft of other stuff” in the archives of a Wall Street bank that was hit with 15 different court cases at once.
Intelligence analysis software can already deal with huge datasets. The next breakthrough will be wider availability and sophistication of existing techniques: latent semantic indexing, which allows the system to figure out whether a mention of “football” is about sport or a codeword for a bomb or a drug delivery; sentiment analysis; and voice and video searching and transcription.
There’s a flipside, though. Yes, investigative tools are getting more powerful, but organisations will tend to keep less historical data, making things harder for investigators, according to Chris Pogue, a member of the US Secret Service Electronic Crimes Task Force, and Nuix’s senior vice president for cyber threat analysis.
“A big part of security is not keeping things you don’t need any more. Apart from compliance regulations and things like that, in almost every case it is in your best interests to only keep data that is relevant to your current operating business,” he said.
A very interesting article, thank you. It certainly explains how extremely large datasets are now being managed in a coherent fashion.
My initial reaction to the release of these documents was to ask the time-honoured question, cui bono? As there were oblique references to Putin via an old friend, the media immediately picked this up and ran with it, baying like hounds to the moon. However, there were even more serious references to the Icelandic PM, now deposed because of these revelations, and to David Cameron through a trust his father set up, as well as any number of Australians and Europeans, but no references to any US entities.
Now, given that the US practically ‘owns’ Panama and that it is known in secret squirrel circles to be the source of many clandestine US companies, it would be particularly dumb for a US citizen or corporation to have any loot or secrets stashed there, given their own governmental agencies are so prominent in that state. Even so, I would have expected at least one, maybe more, to be caught out. However, there has been, to my knowledge, not one US entity ‘outed’ by these papers.
Therefore, the question of cui bono seems to rest with the US. They are so squeaky clean as to be suspected of living on another planet, and if you believe that then it is time to give the tooth fairy the vote and flock to the bottom of the garden to drink rum-laced camomile tea with the pixies. I am thus of the opinion that these revelations are part of the non-stop geopolitical games being played and that in this case, while ostensibly being aimed at Russia via rather spurious associations, they could really be aimed at the US’ allies as a warning not to step outside the tent.
So thank you for this article. It has answered the question of how 11.5 million documents (about 9000 metres of documents, and that is assuming 1 document = 1 A4 page) could be analysed so quickly, though I expect they have just scratched the surface of this.
Amazing really. This is one of the biggest stories of 2016 so far and the author manages to focus on a point that is only of interest to IT enthusiasts. There is a great deal more to these revelations (and that not revealed) than whizz bang search engines.
Very interesting article.
But the way around snooping journalists is, as pointed out, deleting information & not keeping comprehensive archives.
Unlike the old days when old paperwork & letters could be discovered in boxes in attics.
Grumpy – I think that it was Teddy Roosevelt who said “We stole Panama fair and square”.
You are spot on about the amazing cleanliness of US entities.
This suggests that the interesting emphasis placed on current bêtes noires is not entirely accidental – false flags, red herrings an’all dat.
JO’N – Stilg is a tekky not someone who lives in the real world of politics so I welcomed his explanation of the oddly efficient way that the 11M docs have been given the skrut.
Pity this tek wasn’t available when WIKI hit the headlines.
How naive is this article!!!
One of the opening statements reads:
“Investigations like this are made possible by intelligence analysis and document discovery software that until recent years was only available to intelligence and law enforcement agencies”…which should have read:
“Investigations like this are made possible by the CIA leading witless global MSM a merry dance”.
With US states (Florida & Wisconsin) providing exactly the same blind trust arrangements, and with the clear aim of this alleged leak (it was a hack, with selective release) being to blacken Putin & Assad et al, this is a low point in Aussie MSM… Four Corners staff should hang their heads in shame at being played so convincingly.