About 90% of an iceberg lies beneath the surface of the water. So it is with political commentary. It’s easy enough to open a national broadsheet newspaper after an election, scan the pages filled with phrases like “hostage to entrenched interests” and “politically fabled Labor heartland”, and assume that all political commentary is equally and uniformly silly.
But that would be lazy. Each columnist is unique, and by careful examination of the underlying data, using only the most up-to-date and high-tech statistical techniques, we can actually explore the heterogeneity of hacks. Whose commentary is based on nothing more than vague impression and anecdote, and whose is instead based on dodgy mathematics? Today, I propose to you that we undertake such analysis. And who better to begin with than News Corp columnist John Black? We saw a little while ago how he uses a very questionable statistical “technique”, one that systematically overstates the explanatory power of the resulting model and the magnitude of the effects of variables, and understates the uncertainty in estimates of the effects of variables, among other problems, to try to determine who voted for Labor in Queensland’s recent election, just by looking at census data and the aggregate votes in Queensland seats.
(Most social scientists would be astounded if he actually could do that, since it would mean he has overcome the ecological fallacy, but he’s a smart guy, let’s give him the benefit of the doubt: he probably has solved the most intractable methodological challenge in statistical inference, probably when he had nothing better to do during Senate estimates.)
We also saw that it was possible for me, with exactly the same technique, to get models of the Queensland electorate with exactly the same level of explanatory power from entirely randomly generated numbers. But John Black thought his model of the Queensland electorate was nonetheless a pretty good guide to the New South Wales election. On March 1, about a month before the state went to the polls, he wrote:
“When we took the Queensland demographics underpinning the January 31 state result and applied them to identical demographics in current NSW seats, Labor won 46 out of 93 seats … We felt pretty comfortable carrying out this projection of the Queensland result as we’ve been doing it for 40 years and the model was statistically powerful, explaining 84 per cent of the variation in votes across the 89 seats in Queensland … With an error of estimate of 4 per cent, we don’t expect the projection to be perfect but it should be a better guide than the usual incorrect assumption that uniform swings will apply. Bearing in mind that Labor lost every second vote in some of its safest seats in the previous election, this assumption is even sillier than usual.” (Emphasis mine).
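That “84 per cent” deserves a second look in light of the random-numbers point above. Here is a minimal sketch of the idea, not Black’s actual procedure or data: greedy forward stepwise selection run on pure noise. The 89 seats come from the quote; the 200 candidate variables, the 20 selection steps and every function name are my own assumptions for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centre(values):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def remove_component(vec, direction, norm):
    """Return vec with its projection onto direction subtracted."""
    k = dot(vec, direction) / norm
    return [v - k * d for v, d in zip(vec, direction)]


def forward_stepwise_r2(y, candidates, n_steps):
    """Greedy forward selection; returns R^2 of the final least-squares fit."""
    resid = centre(y)
    total_ss = dot(resid, resid)
    cols = [centre(c) for c in candidates]

    for _ in range(min(n_steps, len(cols))):
        def gain(col):
            # Drop in residual sum of squares if this column were added.
            norm = dot(col, col)
            return dot(col, resid) ** 2 / norm if norm > 1e-12 else 0.0

        best = max(range(len(cols)), key=lambda j: gain(cols[j]))
        chosen = cols.pop(best)
        norm = dot(chosen, chosen)
        if norm <= 1e-12:
            break
        # Orthogonalise the residual and the remaining candidates against
        # the chosen column, so later gains are conditional on earlier picks.
        resid = remove_component(resid, chosen, norm)
        cols = [remove_component(col, chosen, norm) for col in cols]

    return 1.0 - dot(resid, resid) / total_ss


random.seed(2015)
N_SEATS = 89        # seats in Queensland, per the quote
N_CANDIDATES = 200  # assumed size of the pool of "demographic" variables
N_STEPS = 20        # assumed number of variables the procedure keeps

vote = [random.gauss(50, 10) for _ in range(N_SEATS)]
noise = [[random.gauss(0, 1) for _ in range(N_SEATS)]
         for _ in range(N_CANDIDATES)]

r2 = forward_stepwise_r2(vote, noise, N_STEPS)
print(f"R^2 from pure noise: {r2:.2f}")
```

None of these predictors has any real relationship to the “vote”, yet picking the best-looking ones from a big enough pool should leave you with an impressive-looking R². The exact figure depends on the assumed pool size and number of steps.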
The results of the election are (more or less) in, so we can assess these claims: did Black’s “projection” really make “a better guide than the usual incorrect assumption that uniform swings will apply”?
But for a guy who has come across a revolutionary technique for forecasting elections — hey, he’s been using it for 40 years! — John Black sure is modest after the election. Just look at his weekend column on the aftermath of the NSW election. He sure knows a lot about people who don’t vote for the Greens, as in this prolier-than-thou paragraph:
“The state seats where the Greens failed to win many votes were dominated by mainstream Australian suburban families with children, who drive themselves to work daily or ride as a car passenger.
“The parents tend to have certificate qualifications in engineering for dad and hospitality for mum, with dad employed as a machine operator in manufacturing or a transport driver, and mum finding it very difficult to get a hospitality job which pays enough to earn any realistic income and has flexible hours for her to look after three kids in the local government school system.”
Maybe we should have a look at how Black’s model did, especially compared to the “uniform swing” model that he reckons it’s superior to.
So let’s put two models to the test. The first is John Black’s Queensland demographic-stepwise-bullshit model, which in a sense calculates the impact of demographic variables on the ALP vote in the Queensland election and uses the coefficients from that model to project NSW results from NSW demographics (Black generously provided the results on his website). The second is a uniform-swing model based on the final polls, in which the pollsters were predicting a 10-percentage-point swing to the Labor Party: its projection is just the 2011 results, adjusted for boundary changes, with 10 percentage points added. Comparing their projections to the actual result, what do we find?
Perhaps not surprisingly, John Black’s model, which had no input from polling at all, even at an overall state level, overpredicted the swing to Labor. It projected a 17-percentage-point swing to Labor on the two-party preferred measure in the “traditional contest” seats (where there was a TPP fight between Labor and the Coalition); in fact, the swing was about 10 points, almost exactly what the final polls were predicting.
This means that John Black’s model dramatically overpredicted how many of these “traditional contests” the ALP would win: his model predicted 44 wins for the ALP in these contests, while the uniform-swing model predicted 35 victories. In reality, there were only 32.
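For anyone who wants to see how little machinery the uniform-swing model needs, here is a rough sketch. The seat names and vote shares are made up for illustration and are not the real 2011 figures; the function names are mine.

```python
def uniform_swing(previous_tpp, swing):
    """Add the same swing (in percentage points) to every seat's ALP TPP share."""
    return {seat: min(100.0, max(0.0, share + swing))
            for seat, share in previous_tpp.items()}


def seats_won(projection, threshold=50.0):
    """Count the seats where the projected TPP share is above 50%."""
    return sum(1 for share in projection.values() if share > threshold)


# Hypothetical previous-election ALP two-party-preferred shares (per cent).
previous = {"Seat A": 38.0, "Seat B": 44.5, "Seat C": 52.0, "Seat D": 31.0}

# The final polls pointed to a swing of roughly 10 points to Labor.
projection = uniform_swing(previous, 10.0)

print(projection)
print(seats_won(projection))
```

The whole model is one addition per seat, plus a count of the seats that end up above 50%.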
On average, the uniform-swing model missed the result by about 5 percentage points (either over- or under-shooting). John Black’s model, on the other hand, missed by nearly 12 points.
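The “missed by” figures are average absolute errors: take the gap between projected and actual vote share in each seat, ignore the sign, and average. A small sketch of that calculation follows, using invented numbers rather than the actual NSW results.

```python
def mean_absolute_error(projected, actual):
    """Average size of the miss, in percentage points, ignoring direction."""
    if not projected or len(projected) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)


# Hypothetical ALP two-party-preferred shares (per cent) in four seats.
actual = [45.0, 52.0, 38.0, 61.0]
uniform_projection = [48.0, 50.0, 42.0, 58.0]
demographic_projection = [58.0, 60.0, 52.0, 70.0]

uniform_mae = mean_absolute_error(uniform_projection, actual)
demographic_mae = mean_absolute_error(demographic_projection, actual)

print(uniform_mae)
print(demographic_mae)
```

Because over- and under-shoots both count as misses, a model cannot look good just because its errors happen to cancel out.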
Slice it as you will, John, the weird Queensland-demographics-in-NSW model you’ve created actually performed much worse than the uniform-swing model. Maybe you could write a column in The Australian about that. Let me get you started:
“The pundits who got the NSW election wrong tend to be former Queensland Labor Senators with fishing hobbies and boutique ‘analytics’ firms who haven’t opened a statistics textbook since the 1960s, patronised by Boomer managers with no clue about data analysis. Often, they have obtained a vanity column in a vanity newspaper in which they cannot help but promote, at every opportunity and regardless of nominal relevance, their business interests. Their names are most often short monosyllables. And most characteristically, they tend to be enormously and, sadly, unjustifiably confident in their risibly flimsy analysis.”
*This article was originally published at Tom Westland’s blog, Apocalypse of Thomas
It’s a step in the right direction for Crikey to address the issue of flawed statistical analysis. Hopefully this might mean it too will begin to show the same concern re its own failings such as treating correlation and cause as identical when a correlation can be presented as supporting one of its sacred cows? Who knows, we might even see an acknowledgement of such hoary hoaxes as the spurious assertion that the famous Qld Death in Custody Enquiry found that incarcerated indigenous prisoners were more prone to dying in custody than their non-indigenous counterparts?
I know. I’m being over-optimistic to expect Crikey to confess about that long-standing misrepresentation of the relevant data.
He’s not the only one mixing statistics and frigonometry – it seems Murdoch employs a few in each state to boost their pro-conservative assertions.
I enjoyed this article very much, and the last “helpful” paragraph is a doozy! And “risible”: now there’s a great underutilized word! It could be applied to so much more commentary and almost anything that comes from the mouths of our politicians, with added emphasis on the current federal government.
So much anger and contempt comes through the tone of this article, and I’m not sure why this is. It can’t be just because John Black writes for the Australian?
So the guy uses regression analysis to calculate 2PP Labor votes on a seat-by-seat basis … so what? He won’t be the first or the last to use this method for forecasting, which is still heavily utilised in other areas like finance and social policy.
There are undeniably some flaws in his methodology. I think there will be evidence of multicollinearity in his analysis, and a simpler model with fewer independent variables (but still heavily correlated with the dependent variable) would probably be more robust (and less biased). But all up, hardly a hanging offence.
I’m more of a fan of the polling method myself, but I do believe demographics can provide some insight into how people vote and have some predictive power.
Scott, you’re talking to fervent True Believers with the Crikey Collective. They can’t help their failure to rein in their biases; but it’s worthwhile your attempting to talk reason to them, because at least some may begin to analyse issues rather than simply falling back on knee-jerk attacks against their favoured usual suspects.