# Queensland state projections from Federal Senate voting data


With the resurgence of One Nation, there's some need to model how they'd perform running all over Queensland in a state election.

This being a resurgence, we don't have much by way of useful data from the last state election. Even the more recent Federal election isn't too much help, because they only ran in 12 of 30 federal divisions. It's certainly of some use, because they ran mostly in their stronger areas and are likely to do so again at the state election... but when we're trying to estimate seat polling from state-wide voteshare, we need a ground truth covering the entire state.

Enter Senate data, which is most definitely state-wide.

The Australian Electoral Commission (AEC) has published a spreadsheet ('Formal Preferences' for Queensland, 2016) which lists every formal Senate ballot's preference sequence and the polling place it was lodged at. ('Postal' etc. each count as a polling place, broken up by the voter's federal division.)

The AEC has also published a spreadsheet ('Votes by SA1') which, for each polling place, lists the number of House of Representatives votes from each Statistical Area level 1 (SA1, usually contains about 200-400 voters).

The Electoral Commission of Queensland (ECQ) has published a spreadsheet ('Current and Projected SA1 Enrolment') detailing, for each SA1 in Queensland, the state electoral district in which it now resides as of the final determination for the state redistribution made in May 2017. Some SA1s are (or were?) split between districts; each part of an SA1 has its own line in the spreadsheet.

Plan of attack: take polling place data, project it down onto the fine grain of the SA1s (or parts thereof), then aggregate into state electoral districts.

There will, of course, be a number of processing steps.

The first issue is that there are many more parties on the Senate ballot than there will be on the State ballot papers. We solve this problem by only considering the Senate preferences for a subset of the parties: {Greens, Labor, Liberal National, One Nation, and None of those four}. Voters may interleave preferences for these four parties with preferences for others; we consider only the earliest preference given to each of the four. That leaves 65 possible (potentially partial) orderings, detailed below:

``````
1 no-preferences:
(None)

4 one-preferences:
(Grn), (Lab), (Lnp), (Phn)

12 two-preferences:
(Grn, Lab), (Grn, Lnp), (Grn, Phn), (Lab, Grn), (Lab, Lnp), (Lab, Phn), (Lnp, Grn), (Lnp, Lab), (Lnp, Phn), (Phn, Grn), (Phn, Lab), (Phn, Lnp)

24 three-preferences:
(Grn, Lab, Lnp), (Grn, Lab, Phn), (Grn, Lnp, Lab), (Grn, Lnp, Phn), (Grn, Phn, Lab), (Grn, Phn, Lnp), (Lab, Grn, Lnp), (Lab, Grn, Phn), (Lab, Lnp, Grn), (Lab, Lnp, Phn), (Lab, Phn, Grn), (Lab, Phn, Lnp), (Lnp, Grn, Lab), (Lnp, Grn, Phn), (Lnp, Lab, Grn), (Lnp, Lab, Phn), (Lnp, Phn, Grn), (Lnp, Phn, Lab), (Phn, Grn, Lab), (Phn, Grn, Lnp), (Phn, Lab, Grn), (Phn, Lab, Lnp), (Phn, Lnp, Grn), (Phn, Lnp, Lab)

24 four-preferences:
(Grn, Lab, Lnp, Phn), (Grn, Lab, Phn, Lnp), (Grn, Lnp, Lab, Phn), (Grn, Lnp, Phn, Lab), (Grn, Phn, Lab, Lnp), (Grn, Phn, Lnp, Lab), (Lab, Grn, Lnp, Phn), (Lab, Grn, Phn, Lnp), (Lab, Lnp, Grn, Phn), (Lab, Lnp, Phn, Grn), (Lab, Phn, Grn, Lnp), (Lab, Phn, Lnp, Grn), (Lnp, Grn, Lab, Phn), (Lnp, Grn, Phn, Lab), (Lnp, Lab, Grn, Phn), (Lnp, Lab, Phn, Grn), (Lnp, Phn, Grn, Lab), (Lnp, Phn, Lab, Grn), (Phn, Grn, Lab, Lnp), (Phn, Grn, Lnp, Lab), (Phn, Lab, Grn, Lnp), (Phn, Lab, Lnp, Grn), (Phn, Lnp, Grn, Lab), (Phn, Lnp, Lab, Grn)
``````

The difference between the latter two sets is that some people chose to exhaust their vote rather than bothering to rank their least-preferred party of the four.

A Python script handily classifies every ballot into one of those 65 orderings and then summarises by polling place. I wouldn't want to do this analysis for additional parties — for five, there would be 326 possible partial orderings! At that point, and especially given issue (5), it would be better to just use primaries and set statewide preference flows.
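The classification described above could be sketched roughly as follows. This is not the original script: the ballot representation (a list of `(preference_number, party_code)` pairs) and the names `partial_orderings` and `classify` are assumptions made for illustration, and the real AEC file would need parsing into that shape first.

```python
from itertools import permutations

PARTIES = ("Grn", "Lab", "Lnp", "Phn")

def partial_orderings(parties):
    """Every ordered selection of 0..len(parties) distinct parties."""
    return [p for k in range(len(parties) + 1)
            for p in permutations(parties, k)]

def classify(ballot):
    """Reduce one ballot to its ordering over PARTIES.

    `ballot` is a hypothetical representation: an iterable of
    (preference_number, party_code) pairs. Only the earliest
    preference for each of the four parties is kept; anything
    else on the ballot is ignored.
    """
    earliest = {}
    for pref, party in ballot:
        if party in PARTIES and pref < earliest.get(party, float("inf")):
            earliest[party] = pref
    # Parties ordered by their earliest preference; () means "None".
    return tuple(sorted(earliest, key=earliest.get))

# A ballot that ranks a minor party first, then Labor, Greens,
# Labor again, and One Nation, without ever ranking the LNP.
example = [(1, "Oth"), (2, "Lab"), (3, "Grn"), (4, "Lab"), (5, "Phn")]
example_ordering = classify(example)
```

Counting the output of `partial_orderings` is a quick way to check the 65 and 326 figures quoted above.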

At this stage we could also arguably include informal Senate ballots (which either have no identifiable first preference marking at all, or are otherwise disqualified) as votes for `None`. We can't include non-voters as `None`, however, because we don't know where they would have voted.

The second issue to deal with is that Senate turnout is slightly higher than House turnout (and more worryingly, the House turnout numbers from the spreadsheet don't actually match the published total). We resolve that issue by allocating per-booth proportions of each vote-order (so if a polling booth had equal quantities of each vote-order, and precisely one person from a certain SA1 voted there, that SA1 would be credited with 1/65th of a vote in each category).
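A minimal sketch of that proportional allocation, assuming the per-booth ordering counts and the 'Votes by SA1' figures have already been loaded into plain dictionaries. The function names and the dictionary layouts are illustrative assumptions, not the structure of the actual spreadsheets.

```python
from collections import defaultdict

def booth_proportions(booth_counts):
    """Turn {booth: {ordering: senate_ballots}} into per-booth shares."""
    shares = {}
    for booth, counts in booth_counts.items():
        total = sum(counts.values())
        if total:  # skip booths with no formal Senate ballots
            shares[booth] = {o: n / total for o, n in counts.items()}
    return shares

def project_to_sa1(shares, house_votes):
    """Credit each SA1 with fractional votes per ordering.

    `house_votes` maps (sa1, booth) to the number of House votes
    that SA1 cast at that booth (hypothetical layout).
    """
    sa1_votes = defaultdict(lambda: defaultdict(float))
    for (sa1, booth), n in house_votes.items():
        for ordering, share in shares.get(booth, {}).items():
            sa1_votes[sa1][ordering] += n * share
    return sa1_votes

# Toy data: two booths, one SA1 that voted at both.
booth_counts = {
    "A": {("Lab",): 30, ("Lnp",): 10},
    "B": {("Lab",): 1, ("Lnp",): 1},
}
house_votes = {("S1", "A"): 4, ("S1", "B"): 2}
sa1_votes = project_to_sa1(booth_proportions(booth_counts), house_votes)
```

Because only proportions are taken from the Senate data, the mismatch between Senate and House turnout never enters the totals: each SA1 ends up with exactly as many votes as the House spreadsheet credits it with.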

The third issue is that the ECQ has different electors-per-SA1 numbers than the AEC does. There are two reasons: the AEC figures count turnout at the July election, while the ECQ figures count enrolment about six months later, so the ECQ numbers should be higher. We solve this issue by scaling the federal votes on an SA1-by-SA1 basis.

The fourth issue lies in dealing with SA1s being split between state districts. This is actually quite simple to deal with: the ECQ publishes how many electors will be in each part of the SA1, so just split the votes accordingly.

Actually, issues 3 and 4 can be combined in the one step: simply scale the federal votes by `ECQ SA1 [part] population / AEC SA1 total`.
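The combined scaling-and-splitting step might look something like the sketch below. It assumes "AEC SA1 total" means the total House votes recorded for the SA1 across all polling places, and that the ECQ spreadsheet has been reduced to `(sa1, district, electors)` rows, one per SA1 part. Both the function name and those layouts are assumptions for illustration.

```python
from collections import defaultdict

def aggregate_to_districts(sa1_votes, aec_sa1_total, ecq_parts):
    """Scale each SA1's votes by ECQ part electors / AEC SA1 total
    and sum the results into state electoral districts.

    sa1_votes:     {sa1: {ordering: fractional federal votes}}
    aec_sa1_total: {sa1: total federal votes recorded for that SA1}
    ecq_parts:     iterable of (sa1, district, electors) rows
    """
    district_votes = defaultdict(lambda: defaultdict(float))
    for sa1, district, electors in ecq_parts:
        total = aec_sa1_total.get(sa1, 0)
        if not total:
            continue  # no federal votes for this SA1, so nothing to scale
        factor = electors / total
        for ordering, votes in sa1_votes.get(sa1, {}).items():
            district_votes[district][ordering] += votes * factor
    return district_votes

# Toy data: one SA1 split between two (made-up) districts.
sa1_votes = {"S1": {("Lab",): 4.0, ("Lnp",): 2.0}}
aec_sa1_total = {"S1": 6}
ecq_parts = [("S1", "Alpha", 6), ("S1", "Beta", 3)]
districts = aggregate_to_districts(sa1_votes, aec_sa1_total, ecq_parts)
```

An SA1 that appears in the ECQ data but has no federal votes is skipped here; how best to handle such SA1s is a separate judgment call that this sketch does not attempt to settle.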