It’s now a well-established habit in France: before each election, the media (re)discover the rise of the far right. Afterwards, maps flood the internet to provide the best analysis, such as:
Using some socio-economic features, we too can build a predictive model for the election outcomes.
A look at the original artwork French voters produced during the first round of the presidential election:
And focusing on the Front National:
Elementary, my dear Pearson.
Using data retrieved from the French national statistics office (INSEE), we focus on the two regions where the Front National got its best results:
- The north (Hauts de France)
- The south (Provence-Alpes-Côte d’Azur)
We can plot the most strongly correlated factors for each region:
In Provence, Le Pen scores high where:
- average education level is low,
- the number of private nurses is low,
- service sector is sparse,
- and, weirdly, altitude is low.
This last element deserves a short digression: André Siegfried, son of a minister, was a French sociologist and geographer at the beginning of the 20th century. After losing an election campaign, Siegfried investigated the possible relationship between geology and political orientation and came to the following conclusion: “granite votes right, limestone votes left”.
In the North, Le Pen scores high where:
- average education level is low,
- median income is low,
- unemployment is high.
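The rankings behind the two lists above boil down to a Pearson correlation between each predictor and the Front National share, computed town by town. Here is a minimal sketch of that idea in Python. It is not the code behind the charts: the feature names and the values for the five towns are invented for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented toy values for five hypothetical towns (NOT INSEE data).
fn_share = [0.35, 0.28, 0.22, 0.30, 0.18]
features = {
    "degree_share": [0.15, 0.22, 0.35, 0.20, 0.40],
    "unemployment": [0.16, 0.12, 0.08, 0.14, 0.07],
    "median_income": [17000, 19500, 24000, 18500, 26000],
}

# Rank the predictors by the absolute value of their correlation with the vote.
ranked = sorted(
    ((name, pearson(values, fn_share)) for name, values in features.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name:>15}: {r:+.2f}")
```

Sorting on the absolute value keeps strong negative correlations (education, income) next to strong positive ones (unemployment), which is what the charts are meant to show.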
Out of sheer curiosity, how about the relationship between the Front National vote and the percentage of immigrants?
It seems that the more immigrants there are in a town, the less inclined its voters are to choose the Front National.
And how about population density?
As pointed out by Hervé Le Bras, big cities offer more opportunities, thus lowering the Front National score. But interestingly enough, this is not the case in Provence.
Education seems to be the most prominent factor for Le Pen; how does it compare with Macron?
And incomes?
As Hervé Le Bras already mentioned, the poor have a low turnout. In the North, Le Pen scores well where unemployment is high.
Finally, how about elevation?
Indeed, comparing with another region with a high elevation gradient (Rhône-Alpes), Le Pen does score poorly at high altitude.
Another interesting observation concerns the Nice area, where unemployment is low and the Le Pen vote is high. This supports the thesis of a two-headed Front National: one in the North, more social, and another in the South, closer to Poujade. And following the latest developments in the French political landscape (as of May 2017), it might augur a schism within the Front National.
Creating a model
Well, now we have a dataset with 35,000 towns and, for each, 170 predictors (such as the number of camping pitches per inhabitant, the proportion of students or the local GDP).
Instead of going through the painful process of feature selection (as seen before, there is a lot of multicollinearity here) and regularization, we prefer to hop into the Land Cruiser of machine learning: XGB Tree.
For the detailed implementation of the algorithm, see here
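To give a feel for what a boosted-tree model does, here is a toy sketch of gradient boosting in plain Python. It is emphatically not XGBoost and not the model used here: it boosts one-split trees (stumps) on a binary target ("Le Pen first or not") instead of four candidates, and the two features and 300 towns are randomly generated stand-ins, so every name and number below is an assumption made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the (feature, threshold, left_mean, right_mean) split that
    minimises the squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    return best[1:]

def stump_value(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def boost(X, y, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the gradient of the log-loss
    (label minus predicted probability) and adds a shrunken copy of it."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        j, thr, left_mean, right_mean = fit_stump(X, residuals)
        stump = (j, thr, learning_rate * left_mean, learning_rate * right_mean)
        stumps.append(stump)
        scores = [s + stump_value(stump, row) for s, row in zip(scores, X)]
    return stumps

def predict_proba(stumps, row):
    return sigmoid(sum(stump_value(stump, row) for stump in stumps))

# Synthetic towns: (degree_share, unemployment), both drawn uniformly in [0, 1).
# Hypothetical rule: Le Pen comes first where unemployment exceeds degree share.
random.seed(0)
towns = [(random.random(), random.random()) for _ in range(300)]
labels = [1 if unemployment > degree_share else 0
          for degree_share, unemployment in towns]

train_X, test_X = towns[:200], towns[200:]
train_y, test_y = labels[:200], labels[200:]

stumps = boost(train_X, train_y)
hits = sum((predict_proba(stumps, row) >= 0.5) == bool(label)
           for row, label in zip(test_X, test_y))
accuracy = hits / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

A real implementation adds deeper trees, second-order gradients, regularization and column subsampling, which is what makes it robust to the multicollinearity mentioned above.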
Confusion Matrix and Statistics
    Overall Statistics

                   Accuracy : 0.7401
                     95% CI : (0.7316, 0.7484)
        No Information Rate : 0.5313
        P-Value [Acc > NIR] : < 2.2e-16

                      Kappa : 0.5864
     Mcnemar's Test P-Value : 6.207e-11

    Statistics by Class:

                         Class: FILLON Class: LE.PEN Class: MACRON Class: MÉLENCHON
    Sensitivity                0.59605        0.8698        0.6212          0.53140
    Specificity                0.93704        0.7925        0.9104          0.95703
    Pos Pred Value             0.64918        0.8261        0.6397          0.58214
    Neg Pred Value             0.92229        0.8430        0.9037          0.94772
    Prevalence                 0.16350        0.5313        0.2039          0.10125
    Detection Rate             0.09746        0.4621        0.1267          0.05381
    Detection Prevalence       0.15012        0.5594        0.1980          0.09243
    Balanced Accuracy          0.76655        0.8311        0.7658          0.74421
74% accuracy on the test set is not so bad, given that the model does not take local specificities and history into account (although we do use the outcome of the previous presidential election). But what is striking is the high sensitivity for Le Pen compared with her challengers, which might be related to the imbalance of the dataset: she finished first in half of the towns in the first round.
Regarding feature importance, the 2012 results are overwhelmingly the best predictors. Then come:
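For readers who want to see how the reported statistics hang together, the short Python sketch below recomputes the overall accuracy, the kappa and the balanced accuracies from the per-class figures. The numbers are copied from the output above; the variable names are ours, and small differences in the last digit come from rounding in the printed values.

```python
classes = ["FILLON", "LE.PEN", "MACRON", "MELENCHON"]

# Per-class figures as printed in the model output.
sensitivity = [0.59605, 0.8698, 0.6212, 0.53140]
specificity = [0.93704, 0.7925, 0.9104, 0.95703]
prevalence = [0.16350, 0.5313, 0.2039, 0.10125]
detection_prevalence = [0.15012, 0.5594, 0.1980, 0.09243]

# Detection rate: share of all towns that belong to the class AND are predicted as such.
detection_rate = [se * p for se, p in zip(sensitivity, prevalence)]

# Overall accuracy is the sum of the detection rates.
accuracy = sum(detection_rate)

# No Information Rate: accuracy of always predicting the most frequent class.
no_information_rate = max(prevalence)

# Cohen's kappa compares the accuracy with the agreement expected by chance.
expected = sum(p * d for p, d in zip(prevalence, detection_prevalence))
kappa = (accuracy - expected) / (1 - expected)

# Balanced accuracy is the mean of sensitivity and specificity.
balanced_accuracy = [(se + sp) / 2 for se, sp in zip(sensitivity, specificity)]

print(f"accuracy: {accuracy:.4f}")
print(f"no information rate: {no_information_rate:.4f}")
print(f"kappa: {kappa:.4f}")
for name, value in zip(classes, balanced_accuracy):
    print(f"balanced accuracy {name}: {value:.4f}")
```

The No Information Rate of 0.5313 is simply Le Pen's prevalence: a model that answered "Le Pen" for every town would already be right 53% of the time, which is why the 74% accuracy should be judged against that baseline rather than against 25%.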
- the ratio of self-employed people in the active population,
- population density,
- ratio of university degrees,
- average income per household.
So, on the scale of France, elevation does indeed play a significant role in the vote. To quote Hervé Le Bras: in communities far from the main transport routes, and thus less prone to mobility, social interactions are stronger and rumours less likely to spread.
- Nantes, un bastion socialiste partagé entre les votes Macron et Mélenchon - FR
- Exode urbain et inégalités : les cartes du vote FN
- Statistics per town - French National Institute of Statistics and Economic Studies (INSEE)
- Presidential elections results - first round
- The Economist : daily chart