
Wednesday, 4 August 2010

Explaining clustering to non-mathematicians and non-geographers 1


This is really my project for the summer - learning and understanding statistical clustering, classification and ranking of multiple variables based on location.

I am going to attempt to explain why I am doing this, and how, in a non-mathematical way. If you want it explained in numbers and formulae, I recommend my hero in these matters, Dr Kardi Teknomo, who gently guides you through the subject in his brilliant tutorials, which can be found here.

I have drawn your attention to HMIC reports and an Audit Commission report in recent posts. These compare different aspects of the performance and expenditure of Home Office police forces in England and Wales. As each of these police forces has its own geographic jurisdiction, the comparisons are geographically based. I am using data from the HMIC police force value for money profiles that can be found here, and specifically from the spreadsheet that can be found here. My interest is specifically in the Metropolitan Police Service (MPS), which has responsibility for London. I had a quick look through the report. The first page has the following:

My impression of the data presented was that the MPS is nothing like the forces it is grouped with. So I thought I would try to group the most similar forces together using data from the spreadsheets and standard clustering techniques, and at the same time learn how to apply the most useful of them to my data from Camden and the rest of London.

If you have been following my blog you will know that I do not like comparisons that are standardised using "per head of resident population". This is particularly unfair on London and other places whose actual population is greatly swelled by those working, shopping, being entertained and holidaying there. A compromise is made for the City of London, whose figures are calculated on a population of 316,500 when in fact the resident population is about 8,000, but no such adjustment is made anywhere else. So I disagree with this statement shown on the first page of charts:

 
I have therefore steered away from using data standardised in this way. What I have used instead are data that show how expenditure is allocated as a percentage of another figure. These proportions overcome the problem that the MPS is so much bigger than any other force. The data used are "non-staff costs as % of staff cost" (columns AX to BE) and "Supplies and Services as % of staff costs" (columns BQ to CA) on the first sheet of the spreadsheet, and "% of the total Police Officer and PCSO workforce by rank" (columns AD to AN) on the second sheet. This amounted to 28 variables to be used in the clustering process.
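To illustrate the difference, here is a small Python sketch. The two population figures for the City of London are the ones quoted above; the incident count and the cost figures are made up purely for illustration and are not taken from the HMIC spreadsheet. The function names are also my own, chosen only for this example.

```python
# Population figures quoted in this post for the City of London.
RESIDENT_POPULATION = 8_000
ADJUSTED_POPULATION = 316_500

# Hypothetical count, for illustration only (not HMIC data).
incidents = 1_000


def per_1000_population(count, population):
    """Rate per 1,000 head of population."""
    return count * 1000 / population


# The same count gives very different rates depending on the
# population figure chosen as the denominator.
rate_resident = per_1000_population(incidents, RESIDENT_POPULATION)
rate_adjusted = per_1000_population(incidents, ADJUSTED_POPULATION)


def pct_of_staff_cost(other_cost, staff_cost):
    """Express one cost as a percentage of staff cost."""
    return other_cost * 100 / staff_cost


# Two hypothetical forces, one twenty times the size of the other,
# spending their money in the same proportions (made-up figures).
small_force = pct_of_staff_cost(5_000_000, 25_000_000)
large_force = pct_of_staff_cost(100_000_000, 500_000_000)

print(f"Rate per 1,000 residents:           {rate_resident:.1f}")
print(f"Rate per 1,000 adjusted population: {rate_adjusted:.1f}")
print(f"Small force, non-staff cost as % of staff cost: {small_force:.1f}")
print(f"Large force, non-staff cost as % of staff cost: {large_force:.1f}")
```

The per-head rate depends entirely on which population figure is chosen, whereas the percentage is the same for both forces regardless of their size.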

I reason that how a police force chooses to spend the money allocated to it provides an indication of how similar or different it is to other forces in its management, structure and priorities. I am interested in which forces, on this basis, are most similar to the MPS and, more generally, which forces are most different from all the others.
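As a rough sketch of what "most similar" and "most different" can mean, the Python below treats each force as a list of percentages and measures the straight-line (Euclidean) distance between them. The force names and figures are invented for illustration and are not HMIC data. Euclidean distance is assumed here only because it is one common choice; it is not necessarily the measure I end up using.

```python
import math

# Hypothetical forces with made-up percentages (not HMIC data).
# Each list holds the same three variables in the same order.
forces = {
    "Force A": [20.0, 8.0, 75.0],
    "Force B": [21.0, 9.0, 74.0],
    "Force C": [35.0, 15.0, 60.0],
    "Force D": [19.0, 7.5, 76.0],
}


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(target, profiles):
    """Return (name, distance) pairs for every other force, closest first."""
    target_profile = profiles[target]
    distances = [
        (name, euclidean_distance(target_profile, profile))
        for name, profile in profiles.items()
        if name != target
    ]
    return sorted(distances, key=lambda item: item[1])


def mean_distance_to_others(name, profiles):
    """Average distance from one force to all of the others."""
    ranked = rank_by_similarity(name, profiles)
    return sum(distance for _, distance in ranked) / len(ranked)


# Which forces are most like Force A?
ranking = rank_by_similarity("Force A", forces)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")

# Which force is, on average, furthest from all the others?
most_different = max(forces, key=lambda n: mean_distance_to_others(n, forces))
print("Most different overall:", most_different)
```

A small distance means two forces divide up their spending and workforce in much the same way. Clustering techniques build on this kind of distance to put similar forces into the same group.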

I will explain how this is done with reference to the maps above and others in Part 2.
