In this post I outline how I have clustered the Council Wards in London based on the "violence against the person" incidents recorded in 2009. What I am clustering is the demand profile that these incidents present to the police. The idea is to find Wards in London that are similar in terms of the variables I use.
The choice of variables is critical. I have hopefully shown that Wards can be grouped by the number of relevant incidents in the year, split by whether these occur between midnight and 4am on Saturday and Sunday mornings or at other times. I am therefore using only two variables at this stage.
This means each Ward can be plotted on a two-dimensional graph, with one axis for the incidents in each of the two time periods. This is shown above.
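As a sketch of how the two variables could be derived from raw incident records, the function below counts, for each Ward, the incidents falling in the Saturday/Sunday midnight to 4am window and those falling at any other time. The input format is an assumption: a hypothetical list of `(ward, timestamp)` pairs, one per incident. I have also assumed "between midnight and 4am" means from 00:00 up to, but not including, 04:00.

```python
from collections import defaultdict
from datetime import datetime


def is_weekend_early_hours(ts: datetime) -> bool:
    """True if ts falls between midnight and 4am on a Saturday or Sunday morning."""
    # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6.
    # Assumption: the window is 00:00 to 03:59:59, i.e. 04:00 itself is excluded.
    return ts.weekday() in (5, 6) and ts.hour < 4


def ward_profiles(incidents):
    """Return {ward: (weekend_early_hours_count, other_times_count)}.

    incidents: hypothetical iterable of (ward_name, datetime) pairs.
    """
    counts = defaultdict(lambda: [0, 0])
    for ward, ts in incidents:
        if is_weekend_early_hours(ts):
            counts[ward][0] += 1
        else:
            counts[ward][1] += 1
    return {ward: (c[0], c[1]) for ward, c in counts.items()}


incidents = [
    ("Ward A", datetime(2009, 1, 3, 1, 30)),  # Saturday 01:30 -> early-hours window
    ("Ward A", datetime(2009, 1, 5, 14, 0)),  # Monday 14:00 -> other times
    ("Ward B", datetime(2009, 1, 4, 3, 59)),  # Sunday 03:59 -> early-hours window
]
profiles = ward_profiles(incidents)  # {"Ward A": (1, 1), "Ward B": (1, 0)}
```

Each Ward's pair of counts is one point on the two-dimensional graph, and the Ward name plus the two counts make up the three columns of the spreadsheet.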
I then load the simple three-column spreadsheet into SPSS 17 and perform two different types of clustering calculation. I have outlined in some detail how these calculations work in six posts starting here. I used two different methods: K-means and Ward's hierarchical clustering (do not be confused: Ward is the person who devised the method and has nothing to do with Council Wards; it's just a burden I have to bear when writing about my clustering analysis).
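For readers without SPSS, here is a minimal pure-Python sketch of the two techniques applied to a list of `(x, y)` points, one per Ward. It is not SPSS's implementation: the K-means starting centres are assumed to be randomly chosen Wards, and Ward's method is written naively, merging at each step the pair of clusters whose union increases the total within-cluster sum of squares the least, until `k` clusters remain.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's K-means: return one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # assumption: k random Wards as starting centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move again
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


def ward_hierarchical(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    members = [[i] for i in range(len(points))]  # start with one cluster per point
    centres = list(points)

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if clusters a and b merge.
        na, nb = len(members[a]), len(members[b])
        return na * nb / (na + nb) * sq_dist(centres[a], centres[b])

    while len(members) > k:
        a, b = min(((a, b) for a in range(len(members)) for b in range(a + 1, len(members))),
                   key=lambda ab: merge_cost(*ab))
        na, nb = len(members[a]), len(members[b])
        # The merged centroid is the size-weighted mean of the two centroids.
        centres[a] = ((na * centres[a][0] + nb * centres[b][0]) / (na + nb),
                      (na * centres[a][1] + nb * centres[b][1]) / (na + nb))
        members[a].extend(members[b])
        del members[b], centres[b]  # b > a, so index a is unaffected

    labels = [0] * len(points)
    for label, group in enumerate(members):
        for idx in group:
            labels[idx] = label
    return labels
```

The naive Ward's loop is roughly cubic in the number of Wards, so it is only meant to show the logic; a dedicated implementation will be much faster. Note that the cluster numbers each method returns are arbitrary.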
So this is what the map of London looks like when produced in ArcMap, along with the accompanying graphs from MS Excel.
To make the clustering take both variables equally into account, I rescale the x axis variable so that it is on the same scale as the y axis. Mathematically this is quite simple (I hope I have got this correct, having made that bold statement). First, for each Ward, I calculate the x axis value as a percentage of the y axis value, that is (x/y)*100. These percentages are then added together and divided by the number of Wards to give an average percentage, which in this case is 10.70% (to two decimal places). To put the x axis variable on the same scale as the y axis, 100%/10.70% gives a scaling factor of 9.34 (to two decimal places). The x variable incident count for each Ward is then multiplied by this 9.34.
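The rescaling arithmetic can be sketched in a few lines of Python. The function names are my own, and the sketch assumes every Ward has at least one incident on the y axis, since a Ward with y = 0 would make x/y undefined.

```python
def scale_factor(x_values, y_values):
    """100 divided by the mean, across Wards, of x as a percentage of y."""
    # Assumes every y > 0; a Ward with y == 0 raises ZeroDivisionError here.
    percentages = [(x / y) * 100 for x, y in zip(x_values, y_values)]
    mean_percentage = sum(percentages) / len(percentages)
    return 100 / mean_percentage


def rescale_x(x_values, y_values):
    """Multiply every Ward's x count by the common scale factor."""
    factor = scale_factor(x_values, y_values)
    return [x * factor for x in x_values]
```

For example, with two Wards where x = [1, 3] and y = [10, 10], the percentages are 10% and 30%, the average is 20%, the factor is 100/20 = 5, and `rescale_x` returns approximately [5, 15]. With an average of 10.70%, the same steps give a factor of about 9.3.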
The resulting three-column table is loaded into SPSS 17 as before, and the following maps and graphs are produced.
The clustering now takes both variables equally into account, but the two methods split the clusters differently, especially the brown and orange clusters. More on this in subsequent posts.
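Because cluster numbers (and so map colours) are arbitrary, cluster 1 from K-means need not correspond to cluster 1 from Ward's method. One possible way to pin down where the two methods diverge is to cross-tabulate the two sets of labels, counting how many Wards fall into each pair of clusters. The sketch below assumes both label lists cover the same Wards in the same order; the function names are hypothetical.

```python
from collections import Counter


def cross_tab(labels_a, labels_b):
    """Count Wards in each (method A cluster, method B cluster) combination."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same Wards in the same order")
    return Counter(zip(labels_a, labels_b))


def print_cross_tab(labels_a, labels_b):
    """Print the cross-tabulation with method A clusters as rows."""
    table = cross_tab(labels_a, labels_b)
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    print("A\\B  " + " ".join(f"{c:>4}" for c in cols))
    for r in rows:
        # Counter returns 0 for combinations that never occur.
        print(f"{r:>4} " + " ".join(f"{table[(r, c)]:>4}" for c in cols))


kmeans_labels = [0, 0, 1, 1, 2]
ward_labels = [1, 1, 0, 2, 2]
print_cross_tab(kmeans_labels, ward_labels)
```

In this made-up example, K-means cluster 0 maps cleanly onto Ward's cluster 1, while K-means cluster 1 is split between Ward's clusters 0 and 2, which is the kind of disagreement the brown and orange clusters show.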