An alternative approach, and one which is simpler mathematically, is the bottom-up or agglomerative approach. This starts with 43 separate clusters and, by merging clusters one pair at a time, ends up with one.
The important thing to grasp in this post is that the criterion for clustering is based on a measurement of the distance between clusters, and that this distance can be measured in various ways. The example I am illustrating was calculated using SPSS version 17. I have selected the squared Euclidean distance measure (recommended by SPSS; squaring gives longer distances more weight). Euclidean distance is basically the straight-line (shortest) distance between two points, and here it is measured to the centre (or centroid) of each cluster. The process uses the agglomerative approach.
The process starts with 43 clusters, each with a membership of one force. The centroid of each cluster is therefore the location of that force in the 28-dimensional space that the computer constructs by plotting the values of the 28 variables relating to each force. The computer is then asked to find the two closest centroids (in terms of Euclidean distance), which happen to be Bedfordshire and South Yorkshire (these are the most similar forces as far as expenditure patterns are concerned). It then merges the two clusters (now there are 42 clusters) and calculates the centroid of the new cluster. It then looks for the two closest clusters again. This time the Derbyshire and Kent clusters are merged and the centroid of that new cluster is calculated (now 41 clusters). The two closest clusters are found again. This time the Bedfordshire and South Yorkshire cluster is merged with Durham (40 clusters).
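The steps just described can be sketched in a few lines of Python. This is only a rough illustration of centroid-based merging with squared Euclidean distance, not what SPSS actually runs internally. The function names and the five two-dimensional points are made up for the example (the real data has 43 forces and 28 variables); the one thing borrowed from the SPSS table is that a merged cluster keeps the lower of the two cluster numbers.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean position of a list of points, taken coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(clusters, a, b):
    """Squared distance between the centroids of clusters a and b."""
    return sq_euclidean(centroid(clusters[a]), centroid(clusters[b]))


def agglomerate(data):
    """Repeatedly merge the two clusters whose centroids are closest.

    data maps a numeric label to a point. Returns one
    (cluster_a, cluster_b, coefficient) tuple per stage.
    """
    clusters = {label: [point] for label, point in data.items()}
    history = []
    while len(clusters) > 1:
        # Try every pair of current clusters and keep the closest one.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: centroid_distance(clusters, *pair),
        )
        coefficient = centroid_distance(clusters, a, b)
        # The merged cluster keeps the lower label (a < b here).
        clusters[a] = clusters[a] + clusters.pop(b)
        history.append((a, b, coefficient))
    return history


# Hypothetical toy data: five "forces" described by two variables each.
toy = {1: (0.0, 0.0), 2: (10.0, 10.0), 3: (0.0, 1.0),
       4: (10.0, 12.0), 5: (5.0, 5.0)}

for stage, (a, b, coefficient) in enumerate(agglomerate(toy), start=1):
    print(stage, a, b, round(coefficient, 2))
```

With five starting points there are four stages, just as 43 forces need 42 stages to end up in a single cluster.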
1 Avon & Somerset, 2 Bedfordshire, 3 Cambridgeshire, 4 Cheshire, 5 City of London, 6 Cleveland, 7 Cumbria, 8 Derbyshire, 9 Devon & Cornwall, 10 Dorset, 11 Durham, 12 Dyfed-Powys, 13 Essex, 14 Gloucestershire, 15 Greater Manchester, 16 Gwent, 17 Hampshire, 18 Hertfordshire, 19 Humberside, 20 Kent, 21 Lancashire, 22 Leicestershire, 23 Lincolnshire, 24 Merseyside, 25 Metropolitan Police, 26 Norfolk, 27 North Wales, 28 North Yorkshire, 29 Northamptonshire, 30 Northumbria, 31 Nottinghamshire, 32 South Wales, 33 South Yorkshire, 34 Staffordshire, 35 Suffolk, 36 Surrey, 37 Sussex, 38 Thames Valley, 39 Warwickshire, 40 West Mercia, 41 West Midlands, 42 West Yorkshire, 43 Wiltshire
This Agglomeration Table is produced by SPSS. It takes a little bit of understanding. I listed the forces in alphabetical order, so the cluster numbers relate to the forces as shown above (but remember that by stage 3 cluster 2 has two forces in it, 2 and 33; the last 3 columns help you keep track of this). The stages refer to the steps in the process, so stage 1 is when 42 clusters are formed from 43. The Coefficient column gives an indication of how good the fit of the clustering is: it is the distance between the two clusters merged at that stage, so a smaller value means a closer fit. For instance, it appears that 33 and 32 clusters (stages 10 and 11) are a better fit than 34 clusters (stage 9).
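To show how the cluster numbers in the table can be unpicked, here is a small sketch that replays the merges and lists who is in each cluster after a given stage. The function name is my own invention; the three merges are the ones described above (Bedfordshire 2 with South Yorkshire 33, Derbyshire 8 with Kent 20, then cluster 2 with Durham 11), and it assumes, as in the table, that a merged cluster keeps the lower number.

```python
def membership_after(merges, n_items, stage):
    """Replay the first `stage` merges and return {cluster number: members}."""
    clusters = {i: [i] for i in range(1, n_items + 1)}
    for a, b in merges[:stage]:
        keep, drop = min(a, b), max(a, b)  # lower number survives the merge
        clusters[keep] = sorted(clusters[keep] + clusters.pop(drop))
    return clusters


# The first three stages as described in the text.
first_three = [(2, 33), (8, 20), (2, 11)]

clusters = membership_after(first_three, 43, 3)
print(len(clusters))  # 40 clusters remain after stage 3
print(clusters[2])    # [2, 11, 33]
print(clusters[8])    # [8, 20]
```

Feeding in the whole table in the same way would give the membership at any stage, which is handy when checking the later rows by hand.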
I am interested in the Metropolitan Police (25), so it is the other end of the table that is of interest.
Cluster 3 (Cambridgeshire) is merged with cluster 37 (Sussex) at stage 33 (11 clusters). At stage 34 the Metropolitan Police (25) is merged with cluster 3. This means that the Metropolitan Police is one of the last forces to be put in a cluster with other forces, but it is by no means the most dissimilar force as far as expenditure is concerned. It probably comes about 7th in the list, behind Cumbria, City of London, Avon & Somerset, Warwickshire, North Wales and Norfolk. Interestingly, 6 clusters is a better fit than 7 clusters.
The maps of clusters 2 to 7 are displayed in the previous post. Even though the actual process is agglomerative, it is easier for non-mathematicians to visualise it as if it were divisive.