Thursday 5 August 2010

Explaining clustering to non-mathematicians and non-geographers 2

What I am trying to achieve with clustering is to group police forces together based on the similarities and differences in the data I selected from the HMIC value for money spreadsheet, as discussed in the previous post.

Importantly, each datum relates to a police force (via its row in the spreadsheet) and to a specified type of expenditure (via its column), and each datum has a numerical value.

Now this is what you must grasp: that numerical value (in this case always a percentage) becomes a co-ordinate, in much the same way as a map co-ordinate. On a flat two-dimensional map a location can be described by two co-ordinates: X and Y, latitude and longitude, and so on. By selecting any two of my 28 variables, the locations of the forces can be plotted on a flat "map". Forces whose variable values are close numerically are close together on the map; those that differ more numerically are further apart. If three variables are selected the map becomes three-dimensional. If I include all 28 variables the map becomes 28-dimensional. This is of course impossible for a human to visualise, and impossible to display, but it is easy for a modern computer to calculate with.
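For readers who like to see the arithmetic, here is a minimal sketch of the idea in Python. The force names and percentages are invented purely for illustration (they are not taken from the HMIC spreadsheet), and I am assuming ordinary straight-line (Euclidean) distance as the measure of "close"; other measures exist. The same function works whether two, three or 28 variables are used.

```python
import math

# Hypothetical forces with made-up percentages, for illustration only.
# Each tuple holds one value per variable, in the same order for every force.
forces = {
    "Force A": (10.0, 20.0, 5.0),
    "Force B": (13.0, 24.0, 5.0),
    "Force C": (40.0, 2.0, 30.0),
}

def distance(name1, name2, dims=None):
    """Straight-line (Euclidean) distance between two forces.

    dims limits the calculation to the first `dims` variables;
    None means use every variable available.
    """
    p = forces[name1][:dims]
    q = forces[name2][:dims]
    return math.dist(p, q)  # math.dist accepts any number of dimensions

# On a flat map built from the first two variables only:
print(distance("Force A", "Force B", dims=2))
# Using all three variables, A is much nearer to B than to C:
print(distance("Force A", "Force B"))
print(distance("Force A", "Force C"))
```

Adding more variables only makes each tuple longer; the calculation itself does not change.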

The fact that you must accept is that each force is very precisely located in this 28-dimensional map. Each force has one location, and this point will always be the same if the same variables are used with the same values. It does not matter if more forces are added or taken away; this will not change the location of the other forces. This mapping of the forces in imaginary space is the first part of the clustering process. You will not become confused later on if you tell yourself that the locations are fixed and do not move for the rest of the process.
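The point about fixed locations can be checked with a small sketch. Again the names and numbers are hypothetical, and I am assuming the raw percentages are used directly as co-ordinates, with no rescaling that depends on the other forces.

```python
import math

# Hypothetical forces located by two made-up percentages each.
forces = {
    "Force A": (10.0, 20.0),
    "Force B": (13.0, 24.0),
}

# Record where Force A sits, and how far it is from Force B.
location_before = forces["Force A"]
distance_before = math.dist(forces["Force A"], forces["Force B"])

# Add a third (equally hypothetical) force to the map.
forces["Force C"] = (40.0, 2.0)

location_after = forces["Force A"]
distance_after = math.dist(forces["Force A"], forces["Force B"])

# Nothing about A or B has moved.
print(location_before == location_after)
print(distance_before == distance_after)
```

A force's location depends only on its own values, so adding or removing other forces leaves it where it was.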


So the locations of the forces are now fixed for this calculation. The next part of the process is where variation in the end result is introduced. How do you decide how to cluster this set of dots into 2, 3, 4, 5, 6, 7 or more groups? That is for next time.
