Wednesday, 22 September 2010

What is the difference between clustering and classification?

This is my understanding of the answer to the question I posed. I have been doing a lot of clustering, but in this post I presented my first classification. This is not my final version; I have more potential variables up my sleeve. The classification is a description of the variables based on their membership of a cluster. I now know why creators of geodemographics almost exclusively use the K-means method of clustering: it outputs the final centre of each cluster for each variable. This gives a pretty good idea of the nature of, in our case, the Wards in each cluster. This is the table for my classification:

The variables were adjusted around the mean, with 1 representing the mean value. This was done so that each variable was given equal weight in the clustering process. The description I gave to each cluster was based on the information in this table. For instance, cluster 1 is high in all three variables, so it represents areas where violence is common, hence my description. Cluster 4 seemed to me to be pub and club venues, judging by the very high value of the second variable.
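To make the process concrete, here is a minimal sketch of the two steps described above: scaling each variable so that 1 represents its mean, then running a basic K-means and reading off the final cluster centres. It is not the software I used. The Ward names, the numbers, the choice of three clusters and the helper names (`scale_by_mean`, `k_means`) are all invented for illustration.

```python
import random

# Hypothetical Ward-level data: three unnamed variables per Ward.
# All values are invented for illustration only.
wards = {
    "Ward A": [120.0, 15.0, 40.0],
    "Ward B": [110.0, 14.0, 38.0],
    "Ward C": [30.0, 60.0, 12.0],
    "Ward D": [28.0, 65.0, 10.0],
    "Ward E": [20.0, 5.0, 8.0],
    "Ward F": [22.0, 6.0, 9.0],
}

def scale_by_mean(rows):
    """Divide each column by its mean, so that 1.0 represents the mean value."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    return [[r[j] / means[j] for j in range(n_cols)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(rows, k, max_iter=100, seed=0):
    """Basic K-means (Lloyd's algorithm). Returns (centres, labels)."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points.
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every row to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centres[c])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centres are final
        labels = new_labels
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels

names = list(wards)
scaled = scale_by_mean([wards[n] for n in names])
centres, labels = k_means(scaled, k=3)

# The table of final centres: one row per cluster, one column per variable.
for c, centre in enumerate(centres, start=1):
    print(f"cluster {c}: " + "  ".join(f"{v:.2f}" for v in centre))
```

A centre value above 1 means the cluster is above the overall mean on that variable, and a value below 1 means it is below. That is what makes the table readable as a description.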

I am not saying my descriptions are perfect; I am explaining the process. Of course, the members of each cluster lie at varying distances from the cluster centre (this information is also given) and therefore will not necessarily match the mean characteristics of the cluster exactly. It should also be borne in mind that it is the dominant relevant features of the Ward that are being described; it is highly unlikely that all parts of the Ward will comply with that description. We then get into a discussion about the scale of analysis, which I have discussed before on this blog.
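As a rough illustration of the point about distance, the sketch below computes how far each member lies from its cluster centre. The centres, the Ward names and their values are hypothetical, and I am assuming plain Euclidean distance on the mean-scaled variables; a given software package may report this differently.

```python
import math

# Hypothetical final centres for two clusters (mean-scaled, so 1.0 = mean).
centres = {
    1: [1.8, 1.7, 1.9],
    4: [1.0, 3.2, 0.9],
}

# Hypothetical members: Ward name -> (cluster number, mean-scaled values).
members = {
    "Ward A": (1, [1.8, 1.7, 1.9]),  # sits exactly on the centre
    "Ward B": (1, [1.4, 2.0, 1.5]),
    "Ward C": (4, [1.1, 3.0, 0.8]),
    "Ward D": (4, [0.6, 4.1, 1.3]),
}

def distance_to_centre(values, centre):
    """Euclidean distance between a member and its cluster centre."""
    return math.sqrt(sum((v - c) ** 2 for v, c in zip(values, centre)))

distances = {
    name: distance_to_centre(values, centres[cluster])
    for name, (cluster, values) in members.items()
}

# List the members from most typical (closest) to least typical (furthest).
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: cluster {members[name][0]}, distance {d:.2f}")
```

The further a Ward is from its centre, the less well the cluster description fits it.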

So what is the answer to the question? Classification is the next stage in the process after clusters have been created; classification involves the description of clusters based on the characteristics of the cluster members.
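The description step can be sketched in code too, although in practice I wrote my descriptions by eye. The example below turns a cluster centre into a crude label. The centres, the variable names and the cut-offs of 1.5 and 0.5 are arbitrary assumptions made for illustration, not the values from my table.

```python
# Hypothetical variable names and mean-scaled cluster centres (1.0 = mean).
variable_names = ["variable 1", "variable 2", "variable 3"]
centres = {
    1: [1.8, 1.7, 1.9],
    4: [1.0, 3.2, 0.9],
}

def describe(centre, names, high=1.5, low=0.5):
    """Label each variable as high, average or low relative to the mean.

    The cut-offs are arbitrary illustrative choices.
    """
    parts = []
    for name, value in zip(names, centre):
        if value >= high:
            level = "high"
        elif value <= low:
            level = "low"
        else:
            level = "average"
        parts.append(f"{level} {name}")
    return ", ".join(parts)

for cluster, centre in centres.items():
    print(f"cluster {cluster}: {describe(centre, variable_names)}")
```

The labels are only a starting point; the judgement about what a "high second variable" means on the ground still has to come from the analyst.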

1 comment:

  1. Pattern Classification
    The process of separating data into categories, or classes, characterized by a distinct set of features.
    It pertains to a known number of groups, and the objective is to assign new data points to one of these groups.
    Clustering, by contrast, is the process of partitioning objects into groups called clusters whose members are similar in some way, so that the data points in a group are similar to one another, while those in distinct groups are not.