Friday, 15 January 2010

Crime data like caviar please

One of the things that annoys me is that crime data are normally presented as rates per head of resident population. I understand that this is meant to aid comparison of administrative units, such as boroughs, with different population sizes. But it rarely makes sense, because it assumes that the only people who are victims of crime or who commit crime are those who reside within the boundaries. In London, with its huge transient population of commuters, business visitors, students, tourists and people seeking sport and entertainment, this is clearly not the case.

I like my crime data the way I like my caviar: raw and with good provenance. The new data store referred to in my previous post has raw as well as mucked-about data, and the provenance is not always clear.

I present the three maps above more to make a point than to illustrate facts, because I do not know how the numbers of alcohol-attributed violent crimes are calculated. I assume they are based on police-recorded crime but, because the figures are not integers, I assume some sort of sampling has taken place.

The point I wish to make is this. Using the raw data is valid in my view, so the top map gets a tick. Alcohol-attributed violent crime must be hugely influenced by non-resident populations, so in my view the second map is meaningless. The third map is the result of my putting together two datasets from the London Data Store. I use the number of bar employees as a proxy for the number of alcohol outlets in each borough, because I think it is reasonable to assume that alcohol-attributed violence is connected with people who have frequented such places. I therefore think my map is better for comparison purposes.
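
For anyone who wants to try the same comparison, here is a rough Python sketch of how the choice of denominator can change the picture. The borough names and every number in it are invented purely for illustration and are not taken from the London Data Store. The function names `rate_per` and `rank_by` are my own, and I am assuming the second map uses resident population as its denominator.

```python
# Hypothetical figures for illustration only; NOT real London Data Store values.
boroughs = {
    "Borough A": {"crimes": 1200.0, "residents": 200_000, "bar_employees": 6_000},
    "Borough B": {"crimes": 600.0, "residents": 250_000, "bar_employees": 1_000},
}


def rate_per(count, denominator, per=1000):
    """Return `count` expressed per `per` units of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return count / denominator * per


def rank_by(data, denominator_key):
    """Return borough names sorted from highest to lowest rate."""
    return sorted(
        data,
        key=lambda name: rate_per(data[name]["crimes"], data[name][denominator_key]),
        reverse=True,
    )


# Same raw counts, two different denominators.
by_residents = rank_by(boroughs, "residents")
by_bar_staff = rank_by(boroughs, "bar_employees")

print("Ranked per 1,000 residents:    ", by_residents)
print("Ranked per 1,000 bar employees:", by_bar_staff)
```

With these made-up numbers the two boroughs swap places depending on the denominator, which is the whole point: the denominator is itself a hypothesis about who is exposed to the crime, so it should be stated.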

Conclusion: only muck around with crime data if the adjustment is based on a well thought-out hypothesis, and state what that hypothesis is.

PS I don't eat caviar... but the point about crime data remains.
