I have decided that the best way to explain this is by reference to the methodology of the British Crime Survey (BCS). The BCS is a survey of the general adult population of England and Wales that selects a sample of about 44,000 people a year to represent a total population of about 44 million. The survey asks the selected representative of each selected household whether they have been a victim of crime. Because the interviewer cannot assume that the person being interviewed has specialist knowledge of criminal offences within the legal system of England and Wales, a standard questionnaire asks its questions in everyday English.
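To give a feel for the scale involved, here is a quick back-of-the-envelope calculation using the two approximate figures quoted above. It assumes every respondent counts equally, which is a simplification on my part; real surveys generally apply weights, so treat this as a rough sketch rather than how the BCS actually grosses up its results.

```python
# Approximate figures quoted in the text.
sample_size = 44_000
population = 44_000_000

# Share of the population that is interviewed in a year.
sampling_fraction = sample_size / population

# Under the (simplifying) equal-weight assumption, the number of
# people each respondent stands in for.
people_per_respondent = population // sample_size

print(f"sampling fraction: {sampling_fraction:.4f}")
print(f"people per respondent: {people_per_respondent}")
```

In other words, roughly one adult in a thousand is interviewed each year.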
As a bit of an aside, the BCS is a masterpiece of covering just about every eventuality, as one would expect from a piece of work that has been running for almost 30 years and has undergone numerous rigorous reviews of its methodology.
One of the first issues it covers is whether the person has moved within the relevant period, that is, the last 12 months. This affects household victimisation, hence the slightly odd set of questions asked early on about crime types. These are listed in the table shown below.
The BCS dataset lists the numbers as options for the variable `Crimtype', with the BCS data dictionary giving the names shown in the second column. I have examined the 25 sets of questions that establish whether the household or individual has been subject to various events and, if so, how often. I have then used my specialist crime knowledge to explain the options. The way the questions are phrased allows burglaries to be identified. It is not possible, though, to differentiate between a theft from the person and a robbery, because at this stage the violence questions are dealt with separately from the theft offences.
The table above should be read from left to right. The gaps signify where there is no comparable data. Even where there is comparable data, there are caveats, which I will cover in subsequent posts. What I want to point out is that the initial question (which seems intended to identify potential victims), with a little bit of recoding, produces good results, i.e. results consistent with published results. I do concede, however, that the dataset I am dealing with only includes those cases that were eventually deemed to be offences.
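For anyone wondering what "a little bit of recoding" looks like in practice, here is a minimal sketch in Python. The code numbers, the group labels and the `recode_counts` helper are all placeholders I have made up for illustration; they are not the real `Crimtype` values from the BCS data dictionary, and this is not the exact procedure I ran.

```python
from collections import Counter

# HYPOTHETICAL mapping from screener codes to broad offence groups.
# The numbers are invented for illustration only.
RECODE = {
    1: "burglary",
    2: "burglary",
    3: "theft person or robbery",  # cannot be separated at the screener stage
}

def recode_counts(records):
    """Sum incident counts per broad group; unmapped codes go to 'other'."""
    totals = Counter()
    for code, n_incidents in records:
        totals[RECODE.get(code, "other")] += n_incidents
    return dict(totals)

# Each record is (screener code, number of incidents reported).
sample = [(1, 2), (2, 1), (3, 3), (9, 1)]
result = recode_counts(sample)
print(result)
```

The point is simply that several screener options collapse into one published category, and the totals for each category can then be set against the published figures.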