Crime pattern detection is one of the many applications of data mining; it deals with the study, analysis and understanding of crime through the discovery of patterns or other significant insights in data. Detecting a pattern of criminal activity, or even predicting one, might sound like something taken from a movie. However, it is a real application of data mining, and many research groups are working on it.
In this report I will introduce the concept of time series analysis using base R (the ts() and decompose() functions from the stats package that ships with R) to inspect the murder history of Puerto Rico from 2007 to 2014.
This report’s dataset features the total number of murders committed in Puerto Rico during the period 2007 - 2014, broken down by month and year.
Let’s take a look at it.
pr_crime_table <- read.csv("~/Desktop/pr_crime_table.csv", row.names = 1)
print(pr_crime_table)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2007 66 55 67 47 62 52 41 60 69 65 78 68
## 2008 57 74 76 57 84 52 71 64 70 68 72 70
## 2009 56 80 75 63 91 79 63 75 84 84 85 59
## 2010 76 78 67 64 99 82 97 94 67 76 93 90
## 2011 112 94 99 70 92 107 100 95 97 97 87 86
## 2012 88 83 73 63 70 97 88 89 70 70 83 104
## 2013 76 75 64 69 75 75 72 96 65 88 66 62
## 2014 55 60 48 54 61 71 51 62 45 52 57 65
The first row of the dataset (the header) contains the months, and the first column (the row names) contains the years. For example, in January 2007 there were 66 registered murders.
The data was obtained from the Instituto de Estadisticas de Puerto Rico (Statistics Institute of Puerto Rico).
Now that we have seen what the data looks like, let’s begin the analysis by building the time series object. To build it we need a vector of the values to be analyzed (the monthly murder counts), the number of observations per year (the frequency argument in the code, which is 12 since we are using monthly data), and the starting period (January 2007).
# Flatten the table row by row: t() transposes it so that c() reads each year's
# twelve months in order; ts() then assigns the dates from start and frequency
ts.model <- ts(c(t(pr_crime_table)), frequency = 12, start = c(2007, 1))
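The transpose in the call above matters because R flattens tables column by column. The following is a minimal sketch on a toy table (the values and the names toy, by_column and by_row are made up for illustration) showing the difference between the two orders:

```r
# Toy table with hypothetical values: rows are years, columns are months
toy <- data.frame(Jan = c(1, 4), Feb = c(2, 5), Mar = c(3, 6),
                  row.names = c("2007", "2008"))

# Flattening without transposing walks down each column first,
# which would interleave the years
by_column <- unlist(toy, use.names = FALSE)

# Transposing first puts each year in its own column, so the flattened
# vector is in chronological order
by_row <- c(t(toy))

print(by_column)
print(by_row)
```

Only the second vector is in the order ts() expects, since ts() simply assigns consecutive time stamps to the values it receives.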
Once the time series is built, the next step is to decompose it into three components: seasonal, trend and irregular. I will explain each of them in the following sections.
To decompose the data into these components, I used decompose(), which relies on moving averages: a moving average is the mean of the values inside a window that slides along the series. With the default settings the decomposition is additive, so each observation is treated as the sum of its trend, seasonal and irregular parts.
decomposition.ma <- decompose(ts.model)
names(decomposition.ma)
## [1] "x" "seasonal" "trend" "random" "figure" "type"
The previous output shows the components of the decomposed object. As mentioned before, our objective was to decompose the time series into a seasonal, a trend and an irregular (named random in the object) component.
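To make the moving average idea concrete, here is a sketch that recomputes the trend by hand and compares it with the one returned by decompose(). The data is retyped from the table above so that the snippet runs without the CSV file; the names murders, w and trend.manual are illustrative, and the weights assume the default additive decomposition of a monthly series.

```r
# Monthly murder counts, retyped from the table above (one row per year)
murders <- c(
  66, 55, 67, 47, 62, 52, 41, 60, 69, 65, 78, 68,      # 2007
  57, 74, 76, 57, 84, 52, 71, 64, 70, 68, 72, 70,      # 2008
  56, 80, 75, 63, 91, 79, 63, 75, 84, 84, 85, 59,      # 2009
  76, 78, 67, 64, 99, 82, 97, 94, 67, 76, 93, 90,      # 2010
  112, 94, 99, 70, 92, 107, 100, 95, 97, 97, 87, 86,   # 2011
  88, 83, 73, 63, 70, 97, 88, 89, 70, 70, 83, 104,     # 2012
  76, 75, 64, 69, 75, 75, 72, 96, 65, 88, 66, 62,      # 2013
  55, 60, 48, 54, 61, 71, 51, 62, 45, 52, 57, 65       # 2014
)
ts.model <- ts(murders, frequency = 12, start = c(2007, 1))

# Centered 12-month moving average: 13 weights, with half weight on the
# two end months so the window is symmetric around the current month
w <- c(0.5, rep(1, 11), 0.5) / 12
trend.manual <- stats::filter(ts.model, w, sides = 2)

# Compare with the trend computed by decompose()
same.trend <- isTRUE(all.equal(as.numeric(trend.manual),
                               as.numeric(decompose(ts.model)$trend)))
print(same.trend)
```

Because the window needs six months on each side, the first and last six values of the trend are NA.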
The seasonal component is the pattern that repeats every year: for each month, how far the murder count typically sits above or below the trend.
plot(decomposition.ma$figure, type = 'b', xaxt = 'n', xlab = '',
     ylab = 'Difference of murders',
     main = 'Difference of murders from the global average')
months.names <- months(ISOdate(2007,1:12,1))
axis(1, at=1:12, labels=months.names, las=2)
The previous plot shows the seasonal component of the data. What you see in the figure is the typical shape of murder activity within a year, estimated from eight years of data. The x-axis represents the months of the year, and the y-axis shows, for each month, the average difference between that month’s count and the trend, shifted so that the twelve values sum to zero. A positive value means the month tends to have more murders than the trend alone would suggest; a negative value means fewer.
The plot shows that from February to April the number of murders decreases until it reaches its minimum of -13.5 in April. Then, in the following month, May, the seasonal value jumps by about 19, to 5.6, the highest of the year. Thus the least active month and the most active month are back to back. From May until the end of the year the values stay within a much narrower band, roughly between -2 and 6. The following output shows the exact values of the plot.
print(decomposition.ma$figure)
## [1] -2.3536706 1.0034722 -4.8655754 -13.5024802 5.5570437
## [6] 4.4141865 -0.5143849 5.3784722 -1.8239087 1.9618056
## [11] 4.2118056 0.5332341
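The seasonal values can also be reproduced by hand, which helps to see what they measure. The sketch below assumes the default additive decomposition and retypes the data from the table above; the names detrended, month.means and figure.manual are illustrative.

```r
# Monthly murder counts, retyped from the table above (one row per year)
murders <- c(
  66, 55, 67, 47, 62, 52, 41, 60, 69, 65, 78, 68,      # 2007
  57, 74, 76, 57, 84, 52, 71, 64, 70, 68, 72, 70,      # 2008
  56, 80, 75, 63, 91, 79, 63, 75, 84, 84, 85, 59,      # 2009
  76, 78, 67, 64, 99, 82, 97, 94, 67, 76, 93, 90,      # 2010
  112, 94, 99, 70, 92, 107, 100, 95, 97, 97, 87, 86,   # 2011
  88, 83, 73, 63, 70, 97, 88, 89, 70, 70, 83, 104,     # 2012
  76, 75, 64, 69, 75, 75, 72, 96, 65, 88, 66, 62,      # 2013
  55, 60, 48, 54, 61, 71, 51, 62, 45, 52, 57, 65       # 2014
)
ts.model <- ts(murders, frequency = 12, start = c(2007, 1))
dec <- decompose(ts.model)

# Step 1: remove the trend from the original series
detrended <- as.numeric(ts.model - dec$trend)

# Step 2: average the detrended values month by month
# (cycle() gives the month index 1..12 of every observation)
month.index <- as.integer(cycle(ts.model))
month.means <- tapply(detrended, month.index, mean, na.rm = TRUE)

# Step 3: center the twelve averages so that they sum to zero
figure.manual <- as.numeric(month.means - mean(month.means))

same.figure <- isTRUE(all.equal(figure.manual, as.numeric(dec$figure)))
print(same.figure)

# Size of the jump between April (month 4) and May (month 5)
print(figure.manual[5] - figure.manual[4])
```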
The trend component is the long-run movement of the murder counts across the complete dataset, with the seasonal swings and the month-to-month noise smoothed out.
plot(decomposition.ma$trend, xlab = 'Year', ylab = 'Number of murders',
     main = 'Trend of murders in Puerto Rico during 2007 - 2014')
We can see in this plot that the number of murders increases until it reaches a maximum in mid-2011, which agrees with 2011 being the year with the most murders. After that the number declines.
The following output presents the total murders per year. If we compare those numbers with the plot, we can see that they follow the same pattern: from 2007 to 2011 the yearly total rises every year, from 730 up to 1136 (the peak on the plot), and from 2011 to 2014 it falls every year, down to 681.
rowSums(pr_crime_table)
## 2007 2008 2009 2010 2011 2012 2013 2014
## 730 815 894 983 1136 978 883 681
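The same yearly totals can be obtained directly from the time series object, without going back to the table. This is a small sketch using aggregate() on the series, with the data retyped from the table above; the names yearly and peak.year are illustrative.

```r
# Monthly murder counts, retyped from the table above (one row per year)
murders <- c(
  66, 55, 67, 47, 62, 52, 41, 60, 69, 65, 78, 68,      # 2007
  57, 74, 76, 57, 84, 52, 71, 64, 70, 68, 72, 70,      # 2008
  56, 80, 75, 63, 91, 79, 63, 75, 84, 84, 85, 59,      # 2009
  76, 78, 67, 64, 99, 82, 97, 94, 67, 76, 93, 90,      # 2010
  112, 94, 99, 70, 92, 107, 100, 95, 97, 97, 87, 86,   # 2011
  88, 83, 73, 63, 70, 97, 88, 89, 70, 70, 83, 104,     # 2012
  76, 75, 64, 69, 75, 75, 72, 96, 65, 88, 66, 62,      # 2013
  55, 60, 48, 54, 61, 71, 51, 62, 45, 52, 57, 65       # 2014
)
ts.model <- ts(murders, frequency = 12, start = c(2007, 1))

# Collapse the monthly series into one value per year by summing
yearly <- aggregate(ts.model, nfrequency = 1, FUN = sum)
print(yearly)

# Year with the highest total
peak.year <- time(yearly)[which.max(yearly)]
print(peak.year)
```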
For the last section of the report, I will present a summary of the decomposition using a visualization that contains the previous two figures, plus a plot of the entire original dataset and the random (or remainder) component. Each value of the remainder component is the difference between the original value and the sum of the seasonal and trend values for that month.
plot(decomposition.ma)
The first panel shows the original values. If we compare it to the second one, the trend, we can notice that it follows the same structure: a peak in mid-2011 and smaller values at the extremes. The third panel, the seasonal component, is the same seasonal plot presented before, but repeated for all the years. Finally, the random panel shows the remainder, which is what is left of the original values after removing the trend and the seasonal pattern.
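The definition of the remainder can be checked with a few lines. The sketch below assumes the default additive decomposition and retypes the data from the table above; the name remainder.manual is illustrative.

```r
# Monthly murder counts, retyped from the table above (one row per year)
murders <- c(
  66, 55, 67, 47, 62, 52, 41, 60, 69, 65, 78, 68,      # 2007
  57, 74, 76, 57, 84, 52, 71, 64, 70, 68, 72, 70,      # 2008
  56, 80, 75, 63, 91, 79, 63, 75, 84, 84, 85, 59,      # 2009
  76, 78, 67, 64, 99, 82, 97, 94, 67, 76, 93, 90,      # 2010
  112, 94, 99, 70, 92, 107, 100, 95, 97, 97, 87, 86,   # 2011
  88, 83, 73, 63, 70, 97, 88, 89, 70, 70, 83, 104,     # 2012
  76, 75, 64, 69, 75, 75, 72, 96, 65, 88, 66, 62,      # 2013
  55, 60, 48, 54, 61, 71, 51, 62, 45, 52, 57, 65       # 2014
)
ts.model <- ts(murders, frequency = 12, start = c(2007, 1))
dec <- decompose(ts.model)

# Remainder = original value minus the trend and seasonal values
remainder.manual <- ts.model - dec$trend - dec$seasonal

same.remainder <- isTRUE(all.equal(as.numeric(remainder.manual),
                                   as.numeric(dec$random)))
print(same.remainder)

# The remainder is undefined wherever the trend is undefined
print(sum(is.na(dec$random)))
```

The missing values at both ends come from the moving average used for the trend, which needs six months of data on each side of every point.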
In this report, I presented the basics of time series analysis using base R, and an application of it to murder data from Puerto Rico. The results indicate that there is a pattern that repeats within each year: from February to April the homicide activity declines until it hits its minimum in April, it then jumps to its maximum in May, and afterwards it fluctuates within a narrow band until the end of the year. Moreover, the general trend of the data is that from 2007 until mid-2011 the number of murders increased steadily, and after mid-2011 it decreased.