The game Guild Wars 2, released in 2012, includes a player versus player (PvP) game mode called Conquest. In Conquest, two teams of five players face each other, and the main objective is to seize capture points to increase your team's score (the more capture points you hold, the faster your score increases); other ways to increase the score are killing players of the opposing team, killing non-player characters (NPCs) or getting special bonuses.
This work is a data analysis of a dataset composed of 119 Conquest matches. In the analysis we will look at several factors regarding a Conquest match, such as the map where the match was played, the final scores of both teams, the result of the match and the composition of the teams.
As mentioned before, the dataset consists of 119 matches played during April and May 2015; the following table shows the first six observations of the dataset.
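A minimal sketch of how the data could be loaded and these first rows displayed; the file name is hypothetical, while the data frame name df matches the one that appears in the correlation test output later in this report.

df <- read.csv("gw2_conquest_matches.csv")  # hypothetical file name
head(df)                                    # show the first six observations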
## map red.score blue.score winner red_1 red_2 red_3 red_4 red_5 blue_1 blue_2 blue_3 blue_4 blue_5
## 1   2       317        500      1     6     6     6     0     0      7      3      6      6      8
## 2   1       133        500      1     3     4     8     8     0      5      6      8      6      6
## 3   3       270        501      1     4     8     4     7     8      7      6      6      8      6
## 4   1       500         13      0     5     8     3     8     6      4      6      8      3      0
## 5   2       452        500      1     7     1     4     7     5      4      6      1      6      3
## 6   3       234        501      1     7     6     1     6     8      6      7      8      3      2
The first column is the map where the match was played. The next two are the final scores of the red and blue teams, followed by the winner (0 means the red team won, 1 means the blue team won). The last ten columns describe the composition of the teams in terms of professions (classes): the first five of them represent the red team and the other five the blue team.
For those of you interested, this is the codebook explaining what the numbers mean.
In the first section of this report, we took a look at the maps and how often each one of them was selected. The next table shows the frequency and percentage of usage of each map.
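A sketch of how this table could be built, assuming the numeric map codes 1 to 4 correspond to the names listed below.

# Frequency and percentage of each map across the 119 matches
map.names <- c("Battle of Kyhlo", "Forest of Niflhel",
               "Legacy of the Foefire", "Temple of the Silent Storm")
map.freq <- table(factor(df$map, levels = 1:4, labels = map.names))
data.frame(Frequency = as.vector(map.freq),
           Percentage = round(100 * as.vector(prop.table(map.freq)), 2),
           row.names = map.names)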
## Map                        Code Frequency Percentage
## Battle of Kyhlo               1        15      12.61
## Forest of Niflhel             2        53      44.54
## Legacy of the Foefire         3        33      27.73
## Temple of the Silent Storm    4        18      15.13
As seen in both the table and the plot, the most selected map was Forest of Niflhel with 44.54%, and the least selected was Battle of Kyhlo with 12.61%.
This part of the analysis is related to the scores and the results of the matches. First, we will give an overview of the final scores and the outcomes of the matches, followed by an in-depth look at the scores of the winning teams and the defeated teams.
We will start this section by displaying a table with the frequency of wins for both teams.
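A sketch of how these counts could be computed from the winner column (0 = red team won, 1 = blue team won).

# Number of victories per team
wins <- table(factor(df$winner, levels = c(0, 1),
                     labels = c("Red team", "Blue team")))
wins
round(100 * prop.table(wins), 2)  # percentages of victories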
## Team Frequency
## 1 Red team 49
## 2 Blue team 70
The red team won 49 of the matches (41.18%), while the blue team won 70 (58.82%). The percentages of victories per map mostly follow this same pattern. In Battle of Kyhlo, the red team won 46.67% of the matches and the blue team 53.33%. For Forest of Niflhel, the respective percentages were 39.62% and 60.38%, and for Legacy of the Foefire, 33.33% and 66.67%. Temple of the Silent Storm was the only map where the red team obtained more victories than the blue team, with 55.56% against 44.44%. This is shown in the next table.
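The underlying counts and row-wise percentages could be obtained with a cross-tabulation, for example:

# Victories per map for each team (rows: maps 1-4, columns: 0 = red, 1 = blue)
wins.by.map <- table(df$map, df$winner)
wins.by.map
round(100 * prop.table(wins.by.map, margin = 1), 2)  # win % per map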
## Map                          Red team victories Blue team victories Red team win % Blue team win %
## 1 Battle of Kyhlo                             7                   8          46.67           53.33
## 2 Forest of Niflhel                          21                  32          39.62           60.38
## 3 Legacy of the Foefire                      11                  22          33.33           66.67
## 4 Temple of the Silent Storm                 10                   8          55.56           44.44
Next, we will discuss the gap between the final scores of the matches.
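A sketch of how this gap could be summarised; the vector name score.diff is an assumption.

# Absolute difference between the final scores of the two teams in each match
score.diff <- abs(df$red.score - df$blue.score)
summary(score.diff)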
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 95.0 148.0 176.1 234.5 528.0
The previous table is a summary of the differences between the final scores of each match. It shows that there was a match where one of the teams lost by only 9 points, while in another match the defeated team lost by 528 points. The following five histograms show the distribution of the final score differences: the first is for the differences across all matches and the remaining four are per map.
Now we will present four visualizations. The first one presents the frequency of victories of both teams across all the maps, the following two present the scores of both teams, and the last one is a scatterplot of the final scores of both teams for each match.
The previous scatterplot shows the relation between the final scores of both teams. The points in the upper-left and lower-right corners represent the matches where one of the teams won by a large margin.
A correlation test was performed on these values and the resulting coefficient was
-0.5939152, which indicates a moderate negative correlation between the two scores. To further confirm this, we looked at the p-value (shown below), which is small enough to provide strong evidence against the null hypothesis that the true correlation is 0.
The following table presents the correlation coefficient, the p-value and other statistics.
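The output reproduced below corresponds to a Pearson correlation test, which in R is a single call such as:

# Pearson correlation between the red and blue final scores
cor.test(df$red.score, df$blue.score)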
##
## Pearson's product-moment correlation
##
## data: df$red.score and df$blue.score
## t = -7.985, df = 117, p-value = 1.084e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6991682 -0.4634651
## sample estimates:
## cor
## -0.5939152
The next topic to discuss is the scores of the matches. Here, we will take a look at how these scores are distributed and show some statistics about them, accompanied by several plots.
In this section we will take a look at the final scores of the winners of all 119 matches recorded in this dataset. Before continuing, we would like to mention that a PvP match in Guild Wars 2 ends when one of the teams reaches a score greater than or equal to 500.
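A sketch of how the winners' scores could be extracted using the winner column (0 = red team won); the vector name winner.score is an assumption.

# Final score of the winning team in each match
winner.score <- ifelse(df$winner == 0, df$red.score, df$blue.score)
summary(winner.score)
sd(winner.score)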
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 500.0 500.0 501.0 504.3 501.0 624.0
As expected, the final scores do not vary much (most of them are close to 500), with some of them well over 500, as is the case of the maximum value of 624; the standard deviation of the scores is 16.6342219. The reason for the score of 624 is that one of the maps (Legacy of the Foefire) has an objective that awards 150 points to a team.
Unlike the winners' scores, the final score of the losing team can be anything from 0 (if the team is really bad) up to 499 (in most cases).
The following list shows the scores of the losing teams.
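They can be extracted analogously to the winners' scores (a sketch; loser.score is an assumed name).

# Final score of the defeated team in each match
loser.score <- ifelse(df$winner == 0, df$blue.score, df$red.score)
loser.score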
## [1] 317 133 270 13 452 234 454 278 452 242 282 402 185 150 444 311 348
## [18] 420 345 310 351 316 424 102 271 188 343 372 329 383 283 10 199 373
## [35] 435 300 361 420 95 313 356 389 396 476 361 475 460 373 403 345 286
## [52] 174 156 484 230 491 412 227 327 431 246 405 481 370 185 370 281 133
## [69] 228 354 293 366 467 406 269 316 406 462 354 473 449 234 464 196 304
## [86] 454 388 211 396 434 263 389 340 211 410 280 390 169 354 376 262 460
## [103] 142 360 378 364 353 83 423 358 389 404 398 299 286 286 201 437 433
The scores are more varied than the winners' scores.
The next table shows a summary of these scores, followed by their standard deviation.
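Both statistics come directly from the loser.score vector built above (sketch).

summary(loser.score)  # five-number summary plus the mean
sd(loser.score)       # standard deviation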
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 269.5 354.0 328.2 405.5 491.0
## [1] 106.904
The summary table shows that during one game the losing team obtained just 10 points (ouch!), making this the minimum value across the whole dataset, while the average was 328.2 points and the maximum was 491 (so close to winning). We also calculated the standard deviation, which is 106.904, a value much greater than the standard deviation of the winners' scores, which was 16.6342219. The following two figures show these results via a histogram (that includes a density line) and a boxplot.
The histogram shows that the score of the defeated team during most matches was in the range of 350 to 400, and that there were a few games where the score of the defeated team was less than 100.
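The two outlier values reported below could be recovered with boxplot.stats(), reusing the loser.score vector (sketch).

# Points that fall beyond the boxplot whiskers
boxplot.stats(loser.score)$out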
## [1] 13 10
The previous figure, a boxplot, shows the distribution of the scores. The box goes from the first quartile (Q1), which is 269.5, to the third quartile (Q3), 405.5. The line in the middle shows the median, which is 354, and the upper and lower whiskers show the maximum value (excluding the outliers), which is 491, and the minimum value (also excluding the outliers), which for this dataset is 83. The two dots at the bottom of the plot are the outliers; their values are 10 and 13 points.
The next output shows a summary of the boxplot.
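These five values could be obtained in the same way (sketch):

# Lower whisker, first quartile, median, third quartile, upper whisker
boxplot.stats(loser.score)$stats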
## [1] 83.0 269.5 354.0 405.5 491.0
The values represent the minimum, first quartile, median, third quartile and maximum.
The last part of this analysis is about the composition of the teams. Here, we study the structure of the teams while answering several questions along the way. This section will follow a similar structure to the previous ones: we will start with a summary and then give a detailed analysis of certain topics. In addition, this section introduces two machine learning algorithms that will be used to try to predict the outcome of a match based on the composition of the team and the map where it was played.
As mentioned before, our dataset is made of 119 matches, so there are 238 observations of team composition (two teams per match).
This section will start with a summary of the structure of the teams. We will present the composition of the teams in terms of professions or classes. In addition to this, we will show how many teams were incomplete.
The first question we will answer is: what is the frequency and percentage of each profession across the dataset?
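A sketch of how these frequencies could be computed by stacking the ten class columns into a single vector; the mapping from the numeric codes to the profession names comes from the codebook mentioned earlier and is not hard-coded here. The count for code 0 gives the number of missing players discussed just below.

# Stack the ten class columns (red_1..red_5, blue_1..blue_5) into one vector
class.cols  <- c(paste0("red_", 1:5), paste0("blue_", 1:5))
all.classes <- unlist(df[, class.cols])

# Frequency of each profession code (0 = missing player, 6 = Ranger, ...)
class.freq <- table(all.classes)
class.freq
round(100 * prop.table(class.freq), 2)  # percentages over all 1190 team slots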
## Frequency Percentage
## Elementalist 113 9.50
## Engineer 70 5.88
## Guardian 155 13.03
## Mesmer 124 10.42
## Necromancer 111 9.33
## Ranger 299 25.13
## Thief 160 13.45
## Warrior 141 11.85
We can see that the most common class across the matches was the Ranger, with a frequency of 299 (25.13%). However, the reason this number is much larger than the other frequencies is that all the recorded matches were played using a Ranger, so every single match had at least one Ranger. On the other end is the Engineer, with a frequency of 70 (5.88%).
If the percentages are summed, the result is 98.59% instead of 100%. This brings us to the next question: how many players were missing?
## Frequency Percentage
## Missing players 17 1.43
17 players were missing. We do not know if these players were missing from the beginning of the game (for some reason, teams can sometimes contain fewer than 5 players) or if they disconnected.
Note: if the percentage of missing players and the sum of the percentages of each class are added together, the result is 100.02% instead of 100%. This is because each percentage was rounded to two decimal places to keep the table compact; the unrounded values are 98.57143% and 1.428571%.
For this question we looked for the team compositions that appear more than once.
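A sketch of how the repeated compositions could be found: sort the five classes within each team so that order does not matter, then count identical rows. It assumes a data frame called teams with one row per team and the class columns class.1 to class.5 already holding the profession names.

# Sort each team's classes alphabetically so that order does not matter
team.cols <- paste0("class.", 1:5)
sorted <- t(apply(teams[, team.cols], 1, sort))
comp   <- apply(sorted, 1, paste, collapse = " / ")

# Keep only the compositions that appear more than once
comp.freq <- sort(table(comp), decreasing = TRUE)
comp.freq[comp.freq > 1]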
## class.1 class.2 class.3 class.4 class.5 Frequency
## 1 Elementalist Guardian Ranger Ranger Warrior 8
## 2 Elementalist Guardian Ranger Thief Warrior 6
## 3 Elementalist Mesmer Ranger Ranger Thief 6
## 4 Guardian Mesmer Ranger Thief Warrior 5
## 5 Elementalist Guardian Necromancer Ranger Thief 3
## 6 Elementalist Guardian Ranger Ranger Thief 3
## 7 Elementalist Ranger Ranger Thief Warrior 3
## 8 Elementalist Ranger Thief Thief Warrior 3
## 9 Engineer Engineer Guardian Necromancer Ranger 3
## 10 Guardian Mesmer Ranger Thief Thief 3
## 11 Guardian Ranger Ranger Thief Warrior 3
## 12 Ranger Ranger Thief Thief Warrior 3
## 13 Elementalist Engineer Guardian Necromancer Warrior 2
## 14 Elementalist Guardian Guardian Necromancer Ranger 2
## 15 Elementalist Guardian Guardian Ranger Thief 2
## 16 Elementalist Guardian Guardian Ranger Warrior 2
## 17 Elementalist Guardian Mesmer Ranger Ranger 2
## 18 Elementalist Guardian Mesmer Ranger Thief 2
## 19 Elementalist Guardian Necromancer Thief Thief 2
## 20 Elementalist Mesmer Mesmer Necromancer Ranger 2
## 21 Elementalist Mesmer Necromancer Ranger Warrior 2
## 22 Elementalist Mesmer Ranger Ranger Warrior 2
## 23 Engineer Guardian Mesmer Ranger Warrior 2
## 24 Engineer Guardian Ranger Ranger Thief 2
## 25 Engineer Guardian Ranger Thief Warrior 2
## 26 Engineer Ranger Ranger Thief Warrior 2
## 27 Guardian Guardian Mesmer Ranger Thief 2
## 28 Guardian Guardian Mesmer Ranger Warrior 2
## 29 Guardian Guardian Ranger Ranger Thief 2
## 30 Guardian Mesmer Necromancer Ranger Ranger 2
## 31 Guardian Mesmer Necromancer Ranger Warrior 2
## 32 Guardian Mesmer Ranger Ranger Thief 2
## 33 Mesmer Mesmer Necromancer Ranger Thief 2
## 34 Mesmer Mesmer Ranger Ranger Thief 2
## 35 Mesmer Necromancer Ranger Ranger Ranger 2
## 36 Mesmer Necromancer Ranger Thief Warrior 2
## 37 Mesmer Ranger Ranger Thief Warrior 2
## 38 Necromancer Necromancer Ranger Ranger Warrior 2
## 39 Necromancer Ranger Ranger Thief Warrior 2
As the table shows, there were 39 team configurations that were repeated more than once.
In the dataset there are 25 teams that have at least 3 characters playing the same class. Of those 25 teams, 8 won their matches.
These are the compositions of those teams.
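A sketch of how those teams could be identified, again using the assumed teams data frame and its result column: for each team, count how often its most frequent class appears and keep the rows where that count is at least 3.

# Size of the largest group of identical classes within each team
max.same <- apply(teams[, paste0("class.", 1:5)], 1, function(x) max(table(x)))
stacked  <- teams[max.same >= 3, ]
nrow(stacked)           # number of such teams
table(stacked$result)   # how many of them lost or won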
## class.1 class.2 class.3 class.4 class.5 result
## 40 Ranger Thief Ranger Thief Thief 0
## 104 Necromancer Necromancer Ranger Necromancer Ranger 0
## 121 Necromancer Ranger Warrior Ranger Ranger 0
## 122 Thief Ranger Ranger Warrior Ranger 0
## 129 Guardian Guardian Guardian Necromancer Thief 0
## 165 Guardian Mesmer Thief Guardian Guardian 0
## 196 Guardian Ranger Ranger Guardian Ranger 0
## 219 Ranger Necromancer Ranger Ranger Mesmer 0
The last section of this analysis introduces two classification models: a naive Bayes classifier and a random forest. These two models will learn from our data with the purpose of classifying the outcome of a match as either 'victory' or 'defeat', based on the composition of the team.
The basic idea behind these classifiers can be divided into two parts. In the first part, often called the training phase, the model is fed data that already includes the outcome of the match; in other words, we are training the model. The table below shows an example of an observation that will be used for training the model. This observation has the five members of the team (three Rangers, represented by the number 6, and two missing players, represented as 0) and the result, which is 'lost'.
## class.1 class.2 class.3 class.4 class.5 result
## 1 6 6 6 0 0 lost
During the second part, after the training is completed, we will feed different team compositions to the model and it will output a predicted outcome for the match based on what it has learned. In other words, we will test the model using a completely new dataset.
The dataset was split 70% for training and 30% for testing.
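A sketch of that split and of the random forest fit; the randomForest() call mirrors the one echoed in the output below, while the seed and the variable names train.idx and test.data are assumptions.

library(randomForest)

# Reproducible 70/30 split of the team-composition data
set.seed(42)  # hypothetical seed
train.idx  <- sample(nrow(teams), size = round(0.7 * nrow(teams)))
train.data <- teams[train.idx, ]
test.data  <- teams[-train.idx, ]

# Random forest trained to predict the match result from the remaining columns
rf.model <- randomForest(result ~ ., data = train.data, ntree = 70, proximity = TRUE)
rf.model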
The following output, taken directly from R, shows information about the random forest model and its training results.
##
## Call:
## randomForest(formula = result ~ ., data = train.data, ntree = 70, proximity = TRUE)
## Type of random forest: classification
## Number of trees: 70
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 51.85%
## Confusion matrix:
## lost won class.error
## lost 43 42 0.4941176
## won 42 35 0.5454545
The random forest model reports an out-of-bag (OOB) error rate of 51.85% during the training phase; in other words, 51.85% of the classifications performed during training were incorrect. More details can be found in the confusion matrix at the bottom of the output.
After the training phase, we tested the model using the test dataset. The classification error obtained in this phase is 52.63%. This is the confusion matrix for the test dataset:
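A sketch of how the test-set predictions and this confusion matrix could be produced from the fitted model.

# Predict the outcome of the held-out teams and compare against the truth
rf.pred <- predict(rf.model, newdata = test.data)
rf.cm   <- table(actual = test.data$result, predicted = rf.pred)
rf.cm
1 - sum(diag(rf.cm)) / sum(rf.cm)  # classification error on the test set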
## lost won class.error
## lost 20 26 0.5652174
## won 14 16 0.4666667
We will now use the same training and testing datasets with a different model, a naive Bayes classifier. Below are the confusion matrix and classification error for this model.
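A sketch of the naive Bayes fit, assuming the naiveBayes() implementation from the e1071 package and the same training data as before.

library(e1071)

# Naive Bayes classifier trained on the same training data
nb.model <- naiveBayes(result ~ ., data = train.data)

# Confusion matrix and classification error on the training set
nb.pred <- predict(nb.model, newdata = train.data)
nb.cm   <- table(actual = train.data$result, predicted = nb.pred)
nb.cm
1 - sum(diag(nb.cm)) / sum(nb.cm)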
## lost won class.error
## lost 46 49 0.5157895
## won 39 28 0.5820896
The naive Bayes model’s classification error found during training is 54.32%.
## lost won class.error
## lost 16 26 0.6190476
## won 18 16 0.5294118
The classification error reported for the testing set on the naive Bayes model is 57.89%.
By looking at the classification errors from both models, we concluded that the random forest model performed better than the naive Bayes classifier, with a training classification error of 51.85% and a testing classification error of 52.63%, compared to 54.32% and 57.89% respectively.
However, our answer to the question "is it possible to predict the outcome of a match based on the structure of the team?" would be no. A classification error of around 50% means the models do no better than random guessing: since there are two teams in a match, each team already has a baseline 50% chance of winning, so the team composition does not seem to add predictive power.
To conclude, we would like to repeat that we had a very small data sample of just 119 matches (238 teams), so with more data, the classification error could improve (or not).