As a follow-up to my previous report, PvP in Guild Wars 2: A data analysis, I decided to run the same machine learning algorithms in another framework, Apache Spark, to see how the results differ. The original purpose of that analysis was to check whether the outcome of a match can be predicted from the composition of the team.
The tests were done on Apache Spark 1.4.0, using the Python API.
The original analysis can be found here: PvP in Guild Wars 2: A data analysis
The dataset consists of 238 team compositions taken from 119 Conquest matches in Guild Wars 2 (two teams per match). The file has six columns: the first five represent the five members of the team, and the sixth is the outcome of the match.
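To make the six-column layout concrete, here is a minimal sketch of how one row could be parsed and turned into a numeric label plus five numeric features. The comma separator, the member values, and the "win"/"loss" labels are my assumptions for illustration; the real file may differ. The helper names (`parse_row`, `build_index`, `encode`) are hypothetical as well.

```python
# Hypothetical rows: five team members followed by the match outcome.
# Separator, member values and outcome labels are illustrative assumptions.
SAMPLE_LINES = [
    "guardian,warrior,thief,mesmer,elementalist,win",
    "ranger,engineer,necromancer,warrior,guardian,loss",
]


def parse_row(line, sep=","):
    """Split one row into (list of five members, outcome string)."""
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    return fields[:5], fields[5]


def build_index(rows):
    """Map each distinct member value to an integer code, in sorted order."""
    values = sorted({member for members, _ in rows for member in members})
    return {value: code for code, value in enumerate(values)}


def encode(rows, index, positive="win"):
    """Turn parsed rows into (label, features) pairs of floats."""
    return [
        (1.0 if outcome == positive else 0.0,
         [float(index[member]) for member in members])
        for members, outcome in rows
    ]


rows = [parse_row(line) for line in SAMPLE_LINES]
index = build_index(rows)
encoded = encode(rows, index)
print(encoded[0])  # (1.0, [2.0, 7.0, 6.0, 3.0, 0.0])
```

If the models are trained through Spark's MLlib, each `(label, features)` pair would typically be wrapped in a `pyspark.mllib.regression.LabeledPoint` before training. Since the codes stand for categories rather than quantities, a tree-based model would presumably need to be told that the five features are categorical.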
In the previous analysis, the classification error of the random forest model on the test set was 52.63%. On Spark, the same kind of model had a classification error of 47.30%. The naive Bayes classifier also showed a smaller error than before: 57.89% in the previous test against 54.05% in Spark.
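For reference, the classification error used here is simply the share of test rows whose predicted outcome differs from the real one. A minimal sketch follows, with made-up labels and predictions (not taken from the dataset) and a hypothetical helper name, `classification_error`.

```python
def classification_error(labels, predictions):
    """Fraction of predictions that differ from the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute the error of an empty test set")
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / float(len(labels))


# Toy values for illustration only: 3 of the 8 predictions are wrong.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 1, 0, 1, 0, 1, 1, 0]

error = classification_error(labels, predictions)
print("%.2f%%" % (100 * error))  # 37.50%
```

With Spark, the same figure would be computed from the model's predictions on the held-out test rows, paired with their true labels.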
To summarize, both models showed a smaller classification error on Spark. However, a single run on each side is not enough for a fair comparison; the proper approach would be to run several tests in both R and Spark, average the errors, and compare the averages.
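The repeated-runs idea could be sketched as below: split the data with a different seed each time, train and evaluate, then average the errors. This is only an outline under my own assumptions. The 30% test fraction, the seeds, and the toy rows are made up, and `majority_baseline` is a stand-in for the real random forest or naive Bayes training call, not one of the models from the analysis.

```python
import random
from collections import Counter


def split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train, test):
    """Stand-in model: always predict the most common training label.

    Returns the classification error on the test rows.
    """
    majority = Counter(label for label, _ in train).most_common(1)[0][0]
    wrong = sum(1 for label, _ in test if label != majority)
    return wrong / float(len(test))


def mean_error(run_once, rows, seeds, test_fraction=0.3):
    """Run one train/test cycle per seed; return (average error, all errors)."""
    errors = []
    for seed in seeds:
        train, test = split(rows, test_fraction, random.Random(seed))
        errors.append(run_once(train, test))
    return sum(errors) / float(len(errors)), errors


# Toy (label, features) rows, for illustration only.
toy_rows = [(float(i % 2), [float(i)]) for i in range(20)]

average, errors = mean_error(majority_baseline, toy_rows, seeds=range(5))
print("runs: %d, average error: %.2f%%" % (len(errors), 100 * average))
```

Doing the same number of runs in R and in Spark, and comparing the two averages (ideally together with their spread), would say much more than the single pair of numbers reported above.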