CSGO team analysis with Python and Pandas part 01

Thibaut Cordenier
5 min readFeb 25, 2020

With IEM Katowice’s final showdown coming in this week end, I thought this could be a nice opportunity to examine one of the key components of the esports scene : Counter Strike Global Offensive.

You can find all the different parts in the github repository link and the Jupyter Notebook for this part here.

Part 01 : fetching datas

Importing librairies

import os, os.path
from datetime import date
import pandas as pd
from pandas.io.json import json_normalize
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.dates as mdates

from tqdm import tqdm, tqdm_notebook
import requests
import json
import re

Preparing the request to the API

https://liquipedia.net/counterstrike/Main_Page hosts lots of informations about Counter Strike : teams, tournaments, players, hosts.

We’ll gather them with the help of its API. You can see the help page here and terms of use here. :

Since we are interested in teams performance, social media presence and the names of their sponsors we will fetch all categories that are related to this subjects. Which leads to this arguments that we will store in a variable called “attributes”.

attributes  = ["?Has name",
"?Is active","?Was created",
"?Was disbanded", "?Has earnings",
"?Has location","?Has region",
"?Has sponsor",
"?Has earnings in 2012",
"?Has earnings in 2013",
"?Has earnings in 2014",
"?Has earnings in 2015",
"?Has earnings in 2016",
"?Has earnings in 2017",
"?Has earnings in 2018",
"?Has earnings in 2019",
"?Has site",
"?Has twitter",
"?Has twitch stream",
"?Has instagram",
"?Has facebook",
"?Has youtube channel",
"?Has faceit profile",
"?Has vk",
"?Has esea id",
"?Has steam profile"]

Since these arguments require a “|” to separate them, we will add it with list comprehension and then regroup all the list elements in one single item ready to be passed in our next request.

attributes = ["|" + attribute for attribute in attributes]
attributes = ''.join(attributes)

Which gives us this :

'|?Has name|?Is active|?Was created|?Was disbanded|?Has earnings|?Has location|?Has region|?Has sponsor|?Has earnings in 2012|?Has earnings in 2013|?Has earnings in 2014|?Has earnings in 2015|?Has earnings in 2016|?Has earnings in 2017|?Has earnings in 2018|?Has earnings in 2019|?Has site|?Has twitter|?Has twitch stream|?Has instagram|?Has facebook|?Has youtube channel|?Has faceit profile|?Has vk|?Has esea id|?Has steam profile'

Requesting and managing datas

To fetch Liquipedia’s data we need with the “.get()” method of the requests module. And then, we will load and read the results with the json module.

The first part of our request is composed of four parts.

  • The first one : “https://liquipedia.net/counterstrike/api.php?action=ask&query=", which is the adress of the api and the actions it will carry, here “ask” (“API module to query Semantic MediaWiki using the ask language”) and “query” (“Fetch data from and about MediaWiki”).
  • The second one : “[[category:CSGO%20Teams]]”, which is the category we want to examine, in our case the CS:GO teams.
  • The thrid one : wich is our attributes variable we created earlier.
  • The fourth one : “|limit=1500&format=json” which specifies the maximum number of results we want and the format we need. Here a json file.
api_request = requests.get("https://liquipedia.net/counterstrike/api.php?action=ask&query=[[category:CSGO%20Teams]]"+attributes+"|limit=1500&format=json")

api = json.loads(api_request.content)

We can create a dataframe out of our json file, but we only need to keep the [“results”] part of it, and get rid of the [“query”] entry.

We will do so with by looping through the results as we create a list contening one dictionnary for each team.

teams = []

for result in api["query"]["results"]:
Name_team = result
teams.append(api["query"]["results"][Name_team]['printouts'])

Let’s have a quick glance at the informations we gathered so far :

#showing the two first teams
teams[:2]
[{'Has name': ['/10/'],
'Is active': ['f'],
'Was created': ['2013'],
'Was disbanded': [{'timestamp': '1398556800', 'raw': '1/2014/4/27'}],
'Has earnings': [3900],
'Has location': ['Germany'],
'Has region': ['Europe'],
'Has sponsor': [],
'Has earnings in 2012': [0],
'Has earnings in 2013': [3900],
'Has earnings in 2014': [0],
'Has earnings in 2015': [0],
'Has earnings in 2016': [0],
'Has earnings in 2017': [0],
'Has earnings in 2018': [0],
'Has earnings in 2019': [0],
'Has site': [],
'Has twitter': [],
'Has twitch stream': [],
'Has instagram': [],
'Has facebook': ['https://facebook.com/team10cs'],
'Has youtube channel': [],
'Has faceit profile': [],
'Has vk': [],
'Has esea id': [],
'Has steam profile': []},
{'Has name': ['100 Thieves'],
'Is active': ['t'],
'Was created': ['2017-11-20 <br>[[File:Csgo icon.png|link=Counter-Strike: Global Offensive]]: 2017-12-12'],
'Was disbanded': [],
'Has earnings': [67000],
'Has location': ['United States'],
'Has region': ['North America'],
'Has sponsor': ['[https://www.rocketmortgage.com/ Rocket mortgage]<br />[https://www.redbull.com/us-en/ Red Bull]<br />[https://stockx.com/ stockX]<br />[https://www.totinos.com/ Totinos]<br />[http://cash.app/ cashapp]'],
'Has earnings in 2012': [0],
'Has earnings in 2013': [0],
'Has earnings in 2014': [0],
'Has earnings in 2015': [0],
'Has earnings in 2016': [0],
'Has earnings in 2017': [0],
'Has earnings in 2018': [0],
'Has earnings in 2019': [67000],
'Has site': ['https://www.100thieves.com/'],
'Has twitter': ['https://twitter.com/100Thieves'],
'Has twitch stream': [],
'Has instagram': ['https://www.instagram.com/100thieves'],
'Has facebook': ['https://facebook.com/100Thieves'],
'Has youtube channel': ['https://www.youtube.com/100thieves'],
'Has faceit profile': [],
'Has vk': [],
'Has esea id': [],
'Has steam profile': []}]

Looks good, now we can create a dataframe that will be easier to analyse with the remaining teams.

Creating a dataframe

CSGO_teams = pd.DataFrame(teams)
CSGO_teams.head()
As the scrollbar suggests, there are more columns than displayed on this pictures.

Cleaning the dataset will be the main focus of the next part — and you can check it here. For now, we’ll just remove brackets and rearrange the columns for them to be ordered the way we passed arguments in our queries.

#removing brackets and backslashes
CSGO_teams = CSGO_teams.applymap(lambda x: str(x).replace("[","").replace("]","").replace("\'",""))

#creating a list with the columns in the good order
cols = ["Has name",
"Is active","Was created",
"Was disbanded", "Has earnings",
"Has location","Has region",
"Has sponsor",
"Has earnings in 2012",
"Has earnings in 2013",
"Has earnings in 2014",
"Has earnings in 2015",
"Has earnings in 2016",
"Has earnings in 2017",
"Has earnings in 2018",
"Has earnings in 2019",
"Has site",
"Has twitter",
"Has twitch stream",
"Has instagram",
"Has facebook",
"Has youtube channel",
"Has faceit profile",
"Has vk",
"Has esea id",
"Has steam profile"]

CSGO_teams = CSGO_teams[cols]

Now we can export the dataframe in a csv format and move to the next part of our CS:GO teams analysis.

# exporting the csv
CSGO_teams.to_csv(os.getcwd() + "\%s.csv" % "CSGO_teams",index=False)

--

--

Thibaut Cordenier

I love UX, I love Data and use them both to make things better. Reach me on Twitter : @T_Cordenier