One of the greatest mysteries of this season was who got which prizes in the races. Questions about who qualifies where and who gets which prize have been popping up regularly in the Community Slack channels. Learn how I used SageMaker Notebook, Python and Pandas to determine the winners by reading data from Jochem Lugtenburg’s deepracer-race-data repository.
This article has been written and tested using Jupyter Notebook, which you can use through the AWS SageMaker Notebook service available in the AWS Console. At the end of this article you will find links to the repository holding this notebook and instructions on how to load it for your own experimentation.
AWS DeepRacer is a 1/18th scale autonomous race car but also much more. It is a complete program that has helped thousands of employees in numerous organizations begin their educational journey into machine learning through fun and rivalry.
Visit the AWS DeepRacer page to learn more about how it can help you and your organization begin and progress on the journey towards machine learning.
Join the AWS Machine Learning Community to talk to people who have used DeepRacer in their learning experience.
AWS offer two types of prizes in the Pro Division:
- First Prize: AWS DeepRacer Championship qualification (24 total, 3 per month),
- Second Prize: AWS DeepRacer EVO (80 total, 10 per month),
But not everything is straightforward. A good racer cannot win two cars or two slots in the Championships. There is a “Prize condition”: Each participant may receive a maximum of 1 of each prize type during the 2021 season.
Let’s try and identify the winners using Python, Pandas and Jupyter Notebook.
Fetching the data for this task could be labour intensive, as there is no official DeepRacer API available.
Luckily Jochem Lugtenburg, a developer, AWS Community Builder and a DeepRacer expert in the AWS Machine Learning Community, has prepared a project gathering the data and exposing it to everyone in a GitHub repository: deepracer-race-data. We will use final race data for each month from this project.
Let’s import needed dependencies and load the data:
    import pandas as pd
    import numpy as np
    from IPython.display import display, HTML
    import urllib.request
    from urllib.error import HTTPError

    pd.set_option("display.max_rows", None, "display.max_columns", None)
There is a file leaderboards.csv in the GitHub repository that contains information about all competitions: start date, end date, number of participants, rules, etc. We need to find and extract information about this season’s races. Let’s take a closer look at this summary CSV file:
    df_leaderboards = pd.read_csv('https://raw.githubusercontent.com/aws-deepracer-community/deepracer-race-data/main/raw_data/leaderboards/leaderboards.csv')
    df_leaderboards.head()
You need to manually fill the variable months_races with the race names taken from the “Name” column of df_leaderboards (leaderboards.csv):
    months_races = ['March Qualifier', 'April Qualifier', 'May Qualifier', 'June Qualifier',
                    'July Qualifier', 'August Qualifier', 'September Qualifier', 'October Qualifier']
    race_type = ['HEAD_TO_HEAD_RACING', 'TIME_TRIAL', 'OBJECT_AVOIDANCE']

    # Get ARNs according to race type: HEAD_TO_HEAD_RACING – month qualifier leaderboard and OBJECT_AVOIDANCE – final race
    month_leaderboard_arn = df_leaderboards.loc[(df_leaderboards['Name'].isin(months_races)) & (df_leaderboards['RaceType'] == "HEAD_TO_HEAD_RACING")]['Arn'].values
    month_final_arn = df_leaderboards.loc[(df_leaderboards['Name'].isin(months_races)) & (df_leaderboards['RaceType'] == "OBJECT_AVOIDANCE")]['Arn'].values

    # Next we need to get raw data for the final races and the month qualifier leaderboards.
    # We put the dataframes into lists: one for the month qualifiers and one for the finales
    path = "https://raw.githubusercontent.com/aws-deepracer-community/deepracer-race-data/main/raw_data/leaderboards/"
    suffix = "/FINAL.csv"

    list_end_of_month = []
    for arn_leaderboard in month_leaderboard_arn:
        try:
            # Colons in the ARN must be percent-encoded in the URL
            list_end_of_month.append(pd.read_csv(path + arn_leaderboard.replace(":", "%3A") + suffix))
        except urllib.error.HTTPError as err:
            # FINAL.csv not published yet: keep an empty placeholder for this month
            list_end_of_month.append(pd.DataFrame(columns=['Alias', 'UserId', 'Rank', 'Month']))

    list_finale = []
    for arn_win in month_final_arn:
        try:
            list_finale.append(pd.read_csv(path + arn_win.replace(":", "%3A") + suffix))
        except urllib.error.HTTPError as err:
            list_finale.append(pd.DataFrame(columns=['Alias', 'UserId', 'Rank', 'Month']))
Note the try/except guard around each download:

    try:
        ...
    except urllib.error.HTTPError as err:
        ...

It was needed because the October finale was still upcoming at the time of writing this article, so its FINAL.csv file was not available yet. We protected ourselves by appending an empty dataframe in that case, so that the code that follows does not have to handle a missing entry in the list. Now that the file exists, the guard could be left out.
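The URL building and the fallback described above can also be sketched as two small helpers. This is only an illustration, not the notebook's code: final_csv_url and load_final_or_empty are hypothetical names, the ARN in the usage line is made up, and the reader parameter is an assumption added so the fallback can be tried without network access. It uses urllib.parse.quote from the standard library instead of a manual replace.

```python
import pandas as pd
from urllib.error import HTTPError
from urllib.parse import quote

BASE = ("https://raw.githubusercontent.com/aws-deepracer-community/"
        "deepracer-race-data/main/raw_data/leaderboards/")
COLUMNS = ['Alias', 'UserId', 'Rank', 'Month']


def final_csv_url(arn):
    """Percent-encode the colons in a leaderboard ARN and point at its FINAL.csv."""
    # safe="/" keeps the slash inside the ARN as a path separator
    return BASE + quote(arn, safe="/") + "/FINAL.csv"


def load_final_or_empty(arn, reader=pd.read_csv):
    """Return the FINAL.csv dataframe, or an empty placeholder if the file is missing."""
    try:
        return reader(final_csv_url(arn))
    except HTTPError:
        # Race not finished yet: keep the list position with an empty dataframe
        return pd.DataFrame(columns=COLUMNS)


# Made-up ARN, used only to show the encoding
example_url = final_csv_url("arn:aws:deepracer:::leaderboard/abc-123")
```

With helpers like these, each list could be built with a list comprehension over the ARN array, at the cost of hiding the try/except that the article discusses.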
Let’s have a quick look at what information we can get from the repository:
    # Index 0 is for March, 1 is for April, etc.
    list_finale[0].head()
In reality we only care about three pieces of data: Alias, Rank and UserId.
Let’s have a look at a narrowed dataset:
    list_finale[0][['Alias', 'UserId', 'Rank']].head()
Let’s start with the finalists. After each month’s Pro Division race, the top 16 racers who have not yet qualified into the championships compete in a finale race. The top 3 racers from each such race qualify into the championships.
The dataframes in list_finale above hold the results of those races. They are sorted by Rank, which makes it easier for us as we don’t have to think about reordering the records.
To build a list of winners, for each month we need to take the top 3 racers and append them to a list of racers. Luckily we don’t need to worry about duplicate racers either as none of the previous winners take part in the finales any more.
This is our method to determine the finalists:
    Month = ['March', 'April', 'May', 'June', 'July', 'August', 'September', 'October']

    # Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
    def championship_racers():
        df_winners = pd.DataFrame(columns=['Alias', 'UserId', 'Rank', 'Month'])
        for idx, finale in enumerate(list_finale):
            # Take the top three rows of this month's finale
            df_winners = df_winners.append(finale[['Alias', 'UserId', 'Rank']].iloc[:3]).reset_index(drop=True)
            # Only the rows just added have no month yet
            df_winners['Month'] = df_winners['Month'].fillna(Month[idx])
        return df_winners
A few words on what we did here:
- We’ve prepared an empty Pandas dataframe with the columns Alias, UserId, Rank and Month
- We enumerated over the list of dataframes: the enumerate function iterates over the elements of a list but it also provides the index of each element in that list. This helps us add the month name as an extra column
- We’ve added a month column, just to know in which month this person qualified
- We’ve simply appended the top three rows from each finale dataframe to the winners dataframe. We used iloc to limit the number of rows
- We’ve reset the index values. Without this the new dataframe with winners would have the index all messed up
- To add the months we use the fillna method. It’s pretty handy as it only sets values where they are missing
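On pandas 2.0 and later, where DataFrame.append no longer exists, the same steps can be sketched with pd.concat. This is a minimal alternative on made-up data, not the notebook's code: top_three_per_month is a hypothetical name, and the month is assigned directly to each slice rather than filled in with fillna afterwards.

```python
import pandas as pd


def top_three_per_month(finales, months):
    """Collect the top three rows of each finale and label them with the month."""
    parts = []
    for month, finale in zip(months, finales):
        # Finale results are assumed to be sorted by Rank already
        top3 = finale[['Alias', 'UserId', 'Rank']].iloc[:3].copy()
        top3['Month'] = month
        parts.append(top3)
    # ignore_index=True plays the role of reset_index(drop=True)
    return pd.concat(parts, ignore_index=True)


# Made-up finale results for two months
march = pd.DataFrame({'Alias': ['a', 'b', 'c', 'd'],
                      'UserId': ['u1', 'u2', 'u3', 'u4'],
                      'Rank': [1, 2, 3, 4]})
april = pd.DataFrame({'Alias': ['e', 'f', 'g', 'h'],
                      'UserId': ['u5', 'u6', 'u7', 'u8'],
                      'Rank': [1, 2, 3, 4]})

toy_winners = top_three_per_month([march, april], ['March', 'April'])
```

Here toy_winners holds six rows: the first three racers from March followed by the first three from April, each tagged with the month in which they qualified.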
Let’s see the results:
Let’s imagine that the month has just finished and the finale race has not happened yet, but we want to see who will compete in it. Let’s see who competes in the October finale race:
    # Index 7 is October
    list_end_of_month[7][['Alias', 'UserId', 'Rank']].head(n=16)
Well, this is wrong – I can see people who have already qualified into the championships on this list.
To determine the finale racers we need to use October race results, but we first need to remove those who already qualified. Let’s try that:
    def october_finale_racers():
        championship_racers_so_far = championship_racers()[['Alias', 'UserId', 'Rank']]
        # Index 7 is October; already-qualified racers are appended last on purpose
        return (list_end_of_month[7][['Alias', 'UserId', 'Rank']]
                .append(championship_racers_so_far)
                .drop_duplicates('UserId', keep='last')
                .reset_index(drop=True)
                .head(n=16))
We’ve used a new method: drop_duplicates. It takes the values of the specified column and by default keeps the first occurrence and discards the rest. We used a little trick here: by putting the championship racers at the end and telling drop_duplicates to keep only the last appearance, we make sure that if they raced in October, their October entry will be dropped, leaving only those who can join the finale race at the top of the list.
UserId is a unique and unchangeable identifier for each racer. Starting this season users can change their Aliases (which comes as a great relief to those racing under their often very creative model names) so trying to enforce uniqueness using the Alias values would not work.
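The effect of keep='last' is easiest to see on a tiny made-up dataset. The sketch below uses pd.concat in place of append so that it also runs on pandas 2.0 and later. The isin filter at the end is a hypothetical alternative shown for comparison, not something the notebook uses.

```python
import pandas as pd

# Made-up monthly results; racer 'u2' already qualified earlier under another alias
month_results = pd.DataFrame({
    'Alias': ['ann', 'bob', 'cat', 'dan'],
    'UserId': ['u1', 'u2', 'u3', 'u4'],
    'Rank': [1, 2, 3, 4],
})
already_qualified = pd.DataFrame({'Alias': ['bobby'], 'UserId': ['u2'], 'Rank': [1]})

# The trick: qualified racers go last, and keep='last' drops their earlier
# monthly row. They stay in the frame, but only at the bottom.
combined = pd.concat([month_results, already_qualified], ignore_index=True)
trick = combined.drop_duplicates('UserId', keep='last').reset_index(drop=True)

# Alternative for comparison: filter the monthly results by UserId directly
filtered = month_results[~month_results['UserId'].isin(already_qualified['UserId'])]
filtered = filtered.reset_index(drop=True)
```

Both approaches match on UserId, so the alias change from bob to bobby does not matter. The difference is that the trick leaves the qualified racer at the end of the frame, which is harmless only as long as head(n) cuts the list before reaching those rows.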
Let’s run this:
AWS DeepRacer Evo winners
Now time for the Evo winners. Here the rules get a little more complicated. As written above, the top 10 racers each month win the car, and each racer can win it only once. In March we care about the top 10 finale racers. In April some winners come from the finale, but maybe someone from below the top 16 wins one too?
We need to take the month’s race results into consideration but also the fact that someone from places 10-16 in that race might be in top 10 in the finale.
Let’s have a look at the code and then we’ll look at what’s going on in here:
    # Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
    def car_winners():
        df_car_winners = pd.DataFrame(columns=['Alias', 'UserId', 'Rank', 'Month'])
        for idx, list_file in enumerate(list_end_of_month):
            # Winners so far, then finale results, then monthly results; first occurrence wins
            df_car_winners = (df_car_winners
                              .append(list_finale[idx][['Alias', 'UserId', 'Rank']])
                              .append(list_file[['Alias', 'UserId', 'Rank']])
                              .drop_duplicates('UserId')
                              .reset_index(drop=True)
                              .iloc[:(idx+1)*10])
            df_car_winners['Month'] = df_car_winners['Month'].fillna(Month[idx])
        return df_car_winners
Let’s focus on this bit as it’s the most important here:
df_car_winners.append(list_finale[idx][['Alias', 'UserId', 'Rank']]).append(list_file[['Alias', 'UserId', 'Rank']]).drop_duplicates('UserId')
For each month we take the car winners so far, append the finale results to it, then the month race results, and drop the duplicates leaving only the first occurrence of a UserId value. This means that if someone is already a car winner, their entries will be removed from finale and month race results. Likewise, if they were finale racers, their month race results will be removed.
Effectively this means that we build a list containing all the car winners so far, followed by a rank-ordered list of performers in a given month who have not yet won a car. All we need to do now is to drop everyone except the top ten for a given month. Since the list grows each month, we cut it at (idx+1)*10 rows with iloc: 10 rows after March, 20 after April, and so on.
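The cumulative cut can be tried out on a toy example. The sketch below assumes two prizes per month instead of ten so the data stays small. cumulative_winners and all the data are hypothetical, and pd.concat stands in for append so that it also runs on pandas 2.0 and later.

```python
import pandas as pd


def cumulative_winners(finales, month_results, per_month):
    """Grow the winners list month by month, allowing each UserId only once."""
    winners = None
    for idx, (finale, results) in enumerate(zip(finales, month_results)):
        # Order matters: earlier winners first, then finale, then monthly results
        frames = [finale, results] if winners is None else [winners, finale, results]
        winners = (pd.concat(frames, ignore_index=True)
                   .drop_duplicates('UserId')      # keeps the first occurrence
                   .reset_index(drop=True)
                   .iloc[:(idx + 1) * per_month])  # cap grows with each month
    return winners


def frame(user_ids):
    """Build a made-up, rank-ordered result table from a list of UserIds."""
    return pd.DataFrame({'UserId': user_ids, 'Rank': range(1, len(user_ids) + 1)})


finales = [frame(['u1', 'u2', 'u3']), frame(['u3', 'u1', 'u5'])]
monthly = [frame(['u3', 'u1', 'u4', 'u2']), frame(['u1', 'u3', 'u5', 'u6'])]

toy_car_winners = cumulative_winners(finales, monthly, per_month=2)
```

In this toy run u1 and u2 win in the first month. In the second month u1 is skipped as a previous winner, so the two new prizes go to u3 and u5.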
Let’s see the results:
Wildcard race winners
AWS always give a last-minute opportunity to qualify. Normally this would be a live race at the re:Invent conference, but since everything is taking place virtually, so is this Wildcard race. The top five participants qualify for the championships, but we need to sift out those who raced but already had their places secured.
The race has just finished and we want to know who qualified. We can either add the Wildcard race to the months_races list or get the raw FINAL.csv file directly from the GitHub repository. Click “raw” and copy the URL:
Now all that’s left is to load the file, remove the finalists so far and list top five racers:
    wildcard_open = pd.read_csv("https://raw.githubusercontent.com/aws-deepracer-community/deepracer-race-data/main/raw_data/leaderboards/arn%3Aaws%3Adeepracer%3A%3A%3Aleaderboard/08db3006-f491-48b4-a238-926c6465e5d8/FINAL.csv")

    def wildcard_qualifier(df_wildcard):
        winners = championship_racers()
        wildcard_5 = (df_wildcard[['Alias', 'UserId', 'Rank']]
                      .append(winners)
                      .drop_duplicates('UserId', keep='last')
                      .reset_index(drop=True)
                      .head(n=5))
        wildcard_5['Month'] = wildcard_5['Month'].fillna("wildcard")
        return wildcard_5
Let’s see who qualified through the Wildcard race:
There are things we cannot verify ourselves that may greatly influence the above results. AWS perform eligibility checks on those racers and the racers themselves need to claim the prize. This means that racers can get removed from this list and we have no way to find out.
This article has been prepared as a Jupyter Notebook which we have shared on GitHub. You can download it yourself and play with it a little here: https://github.com/aws-deepracer-community/deepracer-educational-resources/
I guess you want to play around with the notebook. You can easily do it by running it in AWS SageMaker.
First, log in to your AWS account and navigate to Amazon SageMaker:
Navigate to Notebook – Notebook Instances and click Create notebook instance (the orange button in the top right corner). Choose a name for the notebook instance; you can leave the other parameters at their default values.
Next, in the Git Repositories section select “Clone a public Git repository to this instance only” and provide the URL of this article’s repository: https://github.com/aws-deepracer-community/deepracer-educational-resources.git
Wait a couple of minutes for the instance to be created.
Now you can open the notebook and make changes to it. If you get an error about the kernel not being found, select conda_python3.
Do not forget to stop the notebook instance when you finish to prevent unexpected billing. YOU WILL GET BILLED IF YOU DO NOT.