How to use Python and Pandas to pick a Fantasy Sports team

Apr 6, 2020

Recently, I had a discussion about fantasy sports at work. Last year I did not win the office Fantasy Rugby 6 Nations tournament despite knowing much more about rugby than everyone else in the office.

I believed this was due to bad luck while my co-workers believed it was because I was bad at fantasy sports and they were much better. To prove them wrong, this year I decided to enter a team that would be picked randomly and see if it could beat them (and me).

The 6 Nations is an annual rugby tournament between Europe’s top 6 national teams. There are 5 game weeks, with 3 games played each week.

The fantasy rugby rules were:

Each player gains or loses points based on their actions in game such as scoring tries, making tackles or losing the ball.
You pick 15 players and your points total is the sum of each player’s points.
There is also a captain that you choose that receives double points.
There are no substitutes.
Each player has a position you must pick them in and a cost.
The total cost of a team must not exceed €100m.
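
As a rough illustration of how these rules combine, here is a minimal sketch of the scoring and budget arithmetic. The helper names `team_total` and `within_budget` are made up for this example, and only two players are shown rather than a full team of 15; the two points values are the week 1 scores mentioned later in this post.

```python
def team_total(points_by_player, captain):
    """Sum every player's points, counting the captain's points twice."""
    if captain not in points_by_player:
        raise ValueError("the captain must be one of the picked players")
    return sum(points_by_player.values()) + points_by_player[captain]


def within_budget(costs, budget=100.0):
    """The total cost (in millions of euros) must not exceed the budget."""
    return sum(costs) <= budget


# Two players only, to keep the example short.
points = {"J Adams": 51, "J McNicholl": 1}
print(team_total(points, captain="J Adams"))  # 51 + 1, plus 51 again for the captain

# Hypothetical costs, purely to show the budget check.
print(within_budget([60.0, 40.0]))
print(within_budget([60.0, 40.5]))
```

Note that a team costing exactly €100m is allowed, since the rule is "must not exceed".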

I decided not to pick my team truly randomly and instead decided to use weighted randomness. The easiest way to think of weighted randomness is to imagine picking a ticket with a player’s name on it out of a hat, where players can have more than 1 ticket based on certain criteria. I decided to award “tickets” as follows:

Only players who are starting can receive tickets and they start with 1.
All players who are playing at home receive 1 additional ticket.
If a player plays for one of the 4 favourite teams (England, France, Ireland or Wales) they receive 1 additional ticket.
If a player is playing against Scotland, an 80/1 underdog to win the tournament, they receive 1 additional ticket.
If a player is playing against Italy, a 300/1 underdog to win the tournament, they receive 2 additional tickets.

It is possible for one player to have 5 tickets and another to have only 1, making the first player 5 times more likely to be picked.
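
To make the ticket idea concrete, here is a small sketch that counts tickets according to the rules above and turns them into pick probabilities. The `count_tickets` helper and the three-player "hat" are assumptions made for illustration; they are not part of the script described below.

```python
def count_tickets(team, opposition, at_home):
    """Count the "tickets" a starting player receives under the rules above."""
    tickets = 1  # every starter begins with one ticket
    if at_home:
        tickets += 1
    if team in ("England", "France", "Ireland", "Wales"):
        tickets += 1
    if opposition == "Scotland":
        tickets += 1
    elif opposition == "Italy":
        tickets += 2
    return tickets


# A hypothetical hat containing just three players.
hat = {
    "Welsh player, home to Italy": count_tickets("Wales", "Italy", True),
    "Irish player, home to Scotland": count_tickets("Ireland", "Scotland", True),
    "Italian player, away to Wales": count_tickets("Italy", "Wales", False),
}

# A player's chance of being drawn is their share of all tickets in the hat.
total_tickets = sum(hat.values())
probabilities = {name: tickets / total_tickets for name, tickets in hat.items()}

for name, tickets in hat.items():
    print(f"{name}: {tickets} tickets, p = {probabilities[name]:.2f}")
```

With these three players the ticket counts are 5, 4 and 1, so the Welsh player is drawn half of the time.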

Part 1: Preparing the Data

The first thing I did was to collect the data on which players are starting, where they are playing, who they play for and who they are playing against from ESPN Scrum, and turn it into a data frame.

Starting Players for week 1 of the 6 Nations

I next collected player data from the fantasy rugby website and turned it into a data frame.

Luckily, most rugby outlets keep every player’s name unique, so even though there is a James Ryan and a John Ryan playing for Ireland, their names are recorded as Ja Ryan and Jo Ryan. This lets me perform an inner merge on the name, which keeps only the players who are starting and adds the fantasy data columns to them.
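
For readers unfamiliar with inner merges, here is a toy sketch of what the merge does. The two small frames and the cost values are invented for illustration; only the `name` column matches the real files.

```python
import pandas as pd

# Toy stand-ins for the team sheet and the fantasy data (costs are made up).
teams_df = pd.DataFrame({
    "name": ["Ja Ryan", "G North"],
    "team": ["Ireland", "Wales"],
})
cost_df = pd.DataFrame({
    "name": ["Ja Ryan", "Jo Ryan", "G North"],
    "cost": [15.0, 10.0, 16.0],
})

# An inner merge keeps only the names present in both frames, so Jo Ryan,
# who is not on this toy team sheet, drops out of the result.
merged_df = pd.merge(teams_df, cost_df, how="inner", on=["name"])
print(merged_df)

# Optional safety net: validate="one_to_one" makes pandas raise an error
# if a name appears more than once in either frame.
checked_df = pd.merge(teams_df, cost_df, how="inner", on="name", validate="one_to_one")
```

The `validate` argument is a cheap way to confirm that the names really are unique before relying on them as a key.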

import pandas as pd

teams_df = pd.read_csv('team_sheets.csv')
cost_df = pd.read_csv('player_costs.csv')
merged_df = pd.merge(teams_df, cost_df, how='inner', on=['name'])

It is common in rugby for coaches to play players out of position. In the merged data frame, for example, the fantasy site has designated G North as an Outside-Back although he is actually playing as a Centre. This doesn’t affect picking the team, as he will still receive points as an Outside-Back.

Next I need to add the weights, or “tickets” as I described above. I used the following code to add the weights:

def add_weights(df):
    # Work out one weight per row, then attach them all as a new column.
    weights = []
    for _, row in df.iterrows():
        weight = 0.1  # every starting player gets one "ticket"
        if row['venue'] == 'home':
            weight += 0.1
        if row['team'] in ('England', 'France', 'Ireland', 'Wales'):
            weight += 0.1
        if row['opposition'] == 'Italy':
            weight += 0.2
        elif row['opposition'] == 'Scotland':
            weight += 0.1
        weights.append(weight)
    # DataFrame.append was removed in pandas 2.0, so assign a column instead.
    return df.assign(weight=weights)

It is a little unorthodox. Generally you don’t want to iterate through an entire data frame, because it gets very slow when the data frame is large. However, it does the job, and the data frame is quite small at only 90 rows.
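
For a larger data frame, the same ticket rules could be applied column-wise instead of row by row. The sketch below is one possible way to do that, not the code I actually ran; the function name `add_weights_vectorised`, the toy rows and the `'away'` venue value are assumptions.

```python
import pandas as pd


def add_weights_vectorised(df):
    """Apply the ticket rules to whole columns at once, with no Python loop."""
    favourites = ["England", "France", "Ireland", "Wales"]
    # Each comparison yields a boolean column, which counts as 0 or 1
    # when multiplied by the size of the bonus.
    weight = (
        0.1
        + 0.1 * (df["venue"] == "home")
        + 0.1 * df["team"].isin(favourites)
        + 0.2 * (df["opposition"] == "Italy")
        + 0.1 * (df["opposition"] == "Scotland")
    )
    # Round away floating-point noise such as 0.5000000000000001.
    return df.assign(weight=weight.round(1))


# Toy rows standing in for the merged data frame.
sample_df = pd.DataFrame({
    "name": ["Welsh starter", "Italian starter", "Irish starter"],
    "team": ["Wales", "Italy", "Ireland"],
    "venue": ["home", "away", "home"],
    "opposition": ["Italy", "Wales", "Scotland"],
})

weighted_sample = add_weights_vectorised(sample_df)
print(weighted_sample[["name", "weight"]])
```

On 90 rows the difference in speed is negligible, so this is mainly a matter of style.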

The final data frame for week 1 resulted in Welsh players having a weighting of 0.5, Italians 0.1, Irish 0.4, Scottish 0.1, French 0.3 and English 0.2. So a Welsh player has 5 times as many “tickets” as an Italian player.

Part 2: Picking the Team

Next is to actually pick the team. The key method is DataFrame.sample, which picks rows at random and takes the weights into account when the “weights” keyword is set. However, I also need to account for the game’s rules, i.e. picking players in position and within the €100m budget. I achieve this with the following code.

def transform():
    # Part 1: load the data and add the weights
    teams_df = pd.read_csv('team_sheets.csv')
    cost_df = pd.read_csv('player_costs.csv')
    merged_df = pd.merge(teams_df, cost_df, how='inner', on=['name'])
    weighted_df = add_weights(merged_df)
    # Part 2: keep drawing teams until one is within budget
    while True:
        team = pick_players(weighted_df)
        if check_team(team, cost_df):
            break
    return team

def pick_players(df):
    team = []
    team.extend(pick_player(df, [], 'Prop', 2))
    team.extend(pick_player(df, [], 'Hooker', 1))
    team.extend(pick_player(df, [], 'Lock', 2))
    team.extend(pick_player(df, [], 'Back-Rower', 3))
    team.extend(pick_player(df, [], 'Scrum-Half', 1))
    team.extend(pick_player(df, [], 'Fly-Half', 1))
    team.extend(pick_player(df, [], 'Centre', 2))
    team.extend(pick_player(df, [], 'Outside-Back', 3))
    return team

def pick_player(df, current_team, position, num_of_players=1):
    pos = df.loc[df.position == position]
    player = pos.sample(num_of_players, weights=pos.weight)
    player_names = player.name.tolist()
    # Reject the draw if any picked player is already in the team.
    if any(name in current_team for name in player_names):
        return False
    return player_names

def check_team(team, cost_df):
    cost = 0.0
    for player in team:
        cost_row = cost_df.loc[cost_df.name == player].cost
        cost += float(cost_row.tolist()[0])
    # The total cost must not exceed 100 (million euros).
    return cost <= 100

In the above code I use the pick_players function to pick the right number of players in each position, which gives a valid line-up, and then I check that the team is within budget. If it is not, I simply repeat until it is. Some code is split into multiple lines to make it more readable.
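
To show what the weights passed to DataFrame.sample actually do, here is a small sketch on a toy pool of three players. The names and weights are invented, and the draw with replacement is used only to estimate frequencies; the team-picking code itself samples without replacement.

```python
import pandas as pd

# A toy pool of players in one position (names and weights are made up).
pool = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C"],
    "weight": [0.5, 0.4, 0.1],
})

# sample() normalises the weights, so each player's chance is weight / total.
expected = pool.set_index("name")["weight"] / pool["weight"].sum()

# Draw many times with replacement (fixed seed so reruns give the same draws)
# and compare the observed frequencies with the expected ones.
draws = pool.sample(5000, replace=True, weights=pool["weight"], random_state=0)
observed = draws["name"].value_counts(normalize=True)
print(pd.DataFrame({"expected": expected, "observed": observed}))

# Without replacement, as in pick_player, the same player cannot be drawn twice.
pair = pool.sample(2, weights=pool["weight"], random_state=0)
print(pair["name"].tolist())
```

The observed frequencies should land close to the expected ones, which is all the “tickets in a hat” picture promises.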

Part 3: Swapping Players

The 6 Nations normally goes on for 5 game weeks, and between each week you are allowed to make 3 changes. I decided to swap players based on the following criteria:

· Swap out any 3 players not playing.

· If fewer than 3 players are not playing, make up the difference by swapping out randomly chosen players.

· Pick 3 players using weighted randomness.

The first thing to do is to collect the list of players I currently have into a data frame.

Next, I collect the team sheets for the next week and see which players aren’t playing.

If there are 3 or more players not playing, I pick 3 of them to sub out. Otherwise, I pick additional players randomly to bring the number of changes up to 3. I store their positions and then drop them from the current team data frame.

I then recreate the weighted data frame from part 1 and pick replacements for the dropped players using the methods described above. There is an additional check to make sure a replacement is not already in the team.

def update_team(current_players, game_week=2):
    positions_to_swap_in = []
    teams_df = pd.read_csv('team_sheets.csv')
    teams_df = teams_df.loc[teams_df.gameweek == game_week]
    cost_df = pd.read_csv('player_costs.csv')

    # Split the current team into players who are starting this week and those who are not.
    is_starting = current_players.name.isin(teams_df.name)
    ps_in = current_players[is_starting]
    ps_n_in = current_players[~is_starting]
    ps_to_switch = ps_n_in.name.tolist()

    if len(ps_to_switch) >= 3:
        ps_to_switch = ps_to_switch[-3:]
    else:
        # Fewer than 3 absentees: make up the difference with random picks.
        players = ps_in.sample(3 - len(ps_to_switch))
        ps_to_switch.extend(players.name.tolist())

    # Remember the position of each dropped player, then drop them.
    for player in ps_to_switch:
        drop = current_players.loc[current_players.name == player]
        positions_to_swap_in.append(drop.position.tolist()[0])
        current_players = current_players.drop(drop.index)

    current_team = current_players.name.tolist()
    merged_df = pd.merge(teams_df, cost_df, how='inner', on=['name'])
    weighted_df = add_weights(merged_df)

    # Pick replacements position by position; redraw them all if the full team is over budget.
    while True:
        new_team = list(current_team)
        for position in positions_to_swap_in:
            player = False
            while not player:
                player = pick_player(weighted_df, new_team, position)
            new_team.extend(player)
        if check_team(new_team, cost_df):
            return new_team
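
Because update_team reads from CSV files, it is awkward to try out on its own. The sketch below isolates just the step that decides who to swap out and runs it on toy data. The helper name `choose_players_to_drop` and the five-player squad are assumptions for illustration.

```python
import pandas as pd


def choose_players_to_drop(current_players, starting_names, max_changes=3, random_state=None):
    """Return names to swap out: non-starters first, then random picks up to max_changes."""
    is_starting = current_players["name"].isin(starting_names)
    to_drop = current_players.loc[~is_starting, "name"].tolist()[-max_changes:]
    shortfall = max_changes - len(to_drop)
    if shortfall > 0:
        # Not enough absentees, so top up with randomly chosen starters.
        extra = current_players.loc[is_starting].sample(shortfall, random_state=random_state)
        to_drop.extend(extra["name"].tolist())
    return to_drop


# A toy squad of five players (a real team has 15).
current_players = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4", "P5"],
    "position": ["Prop", "Hooker", "Lock", "Centre", "Fly-Half"],
})
starting_names = ["P1", "P2", "P3", "P5"]  # P4 is not starting next week

to_drop = choose_players_to_drop(current_players, starting_names, random_state=0)
dropped = current_players[current_players["name"].isin(to_drop)]
positions_needed = dropped["position"].tolist()
print(to_drop, positions_needed)
```

Here P4 is always dropped because he is not starting, and two further players are chosen at random to use up the 3 allowed changes.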

How did I do?

After 3 game weeks the 6 Nations was suspended due to concerns around Covid-19. However, the team I picked randomly had the highest total in the office for all 3 game weeks that were played and a place in the top 1000 players.

Conclusions

I believe that it’s possible to get a small increase in points based on knowing the players, but I think so much of Fantasy Sports is left up to chance that it is impossible to “be good” at it. For example, if you wanted to pick a Welsh winger and knew a lot about rugby, you would know that Johnny McNicholl has been in great form for his club team while Josh Adams has had injury issues and has barely played for his. However, in week 1 McNicholl scored 1 point, the joint lowest of any player who played, while Adams scored 51, the highest of all players.

It is also not uncommon for players to withdraw just before the game. For example, Damian Penaud of France was one of the most popular picks for round 1. I picked him in my normal team, but he withdrew before the game and received 0 points. With a little bit of bad luck it is quite easy for someone who picked a team on better reasoning to lose, especially in a tournament with only 5 game weeks.