import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_csv("https://raw.githubusercontent.com/natekratzer/nwsl/main/data/team/2021_season_ovr.csv")
df['goals_scored_per_match'] = (df['GF']/df['MP']).round(2)
df['goals_allowed_per_match'] = (df['GA']/df['MP']).round(2)
df['mean'] = df['GF'].mean()/24 #24 games in the season

fig = px.scatter(df,
    x = "goals_scored_per_match",
    y = "goals_allowed_per_match",
    labels = dict(goals_scored_per_match = "Scored", goals_allowed_per_match = "Allowed"),
    template = 'simple_white',
    title = "NWSL Goals Per Match, 2021",
    text = 'Abbr',
    width = 500,
    height = 500
)
fig = fig.update_traces(textposition = 'top center')
fig = fig.update_xaxes(range = [0.6, 1.8], nticks = 7)
fig = fig.update_yaxes(range = [0.6, 1.8], nticks = 7,
    scaleanchor = "x", # make the y axis tied to X
    scaleratio = 1)

fig = fig.add_hline(y = 1.15, opacity = 1, line_width = 2, line_dash = 'dash', line_color = 'grey')
fig = fig.add_vline(x = 1.15, opacity = 1, line_width = 2, line_dash = 'dash', line_color = 'grey')

fig = fig.add_annotation(x = 1.6, y = 0.63, text = "Data from fbref.com", showarrow = False)

fig.show()
Racing Louisville in their first NWSL season
The NWSL Challenge Cup kicked off earlier this week, so I took a quick look at some of the stats from last season. I got the data from FBREF and made some quick graphs - I’m including the code here, but feel free to ignore it if you’re only interested in the football statistics.
I wanted to know if the NWSL had teams that were focused on offense or defense, so I looked first at average goals scored and allowed per game. On average, teams score 1.15 goals per game, so I added reference lines at that value on both axes.
For the most part, teams weren’t good at one end of the field and bad at the other. The closest any team came to that is Houston, which was above average on offense and below average on defense. Otherwise, offensive and defensive skill go together.
We’d expect average goals to matter a lot, but soccer is a pretty high-variance sport, so I also wanted to know how well goal differential predicted results. Here, results are the points that determine standings (3 pts for a win, 1 for a draw, 0 for a loss).
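As a quick illustration of that points rule, here is a minimal sketch. The helper name `league_points` is hypothetical (it is not part of the analysis code), and the example reads Louisville's home and away records quoted later in this post as wins-draws-losses, which is an assumption about the notation.

```python
def league_points(wins: int, draws: int, losses: int = 0) -> int:
    """Standings points: 3 per win, 1 per draw, 0 per loss."""
    return 3 * wins + 1 * draws + 0 * losses


# Assumed wins-draws-losses reading of the records quoted later in the post
home_points = league_points(wins=4, draws=4, losses=4)
away_points = league_points(wins=1, draws=3, losses=8)

print(home_points)  # 16
print(away_points)  # 6
```

Losses contribute nothing, so two teams with the same number of wins can still be separated by how many of their remaining matches were draws.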
fig = px.scatter(df,
    x = "GD",
    y = "Pts",
    trendline = 'ols', # the OLS trendline requires the statsmodels package
    labels = dict(GD = "Goal Differential", Pts = "Points"),
    template = 'simple_white',
    title = "NWSL Goals and Results, 2021",
    text = 'Abbr'
)
fig = fig.update_traces(textposition = 'top center')

fig.show()
As we’d expect, they track pretty neatly. Washington and Chicago had slightly better seasons than you’d expect from goal differential alone, but nothing wild.
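One way to put a number on "better than you'd expect from goal differential alone" is to fit a straight line and look at how far each team sits from it. The sketch below is only an illustration: the five teams and their values are made up, not taken from the 2021 table, and it uses `np.polyfit` rather than the statsmodels fit behind the plotly trendline.

```python
import numpy as np

# Hypothetical goal differentials and points for five made-up teams
teams = ["A", "B", "C", "D", "E"]
gd = np.array([-12.0, -4.0, 0.0, 5.0, 11.0])
pts = np.array([20.0, 27.0, 33.0, 36.0, 44.0])

# Least-squares line: points ~ slope * goal_differential + intercept
slope, intercept = np.polyfit(gd, pts, deg=1)
expected = slope * gd + intercept

# Positive residual = more points than the trend line predicts
residuals = pts - expected

for team, r in sorted(zip(teams, residuals), key=lambda t: -t[1]):
    print(f"{team}: {r:+.1f} points vs. trend")
```

With the real data, the same calculation on the `GD` and `Pts` columns would show which teams sit above or below the trend line.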
Racing Louisville and Homefield Advantage
Racing Louisville is my team, so I also pulled some of their game-specific data and again started by looking at goals. In this case I was curious about how much of a homefield advantage they have.
df2 = pd.read_csv("https://raw.githubusercontent.com/natekratzer/nwsl/main/data/team/lou_games.csv")

df2 = df2[df2['Comp'] == 'NWSL'] #exclude challenge cup which is in this dataset

# Reformat to long
goals_df = df2[['Venue', 'GF', 'GA']].melt(id_vars = ['Venue'], value_vars = ['GF', 'GA'])

# recode GF and GA to Scored and Allowed
old_list = ['GF', 'GA']
new_list = ['Scored', 'Allowed']
goals_df['variable'] = goals_df['variable'].replace(old_list, new_list)

# Group by and summarize into new dataframe
grouped_df = (goals_df.groupby(['Venue', 'variable'])['value']
    .mean()
    .to_frame(name='Goals')
    .reset_index())
# Visualize
fig = px.bar(grouped_df,
    x = 'variable',
    y = 'Goals',
    color = 'variable',
    facet_col = 'Venue',
    labels = dict(variable = 'Allowed/Scored', Goals = 'Goals Per Match'),
    template = 'simple_white',
    title = "Racing Louisville Struggles with Defense on the Road")

fig.show()
Here we do see a clear offense/defense distinction: Louisville’s defense collapses during road games. The offense is slightly worse (0.75 goals per match compared to 1.0 at home), but the defense gives up over 2 goals a game on average during away matches.
Not surprisingly, Louisville also wound up with a much worse away record (1-3-8) than home record (4-4-4).
record_df = (df2.groupby(['Venue', 'Result'])['Date']
    .count()
    .to_frame(name = 'Matches')
    .reset_index())

fig = px.bar(record_df,
    x = 'Result',
    y = 'Matches',
    color = 'Result',
    facet_col = 'Venue',
    #labels = dict(variable = 'Allowed/Scored', Goals = 'Goals Per Match'),
    template = 'simple_white',
    title = "Racing Louisville is Much Better at Home")

fig.show()