In this post I will use the NBA API to access shot chart data and make some cool plots based on the shot zone information that is available in the raw data.
I wrote a package to access the NBA API. It can be seen on my GitHub page (https://github.com/eyalshafran/NBAapi). The package also includes some plotting features, as I will show in this post. It is an ongoing project which will be updated as I keep working on this blog.
import NBAapi as nba
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy import misc
from scipy.stats import pearsonr
%matplotlib inline
First let's access the data and preview it:
shotchart,leagueaverage = nba.shotchart.shotchartdetail(season='2016-17') # get shot chart data from stats.nba.com
shotchart.head()
Extracting zone-based statistics for each player
Each player has a unique player ID and a name (which might not be unique). It is possible to work with the player ID alone, but I find it less informative when looking at the data, so I'm creating a new column (called PLAYER) which combines the player name and ID.
I'm also going to create a list of tuples with the zone names, which will be used later.
The shot zone can be found using the combination of the 'SHOT_ZONE_RANGE' and 'SHOT_ZONE_AREA' columns. I will also use the 'SHOT_MADE_FLAG' column to see whether the shot was made or not. I'm going to use the groupby method to get a dataframe with zone-based information for each player. The size aggregator shows how many shots a player made and missed from each zone:
shotchart['PLAYER'] = list(zip(shotchart['PLAYER_NAME'],shotchart['PLAYER_ID'])) # list() keeps this working on Python 3, where zip is lazy
zones_list = [(u'Less Than 8 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Left Side(L)'),
              (u'8-16 ft.', u'Right Side(R)'),
              (u'16-24 ft.', u'Center(C)'),
              (u'16-24 ft.', u'Left Side Center(LC)'),
              (u'16-24 ft.', u'Left Side(L)'),
              (u'16-24 ft.', u'Right Side Center(RC)'),
              (u'16-24 ft.', u'Right Side(R)'),
              (u'24+ ft.', u'Center(C)'),
              (u'24+ ft.', u'Left Side Center(LC)'),
              (u'24+ ft.', u'Left Side(L)'),
              (u'24+ ft.', u'Right Side Center(RC)'),
              (u'24+ ft.', u'Right Side(R)'),
              (u'Back Court Shot', u'Back Court(BC)')]
# Create dataframe with PLAYER as index and the rest as columns
zones = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','PLAYER']).size().unstack(fill_value=0).T
zones.head()
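To make the reshaping step concrete, here is a minimal sketch of the same groupby / size / unstack / transpose chain on a made-up shot log. The players, zones and counts are invented for illustration and are not NBA data:

```python
import pandas as pd

# Toy shot log: two hypothetical players, two zone ranges, made/missed flags.
toy = pd.DataFrame({
    'PLAYER': ['A', 'A', 'A', 'B', 'B'],
    'SHOT_ZONE_RANGE': ['8-16 ft.', '8-16 ft.', '24+ ft.', '8-16 ft.', '24+ ft.'],
    'SHOT_MADE_FLAG': [1, 0, 1, 1, 1],
})

# size() counts the rows in each (range, flag, player) group,
# unstack() moves the last key (PLAYER) into the columns,
# and .T turns PLAYER back into the row index.
toy_zones = (toy.groupby(['SHOT_ZONE_RANGE', 'SHOT_MADE_FLAG', 'PLAYER'])
                .size()
                .unstack(fill_value=0)
                .T)
print(toy_zones)
```

Player B never missed from 8-16 ft. in this toy data, so fill_value=0 puts a 0 in that cell instead of a NaN.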
The shot chart data does not say how many games each player played. We will use the player biostats data to get that information:
players = nba.player.biostats(season='2016-17')
players['PLAYER'] = list(zip(players['PLAYER_NAME'],players['PLAYER_ID'])) # list() keeps this working on Python 3
players.set_index('PLAYER',inplace=True)
players.head()
We need to merge the GP column from the players dataframe with the zones dataframe that we created earlier. Since both dataframes have the same index, we can use the pandas join method:
GP = players.loc[:,['GP']] # create DataFrame with single GP column
GP.columns = pd.MultiIndex.from_product([GP.columns,[''],['']]) # change column to multiindex before join (prevents join warning)
zones_with_GP = zones.join(GP) # only include games played from players
zones_with_GP.columns = pd.MultiIndex.from_tuples(zones_with_GP.columns.tolist(),
                                                  names=['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','MADE'])
zones_with_GP = zones_with_GP.sort_index(axis=1,level=0) # sort columns for better performance (+ avoid warning); replaces the deprecated sortlevel
zones_with_GP.head()
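The padding trick used for the GP column can be tried on a small example. This is only a sketch on invented numbers: it uses two column levels instead of three, and string labels ('missed' / 'made') instead of the 0/1 flags, purely to keep the toy readable:

```python
import pandas as pd

# Zone counts with two-level columns (zone, outcome), indexed by player.
cols = pd.MultiIndex.from_tuples([('paint', 'missed'), ('paint', 'made')],
                                 names=['ZONE', 'OUTCOME'])
toy_zones = pd.DataFrame([[4, 6], [1, 3]], index=['A', 'B'], columns=cols)

# Games played, stored in an ordinary single-level column.
gp = pd.DataFrame({'GP': [10, 2]}, index=['A', 'B'])

# Pad the GP column with an empty second level so both frames
# have columns of the same depth before joining on the index.
gp.columns = pd.MultiIndex.from_product([gp.columns, ['']])
joined = toy_zones.join(gp)

# Shots per game: total attempts divided by games played.
per_game = (joined[('paint', 'missed')] + joined[('paint', 'made')]) / joined[('GP', '')]
print(per_game)
```

Addressing the padded column by its full tuple, ('GP', ''), avoids relying on how pandas resolves a partial key.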
Let's do some plotting!
Which players take the most shots per zone?
I already included some plotting tools in the package. For the court plot I used the following blog post: http://savvastjortjoglou.com/nba-shot-sharts.html. I made some changes to the court function (the biggest change is working in feet instead of feet*10, which is the unit the shot chart locations come in).
I also have an nba.plot.text_in_zone function which accepts a text and a zone tuple and writes the text in the specified zone.
We need to sum over the 0s (missed shots) and 1s (made shots) to get the total shots and divide by the number of games played.
path = os.path.dirname(nba.__file__) # get path of the nba module
floor = plt.imread(os.path.join(path,'data','court.jpg')) # load floor template (portable path; scipy.misc.imread was removed from newer SciPy releases)
plt.figure(figsize=(15,12.5),facecolor='white') # set up figure
ax = nba.plot.court(lw=4,outer_lines=False) # plot NBA court - don't include the outer lines
ax.axis('off')
nba.plot.zones(lw=2,color='white',linewidth=3)
eligible = zones_with_GP.loc[:,'GP'].values > 10 # only include players who played more than 10 games
# we are going to use zones_list to plot information in each zone
for zone in zones_list:
    # calculate shots per game for specific zone and sort from highest to lowest
    shots_PG = (zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP']).sort_values(ascending=False)
    name = [] # will be used to store the text we want to print
    # run a loop to find top 3 players
    for j in range(3):
        # create text
        name.append(shots_PG.index[j][0].split(' ')[0][0]+'. ' + shots_PG.index[j][0].split(' ')[1]+':%0.1f' %shots_PG.values[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Most Shots by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43]) # plot floor
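The label-building step in the loop above (first initial, surname, value) can be isolated in a small helper. The sketch below uses a hypothetical function name, short_label, and made-up players. Unlike the one-liner above, it keeps multi-word surnames whole and tolerates single-word names, which is a deliberate deviation:

```python
def short_label(full_name, value):
    """Format ('Kevin Durant', 5.27) as 'K. Durant: 5.3' (hypothetical helper)."""
    parts = full_name.split(' ')
    if len(parts) == 1:
        # Single-word names have no surname to split off.
        return '%s: %0.1f' % (full_name, value)
    return '%s. %s: %0.1f' % (parts[0][0], ' '.join(parts[1:]), value)

# Made-up shots-per-game values for one zone.
shots_per_game = {'Player One': 5.27, 'Player Two': 7.04, 'Solo': 6.5, 'Player Four': 1.0}

# Sort from highest to lowest and keep the top 3.
top3 = sorted(shots_per_game.items(), key=lambda kv: kv[1], reverse=True)[:3]
label = '\n'.join(short_label(name, value) for name, value in top3)
print(label)
```

The resulting multi-line string is the kind of text that would be passed to nba.plot.text_in_zone for a zone.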
Which players have the highest FG% in every zone?
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
eligible = zones_with_GP.loc[:,'GP'].values > 10
for zone in zones_list:
    # create new dataframe with total shots, shots per game and FG%
    df = pd.concat([zones_with_GP.loc[eligible,zone].sum(axis=1),
                    zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP'],
                    100.0*zones_with_GP.loc[eligible,(zone[0],zone[1],1)]/zones_with_GP.loc[eligible,zone].sum(axis=1)],axis=1)
    df.columns = ['SHOTS','SHOTS_PG','FGP']
    # only include players who took at least 10 shots and are among the top ~100 in shots per game (from that zone)
    top100 = df.loc[:,'SHOTS_PG'].sort_values(ascending=False).iloc[100]
    if zone != (u'Back Court Shot', u'Back Court(BC)'):
        mask = (df.loc[:,'SHOTS_PG'] >= top100) & (df.loc[:,'SHOTS']>=10)
    else:
        mask = (df.loc[:,'SHOTS']>=2)
    # sort by FG%
    perc_leaders = df.iloc[mask.values,:].sort_values('FGP',ascending=False)
    name = []
    for j in range(3):
        # .ix was removed from pandas; use positional .iloc on each column instead
        name.append(perc_leaders.index[j][0].split(' ')[0][0]+'. ' + perc_leaders.index[j][0].split(' ')[1]+': %0.1f (%d)' %(perc_leaders['FGP'].iloc[j],perc_leaders['SHOTS'].iloc[j]))
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.text(-15,-7,'Player: FG % \n (total shots)',horizontalalignment='center')
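The reason for the minimum-attempts filter is easier to see on a tiny example. This is a sketch with invented numbers, and the threshold simply mirrors the 10-shot cutoff used above:

```python
import pandas as pd

# Made-up missed/made counts for three hypothetical players in one zone.
toy = pd.DataFrame({'MISSED': [1, 40, 55], 'MADE': [2, 60, 45]},
                   index=['low volume', 'shooter A', 'shooter B'])
toy['SHOTS'] = toy['MISSED'] + toy['MADE']
toy['FGP'] = 100.0 * toy['MADE'] / toy['SHOTS']

min_shots = 10  # same value as the total-shots threshold in the loop above

# Without the filter the 2-for-3 player would lead the zone;
# with it, only players with a meaningful sample are ranked.
leaders = toy[toy['SHOTS'] >= min_shots].sort_values('FGP', ascending=False)
print(leaders[['SHOTS', 'FGP']])
```

Here the unfiltered leader is the player with only three attempts, which is exactly the kind of noise the mask removes.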
I'm going to run the same analysis for the league average, so I run the groupby without the PLAYER column. I also add another row with the FG%:
leagueaverage = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
leagueaverage = pd.concat([leagueaverage,pd.DataFrame(leagueaverage.loc[1,:]/leagueaverage.sum(),columns=['FGP']).T])
np.round(leagueaverage,2) # round to make display nicer
I'm going to plot the FG% and the distribution of shots from each zone for the entire league:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
total_shots = leagueaverage.loc[0,:].sum()+leagueaverage.loc[1,:].sum()
for zone in zones_list:
    name = 'FG%% - %0.1f \nDST - %0.1f' %(100*leagueaverage.loc['FGP',zone],100*(leagueaverage.loc[0,zone]+leagueaverage.loc[1,zone])/total_shots)
    nba.plot.text_in_zone(name,zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=16)
plt.title('Shooting by Zone (League Average)',fontsize=16)
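The two numbers printed in each zone (FG% and the share of all shots, labeled DST) come down to simple column arithmetic. A minimal sketch with made-up counts and two invented zone names:

```python
import pandas as pd

# Rows follow the SHOT_MADE_FLAG convention: 0 = missed, 1 = made.
counts = pd.DataFrame({'paint': [400, 600], 'three': [650, 350]}, index=[0, 1])

attempts = counts.sum()                 # total shots per zone
fgp = 100.0 * counts.loc[1] / attempts  # made shots / total shots, in percent
dst = 100.0 * attempts / attempts.sum() # each zone's share of all shots, in percent

print(pd.DataFrame({'FGP': fgp, 'DST': dst}))
```

By construction the DST values sum to 100 across zones, while the FG% values are independent of each other.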
Kevin Durant
I'm going to do an analysis similar to the league average one, but for a specific player. I chose Kevin Durant, but any player would work. The mask can also be built for a team instead of a player (which I will show later).
durant = shotchart.loc[shotchart['PLAYER_NAME']=='Kevin Durant',:] # create a dataframe that only includes Kevin Durant's shots
made = durant['SHOT_MADE_FLAG']==1 # mask for made shots
We can plot all the shots Durant made and missed but it is difficult to extract any information from these plots:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=True)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2)
plt.scatter(0.1*durant.loc[made,'LOC_X'],0.1*durant.loc[made,'LOC_Y'],color='blue',alpha=0.5)
plt.scatter(0.1*durant.loc[~made,'LOC_X'],0.1*durant.loc[~made,'LOC_Y'],color='red',alpha=0.5,marker='x')
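The factor of 0.1 in the scatter calls converts the raw locations from tenths of a foot to feet. As a small sketch of what that buys you, the snippet below also turns a converted location into a shot distance. It assumes that (0, 0) in the raw coordinates is the basket, which is not stated in this post, and the helper name to_feet and the sample location are made up:

```python
import math

def to_feet(loc_tenths):
    """Convert a raw LOC_X / LOC_Y value (tenths of a foot) to feet."""
    return loc_tenths / 10.0

# A made-up shot location in raw units.
loc_x, loc_y = -150, 200

x_ft, y_ft = to_feet(loc_x), to_feet(loc_y)

# Straight-line distance from the origin, assumed here to be the basket.
distance_ft = math.hypot(x_ft, y_ft)
print(x_ft, y_ft, distance_ft)
```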
I'm going to break down Durant's shots by zone and compare them to the league average, as done on the NBA website (http://stats.nba.com/):
durant_by_zone = durant.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
durant_by_zone= pd.concat([durant_by_zone,pd.DataFrame(durant_by_zone.loc[1,:]/durant_by_zone.sum(),columns=['FGP']).T])
np.round(durant_by_zone,2)
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2)
for zone in zones_list:
    name = ['%0.2f%% (%d)' %(100.0*durant_by_zone.loc['FGP',zone],durant_by_zone.loc[0,zone]+durant_by_zone.loc[1,zone]),
            'LA: %0.2f%%' %(100.0*leagueaverage.loc['FGP',zone])]
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=14)
plt.title('Durant vs. League',fontsize=16)
plt.text(-15,-7,'FG % (total shots) \n League Average',horizontalalignment='center')
I'm going to do one example with teams instead of players. To get the team stats per zone we need to do the same groupby operation as we did for players, but this time with the 'TEAM_NAME' column:
team_by_zone = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','TEAM_NAME']).size().unstack(fill_value=0).T
team_by_zone
Which teams have the highest FG% in every zone?
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='white',linewidth=3)
for zone in zones_list:
    # create series and sort by FG%
    perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
    name = []
    for j in range(3):
        # perc_leaders is a Series indexed by team name, so take the j-th value by position
        name.append(perc_leaders.index[j]+': %0.1f' %perc_leaders.iloc[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',fontsize=11)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43])
As we have seen above, about 40.9% of shots in the NBA are from less than 8 feet and 31.6% are 3s. Let's compare how a team's winning % correlates with its shooting % from under the basket, its three-point FG% and its overall FG%:
zone = (u'Less Than 8 ft.', u'Center(C)')
perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
team_stats = nba.team.stats(season='2016-17').set_index('TEAM_NAME') # load team stats from NBA API
team_stats_with_zone = team_stats.merge(perc_leaders.to_frame(),left_index=True,right_index=True) # merge team stats with % from less than 8 ft.
plt.figure(figsize=(18,6))
plt.subplot(1,3,1)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0],s=50)
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM LESS THAN 8 FT.')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0])[0])
plt.subplot(1,3,2)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG3_PCT'],s=50,color='green')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG3_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM 3')
plt.subplot(1,3,3)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG_PCT'],s=50,color='red')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% ')
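The plot titles report the Pearson correlation returned by scipy's pearsonr. As a rough sketch of what that number measures, here is a pure-Python version applied to made-up values. The helper name pearson_r and the data are illustrative only and are not part of the package:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up team values: FG% rises linearly with winning %.
win_pct = [30.0, 45.0, 60.0, 75.0]
fg_pct = [43.0, 44.5, 46.0, 47.5]

r_percent = pearson_r(win_pct, fg_pct)
# Rescaling a variable by a positive constant leaves the correlation unchanged.
r_fraction = pearson_r(win_pct, [v / 100.0 for v in fg_pct])
print(r_percent, r_fraction)
```

The scale invariance is why it does not matter that some titles above pass the shooting percentage as a fraction while the scatter plots show it multiplied by 100.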
References:
- Court plotting function was modified from: http://savvastjortjoglou.com/
- Accessing the NBA API: http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
- Another NBA package for Python: https://pypi.python.org/pypi/nbastats/1.0.0
print(pd.__version__)
print(np.__version__)
print (sys.version)