In this post I will use the NBA API to access shot chart data and make some cool plots based on the shot zone information available in the raw data.

I wrote a package to access the NBA API. It can be seen on my GitHub page (https://github.com/eyalshafran/NBAapi). The package also includes some plotting features, as I will show in this post. It is an ongoing project which will be updated as I keep working on this blog.

In [1]:
import NBAapi as nba
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy import misc
from scipy.stats import pearsonr
%matplotlib inline 

First let's access the data and preview it:

In [2]:
shotchart,leagueaverage = nba.shotchart.shotchartdetail(season='2016-17') # get shot chart data from NBA.stats
shotchart.head()
Out[2]:
GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME PERIOD MINUTES_REMAINING SECONDS_REMAINING ... SHOT_ZONE_AREA SHOT_ZONE_RANGE SHOT_DISTANCE LOC_X LOC_Y SHOT_ATTEMPTED_FLAG SHOT_MADE_FLAG GAME_DATE HTM VTM
0 Shot Chart Detail 0021600001 2 201565 Derrick Rose 1610612752 New York Knicks 1 11 40 ... Center(C) Less Than 8 ft. 0 4 8 1 1 20161025 CLE NYK
1 Shot Chart Detail 0021600001 3 201567 Kevin Love 1610612739 Cleveland Cavaliers 1 11 26 ... Center(C) Less Than 8 ft. 3 -11 36 1 0 20161025 CLE NYK
2 Shot Chart Detail 0021600001 5 2546 Carmelo Anthony 1610612752 New York Knicks 1 11 16 ... Right Side Center(RC) 16-24 ft. 19 148 129 1 0 20161025 CLE NYK
3 Shot Chart Detail 0021600001 7 204001 Kristaps Porzingis 1610612752 New York Knicks 1 11 15 ... Center(C) Less Than 8 ft. 2 24 -1 1 1 20161025 CLE NYK
4 Shot Chart Detail 0021600001 8 2544 LeBron James 1610612739 Cleveland Cavaliers 1 10 59 ... Left Side(L) 8-16 ft. 11 -79 80 1 1 20161025 CLE NYK

5 rows × 24 columns

Extracting zone based statistics for each player

Each player has a unique player ID and also a name (which might not be unique). It is possible to just work with the player ID but I find that it is less informative when looking at the data and therefore I'm creating a new column (called PLAYER) which incorporates both the player name and ID.

I'm going to create a list of tuples with zone names which will be used later.

The shot zone can be found using the combination of the 'SHOT_ZONE_RANGE' and 'SHOT_ZONE_AREA' columns. I will also use the 'SHOT_MADE_FLAG' column to see whether the shot was made or not. I'm going to use the groupby method to get a dataframe with zone-based information for each player. The size aggregator counts the rows in each group, so it shows how many shots a player made and missed from each zone:

In [3]:
shotchart['PLAYER'] = list(zip(shotchart['PLAYER_NAME'],shotchart['PLAYER_ID'])) # list() keeps this working on Python 3, where zip is lazy
zones_list = [(u'Less Than 8 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Left Side(L)'),
              (u'8-16 ft.', u'Right Side(R)'),
              (u'16-24 ft.', u'Center(C)'),
              (u'16-24 ft.', u'Left Side Center(LC)'),
              (u'16-24 ft.', u'Left Side(L)'),
              (u'16-24 ft.', u'Right Side Center(RC)'),
              (u'16-24 ft.', u'Right Side(R)'),
              (u'24+ ft.', u'Center(C)'),
              (u'24+ ft.', u'Left Side Center(LC)'),
              (u'24+ ft.', u'Left Side(L)'),
              (u'24+ ft.', u'Right Side Center(RC)'),
              (u'24+ ft.', u'Right Side(R)'),
              (u'Back Court Shot', u'Back Court(BC)')]
# Create dataframe with PLAYER as index and the rest as columns
zones = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','PLAYER']).size().unstack(fill_value=0).T
zones.head()
Out[3]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
SHOT_MADE_FLAG 0 1 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 2 4 2 1 1 2 1 1 1 ... 1 0 1 0 4 0 0 0 6 5
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 5 4 6 7 7 10 7 0 58 40
(Aaron Gordon, 203932) 10 8 15 6 12 5 25 10 14 7 ... 20 15 32 20 19 15 3 0 135 230
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 0 1 0 0 3 0 0 13 12

5 rows × 30 columns
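
To make the groupby/size/unstack chain easier to follow, here is a minimal sketch on a made-up shot log. The players "A" and "B" and their shots are invented for illustration, and I only use one zone column instead of two to keep the table small.

```python
import pandas as pd

# Toy shot log: one row per shot, mirroring the columns used above.
shots = pd.DataFrame({
    "PLAYER": ["A", "A", "A", "B", "B"],
    "SHOT_ZONE_RANGE": ["8-16 ft.", "8-16 ft.", "24+ ft.", "24+ ft.", "24+ ft."],
    "SHOT_MADE_FLAG": [1, 0, 1, 1, 1],
})

# size() counts the rows in each (zone, made flag, player) group,
# unstack() moves the last group key (PLAYER) into the columns and fills
# unobserved combinations with 0, and .T puts the players back on the rows.
counts = (
    shots.groupby(["SHOT_ZONE_RANGE", "SHOT_MADE_FLAG", "PLAYER"])
    .size()
    .unstack(fill_value=0)
    .T
)

# Total attempts from one zone: add the missed (0) and made (1) columns.
attempts_mid = counts.loc[:, "8-16 ft."].sum(axis=1)
print(counts)
print(attempts_mid)
```

Note that a (zone, made flag) pair that never occurs for any player does not get a column at all. In the toy data nobody missed from 24+ ft., so there are three columns rather than four. The real data has every combination, which is why the table above has 30 columns.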

The shot chart data does not say how many games each player played. We will use the player biostats data to get that information:

In [4]:
players = nba.player.biostats(season='2016-17')
players['PLAYER'] = list(zip(players['PLAYER_NAME'],players['PLAYER_ID'])) # list() keeps this working on Python 3
players.set_index('PLAYER',inplace=True)
players.head()
Out[4]:
PLAYER_ID PLAYER_NAME TEAM_ID TEAM_ABBREVIATION AGE PLAYER_HEIGHT PLAYER_HEIGHT_INCHES PLAYER_WEIGHT COLLEGE COUNTRY ... GP PTS REB AST NET_RATING OREB_PCT DREB_PCT USG_PCT TS_PCT AST_PCT
PLAYER
(AJ Hammons, 1627773) 1627773 AJ Hammons 1610612742 DAL 24.0 7-0 84 260 Purdue USA ... 22 2.2 1.6 0.2 -0.6 0.049 0.199 0.167 0.472 0.038
(Aaron Brooks, 201166) 201166 Aaron Brooks 1610612754 IND 32.0 6-0 72 161 Oregon USA ... 65 5.0 1.1 1.9 -3.0 0.022 0.064 0.190 0.507 0.216
(Aaron Gordon, 203932) 203932 Aaron Gordon 1610612753 ORL 21.0 6-9 81 220 Arizona USA ... 80 12.7 5.1 1.9 -2.8 0.054 0.141 0.200 0.530 0.097
(Aaron Harrison, 1626151) 1626151 Aaron Harrison 1610612766 CHA 22.0 6-6 78 210 Kentucky USA ... 5 0.2 0.6 0.6 -18.6 0.000 0.200 0.142 0.102 0.375
(Adreian Payne, 203940) 203940 Adreian Payne 1610612750 MIN 26.0 6-10 82 237 Michigan State USA ... 18 3.5 1.8 0.4 0.8 0.069 0.200 0.224 0.505 0.089

5 rows × 23 columns

We will need to merge the GP column from the players dataframe with the zones dataframe that we created earlier. Since both dataframes have the same index, we can use the pandas join method:

In [5]:
GP = players.loc[:,['GP']] # create DataFrame with single GP column
GP.columns = pd.MultiIndex.from_product([GP.columns,[''],['']]) # change column to multiindex before join (prevents join warning)
zones_with_GP = zones.join(GP) # only include games played from players
zones_with_GP.columns = pd.MultiIndex.from_tuples(zones_with_GP.columns.tolist(), 
                                                  names=['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','MADE'])
zones_with_GP = zones_with_GP.sort_index(axis=1,level=0) # sort columns for better performance (+ avoid warning); sortlevel is deprecated in newer pandas
zones_with_GP.head()
Out[5]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot GP Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
MADE 0 1 0 1 0 1 0 1 0 1 ... 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 2 4 2 1 1 2 1 1 1 ... 0 1 0 4 0 0 0 22 6 5
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 4 6 7 7 10 7 0 65 58 40
(Aaron Gordon, 203932) 10 8 15 6 12 5 25 10 14 7 ... 15 32 20 19 15 3 0 80 135 230
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 5 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 1 0 0 3 0 0 18 13 12

5 rows × 31 columns
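
The join works because pandas aligns rows on the index. Below is a minimal sketch of that idea with flat columns and made-up players (the names and numbers are hypothetical, and I skip the MultiIndex columns to keep it short). It also shows what happens to a player who is missing from the games-played table.

```python
import pandas as pd

# Made-up shot attempts for three players.
attempts = pd.DataFrame({"ATTEMPTS": [120, 45, 8]}, index=["A", "B", "C"])

# Made-up games played; player "C" is missing on purpose.
games = pd.DataFrame({"GP": [60, 15]}, index=["A", "B"])

# DataFrame.join is a left join on the index by default, so every row of
# `attempts` is kept and "C" gets NaN for GP.
merged = attempts.join(games)
merged["SHOTS_PG"] = merged["ATTEMPTS"] / merged["GP"]

# A comparison with NaN is False, so "C" drops out of this kind of filter.
eligible = merged["GP"] > 10
print(merged[eligible])
```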

Let's do some plotting!

Which players take the most shots per zone?

I already included some plotting tools in the package. For the court plot I used the following blog http://savvastjortjoglou.com/nba-shot-sharts.html. I made some changes to the court function (biggest change is working in feet instead of feet*10 which the shot chart location comes in).

I also have an nba.plot.text_in_zone function which accepts a text and the zone tuple and writes the text in the specified zone.

We need to sum over the 0s (missed shots) and 1s (made shots) to get the total shots and divide by the number of games played.

In [6]:
path = os.path.dirname(nba.__file__) # get path of the nba module
floor = plt.imread(os.path.join(path,'data','court.jpg')) # load floor template (os.path.join keeps the path portable; scipy.misc.imread was removed from newer SciPy)
plt.figure(figsize=(15,12.5),facecolor='white') # set up figure
ax = nba.plot.court(lw=4,outer_lines=False) # plot NBA court - don't include the outer lines
ax.axis('off')
nba.plot.zones(lw=2,color='white',linewidth=3)
eligible = zones_with_GP.loc[:,'GP'].values > 10 # only include players who played more than 10 games
# we are going to use zones_list to plot information in each zone
for zone in zones_list:
    # calculate shots per game for specific zone and sort from highest to lowest
    shots_PG = (zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP']).sort_values(ascending=False)
    name = [] # will be used to store the text we want to print
    # run a loop to find top 3 players 
    for j in range(3):
        # create text
        name.append(shots_PG.index[j][0].split(' ')[0][0]+'. ' + shots_PG.index[j][0].split(' ')[1]+':%0.1f' %shots_PG.values[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Most Shots by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43]) # plot floor
Out[6]:
<matplotlib.image.AxesImage at 0x1a418da0>
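
The label building inside the loop is a bit dense, so here is a small standalone sketch of the same idea: rank players by shots per game and format the top three as "F. Last:value". The helper name short_label and the player names are hypothetical. Unlike the one-liner above, this version also handles a player listed under a single name, which would otherwise raise an IndexError.

```python
def short_label(full_name, value):
    """Format 'First Last' as 'F. Last:value' with one decimal place."""
    parts = full_name.split(" ")
    if len(parts) == 1:
        # Single-name player: there is no second token to abbreviate around.
        return "%s:%0.1f" % (parts[0], value)
    # First initial plus the second token, as in the loop above.
    return "%s. %s:%0.1f" % (parts[0][0], parts[1], value)


# Made-up shots per game for one zone.
shots_pg = {"Jane Doe": 8.04, "John Smith": 6.5, "Mononym": 5.0, "Al Bench": 1.2}

# Sort from highest to lowest and keep the top three.
top3 = sorted(shots_pg.items(), key=lambda item: item[1], reverse=True)[:3]
label = "\n".join(short_label(name, value) for name, value in top3)
print(label)
```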

Which players have the highest FG% at every zone?

In [7]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
eligible = zones_with_GP.loc[:,'GP'].values > 10 
for zone in zones_list:
    # create new dataframe with total shots, shots per game and FG%
    df = pd.concat([zones_with_GP.loc[eligible,zone].sum(axis=1),
                    zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP'],
                    100.0*zones_with_GP.loc[eligible,(zone[0],zone[1],1)]/zones_with_GP.loc[eligible,zone].sum(axis=1)],axis=1)
    df.columns = ['SHOTS','SHOTS_PG','FGP']
    # only include players with at least 10 total shots who also rank in roughly the top 100 in shots per game (from that zone)
    top100 = df.loc[:,'SHOTS_PG'].sort_values(ascending=False).iloc[100]
    if zone != (u'Back Court Shot', u'Back Court(BC)'):
        mask = (df.loc[:,'SHOTS_PG'] >= top100) & (df.loc[:,'SHOTS']>=10)
    else:
        mask = (df.loc[:,'SHOTS']>=2)    
    # sort by FG%
    perc_leaders = df.iloc[mask.values,:].sort_values('FGP',ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j][0].split(' ')[0][0]+'. ' + perc_leaders.index[j][0].split(' ')[1]+': %0.1f (%d)' %(perc_leaders['FGP'].iloc[j],perc_leaders['SHOTS'].iloc[j])) # .iloc replaces the deprecated positional .ix lookup
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.text(-15,-7,'Player: FG % \n (total shots)',horizontalalignment='center')
Out[7]:
<matplotlib.text.Text at 0x20e512e8>
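
The minimum-shots filter matters because FG% is very noisy on small samples. Here is a minimal sketch with made-up numbers; the function name fg_leaders and the players are hypothetical, and it only applies the total-shots threshold, not the shots-per-game one.

```python
def fg_leaders(stats, min_attempts):
    """Return (name, FG%, attempts) sorted by FG%, keeping only players
    with at least `min_attempts` attempts (min_attempts must be >= 1)."""
    rows = [
        (name, 100.0 * made / attempts, attempts)
        for name, (made, attempts) in stats.items()
        if attempts >= min_attempts
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up (made, attempts) pairs for a single zone.
zone_stats = {"A": (3, 3), "B": (22, 40), "C": (25, 55), "D": (7, 12)}

unfiltered = fg_leaders(zone_stats, min_attempts=1)
filtered = fg_leaders(zone_stats, min_attempts=10)
print([name for name, _, _ in unfiltered])
print([name for name, _, _ in filtered])
```

Without the threshold, player "A" leads with 100% on three shots. With a minimum of 10 attempts, the list starts with players who actually shoot from that zone.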

I'm going to run the same analysis for the league average and therefore run the groupby without the PLAYER column. I also added another row calculating the FG%.

In [8]:
leagueaverage = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
leagueaverage = pd.concat([leagueaverage,pd.DataFrame(leagueaverage.loc[1,:]/leagueaverage.sum(),columns=['FGP']).T])
np.round(leagueaverage,2) # round to make display nicer
Out[8]:
SHOT_ZONE_RANGE 16-24 ft. 24+ ft. 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
0 3871.0 4039.00 2675.00 4559.0 2718.00 8538.00 12067.00 4930.00 11810.00 4674.00 5327.00 5623.0 5457.0 549.00 37099.00
1 2574.0 2808.00 1878.00 3016.0 1770.00 4648.00 6513.00 3102.00 6434.00 2966.00 4052.00 3781.0 3594.0 14.00 48840.00
FGP 0.4 0.41 0.41 0.4 0.39 0.35 0.35 0.39 0.35 0.39 0.43 0.4 0.4 0.02 0.57

I'm going to plot the FG% and the distribution of shots (DST, the share of all shots that are taken from each zone) for the entire league:

In [9]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
total_shots = leagueaverage.loc[0,:].sum()+leagueaverage.loc[1,:].sum()
for zone in zones_list:
    name = 'FG%% - %0.1f \nDST - %0.1f' %(100*leagueaverage.loc['FGP',zone],100*(leagueaverage.loc[0,zone]+leagueaverage.loc[1,zone])/total_shots)
    nba.plot.text_in_zone(name,zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=16)
plt.title('Shooting by Zone (League Average)',fontsize=16)
Out[9]:
<matplotlib.text.Text at 0x225c9b00>
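
The numbers in this plot can be checked by hand from the league table. The sketch below copies the missed and made counts from the Out[8] table and recomputes the FG% and shot share for the 'Less Than 8 ft.' zone. The last part is my assumption about how to count 3s: the five 24+ ft. zones plus back court shots. That reading reproduces the 31.6% figure quoted later in the post, but the raw data does not label it that way.

```python
# Missed (0) and made (1) counts copied from the Out[8] table, in column order:
# five 16-24 ft. zones, five 24+ ft. zones, three 8-16 ft. zones,
# back court, less than 8 ft.
missed = [3871, 4039, 2675, 4559, 2718, 8538, 12067, 4930, 11810, 4674,
          5327, 5623, 5457, 549, 37099]
made = [2574, 2808, 1878, 3016, 1770, 4648, 6513, 3102, 6434, 2966,
        4052, 3781, 3594, 14, 48840]

total_shots = sum(missed) + sum(made)

# 'Less Than 8 ft.' / 'Center(C)' is the last column.
close_attempts = missed[-1] + made[-1]
close_fgp = 100.0 * made[-1] / close_attempts
close_dst = 100.0 * close_attempts / total_shots

# Assumption: 3s = the 24+ ft. zones (columns 5-9) plus back court (column 13).
three_attempts = sum(missed[5:10]) + sum(made[5:10]) + missed[13] + made[13]
three_dst = 100.0 * three_attempts / total_shots

print("FG%% - %0.1f \nDST - %0.1f" % (close_fgp, close_dst))
print("3s DST - %0.1f" % three_dst)
```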

Kevin Durant

I'm going to do a similar analysis to the league average but for a specific player. I chose Kevin Durant but any player would work. The mask can also be applied to a team instead of a player (which I will show later).

In [10]:
durant = shotchart.loc[shotchart['PLAYER_NAME']=='Kevin Durant',:] # create a dataframe that only includes Kevin Durant's shots
made = durant['SHOT_MADE_FLAG']==1 # mask for made shots

We can plot all the shots Durant made and missed but it is difficult to extract any information from these plots:

In [11]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=True)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2)
plt.scatter(0.1*durant.loc[made,'LOC_X'],0.1*durant.loc[made,'LOC_Y'],color='blue',alpha=0.5)
plt.scatter(0.1*durant.loc[~made,'LOC_X'],0.1*durant.loc[~made,'LOC_Y'],color='red',alpha=0.5,marker='x')
Out[11]:
<matplotlib.collections.PathCollection at 0x1d0b5a58>
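
The factor of 0.1 in the scatter calls converts LOC_X and LOC_Y from tenths of a foot to feet. As a sanity check, the sketch below recomputes the shot distance for the five preview rows shown in Out[2]. In these rows SHOT_DISTANCE matches the straight-line distance truncated to whole feet; I am only inferring that from the five rows, so treat it as an observation rather than a documented rule.

```python
import math

# (LOC_X, LOC_Y, SHOT_DISTANCE) copied from the five preview rows of Out[2].
preview_rows = [
    (4, 8, 0),
    (-11, 36, 3),
    (148, 129, 19),
    (24, -1, 2),
    (-79, 80, 11),
]


def distance_in_feet(loc_x, loc_y):
    """Straight-line distance from the origin, with inputs in tenths of a foot."""
    return math.hypot(0.1 * loc_x, 0.1 * loc_y)


for loc_x, loc_y, shot_distance in preview_rows:
    feet = distance_in_feet(loc_x, loc_y)
    print("%6.2f ft -> truncated %2d, SHOT_DISTANCE %2d" % (feet, int(feet), shot_distance))
```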

I'm going to break down Durant's shots by zone and compare them to the league average, as done on the NBA website (http://stats.nba.com/):

In [12]:
durant_by_zone = durant.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
durant_by_zone= pd.concat([durant_by_zone,pd.DataFrame(durant_by_zone.loc[1,:]/durant_by_zone.sum(),columns=['FGP']).T])
np.round(durant_by_zone,2)
Out[12]:
SHOT_ZONE_RANGE 16-24 ft. 24+ ft. 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
0 13.0 16.00 6.0 28.00 6.00 58.00 43.00 6.0 69.00 17.00 43.00 32.0 39.00 2.0 97.00
1 13.0 19.00 14.0 27.00 13.00 27.00 38.00 9.0 35.00 8.00 45.00 21.0 23.00 0.0 259.00
FGP 0.5 0.54 0.7 0.49 0.68 0.32 0.47 0.6 0.34 0.32 0.51 0.4 0.37 0.0 0.73
In [13]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2) # draw the zone outlines once
for zone in zones_list:
    name = ['%0.2f%% (%d)' %(100.0*durant_by_zone.loc['FGP',zone],durant_by_zone.loc[0,zone]+durant_by_zone.loc[1,zone]),
            'LA: %0.2f%%' %(100.0*leagueaverage.loc['FGP',zone])]
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=14)
plt.title('Durant vs. League',fontsize=16)
plt.text(-15,-7,'FG % (total shots) \n League Average',horizontalalignment='center')
Out[13]:
<matplotlib.text.Text at 0x1f032208>

I'm going to do one example with teams instead of players. To get the team stats per zone we need to do the same groupby operation as we did for players, but this time with the 'TEAM_NAME' column:

In [14]:
team_by_zone = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','TEAM_NAME']).size().unstack(fill_value=0).T
team_by_zone
Out[14]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
SHOT_MADE_FLAG 0 1 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 1 0 1
TEAM_NAME
Atlanta Hawks 138 94 125 94 89 60 142 109 103 92 ... 178 130 154 98 187 99 16 0 1271 1618
Boston Celtics 124 74 86 72 59 35 124 98 67 36 ... 118 106 106 82 166 92 24 0 1203 1588
Brooklyn Nets 61 47 51 34 39 22 89 49 64 31 ... 160 128 145 74 167 116 22 1 1394 1724
Charlotte Hornets 145 82 114 89 94 58 186 105 119 70 ... 167 130 174 116 210 144 16 1 1175 1475
Chicago Bulls 206 133 144 84 127 76 183 112 136 77 ... 183 119 226 131 269 177 11 0 1290 1637
Cleveland Cavaliers 75 45 108 68 80 75 110 82 84 70 ... 98 79 205 131 174 127 11 0 1041 1531
Dallas Mavericks 160 124 183 105 110 88 194 115 133 67 ... 170 158 185 139 182 135 20 0 870 1159
Denver Nuggets 129 72 98 62 77 48 97 91 69 36 ... 159 108 178 93 115 82 21 0 1400 1915
Detroit Pistons 143 99 153 117 104 96 140 79 128 84 ... 225 187 274 236 266 174 30 1 1297 1566
Golden State Warriors 101 68 108 99 64 79 165 126 53 60 ... 212 184 148 97 186 120 33 1 991 1717
Houston Rockets 59 37 39 26 34 22 49 34 45 16 ... 106 66 86 46 89 58 17 0 1215 1819
Indiana Pacers 191 154 156 120 131 85 201 131 107 74 ... 202 140 195 139 174 129 22 0 1177 1540
LA Clippers 206 140 170 146 69 57 200 166 108 79 ... 154 119 120 92 117 78 16 3 1030 1524
Los Angeles Lakers 103 75 145 96 77 40 133 74 55 43 ... 223 158 200 104 211 121 17 1 1413 1783
Memphis Grizzlies 127 55 143 111 93 67 136 99 113 44 ... 182 116 204 125 119 77 35 0 1351 1523
Miami Heat 89 64 166 102 89 58 127 92 94 42 ... 211 157 208 132 175 104 17 0 1271 1643
Milwaukee Bucks 100 61 101 73 103 80 82 31 57 35 ... 110 58 197 101 150 116 13 0 1405 1907
Minnesota Timberwolves 152 98 192 126 128 101 200 114 95 79 ... 134 95 221 134 184 129 12 1 1259 1758
New Orleans Pelicans 128 78 173 101 126 70 161 106 73 50 ... 149 129 206 147 170 119 13 0 1306 1642
New York Knicks 130 95 165 127 126 87 179 119 121 95 ... 150 135 254 205 252 167 23 0 1314 1510
Oklahoma City Thunder 99 70 124 76 75 36 133 81 52 32 ... 190 152 203 106 193 107 17 2 1437 1885
Orlando Magic 122 83 174 97 81 57 193 119 113 82 ... 218 140 215 181 226 135 26 0 1213 1544
Philadelphia 76ers 106 52 83 46 60 33 134 62 72 47 ... 161 105 158 101 199 111 7 0 1317 1700
Phoenix Suns 155 91 161 111 113 55 212 137 111 54 ... 253 189 208 155 221 140 21 1 1317 1723
Portland Trail Blazers 128 89 165 125 90 53 143 110 82 59 ... 213 168 140 116 133 104 18 0 1302 1567
Sacramento Kings 145 103 130 100 88 57 176 101 86 59 ... 237 192 142 92 170 79 19 0 1232 1585
San Antonio Spurs 156 127 171 137 114 102 190 125 109 73 ... 165 108 269 189 219 143 13 0 1075 1465
Toronto Raptors 135 99 147 97 87 75 162 100 94 59 ... 219 184 219 164 165 124 12 1 1207 1584
Utah Jazz 106 64 98 62 78 31 132 74 77 48 ... 199 161 153 103 185 146 14 1 1117 1553
Washington Wizards 152 101 166 105 70 75 186 175 98 77 ... 181 151 230 152 183 141 13 0 1209 1655

30 rows × 30 columns

Which teams have the highest FG% at every zone?

In [15]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='white',linewidth=3) 
for zone in zones_list:
    # create series and sort by FG%
    perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j]+': %0.1f' %(perc_leaders.iloc[j])) # perc_leaders is a Series, so take the j-th value by position
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',fontsize=11)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43])
Out[15]:
<matplotlib.image.AxesImage at 0x23a6db00>

As we have seen above, about 40.9% of shots in the NBA are taken from less than 8 feet and 31.6% are 3s. Let's compare how a team's winning % correlates with its FG% from under the basket, its three-point FG%, and its overall FG%:

In [16]:
zone = (u'Less Than 8 ft.', u'Center(C)')
perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
team_stats = nba.team.stats(season='2016-17').set_index('TEAM_NAME') # load team stats from NBA API
team_stats_with_zone = team_stats.merge(perc_leaders.to_frame(),left_index=True,right_index=True) # merge team stats with % from less than 8 ft.
plt.figure(figsize=(18,6))
plt.subplot(1,3,1)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0],s=50)
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM LESS THAN 8 FT.')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0])[0])
plt.subplot(1,3,2)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG3_PCT'],s=50,color='green')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG3_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM 3')
plt.subplot(1,3,3)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG_PCT'],s=50,color='red')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% ')
Out[16]:
<matplotlib.text.Text at 0xd1773c8>
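
For readers who have not used pearsonr before, here is a minimal pure-Python sketch of the Pearson correlation coefficient, the first value that pearsonr returns. The function name pearson_r and the sample numbers are made up. The sketch also shows that multiplying one variable by 100 does not change the result, which is why it does not matter above that W_PCT is scaled and FG3_PCT is not.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product
    of their standard deviations (the 1/n factors cancel out)."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up winning fractions and shooting percentages for four teams.
win_frac = [0.3, 0.5, 0.6, 0.8]
fg_pct = [44.0, 47.0, 45.0, 49.0]

r = pearson_r(win_frac, fg_pct)
r_scaled = pearson_r([100.0 * w for w in win_frac], fg_pct)
print("r = %.2f, r with scaled x = %.2f" % (r, r_scaled))
```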

References:

  1. Court plotting function was modified from: http://savvastjortjoglou.com/
  2. Accessing the NBA API - http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
  3. Another NBA package for python - https://pypi.python.org/pypi/nbastats/1.0.0
In [17]:
print(pd.__version__)
print(np.__version__)
print (sys.version)
0.19.1
1.12.0
2.7.11 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]

