In this post I will use the NBA API to access shot chart data and use it to make some cool plots based on the shot zone information available in the raw data.

I wrote a package to access the NBA API. It can be seen on my GitHub page (https://github.com/eyalshafran/NBAapi). The package also includes some plotting features, as I will show in this post. It is an ongoing project which will be updated as I keep working on this blog.

In [1]:

import NBAapi as nba
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy import misc
from scipy.stats import pearsonr
%matplotlib inline

First let's access the data and preview it:

In [2]:

shotchart,leagueaverage = nba.shotchart.shotchartdetail(season='2016-17') # get shot chart data from stats.nba.com
shotchart.head()

Out[2]:

	GRID_TYPE	GAME_ID	GAME_EVENT_ID	PLAYER_ID	PLAYER_NAME	TEAM_ID	TEAM_NAME	PERIOD	MINUTES_REMAINING	SECONDS_REMAINING	...	SHOT_ZONE_AREA	SHOT_ZONE_RANGE	SHOT_DISTANCE	LOC_X	LOC_Y	SHOT_ATTEMPTED_FLAG	SHOT_MADE_FLAG	GAME_DATE	HTM	VTM
0	Shot Chart Detail	0021600001	2	201565	Derrick Rose	1610612752	New York Knicks	1	11	40	...	Center(C)	Less Than 8 ft.	0	4	8	1	1	20161025	CLE	NYK
1	Shot Chart Detail	0021600001	3	201567	Kevin Love	1610612739	Cleveland Cavaliers	1	11	26	...	Center(C)	Less Than 8 ft.	3	-11	36	1	0	20161025	CLE	NYK
2	Shot Chart Detail	0021600001	5	2546	Carmelo Anthony	1610612752	New York Knicks	1	11	16	...	Right Side Center(RC)	16-24 ft.	19	148	129	1	0	20161025	CLE	NYK
3	Shot Chart Detail	0021600001	7	204001	Kristaps Porzingis	1610612752	New York Knicks	1	11	15	...	Center(C)	Less Than 8 ft.	2	24	-1	1	1	20161025	CLE	NYK
4	Shot Chart Detail	0021600001	8	2544	LeBron James	1610612739	Cleveland Cavaliers	1	10	59	...	Left Side(L)	8-16 ft.	11	-79	80	1	1	20161025	CLE	NYK

5 rows × 24 columns

Extracting zone based statistics for each player

Each player has a unique player ID and also a name (which might not be unique). It is possible to just work with the player ID but I find that it is less informative when looking at the data and therefore I'm creating a new column (called PLAYER) which incorporates both the player name and ID.

I'm going to create a list of tuples with zone names which will be used later.

The shot zone can be found using the combination of the 'SHOT_ZONE_RANGE' and 'SHOT_ZONE_AREA' columns. I will also use the 'SHOT_MADE_FLAG' column to see whether the shot was made or not. I'm going to use the groupby method to get a dataframe with zone-based information for each player. The size aggregator will show us how many shots a player made and missed from each zone:

In [3]:

shotchart['PLAYER'] = list(zip(shotchart['PLAYER_NAME'],shotchart['PLAYER_ID'])) # list() keeps this working when zip returns an iterator (Python 3)
zones_list = [(u'Less Than 8 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Left Side(L)'),
              (u'8-16 ft.', u'Right Side(R)'),
              (u'16-24 ft.', u'Center(C)'),
              (u'16-24 ft.', u'Left Side Center(LC)'),
              (u'16-24 ft.', u'Left Side(L)'),
              (u'16-24 ft.', u'Right Side Center(RC)'),
              (u'16-24 ft.', u'Right Side(R)'),
              (u'24+ ft.', u'Center(C)'),
              (u'24+ ft.', u'Left Side Center(LC)'),
              (u'24+ ft.', u'Left Side(L)'),
              (u'24+ ft.', u'Right Side Center(RC)'),
              (u'24+ ft.', u'Right Side(R)'),
              (u'Back Court Shot', u'Back Court(BC)')]
# Create dataframe with PLAYER as index and the rest as columns
zones = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','PLAYER']).size().unstack(fill_value=0).T
zones.head()

Out[3]:

SHOT_ZONE_RANGE	16-24 ft.										...	8-16 ft.						Back Court Shot		Less Than 8 ft.
SHOT_ZONE_AREA	Center(C)		Left Side Center(LC)		Left Side(L)		Right Side Center(RC)		Right Side(R)		...	Center(C)		Left Side(L)		Right Side(R)		Back Court(BC)		Center(C)
SHOT_MADE_FLAG	0	1	0	1	0	1	0	1	0	1	...	0	1	0	1	0	1	0	1	0	1
PLAYER
(AJ Hammons, 1627773)	0	2	4	2	1	1	2	1	1	1	...	1	0	1	0	4	0	0	0	6	5
(Aaron Brooks, 201166)	5	0	3	5	7	5	6	0	2	2	...	5	4	6	7	7	10	7	0	58	40
(Aaron Gordon, 203932)	10	8	15	6	12	5	25	10	14	7	...	20	15	32	20	19	15	3	0	135	230
(Aaron Harrison, 1626151)	0	0	1	0	1	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
(Adreian Payne, 203940)	1	2	3	1	0	0	1	2	0	0	...	0	0	1	0	0	3	0	0	13	12

5 rows × 30 columns
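
To make the groupby/size/unstack chain easier to follow, here is a minimal sketch on a toy shot log. The players and counts are hypothetical; only the column names mirror the real data. Grouping by four keys and calling size gives one count per (range, area, made flag, player) combination, unstack moves the last key (PLAYER) into the columns, and the transpose puts one player per row:

```python
import pandas as pd

# Hypothetical toy shot log; only the column names mirror the real data
shots = pd.DataFrame({
    'PLAYER': ['A', 'A', 'A', 'B', 'B', 'B'],
    'SHOT_ZONE_RANGE': ['8-16 ft.', '8-16 ft.', '24+ ft.',
                        '24+ ft.', '24+ ft.', '8-16 ft.'],
    'SHOT_ZONE_AREA': ['Center(C)'] * 6,
    'SHOT_MADE_FLAG': [1, 0, 1, 0, 0, 1],
})

counts = (shots
          .groupby(['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'SHOT_MADE_FLAG', 'PLAYER'])
          .size()                  # one count per observed key combination
          .unstack(fill_value=0)   # last key (PLAYER) becomes the columns, gaps become 0
          .T)                      # transpose: players as rows, zones as columns
print(counts)
```

Note that a (zone, made flag) combination that no player ever produced does not get a column at all, so fill_value=0 only fills gaps for combinations that at least one player has.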

The shot chart data does not say how many games each player played. We will use the player biostats data to get that information:

In [4]:

players = nba.player.biostats(season='2016-17')
players['PLAYER'] = list(zip(players['PLAYER_NAME'],players['PLAYER_ID'])) # list() keeps this working when zip returns an iterator (Python 3)
players.set_index('PLAYER',inplace=True)
players.head()

Out[4]:

	PLAYER_ID	PLAYER_NAME	TEAM_ID	TEAM_ABBREVIATION	AGE	PLAYER_HEIGHT	PLAYER_HEIGHT_INCHES	PLAYER_WEIGHT	COLLEGE	COUNTRY	...	GP	PTS	REB	AST	NET_RATING	OREB_PCT	DREB_PCT	USG_PCT	TS_PCT	AST_PCT
PLAYER
(AJ Hammons, 1627773)	1627773	AJ Hammons	1610612742	DAL	24.0	7-0	84	260	Purdue	USA	...	22	2.2	1.6	0.2	-0.6	0.049	0.199	0.167	0.472	0.038
(Aaron Brooks, 201166)	201166	Aaron Brooks	1610612754	IND	32.0	6-0	72	161	Oregon	USA	...	65	5.0	1.1	1.9	-3.0	0.022	0.064	0.190	0.507	0.216
(Aaron Gordon, 203932)	203932	Aaron Gordon	1610612753	ORL	21.0	6-9	81	220	Arizona	USA	...	80	12.7	5.1	1.9	-2.8	0.054	0.141	0.200	0.530	0.097
(Aaron Harrison, 1626151)	1626151	Aaron Harrison	1610612766	CHA	22.0	6-6	78	210	Kentucky	USA	...	5	0.2	0.6	0.6	-18.6	0.000	0.200	0.142	0.102	0.375
(Adreian Payne, 203940)	203940	Adreian Payne	1610612750	MIN	26.0	6-10	82	237	Michigan State	USA	...	18	3.5	1.8	0.4	0.8	0.069	0.200	0.224	0.505	0.089

5 rows × 23 columns

We need to merge the GP column from the players dataframe with the zones dataframe that we created earlier. Since both dataframes have the same index, we can use pandas join:

In [5]:

GP = players.loc[:,['GP']] # create DataFrame with single GP column
GP.columns = pd.MultiIndex.from_product([GP.columns,[''],['']]) # change column to multiindex before join (prevents join warning)
zones_with_GP = zones.join(GP) # only include games played from players
zones_with_GP.columns = pd.MultiIndex.from_tuples(zones_with_GP.columns.tolist(), 
                                                  names=['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','MADE'])
zones_with_GP = zones_with_GP.sort_index(axis=1) # sort columns for better performance (+ avoid warning); sortlevel is deprecated in later pandas versions
zones_with_GP.head()

Out[5]:

SHOT_ZONE_RANGE	16-24 ft.										...	8-16 ft.					Back Court Shot		GP	Less Than 8 ft.
SHOT_ZONE_AREA	Center(C)		Left Side Center(LC)		Left Side(L)		Right Side Center(RC)		Right Side(R)		...	Center(C)	Left Side(L)		Right Side(R)		Back Court(BC)			Center(C)
MADE	0	1	0	1	0	1	0	1	0	1	...	1	0	1	0	1	0	1		0	1
PLAYER
(AJ Hammons, 1627773)	0	2	4	2	1	1	2	1	1	1	...	0	1	0	4	0	0	0	22	6	5
(Aaron Brooks, 201166)	5	0	3	5	7	5	6	0	2	2	...	4	6	7	7	10	7	0	65	58	40
(Aaron Gordon, 203932)	10	8	15	6	12	5	25	10	14	7	...	15	32	20	19	15	3	0	80	135	230
(Aaron Harrison, 1626151)	0	0	1	0	1	0	0	0	0	0	...	0	0	0	0	0	0	0	5	0	0
(Adreian Payne, 203940)	1	2	3	1	0	0	1	2	0	0	...	0	1	0	0	3	0	0	18	13	12

5 rows × 31 columns
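
The join step can be sketched on a toy example. The frames and numbers below are hypothetical; the idea is only to show why the single GP column is padded to three levels before joining, and how the joined frame can then be used to compute attempts per game:

```python
import pandas as pd

# Hypothetical zone counts with three-level columns, as in the post
cols = pd.MultiIndex.from_tuples(
    [('8-16 ft.', 'Center(C)', 0), ('8-16 ft.', 'Center(C)', 1)],
    names=['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'MADE'])
zones_toy = pd.DataFrame([[3, 2], [1, 4]], index=['A', 'B'], columns=cols)

# Hypothetical games-played table; player C took no shots
gp = pd.DataFrame({'GP': [10, 20, 30]}, index=['A', 'B', 'C'])
# Pad the single column name to three levels so both frames have the same column depth
gp.columns = pd.MultiIndex.from_product([gp.columns, [''], ['']])

joined = zones_toy.join(gp)  # left join on the shared index: C is dropped
attempts = joined[('8-16 ft.', 'Center(C)')].sum(axis=1)  # missed + made
per_game = attempts / joined[('GP', '', '')]              # full key returns a Series
print(per_game)
```

Because join defaults to a left join, players that appear only in the games-played table are dropped, and a player missing from that table would get NaN for GP.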

Let's do some plotting!

Which players take the most shots per zone?

I already included some plotting tools in the package. For the court plot I used the following blog post: http://savvastjortjoglou.com/nba-shot-sharts.html. I made some changes to the court function (the biggest change is working in feet instead of the tenths of a foot that the shot chart locations come in).

I also have an nba.plot.text_in_zone function which accepts a text string and a zone tuple and writes the text in the specified zone.

We need to sum over the 0s (missed shots) and 1s (made shots) to get the total shots and divide by the number of games played.

In [6]:

path = os.path.dirname(nba.__file__) # get path of the nba module
floor = misc.imread(os.path.join(path,'data','court.jpg')) # load floor template (os.path.join keeps the path portable)
plt.figure(figsize=(15,12.5),facecolor='white') # set up figure
ax = nba.plot.court(lw=4,outer_lines=False) # plot NBA court - don't include the outer lines
ax.axis('off')
nba.plot.zones(lw=2,color='white',linewidth=3)
eligible = zones_with_GP.loc[:,'GP'].values > 10 # only include players who played more than 10 games
# we are going to use zones_list to plot information in each zone
for zone in zones_list:
    # calculate shots per game for specific zone and sort from highest to lowest
    shots_PG = (zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP']).sort_values(ascending=False)
    name = [] # will be used to store the text we want to print
    # run a loop to find top 3 players 
    for j in range(3):
        # create text
        name.append(shots_PG.index[j][0].split(' ')[0][0]+'. ' + shots_PG.index[j][0].split(' ')[1]+':%0.1f' %shots_PG.values[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Most Shots by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43]) # plot floor

Out[6]:

<matplotlib.image.AxesImage at 0x1a418da0>
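
The ranking and labelling logic inside the loop can be sketched without the plotting. The totals below are hypothetical, and short_name is a hypothetical helper, not part of the package. It differs from the inline expression in the post in two ways: it does not fail on single-word names, and it keeps everything after the first name instead of only the second word:

```python
def short_name(full_name):
    """Abbreviate 'First Last' to 'F. Last'; leave single-word names unchanged."""
    parts = full_name.split(' ', 1)
    if len(parts) == 1:
        return parts[0]
    return parts[0][0] + '. ' + parts[1]

# Hypothetical totals for one zone: player name -> (total shots, games played)
zone_totals = {
    'Alpha Guard': (300, 60),
    'Beta Forward': (328, 82),
    'Gamma': (150, 20),
    'Delta Center': (90, 45),
    'Epsilon Wing': (8, 4),  # too few games, filtered out below
}

min_games = 10
per_game = {name: shots / float(gp)
            for name, (shots, gp) in zone_totals.items()
            if gp > min_games}

# Sort by shots per game, highest first, and keep the top three
top3 = sorted(per_game, key=per_game.get, reverse=True)[:3]
labels = ['%s: %0.1f' % (short_name(name), per_game[name]) for name in top3]
print('\n'.join(labels))
```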

Which players have the highest FG% at every zone?

In [7]:

plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
eligible = zones_with_GP.loc[:,'GP'].values > 10 
for zone in zones_list:
    # create new dataframe with total shot, shots per game and FG%
    df = pd.concat([zones_with_GP.loc[eligible,zone].sum(axis=1),
                    zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP'],
                    100.0*zones_with_GP.loc[eligible,(zone[0],zone[1],1)]/zones_with_GP.loc[eligible,zone].sum(axis=1)],axis=1)
    df.columns = ['SHOTS','SHOTS_PG','FGP']
    # only include players that have at least 10 total shots and rank near the top 100 in shots per game (from that zone)
    top100 = df.loc[:,'SHOTS_PG'].sort_values(ascending=False).iloc[100] # cutoff: value at position 100 (the 101st highest)
    if zone != (u'Back Court Shot', u'Back Court(BC)'):
        mask = (df.loc[:,'SHOTS_PG'] >= top100) & (df.loc[:,'SHOTS']>=10)
    else:
        mask = (df.loc[:,'SHOTS']>=2)    
    # sort by FG%
    perc_leaders = df.iloc[mask.values,:].sort_values('FGP',ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j][0].split(' ')[0][0]+'. ' + perc_leaders.index[j][0].split(' ')[1]+': %0.1f (%d)' %(perc_leaders['FGP'].iloc[j],perc_leaders['SHOTS'].iloc[j]))
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.text(-15,-7,'Player: FG % \n (total shots)',horizontalalignment='center')

Out[7]:

<matplotlib.text.Text at 0x20e512e8>
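
The reason for the minimum-attempts mask is easiest to see on a small example. This is a sketch with hypothetical players and counts: without a threshold, a player who went 1-for-1 from a zone would lead the league at 100%.

```python
import pandas as pd

# Hypothetical made/missed counts for one zone
df = pd.DataFrame({'MISSED': [0, 40, 55, 3],
                   'MADE': [1, 60, 45, 2]},
                  index=['P1', 'P2', 'P3', 'P4'])
df['SHOTS'] = df['MISSED'] + df['MADE']
df['FGP'] = 100.0 * df['MADE'] / df['SHOTS']

# Without a threshold the 1-for-1 shooter wins
naive_leader = df['FGP'].idxmax()

# Requiring at least 10 attempts removes the small samples
qualified = df[df['SHOTS'] >= 10]
leader = qualified['FGP'].idxmax()

print(naive_leader, leader)
```

The post combines two conditions (a shots-per-game cutoff and a total-shots floor); the sketch keeps only the total-shots floor to isolate its effect.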

I'm going to run the same analysis for the league average and therefore run the groupby without the PLAYER column. I also added another row calculating the FG%.

In [8]:

leagueaverage = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
leagueaverage = pd.concat([leagueaverage,pd.DataFrame(leagueaverage.loc[1,:]/leagueaverage.sum(),columns=['FGP']).T])
np.round(leagueaverage,2) # round to make display nicer

Out[8]:

SHOT_ZONE_RANGE	16-24 ft.					24+ ft.					8-16 ft.			Back Court Shot	Less Than 8 ft.
SHOT_ZONE_AREA	Center(C)	Left Side Center(LC)	Left Side(L)	Right Side Center(RC)	Right Side(R)	Center(C)	Left Side Center(LC)	Left Side(L)	Right Side Center(RC)	Right Side(R)	Center(C)	Left Side(L)	Right Side(R)	Back Court(BC)	Center(C)
0	3871.0	4039.00	2675.00	4559.0	2718.00	8538.00	12067.00	4930.00	11810.00	4674.00	5327.00	5623.0	5457.0	549.00	37099.00
1	2574.0	2808.00	1878.00	3016.0	1770.00	4648.00	6513.00	3102.00	6434.00	2966.00	4052.00	3781.0	3594.0	14.00	48840.00
FGP	0.4	0.41	0.41	0.4	0.39	0.35	0.35	0.39	0.35	0.39	0.43	0.4	0.4	0.02	0.57

I'm going to plot the FG% and the distribution of shots from each zone for the entire league:

In [9]:

plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
total_shots = leagueaverage.loc[0,:].sum()+leagueaverage.loc[1,:].sum()
for zone in zones_list:
    name = 'FG%% - %0.1f \nDST - %0.1f' %(100*leagueaverage.loc['FGP',zone],100*(leagueaverage.loc[0,zone]+leagueaverage.loc[1,zone])/total_shots)
    nba.plot.text_in_zone(name,zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=16)
plt.title('Shooting by Zone (League Average)',fontsize=16)

Out[9]:

<matplotlib.text.Text at 0x225c9b00>
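
The distribution numbers can be checked directly from the counts in Out[8]. The sketch below copies the missed/made counts from that table and sums the shares per distance range. Counting the back court shots together with the 24+ ft. zones as three-point attempts is my own grouping, chosen because it reproduces the 31.6% share of 3s quoted later in the post:

```python
# (range, area): (missed, made), copied from Out[8]
league = {
    ('16-24 ft.', 'Center(C)'): (3871, 2574),
    ('16-24 ft.', 'Left Side Center(LC)'): (4039, 2808),
    ('16-24 ft.', 'Left Side(L)'): (2675, 1878),
    ('16-24 ft.', 'Right Side Center(RC)'): (4559, 3016),
    ('16-24 ft.', 'Right Side(R)'): (2718, 1770),
    ('24+ ft.', 'Center(C)'): (8538, 4648),
    ('24+ ft.', 'Left Side Center(LC)'): (12067, 6513),
    ('24+ ft.', 'Left Side(L)'): (4930, 3102),
    ('24+ ft.', 'Right Side Center(RC)'): (11810, 6434),
    ('24+ ft.', 'Right Side(R)'): (4674, 2966),
    ('8-16 ft.', 'Center(C)'): (5327, 4052),
    ('8-16 ft.', 'Left Side(L)'): (5623, 3781),
    ('8-16 ft.', 'Right Side(R)'): (5457, 3594),
    ('Back Court Shot', 'Back Court(BC)'): (549, 14),
    ('Less Than 8 ft.', 'Center(C)'): (37099, 48840),
}

total = sum(missed + made for missed, made in league.values())

# Share of all attempts (in %) per distance range, summed over areas
share_by_range = {}
for (rng, _area), (missed, made) in league.items():
    share_by_range[rng] = share_by_range.get(rng, 0.0) + 100.0 * (missed + made) / total

share_close = share_by_range['Less Than 8 ft.']
# Assumption: back court shots are counted as three-point attempts
share_long = share_by_range['24+ ft.'] + share_by_range['Back Court Shot']
print('Less than 8 ft.: %0.1f%%, 24+ ft. incl. back court: %0.1f%%' % (share_close, share_long))
```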

Kevin Durant

I'm going to do an analysis similar to the league average but for a specific player. I chose Kevin Durant but any player would work. The mask can also be applied to a team instead of a player (which I will show later).

In [10]:

durant = shotchart.loc[shotchart['PLAYER_NAME']=='Kevin Durant',:] # create a dataframe that only includes Kevin Durant's shots
made = durant['SHOT_MADE_FLAG']==1 # mask for made shots

We can plot all the shots Durant made and missed but it is difficult to extract any information from these plots:

In [11]:

plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=True)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2)
plt.scatter(0.1*durant.loc[made,'LOC_X'],0.1*durant.loc[made,'LOC_Y'],color='blue',alpha=0.5)
plt.scatter(0.1*durant.loc[~made,'LOC_X'],0.1*durant.loc[~made,'LOC_Y'],color='red',alpha=0.5,marker='x')

Out[11]:

<matplotlib.collections.PathCollection at 0x1d0b5a58>
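
The factor of 0.1 in the scatter calls converts LOC_X and LOC_Y from tenths of a foot to feet. As a quick sanity check, the sketch below uses the five rows shown in Out[2] and compares the straight-line distance from the origin with the reported SHOT_DISTANCE. In these five rows the reported value equals the distance truncated to whole feet; whether that holds for every row in the data is an assumption I have not verified:

```python
import math

# (LOC_X, LOC_Y, SHOT_DISTANCE) for the five rows shown in Out[2]
rows = [(4, 8, 0), (-11, 36, 3), (148, 129, 19), (24, -1, 2), (-79, 80, 11)]

distances = []
for loc_x, loc_y, reported in rows:
    x_ft, y_ft = 0.1 * loc_x, 0.1 * loc_y  # tenths of a foot -> feet
    dist = math.hypot(x_ft, y_ft)          # straight-line distance from the origin
    distances.append(dist)
    print('computed %5.2f ft, reported %2d ft' % (dist, reported))
```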

I'm going to break down Durant's shots by zone and compare them to the league average, as done on the NBA website (http://stats.nba.com/):

In [12]:

durant_by_zone = durant.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
durant_by_zone= pd.concat([durant_by_zone,pd.DataFrame(durant_by_zone.loc[1,:]/durant_by_zone.sum(),columns=['FGP']).T])
np.round(durant_by_zone,2)

Out[12]:

SHOT_ZONE_RANGE	16-24 ft.					24+ ft.					8-16 ft.			Back Court Shot	Less Than 8 ft.
SHOT_ZONE_AREA	Center(C)	Left Side Center(LC)	Left Side(L)	Right Side Center(RC)	Right Side(R)	Center(C)	Left Side Center(LC)	Left Side(L)	Right Side Center(RC)	Right Side(R)	Center(C)	Left Side(L)	Right Side(R)	Back Court(BC)	Center(C)
0	13.0	16.00	6.0	28.00	6.00	58.00	43.00	6.0	69.00	17.00	43.00	32.0	39.00	2.0	97.00
1	13.0	19.00	14.0	27.00	13.00	27.00	38.00	9.0	35.00	8.00	45.00	21.0	23.00	0.0	259.00
FGP	0.5	0.54	0.7	0.49	0.68	0.32	0.47	0.6	0.34	0.32	0.51	0.4	0.37	0.0	0.73

In [13]:

plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray',linewidth=2)
for zone in zones_list:
    name = ['%0.2f%% (%d)' %(100.0*durant_by_zone.loc['FGP',zone],durant_by_zone.loc[0,zone]+durant_by_zone.loc[1,zone]),
            'LA: %0.2f%%' %(100.0*leagueaverage.loc['FGP',zone])]
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=14)
plt.title('Durant vs. League',fontsize=16)
plt.text(-15,-7,'FG % (total shots) \n League Average',horizontalalignment='center')

Out[13]:

<matplotlib.text.Text at 0x1f032208>
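
The same comparison can be expressed as a difference in percentage points. The sketch below uses three zones, with the missed/made counts copied from Out[12] (Durant) and Out[8] (league); the choice of zones is only for illustration:

```python
# (missed, made) per zone, copied from Out[12] (Durant) and Out[8] (league)
durant_counts = {
    ('Less Than 8 ft.', 'Center(C)'): (97, 259),
    ('24+ ft.', 'Center(C)'): (58, 27),
    ('16-24 ft.', 'Left Side(L)'): (6, 14),
}
league_counts = {
    ('Less Than 8 ft.', 'Center(C)'): (37099, 48840),
    ('24+ ft.', 'Center(C)'): (8538, 4648),
    ('16-24 ft.', 'Left Side(L)'): (2675, 1878),
}

def fg_pct(missed, made):
    """Field goal percentage from missed and made counts."""
    return 100.0 * made / (missed + made)

diff = {}
for zone, (missed, made) in durant_counts.items():
    diff[zone] = fg_pct(missed, made) - fg_pct(*league_counts[zone])
    print('%s / %s: %+0.1f points on %d shots' % (zone[0], zone[1], diff[zone], missed + made))
```

Printing the attempt count next to the difference helps with interpretation: the large gap from the left side at 16-24 ft. rests on only 20 shots.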

I'm going to do one example with teams instead of players. To get the team stats per zone we need to do the same groupby operation as we did for players, but this time with the 'TEAM_NAME' column:

In [14]:

team_by_zone = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','TEAM_NAME']).size().unstack(fill_value=0).T
team_by_zone

Out[14]:

SHOT_ZONE_RANGE	16-24 ft.										...	8-16 ft.						Back Court Shot		Less Than 8 ft.
SHOT_ZONE_AREA	Center(C)		Left Side Center(LC)		Left Side(L)		Right Side Center(RC)		Right Side(R)		...	Center(C)		Left Side(L)		Right Side(R)		Back Court(BC)		Center(C)
SHOT_MADE_FLAG	0	1	0	1	0	1	0	1	0	1	...	0	1	0	1	0	1	0	1	0	1
TEAM_NAME
Atlanta Hawks	138	94	125	94	89	60	142	109	103	92	...	178	130	154	98	187	99	16	0	1271	1618
Boston Celtics	124	74	86	72	59	35	124	98	67	36	...	118	106	106	82	166	92	24	0	1203	1588
Brooklyn Nets	61	47	51	34	39	22	89	49	64	31	...	160	128	145	74	167	116	22	1	1394	1724
Charlotte Hornets	145	82	114	89	94	58	186	105	119	70	...	167	130	174	116	210	144	16	1	1175	1475
Chicago Bulls	206	133	144	84	127	76	183	112	136	77	...	183	119	226	131	269	177	11	0	1290	1637
Cleveland Cavaliers	75	45	108	68	80	75	110	82	84	70	...	98	79	205	131	174	127	11	0	1041	1531
Dallas Mavericks	160	124	183	105	110	88	194	115	133	67	...	170	158	185	139	182	135	20	0	870	1159
Denver Nuggets	129	72	98	62	77	48	97	91	69	36	...	159	108	178	93	115	82	21	0	1400	1915
Detroit Pistons	143	99	153	117	104	96	140	79	128	84	...	225	187	274	236	266	174	30	1	1297	1566
Golden State Warriors	101	68	108	99	64	79	165	126	53	60	...	212	184	148	97	186	120	33	1	991	1717
Houston Rockets	59	37	39	26	34	22	49	34	45	16	...	106	66	86	46	89	58	17	0	1215	1819
Indiana Pacers	191	154	156	120	131	85	201	131	107	74	...	202	140	195	139	174	129	22	0	1177	1540
LA Clippers	206	140	170	146	69	57	200	166	108	79	...	154	119	120	92	117	78	16	3	1030	1524
Los Angeles Lakers	103	75	145	96	77	40	133	74	55	43	...	223	158	200	104	211	121	17	1	1413	1783
Memphis Grizzlies	127	55	143	111	93	67	136	99	113	44	...	182	116	204	125	119	77	35	0	1351	1523
Miami Heat	89	64	166	102	89	58	127	92	94	42	...	211	157	208	132	175	104	17	0	1271	1643
Milwaukee Bucks	100	61	101	73	103	80	82	31	57	35	...	110	58	197	101	150	116	13	0	1405	1907
Minnesota Timberwolves	152	98	192	126	128	101	200	114	95	79	...	134	95	221	134	184	129	12	1	1259	1758
New Orleans Pelicans	128	78	173	101	126	70	161	106	73	50	...	149	129	206	147	170	119	13	0	1306	1642
New York Knicks	130	95	165	127	126	87	179	119	121	95	...	150	135	254	205	252	167	23	0	1314	1510
Oklahoma City Thunder	99	70	124	76	75	36	133	81	52	32	...	190	152	203	106	193	107	17	2	1437	1885
Orlando Magic	122	83	174	97	81	57	193	119	113	82	...	218	140	215	181	226	135	26	0	1213	1544
Philadelphia 76ers	106	52	83	46	60	33	134	62	72	47	...	161	105	158	101	199	111	7	0	1317	1700
Phoenix Suns	155	91	161	111	113	55	212	137	111	54	...	253	189	208	155	221	140	21	1	1317	1723
Portland Trail Blazers	128	89	165	125	90	53	143	110	82	59	...	213	168	140	116	133	104	18	0	1302	1567
Sacramento Kings	145	103	130	100	88	57	176	101	86	59	...	237	192	142	92	170	79	19	0	1232	1585
San Antonio Spurs	156	127	171	137	114	102	190	125	109	73	...	165	108	269	189	219	143	13	0	1075	1465
Toronto Raptors	135	99	147	97	87	75	162	100	94	59	...	219	184	219	164	165	124	12	1	1207	1584
Utah Jazz	106	64	98	62	78	31	132	74	77	48	...	199	161	153	103	185	146	14	1	1117	1553
Washington Wizards	152	101	166	105	70	75	186	175	98	77	...	181	151	230	152	183	141	13	0	1209	1655

30 rows × 30 columns

Which teams have the highest FG% at every zone?

In [15]:

plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='white',linewidth=3) 
for zone in zones_list:
    # create series and sort by FG%
    perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j]+': %0.1f' %(perc_leaders.iloc[j])) # perc_leaders is a Series, so index it by position
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',fontsize=11)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43])

Out[15]:

<matplotlib.image.AxesImage at 0x23a6db00>
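
For a single zone the team ranking can be reproduced from the table in Out[14] without pandas. The sketch below copies the last two columns of that table (missed and made from less than 8 ft.) and picks the three teams with the highest FG%:

```python
# (missed, made) for 'Less Than 8 ft.' / 'Center(C)', copied from Out[14]
close_range = {
    'Atlanta Hawks': (1271, 1618), 'Boston Celtics': (1203, 1588),
    'Brooklyn Nets': (1394, 1724), 'Charlotte Hornets': (1175, 1475),
    'Chicago Bulls': (1290, 1637), 'Cleveland Cavaliers': (1041, 1531),
    'Dallas Mavericks': (870, 1159), 'Denver Nuggets': (1400, 1915),
    'Detroit Pistons': (1297, 1566), 'Golden State Warriors': (991, 1717),
    'Houston Rockets': (1215, 1819), 'Indiana Pacers': (1177, 1540),
    'LA Clippers': (1030, 1524), 'Los Angeles Lakers': (1413, 1783),
    'Memphis Grizzlies': (1351, 1523), 'Miami Heat': (1271, 1643),
    'Milwaukee Bucks': (1405, 1907), 'Minnesota Timberwolves': (1259, 1758),
    'New Orleans Pelicans': (1306, 1642), 'New York Knicks': (1314, 1510),
    'Oklahoma City Thunder': (1437, 1885), 'Orlando Magic': (1213, 1544),
    'Philadelphia 76ers': (1317, 1700), 'Phoenix Suns': (1317, 1723),
    'Portland Trail Blazers': (1302, 1567), 'Sacramento Kings': (1232, 1585),
    'San Antonio Spurs': (1075, 1465), 'Toronto Raptors': (1207, 1584),
    'Utah Jazz': (1117, 1553), 'Washington Wizards': (1209, 1655),
}

# FG% per team, then the three highest
team_fg_pct = {team: 100.0 * made / (missed + made)
               for team, (missed, made) in close_range.items()}
top3 = sorted(team_fg_pct, key=team_fg_pct.get, reverse=True)[:3]
for team in top3:
    print('%s: %0.1f' % (team, team_fg_pct[team]))
```

As a consistency check, the team counts add up to the league totals for this zone in Out[8] (37099 missed, 48840 made).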

As we have seen above, about 40.9% of shots in the NBA are taken from less than 8 feet and 31.6% are 3-pointers. Let's compare how strongly a team's winning % correlates with its shooting % from under the basket, its three-point FG%, and its overall FG%:

In [16]:

zone = (u'Less Than 8 ft.', u'Center(C)')
perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
team_stats = nba.team.stats(season='2016-17').set_index('TEAM_NAME') # load team stats from NBA API
team_stats_with_zone = team_stats.merge(perc_leaders.to_frame(),left_index=True,right_index=True) # merge team stats with % from less than 8 ft.
plt.figure(figsize=(18,6))
plt.subplot(1,3,1)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0],s=50)
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM LESS THAN 8 FT.')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone[0])[0])
plt.subplot(1,3,2)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG3_PCT'],s=50,color='green')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG3_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% FROM 3')
plt.subplot(1,3,3)
plt.scatter(100.0*team_stats_with_zone['W_PCT'],100.0*team_stats_with_zone['FG_PCT'],s=50,color='red')
plt.title('Correlation = %.2f' %pearsonr(100.0*team_stats_with_zone['W_PCT'],team_stats_with_zone['FG_PCT'])[0])
plt.xlabel(r'TEAM WINNING %')
plt.ylabel(r'TEAM FG% ')

Out[16]:

<matplotlib.text.Text at 0xd1773c8>
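
In the cell above the winning percentage is multiplied by 100 inside pearsonr while FG3_PCT and FG_PCT are not. This does not affect the result, because the Pearson correlation is unchanged when either variable is multiplied by a positive constant. A minimal sketch with hypothetical team numbers and a hand-written pearson_r helper (not the SciPy function used in the post):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / float(n), sum(y) / float(n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical winning fractions and three-point percentages for five teams
w_pct = [0.30, 0.45, 0.50, 0.62, 0.80]
fg3_pct = [0.33, 0.35, 0.34, 0.37, 0.39]

r_fraction = pearson_r(w_pct, fg3_pct)
r_percent = pearson_r([100.0 * w for w in w_pct], fg3_pct)  # rescale one variable only
print('%0.4f %0.4f' % (r_fraction, r_percent))
```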

References:

Court plotting function was modified from: http://savvastjortjoglou.com/
Accessing the NBA API: http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
Another NBA package for Python: https://pypi.python.org/pypi/nbastats/1.0.0

In [17]:

print(pd.__version__)
print(np.__version__)
print (sys.version)

0.19.1
1.12.0
2.7.11 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]

Shooting statistics per zone