
A New Way to Classify NBA Players Using Analytics

About 5 months ago, I stumbled upon this article on TheScore. The summary: the traditional 5 positions are no longer enough to describe NBA players. The game has changed, after all. The authors come up with a way to classify players into 9 classes, based on the way they play the game.

In this article, I will take another shot at grouping players into clusters, depending on what they do on the court. However, I will do it using data science, and more precisely K-Means clustering.

I will also take a deeper look at what makes a winning team, i.e. what type of players should be put together for a team to be successful.

Let’s get to it!

Preparing the data

I began by scraping data directly from NBA.com. In total, I collected 28 stats for all 529 players who played in the league in 2019–2020.

Along with traditional stats (points per game, assists, rebounds, etc.), I also collected stats describing shot location, type of offensive play (drive, iso, etc.), defensive efficiency and usage rate.

Then, I decided to drop players who played fewer than 12 minutes per game, as I felt that classifying players based on how they play when they barely play was not going to provide accurate results.

# Keep only players averaging more than 12 minutes per game
df = df[df.MINUTES > 12]
df = df.reset_index(drop=True)

That leaves us with a total of 412 players.

Feature creation

I decided to create 3 new variables, describing the share of a player’s field goal attempts that comes from each area of the court (paint, mid-range or 3-point line).

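# Total field goal attempts: sum of every column whose name contains 'FGA'
shot_attempts = df.filter(regex='FGA').sum(axis=1)

# Assumption (not in the original): perimeter = attempts outside the paint and mid-range
shots_perimeter = shot_attempts - df['PAINT_FGA'] - df['MR_FGA']

# New variables: share of attempts from each area
df['CLOSE'] = df['PAINT_FGA'] / shot_attempts
df['MID_RANGE'] = df['MR_FGA'] / shot_attempts
df['PERIMETER'] = shots_perimeter / shot_attempts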
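As a quick sanity check, the three shares should add up to 1 for every player. Here is a minimal sketch on a made-up two-player dataframe. The column name `FG3_FGA` is hypothetical (the name of the 3-point attempts column is not shown above), and the sketch assumes the `FGA` columns cover exactly the three areas.

```python
import pandas as pd

# Toy data: two made-up players. FG3_FGA is a hypothetical column name.
toy = pd.DataFrame({
    "PLAYER": ["Player A", "Player B"],
    "PAINT_FGA": [6.0, 1.0],
    "MR_FGA": [2.0, 1.0],
    "FG3_FGA": [2.0, 8.0],
})

# Total attempts = sum of every column whose name contains 'FGA'
attempts = toy.filter(regex="FGA").sum(axis=1)

# Share of attempts from each area of the court
toy["CLOSE"] = toy["PAINT_FGA"] / attempts
toy["MID_RANGE"] = toy["MR_FGA"] / attempts
toy["PERIMETER"] = toy["FG3_FGA"] / attempts

shares = toy[["CLOSE", "MID_RANGE", "PERIMETER"]]
print(shares)
print(shares.sum(axis=1))  # every row should sum to 1
```

If the regex also matched a column holding total attempts, the denominator would be double counted, so it is worth checking which columns `filter` actually returns.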

Here is a snapshot of the final dataframe used, containing a total of 31 columns and 412 rows.

df.head()

[Image: first rows of the final dataframe]

Player clustering

Let’s begin by scaling the data. Scaling changes the range of each feature without changing the shape of its distribution. That matters here because K-Means relies on distances between points, so a feature with a larger range would otherwise dominate the clustering.

Here is how to scale the data.

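from sklearn.preprocessing import StandardScaler
# Select the stat columns by position, then standardize each one
data = df.iloc[:, 5:34]
scaled_data = StandardScaler().fit_transform(data)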
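To make the effect of scaling concrete, here is a small sketch on made-up numbers (not the real player data). Two features on very different scales end up with mean 0 and standard deviation 1 after `StandardScaler`, while the ordering of the rows within each feature stays the same.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: points per game and usage rate (as a fraction)
toy = np.array([
    [30.0, 0.35],
    [12.0, 0.18],
    [6.0, 0.12],
    [20.0, 0.27],
])

toy_scaled = StandardScaler().fit_transform(toy)

# Each column now has mean ~0 and standard deviation ~1
print(toy_scaled.mean(axis=0).round(6))
print(toy_scaled.std(axis=0).round(6))
```

Without this step, the points column would contribute far more to any Euclidean distance than the usage column, simply because its values are larger.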

Then, it’s time to find the best number of clusters. In order to do so, I will use the silhouette score, which is available in scikit-learn.

The silhouette score is a metric that measures the quality of clusters. It ranges from -1 to 1. A score close to 1 means that clusters are well apart from each other and clearly different (which is what we want), a score around 0 means that clusters overlap, and a negative score suggests that points have been assigned to the wrong cluster.

So simply put, we want the highest silhouette score possible.
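For the curious, here is roughly what the score computes. For each point, `a` is the average distance to the other points in its own cluster, `b` is the average distance to the points of the nearest other cluster, and the point's score is `(b - a) / max(a, b)`. The sketch below is a plain-Python illustration on four made-up points; it uses Euclidean distance and needs at least two clusters, so it is not a substitute for scikit-learn's implementation.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Group the distances from p to every other point by cluster label
        dists = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lab, []).append(math.dist(p, q))
        if own not in dists:
            # p is alone in its cluster: its score is defined as 0
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
tight = silhouette(points, [0, 0, 1, 1])  # clusters match the two groups
mixed = silhouette(points, [0, 1, 0, 1])  # each cluster spans both groups
print(tight, mixed)
```

The first labelling scores close to 1 and the second scores below 0, which is the behavior described above.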

The following loop calculates the silhouette score for every k between 6 and 12. I started the loop at 6 because at 5, it basically classifies players by position.

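from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Find the best number of clusters
sil = []
kmax = 12
my_range = range(6, kmax + 1)  # start at 6, as explained above
for i in my_range:
    kmeans = KMeans(n_clusters=i).fit(scaled_data)
    labels = kmeans.labels_
    sil.append(silhouette_score(scaled_data, labels, metric='correlation'))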

And here is the plot with the scores.

import matplotlib.pyplot as plt

# Plot the silhouette score for each k
plt.plot(my_range, sil, 'bx-')
plt.xlabel('k')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Score by K')
plt.show()

[Image: silhouette score by k]

The plot shows that the silhouette score is highest at k = 7, so I will use 7 clusters to classify NBA players based on the way they play.
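Once k is chosen, the remaining step is to fit K-Means one more time and attach a cluster label to each player. The code for that step is not shown in this article, so here is a sketch on synthetic data. The `CLUSTER` column name, `random_state=0`, `n_init=10` and the toy blobs are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled_data: 3 well separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [8, 8], [-8, 8]])
toy_scaled = np.vstack([c + rng.normal(size=(20, 2)) for c in centers])

# Score each candidate k, then keep the one with the highest silhouette
candidates = range(2, 7)
scores = [
    silhouette_score(
        toy_scaled,
        KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy_scaled),
    )
    for k in candidates
]
best_k = candidates[int(np.argmax(scores))]

# Refit with the chosen k and store one label per row
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(toy_scaled)
toy_df = pd.DataFrame(toy_scaled, columns=["f1", "f2"])
toy_df["CLUSTER"] = final.labels_

print(best_k)
print(toy_df.groupby("CLUSTER").mean())  # average profile of each cluster
```

Looking at the per-cluster averages is one way to see what sets each group apart before giving it a name.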

Let’s look at those clusters.

[Image: the 7 player clusters]

Looking at this, I came up with the following names for the clusters.

  • Stretch players. Solid players, mostly there to stretch the court. Efficient 3 point shooters. Players like Danny Green, House, Redick.
  • High usage bigs. Big players, great rebounders that score from post up plays. Players like Vucevic, Jokic, Embiid.
  • Low usage bigs. Usually big centers that don’t typically start. Not really involved in the offense. Players like Zubac, Biyombo, McGee.
  • Ball dominant scorers. Players that shoot a lot, can score from everywhere. Good playmakers and ball handlers. The main stars. Players like Harden, Mitchell, LeBron (a.k.a. the GOAT).
  • Versatile rotation players. Efficient but low usage players, typically smaller. Players like Caruso, Dellavedova, Connaughton.
  • High quality contributors. Good players, secondary ball handlers. Typically good 3 point shooters, versatile players. Guys like Tobias Harris, Middleton, Bledsoe.
  • Athletic forwards. Players that like to drive or post up, not great shooters. Players like Aaron Gordon, Derrick Jones, Jabari Parker.

So what makes a good team?

With all that in mind, it is interesting to see how good teams were built this year.

For this exercise, I considered that the good teams were the final 8 teams in the playoffs this year. I then separated the teams into two groups and compared them on a radar plot, using Plotly.
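The code for this step is not reproduced in this article, so the following is only a sketch of how it could look, under stated assumptions. The team codes, cluster names and the `good_teams` set are made up, and `radar` is a hypothetical helper. The idea is to count players per cluster for each team, average those counts within each group of teams, and draw one radar trace per group.

```python
import pandas as pd

# Made-up roster: one row per player with a team code and a cluster name
players = pd.DataFrame({
    "TEAM": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC"],
    "CLUSTER": ["Stretch", "Stretch", "Scorer",
                "Forward", "Forward", "Stretch",
                "Forward", "Scorer", "Forward"],
})
good_teams = {"AAA"}  # hypothetical: teams that reached the final 8

# Players per cluster for each team (0 where a team has none)
counts = pd.crosstab(players["TEAM"], players["CLUSTER"])

# Average roster composition for good teams vs. the rest
is_good = counts.index.isin(good_teams)
avg = pd.DataFrame({
    "good": counts[is_good].mean(),
    "other": counts[~is_good].mean(),
})
print(avg)


def radar(avg):
    """Draw one radar trace per group; requires plotly to be installed."""
    import plotly.graph_objects as go

    fig = go.Figure()
    for group in avg.columns:
        fig.add_trace(go.Scatterpolar(
            r=avg[group].tolist(),
            theta=avg.index.tolist(),
            fill="toself",
            name=group,
        ))
    return fig

# radar(avg).show()  # uncomment to display the plot
```

Averaging per team, rather than summing over all teams in a group, keeps the comparison fair when the two groups do not contain the same number of teams.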

[Image: radar plot of average players per cluster, good teams vs. the rest]

This radar plot shows a few interesting things. First, we see that bad teams had a lot more players in the “Athletic Forwards” cluster (3.1 vs 1.9 players per team). This type of player does not seem to help in creating a contender.

Good teams also tend to have more stretch players (3 vs 2.1 per team). That is not surprising. Having players that belong to the “Ball Dominant Scorers” cluster is one part of the equation (and it’s obviously very important). Surrounding them with good shooters that stretch defenses is crucial too.

Conclusion

There is more than one way to win in the NBA. However, surrounding your star players with the right pieces is a crucial part of it. Hopefully, this article gave you some insight into that.