One of my papers from my time as a teaching postdoc (research educator) for the Freshman Research Initiative at UT Austin was on discriminating wine varietals using peptidic sensor arrays. Our research group used principal component analysis (PCA) and linear discriminant analysis (LDA) to analyze our data, which were UV-vis readings of mixtures of wine with the sensors. The idea was that the readings would give a fingerprint for each wine varietal, so that the data could be classified with these two machine learning algorithms. Although we didn’t explore any other algorithms at the time, PCA and LDA were very useful and effective in classifying our data.

To carry out the statistical analysis and data visualization, we used XLStat and Statistica. Recently, I was able to reproduce in Python a 3D plot of the linear discriminant analysis of the same data.
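
For anyone who wants to try the Python side without XLStat or Statistica, here is a minimal sketch of how a transformed array like X_lda could be produced with scikit-learn's LinearDiscriminantAnalysis. The data are synthetic stand-ins, not the readings from the paper, and the wine names, feature count, and random seed are all assumptions made for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the real UV-vis readings (NOT the data from the paper):
# 5 wines x 8 replicates, with 12 hypothetical absorbance features per sample.
rng = np.random.default_rng(0)
wines = ["Wine A", "Wine B", "Wine C", "Wine D", "Wine E"]  # placeholder names
n_reps, n_features = 8, 12
centers = rng.normal(scale=3.0, size=(len(wines), n_features))
X = np.vstack([c + rng.normal(size=(n_reps, n_features)) for c in centers])
target = np.repeat(wines, n_reps)  # rows are grouped by wine

# Keep three discriminant functions so the result can be drawn in 3D.
lda = LinearDiscriminantAnalysis(n_components=3)
X_lda = lda.fit_transform(X, target)

print(X_lda.shape)  # (40, 3)
```

With five classes, LDA can return at most four discriminant functions, so asking for three is within bounds.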

To create the plot, I started with the matplotlib documentation on 3D plotting. Using the examples from the matplotlib docs, I was only able to show one data point type in the legend. What helped was the code Andreas Müller and Sarah Guido use for the scatter plots in their book, “Introduction to Machine Learning with Python”, specifically the mglearn.discrete_scatter() function, which creates customized data points for the plot. Following that approach, I created an individual “line” for each wine, and that got the legend right. To do this, I looped over the transformed data (X_lda below). Each wine had eight data points (hence num = 8 in the code below).

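import matplotlib.pyplot as plt
import numpy as np

# X_lda (the LDA-transformed data) and target (the wine labels) come from the earlier analysis.
# Assumption: current_cycler is matplotlib's default property cycle, as in mglearn.discrete_scatter().
current_cycler = plt.rcParams['axes.prop_cycle']

fig = plt.figure(figsize=(10,8))
ax = plt.subplot(111, projection='3d')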
ax.view_init(azim=70, elev=-150)
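# One marker per wine; the list holds more markers than the five wines need.
markers = ['o', '^', 'v', 'D', 's', '*', 'p', 'h', 'H', '8', '<', '>']
row = 0  # index of the first row of the current wine
num = 8  # data points per wine
# Plot each wine with its own scatter call so that it gets its own legend entry.
# This assumes the rows of X_lda are grouped by wine, in the order of np.unique(target).
for i, (label, color_) in enumerate(zip(np.unique(target), current_cycler())):
    ax.scatter(X_lda[row:row+num,0], X_lda[row:row+num,1], X_lda[row:row+num,2], s=100,
               color=color_['color'], label=label, edgecolors='black', marker=markers[i])
    row += num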

plt.legend(bbox_to_anchor=(1.2,0.7))
ax.set_xlabel('F1, 67.4%')
ax.set_ylabel('F2, 17.3%')
ax.set_zlabel('F3, 9.6%')
ax.set_title("Linear Discriminant Analysis of Responses of \n5 Wine Varietals to Peptidic Sensor Arrays", fontdict={'fontsize': 15}, loc='center')
plt.show()
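
The percentages in the axis labels above are typed in by hand. Assuming they are the share of variance explained by each discriminant function, a small helper could build the labels from the ratios instead. The helper name axis_labels is hypothetical; with scikit-learn, the ratios could come from the explained_variance_ratio_ attribute of a fitted LinearDiscriminantAnalysis model.

```python
def axis_labels(ratios, prefix="F"):
    """Format explained-variance ratios as labels like 'F1, 67.4%'."""
    return [f"{prefix}{i}, {r:.1%}" for i, r in enumerate(ratios, start=1)]

# Ratios taken from the labels in this post; with scikit-learn they could
# instead be read from a fitted model's explained_variance_ratio_.
labels = axis_labels([0.674, 0.173, 0.096])
print(labels)  # ['F1, 67.4%', 'F2, 17.3%', 'F3, 9.6%']
```

The three strings can then be passed to ax.set_xlabel(), ax.set_ylabel(), and ax.set_zlabel(), which keeps the labels in sync with the model if the data change.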

The code for the plot is in my GitHub repo here.

Recreating this plot was actually fun! I wish I had known how to do this back when I was writing that paper, because it looks a thousand times better.