5. 课后练习-使用更多的分类器
本文最后更新于 2023年10月20日 上午
课后练习 5
Tasks:
- Study k-Nearest Neighbours classifiers sklearn.neighbors.KNeighborsClassifier — scikit-learn 0.24.1 documentation (scikit-learn.org)
- Study RandomForrest classifiers sklearn.ensemble.RandomForestClassifier — scikit-learn 0.24.1 documentation (scikit-learn.org)
- Study Naïve Bayes classifiers 1.9. Naive Bayes — scikit-learn 0.24.1 documentation (scikitlearn.org) ## Programming exercise: This tutorial will use the MNIST dataset which was explored in tutorial 3. ### Q1. Train a k-Nearest Neighbours classifier for handwritten digit recognition with MNIST dataset. Try different parameter settings and study how the performance varies.
- Plot the accuracy vs k while changing the number of neighbours (k)
with values [1, 3, 5, 7, 9] ### Q2. Train a RandomForrest classifier for handwritten digit recognition with MNIST dataset. Try different parameter settings and study how the performance varies.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
digits = datasets.load_digits()
labels = digits.target
data = images.reshape(len(images), -1)
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, shuffle=False)
g = [1, 3, 5, 7, 9]
accurancy = []
for g_ in g:
clf = KNeighborsClassifier(n_neighbors = g)
clf.fit(x_train, y_train)
acc = clf.predict(x_test, y_test)
accurancy.append(acc)
plt.plot(g, accurancy)
plt.show() - Plot the accuracy vs max_depth while changing the max depth
parameter with values [1, 2, 4, 8, 16] ### Q3. Train a Gaussian Naive Bayes classifier for handwritten digit recognition with the MNIST dataset.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForrest
digits = datasets.load_digits()
labels = digits.target
data = images.reshape(len(images), -1)
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, shuffle=False)
g = [1, 2, 4, 8, 16]
accurancy = []
for g_ in g:
clf = RandomForrest(max_depth = g)
clf.fit(x_train, y_train)
acc = clf.predict(x_test, y_test)
accurancy.append(acc)
plt.plot(g, accurancy)
plt.show()1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForrest
digits = datasets.load_digits()
labels = digits.target
data = images.reshape(len(images), -1)
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, shuffle=False)
clf1 = sk_bayes.BernoulliNB(alpha=1.0,
binarize=0.0,
fit_prior=True,
class_prior=None)
clf1.fit(x_train, y_train)
acc_BN = clf1.score(x_test, y_test)
acc.append(acc_BN) - Plus: Displaying the wrong images ### Q4. Do a comparison between the four classifiers (SVM – Tutorial 3, kNN, RandomForrest and NaïveBayes) by plotting the best performing accuracy value for each classifier in a bar chart.
1
2
3
4
5
6
7# 显示错误的图片
clf = RandomForrest(max_depth = g)
clf.fit(x_train, y_train)
predictions = clf.predict(x_test)
# clf.predict_proba() 显示每张图有多少概率是哪个标签
print(predictions) # 这样会输出所有图片的预测标签
print(y_test)
5. 课后练习-使用更多的分类器
https://l61012345.top/2021/01/28/Machine Learning-NAU/5.a 课后练习-手写字符识别/