+
\ No newline at end of file
--
Gitee
From 76c950d92b3bb1e93cf2a7bb7a6b47e1d0ce69cd Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Tue, 23 Mar 2021 22:06:46 +0800
Subject: [PATCH 15/40] update 19210680053/README.md.
---
19210680053/README.md | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/19210680053/README.md b/19210680053/README.md
index 0ebb55a..f1120e4 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -56,5 +56,4 @@ $$
$$
This is the training set I generated:
-
-
\ No newline at end of file
+
--
Gitee
From 38f7e3f08f46e35de1d8a9d1196d5b15813166e8 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Tue, 23 Mar 2021 22:07:09 +0800
Subject: [PATCH 16/40] update 19210680053/README.md.
---
19210680053/README.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/19210680053/README.md b/19210680053/README.md
index f1120e4..c3d6972 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -56,4 +56,6 @@ $$
$$
This is the training set I generated:
-
+
+
+
--
Gitee
From 70661b684d00582c30fea9d54f9050bbe6c6b292 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Tue, 23 Mar 2021 22:07:41 +0800
Subject: [PATCH 17/40] update 19210680053/README.md.
---
19210680053/README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/19210680053/README.md b/19210680053/README.md
index c3d6972..49d11e2 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
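@@ -14,6 +14,7 @@ d-The choose function compares the predictions with the test labels; a match
I generated three two-dimensional Gaussian distributions with the following parameters, labeled 0, 1, and 2: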
+\begin{array}{l}
label=0
$$
\begin{array}{l}
--
Gitee
From beba617525d7d13ed3bcd9a14a78517b6da5680f Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Tue, 23 Mar 2021 22:10:59 +0800
Subject: [PATCH 18/40] update 19210680053/README.md.
---
19210680053/README.md | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/19210680053/README.md b/19210680053/README.md
index 49d11e2..74c120a 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
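@@ -14,7 +14,8 @@ d-The choose function compares the predictions with the test labels; a match
I generated three two-dimensional Gaussian distributions with the following parameters, labeled 0, 1, and 2: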
-\begin{array}{l}
+
+
label=0
$$
\begin{array}{l}
@@ -60,3 +61,16 @@ $$
+
+
+This is the test set I generated:
+
+
+
+
+
+The experimental results are reported in the following table:
+
+Algo |kvalue|Acc |
+-----| ---- |---- |
+KNN | 5 |0.6225 |
\ No newline at end of file
--
Gitee
From 097968424e28273da4d7d65fd2e68f4a951f29ea Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:10:55 +0800
Subject: [PATCH 19/40] update 19210680053/README.md.
---
19210680053/README.md | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
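
Note on the tie case described in item b of the README: a minimal sketch, assuming the vote is taken with Python's built-in `max(..., key=list.count)` as in `closest`. `max` returns the first element that attains the largest count, so a tie goes to the tied label seen first in nearest-first order rather than to a randomly chosen one. The helper name `vote` is hypothetical and not part of this patch.

```python
def vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return max(labels, key=labels.count)


# Labels of the four nearest points, closest first: 2 and 0 both occur twice.
nearest_first = [2, 0, 0, 2]
print(vote(nearest_first))

# A clear majority is returned regardless of order.
print(vote([1, 0, 0]))
```

Reversing the order of a tied list changes the winner, which is a quick way to confirm that the rule depends on neighbor order and not on chance.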
diff --git a/19210680053/README.md b/19210680053/README.md
index 74c120a..35d5816 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -1,22 +1,22 @@
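The implementation uses the numpy package. In class KNN:
-a-The euclidean function computes the Euclidean distance between two vectors.
+a. The euclidean function computes the Euclidean distance between two vectors.
-b-The closest function takes one input vector at a time, computes its Euclidean distance to every point in the train data, and returns the train label that occurs most often among its k nearest points. When no single label occurs most often among the k nearest points (for example, when the counts are tied), the tied label that appears first in nearest-first order is returned.
+b. The closest function takes one input vector at a time, computes its Euclidean distance to every point in the train data, and returns the train label that occurs most often among its k nearest points. When no single label occurs most often among the k nearest points (for example, when the counts are tied), the tied label that appears first in nearest-first order is returned.
-c-The predict function feeds all test data in one vector at a time and collects the predictions.
+c. The predict function feeds all test data in one vector at a time and collects the predictions.
-d-The choose function compares the predictions with the test labels, scoring 1 for a match and 0 for a mismatch, and computes the accuracy. It tries k = 2, 3, ..., 6, selects the k with the highest accuracy, and returns that k together with its accuracy.
+d. The choose function compares the predictions with the test labels, scoring 1 for a match and 0 for a mismatch, and computes the accuracy. It tries k = 2, 3, ..., 6, selects the k with the highest accuracy, and returns that k together with its accuracy.
I generated three two-dimensional Gaussian distributions with the following parameters, labeled 0, 1, and 2: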
-label=0
+ label=0
$$
\begin{array}{l}
\Sigma=\left[\begin{array}{cc}
@@ -30,7 +30,7 @@ $$
$$
-label=1
+ label=1
$$
\begin{array}{l}
\Sigma=\left[\begin{array}{cc}
@@ -44,7 +44,7 @@ $$
$$
-label=2
+ label=2
$$
\begin{array}{l}
\Sigma=\left[\begin{array}{cc}
@@ -73,4 +73,8 @@ $$
Algo |kvalue|Acc |
-----| ---- |---- |
-KNN | 5 |0.6225 |
\ No newline at end of file
+KNN | 5 |0.6225 |
+
+
+
+The distributions for label=0 and label=2 lie close to each other, which lowers the accuracy of assigning a label to new test instances.
\ No newline at end of file
--
Gitee
From 5559b55994d40a77bba54e10be64ff2647d05496 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:18:38 +0800
Subject: [PATCH 20/40] update 19210680053/README.md.
---
19210680053/README.md | 69 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)
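
To put a number on how far apart the classes are in the two configurations, one option is the pairwise Euclidean distance between the class means. A minimal sketch using the means stated in the README; `mean_distances` is a hypothetical helper, and ignoring the covariances is a simplification made here for brevity.

```python
from itertools import combinations

import numpy as np


def mean_distances(means):
    """Euclidean distance between every pair of class means."""
    return {
        (i, j): float(np.linalg.norm(np.subtract(means[i], means[j])))
        for i, j in combinations(range(len(means)), 2)
    }


first = [(20, 25), (16, -5), (20, 25)]   # labels 0 and 2 share a mean
second = [(20, 25), (16, -5), (3, 5)]    # label 2 moved away

print(mean_distances(first))
print(mean_distances(second))
```

In the first configuration the distance between labels 0 and 2 is zero, so those two classes differ only in their covariance matrices.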
diff --git a/19210680053/README.md b/19210680053/README.md
index 35d5816..5a01a8b 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -77,4 +77,71 @@ KNN | 5 |0.6225 |
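-The distributions for label=0 and label=2 lie close to each other, which lowers the accuracy of assigning a label to new test instances.
\ No newline at end of file
+
+Because the distributions for label=0 and label=2 lie close to each other, the accuracy of assigning a label to new test instances is only 62.25%.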
+
+
+To change the distance between the Gaussian distributions, I generated them with the following parameters.
+
+
+ label=0
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 2.1 \\\\
+2.1 & 12
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=1
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+23 & 0 \\\\
+0 & 22
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+16 & -5
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=2
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 5 \\\\
+5 & 10
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+3 & 5
+\end{array}\right]
+\end{array}
+$$
+
+This is the training set I generated:
+
+
+
+
+
+This is the test set I generated:
+
+
+
+
+
+The experimental results are reported in the following table:
+
+Algo |kvalue|Acc |
+-----| ---- |---- |
+KNN | 2 |0.9975 |
+
+
+Here the three Gaussian distributions are far apart, and with a small k
\ No newline at end of file
--
Gitee
From e61439f32efb0891b17d1f68e4aa37c579f0fcac Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:22:03 +0800
Subject: [PATCH 21/40] update 19210680053/source.py.
---
19210680053/source.py | 101 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 90 insertions(+), 11 deletions(-)
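
The per-item loop in `closest` can also be written with array operations. A minimal sketch, assuming 2-D NumPy arrays for the data and non-negative integer labels; `knn_predict` is a hypothetical helper, not part of this patch. It breaks ties toward the smallest label (via `np.bincount(...).argmax()`), which differs from the `max(..., key=count)` rule used in `closest`.

```python
import numpy as np


def knn_predict(train_data, train_label, test_data, k):
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diff = test_data[:, None, :] - train_data[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # Indices of the k nearest training points for every test point.
    nearest = np.argsort(dist2, axis=1, kind="stable")[:, :k]
    votes = train_label[nearest]
    # Majority vote per row; argmax picks the smallest label on a tie.
    return np.array([np.bincount(row).argmax() for row in votes])


train = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.2, 0.3], [9.0, 10.0]])
print(knn_predict(train, labels, queries, k=3))
```

Squared distances are enough here because the square root does not change the neighbor ordering.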
diff --git a/19210680053/source.py b/19210680053/source.py
index 968f4d3..f44bde6 100644
--- a/19210680053/source.py
+++ b/19210680053/source.py
@@ -1,13 +1,19 @@
-from scipy.spatial import distance
+import matplotlib.pyplot as plt
+import numpy as np
+import sys
 class KNN():
+    def __init__(self, train_data, train_label):
+        self.train_data = train_data
+        self.train_label = train_label
+    def euclidean(self,v1,v2):
+        return np.sqrt(np.sum(np.square(v1 - v2)))
     def fit(self, X_train, Y_train):
-        self.X_train = X_train
-        self.Y_train = Y_train
-
-    def predict(self, x_test,k):
+        self.train_data = X_train
+        self.train_label = Y_train
+    def predict(self, test_data,k):
         predictions = []
-        for item in x_test:
+        for item in test_data:
             label = self.closest(item,k)
             predictions.append(label)
         return predictions
@@ -15,14 +21,87 @@ class KNN():
     def closest(self, item,k):
         min_ind = 0
         distlst=[]
-        idxlst=list(range(len(self.X_train)))
+        idxlst=list(range(len(self.train_data)))
         #get distance between X_test with all X_train
-        for i in range(0,len(self.X_train)):
-            distlst.append(distance.euclidean(item, self.X_train[i]))
+        for i in range(0,len(self.train_data)):
+            distlst.append(self.euclidean(item, self.train_data[i]))
         #make up a dictionary with distance and order
         distdict=dict(zip(idxlst,distlst))
         distdict=dict(sorted(distdict.items(),key=lambda item:item[1]))
         #get first K nearest position
         min_ind=list(dict(list(distdict.items())[:k]).keys())
-        min_dist=[self.Y_train[i] for i in min_ind]
-        return max(min_dist,key=min_dist.count)
\ No newline at end of file
+        min_dist=[self.train_label[i] for i in min_ind]
+        return max(min_dist,key=min_dist.count)
+
+    def choose(self,test_data,test_label):
+        acclst=[]
+        for k in range(2,7):
+            res=self.predict(test_data,k)
+            acc=np.mean(np.equal(res, test_label))
+            acclst.append(acc)
+        max_acc=max(acclst)
+        max_k=acclst.index(max_acc)+2
+        return max_k,max_acc
+
+
+def generate():
+    mean = (20, 25)
+    cov = np.array([[10,2.1], [2.1, 12]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[23, 0], [0, 22]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (3, 5)
+    cov = np.array([[10,5],[5,10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    idx = np.arange(2000)
+    np.random.shuffle(idx)
+    data = np.concatenate([x,y,z])
+    label = np.concatenate([
+        np.zeros((800,),dtype=int),
+        np.ones((200,),dtype=int),
+        np.ones((1000,),dtype=int)*2
+    ])
+    data = data[idx]
+    label = label[idx]
+
+    train_data, test_data = data[:1600,], data[1600:,]
+    train_label, test_label = label[:1600,], label[1600:,]
+    np.save("data.npy",np.array(((train_data, train_label), (test_data, test_label)
+        ),dtype=object))  # object array: the four parts have different shapes
+
+def display(data, label, name):
+    datas =[[],[],[]]
+    for i in range(len(data)):
+        datas[label[i]].append(data[i])
+
+    for each in datas:
+        each = np.array(each)
+        plt.scatter(each[:, 0], each[:, 1])
+    label=[str(i) for i in list(range(len(datas)))]
+    plt.legend(['label '+i for i in label])
+    plt.show()
+
+def read():
+    (train_data, train_label), (test_data, test_label) = np.load("data.npy",allow_pickle=True)
+    return (train_data, train_label), (test_data, test_label)
+
+
+if __name__ == "__main__":
+    mode=0
+    if mode == 0:
+        generate()
+    elif mode == 1:
+        (train_data, train_label), (test_data, test_label) = read()
+        display(train_data, train_label, 'train')
+        display(test_data, test_label, 'test')
+    else:
+        (train_data, train_label), (test_data, test_label) = read()
+
+        model = KNN(train_data, train_label)
+        model.fit(train_data, train_label)
+        k ,acc = model.choose(test_data,test_label)
+        print("k=",k,"acc=",acc)
\ No newline at end of file
--
Gitee
From 0247b1ad6765faeac19610e475f0a66a74e7c914 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:45:40 +0800
Subject: [PATCH 22/40] update 19210680053/source.py.
---
19210680053/source.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/19210680053/source.py b/19210680053/source.py
index f44bde6..e3fc856 100644
--- a/19210680053/source.py
+++ b/19210680053/source.py
@@ -22,10 +22,10 @@ class KNN():
         min_ind = 0
         distlst=[]
         idxlst=list(range(len(self.train_data)))
-        #get distance between X_test with all X_train
+        #get distance between test_data with train_data
         for i in range(0,len(self.train_data)):
             distlst.append(self.euclidean(item, self.train_data[i]))
-        #make up a dictionary with distance and order
+        #make up a dictionary with distance and index
         distdict=dict(zip(idxlst,distlst))
         distdict=dict(sorted(distdict.items(),key=lambda item:item[1]))
         #get first K nearest position
--
Gitee
From 7013392dd910386f0b3047d2b5ce18f6d4f47cfd Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:46:20 +0800
Subject: [PATCH 23/40] update 19210680053/source.py.
---
19210680053/source.py | 1 -
1 file changed, 1 deletion(-)
diff --git a/19210680053/source.py b/19210680053/source.py
index e3fc856..3fc823c 100644
--- a/19210680053/source.py
+++ b/19210680053/source.py
@@ -1,6 +1,5 @@
 import matplotlib.pyplot as plt
 import numpy as np
-import sys
 class KNN():
     def __init__(self, train_data, train_label):
--
Gitee
From 7dbc387d2a5182280ff995d09488d55b6960ff5e Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 10:53:11 +0800
Subject: [PATCH 24/40] update 19210680053/README.md.
---
19210680053/README.md | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/19210680053/README.md b/19210680053/README.md
index 5a01a8b..7981fe0 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -129,13 +129,13 @@ $$
-
+
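This is the test set I generated:
-
+
The experimental results are reported in the following table:
@@ -144,4 +144,12 @@ Algo |kvalue|Acc |
KNN | 2 |0.9975 |
-Here the three Gaussian distributions are far apart, and with a small k
\ No newline at end of file
+Here the three Gaussian distributions are far apart, so a small k already gives fairly accurate predictions. Increasing the distance between the Gaussian distributions improves the accuracy of the experiment.
+
+## How to use the code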
+
+```bash
+Change the value of mode:
+mode=0 # generate data
+mode=1 # visualize data
+any other value of mode # train and test
--
Gitee
From 35022e12753c9a3c5cff5ef48a4aab5eae1bfa36 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 16:38:39 +0800
Subject: [PATCH 25/40] update 19210680053/README.md.
---
19210680053/README.md | 2 ++
1 file changed, 2 insertions(+)
diff --git a/19210680053/README.md b/19210680053/README.md
index 7981fe0..a21c437 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -1,3 +1,5 @@
+#Course Report
+
The implementation uses the numpy package. In class KNN:
--
Gitee
From 87a5bc75f0a3683b7cea9b49f88c8a4bff17aa16 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 16:40:32 +0800
Subject: [PATCH 26/40] update 19210680053/README.md.
---
19210680053/README.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/19210680053/README.md b/19210680053/README.md
index a21c437..1776560 100644
--- a/19210680053/README.md
+++ b/19210680053/README.md
@@ -1,4 +1,6 @@
-#Course Report
+# Course Report
+
+## Description
The implementation uses the numpy package. In class KNN:
--
Gitee
From 9eb136aa218dc061b2d07ddf2c1c39cf0b5f8690 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 16:57:41 +0800
Subject: [PATCH 27/40] Create 19210680053
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
assignment-1/submission/19210680053/.keep | 0
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 assignment-1/submission/19210680053/.keep
diff --git a/assignment-1/submission/19210680053/.keep b/assignment-1/submission/19210680053/.keep
new file mode 100644
index 0000000..e69de29
--
Gitee
From c7387d9067a74990132d2e604732181ff0bac70d Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 17:05:58 +0800
Subject: [PATCH 28/40] add assignment-1/submission/19210680053/source.py.
---
assignment-1/submission/19210680053/source.py | 106 ++++++++++++++++++
1 file changed, 106 insertions(+)
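
`generate()` in this file draws from NumPy's global random state, so each run produces a different data set. A minimal sketch of a reproducible variant, assuming a seeded `np.random.default_rng` and an `.npz` archive in place of `data.npy`; the function name `generate_seeded`, the seed, and the file name are illustrative choices, not part of this patch.

```python
import numpy as np


def generate_seeded(seed=0, path="data_seeded.npz"):
    rng = np.random.default_rng(seed)
    # (mean, covariance, sample count) per label, as in generate().
    specs = [
        ((20, 25), [[10, 2.1], [2.1, 12]], 800),
        ((16, -5), [[23, 0], [0, 22]], 200),
        ((3, 5), [[10, 5], [5, 10]], 1000),
    ]
    data = np.concatenate([rng.multivariate_normal(m, c, n) for m, c, n in specs])
    label = np.concatenate([np.full(n, i) for i, (_, _, n) in enumerate(specs)])
    # Shuffle data and labels with the same permutation.
    idx = rng.permutation(len(data))
    data, label = data[idx], label[idx]
    # 80/20 split, stored as four named arrays.
    np.savez(path, train_data=data[:1600], train_label=label[:1600],
             test_data=data[1600:], test_label=label[1600:])
    return path


with np.load(generate_seeded()) as saved:
    print(saved["train_data"].shape, saved["test_label"].shape)
```

Storing four named arrays avoids packing arrays of different shapes into a single object array, and the file can be loaded without `allow_pickle=True`.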
create mode 100644 assignment-1/submission/19210680053/source.py
diff --git a/assignment-1/submission/19210680053/source.py b/assignment-1/submission/19210680053/source.py
new file mode 100644
index 0000000..a7bdf92
--- /dev/null
+++ b/assignment-1/submission/19210680053/source.py
@@ -0,0 +1,106 @@
+import matplotlib.pyplot as plt
+import numpy as np
+
+class KNN():
+    def __init__(self, train_data, train_label):
+        self.train_data = train_data
+        self.train_label = train_label
+    def euclidean(self,v1,v2):
+        return np.sqrt(np.sum(np.square(v1 - v2)))
+    def fit(self, X_train, Y_train):
+        self.train_data = X_train
+        self.train_label = Y_train
+    def predict(self, test_data,k):
+        predictions = []
+        for item in test_data:
+            label = self.closest(item,k)
+            predictions.append(label)
+        return predictions
+
+    def closest(self, item,k):
+        min_ind = 0
+        distlst=[]
+        idxlst=list(range(len(self.train_data)))
+        #distance from the test item to every row of train_data
+        for i in range(0,len(self.train_data)):
+            distlst.append(self.euclidean(item, self.train_data[i]))
+        #map each training index to its distance, sorted by ascending distance
+        distdict=dict(zip(idxlst,distlst))
+        distdict=dict(sorted(distdict.items(),key=lambda item:item[1]))
+        #indices of the k nearest training points
+        min_ind=list(dict(list(distdict.items())[:k]).keys())
+        min_dist=[self.train_label[i] for i in min_ind]  # labels of those points, nearest first
+        return max(min_dist,key=min_dist.count)  # on a tie, the label seen first wins
+
+    def choose(self,test_data,test_label):
+        acclst=[]
+        for k in range(2,7):
+            res=self.predict(test_data,k)
+            acc=np.mean(np.equal(res, test_label))
+            acclst.append(acc)
+        max_acc=max(acclst)
+        max_k=acclst.index(max_acc)+2  # acclst[0] corresponds to k=2
+        return max_k,max_acc
+
+
+def generate():
+    mean = (20, 25)
+    cov = np.array([[10,2.1], [2.1, 12]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[23, 0], [0, 22]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (3, 5)
+    cov = np.array([[10,5],[5,10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    idx = np.arange(2000)
+    np.random.shuffle(idx)
+    data = np.concatenate([x,y,z])
+    label = np.concatenate([
+        np.zeros((800,),dtype=int),
+        np.ones((200,),dtype=int),
+        np.ones((1000,),dtype=int)*2
+    ])
+    data = data[idx]
+    label = label[idx]
+
+    train_data, test_data = data[:1600,], data[1600:,]
+    train_label, test_label = label[:1600,], label[1600:,]
+    np.save("data.npy",np.array(((train_data, train_label), (test_data, test_label)
+        ),dtype=object))  # object array: the four parts have different shapes
+
+def display(data, label, name):
+    datas =[[],[],[]]
+    for i in range(len(data)):
+        datas[label[i]].append(data[i])
+
+    for each in datas:
+        each = np.array(each)
+        plt.scatter(each[:, 0], each[:, 1])
+    label=[str(i) for i in list(range(len(datas)))]
+    plt.legend(['label '+i for i in label])
+    plt.show()
+
+def read():
+    (train_data, train_label), (test_data, test_label) = np.load("data.npy",allow_pickle=True)
+    return (train_data, train_label), (test_data, test_label)
+
+
+if __name__ == "__main__":
+    mode=0
+    if mode == 0:
+        generate()
+    elif mode == 1:
+        (train_data, train_label), (test_data, test_label) = read()
+        display(train_data, train_label, 'train')
+        display(test_data, test_label, 'test')
+    else:
+        (train_data, train_label), (test_data, test_label) = read()
+
+        model = KNN(train_data, train_label)
+        model.fit(train_data, train_label)
+        k ,acc = model.choose(test_data,test_label)
+        print("k=",k,"acc=",acc)
\ No newline at end of file
--
Gitee
From 23e46ae70ff9d1c9522ad9c7654fbf7618c19688 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 17:06:32 +0800
Subject: [PATCH 29/40] add assignment-1/submission/19210680053/README.md.
---
assignment-1/submission/19210680053/README.md | 159 ++++++++++++++++++
1 file changed, 159 insertions(+)
create mode 100644 assignment-1/submission/19210680053/README.md
diff --git a/assignment-1/submission/19210680053/README.md b/assignment-1/submission/19210680053/README.md
new file mode 100644
index 0000000..327490f
--- /dev/null
+++ b/assignment-1/submission/19210680053/README.md
@@ -0,0 +1,159 @@
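+# Course Report
+
+## Description
+
+The implementation uses the numpy package. In class KNN:
+
+
+a. The euclidean function computes the Euclidean distance between two vectors.
+
+
+b. The closest function takes one input vector at a time, computes its Euclidean distance to every point in the train data, and returns the train label that occurs most often among its k nearest points. When no single label occurs most often among the k nearest points (for example, when the counts are tied), the tied label that appears first in nearest-first order is returned.
+
+
+c. The predict function feeds all test data in one vector at a time and collects the predictions.
+
+
+d. The choose function compares the predictions with the test labels, scoring 1 for a match and 0 for a mismatch, and computes the accuracy. It tries k = 2, 3, ..., 6, selects the k with the highest accuracy, and returns that k together with its accuracy.
+
+
+I generated three two-dimensional Gaussian distributions with the following parameters, labeled 0, 1, and 2: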
+
+
+ label=0
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 0 \\\\
+0 & 10
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=1
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+23 & 0 \\\\
+0 & 22
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+16 & -5
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=2
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 5 \\\\
+5 & 10
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+This is the training set I generated:
+
+
+
+
+
+This is the test set I generated:
+
+
+
+
+
+The experimental results are reported in the following table:
+
+Algo |kvalue|Acc |
+-----| ---- |---- |
+KNN | 5 |0.6225 |
+
+
+
+
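+Because the distributions for label=0 and label=2 lie close to each other, the accuracy of assigning a label to new test instances is only 62.25%.
+
+
+To change the distance between the Gaussian distributions, I generated them with the following parameters.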
+
+
+ label=0
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 2.1 \\\\
+2.1 & 12
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=1
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+23 & 0 \\\\
+0 & 22
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+16 & -5
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=2
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 5 \\\\
+5 & 10
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+3 & 5
+\end{array}\right]
+\end{array}
+$$
+
+This is the training set I generated:
+
+
+
+
+
+This is the test set I generated:
+
+
+
+
+
+The experimental results are reported in the following table:
+
+Algo |kvalue|Acc |
+-----| ---- |---- |
+KNN | 2 |0.9975 |
+
+
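+Here the three Gaussian distributions are far apart, so a small k already gives fairly accurate predictions. Increasing the distance between the Gaussian distributions improves the accuracy of the experiment.
+
+## How to use the code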
+
+```bash
+Change the value of mode:
+mode=0 # generate data
+mode=1 # visualize data
+any other value of mode # train and test
--
Gitee
From f5c6c053ebc0dd2aec7c8cd769106c2722bf809b Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 22:02:42 +0800
Subject: [PATCH 30/40] update assignment-1/submission/19210680053/README.md.
---
assignment-1/submission/19210680053/README.md | 70 +++++++++++++++++++
1 file changed, 70 insertions(+)
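
The per-k accuracies tabulated in this patch can be produced by a loop like the one in `choose`. A minimal sketch on a small synthetic data set; `predict_one`, `accuracy_by_k`, the seed, and the toy data are all assumptions made for illustration, so the numbers it prints are not the ones reported in the README.

```python
import numpy as np


def predict_one(train_data, train_label, item, k):
    # Labels of the k nearest training points, closest first.
    order = np.argsort(np.linalg.norm(train_data - item, axis=1), kind="stable")
    nearest = train_label[order[:k]].tolist()
    # Majority vote; on a tie the label seen first wins.
    return max(nearest, key=nearest.count)


def accuracy_by_k(train_data, train_label, test_data, test_label, ks=range(2, 7)):
    result = {}
    for k in ks:
        pred = [predict_one(train_data, train_label, x, k) for x in test_data]
        result[k] = float(np.mean(np.equal(pred, test_label)))
    return result


# Two well-separated toy classes, shuffled and split 160/40.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
label = np.repeat([0, 1], 100)
idx = rng.permutation(200)
data, label = data[idx], label[idx]

acc = accuracy_by_k(data[:160], label[:160], data[160:], label[160:])
best_k = max(acc, key=acc.get)  # first k that attains the highest accuracy
print(acc, best_k)
```

Keeping the whole dictionary, not only the best entry, makes it straightforward to report a full k-versus-accuracy table.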
diff --git a/assignment-1/submission/19210680053/README.md b/assignment-1/submission/19210680053/README.md
index 327490f..92ff9d2 100644
--- a/assignment-1/submission/19210680053/README.md
+++ b/assignment-1/submission/19210680053/README.md
@@ -85,6 +85,76 @@ KNN | 5 |0.6225 |
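Because the distributions for label=0 and label=2 lie close to each other, the accuracy of assigning a label to new test instances is only 62.25%.
+To further examine how the distance between the Gaussian distributions affects prediction accuracy, I generated distributions with the following parameters: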
+
+ label=0
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 2.1 \\\\
+2.1 & 12
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=1
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+23 & 0 \\\\
+0 & 22
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
+
+ label=2
+$$
+\begin{array}{l}
+\Sigma=\left[\begin{array}{cc}
+10 & 5 \\\\
+5 & 10
+\end{array}\right] \\\\
+\mu=\left[\begin{array}{ll}
+20 & 25
+\end{array}\right]
+\end{array}
+$$
+
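+This is the training set I generated:
+
+
+
+
+This is the test set I generated:
+
+
+
+
+
+The experimental results are reported in the following table:
+
+Algo |kvalue|Acc |
+-----| ---- |---- |
+KNN | 5 |0.4725 |
+
+Here the three Gaussian distributions are all very close to one another; trying different values of k raises the accuracy to at most 47.25%.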
+
+|kvalue|Acc |
+| ---- |---- |
+| 2 |0.465 |
+| 3 |0.465 |
+| 4 |0.4425 |
+| 5 |0.4725 |
+| 6 |0.46 |
+
To change the distance between the Gaussian distributions, I generated them with the following parameters.
--
Gitee
From 25ea92950e81d86f954d56d49088e9306fd63ddb Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 22:04:14 +0800
Subject: [PATCH 31/40] update assignment-1/submission/19210680053/README.md.
---
assignment-1/submission/19210680053/README.md | 2 ++
1 file changed, 2 insertions(+)
diff --git a/assignment-1/submission/19210680053/README.md b/assignment-1/submission/19210680053/README.md
index 92ff9d2..001eadb 100644
--- a/assignment-1/submission/19210680053/README.md
+++ b/assignment-1/submission/19210680053/README.md
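@@ -17,6 +17,8 @@ c. The predict function feeds all test data in one vector at a time and collects the predictions.
d. The choose function compares the predictions with the test labels, scoring 1 for a match and 0 for a mismatch, and computes the accuracy. It tries k = 2, 3, ..., 6, selects the k with the highest accuracy, and returns that k together with its accuracy.
+## Data Generation and Experiments
+
I generated three two-dimensional Gaussian distributions with the following parameters, labeled 0, 1, and 2: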
--
Gitee
From 1c03c0d24201f5d67c50ab702b56c154f48f13d1 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 22:08:38 +0800
Subject: [PATCH 32/40] update assignment-1/submission/19210680053/README.md.
---
assignment-1/submission/19210680053/README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/assignment-1/submission/19210680053/README.md b/assignment-1/submission/19210680053/README.md
index 001eadb..e760382 100644
--- a/assignment-1/submission/19210680053/README.md
+++ b/assignment-1/submission/19210680053/README.md
@@ -84,7 +84,7 @@ KNN | 5 |0.6225 |
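-Because the distributions for label=0 and label=2 lie close to each other, the accuracy of assigning a label to new test instances is only 62.25%.
+Because the Gaussian distributions for label=0 and label=2 lie close to each other, the accuracy on the test set is 62.25%.
To further examine how the distance between the Gaussian distributions affects prediction accuracy, I generated distributions with the following parameters: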
@@ -147,7 +147,7 @@ Algo |kvalue|Acc |
-----| ---- |---- |
KNN | 5 |0.4725 |
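-Here the three Gaussian distributions are all very close to one another; trying different values of k raises the accuracy to at most 47.25%.
+Here the three Gaussian distributions are all very close to one another; across the different values of k tried, the accuracy reaches at most 47.25%.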
|kvalue|Acc |
| ---- |---- |
--
Gitee
From 08bad2b98c66515324c50602abf156d5bd267860 Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 22:12:27 +0800
Subject: [PATCH 33/40] Create img
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
assignment-1/submission/19210680053/img/.keep | 0
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 assignment-1/submission/19210680053/img/.keep
diff --git a/assignment-1/submission/19210680053/img/.keep b/assignment-1/submission/19210680053/img/.keep
new file mode 100644
index 0000000..e69de29
--
Gitee
From 3e607e286ae67b775046c712fb61ac44a58cb45f Mon Sep 17 00:00:00 2001
From: Yantong He <8850706+yantong-he@user.noreply.gitee.com>
Date: Wed, 24 Mar 2021 22:14:34 +0800
Subject: [PATCH 34/40] display
---
.../submission/19210680053/img/test 1.png | Bin 0 -> 12714 bytes
.../submission/19210680053/img/test 2.png | Bin 0 -> 19098 bytes
.../submission/19210680053/img/test 3.png | Bin 0 -> 12989 bytes
.../submission/19210680053/img/train 1.png | Bin 0 -> 15830 bytes
.../submission/19210680053/img/train 2.png | Bin 0 -> 20871 bytes
.../submission/19210680053/img/train 3.png | Bin 0 -> 14542 bytes
6 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 assignment-1/submission/19210680053/img/test 1.png
create mode 100644 assignment-1/submission/19210680053/img/test 2.png
create mode 100644 assignment-1/submission/19210680053/img/test 3.png
create mode 100644 assignment-1/submission/19210680053/img/train 1.png
create mode 100644 assignment-1/submission/19210680053/img/train 2.png
create mode 100644 assignment-1/submission/19210680053/img/train 3.png
diff --git a/assignment-1/submission/19210680053/img/test 1.png b/assignment-1/submission/19210680053/img/test 1.png
new file mode 100644
index 0000000000000000000000000000000000000000..bf515460fd3bf6e81d027117399749a3b10c29fe
GIT binary patch
literal 12714