diff --git a/assignment-3/submission/18307130116/README.md b/assignment-3/submission/18307130116/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1d5392c675488da60bf167158cd3e6975362c4d4
--- /dev/null
+++ b/assignment-3/submission/18307130116/README.md
@@ -0,0 +1,244 @@
+# Assignment 3 Lab Report
+
+[toc]
+
+## Code Comments
+
+Every function in this assignment carries a detailed Google-style docstring, so except for the key functions this report omits per-API descriptions.
+
+## Model Implementation
+
+### Data Preprocessing
+
+To avoid extreme values, every dimension of the raw data is projected onto the interval [-10, 10]. Under Euclidean distance this proportional scaling does not change the clustering result, and it prevents exponent overflow when probability values are computed.
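+
+This is exactly what `data_preprocess` in `source.py` does:
+
+```python
+import numpy as np
+
+def data_preprocess(data):
+    # scale so that the largest absolute coordinate becomes 10,
+    # putting every dimension inside [-10, 10]
+    edge = max(abs(data.max()), abs(data.min()))
+    return (data * 10) / edge
+```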
+
+### Kmeans
+
+#### Principle and Convergence Criterion
+
+The principle of Kmeans is fairly simple: after randomly picking some points as initial cluster centers, compute each point's distance to every center and assign it to the nearest cluster; update the centers and repeat until convergence.
+
+Convergence here means the cluster centers stop moving: when an update round leaves every center unchanged, the algorithm is considered converged. Since Kmeans is sensitive to initialization and some random starts oscillate between a few points, an upper bound on the number of rounds is imposed as well.
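+
+A compact NumPy sketch of this loop (an illustration only, assuming no cluster goes empty; the submission's actual implementation is `KMeans.fit` in `source.py`):
+
+```python
+import numpy as np
+
+def kmeans(data, k, max_epoch=100, seed=0):
+    rng = np.random.default_rng(seed)
+    centers = data[rng.choice(len(data), k, replace=False)]   # random initial centers
+    for _ in range(max_epoch):                                # cap on the number of rounds
+        # assign every point to its nearest center
+        labels = np.argmin(((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
+        # recompute each center as the mean of its assigned points
+        new_centers = np.array([data[labels == k_].mean(axis=0) for k_ in range(k)])
+        if np.allclose(new_centers, centers):                 # converged: centers unchanged
+            break
+        centers = new_centers
+    return labels, centers
+```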
+
+#### Model Structure
+
+Besides the required `__init__`, `fit`, and `predict` APIs, three helpers were added for convenience: `get_class` returns the cluster a given point belongs to under the current centers, `get_distance` returns the distance between two points, and `update_center` recomputes the cluster centers from the current assignment.
+
+### GMM
+
+#### Principle and Issues
+
+The basic assumption of the Gaussian mixture model is that the points follow a superposition of Gaussian distributions, so the whole procedure fits the samples with several Gaussians.
+
+GMM is likewise **very sensitive to initial values**, and it also needs a fair amount of data: on an extremely small dataset the fitted Gaussian degenerates into a tall, narrow, impulse-like shape, so even small perturbations cause large swings in probability and hence in the convergence of the log-likelihood. Initialization therefore has to be chosen carefully.
+
+GMM is trained with the EM algorithm. The E step, holding covariances and means fixed, computes the high-dimensional Gaussian probabilities and normalizes them; variationally, it fits the posterior so that the likelihood matches the evidence lower bound. The M step, given that posterior estimate, optimizes the parameters of the Gaussians and thereby raises the likelihood.
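+
+A self-contained sketch of one EM round under these updates (illustration only; the submission's version lives in `GaussianMixture.fit`):
+
+```python
+import numpy as np
+
+def gauss_pdf(x, mean, cov):
+    # multivariate normal density, as in GaussianMixture.calculate
+    d = len(mean)
+    diff = x - mean
+    coef = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(cov) ** 0.5)
+    return coef * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)
+
+def em_step(data, pi, means, covs):
+    N, K = len(data), len(pi)
+    # E step: responsibilities gamma[n, k] proportional to pi_k * N(x_n | mu_k, Sigma_k)
+    gamma = np.array([[pi[k] * gauss_pdf(x, means[k], covs[k]) for k in range(K)]
+                      for x in data])
+    gamma /= gamma.sum(axis=1, keepdims=True)
+    # M step: re-estimate weights, means and covariances from the responsibilities
+    Nk = gamma.sum(axis=0)
+    pi = Nk / N
+    means = (gamma.T @ data) / Nk[:, None]
+    covs = [sum(gamma[n, k] * np.outer(data[n] - means[k], data[n] - means[k])
+                for n in range(N)) / Nk[k] for k in range(K)]
+    return pi, means, covs
+```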
+
+#### Choosing Initial Values
+
+As noted above, GMM is very sensitive to initial values, especially with few samples, where small movements of the variables trigger large likelihood changes. In the actual implementation, using a likelihood-change threshold as the stopping criterion sometimes failed to produce a result at all.
+
+After reading the GMM source code in scikit-learn and related material, I settled on two candidate initialization strategies:
+
+* sample the training data and use the sample average as the initial value
+* pre-train with Kmeans and use its cluster centers as the initial Gaussian means
+
+This assignment uses the second method; in practice it clearly reduced the oscillation observed before.
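+
+The corresponding initialization excerpt (condensed) from `GaussianMixture.fit` in `source.py`:
+
+```python
+k_model = KMeans(self.n_clusters)
+k_model.fit(train_data)
+for iter in range(self.n_clusters):
+    self.mean[iter] = k_model.cluster_center[iter]      # Kmeans centers as initial means
+    self.cov[iter] = np.diag(np.ones(self.dimention))   # identity as initial covariance
+```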
+
+#### Model Structure
+
+Beyond the required interface, two extra APIs were added: `point_probability` computes a point's joint probability against every cluster, adding $e^{-200}$ inside the computation as a smoothing step to avoid exact zeros, while `calculate` computes the probability of a point under a single cluster.
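+
+In formula form, the score `point_probability` assigns a point $x$ for cluster $k$ is
+
+$$
+p_k(x) = \pi_k \left( \frac{1}{(2\pi)^{D/2}\,|\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k) \right) + e^{-200} \right)
+$$
+
+where the parenthesized density plus the smoothing constant is what `calculate` returns.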
+
+### ClusteringAlgorithm
+
+#### Implementation Principle
+
+Built on Kmeans, the main task of `ClusteringAlgorithm` is to select a suitable cluster count K; the two methods used are described below.
+
+#### Choosing K
+
+##### Rule of Thumb
+
+A common empirical formula for K is $\sqrt{n/2}$; in this assignment it serves as the upper bound of the Elbow search.
+
+##### Elbow
+
+The Elbow method looks for the knee of the distortion curve (`img/elbow.png`). The knee is considered the most reasonable choice; in that figure it sits at K = 4 while the true value is 3, so it lands near the truth.
+
+At the knee, the clusters are not split too finely and the error is already small, which is generally taken as a good K. To find the knee automatically, the array `dis` records the distortion for each k, and the knee is declared once the previous drop is no more than twice the next drop; the concrete selections appear in [Automated Clustering Results](#automated-clustering-results).
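+
+The automated knee test then reduces to the following loop over the recorded distortions (matching `ClusteringAlgorithm.fit`):
+
+```python
+# dis[k] holds the summed point-to-center distance when clustering with k centers
+for k in range(1, upbound - 1):
+    if dis[k - 1] - dis[k] <= 2 * (dis[k] - dis[k + 1]):
+        break   # previous drop is at most twice the next drop: knee reached
+```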
+
+##### Canopy
+
+###### Principle
+
+Canopy is a coarse-grained clustering method. Given two thresholds t1 > t2 and a randomly chosen starting point, under some distance metric the points within t2 of that point are unlikely to become cluster centers and are removed from the dataset, while points within t1 are likely to belong to the same cluster; since only the final number of clusters is needed here, which points fall into each canopy is not tracked. The process repeats until the dataset is empty.
+
+###### Implementation and Thresholds
+
+Canopy itself is straightforward to implement; the main question is how to set t1 and t2. To keep the K selection fully automatic, both are derived directly from the data dimensionality: with the data rescaled to [-10, 10], t2 is set to $2\sqrt{D}$ for dimensionality $D$ and t1 to $2 \cdot t2$. Clustering until every last point is assigned would be skewed by outliers, and this Canopy pass only needs the final K rather than per-point assignments, so the termination condition is changed: **the algorithm stops and returns once 80% of the points have been assigned**. The 80% threshold follows the 80/20 rule, i.e. 80% of the points are assumed to reflect the overall shape of the data; results are given in [Automated Clustering Results](#automated-clustering-results).
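+
+A condensed sketch equivalent to `Canopy.fit` in `source.py` (the real implementation also records the points between t2 and t1 per canopy; t1 only matters when membership is tracked, so here only the count is kept):
+
+```python
+import random
+import numpy as np
+
+def canopy_k(data, t2):
+    """Rough cluster count via Canopy with the 80% early stop."""
+    n_total = len(data)
+    k = 0
+    while len(data) >= 0.2 * n_total:            # stop once 80% of points are assigned
+        idx = random.randint(0, len(data) - 1)   # random canopy center
+        center = data[idx]
+        data = np.delete(data, idx, 0)
+        dists = np.linalg.norm(data - center, axis=1)
+        data = data[dists >= t2]                 # points inside t2 cannot become centers
+        k += 1
+    return k
+```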
+
+## Experiments
+
+### Data Generation and Visualization
+
+Data generation reuses the API from [Assignment1](https://gitee.com/fnlp/prml-21-spring/blob/master/assignment-1/submission/18307130116/source.py#L94), extended with docstrings and a `method` argument so it can generate log-normal as well as high-dimensional Gaussian data. Visualization likewise extends [Assignment1](https://gitee.com/fnlp/prml-21-spring/blob/master/assignment-1/submission/18307130116/source.py#L129) with docstrings plus a new `color` parameter: in a real clustering problem the class assignment is unknown, so plotting without colors is sometimes the appropriate choice.
+
+The corresponding APIs are `data_generate_and_save` for generating and saving data, `data_load` for loading it, and `visualize` for plotting.
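+
+A typical round trip, mirroring the `__main__` block of `source.py` (the parameters here are the regular-case settings used below):
+
+```python
+import numpy as np
+from source import data_generate_and_save, data_load, visualize
+
+mean_list = [(1, -7), (1, -4), (1, 0)]
+cov_list = [np.eye(2)] * 3                  # unit covariance for every class
+num_list = [800, 800, 800]
+data_generate_and_save(3, mean_list, cov_list, num_list)
+(train_data, train_label), (test_data, test_label) = data_load()
+visualize(train_data, class_num=3)                                  # uncolored
+visualize(train_data, label=train_label, class_num=3, color=True)   # ground truth
+```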
+
+### Evaluating Clustering Quality
+
+Besides direct visual inspection, the Silhouette Coefficient (SC) is used as the quality metric. For each point, SC compares the mean distance to the other points of its own cluster with the smallest mean distance to the points of any other cluster; the score lies in [-1, 1], and the closer to 1 the better.
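+
+Concretely, for a point $i$ with mean intra-cluster distance $a(i)$ and minimal mean distance $b(i)$ to any other cluster,
+
+$$
+s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad SC = \frac{1}{N} \sum_{i=1}^{N} s(i)
+$$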
+
+Given all the points and their labels, the `compute_SC` interface computes this score directly.
+
+### Basic Experiments
+
+#### The Regular Case
+
+This experiment generates a simple dataset, visualizes the Kmeans and GMM clusterings, and compares them against the ground truth.
+
+The parameters are listed in the table below.
+
+| Class | Mean    | Covariance       | Points |
+| ----- | ------- | ---------------- | ------ |
+| 1     | (1, -7) | [[1, 0], [0, 1]] | 800    |
+| 2     | (1, -4) | [[1, 0], [0, 1]] | 800    |
+| 3     | (1, 0)  | [[1, 0], [0, 1]] | 800    |
+
+The uncolored plot and the ground-truth plot are shown below.
+
+Kmeans results (left) and GMM results (right) are shown below.
+
+As the figures show, both models recover the clusters correctly on the whole, differing only in fine detail.
+
+The SC values are Kmeans = 0.497 and GMM = 0.497; the two are nearly identical, with Kmeans marginally ahead of GMM.
+
+#### Extreme Values in the Data
+
+To test behavior when extreme values are present, i.e. when one dimension is much larger than the others, the following parameters are used.
+
+| Class | Mean      | Covariance         | Points |
+| ----- | --------- | ------------------ | ------ |
+| 1     | (1, -700) | [[1, 0], [0, 100]] | 800    |
+| 2     | (1, -400) | [[1, 0], [0, 100]] | 800    |
+| 3     | (1, 0)    | [[1, 0], [0, 1]]   | 800    |
+
+This test checks whether extreme values break the models' computations; in practice it amounts to a case where the classes are very clearly separated.
+
+The uncolored plot and the ground-truth plot are shown below.
+
+Kmeans results (left) and GMM results (right) are shown below.
+
+The SC values are Kmeans = 0.973 and GMM = 0.973, exactly the same.
+
+#### Clustering Under Heavy Overlap
+
+Having tested mildly overlapping and non-overlapping data, we now test clustering under heavy overlap.
+
+The parameters are as follows.
+
+| Class | Mean   | Covariance       | Points |
+| ----- | ------ | ---------------- | ------ |
+| 1     | (1, 1) | [[2, 0], [0, 2]] | 800    |
+| 2     | (1, 2) | [[2, 0], [0, 2]] | 800    |
+| 3     | (1, 0) | [[2, 0], [0, 2]] | 800    |
+
+The uncolored plot and the ground-truth plot are shown below.
+
+Kmeans results (left) and GMM results (right) are shown below.
+
+The SC values are Kmeans = 0.323 and GMM = 0.310, with Kmeans slightly ahead. We also computed the SC of the original labels under this heavy overlap: -0.003, so the clusterers actually score better than the ground truth. In practice this can be deceptive; one should understand the data distribution before reaching for a clusterer rather than trusting a "well-scored" but wrong result.
+
+### GMM Supplementary Experiment
+
+Our GMM initializes its fit from Kmeans, and since GMM is valuable not only for clustering but also for estimating the parameters of superposed Gaussians, we additionally tried non-Gaussian data, purely as a supplementary look at how GMM behaves on it.
+
+Kmeans results (left) and GMM results (right) are shown below.
+
+The SC values are Kmeans = 0.677 and GMM = 0.62009476, while the SC of the original labels is -0.0348. GMM interprets the log-normal data as heavily overlapping Gaussians, which does not hurt the practical result much.
+
+### Automated Clustering Results
+
+The automated `ClusteringAlgorithm` is built on Kmeans, so the experiments below focus solely on the choice of K and do not repeat the Kmeans quality experiments.
+
+We first run several rounds of the ELBOW-based and Canopy-based selection on one parameter set whose true K is 3 (parameters in the table below). One round is visualized below: plus marks are test data, and the labeled version is on the right.
+
+| Class | Mean     | Covariance             | Points |
+| ----- | -------- | ---------------------- | ------ |
+| 1     | (1, 2)   | [[73, 0], [0, 22]]     | 800    |
+| 2     | (16, -5) | [[21.2, 0], [0, 32.1]] | 200    |
+| 3     | (10, 22) | [[10, 5], [5, 10]]     | 1000   |
+
+| Round | K from ELBOW | K from Canopy |
+| ----- | ------------ | ------------- |
+| 1     | 5            | 4             |
+| 2     | 4            | 4             |
+| 3     | 4            | 5             |
+| 4     | 3            | 5             |
+| 5     | 4            | 4             |
+
+Both ELBOW and Canopy produce K values close to the truth, so the automation works well. By comparison, ELBOW has to run Kmeans $\sqrt{n/2}$ times while Canopy needs only a single pass, so Canopy is much faster; ELBOW fluctuates more, and whether it hits the optimum depends on the random initialization inside Kmeans, but overall both perform well.
+
+### Canopy Robustness Test
+
+As shown above, ELBOW and Canopy yield similar K values and Canopy is far faster, but Canopy depends on the thresholds t1 and t2, which in this assignment are identical for all data of the same dimensionality; hence this robustness test. ELBOW brute-forces over K to pick the best value, so it can be expected to be robust, and the Kmeans test is skipped here.
+
+To keep the thresholds fixed, I tested several two-dimensional datasets; parameters and results are in the table below, including a third, almost disjoint group and a fourth, heavily overlapping group.
+
+| Round | Means                          | Covariances                                            | Points per class | True K | K found |
+| ----- | ------------------------------ | ------------------------------------------------------ | ---------------- | ------ | ------- |
+| 1     | [(1, -7), (1, -4), (1, 0)]     | identity                                               | [800, 800, 800]  | 3      | 3       |
+| 2     | [(1, -7), (1, -4), (1, 0)]     | [[10, 0], [0, 1]], [[1, 0], [0, 10]], [[2, 0], [6, 5]] | [800, 800, 800]  | 3      | 3       |
+| 3     | [(10, 10), (-10, -10), (5, 0)] | identity                                               | [800, 800, 800]  | 3      | 3       |
+| 4     | [(1, 1), (1, 2), (1, 0)]       | [[2, 0], [0, 2]] for all three                         | [800, 800, 800]  | 3      | 4       |
+
+The heavily overlapping and the fully separated datasets are visualized below:
+
+Even on heavily overlapping data, Canopy still performs well: the method is robust to the distribution.
+
+Next, we test robustness to the per-class point counts, keeping the first parameter set from the table above and changing only the counts.
+
+| Round | Points           | K found |
+| ----- | ---------------- | ------- |
+| 1     | [80, 1000, 80]   | 1       |
+| 2     | [200, 1000, 10]  | 2       |
+| 3     | [200, 1000, 100] | 3       |
+
+Point counts clearly affect Canopy. Because the implementation invokes the 80/20 rule to ignore a small number of outliers, when one class vastly outnumbers the rest the points of the other classes are ignored, and the closer they are the more likely they are to be ignored; this matches what one expects when clustering in practice.
+
+Going further, we test K selection with more classes; [a, b] means that over several runs the chosen K always fell in the interval [a, b].
+
+| Round | Means                               | Covariances                                                               | Points per class | True K | K found |
+| ----- | ----------------------------------- | ------------------------------------------------------------------------- | ---------------- | ------ | ------- |
+| 1     | [(1, -7), (1, -4), (1, 0), (-2, 0)] | identity                                                                  | [800, 800, 800]  | 4      | [3,4]   |
+| 2     | [(1, -7), (1, -4), (1, 0), (-2, 0)] | [[10, 0], [0, 1]], [[1, 0], [0, 10]], [[2, 0], [6, 5]], [[3, 0], [1, 5]]  | [800, 800, 800]  | 4      | [3,4]   |
+
+#### Summary
+
+In this part we saw that Canopy is robust to the shape of the distribution, performing well under both heavy and light overlap, but less robust to the per-class point counts: it tends to ignore small classes. This follows from the implementation and matches expectations.
\ No newline at end of file
diff --git "a/assignment-3/submission/18307130116/img/GMM\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png" "b/assignment-3/submission/18307130116/img/GMM\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png"
new file mode 100644
index 0000000000000000000000000000000000000000..c369d0d437a06ee47a31c8aa28fc5b2360e17ea0
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/GMM\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png" differ
diff --git "a/assignment-3/submission/18307130116/img/GMM\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203.png" "b/assignment-3/submission/18307130116/img/GMM\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203.png"
new file mode 100644
index 0000000000000000000000000000000000000000..5577014148822a67b7cff48ba5e73e5e5ddcff73
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/GMM\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203.png" differ
diff --git "a/assignment-3/submission/18307130116/img/GMM\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/GMM\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..40088f3cb594956cd7f3e9ccc6705ccf45827db3
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/GMM\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/GMM\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/GMM\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..7294753f3d2219aece852691320d7f64611eb510
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/GMM\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/Kmeans\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png" "b/assignment-3/submission/18307130116/img/Kmeans\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png"
new file mode 100644
index 0000000000000000000000000000000000000000..d1aa314ef755065b8965ab3b811c8af5325ee35e
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/Kmeans\345\237\272\347\241\200\345\256\236\351\252\214\347\273\223\346\236\234.png" differ
diff --git "a/assignment-3/submission/18307130116/img/Kmeans\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/Kmeans\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..ecff0a29f27fcd79d94e5f8195b6752aef9fba0b
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/Kmeans\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/Kmeans\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/Kmeans\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..dab91ad6746eecb423c695a38749c1bb5fc0de42
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/Kmeans\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/Kmeans\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/Kmeans\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..55b8377dc1f11b74d71c0eadd77dd40a5c4df74d
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/Kmeans\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" differ
diff --git a/assignment-3/submission/18307130116/img/elbow.png b/assignment-3/submission/18307130116/img/elbow.png
new file mode 100644
index 0000000000000000000000000000000000000000..f5a2e5e471a9f91630b179c9d9fa2fc478967176
Binary files /dev/null and b/assignment-3/submission/18307130116/img/elbow.png differ
diff --git "a/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..79e2ea68d5a4d211a87599c3267bd2e48cf0daa4
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\351\242\230\347\233\256.png" "b/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\351\242\230\347\233\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..90c9da70c56b6adb825c55a7372aa37693a66dec
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\345\237\272\347\241\200\345\256\236\351\252\214\351\242\230\347\233\256.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\345\256\214\345\205\250\345\210\206\347\246\273\346\225\260\346\215\256.png" "b/assignment-3/submission/18307130116/img/\345\256\214\345\205\250\345\210\206\347\246\273\346\225\260\346\215\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..572bf49a5ceaa1b1ce3e2493d81be2553c621e6d
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\345\256\214\345\205\250\345\210\206\347\246\273\346\225\260\346\215\256.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..d5c27c4b1854debe84105d77a636ea0119368383
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\351\242\230\347\233\256.png" "b/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\351\242\230\347\233\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..1fb0fc17af5f4239734c8572df31d2f4ae68537f
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\345\257\271\346\225\260\346\255\243\346\200\201\345\210\206\345\270\203\351\242\230\347\233\256.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\346\227\240\347\261\273\345\210\253.png" "b/assignment-3/submission/18307130116/img/\346\227\240\347\261\273\345\210\253.png"
new file mode 100644
index 0000000000000000000000000000000000000000..144f73c866afc1c56cc6d50b5e9c01b342d8da2f
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\346\227\240\347\261\273\345\210\253.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\346\234\211\347\261\273\345\210\253data.png" "b/assignment-3/submission/18307130116/img/\346\234\211\347\261\273\345\210\253data.png"
new file mode 100644
index 0000000000000000000000000000000000000000..6d97fae31b24e72ecb5f4b19c52faa03636fbdf8
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\346\234\211\347\261\273\345\210\253data.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..dab91ad6746eecb423c695a38749c1bb5fc0de42
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\351\242\230\347\233\256.png" "b/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\351\242\230\347\233\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..35a9a8754cd5d6d4e8ca3b3d8799f0110dc46f68
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\346\236\201\347\253\257\345\256\236\351\252\214\351\242\230\347\233\256.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\346\225\260\346\215\256.png" "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\346\225\260\346\215\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..74fafa0211602f3c8c6682127e3196cfa97bde11
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\346\225\260\346\215\256.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png"
new file mode 100644
index 0000000000000000000000000000000000000000..2fe7b8b7aaae54eebe4001233150cba346870577
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\347\255\224\346\241\210.png" differ
diff --git "a/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\351\242\230\347\233\256.png" "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\351\242\230\347\233\256.png"
new file mode 100644
index 0000000000000000000000000000000000000000..a9ff7f42e30d93a44bff57b9facb19111a2b75bc
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\351\242\230\347\233\256.png" differ
diff --git a/assignment-3/submission/18307130116/source.py b/assignment-3/submission/18307130116/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..5487cd0c73af70ba038bd590a54d5cfa3c8c080a
--- /dev/null
+++ b/assignment-3/submission/18307130116/source.py
@@ -0,0 +1,649 @@
+import numpy as np
+import random
+import matplotlib.pyplot as plt
+import math
+import matplotlib.cm as cm
+
+def data_preprocess(data):
+    """preprocess the data
+
+    use the range of data to transform the data
+
+    Args:
+        data(numpy.ndarray):raw data
+
+    Return:
+        numpy.ndarray: data after process
+    """
+
+    edge = max(abs(data.max()), abs(data.min()))
+    result_data = (data*10)/edge
+    return result_data
+
+
+def compute_SC(data, label, class_num):
+    """compute the Silhouette Coefficient
+
+    Args:
+        data(numpy.ndarray): data for compute
+        label(list): label for every point
+        class_num(int): the number of clusters
+    
+    Return:
+        float: the value of the Silhouette Coefficient
+    """
+    point_dict = {}
+    data = data_preprocess(data)
+    if len(data.shape) == 1:
+        dimention = 1
+    else:
+        dimention = data.shape[1]
+    for iter in range(class_num):
+        point_dict[iter] = []
+    for iter in range(len(data)):
+        point_dict[label[iter]].append(data[iter])
+    result = 0
+    for iter in range(len(data)):
+        now_point = data[iter]
+        now_point = now_point.reshape(-1, 1)
+        inner_dis = 0
+        now_label = label[iter]
+        for other in point_dict[now_label]:
+            other = other.reshape(-1, 1)
+            temp = 0
+            for i in range(dimention):
+                temp = temp + (now_point[i]-other[i]) ** 2
+            inner_dis = inner_dis + temp**0.5
+        inner_dis = inner_dis / (len(point_dict[now_label]) - 1)
+        out_dis_min = math.inf
+        for label_iter in range(class_num):
+            if label_iter == now_label:
+                continue
+            out_dis = 0
+            for other in point_dict[label_iter]:
+                other = other.reshape(-1, 1)
+                temp = 0
+                for i in range(dimention):
+                    temp = temp + (now_point[i]-other[i]) ** 2
+                out_dis = out_dis + temp**0.5
+            out_dis = out_dis / len(point_dict[label_iter])
+            if out_dis < out_dis_min:
+                out_dis_min = out_dis
+        result = result + (out_dis_min - inner_dis)/max(out_dis_min, inner_dis)
+    result = result / len(data)
+    return result
+
+
+def data_generate_and_save(class_num, mean_list, cov_list, num_list, save_path = "", method = "Gaussian"):
+    """generate data that obey Guassian distribution
+    
+    label will be saved in the meantime.
+
+    Args:
+        class_num(int): the number of class
+        mean_list(list): mean_list[i] stand for the mean of class[i]
+        cov_list(list): similar to mean_list, stand for the covariance
+        num_list(list): similar to mean_list, stand for the number of points in class[i]
+        save_path(str): the data storage path, ending with a slash
+        method(str): the distribution the data will follow; supports "Gaussian" and "lognormal"
+    """
+    if method == "lognormal":
+        data = np.random.lognormal(mean_list[0], cov_list[0], num_list[0])
+    elif method == "Gaussian":
+        data = np.random.multivariate_normal(mean_list[0], cov_list[0], (num_list[0],))
+    label = np.zeros((num_list[0],),dtype=int)
+    total = num_list[0]
+    
+    for iter in range(1, class_num):
+        if method == "lognormal":
+            temp = np.random.lognormal(mean_list[iter], cov_list[iter], num_list[iter])
+        elif method == "Gaussian":
+            temp = np.random.multivariate_normal(mean_list[iter], cov_list[iter], (num_list[iter],))
+        label_temp = np.ones((num_list[iter],),dtype=int)*iter
+        data = np.concatenate([data, temp])
+        label = np.concatenate([label, label_temp])
+        total += num_list[iter]
+    
+    idx = np.arange(total)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    train_num = int(total * 0.8)
+    train_data = data[:train_num, ]
+    test_data = data[train_num:, ]
+    train_label = label[:train_num, ]
+    test_label = label[train_num:, ]
+    np.save(save_path+"data.npy", ((train_data, train_label), (test_data, test_label)))
+
+
+def data_load(path = ""):
+    """load data from path given
+
+    data should follow the format(data, label)
+
+    Args:
+        path(str): the path data stored in
+    
+    Return:
+        tuple: stand for the data and label
+    """
+
+    (train_data, train_label), (test_data, test_label) = np.load(path+"data.npy",allow_pickle=True)
+    return (train_data, train_label), (test_data, test_label)
+
+
+def visualize(data, label=None, dimention = 2, class_num = 1, test_data=np.array([None]), color=False):
+    """draw a scatter
+    
+    if you want to distinguish class with color, parameter color should be True.
+    It will distribute color for each class automatically.
+    The test data will be marked with the label of plus
+
+    Args:
+        data(numpy.ndarray):train dataset
+        label(numpy.ndarray):label for train data
+        class_num(int):the number of clusters, only used when color = True
+        test_data(numpy.ndarray): test dataset
+        color(boolean): True if you want a different color for each class, otherwise False
+        dimention(int): the data dimension, should be 1 or 2
+    """
+    
+    if color == True:
+        data_x = {}
+        data_y = {}
+        for iter in range(class_num):
+            data_x[iter] = []
+            data_y[iter] = []
+        if dimention == 2:
+            for iter in range(len(label)):
+                data_x[label[iter]].append(data[iter, 0])
+                data_y[label[iter]].append(data[iter, 1])
+        elif dimention == 1:
+            for iter in range(len(label)):
+                data_x[label[iter]].append(data[iter])
+                data_y[label[iter]].append(0)
+        colors = cm.rainbow(np.linspace(0, 1, class_num))
+        for class_idx, c in zip(range(class_num), colors):
+            plt.scatter(data_x[class_idx], data_y[class_idx], color=c)
+        if test_data[0] is not None:
+            if dimention == 2:
+                plt.scatter(test_data[:, 0], test_data[:, 1], marker='+')
+            elif dimention == 1:
+                plt.scatter(test_data, np.zeros(len(test_data)), marker='+')
+    else:
+        if dimention == 2:
+            plt.scatter(data[:, 0], data[:, 1], marker='o')
+        elif dimention == 1:
+            plt.scatter(data, np.zeros(len(data)), marker='o')
+        if test_data[0] is not None:
+            if dimention == 2:
+                plt.scatter(test_data[:, 0], test_data[:, 1], marker='+')
+            elif dimention == 1:
+                plt.scatter(test_data, np.zeros(len(test_data)), marker='+')
+    plt.show()
+
+        
+class Canopy:
+    """Canopy clustering method
+
+    The model will init its thresholds automatically according to
+    the dimension of the dataset.
+    A low-accuracy method to get the cluster number K for the whole dataset
+
+    Attributes:
+        t1(float): the first (outer) threshold
+        t2(float): the second (inner) threshold
+        dimention: dimension of data
+    """
+    def __init__(self):
+        """init the whole model
+        """
+        self.t1 = 0
+        self.t2 = 0
+        self.dimention = 0
+    
+    def get_distance(self, point1, point2, method="Euclidean"):
+        """Compute the distance between two points
+
+        Computes the distance over the first `self.dimention` coordinates.
+        Other distance metrics may be supported in the future.
+
+        Args:
+            point1(numpy.ndarray):One point for compute
+            point2(numpy.ndarray):The other point for compute
+            method(str):The way to compute distance
+        
+        Return:
+            float: distance between two points
+        """
+
+        dis = 0
+        point1 = point1.reshape(-1, 1)
+        point2 = point2.reshape(-1, 1)
+        if method == "Euclidean":
+            for iter in range(self.dimention):
+                dis += (point1[iter]-point2[iter]) ** 2
+            return dis ** 0.5
+    
+    def fit(self, train_data):
+        """train the model
+        
+        Args:
+            train_data(numpy.ndarray): dataset for training
+        
+        Return:
+            list: contains tuples of the format (center point, [points around])
+        """
+
+        train_data = data_preprocess(train_data)
+        train_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        self.t2 = 2 * self.dimention**0.5
+        self.t1 = 2 * self.t2
+        result = []
+        while len(train_data) >= 0.2 * train_num:
+            idx = random.randint(0, len(train_data) - 1)
+            center = train_data[idx]
+            point_list = []
+            point_need_delete = []
+            train_data = np.delete(train_data, idx, 0)
+            for iter in range(len(train_data)):
+                dis = self.get_distance(train_data[iter], center)
+                if dis < self.t2:
+                    point_need_delete.append(iter)
+                elif dis < self.t1:
+                    point_list.append(train_data[iter])
+            result.append((center, point_list))
+            train_data = np.delete(train_data, point_need_delete, 0)
+        return result
+
+
+class KMeans:
+    """Kmeans Clustering Algorithm
+
+    Attributes:
+        n_clusters(int): number of clusters
+        cluster_center(list): center of each cluster
+        class_point_dict(dict): point dict of each cluster
+        dimention(int): dimension of data
+    """
+
+    def __init__(self, n_clusters):
+        """Inits the clusterer
+
+        Init the points dicts and number of clusters with empty.
+        Init the cluster number with argument
+
+        Args:
+            n_clusters(int): number of clusters
+        """
+
+        self.n_clusters = n_clusters
+        self.cluster_center = []
+        self.class_point_dict = {}
+        self.dimention = 0
+    
+    def fit(self, train_data):
+        """Train the clusterer
+        
+        Get the dimension of the data and randomly select initial cluster centers.
+        Label each point with the label of the cluster center nearest to it.
+        Loop until the cluster centers stop changing
+
+        Args:
+            train_data(numpy.ndarray): training data for this task
+        """
+        
+        train_data = data_preprocess(train_data)
+        train_data_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        for iter in range(self.n_clusters):
+            self.class_point_dict[iter] = []
+        idx = random.sample(range(train_data_num), self.n_clusters)
+        for iter in range(self.n_clusters):
+            self.cluster_center.append(train_data[idx[iter]])
+        for iter in train_data:
+            label = self.get_class(iter)
+            self.class_point_dict[label].append(iter)
+        epoch = 0
+        while not self.update_center() and epoch < 100:
+            for label in range(self.n_clusters):
+                self.class_point_dict[label] = []
+            for iter in train_data:
+                label = self.get_class(iter)
+                self.class_point_dict[label].append(iter)
+            epoch = epoch + 1
+        
+    def predict(self, test_data):
+        """Predict label
+
+        Args:
+            test_data(numpy.ndarray): Data for test
+        
+        Return:
+            numpy.ndarray: each label of test_data
+        """
+
+        test_data = data_preprocess(test_data)
+        result = np.array([])
+        for iter in test_data:
+            label = self.get_class(iter)
+            result = np.r_[result, np.array([label])]
+        return result
+    
+    def get_class(self, point):
+        """Get the point class according to center points
+
+        Args:
+            point(numpy.ndarray): the point to classify
+        
+        Return:
+            int: In range(clusters number), stand for the class
+        """
+
+        min_class = 0
+        for iter in range(self.n_clusters):
+            temp = self.get_distance(self.cluster_center[iter], point)
+            if iter == 0:
+                min_dis = temp
+            else:
+                if min_dis > temp:
+                    min_dis = temp
+                    min_class = iter
+        return min_class
+    
+    def get_distance(self, point1, point2, method="Euclidean"):
+        """Compute the distance between two points
+
+        Computes the distance over the first `self.dimention` coordinates.
+        Other distance metrics may be supported in the future.
+
+        Args:
+            point1(numpy.ndarray):One point for compute
+            point2(numpy.ndarray):The other point for compute
+            method(str):The way to compute distance
+        
+        Return:
+            float: distance between two points
+        """
+
+        dis = 0
+        point1 = point1.reshape(-1, 1)
+        point2 = point2.reshape(-1, 1)
+        if method == "Euclidean":
+            for iter in range(self.dimention):
+                dis += (point1[iter]-point2[iter]) ** 2
+            return dis ** 0.5
+    
+    def update_center(self):
+        """use the class_point_dict to update the cluster_center
+
+        Return:
+            boolean:stand for whether center update or not
+        """
+
+        result = True
+        for iter in range(self.n_clusters):
+            temp = np.zeros(self.dimention)
+            for point in self.class_point_dict[iter]:
+                temp = temp + point
+            temp = temp / len(self.class_point_dict[iter])
+            result = result and (temp == self.cluster_center[iter]).all()
+            self.cluster_center[iter] = temp
+        return result
+
+
+class GaussianMixture:
+    """Gaussian mixture model for clustering
+
+    Attributes:
+        n_clusters(int): number of clusters
+        pi(numpy.ndarray): probability for all the clusters
+        cov(dict): covariance matrix for each cluster
+        mean(dict): mean matrix for each cluster
+        gamma(numpy.ndarray): gamma in EM algorithm
+        epsilon(float): lower bound used as the stopping threshold
+        dimention(int): dimension of data
+    """
+
+    def __init__(self, n_clusters):
+        """init the parameter for this model
+
+        Init the mixing probabilities uniformly.
+        Init the cluster number from the argument
+
+        Args:
+            n_clusters(int): number of clusters
+        """
+
+        self.pi = np.ones(n_clusters)/n_clusters
+        self.n_clusters = n_clusters
+        self.cov = {}
+        self.mean = {}
+        self.gamma = None
+        self.epsilon = 1e-20
+        self.dimention = 0
+
+    def fit(self, train_data):
+        """Train the model, using EM
+
+        Init the mean and covariance for each class first: the initial covariance
+        matrix for every cluster is the identity, and Kmeans initializes the means.
+        In E step, calculate the probability for every point in every cluster.
+        In M step, update the parameter for every distribution
+
+        Args:
+            train_data(numpy.ndarray): data for train
+        """
+
+        k_model = KMeans(self.n_clusters)
+        k_model.fit(train_data)
+        train_data = data_preprocess(train_data)
+        train_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        for iter in range(self.n_clusters):
+            self.mean[iter] = k_model.cluster_center[iter]
+            self.cov[iter] = np.ones(self.dimention)
+            self.cov[iter] = np.diag(self.cov[iter])
+        self.gamma = np.empty([train_num, self.n_clusters])
+        for i in range(20):
+            #E step
+            for iter in range(train_num):
+                temp = np.array(self.point_probability(train_data[iter]))
+                self.gamma[iter, :] = temp
+            self.gamma = self.gamma/self.gamma.sum(axis=1).reshape(-1, 1)
+            # a fixed number of EM rounds is used instead of a
+            # likelihood-based termination condition
+
+            #M step
+            self.pi = np.sum(self.gamma, axis=0)/train_num
+            for label in range(self.n_clusters):
+                mean = np.zeros(self.dimention)
+                cov = np.zeros([self.dimention, self.dimention])
+                for iter in range(train_num):
+                    mean += self.gamma[iter, label] * train_data[iter]
+                    point = train_data[iter].reshape(-1, 1)
+                    label_mean = self.mean[label].reshape(-1, 1)
+                    rest = point - label_mean
+                    cov += self.gamma[iter, label] * np.matmul(rest, rest.T)
+                self.mean[label] = mean/np.sum(self.gamma, axis=0)[label]
+                self.cov[label] = cov/np.sum(self.gamma, axis=0)[label]
+    
+    def predict(self, test_data):
+        """Predict label
+
+        Args:
+            test_data(numpy.ndarray): Data for test
+        
+        Return:
+            numpy.ndarray: each label of test_data
+        """
+
+        test_data = data_preprocess(test_data)
+        result = []
+        for iter in test_data:
+            temp = self.point_probability(iter)
+            label = temp.index(max(temp))
+            result.append(label)
+        return np.array(result)
+
+    def point_probability(self, point):
+        """calculate the probability of every gaussian distribution
+
+        Args:
+            point(numpy.ndarray): point need to be calculated
+        
+        Return:
+            list: probability for every distribution
+        """
+
+        result = []
+        for iter in range(self.n_clusters):
+            result.append(self.calculate(point, iter) * self.pi[iter])
+        return result
+
+    def calculate(self, point, iter):
+        """calculate the probability for the iter-th distribution
+
+        Args:
+            point(numpy.ndarray): the point need to calculate
+            iter(int): the number of distribution
+        
+        Return:
+            float: the probability of the point
+
+        """
+
+        point = point.reshape(-1, 1)
+        mean = self.mean[iter]
+        mean = mean.reshape(-1, 1)
+        cov = self.cov[iter]
+        D = self.dimention
+        coef = 1/((2*math.pi) ** (D/2) * (np.linalg.det(cov))**0.5)
+        pow = -0.5 * np.matmul(np.matmul((point - mean).T, np.linalg.inv(cov)), (point - mean))
+        result = coef * np.exp(pow) + np.exp(-200)
+        return float(result)
+    
+
+class ClusteringAlgorithm:
+    """Auto cluster
+    
+    Automatically choose k and cluster the data, using Kmeans
+
+    Attributes:
+        K(int): the number of clusters
+        clusterer: the underlying KMeans clusterer
+    """
+
+    def __init__(self):
+        """init the clusterer
+        """
+
+        self.K = 3
+        self.clusterer = None
+    
+    def fit(self, train_data, method="Elbow"):
+        """train the cluster
+        
+        Automatically choose the number of clusters with the given method.
+        For Elbow, the knee is declared when the distortions at K-1, K, K+1
+        satisfy [K-1]-[K] <= 2*([K]-[K+1]), i.e. the curve is smooth enough.
+
+        Args:
+            train_data(numpy.ndarray): the dataset for training
+            method(str): "Elbow" or "Canopy"
+        """
+
+        if method == "Elbow":
+            train_num = train_data.shape[0]
+            upbound = int((train_num/2)**0.5)+2
+            dis = np.zeros(upbound)
+            for i in range(1, upbound):
+                self.clusterer = KMeans(i)
+                self.clusterer.fit(train_data)
+                label_dict = self.clusterer.class_point_dict
+                center = self.clusterer.cluster_center
+                for iter in range(i):
+                    for point in label_dict[iter]:
+                        dis[i] = dis[i]+self.clusterer.get_distance(point,center[iter])
+            dis[0] = 6 * dis[1]
+            if upbound <= 5:
+                min = 1
+                for i in range(1, upbound):
+                    if dis[i] <= dis[min]:
+                        min = i
+            else:
+                for min in range(1, upbound-1):
+                    if dis[min-1]-dis[min] <= 2 * (dis[min]-dis[min+1]):
+                        break
+            self.clusterer = KMeans(min)
+            self.clusterer.fit(train_data)
+            print("choose {}".format(min))
+        elif method == "Canopy":
+            canopy = Canopy()
+            K = len(canopy.fit(train_data))
+            print("choose {}".format(K))
+            self.clusterer = KMeans(K)
+            self.clusterer.fit(train_data)
+
+    
+    def predict(self, test_data):
+        """predict the test_data
+        
+        Args:
+            test_data(numpy.ndarray): data for test
+        
+        Return:
+            list: label for each point
+        """
+
+        return self.clusterer.predict(test_data)
+
+
+if __name__ == "__main__":
+    mean_list = [(1, 2), (16, -5), (10, 22)]
+    cov_list = [np.array([[73, 0], [0, 22]]), np.array([[21.2, 0], [0, 32.1]]), np.array([[10, 5], [5, 10]])]
+    num_list = [80, 80, 80]
+    save_path = ""
+    data_generate_and_save(3, mean_list, cov_list, num_list, save_path)
+    (train_data, train_label), (test_data, test_label) = data_load()
+    visualize(train_data, dimention=2, class_num=3)
+    visualize(train_data, dimention=2, label=train_label, class_num=3, color = True)
+    # print(train_data)
+    # print(type(train_data))
+    # print(train_data.shape)
+    k = KMeans(3)
+    k.fit(train_data)
+    label1 = k.predict(train_data)
+    visualize(train_data, dimention=2, label=label1, class_num=3, color=True)
+    print(compute_SC(train_data, label1, 3))
+
+    g = GaussianMixture(3)
+    g.fit(train_data)
+    label2 = g.predict(train_data)
+    visualize(train_data, label=label2, dimention=2, class_num=3, color=True)
+    print(compute_SC(train_data, label2, 3))
+
+    # print(compute_SC(train_data, train_label, 3))
+    # k = ClusteringAlgorithm()
+    # k.fit(train_data, method="Elbow")
+    # k.predict(train_data)
+    # e = ClusteringAlgorithm()
+    # e.fit(train_data, method="Canopy")
+
diff --git a/assignment-3/submission/18307130116/tester_demo.py b/assignment-3/submission/18307130116/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..19ec0e8091691d4aaaa6b53dbb695fde9e826d89
--- /dev/null
+++ b/assignment-3/submission/18307130116/tester_demo.py
@@ -0,0 +1,117 @@
+import numpy as np
+import sys
+
+from source import KMeans, GaussianMixture
+
+
+def shuffle(*datas):
+    data = np.concatenate(datas)
+    label = np.concatenate([
+        np.ones((d.shape[0],), dtype=int)*i
+        for (i, d) in enumerate(datas)
+    ])
+    N = data.shape[0]
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    return data, label
+
+
+def data_1():
+    mean = (1, 2)
+    cov = np.array([[73, 0], [0, 22]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[21.2, 0], [0, 32.1]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (10, 22)
+    cov = np.array([[10, 5], [5, 10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    data, _ = shuffle(x, y, z)
+    return (data, data), 3
+
+
+def data_2():
+    train_data = np.array([
+        [23, 12, 173, 2134],
+        [99, -12, -126, -31],
+        [55, -145, -123, -342],
+    ])
+    return (train_data, train_data), 2
+
+
+def data_3():
+    train_data = np.array([
+        [23],
+        [-2999],
+        [-2955],
+    ])
+    return (train_data, train_data), 2
+
+
+def test_with_n_clusters(data_function, algorithm_class):
+    (train_data, test_data), n_clusters = data_function()
+    model = algorithm_class(n_clusters)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    assert len(
+        res.shape) == 1 and res.shape[0] == test_data.shape[0], "shape of result is wrong"
+    return res
+
+
+def testcase_1_1():
+    test_with_n_clusters(data_1, KMeans)
+    return True
+
+
+def testcase_1_2():
+    res = test_with_n_clusters(data_2, KMeans)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def testcase_2_1():
+    test_with_n_clusters(data_1, GaussianMixture)
+    return True
+
+
+def testcase_2_2():
+    res = test_with_n_clusters(data_3, GaussianMixture)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def test_all(err_report=False):
+    testcases = [
+        ["KMeans-1", testcase_1_1, 4],
+        ["KMeans-2", testcase_1_2, 4],
+        # ["KMeans-3", testcase_1_3, 4],
+        # ["KMeans-4", testcase_1_4, 4],
+        # ["KMeans-5", testcase_1_5, 4],
+        ["GMM-1", testcase_2_1, 4],
+        ["GMM-2", testcase_2_2, 4],
+        # ["GMM-3", testcase_2_3, 4],
+        # ["GMM-4", testcase_2_4, 4],
+        # ["GMM-5", testcase_2_5, 4],
+    ]
+    sum_score = sum([case[2] for case in testcases])
+    score = 0
+    for case in testcases:
+        try:
+            res = case[2] if case[1]() else 0
+        except Exception as e:
+            if err_report:
+                print("Error [{}] occurs in {}".format(str(e), case[0]))
+            res = 0
+        score += res
+        print("+ {:14} {}/{}".format(case[0], res, case[2]))
+    print("{:16} {}/{}".format("FINAL SCORE", score, sum_score))
+
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1 and sys.argv[1] == "--report":
+        test_all(True)
+    else:
+        test_all()
+
Binary files /dev/null and "b/assignment-3/submission/18307130116/img/\351\253\230\345\272\246\351\207\215\345\217\240\351\242\230\347\233\256.png" differ
diff --git a/assignment-3/submission/18307130116/source.py b/assignment-3/submission/18307130116/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..5487cd0c73af70ba038bd590a54d5cfa3c8c080a
--- /dev/null
+++ b/assignment-3/submission/18307130116/source.py
@@ -0,0 +1,649 @@
+import numpy as np
+import random
+import matplotlib.pyplot as plt
+import math
+import matplotlib.cm as cm
+
+def data_preprocess(data):
+    """preprocess the data
+
+    use the range of data to transform the data
+
+    Args:
+        data(numpy.ndarray):raw data
+
+    Return:
+        numpy.ndarray: data after process
+    """
+
+    edge = max(abs(data.max()), abs(data.min()))
+    result_data = (data*10)/edge
+    return result_data
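+
+# Usage sketch for data_preprocess (illustrative values, not from the assignment):
+#     scaled = data_preprocess(np.array([[1.0, -5.0], [2.5, 4.0]]))
+#     # the largest absolute value (5.0) is mapped to 10, so every entry
+#     # is scaled by 2 and the result lies in [-10, 10]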
+
+
+def compute_SC(data, label, class_num):
+    """compute the Silhouette Coefficient
+
+    Args:
+        data(numpy.ndarray): data for compute
+        label(list): label for every point
+        class_num(int): the number of clusters
+
+    Return:
+        float: the value of Silhouette Coefficient, in [-1, 1]
+    """
+    point_dict = {}
+    data = data_preprocess(data)
+    if len(data.shape) == 1:
+        dimention = 1
+    else:
+        dimention = data.shape[1]
+    for iter in range(class_num):
+        point_dict[iter] = []
+    for iter in range(len(data)):
+        point_dict[label[iter]].append(data[iter])
+    result = 0
+    for iter in range(len(data)):
+        now_point = data[iter]
+        now_point = now_point.reshape(-1, 1)
+        inner_dis = 0
+        now_label = label[iter]
+        for other in point_dict[now_label]:
+            other = other.reshape(-1, 1)
+            temp = 0
+            for i in range(dimention):
+                temp = temp + (now_point[i]-other[i]) ** 2
+            inner_dis = inner_dis + temp**0.5
+        # guard against singleton clusters to avoid division by zero
+        inner_dis = inner_dis / max(len(point_dict[now_label]) - 1, 1)
+        out_dis_min = math.inf
+        for label_iter in range(class_num):
+            if label_iter == now_label:
+                continue
+            out_dis = 0
+            for other in point_dict[label_iter]:
+                other = other.reshape(-1, 1)
+                temp = 0
+                for i in range(dimention):
+                    temp = temp + (now_point[i]-other[i]) ** 2
+                out_dis = out_dis + temp**0.5
+            out_dis = out_dis / len(point_dict[label_iter])
+            if out_dis < out_dis_min:
+                out_dis_min = out_dis
+        result = result + (out_dis_min - inner_dis)/max(out_dis_min, inner_dis)
+    result = result / len(data)
+    return result
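+
+# Usage sketch for compute_SC (the label list here is hypothetical):
+#     sc = compute_SC(train_data, label1, class_num=3)
+#     # sc lies in [-1, 1]; the closer to 1, the better the clustering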
+
+
+def data_generate_and_save(class_num, mean_list, cov_list, num_list, save_path = "", method = "Gaussian"):
+    """generate data that obey Guassian distribution
+    
+    label will be saved in the meantime.
+
+    Args:
+        class_num(int): the number of class
+        mean_list(list): mean_list[i] stand for the mean of class[i]
+        cov_list(list): similar to mean_list, stand for the covariance
+        num_list(list): similar to mean_list, stand for the number of points in class[i]
+        save_path(str): the data storage path, end with slash.
+        method(str): the distribution data will follow, either "Gaussian" or "lognormal"
+    """
+    if method == "lognormal":
+        data = np.random.lognormal(mean_list[0], cov_list[0], num_list[0])
+    elif method == "Gaussian":
+        data = np.random.multivariate_normal(mean_list[0], cov_list[0], (num_list[0],))
+    label = np.zeros((num_list[0],),dtype=int)
+    total = num_list[0]
+    
+    for iter in range(1, class_num):
+        if method == "lognormal":
+            temp = np.random.lognormal(mean_list[iter], cov_list[iter], num_list[iter])
+        elif method == "Gaussian":
+            temp = np.random.multivariate_normal(mean_list[iter], cov_list[iter], (num_list[iter],))
+        label_temp = np.ones((num_list[iter],),dtype=int)*iter
+        data = np.concatenate([data, temp])
+        label = np.concatenate([label, label_temp])
+        total += num_list[iter]
+    
+    idx = np.arange(total)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    train_num = int(total * 0.8)
+    train_data = data[:train_num, ]
+    test_data = data[train_num:, ]
+    train_label = label[:train_num, ]
+    test_label = label[train_num:, ]
+    np.save(save_path+"data.npy", ((train_data, train_label), (test_data, test_label)))
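+
+# Usage sketch for data_generate_and_save (illustrative parameters):
+#     data_generate_and_save(2, [(0, 0), (5, 5)],
+#                            [np.eye(2), np.eye(2)], [100, 100])
+#     # writes "data.npy" holding ((train_data, train_label),
+#     # (test_data, test_label)) with an 80/20 split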
+
+
+def data_load(path = ""):
+    """load data from path given
+
+    data should follow the format ((train_data, train_label), (test_data, test_label))
+
+    Args:
+        path(str): the path data stored in
+    
+    Return:
+        tuple: stand for the data and label
+    """
+
+    (train_data, train_label), (test_data, test_label) = np.load(path+"data.npy",allow_pickle=True)
+    return (train_data, train_label), (test_data, test_label)
+
+
+def visualize(data, label=None, dimention = 2, class_num = 1, test_data=None, color=False):
+    """draw a scatter
+    
+    if you want to distinguish classes with color, parameter color should be True;
+    a color is assigned to each class automatically.
+    Test data points are marked with a plus sign.
+
+    Args:
+        data(numpy.ndarray): train dataset
+        label(numpy.ndarray): label for train data
+        class_num(int): the number of clusters, only used when color = True
+        test_data(numpy.ndarray): test dataset
+        color(boolean): True to use a different color for each class, otherwise False
+        dimention(int): the data dimention, should be 1 or 2
+    """
+    
+    if color:
+        data_x = {}
+        data_y = {}
+        for iter in range(class_num):
+            data_x[iter] = []
+            data_y[iter] = []
+        if dimention == 2:
+            for iter in range(len(label)):
+                data_x[label[iter]].append(data[iter, 0])
+                data_y[label[iter]].append(data[iter, 1])
+        elif dimention == 1:
+            for iter in range(len(label)):
+                data_x[label[iter]].append(data[iter])
+                data_y[label[iter]].append(0)
+        colors = cm.rainbow(np.linspace(0, 1, class_num))
+        for class_idx, c in zip(range(class_num), colors):
+            plt.scatter(data_x[class_idx], data_y[class_idx], color=c)
+        if test_data is not None:
+            if dimention == 2:
+                plt.scatter(test_data[:, 0], test_data[:, 1], marker='+')
+            elif dimention == 1:
+                plt.scatter(test_data, np.zeros(len(test_data)), marker='+')
+    else:
+        if dimention == 2:
+            plt.scatter(data[:, 0], data[:, 1], marker='o')
+        elif dimention == 1:
+            plt.scatter(data, np.zeros(len(data)), marker='o')
+        if test_data is not None:
+            if dimention == 2:
+                plt.scatter(test_data[:, 0], test_data[:, 1], marker='+')
+            elif dimention == 1:
+                plt.scatter(test_data, np.zeros(len(test_data)), marker='+')
+    plt.show()
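+
+# Usage sketch for visualize (mirrors the __main__ demo below):
+#     visualize(train_data, label=train_label, dimention=2,
+#               class_num=3, color=True)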
+
+        
+class Canopy:
+    """Canopy clustering method
+
+    The model inits its thresholds automatically according to
+    the dimension of the dataset.
+    A coarse, low-accuracy method to get the cluster number K for the whole dataset.
+
+    Attributes:
+        t1(float): the first (outer) threshold
+        t2(float): the second (inner) threshold
+        dimention(int): dimension of data
+    """
+    def __init__(self):
+        """init the whole model
+        """
+        self.t1 = 0
+        self.t2 = 0
+        self.dimention = 0
+    
+    def get_distance(self, point1, point2, method="Euclidean"):
+        """Compute the distance between two points
+
+        Only the Euclidean distance is implemented for now; other methods
+        may be supported in the future.
+
+        Args:
+            point1(numpy.ndarray):One point for compute
+            point2(numpy.ndarray):The other point for compute
+            method(str):The way to compute distance
+        
+        Return:
+            float: distance between two points
+        """
+
+        dis = 0
+        point1 = point1.reshape(-1, 1)
+        point2 = point2.reshape(-1, 1)
+        if method == "Euclidean":
+            for iter in range(self.dimention):
+                dis += (point1[iter]-point2[iter]) ** 2
+            return dis ** 0.5
+    
+    def fit(self, train_data):
+        """train the model
+        
+        Args:
+            train_data(numpy.ndarray): dataset for training
+        
+        Return:
+            list: tuples with the format of (center point, [points around])
+        """
+
+        train_data = data_preprocess(train_data)
+        train_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        self.t2 = 2 * self.dimention**0.5
+        self.t1 = 2 * self.t2
+        result = []
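+        # stop once 80% of the points belong to some canopy: the README
+        # argues the remaining points are likely outliers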
+        while len(train_data) >= 0.2 * train_num:
+            idx = random.randint(0, len(train_data) - 1)
+            center = train_data[idx]
+            point_list = []
+            point_need_delete = []
+            train_data = np.delete(train_data, idx, 0)
+            for iter in range(len(train_data)):
+                dis = self.get_distance(train_data[iter], center)
+                if dis < self.t2:
+                    point_need_delete.append(iter)
+                elif dis < self.t1:
+                    point_list.append(train_data[iter])
+            result.append((center, point_list))
+            train_data = np.delete(train_data, point_need_delete, 0)
+        return result
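+
+# Usage sketch for Canopy (only the number of canopies is used downstream):
+#     K = len(Canopy().fit(train_data))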
+
+
+class KMeans:
+    """Kmeans Clustering Algorithm
+
+    Attributes:
+        n_clusters(int): number of clusters
+        cluster_center(list): center of each cluster
+        class_point_dict(dict): point dict of each cluster
+        dimention(int): dimension of the data
+    """
+
+    def __init__(self, n_clusters):
+        """Inits the clusterer
+
+        Init the center list and point dict as empty.
+        Init the cluster number with the argument
+
+        Args:
+            n_clusters(int): number of clusters
+        """
+
+        self.n_clusters = n_clusters
+        self.cluster_center = []
+        self.class_point_dict = {}
+        self.dimention = 0
+    
+    def fit(self, train_data):
+        """Train the clusterer
+        
+        Get the dimension of the data and randomly select the initial cluster centers.
+        Label each point with the label of the cluster center nearest to it.
+        Loop until the cluster centers stop changing, or 100 epochs have passed
+
+        Args:
+            train_data(numpy.ndarray): training data for this task
+        """
+        
+        train_data = data_preprocess(train_data)
+        train_data_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        for iter in range(self.n_clusters):
+            self.class_point_dict[iter] = []
+        idx = random.sample(range(train_data_num), self.n_clusters)
+        for iter in range(self.n_clusters):
+            self.cluster_center.append(train_data[idx[iter]])
+        for iter in train_data:
+            label = self.get_class(iter)
+            self.class_point_dict[label].append(iter)
+        epoch = 0
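+        # reassign points until the centers stop moving, with an upper bound
+        # of 100 epochs to avoid oscillating on unlucky initializations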
+        while not self.update_center() and epoch < 100:
+            for label in range(self.n_clusters):
+                self.class_point_dict[label] = []
+            for iter in train_data:
+                label = self.get_class(iter)
+                self.class_point_dict[label].append(iter)
+            epoch = epoch + 1
+        
+    def predict(self, test_data):
+        """Predict label
+
+        Args:
+            test_data(numpy.ndarray): Data for test
+        
+        Return:
+            numpy.ndarray: each label of test_data
+        """
+
+        test_data = data_preprocess(test_data)
+        result = np.array([])
+        for iter in test_data:
+            label = self.get_class(iter)
+            result = np.r_[result, np.array([label])]
+        return result
+    
+    def get_class(self, point):
+        """Get the point class according to center points
+
+        Args:
+            point(numpy.ndarray): Point to classify
+
+        Return:
+            int: the index of the nearest cluster center, in range(n_clusters)
+        """
+
+        min_class = 0
+        for iter in range(self.n_clusters):
+            temp = self.get_distance(self.cluster_center[iter], point)
+            if iter == 0:
+                min_dis = temp
+            else:
+                if min_dis > temp:
+                    min_dis = temp
+                    min_class = iter
+        return min_class
+    
+    def get_distance(self, point1, point2, method="Euclidean"):
+        """Compute the distance between two points
+
+        Only the Euclidean distance is implemented for now; other methods
+        may be supported in the future.
+
+        Args:
+            point1(numpy.ndarray):One point for compute
+            point2(numpy.ndarray):The other point for compute
+            method(str):The way to compute distance
+        
+        Return:
+            float: distance between two points
+        """
+
+        dis = 0
+        point1 = point1.reshape(-1, 1)
+        point2 = point2.reshape(-1, 1)
+        if method == "Euclidean":
+            for iter in range(self.dimention):
+                dis += (point1[iter]-point2[iter]) ** 2
+            return dis ** 0.5
+    
+    def update_center(self):
+        """use the class_point_dict to update the cluster_center
+
+        Return:
+            boolean: True if no center changed (i.e. the algorithm converged)
+        """
+
+        result = True
+        for iter in range(self.n_clusters):
+            # keep the old center when a cluster has lost all of its points,
+            # which also avoids a division by zero
+            if len(self.class_point_dict[iter]) == 0:
+                continue
+            temp = np.zeros(self.dimention)
+            for point in self.class_point_dict[iter]:
+                temp = temp + point
+            temp = temp / len(self.class_point_dict[iter])
+            result = result and (temp == self.cluster_center[iter]).all()
+            self.cluster_center[iter] = temp
+        return result
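+
+# Usage sketch for KMeans (mirrors the __main__ demo below):
+#     model = KMeans(3)
+#     model.fit(train_data)
+#     labels = model.predict(test_data)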
+
+
+class GaussianMixture:
+    """Gaussian mixture model for clustering
+
+    Attributes:
+        n_clusters(int): number of clusters
+        pi(numpy.ndarray): mixing probability of each cluster
+        cov(dict): covariance matrix for each cluster
+        mean(dict): mean matrix for each cluster
+        gamma(numpy.ndarray): gamma in EM algorithm
+        epsilon(float): lower bound of the ending threshold (currently unused)
+        dimention(int): dimention of data
+    """
+
+    def __init__(self, n_clusters):
+        """init the parameter for this model
+
+        Init the mixing probabilities uniformly.
+        Init the cluster number with the parameter
+
+        Args:
+            n_clusters(int): number of clusters
+        """
+
+        self.pi = np.ones(n_clusters)/n_clusters
+        self.n_clusters = n_clusters
+        self.cov = {}
+        self.mean = {}
+        self.gamma = None
+        self.epsilon = 1e-20
+        self.dimention = 0
+
+    def fit(self, train_data):
+        """Train the model, using EM
+
+        Init the mean and covariance for each class first: the initial covariance
+        matrix of every cluster is the identity matrix, and KMeans is used to init
+        the mean of each class.
+        In E step, calculate the responsibility of every cluster for every point.
+        In M step, update the parameters of every distribution.
+
+        Args:
+            train_data(numpy.ndarray): data for train
+        """
+
+        k_model = KMeans(self.n_clusters)
+        k_model.fit(train_data)
+        train_data = data_preprocess(train_data)
+        train_num = train_data.shape[0]
+        if len(train_data.shape) == 1:
+            self.dimention = 1
+        else:
+            self.dimention = train_data.shape[1]
+        for iter in range(self.n_clusters):
+            self.mean[iter] = k_model.cluster_center[iter]
+            self.cov[iter] = np.ones(self.dimention)
+            self.cov[iter] = np.diag(self.cov[iter])
+        self.gamma = np.empty([train_num, self.n_clusters])
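+        # a fixed number of EM iterations is used instead of a likelihood
+        # threshold, which the README found to oscillate on small datasets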
+        for i in range(20):
+            #E step
+            for iter in range(train_num):
+                temp = np.array(self.point_probability(train_data[iter]))
+                self.gamma[iter, :] = temp
+            self.gamma = self.gamma/self.gamma.sum(axis=1).reshape(-1, 1)
+
+            #M step
+            self.pi = np.sum(self.gamma, axis=0)/train_num
+            for label in range(self.n_clusters):
+                mean = np.zeros(self.dimention)
+                cov = np.zeros([self.dimention, self.dimention])
+                for iter in range(train_num):
+                    mean += self.gamma[iter, label] * train_data[iter]
+                    point = train_data[iter].reshape(-1, 1)
+                    label_mean = self.mean[label].reshape(-1, 1)
+                    rest = point - label_mean
+                    cov += self.gamma[iter, label] * np.matmul(rest, rest.T)
+                self.mean[label] = mean/np.sum(self.gamma, axis=0)[label]
+                self.cov[label] = cov/np.sum(self.gamma, axis=0)[label]
+    
+    def predict(self, test_data):
+        """Predict label
+
+        Args:
+            test_data(numpy.ndarray): Data for test
+        
+        Return:
+            numpy.ndarray: each label of test_data
+        """
+
+        test_data = data_preprocess(test_data)
+        result = []
+        for iter in test_data:
+            temp = self.point_probability(iter)
+            label = temp.index(max(temp))
+            result.append(label)
+        return np.array(result)
+
+    def point_probability(self, point):
+        """calculate the probability of every gaussian distribution
+
+        Args:
+            point(numpy.ndarray): point need to be calculated
+        
+        Return:
+            list: prior-weighted probability of the point under every distribution
+        """
+
+        result = []
+        for iter in range(self.n_clusters):
+            result.append(self.calculate(point, iter) * self.pi[iter])
+        return result
+
+    def calculate(self, point, iter):
+        """calculate the probability for the iter-th distribution
+
+        Args:
+            point(numpy.ndarray): the point to evaluate
+            iter(int): the index of the distribution
+        
+        Return:
+            float: the probability of the point
+
+        """
+
+        point = point.reshape(-1, 1)
+        mean = self.mean[iter].reshape(-1, 1)
+        cov = self.cov[iter]
+        D = self.dimention
+        # multivariate Gaussian density:
+        # N(x) = (2*pi)^(-D/2) |cov|^(-1/2) exp(-0.5 (x-mean)^T cov^{-1} (x-mean))
+        coef = 1/((2*math.pi) ** (D/2) * (np.linalg.det(cov))**0.5)
+        exponent = -0.5 * np.matmul(np.matmul((point - mean).T, np.linalg.inv(cov)), (point - mean))
+        # e^{-200} keeps the probability strictly positive (see README)
+        result = coef * np.exp(exponent) + np.exp(-200)
+        return float(result)
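+
+# Usage sketch for GaussianMixture (mirrors the __main__ demo below):
+#     model = GaussianMixture(3)
+#     model.fit(train_data)
+#     labels = model.predict(test_data)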
+    
+
+class ClusteringAlgorithm:
+    """Auto cluster
+    
+    Automatically choose k and cluster the data, using Kmeans
+
+    Attributes:
+        K(int): the chosen number of clusters
+        clusterer(KMeans): the underlying clusterer
+    """
+
+    def __init__(self):
+        """init the clusterer
+        """
+
+        self.K = 3
+        self.clusterer = None
+    
+    def fit(self, train_data, method="Elbow"):
+        """train the cluster
+        
+        Automatically choos the number of clusters with given method
+        For Elbow, we think if the difference between K-1, K, K+1 satisfies
+        [K-1] -[K] <= 2*([k]- [k+1]), then the graph is smooth enough.
+        will support Canopy in the future
+
+        Args:
+            train_data(numpy.ndarray): the dataset for training
+            method(str): will support Elbow and Canopy
+        """
+
+        if method == "Elbow":
+            train_num = train_data.shape[0]
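+            # empirical rule of thumb: K is at most about sqrt(n/2) (see README)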
+            upbound = int((train_num/2)**0.5)+2
+            dis = np.zeros(upbound)
+            for i in range(1, upbound):
+                self.clusterer = KMeans(i)
+                self.clusterer.fit(train_data)
+                label_dict = self.clusterer.class_point_dict
+                center = self.clusterer.cluster_center
+                for iter in range(i):
+                    for point in label_dict[iter]:
+                        dis[i] = dis[i]+self.clusterer.get_distance(point,center[iter])
+            # dis[0] is a large sentinel so the elbow test below never stops at k = 1
+            dis[0] = 6 * dis[1]
+            if upbound <= 5:
+                best_k = 1
+                for i in range(1, upbound):
+                    if dis[i] <= dis[best_k]:
+                        best_k = i
+            else:
+                for best_k in range(1, upbound-1):
+                    if dis[best_k-1]-dis[best_k] <= 2 * (dis[best_k]-dis[best_k+1]):
+                        break
+            self.K = best_k
+            self.clusterer = KMeans(best_k)
+            self.clusterer.fit(train_data)
+            print("choose {}".format(best_k))
+        elif method == "Canopy":
+            canopy = Canopy()
+            K = len(canopy.fit(train_data))
+            print("choose {}".format(K))
+            self.K = K
+            self.clusterer = KMeans(K)
+            self.clusterer.fit(train_data)
+
+    
+    def predict(self, test_data):
+        """predict the test_data
+        
+        Args:
+            test_data(numpy.ndarray): data for test
+        
+        Return:
+            numpy.ndarray: label for each point
+        """
+
+        return self.clusterer.predict(test_data)
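+
+# Usage sketch for ClusteringAlgorithm (method may be "Elbow" or "Canopy"):
+#     auto = ClusteringAlgorithm()
+#     auto.fit(train_data, method="Elbow")
+#     labels = auto.predict(train_data)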
+
+
+if __name__ == "__main__":
+    mean_list = [(1, 2), (16, -5), (10, 22)]
+    cov_list = [np.array([[73, 0], [0, 22]]), np.array([[21.2, 0], [0, 32.1]]), np.array([[10, 5], [5, 10]])]
+    num_list = [80, 80, 80]
+    save_path = ""
+    data_generate_and_save(3, mean_list, cov_list, num_list, save_path)
+    (train_data, train_label), (test_data, test_label) = data_load()
+    visualize(train_data, dimention=2, class_num=3)
+    visualize(train_data, dimention=2, label=train_label, class_num=3, color = True)
+    # print(train_data)
+    # print(type(train_data))
+    # print(train_data.shape)
+    k = KMeans(3)
+    k.fit(train_data)
+    label1 = k.predict(train_data)
+    visualize(train_data, dimention=2, label=label1, class_num=3, color=True)
+    print(compute_SC(train_data, label1, 3))
+
+    g = GaussianMixture(3)
+    g.fit(train_data)
+    label2 = g.predict(train_data)
+    visualize(train_data, label=label2, dimention=2, class_num=3, color=True)
+    print(compute_SC(train_data, label2, 3))
+
+    # print(compute_SC(train_data, train_label, 3))
+    # k = ClusteringAlgorithm()
+    # k.fit(train_data, method="Elbow")
+    # k.predict(train_data)
+    # e = ClusteringAlgorithm()
+    # e.fit(train_data, method="Canopy")
+
diff --git a/assignment-3/submission/18307130116/tester_demo.py b/assignment-3/submission/18307130116/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..19ec0e8091691d4aaaa6b53dbb695fde9e826d89
--- /dev/null
+++ b/assignment-3/submission/18307130116/tester_demo.py
@@ -0,0 +1,117 @@
+import numpy as np
+import sys
+
+from source import KMeans, GaussianMixture
+
+
+def shuffle(*datas):
+    data = np.concatenate(datas)
+    label = np.concatenate([
+        np.ones((d.shape[0],), dtype=int)*i
+        for (i, d) in enumerate(datas)
+    ])
+    N = data.shape[0]
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    return data, label
+
+
+def data_1():
+    mean = (1, 2)
+    cov = np.array([[73, 0], [0, 22]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[21.2, 0], [0, 32.1]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (10, 22)
+    cov = np.array([[10, 5], [5, 10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    data, _ = shuffle(x, y, z)
+    return (data, data), 3
+
+
+def data_2():
+    train_data = np.array([
+        [23, 12, 173, 2134],
+        [99, -12, -126, -31],
+        [55, -145, -123, -342],
+    ])
+    return (train_data, train_data), 2
+
+
+def data_3():
+    train_data = np.array([
+        [23],
+        [-2999],
+        [-2955],
+    ])
+    return (train_data, train_data), 2
+
+
+def test_with_n_clusters(data_function, algorithm_class):
+    (train_data, test_data), n_clusters = data_function()
+    model = algorithm_class(n_clusters)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    assert len(
+        res.shape) == 1 and res.shape[0] == test_data.shape[0], "shape of result is wrong"
+    return res
+
+
+def testcase_1_1():
+    test_with_n_clusters(data_1, KMeans)
+    return True
+
+
+def testcase_1_2():
+    res = test_with_n_clusters(data_2, KMeans)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def testcase_2_1():
+    test_with_n_clusters(data_1, GaussianMixture)
+    return True
+
+
+def testcase_2_2():
+    res = test_with_n_clusters(data_3, GaussianMixture)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def test_all(err_report=False):
+    testcases = [
+        ["KMeans-1", testcase_1_1, 4],
+        ["KMeans-2", testcase_1_2, 4],
+        # ["KMeans-3", testcase_1_3, 4],
+        # ["KMeans-4", testcase_1_4, 4],
+        # ["KMeans-5", testcase_1_5, 4],
+        ["GMM-1", testcase_2_1, 4],
+        ["GMM-2", testcase_2_2, 4],
+        # ["GMM-3", testcase_2_3, 4],
+        # ["GMM-4", testcase_2_4, 4],
+        # ["GMM-5", testcase_2_5, 4],
+    ]
+    sum_score = sum([case[2] for case in testcases])
+    score = 0
+    for case in testcases:
+        try:
+            res = case[2] if case[1]() else 0
+        except Exception as e:
+            if err_report:
+                print("Error [{}] occurs in {}".format(str(e), case[0]))
+            res = 0
+        score += res
+        print("+ {:14} {}/{}".format(case[0], res, case[2]))
+    print("{:16} {}/{}".format("FINAL SCORE", score, sum_score))
+
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1 and sys.argv[1] == "--report":
+        test_all(True)
+    else:
+        test_all()