diff --git a/assignment-1/submission/18300110042/README.md b/assignment-1/submission/18300110042/README.md new file mode 100644 index 0000000000000000000000000000000000000000..869015d3335cf5552420b6755ff47b7cd076e1c0 --- /dev/null +++ b/assignment-1/submission/18300110042/README.md @@ -0,0 +1,688 @@ +# 课程报告 +这是`prml-21-spring/assignment-1`的课程报告,我的代码在 [source.py](source.py) 中,[knn_lab.dat](knn_lab.dat)中可以设置每次实验时的参数,包括数据参数和模型参数。数据参数有:分几组,每组有多少个样本,数据的均值及方差;模型参数有:k值,weights,距离计算方法和数据归一化/标准化方法。 + +## KNN Classifier +k近邻法是一种监督学习的算法,可以用于分类或回归问题,本次作业中用`python`实现了k近邻的分类器。 + +算法的输入是训练数据集 $$\\{ (x_1, y_1), (x_2, y_2) ,\cdots , (x_N, y_N) \\}, $$ 其中 $x_i \in X$ 是某一样本的特征向量, $y_i \in Y = \\{ c_1, \cdots c_K \\} $ 是该样本的标签;以及某一需要判断的实例 $x.$ + +而输出则是某实例 $x$ 的标签 $y.$ + +将 $x$ 映射为 $y$ 时,需要 +- 通过某种距离算法,找出训练集中与 $x$ 最近的 $k$ 个样本; +- 根据某种规则(e.g., 投票或按照距离加权)决定 $x$ 的标签 $y$. + +作为一种基于实例的、非参的学习算法,k-NN需要存储整个数据集,并在计算的时候对整个数据集进行迭代。 +对于 $N$ 个样本,每个样本特征维度为 $D$ ,则对于一个目标样本的预测,需要的时间复杂度就是 $O(N*D)$ ,因而在数据量或特征维度较大的时候,k-NN的效率就会偏低。 + +### KNN类实现 +#### 初始化 +根据`KNN`的性质,本次试验中设计了如下几个初始化参数: +- `k`,决定k近邻的数量; +实验中设定 $k\in [1, 50]$ ; +- `weights` 决定距离在投票中所占的比重; +交叉验证时的可选项为 $ \\{0, 0.1, 0.2, 0.5, 1, 2 \\}$ +- `norm`,设置数据归一化/标准化的方法; +- `dist`,设定距离的计算方法; +可选项为 `Manhattan` 或 `Euclidean`. + +#### fit() 函数 +fit() 函数主要进行: +设置超参数; +需要交叉验证时,通过10折交叉验证、网格搜索,选取平均准确率最高的超参数组合,如果传入数据小于 $10$ ,则使用 `leave-one-out cross validation`. + +#### predict() 函数 +predict() 函数根据已有的数据推断测试数据的标签。 +在 `predict` 的过程中,需要进行距离的计算以及最终的决策,实验中选择的距离的算法,以及确定测试数据标签的规则都在这一步实现。 + +## 实验探究 +实验探究主要分为两个部分: +1. 探究数据对于kNN模型的影响; +2. 探究kNN模型本身的优化方法. + +决定kNN模型的有三个基本要素: +- 距离度量; +- k值的选择; +- 分类决策规定; + +而进一步,kNN的模型确定下来后,就完全基于训练数据进行预测,因而数据的分布对于模型的表现非常重要,甚至可以说数据进一步规定了模型。 + +我们首先通过固定这三个基本要素,探究数据对于一个固定的kNN模型的影响;而后再进一步从这三个基本要素开始,探究模型可能的优化方法。 + +### 1. 数据对于kNN模型的影响 +先固定距离度量方式为 `Euclidean distance`;分类决策规定为投票方法;在每一次实验中固定 `k` 值;通过改变数据的分布进行探究。 + +#### 初步实验 +1. 
生成三组数据 +通过以下分布生成了三组数据,每组400个样本,共1200个: + +| | $1$ | $2$ | $3$ | +| :----: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 50 \end{bmatrix}$ | $\begin{bmatrix} 15 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | + +这是生成的数据集: +![总的数据](img/data_plotted_data_1.png "all data (test_1)") + +这是训练集: +![训练集](img/data_plotted_train_data_1.png "training data (test_1)") + +这是测试集: +![测试集](img/data_plotted_test_data_1.png "test data (test_1)") + +以下是不同k时的准确率: +![accs_test_1](img/accs_test_1.png "accs_test_1") + + +2. 修改数据的均值,重新生成三组数据,每组400个,共1200个: + +| | $1$ | $2$ | $3$ | +| :----: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 20 \end{bmatrix}$ | $\begin{bmatrix} 2 & 20 \end{bmatrix}$ | $\begin{bmatrix} 2 & 25 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | + +这是生成的数据集: +![总的数据](img/data_plotted_data_2.png "all data (test_2)") + +这是训练集: +![训练集](img/data_plotted_train_data_2.png "training data (test_2)") + +这是测试集: +![测试集](img/data_plotted_test_data_2.png "test data (test_2)") + +以下是不同k时的准确率: +![accs_test_2](img/accs_test_2.png "accs_test_2") + +3. 
修改数据的协方差矩阵(增大数据的方差),重新生成三组数据,每组400个,共1200个: + +| | 1 | 2 | 3 | +| :----: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 50 \end{bmatrix}$ | $\begin{bmatrix} 15 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 40 \end{bmatrix}$ | $\begin{bmatrix} 40 & 15 \\\\ 15 & 80 \end{bmatrix}$ | $\begin{bmatrix} 30 & 0 \\\\ 0 & 50 \end{bmatrix}$ | + +这是生成的数据集: +![总的数据](img/data_plotted_data_3.png "all data (test_3)") + +这是训练集: +![训练集](img/data_plotted_train_data_3.png "training data (test_3)") + +这是测试集: +![测试集](img/data_plotted_test_data_3.png "test data (test_3)") + +以下是不同k时的准确率: +![accs_test_3](img/accs_test_3.png "accs_test_3") + + +4. 生成5组数据,每组240个样本,共1200个: + +| | $1$ | $2$ | $3$ | $4$ | $5$ | +| :----: | :------------: | :------------: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 50 \end{bmatrix}$ | $\begin{bmatrix} 15 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ | $\begin{bmatrix} 25 & 25 \end{bmatrix}$ | $\begin{bmatrix} 40 & 5 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | $\begin{bmatrix} 10 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 0 \\\\ 0 & 10 \end{bmatrix}$ | + +这是生成的数据集: +![总的数据](img/data_plotted_data_4.png "all data (test_4)") + +这是训练集: +![训练集](img/data_plotted_train_data_4.png "training data (test_4)") + +这是测试集: +![测试集](img/data_plotted_test_data_4.png "test data (test_4)") + +以下是不同k时的准确率: +![accs_test_4](img/accs_test_4.png "accs_test_4") + +5. 
用(4)中同样的分布生成数据,每组400个样本,共2000个: + +这是生成的数据集: +![总的数据](img/data_plotted_data_5.png "all data (test_5)") + +这是训练集: +![训练集](img/data_plotted_train_data_5.png "training data (test_5)") + +这是测试集: +![测试集](img/data_plotted_test_data_5.png "test data (test_5)") + +以下是不同k时的准确率: +![accs_test_5](img/accs_test_5.png "accs_test_5") + + +以下是五次实验中准确率和对应的 `k` 的汇总: + +|k | test_1 | test_2 | test_3 | test_4 | test_5 | +| :---: | :------: | :------: | :------: | :------: | :------: | +| 1 | 0.9500 | 0.5792 | 0.8250 | 0.9125 | 0.9325 | +| 3 | 0.9583 | 0.6083 | 0.8708 | 0.9083 | 0.9450 | +| 5 | 0.9542 | 0.6500 | 0.8583 | 0.9167 | 0.9625 | +| 9 | 0.9583 | 0.6792 | 0.8792 | 0.9208 | 0.9625 | +| 15 | 0.9583 | 0.6625 | 0.8833 | 0.9333 | 0.9650 | +| 20 | 0.9500 | 0.6208 | 0.8833 | 0.9333 | 0.9650 | + +可以看到,当将均值调得非常接近时,kNN的准确率是最低的,`test_2` 中的准确率非常的低;当方差放大时,准确率也有所下降(`test_3`),但前面表现不好的较大的 `k` 的准确率有所提升。当类别扩展为5类,且分布“距离”较大时,准确率没有明显下降(`test_4`,`test_5`);`test_4` 中每一类别的数量(`240`)比 `test_5` 中(`400`)少了一些,当每一类别的样本数量变大时,准确率有一定的提升,但这可能和分布的“距离”等别的因素有关。 + +通过以上观察,我们发现数据中影响kNN准确率的因素可能有两个: +- 类别的样本数量 +- 分布的“距离” + +#### 进一步探究 +- 首先,对于类别中的样本数量问题,我们进行新的实验. + +6. 
取实验(3)中的样本分布,即:

| | $1$ | $2$ | $3$ |
| :----: | :------------: | :------------: | :------------: |
| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 50 \end{bmatrix}$ | $\begin{bmatrix} 15 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ |
| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 40 \end{bmatrix}$ | $\begin{bmatrix} 40 & 15 \\\\ 15 & 80 \end{bmatrix}$ | $\begin{bmatrix} 30 & 0 \\\\ 0 & 50 \end{bmatrix}$ |

重新进行实验,调整样本个数,得到准确率结果如下(表头为每一类的样本个数,各类别数量相等):

|k | 100 | 300 | 400 | 700 | 1000 |
| :---: | :------: | :------: | :------: | :------: | :------: |
| 1 | 0.8667 | 0.8556 | 0.8292 | 0.8048 | 0.7733 |
| 3 | 0.8500 | 0.8111 | 0.8750 | 0.8190 | 0.7967 |
| 5 | 0.9167 | 0.8500 | 0.8542 | 0.8238 | 0.8067 |
| 9 | 0.9000 | 0.8611 | 0.8583 | 0.8285 | 0.7983 |
| 15 | 0.8833 | 0.8611 | 0.8708 | 0.8238 | 0.8200 |
| 20 | 0.8833 | 0.8778 | 0.8792 | 0.8357 | 0.8233 |


7. 取实验(1)中的样本分布,即:

| | $1$ | $2$ | $3$ |
| :----: | :------------: | :------------: | :------------: |
| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 50 \end{bmatrix}$ | $\begin{bmatrix} 15 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ |
| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ |

重新进行实验,调整样本个数,得到准确率结果如下(表头为每一类的样本个数,各类别数量相等):

|k | 100 | 300 | 400 | 700 | 1000 |
| :---: | :------: | :------: | :------: | :------: | :------: |
| 1 | 0.8500 | 0.9222 | 0.9125 | 0.9095 | 0.9050 |
| 3 | 0.8833 | 0.9278 | 0.9250 | 0.9214 | 0.9250 |
| 5 | 0.8833 | 0.9278 | 0.9333 | 0.9262 | 0.9300 |
| 9 | 0.8833 | 0.9278 | 0.9208 | 0.9238 | 0.9400 |
| 15 | 0.8667 | 0.9389 | 0.9208 | 0.9286 | 0.9383 |
| 20 | 0.8667 | 0.9333 | 0.9250 | 0.9286 | 0.9417 |

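上面(6)、(7)两组实验的流程,可以用如下的 `numpy` 草图近似复现。这里用一个极简的投票式kNN代替报告中的 `KNN` 类;`knn_predict`、`run_experiment` 等函数名仅为示意,并非 [source.py](source.py) 中的实现:

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """极简的投票式kNN:欧氏距离 + 多数表决(仅作示意)。"""
    preds = []
    for x in test_x:
        d = np.linalg.norm(train_x - x, axis=1)        # 到所有训练样本的欧氏距离
        nn_labels = train_y[np.argsort(d)[:k]]         # 最近的 k 个样本的标签
        preds.append(np.bincount(nn_labels).argmax())  # 多数表决
    return np.array(preds)

def run_experiment(means, covs, n_per_class, k=5, test_ratio=0.2, seed=0):
    """按给定的高斯参数生成各类数据,按 8:2 划分训练/测试集,返回测试准确率。"""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for label, (mu, sigma) in enumerate(zip(means, covs)):
        xs.append(rng.multivariate_normal(mu, sigma, n_per_class))
        ys.append(np.full(n_per_class, label))
    x, y = np.concatenate(xs), np.concatenate(ys)
    idx = rng.permutation(len(x))
    n_test = int(len(x) * test_ratio)
    test_i, train_i = idx[:n_test], idx[n_test:]
    preds = knn_predict(x[train_i], y[train_i], x[test_i], k=k)
    return (preds == y[test_i]).mean()
```

调整 `n_per_class` 即可观察每一类样本数量对准确率的影响。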
从(6)和(7)中可以看出,样本数量的影响在不同的分布下有所不同:当几组数据的分布本身“距离”较小时,样本数量的增加对准确率没有明显的提升,从某种程度上可以理解为kNN对该任务本身性能不足;而当数据分布本身“距离”较大时,样本数量未超过一定值的时候,kNN的表现会较弱,类似对数据的欠拟合;当样本数量超过一定值后,其数量的增加对模型的准确率就没有很大的影响了。


- 分布的“距离”
在以上的实验中,我们发现分布的一些参数可以直接影响kNN的准确率。回顾kNN的模型,当模型通过三个基本要素确立了以后,其决策边界就由训练数据集直接决定。在实验中,训练集和测试集来自相同的总体,准确率就与数据的分布密切相关。

用直觉判断,我们可以很容易地理解这个现象:均值接近时,不同组的样本更容易混杂在一起,模型很难根据最近的点做出准确的预测;而当方差放大时,不同均值的数据也更容易接近,导致准确率下降。进一步说,分布之间的“距离”从某种程度上决定了kNN的准确率:对于直观上较为“接近”的分布,kNN的分类效果较差;而对于“距离”较远的分布,kNN的分类效果较好。

那我们可以猜想,如果掌握了关于不同组数据的分布的信息,是否就可以直接对kNN的准确率进行预测?

在本次试验中,训练数据与测试数据来自同样的服从高斯分布的总体,定义该总体的参数就是 $\mu$ 和 $\Sigma$ 。实验中,这两个参数对于准确率都有一定的影响,那么我们是否有可能把这两者结合起来考察?或者通过这两个参数获得某种关于分布之间“距离”的度量?

对于不同分布之间的“距离”,有很多计算方法,但限于知识水平,这里只尝试了三种:
1. KL散度(Kullback-Leibler divergence);
2. 最大均值差异(Maximum Mean Discrepancy, MMD);
3. Wasserstein Distance.

因为这些方法都可以用来衡量两个分布之间的距离,为了简化问题,我们在以下实验中随机产生两组数量相同、服从不同高斯分布的二维数据,分别通过以上几种度量方法计算分布之间的距离,而后观察其与kNN算法准确率之间的关系。

`随机` 指的是 $\mu_i \sim U [ -50, 50 ], i\in \\{ 1, 2 \\}$ ;且 $\Sigma_{tr_{i}} \sim U [ 0, 100 ], i\in \\{ 1, 2 \\}$ ,其中 $tr_{i}$ 指协方差矩阵的对角元素;而对于非对角元素,我们采取设置为 `0` 或 $\Sigma_{\tilde{tr}} \sim U [-\sqrt{\Sigma_{tr_1}\times\Sigma_{tr_2}}, \sqrt{\Sigma_{tr_1}\times\Sigma_{tr_2}} ]$ 的方法:因为要确保生成的是半正定矩阵,这里简单地采用对称阵,即两个非对角元素相等。


##### 1. KL Divergence
KL散度从信息论的角度衡量两个分布之间的信息差距。对于连续随机变量 $x$,以及两个概率分布 $p ( x )$ 和 $q ( x )$,它们之间的KL散度为: $$\begin{equation}
\begin{aligned}
D_{KL} ( p ( x ) || q ( x ) ) & = E_{p}[\ln{p ( x )} - \ln{q ( x )}]
\\\\ & = \int_{-\infty}^{\infty} {p ( x ) \ln{\frac{p ( x ) }{q ( x ) } } dx}
\end{aligned}
\end{equation}$$.
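对于两个多元高斯分布,上式存在闭式解,可以直接由参数 $\mu$ 和 $\Sigma$ 计算。下面是一个基于 `numpy` 的计算草图(`gaussian_kl` 为示意用的函数名):

```python
import numpy as np

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    """计算 D_KL( N(mu1, sigma1) || N(mu2, sigma2) ) 的闭式解。"""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    k = mu1.shape[0]                 # 特征维度
    inv2 = np.linalg.inv(sigma2)
    diff = mu2 - mu1                 # 二次型 (mu1-mu2)^T inv2 (mu1-mu2) 对符号不敏感
    return 0.5 * (np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1))
                  - k
                  + diff @ inv2 @ diff
                  + np.trace(inv2 @ sigma1))
```

例如,`gaussian_kl(mu, sigma, mu, sigma)` 应为 `0`;交换两个分布的参数一般会得到不同的值,体现KL散度的不对称性。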
需要注意的是,KL散度衡量的是用 $q ( x )$ 近似 $p ( x )$ 时的信息损失,与 $p ( x )$ 对于 $q ( x )$ 的不同,具有不对称性,这一点与通常意义上的“距离”不同。

一般统计时常使用离散的数据点作为随机变量 $x$ 的取值,在计算中通过求和近似;但因为实验中可以直接得到高斯分布的参数 $\mu$ 和 $\Sigma$(而且可以使用`numpy`直接进行计算,计算量也较小),所以这里尝试直接计算两个分布之间的KL散度。

记两个高斯分布的总体分别为 $N\mathbf{(\boldsymbol{\mu_1}, \Sigma_1)}$,$N\mathbf{(\boldsymbol{\mu_2}, \Sigma_2)}$,则有 $$p(\mathbf{x})=\frac{1}{(2 \pi)^{k / 2}|\Sigma_1|^{1 / 2}} \exp (-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu_1})^{T} \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu_1}))$$, $$q(\mathbf{x})=\frac{1}{(2 \pi)^{k / 2}|\Sigma_2|^{1 / 2}} \exp (-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu_2})^{T} \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu_2}))$$.

可以得到 $$D_{K L}(p \| q)=\frac{1}{2}[\log \frac{|\Sigma_2|}{|\Sigma_1|}-k+(\mu_1-\mu_2)^{T} \Sigma_2^{-1}(\mu_1-\mu_2)+tr\\{\Sigma_2^{-1} \Sigma_1\\}]$$,其中 $k$ 是样本 $x$ 的特征维度。(具体的推导过程参考 [这里](https://mr-easy.github.io/2020-04-16-kl-divergence-between-2-gaussian-distributions/ "KL Divergence between 2 Gaussian Distributions"))

每次生成 `2000` 个样本(为了保持类别平衡,每组样本个数相同,即每组 `1000` 个),通过 `100` 次迭代画图,我们最终得到的KL散度与kNN预测准确率的关系如下:
- 当随机生成的协方差矩阵为对角阵时,

k=1时,
![KL散度与准确率的关系(k=1)](img/accs_kldiv_1nns_diagonal_large.png "k=1时KL散度与准确率的关系")

k=3时,
![KL散度与准确率的关系(k=3)](img/accs_kldiv_3nns_diagonal_large.png "k=3时KL散度与准确率的关系")

k=5时,
![KL散度与准确率的关系(k=5)](img/accs_kldiv_5nns_diagonal_large.png "k=5时KL散度与准确率的关系")

k=10时,
![KL散度与准确率的关系(k=10)](img/accs_kldiv_10nns_diagonal_large.png "k=10时KL散度与准确率的关系")

k=20时,
![KL散度与准确率的关系(k=20)](img/accs_kldiv_20nns_diagonal_large.png "k=20时KL散度与准确率的关系")

k=50时,
![KL散度与准确率的关系(k=50)](img/accs_kldiv_50nns_diagonal_large.png "k=50时KL散度与准确率的关系")


- 当随机生成的协方差矩阵不是对角阵时,

k=1时,
![KL散度与准确率的关系(k=1)](img/accs_kldiv_1nns_random.png "k=1时KL散度与准确率的关系")

k=3时,
![KL散度与准确率的关系(k=3)](img/accs_kldiv_3nns_random.png "k=3时KL散度与准确率的关系")

k=5时,
![KL散度与准确率的关系(k=5)](img/accs_kldiv_5nns_random.png "k=5时KL散度与准确率的关系")

k=10时,
![KL散度与准确率的关系(k=10)](img/accs_kldiv_10nns_random.png "k=10时KL散度与准确率的关系")

k=20时,
![KL散度与准确率的关系(k=20)](img/accs_kldiv_20nns_random.png "k=20时KL散度与准确率的关系")

k=50时,
![KL散度与准确率的关系(k=50)](img/accs_kldiv_50nns_random.png "k=50时KL散度与准确率的关系")

可以观察到,当KL散度超过一定的值以后,kNN的准确率基本都维持在较高的水平,两者有一定的相关性;但当KL散度较小时,准确率波动很大。

- 改变 `随机` 中均值和方差选取的范围,观察KL散度较小时其与kNN准确率之间的关系。

使 $\mu_i \sim U [ -10, 10 ], i\in \\{ 1, 2 \\}$ ;且 $\Sigma_{tr_{i}} \sim U [ 0, 50 ], i\in \\{ 1, 2 \\}$ ,其中 $tr_{i}$ 指协方差矩阵的对角元素;为了简化实验,对于非对角元素,我们直接设置为 `0`. 为了限制KL散度,剔除掉KL散度大于 `300` 的分布。

k=1时,
![KL散度与准确率的关系(k=1)](img/accs_kldiv_1nns_diagonal.png "k=1时KL散度与准确率的关系")

k=3时,
![KL散度与准确率的关系(k=3)](img/accs_kldiv_3nns_diagonal.png "k=3时KL散度与准确率的关系")

k=5时,
![KL散度与准确率的关系(k=5)](img/accs_kldiv_5nns_diagonal.png "k=5时KL散度与准确率的关系")

k=10时,
![KL散度与准确率的关系(k=10)](img/accs_kldiv_10nns_diagonal.png "k=10时KL散度与准确率的关系")

k=20时,
![KL散度与准确率的关系(k=20)](img/accs_kldiv_20nns_diagonal.png "k=20时KL散度与准确率的关系")

k=50时,
![KL散度与准确率的关系(k=50)](img/accs_kldiv_50nns_diagonal.png "k=50时KL散度与准确率的关系")

仍然可以观察到有一定的相关性,特别是当 `k` 较大时较为明显;但在更小的范围内,准确率的波动较大。

##### 2. Maximum Mean Discrepancy

接下来的 `2` 和 `3` ,限于能力,对其概念的了解并不深入,代码也调用了除`numpy`和`matplotlib`之外的包:`pytorch`、`scipy`。这里都直接使用已有的样本计算“距离”,而没有通过分布参数计算。

`Maximum Mean Discrepancy` 也可以用来度量两个分布之间的距离:将数据映射到更高维的特征空间,以两个分布在该空间中差距最大的统计量(均值嵌入之差)作为距离的度量标准。

令 $\mu_i \sim U [ -10, 10 ], i\in \\{ 1, 2 \\}$ ;且 $\Sigma_{tr_{i}} \sim U [ 0, 50 ], i\in \\{ 1, 2 \\}$ ,其中 $tr_{i}$ 指协方差矩阵的对角元素;为了简化实验,对于非对角元素,我们直接设置为 `0`. 在这里我们迭代 `20` 次,得到更为清晰的图片。

这里是`k`分别等于 `2, 10, 20, 50` 时的图:
![Maximum Mean Discrepancy](img/MMD.png "Maximum Mean Discrepancy")


##### 3. Wasserstein Distance
`Wasserstein distance` 衡量的是把数据从一个分布“移动成”另一个分布时,所需移动的平均距离的最小值。相比 `KL Divergence` ,它具有对称性,也可以描述如何从一个分布转化为另一个分布。

使 $\mu_i \sim U [ -10, 10 ], i\in \\{ 1, 2 \\}$ ;且 $\Sigma_{tr_{i}} \sim U [ 0, 50 ], i\in \\{ 1, 2 \\}$ ,其中 $tr_{i}$ 指协方差矩阵的对角元素;为了简化实验,对于非对角元素,我们直接设置为 `0`. 
在这里我们迭代 `20` 次,得到更为清晰的图片。 + +k=1时, +![Wasserstein Distance (k=1) ](img/dist1.png "k=1时Wasserstein Distance") + +k=3时, +![Wasserstein Distance (k=3) ](img/dist3.png "k=3时Wasserstein Distance") + +k=5时, +![Wasserstein Distance (k=5) ](img/dist5.png "k=5时Wasserstein Distance") + +k=10时, +![Wasserstein Distance (k=10) ](img/dist10.png "k=10时Wasserstein Distance") + +k=20时, +![Wasserstein Distance (k=20) ](img/dist20.png "k=20时Wasserstein Distance") + +k=50时, +![Wasserstein Distance (k=50) ](img/dist50.png "k=50时Wasserstein Distance") + +仍然可以观察到有一定的相关性,特别是当 `k` 较大时,较为明显。但在更小的范围内,准确率的波动较大。 + + +##### 小结 +当kNN的模型确定后,训练数据集直接决定了模型的表现。样本的数量和数据的分布相结合,会对结果产生一定的影响。 + +- 当数据分布的“距离”较远时,样本数量不足会降低模型的表现;但当样本数量超过一定值的时候,模型的表现就不会再提升,反而因为模型需要遍历所有的样本导致时间复杂度较高,效率较低;而当数据分布的“距离”较近时,模型本身的能力有限,样本数量也不会对模型产生明显的影响。从这一点看,在优化kNN时,考虑时间复杂度为 $O(N*D)$ ,一方面可以考虑通过一些算法(如 `Fisher`)将数据降维(减小 `D`);另一方面也可以控制样本数量,选择较具有代表性的样本(减小 `N`);或使用 `KD-tree` 提高搜索效率。从这些方面可以帮助提高kNN的性能。 + +- 数据的分布“距离”直接影响了kNN算法所能达到的上限,通过数据分布的参数直接计算的“距离”与kNN的表现之间有一定的相关性,当“距离”超过一定值的时候,kNN表现较为稳定;但当“距离”限于一定范围内时,波动很大。通过训练数据拟合的“距离”在预测kNN的表现时表现的更为可靠。一方面可能有度量“距离”方法的问题,另一方面也可以说明数据本身,比起分布,更为直接的对kNN造成了影响。 + +- 这里我们产生的样本是服从二元高斯分布的,或许可以进一步探究,(1)对于两类不同分布的样本来说,分布的“距离”与kNN的准确率是否仍有直接的关系;(2)如果是服从别的分布,如均匀分布、对数正态分布、伽马分布等的样本,分布间的“距离”是否仍然与kNN的准确率有一定联系,或者kNN对于这类数据并不适用。 + + + +### 2. 
kNN模型的优化 +决定kNN模型的三个基本要素为:(1)距离度量;(2)`k` 值的选择;(3)分类决策规定。 + +以下从这三个方面进行探究。 + +#### 1)距离计算 +kNN进行预测,选择k个最近邻时首先需要的就是衡量距离的方法。 + +- 距离计算方法 +一般kNN都使用欧氏距离进行计算,这里分别尝试了曼哈顿距离和欧氏距离,他们都属于 `Minkowski Distance` 的一种。 + +对于两个向量 $\boldsymbol{X}$ 和 $\boldsymbol{Y}$,`Minkowski Distance` 的计算公式为:$$\sqrt[p]{\sum_{i=1}^{n} { (x_{i} - y_{i})^{p}}} $$,其中 $n$ 为 $\boldsymbol{X}$ 和 $\boldsymbol{Y}$ 的维度。当 $p = 1$ 时,计算的就是曼哈顿距离;当 $p = 2$ 时,计算的是欧氏距离。 + +- 曼哈顿距离 +曼哈顿距离,也称 `L1-distance` ,与欧氏距离不同,曼哈顿距离的度量受坐标轴的影响,或者可以说是欧氏距离在坐标轴上的投影之和。 + +曼哈顿距离的计算公式即为:$$\textrm{Dist}(\boldsymbol{X}, \boldsymbol{Y}) = \sum_{i=1}^{n} {| x_{i}-y_{i}| }$$ ,其中 $n$ 为 $\boldsymbol{X}$ 和 $\boldsymbol{Y}$ 的维度。 + +- 欧氏距离 +欧氏距离,即 `L2-distance` ,直接衡量两个点在空间中的距离。 + +计算公式为:$$\textrm{Dist} (\boldsymbol{X}, \boldsymbol{Y}) = \sqrt{\sum_{i=1}^{n} { (x_{i}-y_{i})^2}}$$ ,其中 $n$ 为 $\boldsymbol{X}$ 和 $\boldsymbol{Y}$ 的维度。 + +鉴于我们的数据都服从二元高斯分布(维度为2),猜想这两种距离计算的方法不会对结果产生很大的影响。 + +在测试的过程中,随机生成了12组数据,每组数据一共有 `1200` 个样本,其中 `80%` 是训练集,`20%` 是测试集; +随机生成的12组数据中,类别数量不同,每个类别的样本数量也不同,具体值如下: + +| 类别数量 | 1 | 2 | 3 | 4 | +| :---: | :----------: | :----------: | :----------: | :----------: | +| 3 | $[400, 400, 400] $ | $[600, 400, 200]$ | $[800, 200, 200]$ | $[900, 200, 100]$ | +| 5 | $[240, 240, 240, 240, 240] $ | $[400, 300, 250, 125, 125]$ | $[500, 300, 200, 100, 100]$ | $[600, 400, 100, 70, 30]$ | +| 7 | $[170, 170, 170, 170, 170, 175, 175] $ | $[300, 200, 200, 150, 150, 150, 50]$ | $[400, 300, 200, 100, 100, 80, 20]$ | $[500, 400, 100, 100, 70, 20, 10]$ | + +在固定 `k` 的情况下观察准确率的变化,$k \in [1,19]$。 + +这里的 `随机` 指的是 $\mu_i \sim U[ -50, 50 ], i\in \\{ 1, 2 \\}$ ;且 $\Sigma_{tr_{i}} \sim U [ 0, 100 ], i\in \\{ 1, 2 \\}$ ,其中 $tr_{i}$ 指协方差矩阵的对角元素;而对于非对角元素,$\Sigma_{\tilde{tr}} \sim U [-\sqrt{\Sigma_{tr_1}\times\Sigma_{tr_2}}, \sqrt{\Sigma_{tr_1}\times\Sigma_{tr_2}} ]$ 。 + +生成的数据如下: +![生成的所有数据](img/data_batch_plotted_all_dist.png "all data (testing distances)") + +对于每个 `k` 时的准确率取均值,发现 `距离计算方法` 不同时,准确率的均值没有明显变化,结果如下: + +| k | Manhattan Distance | Euclidean 
Distance|
| :---:| :----------: | :----------: |
| 1 | 0.9368 | 0.9365 |
| 3 | 0.9462 | 0.9462 |
| 5 | 0.9493 | 0.9493 |
| 9 | 0.9545 | 0.9524 |
| 13 | 0.9517 | 0.9524 |
| 17 | 0.9538 | 0.9545 |
| 19 | 0.9520 | 0.9542 |


接下来都使用 `Euclidean Distance` 进行计算。


- 归一化/标准化

由于kNN的决策完全依赖数据间的距离,而我们使用的距离计算方法会均等地考虑空间中各个维度。如果数据的某一维度的绝对值普遍偏大,那么这一维度上的“距离”所占的比重就会比其他维度大很多,造成评判标准的不合理。因而在面对不同的数据时,常常会使用归一化或标准化的方法。

实验中产生的数据都服从高斯分布,比较规则,这项操作可能不会对实验结果有很大的改善。

我们选取的归一化/标准化的方法主要有如下两个:
1. `Min-max normalization`
记某一特征维度上的取值为 $\boldsymbol{x} = [ x_1,x_2,\cdots ,x_n ]$ ,对每个 $i \in \\{ 1, \cdots, n \\}$ ,
`Min-max normalization` 的计算公式为:$$\hat{x_{i}} = \frac{x_{i} - \min{\boldsymbol{x}}}{\max{\boldsymbol{x}} - \min{\boldsymbol{x}}}$$
这样做可以将数据压缩到 $[0, 1]$ 之间,但会改变原有的数据分布。

2. `Standardization`
记某一特征维度上的取值为 $\boldsymbol{x} = [ x_1,x_2,\cdots ,x_n ]$ ,对每个 $i \in \\{ 1, \cdots, n \\}$ ,
`Standardization` 的计算公式为:$$\hat{x_{i}} = \frac{x_{i} - \textrm{mean}(\boldsymbol{x})}{\textrm{std}(\boldsymbol{x})}$$ 变换后数据的均值为 `0`、标准差为 `1`(对高斯数据即近似服从 `N ( 0,1 )` );相比`Min-max normalization`,可以更好地保留原有的数据分布形状。

采取与之前同样的方法,随机生成12组数据,观察准确率.
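这两种预处理方法的一个简单实现草图如下(按特征维度逐列计算;`min_max_normalize`、`standardize` 为示意用的函数名):

```python
import numpy as np

def min_max_normalize(x):
    """逐列缩放到 [0, 1]:(x - min) / (max - min)。"""
    x = np.asarray(x, float)
    mn, mx = x.min(axis=0), x.max(axis=0)
    return (x - mn) / (mx - mn)

def standardize(x):
    """逐列标准化:(x - mean) / std,变换后每列均值为 0、标准差为 1。"""
    x = np.asarray(x, float)
    return (x - x.mean(axis=0)) / x.std(axis=0)
```

注意在实际使用时,应只用训练集的统计量来变换测试集,以避免信息泄漏。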
+ +生成的数据如下: +![生成的所有数据](img/data_batch_plotted_all_norm.png "all data (testing normalization)") + +得到的平均结果如下: + +| k | No Normalization | Min-Max Normalization | Standardization | +| :---: | :----------: | :----------: | :----------: | +| 1 | 0.9188 | 0.9163 | 0.9146 | +| 3 | 0.9191 | 0.9170 | 0.9208 | +| 5 | 0.9215 | 0.9198 | 0.9187 | +| 9 | 0.9267 | 0.9260 | 0.9247 | +| 13 | 0.9281 | 0.9271 | 0.9271 | +| 17 | 0.9250 | 0.9243 | 0.9243 | +| 20 | 0.9264 | 0.9240 | 0.9236 | + +在平均的情况下是否进行预处理对模型的表现影响较小。 + + +尝试比较极端的情况————两个维度之间的方差相差较大: + +生成三组数据,每组400个,共1200个: + +| | $1$ | $2$ | $3$ | +| :----: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 40 \end{bmatrix}$ | $\begin{bmatrix} 5 & 30 \end{bmatrix}$ | $\begin{bmatrix} 10 & 20 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 1500 \end{bmatrix}$ | $\begin{bmatrix} 2 & 0 \\\\ 0 & 1000 \end{bmatrix}$ | $\begin{bmatrix} 5 & 0 \\\\ 0 & 500 \end{bmatrix}$ | + +这是生成的数据集: +![总的数据](img/data_plotted_data_big_var.png "all data") + +这是训练集: +![训练集](img/data_plotted_train_data_big_var.png "training data") + +这是测试集: +![测试集](img/data_plotted_test_data_big_var.png "test data") + + +得到的准确率如下: + +| k | No Normalization | Min-Max Normalization | Standardization | +| :---: | :----------: | :----------: | :----------: | +| 1 | 0.8542 | 0.8875 | 0.8667 | +| 3 | 0.8833 | 0.9042 | 0.9000 | +| 5 | 0.8833 | 0.9083 | **0.9250** | +| 9 | 0.8792 | 0.9042 | 0.9125 | +| 13 | 0.8792 | 0.9042 | 0.9000 | +| 17 | 0.8667 | 0.9083 | 0.9042 | +| 20 | 0.8667 | 0.9083 | 0.8958 | + +在两个维度间方差较大的情况下,可以看到归一化/标准化对准确率的提升是有一定帮助的,而标准化产生的最佳结果较好。 + + +#### 2)k值的选择 +当其他值确定,且训练数据不变时,`k` 的选择决定了模型的决策边界。当我们在对 `k` 进行优化时,实际上在针对给定数据选取最合适的决策边界,即 `k`. + +通过几次实验尝试画出了k不同时的决策边界。 + +1. 
两类样本的情况
通过以下参数生成了两组数据(每组数据为 `100` 个):

| | $1$ | $2$ |
| ---- | :------------: | :------------: |
| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 30 \end{bmatrix}$ | $\begin{bmatrix} 2 & 30 \end{bmatrix}$ |
| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ |

![这是画出的决策边界](img/boundry_2clusters.png "decision boundary (2 classes)")

准确率如下:

| k | Accuracy |
| :---: | :---------: |
| 1 | 0.675 |
| **3** | 0.725 |
| 5 | 0.675 |
| 9 | 0.550 |
| 11 | 0.625 |
| 13 | 0.625 |
| 15 | 0.600 |
| 17 | 0.550 |
| 19 | 0.500 |


2. 三类样本的情况
通过以下参数生成了三组数据:

| | $1$ | $2$ | $3$ |
| :----: | :------------: | :------------: | :------------: |
| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 20 \end{bmatrix}$ | $\begin{bmatrix} 5 & 20 \end{bmatrix}$ | $\begin{bmatrix} 15 & 15 \end{bmatrix}$ |
| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ |

![这是画出的决策边界](img/boundry_3clusters.png "decision boundary (3 classes)")

准确率如下:

| k | Accuracy |
| :---: | :---------: |
| 1 | 0.900 |
| 3 | 0.883 |
| **5** | 0.917 |
| 9 | 0.900 |
| 11 | 0.900 |
| 13 | 0.900 |
| 15 | 0.833 |
| 17 | 0.900 |
| 19 | 0.900 |


3. 
五类样本的情况
通过以下参数生成了五组数据:

| | $1$ | $2$ | $3$ | $4$ | $5$ |
| :----: | :------------: | :------------: | :------------: | :------------: | :------------: |
| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 20 \end{bmatrix}$ | $\begin{bmatrix} 5 & 20 \end{bmatrix}$ | $\begin{bmatrix} 20 & 30 \end{bmatrix}$ | $\begin{bmatrix} 30 & 25 \end{bmatrix}$ | $\begin{bmatrix} 25 & 25 \end{bmatrix}$ |
| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | $\begin{bmatrix} 10 & 0 \\\\ 0 & 30 \end{bmatrix}$ | $\begin{bmatrix} 2 & 5 \\\\ 5 & 50 \end{bmatrix}$ |

![这是画出的决策边界](img/boundry_5clusters.png "decision boundary (5 classes)")


准确率如下:

| k | Accuracy |
| :---: | :---------: |
| 1 | 0.760 |
| 3 | 0.810 |
| 5 | 0.830 |
| 9 | 0.860 |
| 11 | 0.860 |
| **13** | 0.880 |
| **15** | 0.880 |
| 17 | 0.870 |
| 19 | 0.860 |

kNN的决策边界是非线性的。k较小时,决策边界较为陡峭,模型复杂度较高;随着k的增大,决策边界趋向平缓,模型复杂度降低,性能也可能随之下降。


#### 3)分类决策规定
我们在进行之前的实验时,都是运用 `投票` 方法进行决策;而直觉上,我们也可能想到距离更近的点所投的“票”应该更为重要。因而这里尝试改变分类决策的规定,根据距离进行加权运算。

选择的根据距离进行加权的公式为:$$w (x,x_{i}) = \exp{\\{-\lambda \|x - x_{i}\|^{2}\\}}, i \in \\{1,2,\cdots ,k \\} $$ 其中 $x$ 为待预测的实例,$x_{i}$ 为被选中的 `k` 个样本中的第 `i` 个,$\lambda \ge 0$ 为超参数,决定距离在最终决策中所占的权重:$\lambda$ 越大,距离所占的权重越大;$\lambda$ 越小,距离所占的权重越小;当 $\lambda = 0$ 时,则与 `投票` 方法相同,不考虑距离。

最终我们预测的 $x$ 属于各类别的概率为:$$\textrm{Pr} (y|x ) = \frac{{\textstyle \sum_{i=1}^{k}{w(x,x_{i} ) \delta(y,y_{i})}}}{{\textstyle \sum_{i=1}^{k}{w(x,x_{i})}}}$$
其中,$\delta (y,y_{i}) = \begin{cases} 1, & y = y_{i} \\\\ 0, & y \ne y_{i} \end{cases}$ 为示性函数,$\textrm{Pr} (y|x)$ 为待预测的实例的标签为 $y$ 的概率,$y_{i}$ 为被选中的 `k` 个样本中第 `i` 个样本的标签。

在实际操作中,由于各类别的分母相同,计算过程中省略了归一化这一步,直接比较加权和。

自定义的 `KNN` 类中,若 $\lambda$ 未指定,则自动从候选集 $\lambda \in \\{0, 0.1, 0.2, 0.5, 1, 2 \\}$ 中优化选取。

我们采用了和之前画决策边界时同样的分布,即:

1. 
两类样本的情况 + +通过以下参数生成了两组数据(每组数据为 `100` 个): + +| | $1$ | $2$ | +| ---- | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 30 \end{bmatrix}$ | $\begin{bmatrix} 2 & 30 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | + +![这是画出的决策边界](img/boundry_2clusters_w.png "decision boundary (2 classes with weights)") + +优化后的权重和准确率如下: + +| k | $\lambda$ | Accuracy | +| :---: | :-------: | :---------: | +| 1 | 0.2 | 0.625 | +| 3 | 2 | 0.625 | +| 5 | 1 | 0.675 | +| 9 | 0.2 | 0.775 | +| **11** | 0 | 0.800 | +| 13 | 0.1 | 0.775 | +| 15 | 0.5 | 0.775 | +| 17 | 2 | 0.750 | +| **19** | 0.5 | 0.800 | + + +2. 三类样本的情况 +通过以下参数生成了三组数据(每组数据为 `100` 个): + +| | $1$ | $2$ | $3$ | +| :----: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 20 \end{bmatrix}$ | $\begin{bmatrix} 5 & 20 \end{bmatrix}$ | $\begin{bmatrix} 15 & 15 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | + +![这是画出的决策边界](img/boundry_3clusters_w.png "decision boundary (3 classes with weights)") + +优化后的权重和准确率如下: + +| k | $\lambda$ | Accuracy | +| :---: | :-------: | :---------: | +| 1 | 1 | 0.900 | +| 3 | 0.1 | 0.917 | +| **5** | 0.2 | 0.967 | +| 9 | 0.1 | 0.933 | +| 11 | 0.1 | 0.917 | +| **13** | 0.2 | 0.967 | +| **15** | 0.2 | 0.967 | +| **17** | 0 | 0.967 | +| **19** | 0.1 | 0.967 | + + +3. 
五类样本的情况 +通过以下参数生成了五组数据(每组数据为 `100` 个): + +| | $1$ | $2$ | $3$ | $4$ | $5$ | +| :----: | :------------: | :------------: | :------------: | :------------: | :------------: | +| $\boldsymbol{\\mu}$ | $\begin{bmatrix} 1 & 20 \end{bmatrix}$ | $\begin{bmatrix} 5 & 20 \end{bmatrix}$ | $\begin{bmatrix} 20 & 30 \end{bmatrix}$ | $\begin{bmatrix} 30 & 25 \end{bmatrix}$ | $\begin{bmatrix} 25 & 25 \end{bmatrix}$ | +| $\boldsymbol{\\Sigma}$ | $\begin{bmatrix} 1 & 0 \\\\ 0 & 10 \end{bmatrix}$ | $\begin{bmatrix} 10 & 15 \\\\ 15 & 40 \end{bmatrix}$ | $\begin{bmatrix} 20 & 0 \\\\ 0 & 30 \end{bmatrix}$ | $\begin{bmatrix} 10 & 0 \\\\ 0 & 30 \end{bmatrix}$ | $\begin{bmatrix} 2 & 5 \\\\ 5 & 50 \end{bmatrix}$ | + +![这是画出的决策边界](img/boundry_5clusters_w.png "decision boundary (5 classes with weights)") + +优化后的权重和准确率如下: + +| k | $\lambda$ | Accuracy | +| :---: | :-------: | :---------: | +| 1 | 1 | 0.760 | +| 3 | 2 | 0.800 | +| 5 | 2 | 0.810 | +| 9 | 2 | 0.800 | +| 11 | 0.1 | 0.810 | +| **13** | 1 | 0.830 | +| 15 | 2 | 0.800 | +| 17 | 2 | 0.800 | +| **19** | 1 | 0.830 | + +可以看到当加入权重时,会倾向于选择更大的k,准确率普遍有所提升(不过因为两次的数据并不相同,只是遵从相同的分布,可能有一定的偶然性)。 + +但kNN的 `decision boundary` 不再随 `k` 变化得那么明显,模型在 `k` 变大后的能力衰减较小。 + +##### 小结 +- 模型中 `距离度量方法` 的变化对于实验选取的数据的结果没有明显的影响,而kNN一般也使用 `Euclidean Distance` 进行度量。而在这里需要注意预处理时数据在不同特征维度上的方差,进而影响距离绝对值大小的因素,考虑对数据进行归一化/标准化操作。 +其中归一化的操作会修改数据的原始分布,造成一定的问题;标准化的操作可能在大多数情况下更好。 + +- `k` 值可以看做对模型的平滑处理,`k` 值越大,模型的复杂度降低,决策边界也会更加平缓,但预测的能力也会有所下降。 + +- 我们可以尝试通过改变 `分类决策规定` 来对模型进行调整,一般我们直接使用 `投票` 方法进行决策。对此,一种常见且符合直觉的方法就是根据距离调整 `k` 个训练样本 `投票` 所占的比例。通过加权算法,模型在 `k` 值变大时仍然可以保持一定的复杂度,提高能力。 + +在对 `k` 值和 `分类决策规定` 进行调整的过程中,我们可以在模型的复杂程度和稳定性(决策边界的平缓程度)之间做一些 trade-off 。 + + +## 总结 +- kNN是一个比较简单的监督学习方法,属于基于实例的非参数估计,因而其能力直接受数据影响,在本实验中,当样本量足够的时候,数据的分布对其分类准确率有直接的影响。本实验中探究的是kNN对于二元高斯分布产生数据的分类效果。kNN适用于哪类分布的数据,以及如何更好的度量数据的分布与kNN准确率的关系可以进一步探究。 + +- kNN的复杂度为 `O(N*D)`,因而会有 “Curse of Dimensionality” 的问题,我们可以从 `N` 和 `D` 两个角度进行优化,比如通过如 `KD-tree`的方法减少搜索的时间复杂度;或者通过一些算法对特征进行降维来减少 
`D`。而在实验中发现,样本数量的不断增加对于kNN预测的准确率没有很大的影响,所以在保持样本数量足够的前提下,我们可以选取更合适、更具有代表性的样本来减少 `N`。 + +- kNN的三个基本要素为:(1)距离度量;(2)`k` 值的选择;(3)分类决策规定。通过改变这三个要素,我们可以对模型进行一定的优化,使之更适合需要判断的数据。 + + +### 代码运行方法: +``` + python source.py 0 #0-5,为不同的lab number +``` diff --git a/assignment-1/submission/18300110042/img/MMD.png b/assignment-1/submission/18300110042/img/MMD.png new file mode 100644 index 0000000000000000000000000000000000000000..9e480548d5f6d576c474182fa70c555893b1d324 Binary files /dev/null and b/assignment-1/submission/18300110042/img/MMD.png differ diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal.png new file mode 100644 index 0000000000000000000000000000000000000000..4e9849c67ddd9edf818572ea8822cd9ceab88a75 Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal.png differ diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal_large.png new file mode 100644 index 0000000000000000000000000000000000000000..9046a45c112f977a9ebc42ffa942fcf1116e9f9e Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_diagonal_large.png differ diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_10nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_random.png new file mode 100644 index 0000000000000000000000000000000000000000..94669f10b58b3aafd07baa55a6b448bd6d63180b Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_10nns_random.png differ diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal.png new file mode 100644 index 0000000000000000000000000000000000000000..1ed8bcaf3b0da7535cac018eec49a7757f71fea0 Binary files /dev/null and 
b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal_large.png
new file mode 100644
index 0000000000000000000000000000000000000000..506cf2165072be5616a2d5e30f4878a71ce281e5
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_diagonal_large.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_1nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_random.png
new file mode 100644
index 0000000000000000000000000000000000000000..4c6f0be1495ecab3c2c695484e5ed2d4e84f3e72
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_1nns_random.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal.png
new file mode 100644
index 0000000000000000000000000000000000000000..164a17e030d5e11f470049025eefaff931cd0b17
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal_large.png
new file mode 100644
index 0000000000000000000000000000000000000000..8434573b55ed5edf048c09cdab328f4e1b067459
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_diagonal_large.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_20nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_random.png
new file mode 100644
index 0000000000000000000000000000000000000000..9ec5715674fbc35dc5b1d7f68e2c4a727f60bcb0
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_20nns_random.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal.png
new file mode 100644
index 0000000000000000000000000000000000000000..0e5987644b9ceef93d5bda14ad5862c124d4a15b
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal_large.png
new file mode 100644
index 0000000000000000000000000000000000000000..7487bdf99524b465c03bf092bddfbc873363e6e0
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_diagonal_large.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_3nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_random.png
new file mode 100644
index 0000000000000000000000000000000000000000..9fed2e781371dd4b6453d59b54b6302b3203bb79
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_3nns_random.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal.png
new file mode 100644
index 0000000000000000000000000000000000000000..a19fe3d0d17942370977fc431c39f718c35ea445
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal_large.png
new file mode 100644
index 0000000000000000000000000000000000000000..27b898128a6014de89deff15a42b2322f2f8c86a
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_diagonal_large.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_50nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_random.png
new file mode 100644
index 0000000000000000000000000000000000000000..f28948c5ab02093a0eae2eba203b05426e6e5520
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_50nns_random.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal.png b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal.png
new file mode 100644
index 0000000000000000000000000000000000000000..7a1fcf7ab270cb9fe3632e5acf4e41d1e21650c2
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal_large.png b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal_large.png
new file mode 100644
index 0000000000000000000000000000000000000000..81623abf058b28136bffef667c67507881cc1882
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_diagonal_large.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_kldiv_5nns_random.png b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_random.png
new file mode 100644
index 0000000000000000000000000000000000000000..11c0de98a997239eee14f9e46d9316b24a42e215
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_kldiv_5nns_random.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_test_1.png b/assignment-1/submission/18300110042/img/accs_test_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..1f177a5a01946a8018862e221d47e7a5aa124c72
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_test_1.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_test_2.png b/assignment-1/submission/18300110042/img/accs_test_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..01208478a4f536660d120ca4fa8a47c1c6485a7b
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_test_2.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_test_3.png b/assignment-1/submission/18300110042/img/accs_test_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..3ce585932b84b001a4e4bc769652aa885d12d77e
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_test_3.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_test_4.png b/assignment-1/submission/18300110042/img/accs_test_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..935579fe4f3e4566870ceadd050209a4a54df3cb
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_test_4.png differ
diff --git a/assignment-1/submission/18300110042/img/accs_test_5.png b/assignment-1/submission/18300110042/img/accs_test_5.png
new file mode 100644
index 0000000000000000000000000000000000000000..5803abe316944d6175bd5124ef42fbeec13579e5
Binary files /dev/null and b/assignment-1/submission/18300110042/img/accs_test_5.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_2clusters.png b/assignment-1/submission/18300110042/img/boundry_2clusters.png
new file mode 100644
index 0000000000000000000000000000000000000000..05fdfd6c5ae7ae0fbfde22cd06a56aead09078fa
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_2clusters.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_2clusters_w.png b/assignment-1/submission/18300110042/img/boundry_2clusters_w.png
new file mode 100644
index 0000000000000000000000000000000000000000..05fdfd6c5ae7ae0fbfde22cd06a56aead09078fa
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_2clusters_w.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_3clusters.png b/assignment-1/submission/18300110042/img/boundry_3clusters.png
new file mode 100644
index 0000000000000000000000000000000000000000..66a8dddd694f17aa2f412c08fcc08ceb75a93e18
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_3clusters.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_3clusters_w.png b/assignment-1/submission/18300110042/img/boundry_3clusters_w.png
new file mode 100644
index 0000000000000000000000000000000000000000..66a8dddd694f17aa2f412c08fcc08ceb75a93e18
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_3clusters_w.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_5clusters.png b/assignment-1/submission/18300110042/img/boundry_5clusters.png
new file mode 100644
index 0000000000000000000000000000000000000000..88aed93e47738b63c7f8c8b8712018c93eaa4e6b
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_5clusters.png differ
diff --git a/assignment-1/submission/18300110042/img/boundry_5clusters_w.png b/assignment-1/submission/18300110042/img/boundry_5clusters_w.png
new file mode 100644
index 0000000000000000000000000000000000000000..88aed93e47738b63c7f8c8b8712018c93eaa4e6b
Binary files /dev/null and b/assignment-1/submission/18300110042/img/boundry_5clusters_w.png differ
diff --git a/assignment-1/submission/18300110042/img/data_batch_plotted_all_dist.png b/assignment-1/submission/18300110042/img/data_batch_plotted_all_dist.png
new file mode 100644
index 0000000000000000000000000000000000000000..0cf7ff48201971c4273eb2c11df48a880fc0b307
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_batch_plotted_all_dist.png differ
diff --git a/assignment-1/submission/18300110042/img/data_batch_plotted_all_norm.png b/assignment-1/submission/18300110042/img/data_batch_plotted_all_norm.png
new file mode 100644
index 0000000000000000000000000000000000000000..84429d4aeb0c6341101121ed051bb6ab37e52aef
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_batch_plotted_all_norm.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_1.png b/assignment-1/submission/18300110042/img/data_plotted_data_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..b1145c9b1e0e3bfa692d16feb0375acdf68c3ec3
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_1.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_2.png b/assignment-1/submission/18300110042/img/data_plotted_data_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..2264fb038478e1057cb5b152432c70543a0ca500
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_2.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_3.png b/assignment-1/submission/18300110042/img/data_plotted_data_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..a40044a3990b9c3761ea86e61a46067b00fb9bf8
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_3.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_4.png b/assignment-1/submission/18300110042/img/data_plotted_data_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..f69af197376048c62ea9d5790db6ffec941b7dd7
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_4.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_5.png b/assignment-1/submission/18300110042/img/data_plotted_data_5.png
new file mode 100644
index 0000000000000000000000000000000000000000..03c93ffd629cc4d8139a99bb917152a69d2508e4
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_5.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_data_big_var.png b/assignment-1/submission/18300110042/img/data_plotted_data_big_var.png
new file mode 100644
index 0000000000000000000000000000000000000000..d1356dbba2e449ffc8844bb6abafff3aeff5a434
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_data_big_var.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_1.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..c309f57dfe7f9923ecd82a7d9655ea7d741ebf0e
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_1.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_2.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..b8f5f9858c2229cea79d4fab7f27a467b8372460
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_2.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_3.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..5c665a1be7d66e7fae6e995c5d230cb0f2dc1d7a
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_3.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_4.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..e26ae81c6501667d4899d17516a78a234938caa0
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_4.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_5.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_5.png
new file mode 100644
index 0000000000000000000000000000000000000000..026878a9509bd4deba2eebf4ca2605a9404fc01f
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_5.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_test_data_big_var.png b/assignment-1/submission/18300110042/img/data_plotted_test_data_big_var.png
new file mode 100644
index 0000000000000000000000000000000000000000..334020220b0e5f26d241c0f5e2510be6f66fe918
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_test_data_big_var.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_1.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..4cd8166132858a45bb291b9c910a70fdebd2fde4
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_1.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_2.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..722aa9edd5d5c82aab7f93082b50601c5c3e4b92
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_2.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_3.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..91c7b217e4a33679968c6529dc257b9184c8e309
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_3.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_4.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..057eb512fb13bf80fb8aef184ba5ac9229048cca
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_4.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_5.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_5.png
new file mode 100644
index 0000000000000000000000000000000000000000..6280e96757dde6b85afb09de9bfc19d063f68644
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_5.png differ
diff --git a/assignment-1/submission/18300110042/img/data_plotted_train_data_big_var.png b/assignment-1/submission/18300110042/img/data_plotted_train_data_big_var.png
new file mode 100644
index 0000000000000000000000000000000000000000..3c0e137f92d0a2b096e2b533214615a9f2a456d1
Binary files /dev/null and b/assignment-1/submission/18300110042/img/data_plotted_train_data_big_var.png differ
diff --git a/assignment-1/submission/18300110042/img/dist1.png b/assignment-1/submission/18300110042/img/dist1.png
new file mode 100644
index 0000000000000000000000000000000000000000..09cbd28466a6575bb537b7e8814503f12c56d4e9
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist1.png differ
diff --git a/assignment-1/submission/18300110042/img/dist10.png b/assignment-1/submission/18300110042/img/dist10.png
new file mode 100644
index 0000000000000000000000000000000000000000..5fa4f708bb6445e0a4da46ed775ed2ba654dce42
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist10.png differ
diff --git a/assignment-1/submission/18300110042/img/dist20.png b/assignment-1/submission/18300110042/img/dist20.png
new file mode 100644
index 0000000000000000000000000000000000000000..c9eca5605ab07f23672c774af3acfa6dd1957ea2
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist20.png differ
diff --git a/assignment-1/submission/18300110042/img/dist3.png b/assignment-1/submission/18300110042/img/dist3.png
new file mode 100644
index 0000000000000000000000000000000000000000..4aed63e721564867b87a0dd3afd31cb43d9338a5
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist3.png differ
diff --git a/assignment-1/submission/18300110042/img/dist5.png b/assignment-1/submission/18300110042/img/dist5.png
new file mode 100644
index 0000000000000000000000000000000000000000..d3aa5b03cf4a6c86c98d5d623db1683478513b8d
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist5.png differ
diff --git a/assignment-1/submission/18300110042/img/dist50.png b/assignment-1/submission/18300110042/img/dist50.png
new file mode 100644
index 0000000000000000000000000000000000000000..b8a35fb26d139318e86e9cc8b3c985f8e0b29158
Binary files /dev/null and b/assignment-1/submission/18300110042/img/dist50.png differ
diff --git a/assignment-1/submission/18300110042/knn_lab.dat b/assignment-1/submission/18300110042/knn_lab.dat
new file mode 100644
index 0000000000000000000000000000000000000000..41e9b68e149b69d5871dfa8da0eafa277394d71c
--- /dev/null
+++ b/assignment-1/submission/18300110042/knn_lab.dat
@@ -0,0 +1,117 @@
+{
+    "knn_lab": [
+        {
+            "means": {
+                "method": "fix",
+                "data": [ [1, 50], [15, 10], [10, 20] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [1, 0], [0, 10] ],
+                    [ [10, 15], [15, 40] ],
+                    [ [20, 0], [0, 30] ]
+                ]
+            },
+            "n_data": [400, 400, 400],
+            "k": [1, 3, 5, 9, 15, 20],
+            "dist": "euc",
+            "weights": 2,
+            "norm": "N"
+        },
+        {
+            "means": {
+                "method": "fix",
+                "data": [ [1, 10], [5, 15] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [73, 0], [0, 22] ],
+                    [ [21.2, 0], [0, 32.1] ]
+                ]
+            },
+            "n_data": [1000, 1000],
+            "k": [5],
+            "dist": "euc",
+            "weights": 2,
+            "norm": "N"
+        },
+        {
+            "means": {
+                "method": "random",
+                "data": [ [1, 10], [5, 15] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [73, 0], [0, 22] ],
+                    [ [21.2, 0], [0, 32.1] ]
+                ]
+            },
+            "n_data": [1000, 1000],
+            "k": [50],
+            "dist": "euc",
+            "weights": 0,
+            "norm": "N"
+        },
+        {
+            "means": {
+                "method": "fix",
+                "data": [ [1, 20], [5, 20], [15, 15] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [1, 0], [0, 10] ],
+                    [ [10, 15], [15, 40] ],
+                    [ [20, 0], [0, 30] ]
+                ]
+            },
+            "n_data": [100, 100, 100],
+            "k": [1, 3, 5, 9, 11, 13, 15, 17, 19],
+            "dist": "euc",
+            "weights": 0,
+            "norm": "N"
+        },
+        {
+            "means": {
+                "method": "fix",
+                "data": [ [1, 30], [2, 30] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [1, 0], [0, 10] ],
+                    [ [1, 0], [0, 10] ]
+                ]
+            },
+            "n_data": [100, 100],
+            "k": [1, 3, 5, 9, 11, 13, 15, 17, 19],
+            "dist": "euc",
+            "weights": 0,
+            "norm": "N"
+        },
+        {
+            "means": {
+                "method": "fix",
+                "data": [ [1, 20], [5, 20], [20, 30], [30, 25], [25, 25] ]
+            },
+            "covs": {
+                "method": "fix",
+                "data": [
+                    [ [1, 0], [0, 10] ],
+                    [ [10, 15], [15, 40] ],
+                    [ [20, 0], [0, 30] ],
+                    [ [10, 0], [0, 30] ],
+                    [ [2, 5], [5, 50] ]
+                ]
+            },
+            "n_data": [100, 100, 100, 100, 100],
+            "k": [1, 3, 5, 9, 11, 13, 15, 17, 19],
+            "dist": "euc",
+            "weights": 0,
+            "norm": "N"
+        }
+    ]
+}
\ No newline at end of file
diff --git a/assignment-1/submission/18300110042/source.py b/assignment-1/submission/18300110042/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..f72b02ca335921ffda687c593bbb4c612e0e453f
--- /dev/null
+++ b/assignment-1/submission/18300110042/source.py
@@ -0,0 +1,397 @@
+import sys
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+class KNN:
+    def __init__(self, k=5, weights=0, norm='N', dist='euc', cv_records=False, verbose=False):
+        # init model's hyperparameters
+        self.k = k
+        self.weights = weights
+        self.norm = norm
+        self.dist = dist
+        self.cv_records = cv_records
+        self.verbose = verbose
+
+    def fit(self, train_data, train_label):
+        # check data input
+        assert train_data.shape[0] == train_label.shape[0]
+        assert train_data.shape[1]
+
+        # init train data and train labels
+        self.train_data = train_data
+        self.train_label = train_label
+
+        # search the best hyperparameter set and optimize the model if requested;
+        # hyperparameters which can be optimized: (k, weights, norm, dist)
+        best_hyperparam = self.grid_search_cv()
+
+        # modify the knn model's hyperparameters
+        if best_hyperparam:
+            for hp in best_hyperparam:
+                setattr(self, hp[0], hp[1])
+
+    def predict(self, test_data):
+        train_data_indices = np.arange(self.train_data.shape[0])
+        return self.predict_cv(test_data, train_data_indices, self.k, self.weights, self.norm, self.dist)
+
+    def grid_search_cv(self, cv=10):
+        best_hyperparam = []
+
+        # construct hyperparams for grid search
+        hyperparams = dict()
+        if self.k == 0:
+            k_up_bound = 50
+            if self.train_data.shape[0] < 50:
+                k_up_bound = self.train_data.shape[0]
+            hyperparams['k'] = [k for k in range(1, k_up_bound)]
+        if self.norm == 'auto':
+            hyperparams['norm'] = ['N', 'min_max', 'standard']
+        if self.weights == -1:
+            hyperparams['weights'] = [0, 0.1, 0.2, 0.5, 1, 2]
+        if self.dist == 'auto':
+            hyperparams['dist'] = ['euc', 'manhattan']
+
+        # return if optimization is not requested
+        if not hyperparams:
+            return best_hyperparam
+
+        # construct parameter combinations
+        p_names, p_values = zip(*sorted(hyperparams.items()))
+        hps = [tuple(hp) for hp in p_values]
+        combns = [[]]
+        for hp in hps:
+            combns = [x + [y] for x in combns for y in hp]
+
+        # grid search for the best hyperparameters
+        accuracy_scores = []
+        for comb in combns:
+            params = dict(zip(p_names, comb))
+            # cross validate to evaluate the model
+            score = self.cross_validation(cv, params)
+            accuracy_scores.append(score)
+
+        best_comb = sorted(zip(combns, accuracy_scores), key=lambda x: x[1], reverse=True)[0]
+        best_hyperparam = list(zip(p_names, best_comb[0]))
+        return best_hyperparam
+
+    def cross_validation(self, cv, hyperparam):
+        # shuffle the train data
+        shuffled_indexs = np.random.permutation(self.train_data.shape[0])
+
+        # k-fold split train data
+        kfolds = np.array_split(shuffled_indexs, cv)
+        kfolds = [f for f in kfolds if len(f) != 0]
+
+        accuracy_score = 0
+        for i in range(len(kfolds)):
+            val_data = self.train_data[kfolds[i]]
+            val_label = self.train_label[kfolds[i]]
+            train_data_indices = np.concatenate(kfolds[:i] + kfolds[i+1:]).flatten()
+
+            k = hyperparam.get('k', self.k)
+            weights = hyperparam.get('weights', self.weights)
+            norm = hyperparam.get('norm', self.norm)
+            dist = hyperparam.get('dist', self.dist)
+            predict_label = self.predict_cv(val_data, train_data_indices, k, weights, norm, dist)
+
+            accuracy_score += np.mean(np.equal(predict_label, val_label))
+
+        return accuracy_score / len(kfolds)
+
+    def predict_cv(self, test_data, train_data_indices, k, weights, norm, dist):
+        # normalize data
+        train_data = self.train_data[train_data_indices]
+        train_label = self.train_label[train_data_indices]
+        normparams = self.calc_normparams(train_data, norm)
+        norm_train_data, norm_test_data = self.normalize_data(train_data, test_data, norm, normparams)
+
+        # find k nearest neighbors
+        predict_labels = []
+        nn_labels_list, nn_distances_list = self.get_nearest_neighbors(norm_test_data, norm_train_data, train_label, k, dist)
+
+        if (nn_labels_list >= 0).all() and not weights:
+            # plain majority vote
+            for labels in nn_labels_list:
+                pred = np.bincount(labels).argmax()
+                predict_labels.append(pred)
+        else:
+            # distance-weighted vote
+            d_weights = np.exp(-weights * nn_distances_list ** 2)
+            for i in range(test_data.shape[0]):
+                votes = {}
+                labels = nn_labels_list[i]
+                for j in range(k):
+                    l = labels[j]
+                    if l in votes:
+                        votes[l] += d_weights[i][j]
+                    else:
+                        votes[l] = d_weights[i][j]
+                pred = sorted(votes.items(), key=lambda x: x[1], reverse=True)[0][0]
+                predict_labels.append(pred)
+
+        return np.array(predict_labels)
+
+    def get_nearest_neighbors(self, test_data, train_data, train_label, k, dist):
+        # calc distances
+        distances_list = np.vstack([self.get_distance(d, train_data, dist) for d in test_data])
+
+        # find the k nearest neighbors
+        nn_indices_list = np.argsort(distances_list, axis=-1)[:, :k]
+        nn_labels_list = train_label[nn_indices_list]
+        nn_distances_list = np.sort(distances_list, axis=-1)[:, :k]
+
+        return nn_labels_list, nn_distances_list
+
+    def get_distance(self, x1, x2, dist='euc'):
+        if dist == 'manhattan':
+            distance = np.sum(np.absolute(x1 - x2), axis=-1)
+        else:  # default: euclidean distance
+            distance = np.sqrt(np.sum((x1 - x2)**2, axis=-1))
+        return distance
+
+    def normalize_data(self, train_data, test_data, norm, normparams):
+        if norm == 'min_max':
+            norm_train_data = (train_data - normparams['f_min']) / normparams['denom']
+            norm_test_data = (test_data - normparams['f_min']) / normparams['denom']
+        elif norm == 'standard':
+            norm_train_data = (train_data - normparams['mean']) / normparams['sigma']
+            norm_test_data = (test_data - normparams['mean']) / normparams['sigma']
+        else:  # default: no normalization
+            norm_train_data = train_data
+            norm_test_data = test_data
+        return norm_train_data, norm_test_data
+
+    def calc_normparams(self, data, norm):
+        params = dict()
+        if norm == 'min_max':
+            feature_max = data.max(axis=0)
+            feature_min = data.min(axis=0)
+            denom = feature_max - feature_min
+            params['f_min'] = feature_min
+            params['denom'] = denom
+        if norm == 'standard':
+            mean = np.mean(data, axis=0)
+            sigma = np.std(data, axis=0)
+            params['mean'] = mean
+            params['sigma'] = sigma
+        return params
+
+
+"""
+------------------ below is code for experiment ------------------
+"""
+
+
+def load_lab_data(file_name):
+    import os, json
+    knn_lab_list = []
+    if os.path.exists(file_name):
+        with open(file_name, 'r') as f:
+            json_data = json.loads(f.read())
+            knn_lab_list = json_data.get('knn_lab', [])
+    return knn_lab_list
+
+
+def parse_lab_data(lab):
+    n_data = lab['n_data']
+    ks = lab['k']
+    weights = lab['weights']
+    norm = lab['norm']
+    dist = lab['dist']
+    d_means = lab['means']
+    means = []
+    if d_means['method'] == 'fix':
+        means = d_means['data']
+    else:
+        for m in d_means['data']:
+            mean = np.random.uniform(m[0], m[1], 2)
+            means.append(mean)
+    d_covs = lab['covs']
+    covs = []
+    if d_covs['method'] == 'fix':
+        covs = d_covs['data']
+    else:
+        for c in d_covs['data']:
+            cov = np.zeros((2, 2))
+            cov[0, 0] = cov[0, 0] + np.random.uniform(c[0], c[1], 1)
+            cov[1, 1] = cov[1, 1] + np.random.uniform(c[0], c[1], 1)
+            x = np.sqrt(cov[0, 0] * cov[1, 1])
+            # cov[0, 1] = np.random.uniform(-x, x, 1)
+            cov[1, 0] = cov[0, 1]
+            covs.append(cov)
+    return means, covs, n_data, ks, weights, norm, dist
+
+
+def create_clustered_data(d_means, d_covs, n_data):
+    ds = []
+    ls = []
+    for i in range(len(d_means)):
+        d = np.random.multivariate_normal(d_means[i], d_covs[i], n_data[i])
+        ds.append(d)
+        ls.append(np.ones((n_data[i],), dtype=int) * i)
+    return ds, ls
+
+
+def combine_all_data(ds, ls):
+    data = np.concatenate(ds)
+    label = np.concatenate(ls)
+    return data, label
+
+
+def generate_lab_data(data, label, rate=0.2):
+    idx = np.arange(len(data))
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    split = int(len(data) * (1 - rate))
+    train_data, test_data = data[:split, ], data[split:, ]
+    train_label, test_label = label[:split, ], label[split:, ]
+    return (train_data, train_label), (test_data, test_label)
+
+
+def run_lab(labnum):
+    labs = load_lab_data('knn_lab.dat')
+    lab = labs[labnum]
+    means, covs, n_data, ks, weights, norm, dist = parse_lab_data(lab)
+    lab['means'] = means
+    lab['covs'] = covs
+    ds, ls = create_clustered_data(means, covs, n_data)
+    data, label = combine_all_data(ds, ls)
+    (train_data, train_label), (test_data, test_label) = generate_lab_data(data, label)
+    accs = []
+    models = []
+    for k in ks:
+        model = KNN(k, weights, norm, dist)
+        model.fit(train_data, train_label)
+        predict_label = model.predict(test_data)
+        models.append(model)
+        accs.append(np.mean(np.equal(predict_label, test_label)))
+    print("accs =", accs)
+    return models, lab, ds, data, label, accs
+
+
+def plot_data(data, labels, title='data', save=False,
+              colours=['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:grey', 'tab:olive', 'tab:cyan']):
+    assert data.shape[0] == labels.shape[0]
+
+    num_samples = data.shape[0]
+    label_record = sorted(set(labels))
+    assert len(colours) >= len(label_record)
+    label_dict = {k: v for v, k in enumerate(label_record)}
+    data_record = [[] for l in label_record]
+    for i in range(num_samples):
+        data_record[label_dict[labels[i]]].append(data[i])
+    plt.title(title)
+    for t in range(len(label_record)):
+        data_t = np.array(data_record[t])
+        plt.scatter(data_t[:, 0], data_t[:, 1], c=colours[t])
+    if save:
+        plt.savefig(f'data_plotted_{title}')
+    plt.show()
+
+
+def plot_decision_boundary(labnum, save=False,
+                           colours=['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:grey', 'tab:olive', 'tab:cyan']):
+    models, lab, ds, data, labels, accs = run_lab(labnum)
+    for m in models:
+        print(m.weights)
+    ks = lab['k']
+    assert data.shape[0] == labels.shape[0]
+    label_record = sorted(set(labels))
+    num_classes = len(label_record)
+    assert len(colours) >= len(label_record)
+    label_dict = {k: v for v, k in enumerate(label_record)}
+
+    x_min, x_max = np.min(data[:, 0]) - 1, np.max(data[:, 0]) + 1
+    y_min, y_max = np.min(data[:, 1]) - 1, np.max(data[:, 1]) + 1
+    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
+
+    assert len(ks) == 9
+
+    fig, axs = plt.subplots(3, 3, sharex='col', sharey='row', figsize=(15, 12))
+    indices = [(x, y) for x in [0, 1, 2] for y in [0, 1, 2]]
+    titles = ['KNN (k=%d)' % k for k in ks]
+
+    for idx, model, title in zip(indices, models, titles):
+        Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
+        Z = Z.reshape(xx.shape)
+
+        axs[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.5)
+        colour_labels = [colours[label_dict[label]] for label in labels]
+        axs[idx[0], idx[1]].scatter(data[:, 0], data[:, 1], s=20, c=colour_labels, edgecolors=colour_labels)
+        axs[idx[0], idx[1]].set_title(title)
+
+    if save:
+        plt.savefig(f'decision_boundary_plotted_{num_classes}classes')
+
+    plt.show()
+
+
+def get_KLdiv(mean_1, cov_1, mean_2, cov_2):
+    assert len(mean_1) == len(mean_2)
+    num_dims = len(mean_1)
+    mu1 = np.array(mean_1)
+    mu2 = np.array(mean_2)
+    cov2_inv = np.linalg.inv(cov_2)
+
+    logd = np.log(np.linalg.det(cov_2) / np.linalg.det(cov_1))
+    trace_cov = np.trace(np.matmul(cov2_inv, cov_1))
+    mean_cov = (mu2 - mu1).T.dot(cov2_inv).dot((mu2 - mu1))
+    kldiv = 1/2 * (logd + trace_cov + mean_cov - num_dims)
+    return kldiv
+
+
+def plot_distance(labnum):
+    dists = []
+    accss = []
+    for i in range(20):
+        models, lab, ds, data, label, accs = run_lab(labnum)
+        dists.append(wasserstein_distance(ds[0], ds[1]))
+        accss.append(accs[0])
+    dtoa = zip(dists, accss)
+    dtoa = dict(sorted(dtoa, key=lambda x: x[0]))
+    fig, ax = plt.subplots()
+    ax.plot(dtoa.keys(), dtoa.values())
+    plt.show()
+
+
+def gaussian_kernel(source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
+    import torch
+    n_samples = int(source.size()[0]) + int(target.size()[0])
+    total = torch.cat([source, target], dim=0)
+    total0 = total.unsqueeze(0).expand(int(total.size(0)), int(total.size(0)), int(total.size(1)))
+    total1 = total.unsqueeze(1).expand(int(total.size(0)), int(total.size(0)), int(total.size(1)))
+    L2_distance = ((total0 - total1)**2).sum(2)
+    if fix_sigma:
+        bandwidth = fix_sigma
+    else:
+        bandwidth = torch.sum(L2_distance.data) / (n_samples**2 - n_samples)
+    bandwidth /= kernel_mul ** (kernel_num // 2)
+    bandwidth_list = [bandwidth * (kernel_mul**i) for i in range(kernel_num)]
+    kernel_val = [torch.exp(-L2_distance / bandwidth_temp) for bandwidth_temp in bandwidth_list]
+    return sum(kernel_val)
+
+
+def mmd(source, target, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
+    import torch
+    batch_size = int(source.size()[0])
+    kernels = gaussian_kernel(source, target,
+                              kernel_mul=kernel_mul, kernel_num=kernel_num, fix_sigma=fix_sigma)
+    XX = kernels[:batch_size, :batch_size]
+    YY = kernels[batch_size:, batch_size:]
+    XY = kernels[:batch_size, batch_size:]
+    YX = kernels[batch_size:, :batch_size]
+    loss = torch.mean(XX + YY - XY - YX)
+    return loss
+
+
+def wasserstein_distance(x, y):
+    from scipy.spatial.distance import cdist
+    from scipy.optimize import linear_sum_assignment
+    d = cdist(x, y)
+    assignment = linear_sum_assignment(d)
+    return d[assignment].sum() / len(x)
+
+
+def maximum_mean_discrepancy(x, y):
+    import torch
+    from torch.autograd import Variable
+    X = Variable(torch.Tensor(x))
+    Y = Variable(torch.Tensor(y))
+    return mmd(X, Y).item()
+
+
+if __name__ == '__main__':
+    if len(sys.argv) > 1:
+        labnum = sys.argv[1]
+        models, lab, ds, data, label, accs = run_lab(int(labnum))