diff --git a/assignment-2/submission/18307130154/README.md b/assignment-2/submission/18307130154/README.md
index ff4f27113907f90f642e7a913bf80e9324f8108b..51cea514496f5e1d5ba2fae3e49da09fe4afb6ca 100644
--- a/assignment-2/submission/18307130154/README.md
+++ b/assignment-2/submission/18307130154/README.md
@@ -390,9 +390,11 @@ with torch.no_grad():
     return tensor.uniform_(-bound, bound)
 ```
 
-At the same time, the parameter a is set to 5.
+~~At the same time, the parameter a is set to 5.~~
 
-### Implementing get_torch_initialization with numpy
+At the same time, the parameter a is set to √5.
+
+### ~~Implementing get_torch_initialization with numpy~~ Corrected
 
 For simplicity, I did not follow PyTorch's layered implementation of initialization, whose main purpose is to offer several different initialization schemes. Instead, I implemented get_torch_initialization in numpy directly from the formula of the linear layer's default scheme, Kaiming uniform initialization, with a set to 5. The code is as follows:
@@ -415,7 +417,7 @@ def get_torch_initialization(numpy = True):
 
 Incidentally, I converted the other utils functions (including the earlier mini_batch) to numpy versions; ~~they were written in numpyutils~~ they now all live in numpy_mnist. With this toolkit, numpy_mnist can run without the torch package. Note in particular that download_mnist still needs the torchvision package to download the dataset.
 
-### Testing
+### ~~Testing~~ Corrected
 
 After swapping in the toolkit in numpy_mnist and rerunning, accuracy is essentially the same as before.
 
@@ -423,4 +425,64 @@ def get_torch_initialization(numpy = True):
 ```
 [0] Accuracy: 0.9340
 [1] Accuracy: 0.9584
 [2] Accuracy: 0.9684
-```
\ No newline at end of file
+```
+
+## April 27: Correcting the initialization scheme
+
+The previously submitted version initialized weights the same way as the Linear layer's default. Today I found two problems with that (special thanks to **彭润宇** for pointing them out):
+
+* In PyTorch's default initialization for linear layers, the nonlinearity is assumed to be **Leaky ReLU**, and the default value of a is **√5**, not 5. Using 5 in the formula above, as I did, hurts the results considerably.
+* As **Kaiming He**'s paper states, a is the negative slope of the Leaky ReLU layer. Since we use plain ReLU, a should theoretically be 0 to match the original intent of Kaiming initialization.
+
+This revision fixes both problems and adds an investigation of the choice of a.
+
+### Changes
+
+The revised get_torch_initialization takes a as a parameter, with default value 0, giving the Kaiming initialization for ReLU layers.
+
+```python
+def get_torch_initialization(numpy = True,a = 0):
+    def Kaiming_uniform(fan_in,fan_out,a):
+        bound = 6.0 / (1 + a * a) / fan_in
+        bound = bound ** 0.5
+        W = np.random.uniform(low=-bound, high=bound, size=(fan_in,fan_out))
+        return W
+
+    W1 = Kaiming_uniform(28 * 28, 256, a)
+    W2 = Kaiming_uniform(256, 64, a)
+    W3 = Kaiming_uniform(64, 10, a)
+    return W1,W2,W3
+```
+
+### Testing the choice of a
+
+That PyTorch's Linear layer assumes a Leaky ReLU nonlinearity by default and sets a to √5 is thought-provoking. To compare the effect of different values of a, I tested them on the original dataset (a from 0 to 6 in steps of 0.3, recording accuracy after the 1st, 2nd, and 3rd epochs). The results were inconclusive: the choice of weight initialization had very little visible effect on accuracy after 3 epochs, and even after just the first epoch. Plausible reasons include:
+
+* Our model and data do not suffer from **vanishing gradients** or **dying neurons**
+* Randomness across batches, and too few test runs
+
+I kept the test results in img/. For our model, though, I follow the rule from Kaiming He's paper and use a = 0 for ReLU layers.
+
+### An open question
+
+The choice of a in PyTorch's default initialization of linear layers is puzzling. According to Kaiming He, a should be the **negative slope** of the Leaky ReLU layer, a positive number less than 1 (this is how PyTorch's lower-level source uses it, as shown below):
+
+![image-20210427212809776](img/image-20210427212809776.png)
+
+Yet the linear layer sets its default to √5:
+
+```python
+init.kaiming_uniform_(self.weight, a=math.sqrt(5))
+```
+
+These two conflict: the default linear-layer initialization substitutes a = $\sqrt{5}$ into the formula
+$$
+bound = \sqrt{\frac{6}{(1 + a^2) \times fan\_in}}
+$$
+yielding a rather small bound.
+
+Several people at home and abroad have raised this question; I have not yet seen a convincing explanation. One of the discussions:
+
+https://github.com/pytorch/pytorch/issues/15314
+
+I think this may be an ambiguity, or even a mistake, in PyTorch (version 3).
\ No newline at end of file
diff --git a/assignment-2/submission/18307130154/img/image-20210427200512951.png b/assignment-2/submission/18307130154/img/image-20210427200512951.png
new file mode 100644
index 0000000000000000000000000000000000000000..43189faca346fc18e2938d53d39691aea37c954e
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427200512951.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427203245993.png b/assignment-2/submission/18307130154/img/image-20210427203245993.png
new file mode 100644
index 0000000000000000000000000000000000000000..52cfc7d3907638f1502a6a89866f44a6af6b73bd
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427203245993.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427203300617.png b/assignment-2/submission/18307130154/img/image-20210427203300617.png
new file mode 100644
index 0000000000000000000000000000000000000000..24b35eed4c9f022a11991135806034b706dec21c
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427203300617.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427203337433.png b/assignment-2/submission/18307130154/img/image-20210427203337433.png
new file mode 100644
index 0000000000000000000000000000000000000000..912b1ca130c033a9ba33e0f0b30254843241c5bc
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427203337433.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427205224362.png b/assignment-2/submission/18307130154/img/image-20210427205224362.png
new file mode 100644
index 0000000000000000000000000000000000000000..1bb5da48837686d89da73925b935accbe5454c17
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427205224362.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427205245840.png b/assignment-2/submission/18307130154/img/image-20210427205245840.png
new file mode 100644
index 0000000000000000000000000000000000000000..4ec5e96e75e7987a6d12d4977a49205c03ca923a
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427205245840.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427205308848.png b/assignment-2/submission/18307130154/img/image-20210427205308848.png
new file mode 100644
index 0000000000000000000000000000000000000000..060021006b29d7907d064146375f28b30079459e
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427205308848.png differ
diff --git a/assignment-2/submission/18307130154/img/image-20210427212809776.png b/assignment-2/submission/18307130154/img/image-20210427212809776.png
new file mode 100644
index 0000000000000000000000000000000000000000..d0e834c5023e6ce211c264c0c386a97af8e21172
Binary files /dev/null and b/assignment-2/submission/18307130154/img/image-20210427212809776.png differ
diff --git a/assignment-2/submission/18307130154/numpy_mnist.py b/assignment-2/submission/18307130154/numpy_mnist.py
index 594a47732e4a3414a9ae1fd33cfc85fe2d1630a7..1abc1e73eef32967faa94c5f1d93f20f8ae96d2d 100644
--- a/assignment-2/submission/18307130154/numpy_mnist.py
+++ b/assignment-2/submission/18307130154/numpy_mnist.py
@@ -3,9 +3,8 @@ from numpy_fnn import NumpyModel, NumpyLoss
 import numpy as np
 from matplotlib import pyplot as plt
 
-def get_torch_initialization(numpy = True):
+def get_torch_initialization(numpy = True,a=0):
 
-    a = 5
     def Kaiming_uniform(fan_in,fan_out,a):
         bound = 6.0 / (1 + a * a) / fan_in
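The effect of a on the Kaiming-uniform bound discussed in this change can be sanity-checked outside the diff. The sketch below (not part of the submission; `kaiming_uniform_bound` is an illustrative helper name) evaluates the same formula, bound = √(6 / ((1 + a²) · fan_in)): for fan_in = 784, a = 0 gives √(6/784) ≈ 0.0875, while a = √5 shrinks it to 1/28 ≈ 0.0357, the "rather small bound" noted above.

```python
import numpy as np

def kaiming_uniform_bound(fan_in, a=0.0):
    # Same formula as Kaiming_uniform in the diff:
    # bound = sqrt(6 / ((1 + a^2) * fan_in))
    return (6.0 / ((1.0 + a * a) * fan_in)) ** 0.5

fan_in = 28 * 28  # 784, the input dimension used by W1
for a in (0.0, 5 ** 0.5):
    bound = kaiming_uniform_bound(fan_in, a)
    # Sample weights as get_torch_initialization does and compare the
    # empirical std against the std of U(-b, b), which is b / sqrt(3)
    W = np.random.uniform(-bound, bound, size=(fan_in, 256))
    print(f"a={a:.3f}  bound={bound:.4f}  empirical std={W.std():.4f}")
```

Running it shows concretely how much a = √5 tightens the sampling interval relative to a = 0.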