# Ca1mDeepLearning

**Repository Path**: ccjabc/ca1m-deep-learning

## Basic Information

- **Project Name**: Ca1mDeepLearning
- **Description**: Deep learning notes.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-10-21
- **Last Updated**: 2025-10-21

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Machine Learning Fundamentals

## Key Basics

Overfitting: when a learner fits the training samples "too well", it has very likely treated peculiarities of the training samples as general properties shared by all potential samples, which degrades generalization performance.

Underfitting: the general properties of the training samples have not yet been learned well.

![image.png](./res/img1.png)

How do we estimate a learner's generalization error (its error on new samples)? → Use the "test error" measured on a test set as an approximation of the generalization error. (The test set and training set should be as mutually exclusive as possible.)

- Hold-out method: split the dataset directly into two mutually exclusive sets, one used as the training set S and the other as the test set T, keeping the data distribution as consistent as possible during the split. (For example, with a dataset of 1000 samples, 500 positive and 500 negative, and a 70%/30% split, S contains 350 positive and 350 negative samples. Those 350 could be the first 350 or the last 350, so the training result varies somewhat → use random splits and repeated experiments to reduce this effect.)
- Cross-validation: partition the dataset into k mutually exclusive subsets of similar size, each keeping the data distribution as consistent as possible.

![image.png](./res/img2.png)

Most learning algorithms have parameters to set, and the model's performance differs with the parameter settings. In practice, one usually searches a range with a fixed step size and, guided by the goal of the algorithm, picks the parameters that work best.

Performance measure for prediction (regression) tasks: given a sample set $D = \{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, where $y_i$ is the true label of $x_i$, the mean squared error is used to evaluate a learner $f$.

- With m samples: $E(f;D) = \frac{1}{m}\sum_{i=1}^m(f(x_i)-y_i)^2.$
- More generally (data distribution $\mathcal{D}$ and probability density $p$): $E(f;\mathcal{D}) = \int_{x \sim \mathcal{D}}(f(x)-y)^2p(x)\,dx.$

Performance measures for classification tasks (binary or multi-class; binary as the example): the error rate is the fraction of misclassified samples, and accuracy is the fraction of correctly classified samples.

- TP: true positives, samples predicted positive whose true label is positive; FP: false positives, samples predicted positive whose true label is negative; FN: false negatives, samples predicted negative whose true label is positive; TN: true negatives, samples predicted negative whose true label is negative.

![image.png](./res/img3.png)

- For example, suppose the data has 100 samples labeled 1 (positive) or 0 (negative), with 60 positives and 40 negatives in reality. A binary classifier predicts 70 positives and 30 negatives; of the 70 predicted positives, 50 are truly positive and 20 are truly negative, so TP = 50 and FP = 20, and therefore FN = 10 and TN = 20. Precision is $P = \frac{50}{50 + 20}$ and recall is $R = \frac{50}{50+10}.$ (A small sketch verifying these numbers follows at the end of this section.)
- In general, precision and recall are in tension: when precision is high, recall tends to be low, and vice versa. For instance, to push recall up, a classifier could simply predict every sample as positive, giving 100% recall but only 60% precision; to push precision up, it could predict positive only for the samples it is most confident about, say the 30 most certain ones, all truly positive, while missing other positives: TP = 30, FP = 0, FN = 30, TN = 40, giving 100% precision but only 50% recall.
- P-R curve: plot precision on the vertical axis against recall on the horizontal axis. If one learner's P-R curve is completely "enclosed" by another's, the latter performs better; in the figure below, learner A outperforms learner C. When two curves cross it is hard to say which is better, so the "break-even point" (BEP, the value where precision = recall) can be used: learner B's BEP is about 0.72 and learner A's is 0.8, so based on BEP (A's is larger, which to some extent indicates a better precision/recall trade-off), learner A can be considered better than learner B.

![image.png](./res/img4.png)

- For a test sample, the learner produces a real value or probability, which is compared with a classification threshold: above the threshold → positive class, otherwise → negative class. For example, a neural network typically outputs a real value in [0.0, 1.0] for each test sample and compares it with 0.5: above 0.5 → positive, otherwise negative. For different tasks, the samples can be sorted by this value with the most likely positives first, and the "cut-off point" chosen per task: cut earlier in the ranking if precision matters more, later if recall matters more.
- ROC, the "Receiver Operating Characteristic" curve, is used to study a learner's generalization performance. Its vertical axis is the true positive rate $TPR = \frac{TP}{TP + FN}$ and its horizontal axis is the false positive rate $FPR = \frac{FP}{TN + FP}$. If one learner's ROC curve is completely "enclosed" by another's, the latter performs better; if the curves cross, compare the area under the ROC curve (AUC).

![image.png](./res/img5.png)

Statistical hypothesis testing provides an important basis for comparing learners (measuring generalization performance is involved: test-set performance is only an approximation of generalization performance; it changes with the test set; and many machine learning algorithms are themselves somewhat random). Based on a hypothesis test we can infer whether, given that learner A beats learner B on the test set, A's generalization performance is statistically better than B's, and how confident we can be in that conclusion.

- Cross-validated t-test: for two learners A and B, if k-fold cross-validation yields test error rates $(\epsilon_1^A,\epsilon_2^A,\dots,\epsilon_k^A)$ and $(\epsilon_1^B,\epsilon_2^B,\dots,\epsilon_k^B)$, where $\epsilon_i^A$ and $\epsilon_i^B$ come from the same i-th train/test fold, then a k-fold cross-validated "paired t-test" can be used to compare them.

![image.png](./res/img6.png)

- McNemar's test

![image.png](./res/img7.png)

Bias-variance decomposition: explains why a learner attains a given level of generalization performance. For a test sample $x$, let $y_D$ be the label of $x$ in the dataset, $y$ the true label of $x$, and $f(x;D)$ the prediction on $x$ of the model $f$ learned from training set D. Take regression as the example.

![image.png](./res/img8.png)

The decomposition gives $E(f;D)=bias^2(x)+var(x)+\epsilon^2$, i.e. the generalization error decomposes into bias, variance, and noise. By their definitions: variance measures how much the learned performance changes when a training set of the same size is perturbed; noise expresses the lower bound on the expected generalization error achievable by any learning algorithm on the task; bias measures how far the algorithm's expected prediction deviates from the true result. (For good generalization we need small bias, i.e. the data is fit well, and small variance, i.e. perturbations of the data have little effect.) Bias and variance generally conflict, which is known as the bias-variance dilemma. As the figure below illustrates: when training is insufficient, the learner's fitting capacity is weak and perturbations of the training data are not enough to change it noticeably, so bias dominates the generalization error; as training deepens, the fitting capacity grows, perturbations of the training data are gradually picked up by the learner, and variance comes to dominate the generalization error.

![image.png](./res/img9.png)
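The confusion-matrix arithmetic in the worked example above is easy to check in code. Below is a minimal sketch; the label arrays are constructed only to reproduce that example (60 true positives, 70 predicted positives, 50 of them correct) and are otherwise arbitrary.

```python
# Minimal sketch verifying the worked precision/recall example above.
# Labels are 1 (positive) / 0 (negative), as in the notes.

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Reconstruct the example: 60 real positives, 40 real negatives;
# the classifier predicts 70 positives, 50 of which are truly positive.
y_true = [1] * 60 + [0] * 40
y_pred = [1] * 50 + [0] * 10 + [1] * 20 + [0] * 20

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)   # 50 / 70
recall = tp / (tp + fn)      # 50 / 60
print(tp, fp, fn, tn, precision, recall)  # 50 20 10 20 0.714... 0.833...
```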
## Linear Models

![image.png](./res/img10.png)

Linear regression objective: $f(x_i)=wx_i+b, \ \text{s.t.}\ f(x_i)\approx y_i.$

- Handling discrete attributes: if there is an "order", convert them to continuous values; otherwise, convert them to a k-dimensional vector.
- Minimize the mean squared error: $(w^*,b^*)=\underset{w,b}{\arg \min}\sum_{i=1}^m(f(x_i)-y_i)^2.$

Variants of the linear model → approximating nonlinear models, as below:

![image.png](./res/img11.png)

- Generalized linear model: $y=g^{-1}(w^Tx+b)$, where the link function $g$ is monotonic and differentiable; e.g. with $g(\cdot)=\ln(\cdot)$ we obtain log-linear regression.

Using linear regression to solve classification problems, taking binary classification as an example → logistic (logit) regression, as below:

![image.png](./res/img12.png)

![image.png](./res/img13.png)

## Neural Networks

![image.png](./res/img14.png)

The M-P neuron model

- A neural network is formed by connecting many neurons in a certain structure.

![image.png](./res/img15.png)

- A perceptron consists of two layers of neurons: the input layer receives external input signals and passes them to the output layer (M-P neurons).
- Only the output-layer neurons of a perceptron apply an activation function, i.e. there is only one layer of functional neurons, so its learning capacity is very limited.

![image.png](./res/img16.png)

- A perceptron cannot solve the XOR problem shown in figure 5.4(d).

![image.png](./res/img17.png)

- Input-layer neurons receive the external input; the hidden-layer and output-layer neurons process the signals, and the final result is produced by the output neurons.
- The input layer does no processing, so network (a) is usually called a "two-layer network" or a "single-hidden-layer network".

Training a multi-layer network → the error backpropagation (BP) algorithm; a "BP network" usually means a multi-layer feedforward network trained with BP. (A small sketch illustrating the notation below follows after the list of other network types.)

![image.png](./res/img18.png)

- Training set $D = \{(x_1, y_1),(x_2, y_2), \dots ,(x_m, y_m)\}, x_i \in \mathbb{R}^d, y_i \in \mathbb{R}^l$, i.e. each input example is described by $d$ attributes and the output is an $l$-dimensional real vector.
- A multi-layer feedforward network with $d$ input neurons, $l$ output neurons, and $q$ hidden neurons.
- The threshold of the $j$-th output neuron is $\theta_j$; the threshold of the $h$-th hidden neuron is $\gamma_h$.
- The connection weight between the $i$-th input neuron and the $h$-th hidden neuron is $v_{ih}$; the connection weight between the $h$-th hidden neuron and the $j$-th output neuron is $w_{hj}$.
- The input to the $h$-th hidden neuron is $\alpha_h = \sum_{i=1}^d{v_{ih}x_i}$, and the input received by the $j$-th output neuron is $\beta_j = \sum_{h=1}^q{w_{hj}b_h}$, where $b_h$ is the output of the $h$-th hidden neuron.
- The network has $d \cdot q + q \cdot l + q + l = (d+l+1)q + l$ parameters to determine.
- The BP algorithm is based on gradient descent and aims to minimize the accumulated error on the training set D.

Other common neural networks

- RBF network → single-hidden-layer feedforward network
- ART network → unsupervised learning network
- SOM network → unsupervised learning network
- Elman network → recurrent neural network
- Boltzmann machine
- Cascade-correlation network → a structure-adaptive network that tries to find, during training, the network structure that best fits the data.

![image.png](./res/img19.png)
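To make the BP-network notation above concrete, here is a minimal forward-pass sketch of a d-q-l feedforward network with sigmoid activations. The sizes d, q, l and the random values are just example values; the shapes and the parameter count $(d+l+1)q+l$ follow directly from the definitions above.

```python
import numpy as np

# One forward pass through a d-q-l feedforward network, using the notation above:
# v (input->hidden weights), gamma (hidden thresholds),
# w (hidden->output weights), theta (output thresholds).
d, q, l = 4, 5, 3                      # input / hidden / output sizes (example values)
rng = np.random.default_rng(0)

v = rng.normal(size=(d, q))            # v[i, h] = v_ih
gamma = rng.normal(size=q)             # hidden thresholds gamma_h
w = rng.normal(size=(q, l))            # w[h, j] = w_hj
theta = rng.normal(size=l)             # output thresholds theta_j

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=d)                 # one input example
alpha = x @ v                          # alpha_h = sum_i v_ih * x_i
b = sigmoid(alpha - gamma)             # hidden outputs b_h
beta = b @ w                           # beta_j = sum_h w_hj * b_h
y_hat = sigmoid(beta - theta)          # network output

n_params = d * q + q * l + q + l       # = (d + l + 1) * q + l
print(y_hat.shape, n_params)           # (3,) 43
```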
# Neural Network Basics (PyTorch)

![new41.jpg](./res/img20.jpg)

- Input image size: 224 * 224
- Input image channels = 3, the three RGB channels

## Convolution

The purpose of convolution: extract image features.

```python
import torch
import torch.nn.functional as F

# input → 5 * 5
input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]])

# convolution kernel → 3 * 3
kernel = torch.tensor([[1, 2, 1],
                       [0, 1, 0],
                       [2, 1, 0]])

# reshape → to match the shape the convolution function expects
input = torch.reshape(input, (1, 1, 5, 5))    # [5, 5] → [1, 1, 5, 5]
kernel = torch.reshape(kernel, (1, 1, 3, 3))  # [3, 3] → [1, 1, 3, 3]

# convolution (stride = 1)
output = F.conv2d(input, kernel, stride=1)
print("stride=1, output = ", output)

# convolution (stride = 1, pad 1 cell on every side, filled with 0 by default)
output = F.conv2d(input, kernel, stride=1, padding=1)
print("stride=1, padding=1, output = ", output)
```

```python
import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader

# prepare the dataset
dataset = torchvision.datasets.CIFAR10("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
# load the data in batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

# build the network
class Pixel(nn.Module):
    # convolution layer
    def __init__(self):
        super(Pixel, self).__init__()
        # color images → in_channels is 3
        # out_channels is 6, kernel_size is 3
        # to keep the original image size, some padding would be needed!
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        # pass x through the convolution layer
        x = self.conv1(x)
        return x

# instantiate
pixel = Pixel()
# print(pixel)

writer = SummaryWriter("./logs")  # for visualization
step = 0
for data in dataloader:
    imgs, targets = data
    output = pixel(imgs)
    # print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    # print(output.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_images("input", imgs, step)
    # torch.Size([64, 6, 30, 30]) → torch.Size([..., 3, 30, 30])
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)
    step = step + 1

writer.close()
# After running, visualize with: tensorboard --logdir=logs
```

## Pooling

The purpose of pooling: keep the salient features of the input while reducing the amount of data. (Effect: similar to "pooling" a 1080p video down to 720p.)

Common variants include max pooling (maxpooling) and average pooling (avgpooling).

```python
import torch
from torch import nn
from torch.nn import MaxPool2d

# input → 5 * 5
input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)

# reshape: (N, C, H, W) <-> ("batch_size", "channel", "height", "width")
# "batch_size" → number of images per batch, -1 means infer automatically
# "channel" → number of channels
input = torch.reshape(input, (-1, 1, 5, 5))  # [5, 5] → [1, 1, 5, 5]

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        # max pooling layer
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

pixel = Pixel()
output = pixel(input)
print(output)  # [[2, 3], [5, 1]]; if ceil_mode=False, output = [2]
```

```python
import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader

# prepare the dataset
dataset = torchvision.datasets.CIFAR10("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
# load the data in batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        # max pooling layer
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

pixel = Pixel()

writer = SummaryWriter("./logs_maxpool")
step = 0
for data in dataloader:
    imgs, targets = data
    # print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    writer.add_images("input", imgs, step)
    output = pixel(imgs)
    # print(output.shape)  # torch.Size([64, 3, 11, 11])
    writer.add_images("output", output, step)
    step = step + 1

writer.close()
# tensorboard --logdir=logs_maxpool
```

![image.png](./res/img21.png)

## Non-linear Activation

The purpose of non-linear activation: introduce non-linearity into the network.

ReLU activation: $ReLU(x) = max(0, x)$

![image.png](./res/img22.png)

```python
import torch
from torch import nn

input = torch.tensor([[1, -0.5],
                      [-1, 3]])
input = torch.reshape(input, (-1, 1, 2, 2))

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        self.relu1 = nn.ReLU()

    def forward(self, input):
        output = self.relu1(input)
        return output

pixel = Pixel()
output = pixel(input)
print(output)  # [[1, 0], [0, 3]]
```

Sigmoid activation: $Sigmoid(x) = \frac {1}{1 + exp(-x)}$

![image.png](./res/img23.png)

```python
import torch
import torchvision
from torch import nn
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader

# prepare the dataset
dataset = torchvision.datasets.CIFAR10("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
# load the data in batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        # sigmoid activation
        self.sigmoid1 = nn.Sigmoid()

    def forward(self, input):
        output = self.sigmoid1(input)
        return output

pixel = Pixel()

writer = SummaryWriter("./logs_sigmoid")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output = pixel(imgs)
    writer.add_images("output", output, step)
    step = step + 1

writer.close()
# tensorboard --logdir=logs_sigmoid
```
![image.png](./res/img24.png)

## Fully Connected

The purpose of the fully connected layer: **integrate the input feature representation into a single vector and determine the final classification**.

```python
import torch
import torchvision
from torch import nn
from tensorboardX import SummaryWriter
from torch.utils.data import DataLoader

# prepare the dataset
dataset = torchvision.datasets.CIFAR10("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
# load the data in batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        # linear layer
        self.linear1 = nn.Linear(196608, 10)

    def forward(self, input):
        output = self.linear1(input)
        return output

pixel = Pixel()

for data in dataloader:
    imgs, targets = data
    print(imgs.shape)
    # output = torch.reshape(imgs, (1, 1, 1, -1))
    # flatten
    output = torch.flatten(imgs)
    print(output.shape)
    output = pixel(output)
    print(output.shape)
```

## Hands-on Network Building

Taking the CIFAR10 dataset (10 classes) as the example, the network is as follows:

![image.png](./res/img25.png)

- 1 — Convolution layer: 3 input channels, 32 output channels, kernel size 5; padding is 2 to keep the spatial size.
- 2 — Max pooling layer: kernel size 2.
- ......
- 7 — Flatten layer: 64 * 4 * 4 → 1024 * 1 * 1.
- 8 — Fully connected layers: 1024 → 64 → 10.

```python
import torch
from torch import nn

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        # Option 1
        # # convolution layer: 3 input channels, 32 output channels, kernel size 5, padding 2 to keep the size
        # self.conv1 = nn.Conv2d(3, 32, 5, padding=2)
        # # max pooling layer, 2 * 2 kernel
        # self.maxpool1 = nn.MaxPool2d(2)
        # self.conv2 = nn.Conv2d(32, 32, 5, padding=2)
        # self.maxpool2 = nn.MaxPool2d(2)
        # self.conv3 = nn.Conv2d(32, 64, 5, padding=2)
        # self.maxpool3 = nn.MaxPool2d(2)
        # # Flatten layer: 64 * 4 * 4 → 1024 * 1 * 1
        # self.flatten = nn.Flatten()
        # # fully connected layers
        # self.linear1 = nn.Linear(1024, 64)
        # self.linear2 = nn.Linear(64, 10)

        # Option 2
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        # Option 1
        # x = self.conv1(x)
        # x = self.maxpool1(x)
        # x = self.conv2(x)
        # x = self.maxpool2(x)
        # x = self.conv3(x)
        # x = self.maxpool3(x)
        # x = self.flatten(x)
        # x = self.linear1(x)
        # x = self.linear2(x)

        # Option 2
        x = self.model1(x)
        return x

pixel = Pixel()
print(pixel)

# test the network
input = torch.ones(64, 3, 32, 32)
output = pixel(input)
print(output.shape)
```

## Loss Functions and Backpropagation

Loss function: measures the gap between the actual output and the target, and provides the basis for updating the network (backpropagation).

Workflow for using an optimizer in a network:

1. Choose a suitable optimizer, e.g. the stochastic gradient descent (SGD) optimizer, and pass it the network parameters, learning rate, etc.
2. In each optimization step: first zero the gradients (so that gradients from the previous step do not interfere), then backpropagate (compute the gradient at every node), and finally update the parameters.
```python
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader

# load the dataset
dataset = torchvision.datasets.CIFAR10("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=1)

class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model1(x)
        return x

pixel = Pixel()

# loss function
loss = nn.CrossEntropyLoss()
# optimizer → SGD as the example; it updates the parameters from the gradient at each step
optim = torch.optim.SGD(pixel.parameters(), lr=0.01)  # pass the parameters & learning rate

for data in dataloader:
    imgs, targets = data
    outputs = pixel(imgs)  # network output
    # compute the loss
    result_loss = loss(outputs, targets)
    # print(result_loss)
    # zero the gradients → so the previous step's gradients do not interfere
    optim.zero_grad()
    # backpropagation → compute the gradient at every node
    result_loss.backward()
    # update the parameters
    optim.step()

# add an outer loop to compare the effect of optimization
# for epoch in range(20):
#     running_loss = 0.0
#     for data in dataloader:
#         imgs, targets = data
#         outputs = pixel(imgs)  # network output
#         # compute the loss
#         result_loss = loss(outputs, targets)
#         # print(result_loss)
#         # zero the gradients → so the previous step's gradients do not interfere
#         optim.zero_grad()
#         # backpropagation → compute the gradient at every node
#         result_loss.backward()
#         # update the parameters
#         optim.step()
#         running_loss = running_loss + result_loss
#     print(running_loss)
```

## A Complete Model Training Workflow

Taking the CIFAR10 dataset as the example, the classification network is as follows:

![image.png](./res/img37.png)

```python
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader

# prepare the datasets
train_data = torchvision.datasets.CIFAR10(root="./data", train=True, transform=torchvision.transforms.ToTensor(),
                                          download=True)
test_data = torchvision.datasets.CIFAR10(root="./data", train=False, transform=torchvision.transforms.ToTensor(),
                                         download=True)

# dataset lengths
train_data_size = len(train_data)
test_data_size = len(test_data)
print("Size of the training set: {}".format(train_data_size))
print("Size of the test set: {}".format(test_data_size))

# use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# build the network -> 10-class classification network
class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model(x)
        return x

if __name__ == '__main__':
    pixel = Pixel()
    input = torch.ones(64, 3, 32, 32)
    output = pixel(input)
    print(output.shape)
```
```python
import torch
from model import *
from torch import nn
from tensorboardX import SummaryWriter

# create the network model
pixel = Pixel()

# loss function
loss_fn = nn.CrossEntropyLoss()

# optimizer
learning_rate = 1e-2  # learning rate
optimizer = torch.optim.SGD(pixel.parameters(), lr=learning_rate)

# training settings
total_train_step = 0  # number of training steps so far
total_test_step = 0   # number of test rounds so far
epoch = 30            # number of training epochs

# add tensorboard
writer = SummaryWriter("./logs_train")

for i in range(epoch):
    print("---------- Epoch {} starts ----------".format(i + 1))

    # training phase
    pixel.train()  # training mode -> affects certain layers
    for data in train_dataloader:
        imgs, targets = data
        outputs = pixel(imgs)
        loss = loss_fn(outputs, targets)

        # optimize the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward()
        optimizer.step()

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # test phase
    pixel.eval()  # evaluation mode -> affects certain layers
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():
        for data in test_dataloader:
            imgs, targets = data
            outputs = pixel(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy / test_data_size))
    total_test_step = total_test_step + 1
    writer.add_scalar("test_loss", total_test_loss, total_test_step)
    writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)

    # save the model
    torch.save(pixel, "pixel_{}.pth".format(i + 1))
    # torch.save(pixel.state_dict(), "pixel_{}.pth".format(i))
    print("Model saved")

writer.close()

# tensorboard --logdir=logs_train
```

```python
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from tensorboardX import SummaryWriter

# prepare the datasets
train_data = torchvision.datasets.CIFAR10(root="./data", train=True, transform=torchvision.transforms.ToTensor(),
                                          download=True)
test_data = torchvision.datasets.CIFAR10(root="./data", train=False, transform=torchvision.transforms.ToTensor(),
                                         download=True)

# dataset lengths
train_data_size = len(train_data)
test_data_size = len(test_data)
print("Size of the training set: {}".format(train_data_size))
print("Size of the test set: {}".format(test_data_size))

# use DataLoader to load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# build the network -> 10-class classification network
class Pixel(nn.Module):
    def __init__(self):
        super(Pixel, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model(x)
        return x

# choose the training device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("device: {}".format(device))

# create the network model
pixel = Pixel()
pixel = pixel.to(device)

# loss function
loss_fn = nn.CrossEntropyLoss()
loss_fn = loss_fn.to(device)

# optimizer
learning_rate = 1e-2  # learning rate
optimizer = torch.optim.SGD(pixel.parameters(), lr=learning_rate)

# training settings
total_train_step = 0  # number of training steps so far
total_test_step = 0   # number of test rounds so far
epoch = 30            # number of training epochs

# add tensorboard
writer = SummaryWriter("./logs_train")

for i in range(epoch):
    print("---------- Epoch {} starts ----------".format(i + 1))

    # training phase
    pixel.train()  # training mode -> affects certain layers
    for data in train_dataloader:
        imgs, targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)
        outputs = pixel(imgs)
        loss = loss_fn(outputs, targets)

        # optimize the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward()
        optimizer.step()

        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("Training step: {}, Loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # test phase
    pixel.eval()  # evaluation mode -> affects certain layers
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():
        for data in test_dataloader:
            imgs, targets = data
            imgs = imgs.to(device)
            targets = targets.to(device)
            outputs = pixel(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy / test_data_size))
    total_test_step = total_test_step + 1
    writer.add_scalar("test_loss", total_test_loss, total_test_step)
    writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)

    # save the model
    torch.save(pixel, "./pth/pixel_{}.pth".format(i + 1))
    # torch.save(pixel.state_dict(), "pixel_{}.pth".format(i))
    print("Model saved")

writer.close()

# tensorboard --logdir=logs_train
```
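Two saving styles appear above: saving the whole module with `torch.save(pixel, ...)` and saving only the weights with `state_dict()`. A minimal loading sketch for both styles follows; the file names simply mirror the save calls above, and the weights-only file name is hypothetical.

```python
import torch
from model import Pixel  # the Pixel class from model.py above must be importable

# Style 1: the whole module was saved, e.g. torch.save(pixel, "pixel_30.pth").
model = torch.load("pixel_30.pth", map_location="cpu")
model.eval()
print(model(torch.ones(1, 3, 32, 32)).shape)  # torch.Size([1, 10])

# Style 2: only the weights were saved, e.g. torch.save(pixel.state_dict(), "pixel_30_weights.pth").
# model = Pixel()
# model.load_state_dict(torch.load("pixel_30_weights.pth", map_location="cpu"))
# model.eval()
```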
## Classic Image Classification Models

### Dataset Layout

Using a flower classification dataset (5 classes) as the example.

![image.png](./res/img36.png)

```python
import os
from shutil import copy, rmtree
import random

def mk_file(file_path: str):
    if os.path.exists(file_path):
        # if the folder already exists, delete it first and then recreate it
        rmtree(file_path)
    os.makedirs(file_path)

def main():
    # make the random split reproducible
    random.seed(0)

    # move 10% of the dataset into the validation set
    split_rate = 0.1

    # points to the extracted flower_photos folder
    cwd = os.getcwd()
    data_root = os.path.join(cwd, "flower_data")
    origin_flower_path = os.path.join(data_root, "flower_photos")
    assert os.path.exists(origin_flower_path), "path '{}' does not exist.".format(origin_flower_path)

    flower_class = [cla for cla in os.listdir(origin_flower_path)
                    if os.path.isdir(os.path.join(origin_flower_path, cla))]

    # create the folder that holds the training set
    train_root = os.path.join(data_root, "train")
    mk_file(train_root)
    for cla in flower_class:
        # create a folder for each class
        mk_file(os.path.join(train_root, cla))

    # create the folder that holds the validation set
    val_root = os.path.join(data_root, "val")
    mk_file(val_root)
    for cla in flower_class:
        # create a folder for each class
        mk_file(os.path.join(val_root, cla))

    for cla in flower_class:
        cla_path = os.path.join(origin_flower_path, cla)
        images = os.listdir(cla_path)
        num = len(images)
        # randomly sample the images for the validation set
        eval_index = random.sample(images, k=int(num*split_rate))
        for index, image in enumerate(images):
            if image in eval_index:
                # copy files assigned to the validation set into the corresponding folder
                image_path = os.path.join(cla_path, image)
                new_path = os.path.join(val_root, cla)
                copy(image_path, new_path)
            else:
                # copy files assigned to the training set into the corresponding folder
                image_path = os.path.join(cla_path, image)
                new_path = os.path.join(train_root, cla)
                copy(image_path, new_path)
            print("\r[{}] processing [{}/{}]".format(cla, index+1, num), end="")  # processing bar
        print()

    print("processing done!")

if __name__ == '__main__':
    main()
```

### AlexNet

![image.png](./res/img26.png)

- The authors trained on 2 GPUs in parallel, so the upper and lower halves of the AlexNet diagram are identical.
- Conv1: input_size [224, 224, 3], kernels 48 * 2, kernel_size 11, padding [1, 2], stride 4, output_size [55, 55, 96]. (Note: the spatial size after a convolution is $N = (W - F + 2P) / S + 1$, where W is the input size (W * W), F the kernel size, S the stride, and P the padding; e.g. $(224 - 11 + 1 + 2) / 4 + 1 = 55.$ A small helper verifying these numbers follows after this list.)
- MaxPool1: input_size [55, 55, 96], output_size [27, 27, 96], $(55 - 3) / 2 + 1 = 27.$
- Conv2: input_size [27, 27, 96], output_size [27, 27, 256], $(27 - 5 + 4) / 1 + 1 = 27.$
- MaxPool2: input_size [27, 27, 256], output_size [13, 13, 256]
- Conv3: input_size [13, 13, 256], output_size [13, 13, 384]
- Conv4: input_size and output_size both [13, 13, 384]
- Conv5: input_size [13, 13, 384], output_size [13, 13, 256]
- MaxPool3: input_size [13, 13, 256], output_size [6, 6, 256]
- The final layer has 1000 nodes because the original dataset has 1000 classes; in practice this must be changed to match the number of classes in your dataset.
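The size formula above can be checked layer by layer with a tiny helper; this is only an illustrative sketch, with the asymmetric Conv1 padding [1, 2] handled as a total of 3 padded cells per spatial dimension.

```python
def conv_out(w, f, s, pad_total):
    """Output size for one spatial dimension: floor((W - F + total_padding) / S) + 1."""
    return (w - f + pad_total) // s + 1

# AlexNet, following the numbers in the list above:
n = conv_out(224, 11, 4, 1 + 2)   # Conv1, padding [1, 2] → 55
n = conv_out(n, 3, 2, 0)          # MaxPool1 → 27
n = conv_out(n, 5, 1, 2 + 2)      # Conv2, padding 2 → 27
n = conv_out(n, 3, 2, 0)          # MaxPool2 → 13
n = conv_out(n, 3, 1, 1 + 1)      # Conv3 → 13
n = conv_out(n, 3, 1, 1 + 1)      # Conv4 → 13
n = conv_out(n, 3, 1, 1 + 1)      # Conv5 → 13
n = conv_out(n, 3, 2, 0)          # MaxPool3 → 6
print(n)  # 6, so the flattened size is 256 * 6 * 6 for the full two-branch network
          # (128 * 6 * 6 in the half-width implementation below)
```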
```python
# The upper and lower halves of AlexNet are identical, so training only one half also works!
import torch.nn as nn
import torch

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            # ReLU activation
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            # Dropout: randomly deactivates neurons with probability 0.5
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        # initialize the weights if init_weights is True
        if init_weights:
            self._initialize_weights()

    # forward pass of the network
    def forward(self, x):
        x = self.features(x)
        # flatten
        x = torch.flatten(x, start_dim=1)
        # classifier -> the network's prediction
        x = self.classifier(x)
        return x

    # weight initialization
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
```
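As a quick sanity check of the feature-map sizes annotated above, the model can be run on a dummy 224 * 224 input, mirroring the `torch.ones(...)` test used earlier for the CIFAR10 network. The sketch assumes the class above is saved as `model.py`, as in the training script that follows.

```python
import torch
from model import AlexNet  # the AlexNet class defined above

net = AlexNet(num_classes=5, init_weights=True)
x = torch.ones(1, 3, 224, 224)   # one dummy RGB image, 224 * 224
features = net.features(x)
print(features.shape)            # torch.Size([1, 128, 6, 6]) -> flattened to 128 * 6 * 6 = 4608
print(net(x).shape)              # torch.Size([1, 5])
```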
```python
# directory layout
'''
- data_set
  -- flower_data
    --- flower_photos
    --- train
    --- val
- AlexNet
  -- model.py
  -- train.py
'''
import os
import sys
import json
import time

import torch
import torch.nn as nn
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from tqdm import tqdm

from model import AlexNet

def main():
    # choose the training device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        # training-set pipeline: random crop (224 * 224) -> random horizontal flip -> ToTensor -> normalization
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        # validation-set pipeline: simple resize (224 * 224) -> ToTensor -> normalization
        "val": transforms.Compose([transforms.Resize((224, 224)),  # cannot 224, must (224, 224)
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    # may need to be adjusted for your setup
    data_root = os.path.abspath(os.path.join(os.getcwd(), "./.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    # training set
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # swap keys and values -> the returned index is the class
    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    # load the training set -> num_workers: number of data-loading worker threads (must be 0 on Windows)
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=0)

    # validation set
    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=0)

    print("using {} images for training, {} images for validation.".format(train_num, val_num))

    # network initialization: num_classes -> number of classes, init_weights -> whether to initialize weights
    net = AlexNet(num_classes=5, init_weights=True)
    # move the network to the device
    net.to(device)
    # loss function -> cross-entropy loss for multi-class classification
    loss_function = nn.CrossEntropyLoss()
    # pata = list(net.parameters())
    # optimizer -> Adam
    optimizer = optim.Adam(net.parameters(), lr=0.0002)

    # path for saving the weights
    save_path = './AlexNet.pth'
    # total number of epochs
    epochs = 10
    # best accuracy so far
    best_acc = 0.0

    # start training
    for epoch in range(epochs):
        # train phase: net.train() -> "enables" Dropout in the network
        net.train()
        # accumulate the average training loss of this epoch
        running_loss = 0.0
        t1 = time.perf_counter()
        train_steps = len(train_loader)
        for step, data in enumerate(train_loader, start=0):
            images, labels = data
            # clear the gradients
            optimizer.zero_grad()
            # move the training images and labels to the device
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            # backpropagate the training loss
            loss.backward()
            # update the parameters of every node via the optimizer
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            # print train process
            rate = (step + 1) / len(train_loader)
            a = "*" * int(rate * 50)
            b = "." * int((1 - rate) * 50)
            print("\rtrain loss: {:^3.0f}%[{}->{}]{:.3f}".format(int(rate * 100), a, b, loss), end="")
        print()
        print(time.perf_counter() - t1)

        # validation phase: net.eval() -> "disables" Dropout in the network
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        # no_grad() -> stop pytorch from tracking parameters; no loss gradients during validation
        with torch.no_grad():
            for val_data in validate_loader:
                val_images, val_labels = val_data
                # move the images to the device
                outputs = net(val_images.to(device))
                # the index of the maximum output is the prediction
                predict_y = torch.max(outputs, dim=1)[1]
                acc += (predict_y == val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        # if this epoch's accuracy beats the best so far, save the weights
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

    print('Finished Training')

if __name__ == '__main__':
    main()
```

### VGGNet

![image.png](./res/img27.png)

- VGG replaces large convolution kernels by stacking multiple 3 * 3 kernels. (See the short parameter comparison after the model code below.)
- VGG16 is usually sufficient in practice.
- The final layer has 1000 nodes because the original dataset has 1000 classes; in practice this must be changed to match the number of classes in your dataset.

```python
import torch.nn as nn
import torch

# official pretrain weights
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
}

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

def make_features(cfg: list):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)

cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

def vgg(model_name="vgg16", **kwargs):
    assert model_name in cfgs, "Warning: model number {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]

    model = VGG(make_features(cfg), **kwargs)
    return model
```
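The claim that stacked 3 * 3 kernels replace larger kernels can be made concrete with a quick parameter count; the sketch below is illustrative only and ignores biases. Two stacked 3 * 3 convolutions cover the same 5 * 5 receptive field as one 5 * 5 convolution, but with fewer parameters (for C input and output channels, 18C² vs 25C²).

```python
# Parameters of a conv layer (ignoring bias): kernel_h * kernel_w * in_channels * out_channels
C = 512
one_5x5 = 5 * 5 * C * C            # a single 5x5 convolution
two_3x3 = 2 * (3 * 3 * C * C)      # two stacked 3x3 convolutions, same 5x5 receptive field
print(one_5x5, two_3x3)            # 6553600 4718592 -> the stacked version is smaller
```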
```python
# directory layout
'''
- data_set
  -- flower_data
    --- flower_photos
    --- train
    --- val
- VGG16
  -- model.py
  -- train.py
'''
import os
import sys
import json

import torch
import torch.nn as nn
from torchvision import transforms, datasets
import torch.optim as optim
from tqdm import tqdm

from model import vgg

def main():
    # choose device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        # training-set pipeline
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        # validation-set pipeline
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "./.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 32
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
    print("using {} images for training, {} images for validation.".format(train_num, val_num))

    # test_data_iter = iter(validate_loader)
    # test_image, test_label = test_data_iter.next()

    model_name = "vgg16"
    # network initialization
    net = vgg(model_name=model_name, num_classes=5, init_weights=True)
    net.to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 30
    best_acc = 0.0
    save_path = './{}Net.pth'.format(model_name)
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            outputs = net(images.to(device))
            loss = loss_function(outputs, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1, epochs, loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')

if __name__ == '__main__':
    main()
```

### GoogLeNet

![image.png](./res/img28.png)

- Introduced the Inception module -> fuses feature information at different scales
- Uses 1 * 1 convolution kernels for dimensionality reduction
- Adds 2 auxiliary classifiers
- Has relatively few parameters, about $\frac{1}{20}$ of VGGNet's.

### ResNet

- Extremely deep network structures → beyond 1000 layers
- Proposed the residual module
- Uses Batch Normalization to speed up training (dropping dropout)

Transfer learning ("transferring what a shallower network has learned into a deeper network") has clear advantages: it reaches good results quickly, and it can still reach good results when the dataset is small. (A minimal fine-tuning sketch follows below.)

![image.png](./res/img29.png)
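A minimal transfer-learning sketch, assuming torchvision's bundled ResNet-34 ImageNet weights and the 5-class flower dataset used above: load the pretrained backbone, then replace the final fully connected layer to match the number of classes.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-34 with ImageNet-pretrained weights (older torchvision API shown;
# newer versions use the `weights=` argument instead of `pretrained=True`).
net = models.resnet34(pretrained=True)

# Replace the final fully connected layer: 1000 ImageNet classes -> 5 flower classes.
in_features = net.fc.in_features
net.fc = nn.Linear(in_features, 5)

# Optionally freeze the pretrained backbone and train only the new head.
for name, param in net.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

optimizer = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=0.0001)
print(net(torch.ones(1, 3, 224, 224)).shape)  # torch.Size([1, 5])
```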
### MobileNet

![image.png](./res/img30.png)

MobileNet v1

- A lightweight CNN: a small drop in accuracy in exchange for much better efficiency.

![image.png](./res/img31.png)

![image.png](./res/img32.png)

- Depthwise convolution -> reduces computation and parameter count
- Adds the hyperparameters α and β
- On ImageNet, MobileNet v1 loses 0.9% accuracy compared with VGG16, but its parameter count is about $\frac{1}{32}$ of VGG's.

![image.png](./res/img33.png)

MobileNet v2 performance: higher accuracy and a smaller model than MobileNet v1.

![image.png](./res/img34.png)

- Inverted Residuals -> the inverted residual block
- Linear Bottlenecks

MobileNet v3: updates the block (bneck); uses NAS to search parameters; redesigns the time-consuming layers.

### Converting a .pth model to a .pt file and deploying it in C++ with libtorch

```python
'''
Using the AlexNet model as the example -> adjust before use:
    num_classes   -> number of classes in the dataset
    img_path      -> path to the test image
    weights_path  -> path to the .pth weights file
    traced_script_module.save -> name of the output .pt weights file
'''
import os

import torch
from PIL import Image
from torchvision import transforms

from model import AlexNet

def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    # create model
    model = AlexNet(num_classes=5).to(device)

    img_path = r'./tulip.jpg'
    image = Image.open(img_path).convert('RGB')
    data_transform = transforms.Compose(
        [transforms.Resize((224, 224)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    img = data_transform(image)
    img = img.unsqueeze(dim=0)
    print(img.shape)

    # load model weights
    weights_path = "./AlexNet.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)

    testsize = 224

    if torch.cuda.is_available():
        modelState = torch.load(weights_path, map_location='cuda')
        model.load_state_dict(modelState, strict=False)
        model = model.cuda()
        model = model.eval()
        # An example input you would normally provide to your model's forward() method.
        example = torch.rand(1, 3, testsize, testsize)
        example = example.cuda()

        traced_script_module = torch.jit.trace(model, example)
        output = traced_script_module(img.cuda())
        print(output.shape)
        pred = torch.argmax(output, dim=1)
        print(pred)
        traced_script_module.save('./AlexNet_cuda.pt')
    else:
        modelState = torch.load(weights_path, map_location='cpu')
        model.load_state_dict(modelState, strict=False)
        model = model.eval()  # disable dropout before tracing
        example = torch.rand(1, 3, testsize, testsize)
        example = example.cpu()

        traced_script_module = torch.jit.trace(model, example)
        output = traced_script_module(img.cpu())
        print(output.shape)
        pred = torch.argmax(output, dim=1)
        print(pred)
        traced_script_module.save('./AlexNet_cpu.pt')

if __name__ == '__main__':
    main()
```
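Before moving to C++, the traced file can be sanity-checked from Python with `torch.jit.load`; this is an illustrative sketch, and the file name simply matches the one saved above.

```python
import torch

# Load the traced module; the original AlexNet class definition is not needed.
module = torch.jit.load("./AlexNet_cpu.pt", map_location="cpu")
module.eval()

dummy = torch.rand(1, 3, 224, 224)
output = module(dummy)
print(output.shape, torch.argmax(output, dim=1))  # torch.Size([1, 5]) and a class index
```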
- The libtorch builds provided on the PyTorch website target the MSVC compiler, so using MinGW is rather cumbersome.
- Steps to set up libtorch in Visual Studio: VC++ Directories -> Include Directories, e.g. E:\VariousTools\libtorch\include; VC++ Directories -> Library Directories, e.g. E:\VariousTools\libtorch\lib; Linker -> Input -> Additional Dependencies, e.g. asmjit.lib c10.lib c10_cuda.lib caffe2_nvrtc.lib clog.lib and the other required libraries.
- The libtorch lib directory can be added to the environment variables to avoid "missing .dll" problems.

```cpp
// Calling the .pt file from C++ to classify an image, using AlexNet_cuda.pt as the example
#include <iostream>
#include <string>
#include "torch/torch.h"
#include "torch/script.h"
#include "opencv2/core.hpp"
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/imgcodecs.hpp"

using namespace std;

// data_set types
string classList[5] = { "daisy", "dandelion", "rose", "sunflower", "tulip" };
// the path of the image to be tested
string image_path = "E://VSCodeFiles//pytorch_classify//AlexNet//tulip.jpg";

int main()
{
    // loading model weight
    torch::jit::script::Module module;
    // the path of .pt file
    module = torch::jit::load("E://VSCodeFiles//pytorch_classify//AlexNet//AlexNet_cuda.pt");
    module.eval();
    // module.to(at::kCPU);
    module.to(at::kCUDA);

    // loading image & adjusting its format
    auto image = cv::imread(image_path, cv::IMREAD_COLOR);
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
    cv::Mat image_transfomed = cv::Mat(cv::Size(224, 224), image.type());
    cv::resize(image, image_transfomed, cv::Size(224, 224));

    // transform to Tensor
    torch::Tensor tensor_image = torch::from_blob(image_transfomed.data,
        { image_transfomed.rows, image_transfomed.cols, 3 }, torch::kByte);
    // channel transform: {H,W,C} -> {C,H,W}
    tensor_image = tensor_image.permute({ 2,0,1 });
    // transform data type
    tensor_image = tensor_image.toType(torch::kFloat);
    auto tensor_image_Tmp = torch::autograd::make_variable(tensor_image);
    // scale to [0, 1] by dividing by 255
    tensor_image = tensor_image.div(255);
    // standardize each channel
    tensor_image[0] = (tensor_image[0] - 0.5) / 0.5;
    tensor_image[1] = (tensor_image[1] - 0.5) / 0.5;
    tensor_image[2] = (tensor_image[2] - 0.5) / 0.5;
    // add the batch dimension
    tensor_image = tensor_image.unsqueeze(0);
    // tensor_image = tensor_image.to(at::kCPU);
    tensor_image = tensor_image.to(at::kCUDA);

    // predict
    at::Tensor output = module.forward({ tensor_image }).toTensor();
    cout << "output -> " << output << endl;
    auto prediction = output.argmax(1);
    int pre = prediction.item<int>();
    string result = classList[pre];
    cout << "the type of test image is: " << result << endl;

    return 0;
}
```

## Deploying PyTorch Models

![image.png](./res/img35.png)

ONNX (Open Neural Network Exchange)

- An open, general-purpose intermediate format for machine learning, initiated jointly by Microsoft, Facebook, Amazon, and IBM
- Compatible with a wide range of deep learning frameworks
- Compatible with a wide range of inference engines
- Compatible with a wide range of end devices
- Compatible with a wide range of operating systems

[ONNX Runtime website](https://onnxruntime.ai/)
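As a starting point for the ONNX route, a model such as the AlexNet above can be exported with `torch.onnx.export`; this is an illustrative sketch, and the input/output file names are assumptions.

```python
import torch
from model import AlexNet  # the AlexNet class defined earlier

model = AlexNet(num_classes=5)
model.load_state_dict(torch.load("./AlexNet.pth", map_location="cpu"))
model.eval()

# Export with a dummy input of the shape the network expects.
dummy = torch.rand(1, 3, 224, 224)
torch.onnx.export(model, dummy, "AlexNet.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)
# The resulting AlexNet.onnx can then be run with ONNX Runtime or another inference engine.
```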