diff --git a/community/cv/Resunet&Vggunet/README_CN.md b/community/cv/Resunet&Vggunet/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..ee856c3592f5129018870390d9e44dd793e0aeb5
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/README_CN.md
@@ -0,0 +1,225 @@

# 目录

- [目录](#目录)
- [项目说明](#项目说明)
    - [项目简介](#项目简介)
    - [项目意义](#项目意义)
    - [语义分割简介](#语义分割简介)
- [模型说明](#模型说明)
- [数据集格式](#数据集格式)
- [快速入门](#快速入门)
- [文件说明](#文件说明)
    - [脚本及代码](#脚本及代码)
- [训练过程](#训练过程)
- [推理过程](#推理过程)
- [数字图像处理过程](#数字图像处理过程)
- [性能](#性能)

# [项目说明](#目录)

## [项目简介](#目录)

**本项目已在Ai Gallery上架notebook脚本,可以查阅[代码](https://pangu.huaweicloud.com/gallery/asset-detail.html?id=bdbaa83e-a8d1-4665-9c61-50333f4985a4)来进行体验。**

隧道裂缝是影响铁路交通安全的重要病害之一,其不仅会削弱隧道结构的完整性,还可能引发严重的运营风险。为确保铁路隧道的安全性,针对裂缝的早期检测与精确定位显得尤为重要。然而,隧道裂缝的检测面临诸多挑战:图像中普遍存在的低对比度、光照不均匀、噪声污染等问题,导致传统的检测方法难以准确识别微小的裂缝特征,尤其是宽度仅在 1-5mm 范围内的裂缝。因此,开发一个高精度的实例分割模型来自动化识别和分割这些裂缝,是提升隧道病害监测效率和精度的关键。

## [项目意义](#目录)

本项目通过构建基于 MindSpore 框架的实例分割模型,解决了隧道裂缝检测中的技术难点。利用先进的深度学习模型,如 Res-UNet 和 VGG-UNet,在复杂环境下准确定位并分割微小裂缝区域,不仅显著提升了检测精度,还降低了人工检测的工作量与误差。该模型的应用能够及时发现潜在的结构问题,确保隧道的安全运行,进而保障铁路交通的稳定性与安全性。项目还通过创新性的图像后处理技术,进一步精确识别裂缝区域,确保裂缝检测结果具备更高的可靠性与适用性。

## [语义分割简介](#目录)

在介绍模型之前,首先需要明确何为语义分割:识别隧道中的裂缝可以抽象为一个图像语义分割任务。语义分割(semantic segmentation)是图像处理和机器视觉领域中一个重要的任务,旨在对图像进行全面的理解。具体而言,语义分割将图像中的每一个像素进行分类,赋予其特定的标签。这一过程在人工智能(AI)领域中占据了重要地位,广泛应用于人脸识别、物体检测、医学影像分析、卫星图像处理、自动驾驶感知等多个领域。

与传统的分类任务仅输出一个类别不同,语义分割任务的输出图像与输入图像的尺寸相同。换句话说,输出图像的每个像素都与输入图像中的相应像素一一对应,并标注其类别。这意味着语义分割不仅关注整体物体的识别,还能够细致到每个像素的分类,从而提供更丰富的场景理解。

在图像处理领域,"语义"指的是图像内容的深层理解,反映了对图像所承载信息的全面把握。下图展示了一些语义分割的示例,直观地表明了不同类别在图像中的具体表现。通过这些示例,我们可以更清晰地理解语义分割的实际应用和效果。

![segment_example.png](https://tunnelcrack.obs.cn-north-4.myhuaweicloud.com/ipynb_img/segment_example.png)

# [模型说明](#目录)

**该部分介绍本项目所构建的网络模型结构。**

- U-Net说明

    > *U-Net: Convolutional Networks for Biomedical Image Segmentation*

    U-Net 是一种经典的语义分割模型,最初用于医学图像分析。其网络架构采用 Encoder-Decoder 结构,左侧的 Encoder 部分用于提取图像特征,右侧的 Decoder 部分通过上采样重构图像。自 2015 年提出以来,U-Net 衍生出了多种改进版本,如 Res-UNet、UNet 3+ 等。U-Net 的网络结构使其在处理低对比度和噪声较大的图像时具备良好的表现,因此在本项目中得到采用。网络结构如下图所示:

    ![Unet_structure.png](https://tunnelcrack.obs.cn-north-4.myhuaweicloud.com/ipynb_img/Unet_structure.png)

- Res-UNet说明

    > *Weighted Res-UNet for High-Quality Retina Vessel Segmentation*

    Res-UNet 是在 U-Net 基础上引入残差连接的改进模型,广泛用于视网膜血管分割等任务。该模型通过将 U-Net 中的每个子模块替换为包含残差连接的模块,增强了网络对深层特征的学习能力。在本项目中,借鉴 ResNet、Attention 与 U-Net 相结合的思路,提升了模型在复杂场景下的分割精度。详细的网络结构如下图所示:

    ![ResUnet_structure.png](https://tunnelcrack.obs.cn-north-4.myhuaweicloud.com/ipynb_img/ResUnet_structure.png)

- Vgg-UNet说明

    Vgg-UNet 将 U-Net 的 Encoder 部分替换为 VGG-16 网络结构,进一步提升特征提取的效果。VGG-16 的输入为 224×224 的 RGB 图像,输出为 1000 个分类预测值,主要用于大规模图像分类任务。VGG-16 通过全连接层的卷积化处理,保留了更多的空间信息,使其能够生成输入图像的二维热力图。该模型在裂缝检测中有助于精确定位裂缝位置,并提高了分割结果的空间分辨率。下图展示了 VGG-16 的网络结构:

    ![vgg_structure.png](https://tunnelcrack.obs.cn-north-4.myhuaweicloud.com/ipynb_img/vgg_structure.png)

# [数据集格式](#目录)

本项目采用一种 Multi-Class 数据集格式,通过固定的目录结构获取图片和对应的标签数据:每个样本目录中同时保存原始图片及对应标签,图片名为 "image.png",标签名为 "mask.png";目录根据 train 和 val 文件夹区分训练集和验证集。目录结构如下:

```text
.
└─dataset
    └─train
        └─0001
            ├─image.png
            └─mask.png
        ...
        └─xxxx
            ├─image.png
            └─mask.png
    └─val
        └─0001
            ├─image.png
            └─mask.png
        ...
        └─xxxx
            ├─image.png
            └─mask.png
```

# [快速入门](#目录)

通过官方网站安装MindSpore后,还需要安装如下依赖包:opencv-python、scikit-image、matplotlib,可通过以下命令进行安装:

```bash
pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install scikit-image -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple
```
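
正式训练之前,可以先确认数据集目录符合[数据集格式](#数据集格式)中约定的结构。下面是一个示意性的检查脚本(非本项目自带,仅供参考,其中 `check_dataset` 及数据集根目录 `dataset/` 均为假设):

```python
import os


def check_dataset(root="dataset"):
    """粗略检查数据集是否符合 train/val + image.png/mask.png 的目录约定"""
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        assert os.path.isdir(split_dir), f"缺少目录: {split_dir}"
        for sample in sorted(os.listdir(split_dir)):
            sample_dir = os.path.join(split_dir, sample)
            if not os.path.isdir(sample_dir):
                continue
            for name in ("image.png", "mask.png"):
                file_path = os.path.join(sample_dir, name)
                assert os.path.isfile(file_path), f"缺少文件: {file_path}"
    print("数据集目录结构检查通过")


if __name__ == "__main__":
    check_dataset("dataset")
```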

```bash
# 通过Python在Ascend或GPU上进行训练
python train.py \
    --data_dir="xxx/dataset" \            # 按照如上定义的数据集格式的文件夹
    --epoch_size=100 \
    --weight_decay_param=0.0005 \
    --lr_param=0.0001 \
    --who_to_train="VggUnet" \            # 选择训练哪个模型,"ResUnet"或"VggUnet"
    --batch_size=16 \
    --repeat=4 \                          # 数据集中每张照片的重复次数
    --is_resume=True \                    # 是否为断点续训
    --ckpt_path="xxx/xxx.ckpt"            # 非必选参数,若为断点续训模式,以该文件路径读入.ckpt文件
```

```bash
# 通过Python命令在Ascend或GPU上运行推理
python infer.py \
    --image_path="xxx/xxx.png" \          # 需要进行推理的照片
    --res_ckpt_path="xxx/xxx.ckpt" \      # 以该文件路径读入ResUnet的ckpt文件
    --vgg_ckpt_path="xxx/xxx.ckpt"        # 以该文件路径读入VggUnet的ckpt文件
```

```bash
# 通过Python命令进行数字图像处理流程(进行推理结果后处理)
python post_processing.py \
    --image_path="xxx/xxx.png" \          # 需要进行推理的照片
    --segmentation_path="xxx/xxx.png"     # 该图片对应的兴趣区域蒙版(即上一步的推理结果)
```

# [文件说明](#目录)

## [脚本及代码](#目录)

```text
├── cv
    ├── resunet&vggunet
        ├── nets
        │   ├──resnet.py                  // resnet 网络结构
        │   ├──unet.py                    // unet 网络结构
        │   ├──vgg.py                     // vgg16 网络结构
        ├── utils
        │   ├──data_loader.py             // 加载数据集工具
        │   ├──infer_preprocess_image.py  // 模型推理载入图片工具
        │   ├──xdog.py                    // 利用xdog算法检测边缘信息工具
        ├── infer.py                      // 模型推理脚本
        ├── post_processing.py            // 数字图像处理流程(进行推理结果后处理)脚本
        ├── README_CN.md
        ├── train.py                      // 模型训练脚本
```

# [训练过程](#目录)

通过Python在Ascend或GPU上进行训练,训练数据集的格式如[数据集格式](#数据集格式)所示。利用train.py文件进行训练,其中主要的参数有:

```text
  --data_dir              训练数据集目录
  --epoch_size            模型训练最大轮次
  --weight_decay_param    权重衰减因子
  --lr_param              学习率
  --who_to_train          选择本次训练哪个模型,可选值:"ResUnet"或"VggUnet"
  --batch_size            训练的批处理大小
  --repeat                数据集中每张照片的重复次数
  --is_resume             是否为断点续训,可选值:True或False。若指定为True,则需要同时指定ckpt_path
  --ckpt_path             非必选参数,若为断点续训模式,以该文件路径读入.ckpt文件
```

# [推理过程](#目录)

本项目对输入图像进行预处理后,分别使用 ResUnet 和 VggUnet 进行推理。随后,将两个模型的推理结果进行集成,采用取均值的方式将二者的结果结合。最后,通过线性插值的方法,将集成后的推理结果恢复到原始图片的尺寸。

按照以下方式,同时传入训练好的ResUnet的ckpt文件和VggUnet的ckpt文件来进行推理,最终会在项目文件夹下得到推理结果"pr_mask.png"。

```bash
python infer.py \
    --image_path="xxx/xxx.png" \          # 需要进行推理的照片
    --res_ckpt_path="xxx/xxx.ckpt" \      # 以该文件路径读入ResUnet的ckpt文件
    --vgg_ckpt_path="xxx/xxx.ckpt"        # 以该文件路径读入VggUnet的ckpt文件
```
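
推理中"均值集成 + 线性插值恢复尺寸"的核心思路可以用下面的简化片段示意(仅为示意代码,完整实现见 infer.py 中的 infer_and_segment 函数;两个输入数组为假设的模型输出):

```python
import cv2
import numpy as np

# 假设 res_pred 与 vgg_pred 分别是两个模型经 softmax 后的概率图,形状均为 (512, 512, 2)
res_pred = np.random.rand(512, 512, 2).astype(np.float32)
vgg_pred = np.random.rand(512, 512, 2).astype(np.float32)

# 取均值集成两个模型的推理结果
ensemble = (res_pred + vgg_pred) / 2

# 通过线性插值把集成结果恢复到原始图片尺寸(此处宽高为示例值)
original_width, original_height = 1024, 768
ensemble = cv2.resize(
    ensemble, (original_width, original_height),
    interpolation=cv2.INTER_LINEAR)

# 每个像素取概率最大的类别,得到最终的二值 mask
mask = ensemble.argmax(axis=-1).astype(np.uint8) * 255
```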

# [数字图像处理过程](#目录)

该部分旨在对裂缝区域进行精确识别:利用图像处理技术,结合上述模型推理结果,得到细化的裂缝信息,一共包括3个阶段:

1. 图像后处理与边缘检测:对推理结果进行进一步处理,以增强图像的边缘特征。
2. 区域分析与噪声去除:对图像进行区域分析,识别出裂缝区域,并去除可能的噪声干扰。
3. 裂缝边缘信息叠加:将检测到的裂缝边缘信息叠加至原始图像中,以便更直观地展示裂缝的具体位置和形状。

按照以下方式,同时传入推理得到的兴趣区域及对应的原始照片,最终会在项目文件夹下得到处理细化后的结果"result.png"。

```bash
python post_processing.py \
    --image_path="xxx/xxx.png" \          # 需要进行推理的照片
    --segmentation_path="xxx/xxx.png"     # 该图片对应的兴趣区域蒙版(即上一步的推理结果)
```

## 图像后处理&边缘检测

该部分先对图像进行一定的增强,再利用 XDoG (Extended Difference of Gaussians) 算法对图像进行边缘检测,然后通过阈值对边缘图进行二值化,最后对二值化结果取反色,方便后续的噪声去除操作。

## 区域分析&噪声去除

该部分使用 skimage 的 regionprops 对图像进行连通区域分析,结合兴趣区域(上文模型预测的mask部分),依据面积、长宽比、圆度、离心率等几何特征去除图像中非裂缝的噪声,同时使用形态学闭操作,来保证裂缝的连通性。

# [性能](#目录)

**本组训练使用自制数据集,格式同上述 Multi-Class 数据集格式,其中包含实拍隧道裂缝图片及标注的兴趣区域mask。模型训练基于启智平台完成。**

## 评估性能

| 参数          | ResUnet                                                        | VggUnet                                                         |
|---------------|----------------------------------------------------------------|-----------------------------------------------------------------|
| 资源          | Ascend-D910B; CPU: 20核, 内存: 60GB                            | Ascend-D910B; CPU: 20核, 内存: 60GB                             |
| MindSpore版本 | MindSpore 2.2.14                                               | MindSpore 2.2.14                                                |
| 数据集        | 3192张图                                                       | 3192张图                                                        |
| 训练参数      | epoch=300, batch_size=16, lr=0.02, weight_decay=1e-3, repeat=6 | epoch=300, batch_size=16, lr=0.025, weight_decay=1e-3, repeat=6 |
| 优化器        | Adam                                                           | Adam                                                            |
| 损失函数      | SoftmaxCrossEntropyWithLogits                                  | SoftmaxCrossEntropyWithLogits                                   |
| 输出          | 与原始图像尺寸相同的mask                                       | 与原始图像尺寸相同的mask                                        |
| 损失          | 83                                                             | 101.970097                                                      |
| 速度          | 约56FPS                                                        | 约37FPS                                                         |
| 精度          | mIoU 83%                                                       | mIoU 75%                                                        |
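
表中的 mIoU(平均交并比)是二分类分割的常用指标,其含义可用如下示意代码说明(非本项目自带的评估脚本,仅演示计算方式,输入为随机生成的示例数组):

```python
import numpy as np


def miou_binary(pred, label):
    """对取值为 {0, 1} 的预测与标签,计算背景/裂缝两类 IoU 的平均值"""
    ious = []
    for cls in (0, 1):
        inter = np.logical_and(pred == cls, label == cls).sum()
        union = np.logical_or(pred == cls, label == cls).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


pred = np.random.randint(0, 2, (512, 512))
label = np.random.randint(0, 2, (512, 512))
print("mIoU: {:.4f}".format(miou_binary(pred, label)))
```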

diff --git a/community/cv/Resunet&Vggunet/infer.py b/community/cv/Resunet&Vggunet/infer.py
new file mode 100644
index 0000000000000000000000000000000000000000..3fca7db228e7fcedfc9709358865450a65b09064
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/infer.py
@@ -0,0 +1,144 @@
"""
    vgg&res unet infer
"""

import argparse
import cv2
import numpy as np
from PIL import Image
from matplotlib import pyplot as plt
from nets.unet import Unet as unet
import mindspore as ms
from utils.infer_preprocess_image import preprocess_input


def load_models(res_ckpt_path, vgg_ckpt_path):
    """
    从 .ckpt 文件中加载预训练的模型

    Args:
        res_ckpt_path (str): 包含 res-unet 网络参数的.ckpt 文件路径
        vgg_ckpt_path (str): 包含 vgg-unet 网络参数的.ckpt 文件路径

    Returns:
        ms.Model: 加载预训练参数的 res-unet 网络
        ms.Model: 加载预训练参数的 vgg-unet 网络
    """
    res_unet = unet(
        pretrained=False,
        num_classes=2,
        backbone='resnet50').set_train(False)  # 构建 res-unet 网络
    vgg_unet = unet(
        pretrained=False,
        num_classes=2,
        backbone='vgg').set_train(False)  # 构建 vgg-unet 网络

    # 加载预训练参数到网络
    ms.load_checkpoint(res_ckpt_path, res_unet)
    ms.load_checkpoint(vgg_ckpt_path, vgg_unet)

    return res_unet, vgg_unet


def infer_and_segment(image, resunet, vggunet, target_size=(512, 512)):
    """
    用 ResUnet 和 VggUnet 模型进行图像分割,对推理结果进行集成,并修改推理结果尺寸到原图大小

    Args:
        image (PIL.Image): 输入图像
        resunet (ms.Model): 加载预训练参数的 res-unet 网络
        vggunet (ms.Model): 加载预训练参数的 vgg-unet 网络
        target_size (tuple): 以(宽度,高度)形式给出图片要变换到的目标尺寸

    Returns:
        np.array: 均值集成后的推理结果
    """
    colors = [
        (0, 0, 0),
        (255, 255, 255),
    ]

    # 进行推理前图片预处理
    original_width, original_height = image.size
    image_data, new_width, new_height = preprocess_input(image, target_size)

    # 将推理照片转化为[N,C,H,W]格式
    image_data = np.expand_dims(
        np.transpose(
            np.array(
                image_data, np.float32), (2, 0, 1)), 0)

    # 获得推理结果
    resunet_output = resunet(ms.Tensor(image_data))[0]
    vggunet_output = vggunet(ms.Tensor(image_data))[0]

    # 对推理结果进行softmax
    resunet_pred = ms.ops.softmax(
        resunet_output.permute(
            1, 2, 0), axis=-1).asnumpy()
    vggunet_pred = ms.ops.softmax(
        vggunet_output.permute(
            1, 2, 0), axis=-1).asnumpy()

    # 对推理结果进行均值集成
    final_prediction = (resunet_pred + vggunet_pred) / 2

    # 恢复推理结果mask到原始图像大小:先裁掉预处理时填充的灰边
    final_prediction = final_prediction[
        int((target_size[1] - new_height) // 2): int((target_size[1] - new_height) // 2 + new_height),
        int((target_size[0] - new_width) // 2): int((target_size[0] - new_width) // 2 + new_width)
    ]
    final_prediction = cv2.resize(
        final_prediction,
        (original_width,
         original_height),
        interpolation=cv2.INTER_LINEAR)
    # 对预测结果的两类进行argmax(获得每个像素点属于哪一类)
    final_prediction = final_prediction.argmax(axis=-1)
    segmented_img = np.reshape(
        np.array(colors, np.uint8)[np.reshape(final_prediction, [-1])],
        (original_height, original_width, -1)
    )

    plt.imshow(segmented_img, cmap="gray")
    plt.title("predicted segmentation mask")
    plt.show()

    return segmented_img


def main(image_path, res_ckpt_path, vgg_ckpt_path):
    original_image = Image.open(image_path)

    ResUnet, VggUnet = load_models(
        res_ckpt_path=res_ckpt_path, vgg_ckpt_path=vgg_ckpt_path)
    Pr_Final = infer_and_segment(original_image, ResUnet, VggUnet)
    segmentation_result = Image.fromarray(Pr_Final)
    segmentation_result.save("pr_mask.png")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Image Segmentation with ResUNet and VggUnet')
    parser.add_argument(
        '--image_path',
        type=str,
        required=True,
        help='Path to the input image')
    parser.add_argument(
        '--res_ckpt_path',
        type=str,
        required=True,
        help='Path to the ResUNet checkpoint file')
    parser.add_argument(
        '--vgg_ckpt_path',
        type=str,
        required=True,
        help='Path to the VGG UNet checkpoint file')

    args = parser.parse_args()

    main(
        args.image_path,
        args.res_ckpt_path,
        args.vgg_ckpt_path)
diff --git a/community/cv/Resunet&Vggunet/nets/resnet.py b/community/cv/Resunet&Vggunet/nets/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..29c10f7dafa0ae77acdaab54ef1070478cfd9e51
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/nets/resnet.py
@@ -0,0 +1,252 @@
"""
    resnet model structure
"""
import mindspore.nn as nn


def conv3x3(in_channel, out_channel, stride=1, groups=1, dilation=1):
    return nn.Conv2d(
        in_channels=in_channel,
        out_channels=out_channel,
        kernel_size=3,
        stride=stride,
        padding=dilation,
        group=groups,
        has_bias=False,
        pad_mode='pad',
        dilation=dilation)


def conv1x1(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride,
                     padding=0, pad_mode='pad', has_bias=False)


class BasicBlock(nn.Cell):
    """
    BasicBlock class
    """
    expansion = 1

    def __init__(
            self,
            in_channel,
            out_channel,
            stride=1,
            downsample=None,
            groups=1,
            base_width=64,
            dilation=1,
            norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError(
                'BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError(
                "Dilation > 1 not supported in BasicBlock")
        self.conv1 = conv3x3(in_channel, out_channel, stride)
        self.bn1 = norm_layer(
            num_features=out_channel,
            eps=1e-5,
            momentum=0.1,
            gamma_init=1,
            beta_init=0,
            moving_mean_init=0,
            moving_var_init=1)
        self.relu = nn.ReLU()
        # 第二个 3x3 卷积的输入通道是 conv1 的输出通道 out_channel
        self.conv2 = conv3x3(out_channel, out_channel)
        self.bn2 = norm_layer(out_channel)
        self.downsample = downsample
        self.stride = stride

    def construct(self, x):
        """
        BasicBlock construct
        """
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            
identity = self.downsample(x) + + out += identity + out = self.relu(out) + + return out + + +class Bottleneck(nn.Cell): + """ + Bottleneck class + """ + expansion = 4 + + def __init__( + self, + in_channel, + out_channel, + stride=1, + downsample=None, + groups=1, + base_width=64, + dilation=1, + norm_layer=None): + super(Bottleneck, self).__init__() + if norm_layer is None: + norm_layer = nn.BatchNorm2d + width = int(out_channel * (base_width / 64.)) * groups + self.conv1 = conv1x1(in_channel, width) + self.bn1 = norm_layer( + num_features=width, + eps=1e-5, + momentum=0.1, + gamma_init=1, + beta_init=0, + moving_mean_init=0, + moving_var_init=1) + self.conv2 = conv3x3(width, width, stride, groups, dilation) + self.bn2 = norm_layer(width) + self.conv3 = conv1x1(width, out_channel * self.expansion) + self.bn3 = norm_layer(out_channel * self.expansion) + + self.relu = nn.ReLU() + self.downsample = downsample + self.stride = stride + + def construct(self, x): + """ + Bottleneck construct + """ + identity = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + + out = self.conv3(out) + out = self.bn3(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out += identity + out = self.relu(out) + + return out + + +class ResNet(nn.Cell): + """ + ResNet class + """ + def __init__(self, block, layers, num_classes=1000): + self.inplanes = 64 + super(ResNet, self).__init__() + # 600,600,3 -> 300,300,64 + self.conv1 = nn.Conv2d( + 3, + 64, + kernel_size=7, + stride=2, + padding=3, + has_bias=False, + pad_mode='pad') + self.bn1 = nn.BatchNorm2d( + num_features=64, + eps=1e-5, + momentum=0.1, + gamma_init=1, + beta_init=0, + moving_mean_init=0, + moving_var_init=1) + self.relu = nn.ReLU() + # 300,300,64 -> 150,150,64 + self.maxpool = nn.MaxPool2d( + kernel_size=3, + stride=2, + pad_mode="pad", + padding=0, + ceil_mode=True) + # 150,150,64 -> 150,150,256 + self.layer1 = self._make_layer(block, 64, layers[0]) + # 150,150,256 -> 75,75,512 + self.layer2 = self._make_layer(block, 128, layers[1], stride=2) + # 75,75,512 -> 38,38,1024 + self.layer3 = self._make_layer(block, 256, layers[2], stride=2) + # 38,38,1024 -> 19,19,2048 + self.layer4 = self._make_layer(block, 512, layers[3], stride=2) + + self.avgpool = nn.AvgPool2d(kernel_size=7, stride=7) + self.fc = nn.Dense(512 * block.expansion, num_classes) + + def _make_layer(self, block, planes, blocks, stride=1): + """ + ResNet layers maker + """ + downsample = None + if stride != 1 or self.inplanes != planes * block.expansion: + downsample = nn.SequentialCell( + nn.Conv2d( + self.inplanes, + planes * block.expansion, + kernel_size=1, + stride=stride, + has_bias=False), + nn.BatchNorm2d( + planes * block.expansion, + eps=1e-4, + momentum=0.9, + gamma_init=1, + beta_init=0, + moving_mean_init=0, + moving_var_init=1)) + + layers = [] + layers.append(block(self.inplanes, planes, stride, downsample)) + self.inplanes = planes * block.expansion + for _ in range(1, blocks): + layers.append(block(self.inplanes, planes)) + + return nn.SequentialCell(*layers) + + def construct(self, x): + """ + ResNet construct + """ + x = self.conv1(x) + x = self.bn1(x) + feat1 = self.relu(x) + + x = self.maxpool(feat1) + feat2 = self.layer1(x) + + feat3 = self.layer2(feat2) + feat4 = self.layer3(feat3) + feat5 = self.layer4(feat4) + return [feat1, feat2, feat3, feat4, feat5] + + +def resnet50(pretrained=False, **kwargs): + """ + resnet50 model + """ + model = 
ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) + if pretrained: + print("No Pretrained Model") + + del model.avgpool + del model.fc + return model diff --git a/community/cv/Resunet&Vggunet/nets/unet.py b/community/cv/Resunet&Vggunet/nets/unet.py new file mode 100644 index 0000000000000000000000000000000000000000..5fda5680e5e84eee76cbd27e28a0e95acac285ea --- /dev/null +++ b/community/cv/Resunet&Vggunet/nets/unet.py @@ -0,0 +1,117 @@ +""" + unet model structure +""" +import mindspore as ms +import mindspore.nn as nn +from nets.resnet import resnet50 +from nets.vgg import VGG16 + + +class unetUp(nn.Cell): + """ + unetUp class + """ + def __init__(self, in_size, out_size): + super(unetUp, self).__init__() + self.conv1 = nn.Conv2d(in_size, out_size, kernel_size=3, padding=1, + has_bias=True, pad_mode='pad') + self.conv2 = nn.Conv2d(out_size, out_size, kernel_size=3, padding=1, + has_bias=True, pad_mode='pad') + self.up = nn.Upsample( + scale_factor=2.0, + mode='bilinear', + recompute_scale_factor=True, + align_corners=True) + self.relu = nn.ReLU() + + def construct(self, inputs1, inputs2): + """ + unetUp construct + """ + outputs = ms.ops.cat([inputs1, self.up(inputs2)], 1) + outputs = self.conv1(outputs) + outputs = self.relu(outputs) + outputs = self.conv2(outputs) + outputs = self.relu(outputs) + return outputs + + +class Unet(nn.Cell): + """ + Unet class + """ + def __init__(self, num_classes=21, pretrained=False, backbone='resnet50'): + super(Unet, self).__init__() + if backbone == "resnet50": + self.resnet = resnet50(pretrained=pretrained) + in_filters = [192, 512, 1024, 3072] + elif backbone == 'vgg': + self.vgg = VGG16(pretrained=pretrained) + in_filters = [192, 384, 768, 1024] + else: + raise ValueError( + 'Unsupported backbone - `{}`, Use vgg, resnet50.'.format(backbone)) + + out_filters = [64, 128, 256, 512] + + # upsampling + # 64,64,512 + self.up_concat4 = unetUp(in_filters[3], out_filters[3]) + # 128,128,256 + self.up_concat3 = unetUp(in_filters[2], out_filters[2]) + # 256,256,128 + self.up_concat2 = unetUp(in_filters[1], out_filters[1]) + # 512,512,64 + self.up_concat1 = unetUp(in_filters[0], out_filters[0]) + + if backbone == 'resnet50': + self.up_conv = nn.SequentialCell( + nn.Upsample( + scale_factor=2.0, + mode='bilinear', + recompute_scale_factor=True, + align_corners=True), + nn.Conv2d( + out_filters[0], + out_filters[0], + kernel_size=3, + padding=1, + has_bias=True, + pad_mode='pad'), + nn.ReLU(), + nn.Conv2d( + out_filters[0], + out_filters[0], + kernel_size=3, + padding=1, + has_bias=True, + pad_mode='pad'), + nn.ReLU(), + ) + else: + self.up_conv = None + + self.final = nn.Conv2d(out_filters[0], num_classes, 1, has_bias=True) + + self.backbone = backbone + + def construct(self, inputs): + """ + Unet construct + """ + if self.backbone == "vgg": + [feat1, feat2, feat3, feat4, feat5] = self.vgg.construct(inputs) + elif self.backbone == "resnet50": + [feat1, feat2, feat3, feat4, feat5] = self.resnet.construct(inputs) + + up4 = self.up_concat4(feat4, feat5) + up3 = self.up_concat3(feat3, up4) + up2 = self.up_concat2(feat2, up3) + up1 = self.up_concat1(feat1, up2) + + if self.up_conv is not None: + up1 = self.up_conv(up1) + + final = self.final(up1) + + return final diff --git a/community/cv/Resunet&Vggunet/nets/vgg.py b/community/cv/Resunet&Vggunet/nets/vgg.py new file mode 100644 index 0000000000000000000000000000000000000000..ecf7bc723ea210b32ab3be133a9ed741c19c38fc --- /dev/null +++ b/community/cv/Resunet&Vggunet/nets/vgg.py @@ -0,0 +1,102 @@ +""" + vgg model structure +""" 
+import mindspore.nn as nn + + +class VGG(nn.Cell): + """ + vgg class + """ + def __init__(self, features, num_classes=1000): + super(VGG, self).__init__() + self.features = features + self.avgpool = nn.AdaptiveAvgPool2d((7, 7)) + self.classifier = nn.SequentialCell( + nn.Dense(512 * 7 * 7, 4096), + nn.ReLU(), + nn.Dropout(p=0.5), + nn.Dense(4096, 4096), + nn.ReLU(), + nn.Dropout(p=0.5), + nn.Dense(4096, num_classes), + ) + + def construct(self, x): + """ + vgg construct + """ + + feat1 = self.features[:4](x) + feat2 = self.features[4:9](feat1) + feat3 = self.features[9:16](feat2) + feat4 = self.features[16:23](feat3) + feat5 = self.features[23:-1](feat4) + return [feat1, feat2, feat3, feat4, feat5] + + +def make_layers(cfg, batch_norm=False, in_channels=3): + """ + vgg layers maker + """ + + layers = [] + for v in cfg: + if v == 'M': + layers += [nn.MaxPool2d(kernel_size=2, stride=2)] + else: + conv2d = nn.Conv2d( + in_channels, + v, + kernel_size=3, + padding=1, + pad_mode='pad', + has_bias=True) + if batch_norm: + layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()] + else: + layers += [conv2d, nn.ReLU()] + in_channels = v + return nn.SequentialCell(*layers) + + +# 512,512,3 -> 512,512,64 -> 256,256,64 -> 256,256,128 -> 128,128,128 -> 128,128,256 -> 64,64,256 +# 64,64,512 -> 32,32,512 -> 32,32,512 +cfgs = { + 'D': [ + 64, + 64, + 'M', + 128, + 128, + 'M', + 256, + 256, + 256, + 'M', + 512, + 512, + 512, + 'M', + 512, + 512, + 512, + 'M']} + + +def VGG16(pretrained, in_channels=3, **kwargs): + """ + vgg16 model + """ + model = VGG( + make_layers( + cfgs["D"], + batch_norm=False, + in_channels=in_channels), + **kwargs) + if pretrained: + print("no Pretrained Model") + + del model.avgpool + del model.classifier + return model diff --git a/community/cv/Resunet&Vggunet/post_processing.py b/community/cv/Resunet&Vggunet/post_processing.py new file mode 100644 index 0000000000000000000000000000000000000000..4a6db042124937e3f2f3962340ce817587a5cdf7 --- /dev/null +++ b/community/cv/Resunet&Vggunet/post_processing.py @@ -0,0 +1,174 @@ +""" + post processing +""" +import argparse +import math +import cv2 +import numpy as np +from PIL import Image +from matplotlib import pyplot as plt +from skimage import measure +from utils.xdog import xdog + + +def apply_xdog_filter( + original_image_gray, + epsilon=0.008, + gamma=0.98, + k=4, + phi=1200): + """ + 使用 XDoG 算法对增强后的图片进行边缘检测 + + Args: + original_image_gray (np.array): 原始图像(灰度图版本) + epsilon (float): XDoG epsilon 参数 + gamma (float): XDoG gamma 参数 + k (int): XDoG k 参数 + phi (int): XDoG phi 参数 + + Returns: + np.array: 边缘检测后图像结果 + """ + # 进行xdog算法获得边缘信息 + original_image_gray = cv2.convertScaleAbs( + original_image_gray, alpha=0.4, beta=50) + xdog_result = xdog( + original_image_gray, + epsilon=epsilon, + gamma=gamma, + k=k, + phi=phi) + + # 对边缘信息进行二值化处理 + _, binary_edges = cv2.threshold(xdog_result, 180, 255, cv2.THRESH_BINARY) + inv = cv2.bitwise_not(binary_edges) # 进行反色 + + plt.imshow(inv, cmap="gray") + plt.title("edge detection result (inverted)") + plt.show() + + return inv + + +def filter_regions( + result_image, + min_area=30, + min_aspect_ratio=1.6, + max_circularity=0.8, + min_eccentricity=0.77): + """ + 利用几何特性,例如:大小,长宽比,圆度,离心率等来对原始图像区域内的噪声进行过滤 + + Args: + result_image (np.array): 边缘检测后图像结果 + min_area (int): 最小面积阈值 + min_aspect_ratio (float): 最小长宽比阈值 + max_circularity (float): 最大圆度阈值 + min_eccentricity (float): 最小离心率阈值 + + Returns: + np.array: 去除噪声过后的结果 + """ + label_image = measure.label(result_image) + props = measure.regionprops(label_image) + 
cleaned_image = np.zeros_like(result_image)

    plt.imshow(result_image, cmap="gray")
    plt.title("The edge detection result within the roi (without denoising)")
    plt.show()
    # 遍历各连通区域,依据几何特征筛选裂缝候选区域
    for prop in props:
        # 计算长宽比
        aspect_ratio = prop.major_axis_length / prop.minor_axis_length if prop.minor_axis_length > 0 else 0
        # 计算圆形度
        circularity = (4 * math.pi * prop.area) / (prop.perimeter ** 2) if prop.perimeter > 0 else 0
        eccentricity = prop.eccentricity

        if (
                prop.area >= min_area and
                aspect_ratio >= min_aspect_ratio and
                circularity < max_circularity and
                eccentricity >= min_eccentricity
        ):
            cleaned_image[label_image == prop.label] = 255

    plt.imshow(cleaned_image, cmap="gray")
    plt.title("The edge detection result within the roi (with denoising)")
    plt.show()

    # 进行形态学闭操作,保证裂缝的连通性
    kernel = np.ones((3, 3), np.uint8)
    cleaned_image = cv2.morphologyEx(cleaned_image, cv2.MORPH_CLOSE, kernel)
    plt.imshow(cleaned_image, cmap="gray")
    plt.title("The edge detection result with closing operations")
    plt.show()

    return cleaned_image


def overlay_crack_detection(cleaned_image, original_image):
    """
    叠加检测到的裂缝至原始图像上,并显示结果

    Args:
        cleaned_image (np.array): 去除噪声后的边缘图像
        original_image (PIL.Image): 原始RGB图像

    Returns:
        np.array: 最终检测结果
    """
    opencv_image = cv2.cvtColor(np.array(original_image), cv2.COLOR_RGB2BGR)
    crack_image_rgb = cv2.cvtColor(cleaned_image, cv2.COLOR_GRAY2BGR)
    crack_image_rgb[:, :, :2] = 0  # 将B、G通道置零,仅保留红色通道
    # 叠加裂缝信息和原始图像
    overlay_image = cv2.addWeighted(opencv_image, 0.8, crack_image_rgb, 0.5, 0)

    plt.imshow(cv2.cvtColor(overlay_image, cv2.COLOR_BGR2RGB))
    plt.axis('off')
    plt.title("Crack Detection Result")
    plt.show()

    return overlay_image


def main(image_path, segmentation_path):
    # 读取原始图片和分割结果
    original_image = Image.open(image_path)
    segmentation_result = Image.open(segmentation_path)

    # 执行高斯差分算法来获得边缘信息
    original_image_gray = cv2.cvtColor(
        np.array(original_image),
        cv2.COLOR_RGB2GRAY)
    xdog_edges = apply_xdog_filter(original_image_gray)

    # 结合兴趣区域 & 去噪过程
    result_image = cv2.bitwise_and(
        xdog_edges,
        cv2.cvtColor(
            np.array(segmentation_result),
            cv2.COLOR_RGB2GRAY))
    cleaned_image = filter_regions(result_image)

    # 显示裂缝检测后的结果
    overlay_image = overlay_crack_detection(cleaned_image, original_image)
    if cv2.imwrite("result.png", overlay_image):
        print("成功保存结果至result.png")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Crack detection in images.")
    parser.add_argument(
        "--image_path",
        type=str,
        required=True,
        help="Path to the original image.")
    parser.add_argument(
        "--segmentation_path",
        type=str,
        required=True,
        help="Path to the segmentation result image.")

    args = parser.parse_args()
    main(args.image_path, args.segmentation_path)
diff --git a/community/cv/Resunet&Vggunet/train.py b/community/cv/Resunet&Vggunet/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..3c6d512d1a593085aee49e11560b14b77df34785
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/train.py
@@ -0,0 +1,192 @@
"""
    vgg&res unet train
"""
import argparse
import mindspore as ms
from mindspore import nn, Tensor, ops
from nets.unet import Unet as unet
from utils.data_loader import create_multi_class_dataset


def train_epoch(epoch, model, loss_fn, optimizer, data_loader):
    """
    one epoch train function
    """
    model.set_train()

    # Define forward function
    def forward_fn(data, label):
        # NCHW->NHWC
        logits = model(data).transpose(0, 2, 3, 
1).astype(ms.float32)
        label = label.transpose(0, 2, 3, 1)
        label = Tensor(label)
        # 拉直为 (N*H*W, 2),逐像素计算交叉熵
        logits = ops.reshape(logits, (-1, 2))
        label = ops.reshape(label, (-1, 2))
        loss = loss_fn(logits, label)
        return loss, logits

    # Get gradient function
    grad_fn = ms.value_and_grad(
        forward_fn,
        None,
        optimizer.parameters,
        has_aux=True)

    # Define function of one-step training
    def train_step(data, label):
        (loss, _), grads = grad_fn(data, label)
        optimizer(grads)
        return loss

    dataset_size = data_loader.get_dataset_size()
    for batch_idx, (data, target) in enumerate(data_loader):
        loss = float(train_step(data, target).asnumpy())
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx, dataset_size,
                100. * batch_idx / dataset_size, loss))


def train_net(
        data_dir,
        epoch_size,
        weight_decay_param,
        lr_param,
        batch_size,
        repeat,
        ckpt_path,
        is_resume=False,
        who_to_train="None"):
    """
    train net process
    """
    def modify_ckpt_param_names(ms_params, save_path):
        # 为 VggUnet 的 backbone 参数名补上 "vgg.features." 前缀后再保存,
        # 使推理阶段构建的 Unet(backbone='vgg') 能够直接按名加载参数
        new_params_list = []
        for ms_param in ms_params:
            value = ms_param.data
            name = ms_param.name
            if not name.startswith('up') and not name.startswith('final'):
                new_name = "vgg.features." + name
                new_params_list.append(
                    {"name": new_name, "data": ms.Tensor(value)})
            else:
                new_params_list.append(
                    {"name": name, "data": ms.Tensor(value)})

        ms.save_checkpoint(new_params_list, save_path)

    train_dataset = create_multi_class_dataset(
        data_dir,
        [512, 512],
        repeat,
        batch_size,
        num_classes=2,
        is_train=True,
        split=1,
        rank=0,
        group_size=1,
        shuffle=True)
    if is_resume:
        if "ResUnet" in who_to_train:
            Unet = unet(pretrained=False, num_classes=2, backbone='resnet50')
            ms.load_checkpoint(ckpt_path, Unet)
        elif "VggUnet" in who_to_train:
            Unet = unet(pretrained=False, num_classes=2, backbone='vgg')
            ms.load_checkpoint(ckpt_path, Unet)
        else:
            raise ValueError('请输入正确的模型名称:"ResUnet" 或 "VggUnet"')
    else:
        if "ResUnet" in who_to_train:
            Unet = unet(pretrained=False, num_classes=2, backbone='resnet50')
        elif "VggUnet" in who_to_train:
            Unet = unet(pretrained=False, num_classes=2, backbone='vgg')
        else:
            raise ValueError('请输入正确的模型名称:"ResUnet" 或 "VggUnet"')

    optimizer = nn.Adam(
        Unet.trainable_params(),
        lr_param,
        weight_decay=weight_decay_param)
    loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=False, reduction='mean')
    for epoch in range(epoch_size):
        train_epoch(epoch, Unet, loss_fn, optimizer, train_dataset)

    print('Finished Training')

    if "VggUnet" in who_to_train:
        save_path = './VggUnet.ckpt'
        params = Unet.get_parameters()
        modify_ckpt_param_names(params, save_path)
    else:
        save_path = './ResUnet.ckpt'
        ms.save_checkpoint(Unet, save_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Train UNet Model')
    parser.add_argument(
        '--data_dir',
        type=str,
        required=True,
        help='Directory for training data')
    parser.add_argument(
        '--epoch_size',
        type=int,
        required=True,
        help='Number of training epochs')
    parser.add_argument(
        '--weight_decay_param',
        type=float,
        required=True,
        help='Weight decay parameter')
    parser.add_argument(
        '--lr_param',
        type=float,
        required=True,
        help='Learning rate')
    parser.add_argument(
        '--who_to_train',
        type=str,
        required=True,
        choices=[
            "ResUnet",
            "VggUnet"],
        help='Model to train')
    parser.add_argument(
        '--batch_size',
        type=int,
        required=True,
        help='Batch size for training')
    parser.add_argument(
        '--repeat',
        type=int,
        required=True,
        help='Repeat times for data')
    parser.add_argument(
        '--is_resume',
        # argparse 的 type=bool 会把任意非空字符串解析为 True,
        # 这里按字符串内容显式解析,保证 --is_resume=False 能正确生效
        type=lambda s: s.lower() in ('true', '1'),
        default=False,
        required=False,
        help='Is resume or not')
    parser.add_argument(
        '--ckpt_path',
        type=str,
        required=False,
        help='Path to the checkpoint file')

    args = parser.parse_args()

    train_net(
        args.data_dir,
        epoch_size=args.epoch_size,
        weight_decay_param=args.weight_decay_param,
        lr_param=args.lr_param,
        who_to_train=args.who_to_train,
        batch_size=args.batch_size,
        repeat=args.repeat,
        is_resume=args.is_resume,
        ckpt_path=args.ckpt_path)
diff --git a/community/cv/Resunet&Vggunet/utils/data_loader.py b/community/cv/Resunet&Vggunet/utils/data_loader.py
new file mode 100644
index 0000000000000000000000000000000000000000..0f885de37145332fd86f2e30b335eeba4e86f422
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/utils/data_loader.py
@@ -0,0 +1,187 @@
"""
    data loader util
"""
import multiprocessing
import os
import cv2
import mindspore.dataset as ds
import numpy as np


def preprocess_input(image, size, ismask=False):
    """
    preprocess the input image
    """
    iw, ih = image.shape[1], image.shape[0]  # 获取输入图像的宽和高
    w, h = size  # 目标尺寸

    # 计算缩放比例
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)

    # 使用双三次插值调整图像大小
    image_resized = cv2.resize(image, (nw, nh), interpolation=cv2.INTER_CUBIC)

    if not ismask:
        # 单通道图转换为 3 通道,便于统一粘贴到彩色背景上
        if len(np.shape(image)) != 3 or np.shape(image)[2] != 3:
            image_resized = cv2.cvtColor(image_resized, cv2.COLOR_GRAY2BGR)
        # 创建新的灰色背景图像
        new_image = np.full((h, w, 3), (128, 128, 128), dtype=np.uint8)

        # 将缩放后的图像粘贴到背景上,使其居中
        top = (h - nh) // 2
        left = (w - nw) // 2
        new_image[top:top + nh, left:left + nw] = image_resized

        rescaled_image = new_image.astype(np.float32) / 255.0
    else:
        new_image = np.zeros((h, w), dtype=np.uint8)

        # 将缩放后的图像粘贴到背景上,使其居中
        top = (h - nh) // 2
        left = (w - nw) // 2

        # mask 为单通道,直接粘贴
        new_image[top:top + nh, left:left + nw] = image_resized
        rescaled_image = new_image

    return rescaled_image


class MultiClassDataset:
    """
    Read image and mask from original images, and split all data into train_dataset and val_dataset by `split`.
    Get image and mask paths from a tree of directories:
    each sample folder contains one image named `"image.png"` and its mask named `"mask.png"`.
    """

    def __init__(
            self,
            data_dir,
            repeat,
            is_train=False,
            split=0.8,
            shuffle=False):
        self.data_dir = data_dir
        self.is_train = is_train
        self.split = (split != 1.0)

        if self.split:
            self.img_ids = [
                d for d in sorted(
                    next(
                        os.walk(
                            self.data_dir))[1]) if not d.startswith('.')]
            self.train_ids = self.img_ids[:int(
                len(self.img_ids) * split)] * repeat
            self.val_ids = self.img_ids[int(len(self.img_ids) * split):]
        else:
            self.train_ids = [
                d for d in sorted(
                    next(
                        os.walk(
                            os.path.join(
                                self.data_dir,
                                "train")))[1]) if not d.startswith('.')] * repeat
            self.val_ids = [
                d for d in sorted(
                    next(
                        os.walk(
                            os.path.join(
                                self.data_dir,
                                "val")))[1]) if not d.startswith('.')]
        if shuffle:
            np.random.shuffle(self.train_ids)

    def _read_img_mask(self, img_id):
        if self.split:
            path = os.path.join(self.data_dir, img_id)
        elif self.is_train:
            path = os.path.join(self.data_dir, "train", img_id)
        else:
            path = os.path.join(self.data_dir, "val", img_id)
        img = cv2.imread(os.path.join(path, "image.png"))
        mask = cv2.imread(os.path.join(path, "mask.png"), cv2.IMREAD_GRAYSCALE)
        return img, mask

    def __getitem__(self, index):
        if self.is_train:
            return self._read_img_mask(self.train_ids[index])
        return self._read_img_mask(self.val_ids[index])

    @property
    def column_names(self):
        column_names = ['image', 'mask']
        return column_names

    def __len__(self):
        if self.is_train:
            return len(self.train_ids)
        return len(self.val_ids)


def preprocess_img_mask(img, mask, num_classes, img_size):
    """
    Preprocess for multi-class dataset: resize and pad image/mask, then one-hot encode the mask.
    """
    img = preprocess_input(img, img_size)
    mask = preprocess_input(mask, img_size, ismask=True)

    img = img.transpose(2, 0, 1)
    # 防止全黑 mask 触发除零
    mask = mask.astype(np.float32) / max(mask.max(), 1)
    mask = (mask > 0.5).astype(np.int_)
    mask = (np.arange(num_classes) == mask[..., None]).astype(int)
    mask = mask.transpose(2, 0, 1).astype(np.float32)
    return img, mask


def create_multi_class_dataset(
        data_dir,
        img_size,
        repeat,
        batch_size,
        num_classes=2,
        is_train=False,
        split=0.8,
        rank=0,
        group_size=1,
        shuffle=True):
    """
    Get generator dataset for multi-class dataset.
    """
    cv2.setNumThreads(0)
    ds.config.set_enable_shared_mem(True)
    cores = multiprocessing.cpu_count()
    num_parallel_workers = min(4, cores // group_size)
    mc_dataset = MultiClassDataset(data_dir, repeat, is_train, split, shuffle)
    dataset = ds.GeneratorDataset(
        mc_dataset,
        mc_dataset.column_names,
        shuffle=True,
        num_shards=group_size,
        shard_id=rank,
        num_parallel_workers=num_parallel_workers,
        python_multiprocessing=is_train)
    compose_map_func = (
        lambda image, mask: preprocess_img_mask(
            image,
            mask,
            num_classes,
            tuple(img_size)
        )
    )
    dataset = dataset.map(
        operations=compose_map_func,
        input_columns=mc_dataset.column_names,
        output_columns=mc_dataset.column_names,
        num_parallel_workers=num_parallel_workers)
    dataset = dataset.batch(
        batch_size,
        drop_remainder=is_train,
        num_parallel_workers=num_parallel_workers)
    return dataset
diff --git a/community/cv/Resunet&Vggunet/utils/infer_preprocess_image.py b/community/cv/Resunet&Vggunet/utils/infer_preprocess_image.py
new file mode 100644
index 0000000000000000000000000000000000000000..cfcc2151dd6fbb4c5d1357f77098b1315160ff8f
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/utils/infer_preprocess_image.py
@@ -0,0 +1,60 @@
"""
    preprocess image util
"""
import numpy as np
from PIL import Image
from mindspore.dataset import vision


def convert_to_rgb(image):
    """
    确保图像有3个通道 (RGB 格式)
    将单通道图片转化为3通道RGB格式

    Args:
        image (PIL.Image): 输入图像

    Returns:
        PIL.Image: RGB 图像
    """
    if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
        return image
    return image.convert('RGB')


def preprocess_input(image, target_size):
    """
    将图像调整至目标尺寸,并进行预处理(标准化),返回预处理后的图像和调整后的尺寸。

    Args:
        image (PIL.Image): 输入图像
        target_size (tuple): 以(宽度,高度)形式给出图片要变换到的目标尺寸

    Returns:
        np.ndarray: 预处理后的图像数组,取值范围 [0, 1]
        tuple: 调整后的宽度和高度
    """
    image = convert_to_rgb(image)  # 确保输入为 3 通道 RGB 图像
    original_width, original_height = image.size  # 获得原始图像大小
    target_width, target_height = target_size  # 设定目标大小

    # 计算放缩比例
    scale = min(target_width / original_width, target_height / original_height)
    new_width = int(original_width * scale)
    new_height = int(original_height * scale)

    # 以双三次插值的方式对图像进行放缩
    image = image.resize((new_width, new_height), Image.BICUBIC)
    new_image = Image.new('RGB', target_size, (128, 128, 128))
    new_image.paste(
        image,
        (
            (target_width - new_width) // 2,
            (target_height - new_height) // 2
        )
    )

    # 将图像归一化到[0,1]
    rescale = vision.Rescale(1.0 / 255.0, 0)
    rescaled_image = rescale(new_image)

    return rescaled_image, new_width, new_height
diff --git a/community/cv/Resunet&Vggunet/utils/xdog.py b/community/cv/Resunet&Vggunet/utils/xdog.py
new file mode 100644
index 0000000000000000000000000000000000000000..ea254c5319d3b93fb8569621c9bdf4c942b958bb
--- /dev/null
+++ b/community/cv/Resunet&Vggunet/utils/xdog.py
@@ -0,0 +1,47 @@
"""
    xdog util
"""
import numpy as np
from scipy.ndimage import gaussian_filter


def xdog(image, k=200, gamma=0.5, epsilon=0.1, phi=10):
    """
    Computes the eXtended Difference of Gaussians (XDoG) for a given image.

    Args:
        image: A grayscale image as an n x m numpy array.
        k: Multiplier for the second Gaussian filter's sigma.
        gamma: Multiplier for the second Gaussian result.
        epsilon: The threshold offset for the XDoG process.
        phi: Parameter for controlling the sharpness of the transition.

    Returns:
        An n x m numpy array representing the XDoG processed image.
    """
    # Compute the Difference of Gaussians (DoG)
    s1 = 0.5
    s2 = s1 * k

    gauss1 = gaussian_filter(image, s1)
    gauss2 = gaussian_filter(image, s2)

    difference = gauss1 - gamma * gauss2

    # Normalize the difference image to [0, 1]
    difference = difference / 255.0

    # Apply the XDoG formula: pixels at or above epsilon stay white,
    # the rest get a soft tanh ramp
    difference = np.where(
        difference >= epsilon,
        1.0,
        1 + np.tanh(phi * (difference - epsilon)))

    # Convert back to the original scale
    result = difference * 255
    return np.clip(result, 0, 255).astype(np.uint8)