# 1. Common configuration
```bash
# Shell proxy configuration
temp_proxy_ip=90.255.14.187; \
export http_proxy="http://p_atlas:proxy%40123@$temp_proxy_ip:8080" && \
export https_proxy="http://p_atlas:proxy%40123@$temp_proxy_ip:8080" && \
export no_proxy=127.0.0.1,.huawei.com,localhost,local,.local
# Avoid setting a global proxy where possible
git config --global http.proxy http://p_atlas:proxy%40123@50.65.18.4:8080
git config --global https.proxy http://p_atlas:proxy%40123@50.65.18.4:8080
git config --global http.postBuffer 2097152000
# apt proxy (apt.conf syntax)
Acquire::http::Proxy "http://p_atlas:proxy%40123@50.65.44.186:8080";
Acquire::https::Proxy "http://p_atlas:proxy%40123@50.65.44.186:8080";
# GCC 7.3.0 toolchain
export LD_LIBRARY_PATH=/usr/local/gcc7.3.0/lib64:${LD_LIBRARY_PATH}
export CC=/usr/local/gcc7.3.0/bin/gcc
export CXX=/usr/local/gcc7.3.0/bin/g++
export PATH=/usr/local/gcc7.3.0/bin:${PATH}
export LD_PRELOAD=$LD_PRELOAD:/root/miniconda3/envs/cql_opensora1.0/lib/libgomp.so.1
# Add a package to the Python path
export PYTHONPATH=$PYTHONPATH:/root/share/cql/Megatron-LM/megatron
# GCC build prerequisites
wget http://gcc.gnu.org/pub/gcc/infrastructure/gmp-6.1.0.tar.bz2 --no-check-certificate
wget http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-3.1.4.tar.bz2 --no-check-certificate
wget http://gcc.gnu.org/pub/gcc/infrastructure/mpc-1.0.3.tar.gz --no-check-certificate
wget http://gcc.gnu.org/pub/gcc/infrastructure/isl-0.16.1.tar.bz2 --no-check-certificate
# Port mapping: address proxy.huawei.com, port 8080
# Conda proxy configuration:
# 1) open the config file
vim ~/.condarc
# 2) change its contents to:
proxy_servers:
  http: http://p_atlas:proxy%40123@80.253.84.77:8080
  https: http://p_atlas:proxy%40123@80.253.84.77:8080
ssl_verify: false
report_errors: false
allow_conda_downgrades: true
show_channel_urls: true
auto_activate_base: false
```
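Python tooling (pip, urllib, requests) picks these proxy variables up from the environment automatically. A quick stdlib check, using a placeholder proxy address and credentials rather than the real ones above:

```python
import os
import urllib.request

# Placeholder proxy host/credentials; substitute your real settings.
os.environ["http_proxy"] = "http://user:pass%40word@10.0.0.1:8080"
os.environ["https_proxy"] = "http://user:pass%40word@10.0.0.1:8080"
os.environ["no_proxy"] = "127.0.0.1,localhost,.local"

# urllib reads the same environment variables the shell exports.
proxies = urllib.request.getproxies()
print(proxies["http"])

# proxy_bypass() honors no_proxy for matching hosts.
print(bool(urllib.request.proxy_bypass("localhost")))
```

Note the `%40` escaping: an `@` inside the password must be percent-encoded so it is not mistaken for the user/host separator in the proxy URL.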
# 2. Linux commands
```bash
# Free space on the filesystem containing a path
df -h /path
# File/directory size
du -h /path
# List files with modification time and human-readable sizes
ls -lth
# Size of the current directory
du -sh
# Available space on all partitions
df -h
# Upload a local directory
scp -r -P 22 D:\data\OpenSora_GPU\opensora_data root@51.62.15.15:/data1/c30061641
# Kill processes
pgrep -f python | xargs kill -9
pkill -9 python
ps -aux | grep internvl2-26B | grep -v grep | awk '{print $2}' | xargs kill -9
# Environment variables
printenv
echo $PATH
# Collect error logs
msnpureport -f
# Query the status of all interfaces
dis int b
# Refresh GPU status every second
watch -n 1 nvidia-smi
# Check whether the system is ARM-based
arch
# Replace text in a file
sed -i 's|old|new|g' filename
# Broadcast a message to all terminals
[root@localhost ~]# wall this is a test line
Broadcast message from root (pts/1) (Fri Dec 20 11:36:51 2013):
this is a test line
# Number of CPUs
nproc
# Compress and extract
tar -zcvf /home/xahot.tar.gz /xahot # compress: pack the /xahot folder into /home/xahot.tar.gz
tar -xzf packed.tar.gz -C packed    # extract
# Symlink: create ./test pointing at /var/www/test
ln -s /var/www/test test
# Find directories by name
find /path/to/search -type d -name "*b090*"
# Absolute path of the file that started a process
ll /proc/<PID>
pwdx PID
# Show each folder's size and last-modified time, sorted by size
du -h --max-depth=1 | grep -v '^0' | while read line; do
  size=$(echo $line | awk '{print $1}')
  path=$(echo $line | awk '{print $2}')
  if [ -d "$path" ]; then
    mod_time=$(stat -c %y "$path" | awk '{print $1}')
    echo "$size $mod_time $path"
  fi
done | sort -rhk1,1 -k2,2
# Simpler version
du -lh --max-depth=1 /home
# Limit a script's run time
timeout 1800 bash run.sh
# Delayed start via sleep
sleep <seconds>
bash <task-script>
# Temporarily set the editor
export EDITOR=vim
# Change file ownership
sudo chown -R c00899541:c00899541 EasyR1/
# Run a command in the background
nohup command > output.log 2>&1 &
# Delete all files under ori's subdirectories while keeping the subdirectories themselves
find ori -type f -delete
```
# 3. Ascend release packages
```
Versions handed over for testing:
FrameworkPTAdapter 7.0.RC1.B020
https://cmc-szv.clouddragon.huawei.com/cmcversion/index/releaseView?deltaId=11878173034349824&isSelect=Inner
CANN 8.1.RC1.B030
https://cmc.rnd.huawei.com/cmcversion/index/releaseView?deltaId=11917143636641026&isSelect=Software
Ascend HDK 24.1.0
https://cmc-szv.clouddragon.huawei.com/cmcversion/index/componentVersionView?deltaId=11469100756019072&isSelect=Software
```
```bash
install_path: /home/gpt_neox/c30061641
# Install driver and firmware
# First-time install: driver first, then firmware. Upgrade: firmware first, then driver.
bash Ascend-hdk-910-npu-driver_6.0.0_linux-x86-64.run --full --install-for-all
bash Ascend-hdk-910b-npu-firmware_6.4.0.4.220.run --full --quiet
reboot
# Install CANN
rm -rf /etc/Ascend/ascend_cann_install.info
# Create the install directory
chmod -R 755 /home/c30061641
# Install the packages
bash Ascend-cann-toolkit_8.0.0_linux-aarch64.run --install-path=/root/cann --full -q
bash Atlas-A3-cann-kernels_8.0.0_linux-aarch64.run --install-path=/root/cann --install -q
# Source the environment
source /home/c30061641/cann/b010/ascend-toolkit/set_env.sh
```
Shell script to install CANN and Apex automatically
```bash
# ---------------install cann---------------
rm -rf /etc/Ascend/ascend_cann_install.info
chmod -R 755 /home/c30061641
bash Ascend-cann-toolkit*.run --install-path=/home/c30061641/cann/b080 --full -q
bash Ascend-cann-kernels*.run --install-path=/home/c30061641/cann/b080 --install -q
source /home/c30061641/cann/b030/ascend-toolkit/set_env.sh
bash Ascend-cann-nnal*.run --install-path=/home/c30061641/cann/b030 --install -q
rm -rf /etc/Ascend/ascend_cann_install.info
# ---------------install apex---------------
git clone https://gitee.com/ascend/apex.git
cd apex/
bash scripts/build.sh --python=3.10
cd apex/dist/
pip3 uninstall apex
pip3 install --upgrade apex-0.1+ascend*.whl
```
vllm-ascend installation
```bash
# Install vLLM
git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm
cd vllm
VLLM_TARGET_DEVICE=empty pip install . --extra-index https://download.pytorch.org/whl/cpu/
# Install vLLM Ascend
git clone --depth 1 --branch |vllm_ascend_version| https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
# Once the packages are installed, you need to install `torch-npu` manually,
# because that vllm-ascend relies on an unreleased version of torch-npu.
# This step will be removed in the next vllm-ascend release.
#
# Here we take python 3.10 on aarch64 as an example. Feel free to install the correct version for your environment. See:
#
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py39.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py311.tar.gz
#
mkdir pta
cd pta
wget https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
tar -xvf pytorch_v2.5.1_py310.tar.gz
pip install ./torch_npu-2.5.1.dev20250308-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
--------------------------------
# Verify vLLM
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
Installing inside Docker: https://3ms.huawei.com/km/blogs/details/15205432
```bash
# Temp directory out of space: point TMPDIR at another directory
export TMPDIR=/new/tmp/directory
# Check whether files are identical
md5sum file
# firmwareDeviceReady failed
https://3ms.huawei.com/km/blogs/details/14331751
# HDK operation log
/var/log/ascend_seclog/operation.log
```
```bash
# PTA package install
# Build the whl
bash ci/build.sh --python=3.8
# Install the generated whl with --no-deps: the environment has torch==2.1.0a0,
# which PTA would otherwise not recognize as satisfying torch==2.1.0
pip install ./dist/torch_npu-2.1.0.post3+git0d6a0ee-cp38-cp38-linux_aarch64.whl --no-deps
```
# 4. Vim
```
1. Delete the whole document
Using visual mode:
- `gg` jump to the first line
- `V` enter visual-line mode
- `G` extend the selection to the last line, selecting the whole document
- `d` delete the selection
2. Copy and paste the whole document
- `gg` jump to the first line
- `V` (capital V) enter visual-line mode
- `G` move to the last line, which selects everything from the first line to the last
- `y` (yank) copy the selection into Vim's register
Use `p` or `P` to paste the copied content elsewhere in Vim: `p` pastes after the cursor position, `P` pastes before it.
3. Search
For example, to search for `200/`, escape the slash with a backslash:
/200\/
```
# 5. A100 environment setup
```bash
# 1. conda environment
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.22.post3 --index-url https://download.pytorch.org/whl/cu121
pip install packaging ninja
pip install flash-attn --no-build-isolation
git clone https://github.com/NVIDIA/apex
cd apex
# patch the setup file
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
pip install pytorch-extension
pip install colossalai==0.3.7
# remove colossalai from the requirements file first
pip install -r requirement**
pip install -e .
# inside OpenSora_gpu
ln -s ../opensora_data
# ————————————
# pip: download packages to a directory
pip download <package> -d <dir>
pip download xformers==0.0.22.post3 --index-url https://download.pytorch.org/whl/cu121 -d .
# ————————————
# pip: install from a directory of downloaded whl files
pip install --no-index --find-links=/data1/c30061641/conda_data <package>
pip install --no-index --find-links=/data1/c30061641/MiniCPM-V/conda_data
# ————————————
# pip: force dependency versions during install
pip install <package> --constraint requirements.txt
# ————————————
# pip: binary wheels only
pip download deepspeed --only-binary :all:
# select the CUDA version
export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
# 2. datasets and model files
# tips:
# - install xformers first
# - if a whl will not download on the machine, download it locally first
# name: 8机_A100_
# ip:
# 51.62.15.15 : /data1/c30061641  data ok, env ok
# 51.62.14.13 : /data1/c30061641  data ok, env installing
# 51.62.15.17 : /data1/c30061641  data ok
# 51.62.14.11 : /data1/c30061641  data ok
# 51.62.14.15 : /data/c30061641   data ok
# 51.62.15.11 : /home/c30061641   data ok
# 51.62.15.13 : /home/c30061641   data ok
# 51.62.14.17 : /home/c30061641   data ok
# password: DCauto1!2@
scp -r -P 22 opensora_data root@51.62.14.15:/data/c30061641
scp -r root@51.62.15.15:/data1/c30061641/conda_data .
mpirun -np 2 -H 51.62.14.11,51.62.14.13 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=eth0
export NCCL_P2P_DISABLE=1
export NCCL_DEBUG=INFO
sudo mount -t nfs 51.62.14.13:/root/share /root/share
# test_allreduce.py:
'''
import torch
import torch_npu
import os

device = int(os.getenv('LOCAL_RANK'))
print(device)
torch.npu.set_device(device)
# Call the hccl init process
torch.distributed.init_process_group(backend='hccl', init_method='env://')
a = torch.tensor(1).npu()
torch.distributed.all_reduce(a)
print('Hccl: ', a, device)
'''
torchrun --master_addr=90.90.94.158 --master_port=6001 --nnodes=2 --node_rank=1 --nproc_per_node=8 test_allreduce.py
# --master_addr       address of the master node
# --master_port       port of the master node
# --nnodes=2          number of nodes
# --node_rank=0       rank of the current node
# --nproc_per_node=8  processes per node
torchrun --master_addr=90.90.94.158 --master_port=20022 --nnodes=2 --node_rank=0 --nproc_per_node=8 test_allreduce.py
CUDA_VISIBLE_DEVICES=0,1,2,4
```
torch 2.5.1 environment installation
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
Switching CUDA versions
```bash
# adjust the paths as needed
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
# verify the CUDA version
nvcc --version
# if the system default points at CUDA 12.2, update the symlink:
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-12.1 /usr/local/cuda
```
Download packages to a local directory
```python
import os
import subprocess

# requirements.txt lists the packages to download
requirements_file = 'requirements.txt'
# target download directory; create it if it does not exist
download_path = './python_pkgs'
if not os.path.exists(download_path):
    os.makedirs(download_path)

# read the package names from the requirements file
with open(requirements_file, 'r') as file:
    packages = file.read().splitlines()

# download each package into the target directory
for package in packages:
    try:
        # call pip download with the target directory and version constraints
        subprocess.run(['pip', 'download', '-d', download_path,
                        '--constraint', requirements_file, package], check=True)
        print(f"Package '{package}' downloaded successfully to '{download_path}'.")
    except subprocess.CalledProcessError as e:
        print(f"Failed to download package '{package}': {e}")
```
# 6. GPU cluster
```bash
# Show the connection topology between GPUs
nvidia-smi topo -m
# Show InfiniBand device details
ibv_devinfo
# Activate IB ports
sudo systemctl start opensm
sudo /usr/sbin/opensm
# Test RDMA ports
# on the remote machine, start the RDMA bandwidth test server
ib_send_bw -d mlx5_0
# on the local machine, run the RDMA bandwidth test client
ib_send_bw -d mlx5_0 <remote_ip>
# NVIDIA-related environment variables
export NCCL_P2P_LEVEL=NVL
export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=eth0
export NCCL_P2P_DISABLE=1
export NCCL_DEBUG=INFO
export CUDA_VISIBLE_DEVICES=1,3
# GPU inter-node connectivity
export PATH=/usr/local/mpi/bin:/usr/local/cuda-12.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/mpi/lib:/usr/local/cuda-12.2/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export NCCL_ALGO=ring
export NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_5,mlx5_6,mlx5_7,mlx5_8,mlx5_9
export NCCL_SOCKET_IFNAME=enp86s0f0
export NCCL_IB_GID_INDEX=3
export NCCL_IB_TC=106
export MAX_JOBS=160
export NCCL_MIN_NCHANNELS=32
```
GPU ID=3 has problems with cross-machine communication
```bash
NPUS_PER_NODE=6
# configurations that work
CUDA_VISIBLE_DEVICES=0,3,4,5,6,7
CUDA_VISIBLE_DEVICES=0,2,4,5,6,7
CUDA_VISIBLE_DEVICES=1,3,4,5,6,7
# configurations that do not work
CUDA_VISIBLE_DEVICES=0,1,4,5,6,7
CUDA_VISIBLE_DEVICES=1,2,4,5,6,7
CUDA_VISIBLE_DEVICES=2,3,4,5,6,7
NCCL_SOCKET_IFNAME=mlx5_0,mlx5_1,mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_6,mlx5_7,mlx5_8,mlx5_9
/usr/local/cuda-12.4/lib64/libcudnn_cnn_train.so.8
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
# 14.11 and 14.13
NCCL_SOCKET_IFNAME=eth0,enp45s0f0np0,ibp138s0,ibp141s0,ibp14s0,ibp17s0,ibp199s0,ibp202s0,ibp82s0,ibp83s0,docker0,lo
NCCL_SOCKET_IFNAME=eth0,enxbe3af2b6059f,enp45s0f0np0,ibp14s0,ibp168s0f0,ibp168s0f1,ibp17s0,ibp199s0,ibp202s0,ibp82s0,ibp83s0,ibp141s0,ibp138s0,docker0,lo
NCCL_SOCKET_IFNAME=eth,enp,ib
```
CUDA_VISIBLE_DEVICES=1,2,3,4,5,6
CUDA_VISIBLE_DEVICES=0,1,2,4,5,6
`NCCL_P2P_LEVEL` is an environment variable that sets the peer-to-peer (P2P) communication level of the NVIDIA Collective Communications Library (NCCL), NVIDIA's high-performance communication library for data-parallel work across multiple GPUs.

`NCCL_P2P_LEVEL` can take one of the following values:

\- `DEFAULT`: use NCCL's default P2P communication strategy

\- `SYS`: use the system-provided P2P capability

\- `NVL`: communicate over the NVIDIA network library (NVLink)

Setting `NCCL_P2P_LEVEL=NVL` optimizes multi-GPU communication, in particular on systems whose GPUs are connected by NVLink; explicitly forcing NVLink can raise transfer efficiency and thus overall compute performance.

It can help when the default P2P strategy does not fully exploit the hardware, or when a particular hardware configuration performs better over NVLink. It is not a universal fix, however: different systems and workloads may need different settings to reach peak performance.
# 7. Docker
```bash
# docker usage reference
https://wiki.huawei.com/domains/23224/wiki/56326/WIKI202311072304489
# after a machine reboot, docker must be restarted too
sudo systemctl restart docker
# load an image
docker load -i torch_model_aarch64-torch2.1.0.tar
# start a container from the image
docker run -it --name my-torch-container torch_model:torch2.1.0 /bin/bash
docker run -it --name my-torch-container -v /home:/home torch_model:torch2.1.0 /bin/bash
docker run -dit -u root --ipc=host --network host --name mm-b080 \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci8 \
--device=/dev/davinci9 \
--device=/dev/davinci10 \
--device=/dev/davinci11 \
--device=/dev/davinci12 \
--device=/dev/davinci13 \
--device=/dev/davinci14 \
--device=/dev/davinci15 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
-v /var/log/npu/slog/:/var/log/npu/slog \
-v /var/log/npu/profiling/:/var/log/npu/profiling \
-v /var/log/npu/dump/:/var/log/npu/dump \
-v /var/log/npu/:/usr/slog \
-v /home/:/home/ \
mm-b080:latest /bin/bash
# start a docker container
docker start {name}
# enter the container
docker exec -it {name} /bin/bash
# stop and remove a container
docker stop my-torch-container
docker rm my-torch-container
# remove an image
docker rmi my-torch-container/<IMAGE_ID>
# export environment variables (either of the two below, or both)
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/driver:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:${LD_LIBRARY_PATH}
------------------------------------------------------------------------------------
export LD_PRELOAD=$LD_PRELOAD:/usr/local/Ascend/driver/lib64/common/libc_sec.so
export LD_PRELOAD=$LD_PRELOAD:/usr/local/Ascend/driver/lib64/common/libmmpa.so
export LD_PRELOAD=$LD_PRELOAD:/usr/local/Ascend/driver/lib64/driver/libascend_hal.so
export LD_PRELOAD=$LD_PRELOAD:/usr/local/Ascend/driver/lib64/driver/libdrvdsmi_host.so
export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1 # possibly useful
# commit a container as a new image
docker commit my-torch-container my-torch-container-with-pandas
# save an image
docker save -o my-torch-container.tar my-torch-container
# list all images
docker images
# list containers (-a includes stopped ones)
docker ps -a
# save a docker image to a tar file
docker save -o myimage.tar myimage:tag
# multi-node interconnect inside the image
ifconfig
export GLOO_SOCKET_IFNAME=enp189s0f0
```
# 8. conda & pip
```bash
# download and install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
# activate conda
source /data/anaconda3/bin/activate
# pack a given environment (--ignore-editable-packages skips editable installs)
conda pack -n myenv --ignore-editable-packages
# --include and --exclude control which files are packed:
conda pack -n myenv --exclude "*.pyc" --exclude "*.tmp"
# unpack and use the packed environment
tar -xzf myenv-packed.tar.gz -C myenv
# after unpacking, run the following to re-link binaries before using the env:
source myenv/bin/activate
# clone an existing conda env
conda create --name llava_cql --clone llava
# clone an existing env to a specific path
conda create --prefix /new/path/to/env --clone myenv
# activate an env at a specific path
conda activate /path/to/env/myenv
# remove an env
conda env remove -n env_name
```
```bash
# pip configuration
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip config set global.trusted-host mirrors.aliyun.com

pip config set global.index-url http://repo.huaweicloud.com/repository/pypi/simple
pip config set global.trusted-host repo.huaweicloud.com

# the Huawei Cloud mirror is faster:
http://repo.huaweicloud.com/repository/pypi/simple
# to undo a `pip install -e .`, delete the entry from lib/pythonX.Y/site-packages/
```
# 9. Git
```bash
# gitee credentials
Username for 'https://gitee.com': mr-lin314
Password for 'https://mr-lin314@gitee.com': Chenql19960314
# configure email and username
git config user.email 798948055@qq.com
git config user.name mr-lin314ls
# initialize a local repo
git init
# exclude files from the repo
.gitignore
# merge a remote branch into master (develop on dev, then merge back into master)
git checkout -b dev        # create the new branch
# coding -> commit/push dev (all changes live on the dev branch)
git checkout master        # switch to master before merging
git pull                   # update it
git checkout dev           # switch back to dev
git merge master           # merge; resolve any conflicts
git commit                 # confirm the merge
git checkout master        # switch back to master
git merge dev --squash     # squash-merge the branch
git commit -m "commit message"
git push origin
# rolling back to an earlier version after pushing
# 1. inspect the push log
git log
# 2. hard-reset to the target version
git reset --hard <commit>
# 3. confirm with gitk that the local copy has rolled back
gitk
# 4. force-push the current HEAD to the remote to complete the rollback
git push -f origin <remote-branch>
# force-overwrite the local copy
git fetch --all
git reset --hard origin/master
# squash commits
# 1. squash the last n commits into one
git rebase -i HEAD~n
# in the vim editor, press i, keep the first "pick" and change the later picks
# to "fixup", then press Esc and type :wq to save and quit
#   pick   - use the commit
#   reword - use the commit but edit its message (an editor opens after :wq)
#   squash - use the commit, folding its message into the previous commit
#   fixup  - use the commit, discarding its message
# 2. push
git push --force
# git checkout -m force-overwrites files in the working tree and can lose data
git checkout -m ****
# globally disable SSL verification
git config --global http.sslVerify false
# force-restore the repo to a specific commit
git reset --hard commit_id
```
# 10. Jupyter
```bash
pip install notebook
pip install ipykernel
# register a conda env with jupyter
python -m ipykernel install --name {conda_env}_jupyter --user
# start jupyter
jupyter notebook --no-browser --port 8890 --allow-root
# URL (with token) to configure in PyCharm
http://127.0.0.1:8890/tree?token=69cdc8c8593fa63be44318db912268e58b0191c79f935a3a
```
# 11. Breakpoints (pdb)
- **s(tep)**
  Execute the current line, stopping at the first possible occasion: inside a called function, or on the next line of the current function.
- **n(ext)**
  Continue execution until the next line in the current function is reached or it returns. (The difference between [`next`](https://docs.python.org/zh-cn/3/library/pdb.html#pdbcommand-next) and [`step`](https://docs.python.org/zh-cn/3/library/pdb.html#pdbcommand-step) is that `step` stops inside a called function, while `next` executes called functions at (nearly) full speed, only stopping at the next line in the current function.)
- **unt(il)** [lineno]
  Without an argument, continue until a line with a number greater than the current one is reached. With *lineno*, continue until a line with a number greater than or equal to *lineno* is reached. In both cases, also stop when the current frame returns. *Changed in version 3.2:* an explicit line number may be given.
- **r(eturn)**
  Continue execution until the current function returns.
- **c(ont(inue))**
  Continue execution, only stopping at a breakpoint.
- **w(here)**
  Print a stack trace, with the most recent frame at the bottom. An arrow (`>`) indicates the current frame, which determines the context of most commands.

| command | effect |
| --- | --- |
| `unt` / `until` / `unt lineno` | keep executing until the given line is reached or a breakpoint is hit |
| `j lineno` / `jump` | jump straight to the given line; note that the skipped code is not executed |
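The commands above can be exercised non-interactively by feeding them to `pdb.Pdb` through string buffers (a small sketch; real debugging is interactive, and `add` here is just a toy function):

```python
import io
import pdb

def add(a, b):
    total = a + b      # "n" steps over this line
    return total

# Feed "n" (next) twice, then "c" (continue), as if typed at the (Pdb) prompt.
commands = io.StringIO("n\nn\nc\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True)
result = debugger.runcall(add, 2, 3)

print(result)  # the debugged call still returns its normal result
```

The `transcript` buffer collects what pdb would have printed to the terminal, including the `(Pdb)` prompts and the source lines it stopped on.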
# 12.Transformers
## Tokenizer
```bash
# convert ids to tokens
tokenizer.convert_ids_to_tokens(id)
```
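A toy stand-in for what `convert_ids_to_tokens` / `convert_tokens_to_ids` do under the hood (the vocab here is made up; a real tokenizer's vocabulary comes from its model files):

```python
# Hypothetical 6-entry vocab standing in for a real tokenizer's vocabulary.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3, ",": 4, "!": 5}
inv_vocab = {i: t for t, i in vocab.items()}

def convert_tokens_to_ids(tokens):
    # Unknown tokens map to [UNK], mirroring typical tokenizer behavior.
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]

def convert_ids_to_tokens(ids):
    return [inv_vocab[i] for i in ids]

ids = convert_tokens_to_ids(["hello", ",", "world", "!"])
print(ids)                        # [2, 4, 3, 5]
print(convert_ids_to_tokens(ids))
```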
# 13. Torch
```bash
# print tensors with a given number of decimal places
torch.set_printoptions(precision=8)
# print entire tensors
torch.set_printoptions(profile="full")
# set the torch extension build directory
export TORCH_EXTENSIONS_DIR=***
```
## Memory snapshots
https://pytorch.org/docs/2.1/torch_cuda_memory.html#understanding-cuda-memory-usage
On NPU, replace `torch.cuda.*` with `torch_npu.npu.*`.
# 14. Megatron
```bash
# DDP
# megatron wraps its own DDP implementation; grad_dtype can be specified
# rope
# in megatron's RoPE implementation, self.inv_freq is float32
# overlap
# the parameter init order must match the forward order, otherwise accuracy problems appear
```
# 15. NPU environment variables
```bash
# Emit plog logs
## print host logs to the console: 0 = off, 1 = on
export ASCEND_SLOG_PRINT_TO_STDOUT=1
## default log level: 0 = debug, 1 = info, 2 = warning, 3 = error
export ASCEND_GLOBAL_LOG_LEVEL=0
bash xxx.sh >> xx.log
# ————————————————————————————————
# select the visible devices
export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7
os.environ['ASCEND_RT_VISIBLE_DEVICES']='4,5,6,7'
# "1" forces operators to run synchronously, making problems easier to debug and
# trace; "0" runs them asynchronously
export ASCEND_LAUNCH_BLOCKING=1
torch_npu.npu.synchronize()
# mitigate host-bound workloads
export TASK_QUEUE_ENABLE=2
# memory optimization
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
# stagger matmul (K-shift) so outputs stay consistent
export CLOSE_MATMUL_K_SHIFT=1
# check bandwidth
npu-smi info -t memory -i 0 -c 0
# adapt CUDA code to NPU
from torch_npu.contrib import transfer_to_npu
```
# 16. Tiancheng machine support
Common links:
Daily usage: https://wiki.huawei.com/domains/74808/wiki/120214/WIKI202406143776228
B110 daily usage: https://wiki.huawei.com/domains/74808/wiki/120214/WIKI202408024200506
Tiancheng SuperPoD production-line bring-up, full B110 upgrade guide: https://onebox.huawei.com/p/d7e678c1b6be8bd26dc975e43b550119
Follow-up problem analyses are collected here: https://wiki.huawei.com/domains/74808/wiki/120214/WIKI202408024200476
```bash
# CPLD version > 2: power on via the button
# above 100, power cycle via the button
# query command
dis startup
# 1-16 map to NPUs
# 17-20 map to CPUs
# AC is at 96
# 96: BMC
# 97: OS
# 99: 1520 frontend
# B070
# power-off commands can be pasted in one batch;
# power-on cannot: first power on the NPUs, check via 99 that they are all up,
# then power on the CPUs
# the OS is mounted on the BMC; do not reboot
# a mismatch between two machines has only two possible causes:
# 1. different CANN/HDK versions
# 2. no physical connectivity
# single-machine traffic test: bandwidth is at 97-98
# multi-machine traffic test: run it on the head node only; it finds the other
# machines from the previously edited hostfile
```
```bash
# copy data
mkdir -p /home/hemuhui
cd /home/hemuhui
df -h .
scp -r root@90.90.97.49:/home/hemuhui/open-sora-data-1.0 .
scp -r root@90.90.97.49:/home/hemuhui/open-sora-data-1.1 .
scp -r root@90.90.97.49:/home/hemuhui/opensora1.1.tar .
# password: POCauto1122
scp -r root@90.90.97.48:/home/hemuhui .
scp root@90.90.97.49:/home/hemuhui/opensora1.1-new.tar .
# mount the shared disk
mkdir -p /root/share
sudo systemctl start nfs-server
sudo mount -t nfs 90.90.97.49:/root/share /root/share
cd /home/share
ls
# create the docker container
docker import opensora1.1-new.tar opensora:1.1
# container creation command:
docker run -it --ipc=host --network host --name 'hmh-sora-1.1' --privileged -v /usr/local/Ascend/:/usr/local/Ascend/ -v /usr/local/sbin/:/usr/local/sbin/ -v /home/:/home/ -v /root/share/:/root/share/ opensora:1.1 /bin/bash
docker start hmh-sora-1.1
docker attach hmh-sora-1.1
# CANN: the matching packages live in /home/pkgs/rc2-b020/, installed into /home/cann/rc2-b020/
mkdir -p /home/pkgs
mkdir -p /home/cann/rc2-b020
chmod -R 755 /home/cann
cd /home/pkgs
scp -r root@90.90.97.45:/home/pkgs/rc2-b020/ .
cd rc2-b020
rm -rf /etc/Ascend/ascend_cann_install.info
bash Ascend-cann-toolkit_8.0.RC2.10_linux-aarch64.run --install-path=/home/cann/rc2-b020/ --full
bash Ascend-cann-kernels-910c_8.0.RC2.10_linux.run --install-path=/home/cann/rc2-b020/ --install
# run the model
# 1. install mindspeed (in opensoradata1.0)
# 2. pip install colossalai==0.4.2
# 3. set shuffle to false
source /home/cann/rc2-b020//ascend-toolkit/set_env.sh
source /usr/local/Ascend/driver/bin/setenv.bash
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/driver/:$LD_LIBRARY_PATH
```
# 17. New tools
```bash
# advisor
pip install msprof-analyze
msprof-analyze advisor all -d /path/to/profiling
```
# 18. Accuracy alignment
Deterministic computation
```bash
# install
pip install mindstudio-probe
# usage
from msprobe.pytorch import seed_all
seed_all(mode=True)
# HCCL determinism
export HCCL_DETERMINISTIC=true
# mindspeed/utils
from mindspeed.utils import extend_seed_all
extend_seed_all()
```
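Conceptually, `seed_all` pins every random source at once. A stdlib-only sketch of the idea (the real `msprobe` helper also seeds numpy/torch/torch_npu and enables deterministic kernels, which this sketch does not):

```python
import os
import random

def seed_all(seed=1234):
    """Pin the stdlib RNG and the hash seed. A real seed_all would also cover
    numpy, torch, and torch_npu and toggle deterministic algorithms."""
    # Note: PYTHONHASHSEED only takes effect for subsequently launched
    # Python processes, not the current interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

seed_all(42)
first = [random.randint(0, 99) for _ in range(5)]
seed_all(42)
second = [random.randint(0, 99) for _ in range(5)]
print(first == second)   # identical sequences after re-seeding
```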
Other variables
```bash
# align config-file parameters; pay special attention to megatron's fused-operator flags (anything with "fusion")
# match CANN versions (kernels, toolkit, torch_npu)
# whole-network ND format
torch.npu.config.allow_internal_format = False
# stagger matmul K-shift
export CLOSE_MATMUL_K_SHIFT=1
# zero0
"communication_data_type": "fp32"
# use the native optimizer
torch.optim.AdamW
# (or mindspeed's optimizer)
# optimizer parameters
--lr-decay-style constant \
--weight-decay 0.00 \
--clip-grad 0.0 \
```
Computation
```bash
# 1. linear, matmul, baddbmm, bmm
# 2. replace rearrange with reshape
# 3. watch which code path recomputation takes in the backward pass
# 4. in the model, the ViT uses full attention while the language module uses masked attention
# 5. with multiple cards, the all-reduce summation order can differ
```
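Point 5 comes down to floating-point addition not being associative: the same addends reduced in a different order can give a different result. A quick demonstration in plain Python, emulating float32 with `struct`:

```python
import struct

def f32(x):
    # Round a Python float to float32 precision, mimicking on-device math.
    return struct.unpack("f", struct.pack("f", x))[0]

values = [1e8, 1.0, -1e8, 1.0]

# Same multiset of addends, two reduction orders:
left_to_right = 0.0
for v in values:
    left_to_right = f32(left_to_right + v)

reordered = 0.0
for v in sorted(values):          # a different (e.g. per-rank) arrival order
    reordered = f32(reordered + v)

print(left_to_right, reordered)   # the two sums differ
```

In float32 the small `1.0` terms are absorbed by `1e8` or survive depending on when the large terms cancel, which is exactly why multi-card all-reduce results can differ run to run unless the reduction order is fixed.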
# 19. Performance tuning
```bash
# ACLNN_CACHE_LIMIT: number of operator-info entries the single-op execution API
# caches on the host (range [1,10000000], default 10000). Cached entries include
# the workspace size, the operator's executor, tiling info, etc. In dynamic-shape
# scenarios with a wide shape range, raising this limit can improve scheduling
# performance, at the cost of extra host memory.
export ACLNN_CACHE_LIMIT=100000
#
export HOST_CACHE_CAPACITY=20
# scheduling optimization
export TASK_QUEUE_ENABLE=2 # newest versions allow 3
# spread core affinity evenly
export CPU_AFFINITY_CONF=1
# enable the two-non-contiguous-tensors combined flag: 0 = off, 1 = on
export COMBINED_ENABLE=1
# fused optimizer
# mindspeed enables the fused op via aspm.register_patch('apex.optimizers.FusedAdam', AdamW, create_dummy=True)
# TODO: fused optimizer
# checklist:
# - deterministic computation off (seed_all, HCCL_DETERMINISTIC)
# - CLOSE_MATMUL_K_SHIFT off
# view GPU profiling
chrome://tracing/
```
# 20. Memory optimization
```bash
export MULTI_STREAM_MEMORY_REUSE=2
# add to the shell script
--reuse-fp32-param # reuse fp32 parameters
```
See mindspeed/model/transformer.py:
self.activation_checkpoint_manager = CheckpointWithoutOutput()
![image-20250102192802590](C:\Users\c30061641\AppData\Roaming\Typora\typora-user-images\image-20250102192802590.png)
The blue and green lines should overlap.
`a = all_reduce(a)` causes memory problems; write it as `b = all_reduce(a)`.
Recomputation with separated scheduling drops the outputs of certain modules.
![image-20250102194623033](C:\Users\c30061641\AppData\Roaming\Typora\typora-user-images\image-20250102194623033.png)
![image-20250102202355956](C:\Users\c30061641\AppData\Roaming\Typora\typora-user-images\image-20250102202355956.png)
![image-20250102202942912](C:\Users\c30061641\AppData\Roaming\Typora\typora-user-images\image-20250102202942912.png)
# 21. Inspecting error logs
```bash
cd /root/ascend/log
grep -rn ERROR -m 1 | grep 2025-03-10-
```
# 22. Project notes
```
# Internvl
Performance breakdown lives in this spreadsheet: https://onebox.huawei.com/v/b5a445315c1b76491043e30a30eae6df?type=0
Profiling results are filed under: https://onebox.huawei.com/#eSpaceGroupFile/1/45/12062403
```
# 23. VS Code
```bash
# .env
# add `set -x` at the top of the sh script,
# take the part above the torchrun line, and use regex to replace the following
# two patterns with spaces:
.*export.*
\++

# add the result to .env
# launch.json
print(json.dumps(string.split()))
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "TorchRun",
            "type": "python",
            "request": "launch",
            "module": "torch.distributed.run",
            "console": "integratedTerminal",
            "justMyCode": false,
            "args": [] // paste the argument list here
        }
    ]
}
```
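The `print(json.dumps(string.split()))` trick above turns a flat command line into the JSON array that `launch.json` expects in `"args"`; `shlex` handles quoting more robustly than a bare `split()`. The command string here is a made-up torchrun-style example:

```python
import json
import shlex

# A torchrun-style argument string as it might appear in a shell script.
cmd = "--nproc_per_node=8 --master_port 6001 pretrain.py --data-path '/data/my set'"

args = shlex.split(cmd)            # respects the quoted path
print(json.dumps(args, indent=2))  # paste this list into "args" in launch.json
```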
# 24. libing
Edit the AR and change its status to testing/done.
# 25. 云道 (cloud platform)
```
User guide: https://onebox.huawei.com/v/9438af1ee1b089e6e9c784dff2d73a7a?type=1
Create a training job: https://csb.roma.huawei.com/csb/#/hec/eiwizard/overview/com.noah.pangu.knowledge/autoLearning
Analyze cluster performance in the cloud: https://csb.roma.huawei.com/csb/#/hec/eiwizard/training/com.noah.pangu.knowledge/trainingjobv2?thirdType=detail&id=4993f95a-e3d2-4787-aeb3-513e5d1a79fe&trackId=5915ef26-ab72-4e6b-a94f-651e18ea3328
```
```
/user/config/jobstart_hccl.json
```
```bash
# install CANN
cann_install_b080.sh
#!/bin/bash
LOCAL_DIR=/home/ma-user/modelarts/user-job-dir/MindSpeed-MM-disttrain-new/MindSpeed-MM/examples/internvl2/cannb080
# uninstall previous packages
echo "------------------------"step 2: uninstall ascend package"------------------------"
rm -rf /usr/local/Ascend/ascend-toolkit/
# rm -f /etc/Ascend/ascend_cann_install.info
echo "------------------------"uninstall torch"------------------------"
# pip uninstall torch -y
pip uninstall torch_npu -y
pip uninstall apex -y
# install torch
pip install ${LOCAL_DIR}/apex-*-cp39-cp39-linux_aarch64.whl
pip install ${LOCAL_DIR}/torch_npu-2.1.0.*-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
# install new CANN
chmod +x ${LOCAL_DIR}/*.run
echo "-------------------start install new cann--------------------------"
${LOCAL_DIR}/Ascend-cann-toolkit*.run --install --install-path=/usr/local/Ascend --quiet
${LOCAL_DIR}/Atlas-A3-cann-kernels*.run --install --install-path=/usr/local/Ascend --quiet
source /usr/local/Ascend/ascend-toolkit/set_env.sh
${LOCAL_DIR}/Ascend-cann-nnal*.run --install --install-path=/usr/local/Ascend --quiet
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
```
# 26. Passwordless SSH login
```bash
# on the local machine
ssh-keygen -t rsa -b 4096
cat ~/.ssh/id_rsa.pub
# on the server
vim ~/.ssh/authorized_keys
```
# 27. Work links
1. White-box evaluation
\\hghnas05av3-rd\CA_JS_STJS_Product_F\昇腾计算产品部\18 昇腾计算网络模型开发部\16 可信及软件工程能力\3 白盒评价\2024H2
2. Performance reviews
https://onebox.huawei.com/#eSpaceGroupFile/1/6238/12067988
# 28. MindSpeed new features
1. Adaptive HCCL buffer: saves device memory
2. PP supports zero bubble: improves PP performance
# 29. Ideas
1. Within a TP+CP group, additionally apply DP over the data
![image-20250211103358539](C:\Users\c30061641\AppData\Roaming\Typora\typora-user-images\image-20250211103358539.png)
# 30. Project retrospective
Lessons from the mindspeed-rl & mm projects:
1. When handing project code to others, provide a checklist (weight paths, configuration, key code changes), or add validation at the code entry point.
# 31. Pitfalls to watch
1. Before a long run or a performance test, always pkill leftover python processes.
2. If mstt fails to identify some operators when comparing performance, try profiling at a different level.
3. Deterministic-computation errors on GPU require setting CUBLAS_WORKSPACE_CONFIG=:4096:8.
4. Specify the port for deepspeed.
5. When comparing accuracy, make sure the conda environment, config files, code files, and device are all identical (CPU precision is higher than NPU).
6. PyCharm debugging: Shift+F8 steps back out to the caller.
7. Add new patches to main so they are not invoked by mistake.
8. When taking over a problem, first clarify whether it is within scope.
9. Keep your own copy of model config files; do not use files in someone else's directory, which may change under you.
10. FA: lower-triangular vs. full attention.
11. To reproduce code from an upstream GitHub repo, the test files under tests are a useful reference.
12. scp transfers into the Chengdu green zone must target a user directory, not a root-only directory.
13. \\hghnas05av3-rd\CA_JS_STJS_Product_F\昇腾计算产品部\18 昇腾计算网络模型开发部\16 可信及软件工程能力\3 白盒评价\2024H2
14. If device memory is nearly full, training may hang.
15. A3 product names: Atlas 9000 A3 SuperPoD / Atlas A3 training series.
16. When handing over for testing, keep a copy of code that testers can reproduce directly, to cut testing cost.
17. Remove a known host's key: ssh-keygen -R 90.90.94.158
# 32. Common links
TMG: https://3ms.huawei.com/km/blogs/details/15930852
codehub: https://codehub-g.huawei.com/users/
代码随想录: https://programmercarl.com/