diff --git a/docs/compute/clusters_gpu/iluvatar_gpu.md b/docs/compute/clusters_gpu/iluvatar_gpu.md index 0339df45a704e432bbb07073923aff7a185e073d..0a5f7ecabee4b96c46ef6541c748a7bfdededf3a 100644 --- a/docs/compute/clusters_gpu/iluvatar_gpu.md +++ b/docs/compute/clusters_gpu/iluvatar_gpu.md @@ -61,16 +61,16 @@ sidebar_position: 3 ## 4. AI 开发库:适配与版本管理 -为了发挥智铠100 的最佳性能,建议使用天数智芯官方适配的 AI 框架版本。 +为了发挥智铠 100 的最佳性能,建议使用天数智芯 CoreX 软件栈官方适配的 AI 框架版本。 ### 版本识别与安装 -官方适配的 Python 库通常会带有特定的标识(如 `ix` 或版本后缀)。 +官方适配的 Python 库通常会带有特定的标识(如 `corex` 或版本后缀)。 * **PyTorch**:需安装天数智芯适配版的 PyTorch。 * **安装建议**:强烈建议直接使用官方提供的 Docker 镜像进行开发,镜像中已预置了正确的驱动、CUDA 兼容库及 AI 框架。 ```bash # 示例:检查环境中的适配包(具体名称以官方镜像为准) -pip list | grep -E "torch|iluvatar|ix" +pip list | grep corex ``` --- diff --git a/docs/compute/deploy/asr_model.md b/docs/compute/deploy/asr_model.md index 3b7ad76436c7945ec513fe2c3b5f99b7d4dcd8cf..4abcb2e987f58df38003592b0bf69ced93061833 100644 --- a/docs/compute/deploy/asr_model.md +++ b/docs/compute/deploy/asr_model.md @@ -141,6 +141,60 @@ model = AutoModelForSpeechSeq2Seq.from_pretrained( ) +device = "cuda" +model = model.to(device) + +# 加载音频 (强制重采样为 16kHz) +audio, sr = librosa.load(audio_path, sr=16000) + +inputs = processor(audio, sampling_rate=sr, return_tensors="pt") + +input_features = inputs.input_features.to(device=device, dtype=torch.bfloat16) + +with torch.inference_mode(): + generated_ids = model.generate(input_features, max_new_tokens=256) + text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0] + +print("识别结果:", text) +``` + +--- +### 三、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +#### 1. 通用环境准备 + +- **算力型号**: 智铠 100 (32GB) +- **镜像选择**: `vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +#### 2. 
模型部署实战 + +##### 2.1 whisper-large-v3-turbo +本示例演示如何在 **智铠 100** 环境完成语音识别。 + +**运行推理代码**: 新建 Notebook 单元格运行。 + +```python +# 安装依赖 +!pip install transformers librosa + +import torch +import librosa +from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor + +# 模型与音频路径 +model_name = "/mnt/moark-models/whisper-large-v3-turbo" +audio_path = "/mnt/moark-models/github/index-tts/emo_sad.wav" + +processor = AutoProcessor.from_pretrained(model_name) + +model = AutoModelForSpeechSeq2Seq.from_pretrained( + model_name, + torch_dtype=torch.bfloat16, # 这里指定了模型权重为 bfloat16 + low_cpu_mem_usage=True, + use_safetensors=True +) + + device = "cuda" model = model.to(device) diff --git a/docs/compute/deploy/embedding_model.md b/docs/compute/deploy/embedding_model.md index 6ec7a306ee0ce4f96fdd7274436bd0bf9fa6dcb0..c61f9f9185cced34ba2b59292f38b12023a44310 100644 --- a/docs/compute/deploy/embedding_model.md +++ b/docs/compute/deploy/embedding_model.md @@ -173,6 +173,61 @@ print(f"编码完成,生成 {len(texts)} 个向量") --- + +### 三、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +#### 3.1 环境准备 + +- **算力型号**:智铠 100 (32GB) +- **镜像选择**:`vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +![镜像选择](/img/computility/iluvatar_image.jpg) + +#### 3.2 Embedding 推理示例 + +```python +!pip install transformers +import torch +import torch.nn.functional as F +from torch import Tensor +from transformers import AutoTokenizer, AutoModel + +#“句向量”绑定在最后一个有效 token上 +def last_token_pool(last_hidden_states: Tensor, + attention_mask: Tensor) -> Tensor: + left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) + if left_padding: + return last_hidden_states[:, -1] + else: + sequence_lengths = attention_mask.sum(dim=1) - 1 + batch_size = last_hidden_states.shape[0] + return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] + + +model_name = "/mnt/moark-models/Qwen3-Embedding-8B" +tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left') +model = 
AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="cuda") + +texts = ["天空是蓝色的,天气晴朗。", "今天阳光灿烂,非常适合出游。"] + +batch_dict = tokenizer( + texts, + padding=True, + truncation=True, + max_length=8192, + return_tensors="pt" +).to("cuda") + +batch_dict.to(model.device) +outputs = model(**batch_dict) + +embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask']) +print(embeddings) + +print(f"编码完成,生成 {len(texts)} 个向量") +``` + +--- ## 三、vLLM 高性能部署 vLLM 提供了对 Embedding 模型的优化支持,可直接生成 OpenAI 兼容的 API 接口,无需额外开发。 @@ -190,6 +245,8 @@ vLLM 会自动处理批处理、GPU 优化和 API 兼容性,开箱即用。 --- + + ## 五、本地访问与服务验证 请参考【[SSH 隧道配置指南](./ssh_tunnel.md)】建立安全连接。 diff --git a/docs/compute/deploy/img_model.md b/docs/compute/deploy/img_model.md index 4de7cad7f4b8ce2a4876405a3ed69641034c12da..1e3ce89ee26b596d31984678324dd92f47c4c5e9 100644 --- a/docs/compute/deploy/img_model.md +++ b/docs/compute/deploy/img_model.md @@ -311,6 +311,83 @@ with torch.inference_mode(): --- +### 四、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +本章节适用于 **智铠 100** 等天数智芯系列算力卡。 + +#### 1. 通用环境准备 + +所有天数智芯模型部署均需基于以下环境配置进行: + +- **算力型号**:智铠 100 (32GB) +- **算力数量**:>=2 +- **镜像选择**: `vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + + +![镜像选择](/img/computility/iluvatar_image.jpg) + +**基础操作步骤:** +1. **进入工作台**:启动实例后,点击 **JupyterLab** 进入容器环境。 + + ![进入容器](/img/computility/iluvatar_Lab.jpg) +2. **新建脚本**:点击 "Notebook" 图标,新建一个 `.ipynb` 文件。 + + ![新建Notebook](/img/computility/iluvatar_notebook.jpg) + +--- + +#### 2. 
模型部署实战 + +请根据您需要的模型选择对应的实战案例代码。 + +#### 2.1 Z-Image-Turbo +本示例演示如何在 **智铠 100** 算力环境下,**加载平台内置模型**并生成高质量图像。 + +**运行推理代码:** +新建 Notebook 单元格运行 + +```python +import torch +from diffusers import ZImagePipeline + +model_name = "/mnt/moark-models/Z-Image-Turbo" + + +pipe = ZImagePipeline.from_pretrained( + model_name, + torch_dtype=torch.bfloat16, +) +pipe.to("cuda") + +prompt = '''A 9:16 vertical, realistic cyber-aesthetic future social profile card photo: A hand gently holds a vertically semi-transparent acrylic card, occupying the visual center of the picture. The card presents the personal homepage interface of the future social platform "MoArk", with a minimalist design and no redundant decorations. The edges of the card are rounded and soft, with a gradient neon halo of pink-purple and ice blue. The background is deep and blurry, further highlighting the crystal-clear texture of the card itself. The interface information seems to be micro-engraved, three-dimensional and clear, displayed in sequence: +Avatar (suspended in the center, with a holographic surround effect) +Username and the dynamic "Verified Member" badge at the top +Name: MoArk (MoArk) Computing Power Experience Officer +Followers: 2,777 +Following: 12,000 +Join Date: 11/7/2025 +Follow button (presenting a soft light touchable effect) +A soft light and shadow is reflected at the point where the finger touches, creating an atmosphere that is both cinematic and immersive, like a scene from a near-future live-action game.''' + +image = pipe( + prompt=prompt, + height=1024, + width=1024, + num_inference_steps=9, + guidance_scale=0.0, + generator=torch.Generator("cuda").manual_seed(42), +).images[0] + +image.save("image.png") +print("图片生成完成,已保存为 image.png") +``` + +**查看结果与排查:** +代码运行结束后,您可以在**左侧文件栏**找到 `image.png`,双击即可查看生成的图片。 +![图像结果](/img/computility/pytorch_img.png) + +--- + ## 四、本地访问与服务验证 请参考【[SSH 隧道配置指南](./ssh_tunnel.md)】建立安全连接。 diff --git a/docs/compute/deploy/ocr_model.md 
b/docs/compute/deploy/ocr_model.md index 6e3dbdd70ef552098c6bd9553764daeef824ee39..4cc203bbcb283617b65517866e9f4015a6c8ef2f 100644 --- a/docs/compute/deploy/ocr_model.md +++ b/docs/compute/deploy/ocr_model.md @@ -204,6 +204,86 @@ print(result) --- + +### 三、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +#### 1. 通用环境准备 + +- **算力型号**: 智铠 100 (32GB) +- **镜像选择**: `vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +#### 2. 模型部署实战 + +##### 2.1 PaddleOCR-VL-1.5 +本示例演示如何在 **智铠 100** 环境完成 OCR 识别。 + +**运行推理代码**: 新建 Notebook 单元格运行。 + +```python +# 安装依赖 +!pip install "transformers>=5.0.0" + +import torch +from PIL import Image +from transformers import AutoProcessor, AutoModelForImageTextToText + +model_path = "/mnt/moark-models/PaddleOCR-VL-1.5" +image_path = "/mnt/moark-models/L1_exam/ocr_demo.png" +task = "ocr" # Options: 'ocr' | 'table' | 'chart' | 'formula' | 'spotting' | 'seal' + +image = Image.open(image_path).convert("RGB") +orig_w, orig_h = image.size +spotting_upscale_threshold = 1500 + +# 通用图片预处理方式,目的是放大图片,针对spotting任务,需要确保图片内的文字清晰可见,以提高ocr的识别度, +if task == "spotting" and orig_w < spotting_upscale_threshold and orig_h < spotting_upscale_threshold: + process_w, process_h = orig_w * 2, orig_h * 2 + try: + resample_filter = Image.Resampling.LANCZOS + except AttributeError: + resample_filter = Image.LANCZOS + image = image.resize((process_w, process_h), resample_filter) + +max_pixels = 2048 * 28 * 28 if task == "spotting" else 1280 * 28 * 28 + +DEVICE = "cuda" +PROMPTS = { + "ocr": "OCR:", + "table": "Table Recognition:", + "formula": "Formula Recognition:", + "chart": "Chart Recognition:", + "spotting": "Spotting:", + "seal": "Seal Recognition:", +} + +model = AutoModelForImageTextToText.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(DEVICE).eval() +processor = AutoProcessor.from_pretrained(model_path) + +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "image": image}, + {"type": "text", "text": PROMPTS[task]}, + ] + } +] +inputs = 
processor.apply_chat_template( + messages, + add_generation_prompt=True, + tokenize=True, + return_dict=True, + return_tensors="pt", + images_kwargs={"size": {"shortest_edge": processor.image_processor.min_pixels, "longest_edge": max_pixels}}, +).to(model.device) + +outputs = model.generate(**inputs, max_new_tokens=512) +result = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:-1]) +print(result) +``` + +--- + ## 四、常见问题 - **图片识别不完整**: 尝试提高图片分辨率或使用更清晰的扫描件。 diff --git a/docs/compute/deploy/speech_model.md b/docs/compute/deploy/speech_model.md index b0f25154d6ae6662030f49dd1da273bc2dec4830..05eb9269ff110ccca2bcc68caca91cfc3f09273f 100644 --- a/docs/compute/deploy/speech_model.md +++ b/docs/compute/deploy/speech_model.md @@ -168,7 +168,7 @@ tts.infer(spk_audio_prompt='/mnt/moark-models/github/index-tts/emo_sad.wav', tex 所有燧原模型部署均需基于以下环境配置进行: - **算力型号**:Enflame S60 (48GB) -- **镜像选择**:`vLLM / 0.11.0 / Python 3.12 / ef 1.7.0.14` +- **镜像选择**:`vLLM / 0.8.0 / Python 3.10 / ef 1.5.0.604` ![镜像选择](/img/computility/Enflame_flux-krea-dev_create.jpg) @@ -233,6 +233,88 @@ tts.infer(spk_audio_prompt='/mnt/moark-models/github/index-tts/emo_sad.wav', tex --- +### 三、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +#### 1. 通用环境准备 + +- **算力型号**:智铠 100 (32GB) +- **镜像选择**:`vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +![镜像选择](/img/computility/iluvatar_image.jpg) + +**基础操作步骤:** +1. **进入工作台**:启动实例后,点击 **JupyterLab** 进入容器。 + + ![进入容器](/img/computility/iluvatar_Lab.jpg) +2. **新建脚本**:点击图标新建一个 `.ipynb` 文件。 + + ![新建Notebook](/img/computility/iluvatar_notebook.jpg) + +--- + +#### 2. 
模型部署实战 + +请根据您需要的模型选择对应的实战案例代码。 + +#### 2.1 IndexTTS-2 +本示例演示如何在 **智铠 100** 算力环境下,**加载平台内置模型**合成一段音频。 + +##### 运行推理代码 +**第一步:安装相关的依赖** +在 Notebook 中新建单元格,运行以下代码: +```python +# 安装必要的依赖库 +!pip install bentoml \ + accelerate==1.8.1 \ + transformers==4.52.1 \ + cn2an==0.5.22 \ + ffmpeg-python==0.2.0 \ + Cython==3.0.7 \ + g2p-en==2.1.0 \ + jieba==0.42.1 \ + keras==2.9.0 \ + numba \ + numpy==1.26.2 \ + pandas==2.1.3 \ + matplotlib==3.8.2 \ + opencv-python==4.9.0.80 \ + vocos==0.1.0 \ + omegaconf \ + sentencepiece \ + munch==4.0.0 \ + librosa==0.10.2.post1 \ + descript-audiotools==0.7.2 \ + textstat==0.7.10 \ + tokenizers==0.21.0 \ + json5==0.10.0 \ + pydub \ + tqdm +``` +![安装结果](/img/computility/speech_ix_1.png) + +**第二步:加载模型** +等待上述依赖安装完成后,在下方新起单元格,运行以下代码: +```python +!cp -rf /mnt/moark-models/github/index-tts/indextts ./ + +import os +os.environ["USE_TF"] = "0" +os.environ["USE_TORCH"] = "1" + +from indextts.infer_v2 import IndexTTS2 +# 请不要修改 cfg_path 和 model_dir 的路径 +tts = IndexTTS2(cfg_path="/mnt/moark-models/github/index-tts/checkpoints/config.yaml", model_dir="/mnt/moark-models/IndexTTS-2", use_fp16=False, use_cuda_kernel=False, use_deepspeed=True) + +# 如需生成不同的音色和音频内容,请替换 spk_audio_prompt 的参考音频以及 text 的文本内容。 +text = "快躲起来!是他要来了!他要来抓我们了!" +tts.infer(spk_audio_prompt='/mnt/moark-models/github/index-tts/emo_sad.wav', text=text, output_path="gen.wav", emo_alpha=0.6, use_emo_text=True, use_random=False, verbose=True) +``` +**查看结果与排查:** +代码运行结束后,您可以在**左侧文件栏**找到 `gen.wav`;由于 JupyterLab 不支持直接播放音频文件,可将其下载至本地主机播放。 +![运行结果](/img/computility/speech_2.png) + +--- + ## 四、本地访问与服务验证 请参考【[SSH 隧道配置指南](./ssh_tunnel.md)】建立安全连接。 diff --git a/docs/compute/deploy/text_model.md b/docs/compute/deploy/text_model.md index 81471ddf739f92ba079269ac1fafca797cf3a014..0ce29fa0d43207a5f13e0609d250eb1c12fb5a90 100644 --- a/docs/compute/deploy/text_model.md +++ b/docs/compute/deploy/text_model.md @@ -53,61 +53,62 @@ Qwen3 系列提供从 0.6B 到 70B 不等的参数规模,原生支持 32K~128K 2. 
新建一个 `.ipynb` (Notebook) 文件。 ![新建Notebook](/img/computility/pytorch0.png) 3. 安装依赖:在第一个单元格运行以下命令。安装完成后,请务必重启内核 (Restart Kernel) 以使环境生效。 - ```bash - pip install accelerate - ``` - ![重启内核](/img/computility/pytorch2.png) -4. 运行推理代码:沐曦环境支持直接加载挂载盘中的内置模型,无需下载。 - - ```python - from transformers import AutoModelForCausalLM, AutoTokenizer - import torch - - # 指定内置模型库路径 (无需下载) - model_name = "/mnt/moark-models/Qwen3-0.6B" - - tokenizer = AutoTokenizer.from_pretrained(model_name) - model = AutoModelForCausalLM.from_pretrained( - model_name, - torch_dtype="auto", - device_map="auto" - ) - - prompt = "Give me a short introduction to large language model." - messages = [{"role": "user", "content": prompt}] - - text = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True, - enable_thinking=True - ) - - model_inputs = tokenizer([text], return_tensors="pt").to(model.device) - generated_ids = model.generate( - **model_inputs, - max_new_tokens=32768 - ) - - output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() - try: - index = len(output_ids) - output_ids[::-1].index(151668) # - except ValueError: - index = 0 +```bash +pip install accelerate +``` +![重启内核](/img/computility/pytorch2.png) +4. 运行推理代码:沐曦环境支持直接加载挂载盘中的内置模型,无需下载。 - thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n") - content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n") +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +import torch + +# 指定内置模型库路径 (无需下载) +model_name = "/mnt/moark-models/Qwen3-0.6B" + +tokenizer = AutoTokenizer.from_pretrained(model_name) +model = AutoModelForCausalLM.from_pretrained( + model_name, + torch_dtype="auto", + device_map="auto" +) + +prompt = "Give me a short introduction to large language model." 
+messages = [{"role": "user", "content": prompt}] + +text = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True, + enable_thinking=True +) + +model_inputs = tokenizer([text], return_tensors="pt").to(model.device) + +generated_ids = model.generate( + **model_inputs, + max_new_tokens=32768 +) + +output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() +try: + index = len(output_ids) - output_ids[::-1].index(151668) # 151668 为 </think> 的 token id +except ValueError: + index = 0 + +thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n") +content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n") + +print("=== Thinking ===") +print(thinking_content) +print("\n=== Response ===") +print(content) +``` - print("=== Thinking ===") - print(thinking_content) - print("\n=== Response ===") - print(content) - ``` +执行结果示例: - 执行结果示例: - - ![执行结果](/img/computility/pytorch3.png) +![执行结果](/img/computility/pytorch3.png) #### 2. vLLM 高性能推理 @@ -122,14 +123,19 @@ Qwen3 系列提供从 0.6B 到 70B 不等的参数规模,原生支持 32K~128K 1. 通过 JupyterLab 打开终端 (Terminal)。 2. 执行以下命令启动兼容 OpenAI 接口的服务: - ```bash - # 使用内置模型路径,指定端口 8188 - vllm serve /mnt/moark-models/Qwen3-0.6B --port 8188 - ``` - ![vLLM启动命令](/img/computility/vllm2.png) +```bash +# 使用内置模型路径,指定端口 8188 +vllm serve /mnt/moark-models/Qwen3-0.6B --port 8188 +``` + +:::info +若需指定对外暴露的模型 ID,可追加 `--served-model-name Qwen3-0.6B` 参数进行调整。 +::: + +![vLLM启动命令](/img/computility/vllm2.png) 3. 当看到 `Uvicorn running on http://0.0.0.0:8188` 字样时,服务启动成功。 - ![vLLM运行成功](/img/computility/vllm3.png) +![vLLM运行成功](/img/computility/vllm3.png) #### 3. SGLang 结构化推理 @@ -142,12 +148,12 @@ Qwen3 系列提供从 0.6B 到 70B 不等的参数规模,原生支持 32K~128K 1. 通过 JupyterLab 打开终端 (Terminal)。 2. 
执行以下命令启动服务: - ```bash - # SGLang 启动命令 - python -m sglang.launch_server \ - --model-path /mnt/moark-models/Qwen3-0.6B \ - --port 8188 - ``` +```bash +# SGLang 启动命令 +python -m sglang.launch_server \ + --model-path /mnt/moark-models/Qwen3-0.6B \ + --port 8188 +``` ![SGLang启动命令](/img/computility/sglang2.png) 3. 服务启动成功界面: @@ -175,37 +181,37 @@ Qwen3 系列提供从 0.6B 到 70B 不等的参数规模,原生支持 32K~128K ![进入容器](/img/computility/Enflame_flux-krea-dev_notebook.jpg) 3. 代码迁移关键点:必须在代码最前方引入 `torch_gcu` 库。 - ```python - import torch - import torch_gcu - from torch_gcu import transfer_to_gcu - from transformers import AutoModelForCausalLM, AutoTokenizer +```python +import torch +import torch_gcu +from torch_gcu import transfer_to_gcu +from transformers import AutoModelForCausalLM, AutoTokenizer - model_name = "/mnt/moark-models/Qwen3-0.6B" +model_name = "/mnt/moark-models/Qwen3-0.6B" - tokenizer = AutoTokenizer.from_pretrained(model_name) - model = AutoModelForCausalLM.from_pretrained( - model_name, - torch_dtype="auto", - device_map="auto" - ) +tokenizer = AutoTokenizer.from_pretrained(model_name) +model = AutoModelForCausalLM.from_pretrained( + model_name, + torch_dtype="auto", + device_map="auto" +) - prompt = "Explain quantum computing in simple terms." - messages = [{"role": "user", "content": prompt}] +prompt = "Explain quantum computing in simple terms." 
+messages = [{"role": "user", "content": prompt}] + +text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) +model_inputs = tokenizer([text], return_tensors="pt").to(model.device) + +generated_ids = model.generate(**model_inputs, max_new_tokens=512) +output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() +response = tokenizer.decode(output_ids, skip_special_tokens=True) +print(response) +``` #### 2. vLLM 高性能推理 #### 2.1 环境准备 -- 镜像选择:`Ubuntu / 22.04 / Python 3.13 / ef 1.5.0.604` +- 镜像选择:`vLLM / 0.11.0 / Python 3.12 / ef 1.7.0.14` (注:该官方镜像已内置 vLLM 适配环境) ![镜像选择](/img/computility/Enflame_flux-krea-dev_create.jpg) @@ -217,6 +223,10 @@ Qwen3 系列提供从 0.6B 到 70B 不等的参数规模,原生支持 32K~128K vllm serve /mnt/moark-models/Qwen3-0.6B --port 8188 ``` +:::info +若需指定对外暴露的模型 ID,可追加 `--served-model-name Qwen3-0.6B` 参数进行调整。 +::: + ![vLLM启动命令](/img/computility/Enflame_Qwen3-0.6B_vllm_00.jpg) ![进入容器](/img/computility/Enflame_Qwen3-0.6B_vllm_01.jpg) @@ -229,6 +239,67 @@ vllm serve /mnt/moark-models/Qwen3-0.6B --port 8188 --- +### 三、天数智芯 (iluvatar) 部署指南 {#iluvatar-deploy} + +#### 1. Hugging Face Transformers + +#### 1.1 环境准备 +- **算力型号**:智铠 100 (32GB) +- **镜像选择**:`vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +![镜像选择](/img/computility/iluvatar_image.jpg) + +#### 1.2 部署步骤 +1. 启动实例后,进入 JupyterLab。 + ![进入容器](/img/computility/iluvatar_Lab.jpg) +2. 
新建一个 `.ipynb` (Notebook) 文件,并在单元格中运行以下推理代码。 + ![新建Notebook](/img/computility/iluvatar_notebook.jpg) + +```python +import torch +from transformers import AutoModelForCausalLM, AutoTokenizer + +model_name = "/mnt/moark-models/Qwen3-0.6B" + +tokenizer = AutoTokenizer.from_pretrained(model_name) +model = AutoModelForCausalLM.from_pretrained( + model_name, + torch_dtype="auto", + device_map="auto" +) + +prompt = "Explain quantum computing in simple terms." +messages = [{"role": "user", "content": prompt}] + +text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) +model_inputs = tokenizer([text], return_tensors="pt").to(model.device) + +generated_ids = model.generate(**model_inputs, max_new_tokens=512) +output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() +response = tokenizer.decode(output_ids, skip_special_tokens=True) +print(response) +``` + +#### 2. vLLM 高性能推理 + +#### 2.1 环境准备 +- **算力型号**:智铠 100 (32GB) +- **镜像选择**:`vLLM / 0.11.2 / Python 3.10 / IX-ML 4.4.0` + +![镜像选择](/img/computility/iluvatar_image.jpg) + +#### 2.2 启动服务 +在终端中执行以下命令: + +```bash +vllm serve /mnt/moark-models/Qwen3-0.6B --port 8188 +``` +:::info +若需指定对外暴露的模型 ID,可追加 `--served-model-name Qwen3-0.6B` 参数进行调整。 +::: + +--- + ## 四、本地访问与服务验证 请参考【[SSH 隧道配置指南](./ssh_tunnel.md)】建立安全连接。 diff --git a/static/img/computility/iluvatar_Lab.jpg b/static/img/computility/iluvatar_Lab.jpg new file mode 100644 index 0000000000000000000000000000000000000000..cbef4ae85d7b244d60c450afc42d470f529511d3 Binary files /dev/null and b/static/img/computility/iluvatar_Lab.jpg differ diff --git a/static/img/computility/iluvatar_image.jpg b/static/img/computility/iluvatar_image.jpg new file mode 100644 index 0000000000000000000000000000000000000000..51e386fececcf834cde6c7497aee5a3f219b7398 Binary files /dev/null and b/static/img/computility/iluvatar_image.jpg differ diff --git a/static/img/computility/iluvatar_notebook.jpg b/static/img/computility/iluvatar_notebook.jpg new file mode 100644 index 
0000000000000000000000000000000000000000..948e0c658f46c2cf3c1ae147cba9b659cda98892 Binary files /dev/null and b/static/img/computility/iluvatar_notebook.jpg differ diff --git a/static/img/computility/speech_ix_1.png b/static/img/computility/speech_ix_1.png new file mode 100644 index 0000000000000000000000000000000000000000..ab4d66d275b5cb03ee025eefd89581549b08e161 Binary files /dev/null and b/static/img/computility/speech_ix_1.png differ