# asrtts-example

**Repository Path**: bytesifter/asrtts-example

## Basic Information

- **Project Name**: asrtts-example
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-29
- **Last Updated**: 2026-05-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ASR/TTS Example Project

An example project for audio transcription (ASR) and model downloading. It uses `uv` as the package manager.

## Installation

Choose an installation option based on your graphics hardware:

```
Do you have an NVIDIA GPU? (GTX/RTX series)
├── Yes → Option B: CUDA acceleration
└── No → Do you have Intel Arc Graphics?
         ├── Yes → Option C: DirectML acceleration
         └── No → Option A: CPU (default)
```

### Option A: CPU (default)

No GPU is required. Install directly:

```bash
uv sync
```

### Option B: CUDA acceleration (NVIDIA GPU)

```bash
# 1. Install the CUDA build of PyTorch first
uv pip install "torch>=2.0.0" --index-url https://download.pytorch.org/whl/cu124

# 2. Then sync the remaining dependencies
uv sync
```

### Option C: DirectML acceleration (Intel Arc Graphics, Windows only)

```bash
uv sync --extra dml
```

## Usage

### Transcribing audio

```bash
# Auto-detect the device (CUDA → DirectML → CPU)
uv run transcribe-audio transcribe audio.mp3 --model-dir ./models

# Force the NVIDIA GPU
uv run transcribe-audio transcribe audio.mp3 --model-dir ./models --device cuda

# Force Intel Arc Graphics
uv run transcribe-audio transcribe audio.mp3 --model-dir ./models --device dml

# Use the CPU
uv run transcribe-audio transcribe audio.mp3 --model-dir ./models --device cpu
```
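
The device handling described in this README (auto-detection in the order CUDA, DirectML, CPU; a hard error for `dml` without torch-directml; a CPU fallback with a message for `cuda` without a CUDA build of PyTorch) could be sketched roughly as follows. This is not the project's actual code: `resolve_device`, `cuda_available`, and `dml_available` are hypothetical names, and the `has_cuda` / `has_dml` parameters exist only so the logic can be exercised without a GPU.

```python
import importlib.util
from typing import Optional


def cuda_available() -> bool:
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so this sketch loads without torch

    return torch.cuda.is_available()


def dml_available() -> bool:
    """True if the optional torch-directml package can be found."""
    return importlib.util.find_spec("torch_directml") is not None


def resolve_device(
    requested: str = "auto",
    has_cuda: Optional[bool] = None,
    has_dml: Optional[bool] = None,
) -> str:
    """Map a --device value to "cuda", "dml", or "cpu" (hypothetical helper)."""
    # Probe the environment unless the caller supplied the answers.
    has_cuda = cuda_available() if has_cuda is None else has_cuda
    has_dml = dml_available() if has_dml is None else has_dml

    if requested == "auto":
        # Preference order stated in the README: CUDA, then DirectML, then CPU.
        if has_cuda:
            return "cuda"
        if has_dml:
            return "dml"
        return "cpu"
    if requested == "cuda":
        if has_cuda:
            return "cuda"
        # Assumed wording; the README only says a notice is shown.
        print("CUDA build of PyTorch not available; falling back to CPU")
        return "cpu"
    if requested == "dml":
        if not has_dml:
            raise RuntimeError("--device dml requires torch-directml")
        return "dml"
    if requested == "cpu":
        return "cpu"
    raise ValueError(f"unknown device: {requested!r}")
```

When the result is `"dml"`, the torch device object itself would typically come from `torch_directml.device()`; for `"cuda"` and `"cpu"` the string can be passed to `torch.device` directly.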

### Downloading a model

```bash
uv run transcribe-audio download --model iic/whisper-large-v3 --cache-dir ./models
```
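
For reference, a download like the one above can also be done from Python with modelscope's `snapshot_download`. The wrapper below is a minimal sketch, not the project's implementation: `download_model` and its `fetch` parameter are hypothetical, and `fetch` is only there so the function can be tried without network access.

```python
from pathlib import Path
from typing import Callable, Optional


def download_model(
    model_id: str,
    cache_dir: str,
    fetch: Optional[Callable[..., str]] = None,
) -> Path:
    """Download model_id into cache_dir and return the local model directory."""
    if fetch is None:
        # Lazy import: modelscope is only needed for a real download.
        from modelscope import snapshot_download

        fetch = snapshot_download
    # Make sure the cache directory exists before downloading into it.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    local_dir = fetch(model_id, cache_dir=cache_dir)
    return Path(local_dir)
```

Calling `download_model("iic/whisper-large-v3", "./models")` would then correspond to the command shown above; the returned path is whatever directory modelscope reports for the downloaded snapshot.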

## Output format

```
[00:00:00.000 --> 00:00:05.120] transcribed text
```
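
A line in this format can be produced from a segment's start and end times in seconds. The helpers below are an illustrative sketch; `format_timestamp` and `format_segment` are hypothetical names, and the assumption that segments arrive as `(start, end, text)` with times in seconds is not confirmed by the project.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm."""
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as "[start --> end] text"."""
    return f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text.strip()}"


print(format_segment(0.0, 5.12, "transcribed text"))
# prints: [00:00:00.000 --> 00:00:05.120] transcribed text
```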

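## Notes

- CUDA and DirectML cannot be enabled at the same time; choose one option based on your graphics hardware.
- `--device dml` requires torch-directml to be installed; otherwise an error is raised.
- `--device cuda` requires the CUDA build of PyTorch; otherwise the tool falls back to the CPU and prints a notice.
- On a GTX 1050 Ti (4 GB VRAM), whisper-large-v3 may not fit comfortably; a medium or small model is recommended instead.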
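
To see why a 4 GB card is tight for whisper-large-v3, a rough estimate of the memory taken by the weights alone is enough. The sketch below uses the approximate parameter counts published for the Whisper model sizes (small about 244M, medium about 769M, large about 1550M). `weights_gib` is a hypothetical helper, and the figures ignore activations, the decoder cache, and the CUDA context, so real usage is higher.

```python
# Approximate parameter counts, in millions, for three Whisper model sizes.
PARAMS_MILLIONS = {"small": 244, "medium": 769, "large-v3": 1550}


def weights_gib(model: str, bytes_per_param: int = 2) -> float:
    """Estimate weight memory in GiB (2 bytes per parameter for fp16, 4 for fp32)."""
    return PARAMS_MILLIONS[model] * 1e6 * bytes_per_param / 2**30


for name in PARAMS_MILLIONS:
    print(f"{name}: fp16 ~{weights_gib(name):.2f} GiB, fp32 ~{weights_gib(name, 4):.2f} GiB")
```

By this estimate the large-v3 weights take close to 3 GiB in fp16 and well over 4 GiB in fp32, which leaves little or no headroom on a 4 GB card, while the medium and small weights fit with room to spare.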

## Dependencies

- modelscope: model downloading
- transformers: model loading
- torch: deep learning framework
- librosa: audio processing
- torch-directml: (optional) Intel Arc Graphics acceleration