# zsibot_vln
**Repository Path**: codepool_admin/zsibot_vln
## Basic Information
- **Project Name**: zsibot_vln
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-08
- **Last Updated**: 2026-04-08
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# ZsiBot VLN
ZsiBot VLN provides a framework for developing, testing, and deploying Vision-Language Navigation (VLN) algorithms, unifying the [MATRiX](https://github.com/zsibot/matrix) simulation platform and VLN algorithms into a single, extensible pipeline. It also includes a zero-shot VLN baseline model, which serves as both a reference implementation and a practical starting point for research or product development.
➡️ **[Full Development Guide](./docs/guide.md)**
## 🗂️ Project Structure
```text
zsibot_vln/
├── agents/
│   ├── zeroshot/
│   │   └── unigoal/     # baseline VLN model
│   └── finetune/        # TODO
├── assets/
├── bridge/
│   └── src/             # ROS2 ↔ ZMQ bidirectional bridging module
├── configs/             # configuration files
├── docs/
├── envs/                # MATRiX environment, adaptable to real-world robots
├── goals/               # example image goals
├── llms/                # prepared LLM/VLM Hugging Face models
├── outputs/
├── third_party/
├── main.py
├── requirements.txt
└── README.md
```
## 🛠️ Setup
### Prerequisites
* CUDA-capable GPU (an NVIDIA RTX 4090 is recommended when running local LLMs)
* A VPN may be needed for Git and model-weight downloads, depending on your network
### Clone Repository and Install zmq3
```bash
git clone git@github.com:zsibot/zsibot_vln.git
sudo apt install libzmq3-dev
```
### Install the Simulator (MATRiX)
Follow the [MATRiX](https://github.com/zsibot/matrix) installation instructions and then:
```bash
# Replace matrix/config/config.json with the provided configuration
cp zsibot_vln/configs/config.json matrix/config/
```
### Prepare LLM Access
Option 1: Hugging Face (local model)
```bash
conda create -n smol python=3.9 -y
conda activate smol
conda install --freeze-installed -c nvidia cuda-toolkit=12.4 -y
conda install --freeze-installed pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y
conda install -c conda-forge libstdcxx-ng
pip install -U transformers datasets evaluate accelerate timm
pip install num2words fastapi uvicorn hf_xet
pip install --no-cache-dir --no-build-isolation --verbose flash-attn
# download weights:
python zsibot_vln/llms/huggingface_models/smolvlm2_256m_video_instruct/smolvlm2_256m_video_instruct.py
```
Option 2: Cloud LLM/VLM API (fetch API-key from e.g. [Aliyun Bailian](https://bailian.console.aliyun.com/))
```bash
export DASHSCOPE_API_KEY='YOUR_DASHSCOPE_API_KEY'
```
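As a rough illustration of how the exported key is typically consumed, the sketch below builds a request for an OpenAI-compatible chat endpoint such as DashScope's. The endpoint URL and model name here are assumptions for illustration, not taken from this repository; no network call is made.

```python
import json
import os

# Assumed OpenAI-compatible endpoint; check your provider's docs.
API_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_request(prompt: str):
    """Return (headers, payload) for a single-turn chat completion."""
    api_key = os.environ.get("DASHSCOPE_API_KEY", "")  # set via export above
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "qwen-vl-plus",  # assumed model name, for illustration only
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request("Describe the scene in front of the robot.")
print(json.dumps(payload, indent=2))
```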
Option 3: Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma3:4b # or any other model from Ollama
```
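Once pulled, the model is served by Ollama's local REST API (on `http://localhost:11434` by default). The sketch below only shows the shape of a request body for its `/api/generate` endpoint; it makes no network call, and the prompt is an arbitrary example.

```python
import json

def ollama_generate_body(model: str, prompt: str) -> str:
    """Build the JSON body for a POST to /api/generate."""
    body = {
        "model": model,    # e.g. "gemma3:4b" pulled above
        "prompt": prompt,
        "stream": False,   # ask for one complete response instead of a stream
    }
    return json.dumps(body)

print(ollama_generate_body("gemma3:4b", "What objects are visible?"))
```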
### Install VLN Baseline Model
Install and run the baseline by following the instructions:
➡️ **[VLN Baseline Installation Guide](./docs/vln_baseline.md)**
## 🚀 Run
```bash
# run the local model server on shell 0 (can be skipped when using Ollama or a cloud API)
conda activate smol
python zsibot_vln/llms/huggingface_models/smolvlm2_256m_video_instruct/server.py
```
```bash
# run MATRiX on shell 1 (no conda)
cd matrix && export ROS_DOMAIN_ID=0 && source /opt/ros/humble/setup.bash && ./run_sim.sh 1 6
# ==== NOTE ====
# Remember to stand the robot up using LB+Y (controller mode) or "u" (keyboard control mode).
```
```bash
# run env_bridge on shell 2 (no conda)
cd zsibot_vln/bridge && export ROS_DOMAIN_ID=0 && source /opt/ros/humble/setup.bash && colcon build && source install/setup.bash && ros2 run env_bridge env_bridge
```
```bash
# run mc_sdk_bridge on shell 3 (no conda)
cd zsibot_vln/bridge && export ROS_DOMAIN_ID=0 && source /opt/ros/humble/setup.bash && colcon build && source install/setup.bash && ros2 run mc_sdk_bridge mc_sdk_bridge
```
```bash
# run the baseline model on shell 4
cd zsibot_vln && conda activate zsibot_vln
# search using an open-vocabulary text goal
python main.py --goal_type text --text_goal "green plant"
# or search using an image goal
python main.py --goal_type ins_image --image_goal_path ./goals/bed.jpg
```
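The two invocations above differ only in their goal flags. The snippet below is a hypothetical mirror of that command-line interface, written with `argparse` for illustration; the real `main.py` may define these options differently or add more.

```python
import argparse

# Hypothetical sketch of the CLI used above; not the actual main.py.
parser = argparse.ArgumentParser(description="Run the VLN baseline")
parser.add_argument("--goal_type", choices=["text", "ins_image"], required=True,
                    help="search by open-vocabulary text or by instance image")
parser.add_argument("--text_goal",
                    help='target description, e.g. "green plant"')
parser.add_argument("--image_goal_path",
                    help="path to a goal image, e.g. ./goals/bed.jpg")

# Parse the text-goal example from the README.
args = parser.parse_args(["--goal_type", "text", "--text_goal", "green plant"])
print(args.goal_type, args.text_goal)
```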
## 🤝 Acknowledgments
This project builds upon and acknowledges the following works:
- [MATRiX](https://github.com/zsibot/matrix) – a robotic simulation framework featuring realistic scene rendering and physical dynamics.
- [UniGoal](https://github.com/bagh2178/UniGoal) – a zero-shot VLN method leveraging LLMs.
## 📄 License
This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.