# Skywork-R1V

## Skywork-R1V4

[[📖 Skywork-R1V4 Report](https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V4.pdf)]

Welcome to the Skywork-R1V repository! Here, you'll find a series of state-of-the-art multimodal reasoning models with powerful agentic capabilities. From open-source versions with model weights and inference code to our latest closed-source offerings, the Skywork-R1V series delivers exceptional performance across vision understanding, code execution, and deep research tasks.

## 🔥 News

**💥 November 18, 2025**: We released **Skywork-R1V4-Lite**, a lightweight and ultra-fast closed-source multimodal reasoning model that achieves exceptional image understanding through code execution tools. R1V4-Lite offers blazing-fast inference and can be integrated with search tools to enable deep research capabilities. Available now on [Skywork Platform](https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html), and coming soon to OpenRouter; stay tuned!

**July 15, 2025**: We've released quantized versions of Skywork-R1V3 for efficient inference:
* AWQ Quantization: [🤗 Skywork-R1V3-38B-AWQ](https://huggingface.co/Skywork/Skywork-R1V3-38B-AWQ) -- Supports single-GPU inference (VRAM ≥ 30GB).
* GGUF Quantization (4-bit & 8-bit): [🤗 Skywork-R1V3-38B-GGUF](https://huggingface.co/Skywork/Skywork-R1V3-38B-GGUF) -- Optimized for CPU-based inference.

**July 9, 2025**: We released Skywork-R1V3-38B [[🤗 Skywork-R1V3-38B](https://huggingface.co/Skywork/Skywork-R1V3-38B)], the latest and most powerful open-source multimodal reasoning model in the Skywork series, pushing the boundaries of multimodal and cross-disciplinary intelligence. Mainly through RL algorithms in post-training, R1V3 significantly enhances multimodal reasoning ability and achieves open-source state-of-the-art (SOTA) performance across multiple multimodal reasoning benchmarks, e.g., 76.0 on MMMU.

**April 28, 2025**: We released the AWQ-quantized version of Skywork-R1V2 [[🤗 Skywork-R1V2-38B-AWQ](https://huggingface.co/Skywork/Skywork-R1V2-38B-AWQ)], supporting single-GPU (above 30GB) inference.

**April 24, 2025**: We released **Skywork-R1V2**, an advanced open-source multimodal reasoning model that demonstrates strong performance across a range of multimodal reasoning benchmarks, including MMMU, MMMU-Pro, MathVista, and OlympiadBench. [[🤗 Skywork-R1V2-38B](https://huggingface.co/Skywork/Skywork-R1V2-38B)] [[📖 R1V2 Report](https://arxiv.org/abs/2504.16656)]

**April 9, 2025**: Our technical report is now available on arXiv: [[Skywork-R1V: Pioneering Multimodal Reasoning with CoT](https://arxiv.org/abs/2504.05599)].

**March 26, 2025**: We released the AWQ-quantized version of Skywork-R1V [[🤗 Skywork-R1V-38B-AWQ](https://huggingface.co/Skywork/Skywork-R1V-38B-AWQ)], supporting single-GPU (above 30GB) inference.

**March 18, 2025**: We are thrilled to introduce Skywork R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀

## 📊 Evaluation

Skywork-R1V4-Lite demonstrates state-of-the-art performance on a range of multimodal tasks, particularly excelling in perception and deep research.

**Comparison of Skywork-R1V4 with Leading Multimodal Models**

| Benchmark | Split | Skywork-R1V4<br>30B(A3B) | Qwen3-VL<br>30B(A3B) | Qwen3-VL<br>235B(A22B) | Gemini 2.5 Flash | Gemini 2.5 Pro |
|-----------|-------|:-------------------------:|:---------------------:|:-----------------------:|:----------------:|:--------------:|
| **Perception** | | | | | | |
| HIRbench-4K | FSP | **91.8** | 88.5 | 89.0 | 81.5 | 85.5 |
| | FCP | 73.8 | 68.5 | **77.0** | 74.0 | 82.3 |
| | Overall | **82.8** | 78.5 | 83.0 | 77.5 | 83.9 |
| HIRbench-8K | FSP | **88.8** | 80.3 | 83.0 | 75.8 | 83.0 |
| | FCP | 70.8 | 68.3 | **77.3** | 71.8 | 80.0 |
| | Overall | **79.8** | 74.2 | 80.4 | 73.7 | 81.5 |
| MME-Real | Perception | **73.4** | 70.4 | 74.3 | 62.3 | 73.1 |
| | Reasoning | 56.4 | 47.7 | 52.5 | 51.0 | **58.2** |
| | Overall | **71.4** | 67.7 | 71.6 | 60.9 | 71.3 |
| MME-Real-CN | Perception | **76.3** | 72.6 | 76.0 | 65.8 | 74.5 |
| | Reasoning | **59.4** | 45.0 | 53.8 | 51.3 | 58.3 |
| | Overall | **70.8** | 63.7 | 68.8 | 61.2 | 69.3 |
| MME-Real-Lite | Perception | **63.2** | 58.0 | 60.2 | 50.4 | 59.9 |
| | Reasoning | **53.2** | 46.3 | 50.7 | 49.9 | 55.1 |
| | Overall | **59.3** | 53.2 | 56.5 | 50.2 | 58.3 |
| V* | Attribute | **90.4** | 81.7 | 79.1 | 77.3 | 86.8 |
| | Spatial | **84.2** | 82.9 | 82.9 | 64.4 | 68.4 |
| | Overall | **88.0** | 82.2 | 80.6 | 72.3 | 79.1 |
| TreeBench | Overall | 48.4 | 42.7 | 49.6 | 45.9 | **54.6** |
| Visual Probe | Hard | 42.4 | 30.1 | **42.4** | 28.3 | 33.9 |
| | Medium | 42.9 | 35.8 | 39.1 | 31.3 | **35.4** |
| | Easy | **66.7** | 65.2 | 65.9 | 45.3 | 49.6 |
| **Deep Research** | | | | | | |
| MMSearch | Overall | **66.1** | 18.7 | 48.0 | 64.9 | 71.9 |
| FVQA | Overall | **67.2** | 53.3 | 54.4 | 60.7 | 72.0 |
| BrowseComp-VL | Overall | 38.4 | 30.0 | 31.6 | 40.8 | **45.4** |

**Key Highlights:**

- 🏆 Skywork-R1V4 achieves **top performance** among 30B-class models across most perception benchmarks
- 🚀 **Outstanding FSP scores** on HIRbench-4K (91.8) and HIRbench-8K (88.8), demonstrating exceptional high-resolution image understanding
- 🔍 **Strong deep research capabilities** with competitive performance on MMSearch (66.1) and FVQA (67.2)

## 🚀 How to Use Skywork-R1V4-Lite

Skywork-R1V4-Lite is available as an API service. You can access it through [Skywork Platform](https://platform.skyworkmodel.ai) or [OpenRouter](https://openrouter.ai) (coming soon).

### 1. Get API Access

Visit [Skywork Platform](https://platform.skyworkmodel.ai) to obtain your API key.

### 2. Quick Start with Python

```python
import requests
import base64

def image_to_base64(image_path):
    with open(image_path, "rb") as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode("utf-8")

# API configuration
base_url = "https://api.skyworkmodel.ai"
api_key = "your_api_key_here"

# Prepare the request
image_base64 = image_to_base64("path/to/your/image.jpg")
content = [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}},
    {"type": "text", "text": "What's in this image?"}
]

# Call the API
response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [{"role": "user", "content": content}],
        "stream": False,
        "enable_search": False  # Set to True for deep research capabilities
    }
)

print(response.json()["choices"][0]["message"]["content"])
```
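
If you want the model to consult the web, the same request works with `enable_search` set to `True`. The snippet below is a minimal sketch of such a deep-research call, not an official example: it reuses the endpoint, headers, and payload shape from the Quick Start above, and it assumes a text-only `content` list is accepted in the same message format (the question string is only a placeholder).

```python
import requests

# Same configuration as in the Quick Start above
base_url = "https://api.skyworkmodel.ai"
api_key = "your_api_key_here"

response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [
            {
                "role": "user",
                # Text-only content here; an image part can be added exactly as in the Quick Start
                "content": [{"type": "text", "text": "What are the latest releases in the Skywork-R1V series?"}]
            }
        ],
        "stream": False,
        "enable_search": True  # turn on web search for deep research
    },
    timeout=300  # client-side safeguard: search-augmented requests may take longer
)

print(response.json()["choices"][0]["message"]["content"])
```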

### 3. Batch Testing with Our Tool Suite

We provide a comprehensive testing toolkit in the `r1v4` folder for batch processing and result visualization.

#### Clone and Setup

```shell
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/r1v4
pip install -r requirements.txt
```

#### Prepare Test Cases

Edit `test_cases.jsonl` with your test cases (one JSON object per line):

```json
{"image": "./demo_image/demo_1.png", "question": "What's in this image?"}
{"image": "", "question": "This is a text-only question"}
```

#### Run Batch Tests

```shell
# Non-streaming mode (default)
python3 batch_nonstream.py

# Streaming mode
python3 batch_stream.py

# With custom input/output files
python3 batch_nonstream.py input.jsonl output.jsonl

# Using the planner model for task planning
python3 batch_planner_nonstream.py
```

#### Visualize Results

```shell
# Start the web viewer
python3 visual.py

# Then open a browser and enter the result file path (e.g., result_nonstream.jsonl)
```

#### Parse Structured Responses

```python
from parse_utils import parse_full_response

# Parse the response to extract reasoning steps, tool calls, and observations
parsed = parse_full_response(response_text)

# Access structured data
for round_data in parsed['rounds']:
    print(f"Round {round_data['round_num']}")
    print(f"Thinking: {round_data['think']}")
    print(f"Tool: {round_data['tool_call']['name']}")
```

### 4. Features

- **Code Execution**: R1V4-Lite can write and execute Python code for complex tasks
- **Deep Research**: Enable `enable_search=True` to integrate web search capabilities
- **Multi-turn Reasoning**: Automatic multi-step reasoning with tool usage
- **Streaming Support**: Real-time response streaming for a better user experience (see the sketch below)
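
For the streaming feature above, here is a minimal client-side sketch. It is an assumption-laden example rather than an official one: it presumes the endpoint follows the common OpenAI-compatible server-sent-events convention (`data: {...}` chunks carrying a `delta` field, terminated by `data: [DONE]`), so check the [API reference](https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html) for the exact streaming schema.

```python
import json
import requests

# Same configuration as in the Quick Start above
base_url = "https://api.skyworkmodel.ai"
api_key = "your_api_key_here"

with requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Describe Skywork-R1V4-Lite in one sentence."}]}
        ],
        "stream": True,   # ask the server to stream tokens as they are generated
        "enable_search": False
    },
    stream=True,          # let requests yield the body incrementally
    timeout=300
) as response:
    for line in response.iter_lines(decode_unicode=True):
        # Assumed SSE framing: each event arrives as a single "data: ..." line
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed chunk shape: OpenAI-style incremental delta with optional content
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
    print()
```

For batch workloads, `batch_stream.py` in the `r1v4` folder drives the same streaming mode.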
## โค๏ธMisc [![Star History Chart](https://api.star-history.com/svg?repos=SkyworkAI/Skywork-R1V&type=Date)](https://star-history.com/#SkyworkAI/Skywork-R1V&Date) ## Citation If you use Skywork-R1V in your research, please cite: ``` @misc{zhang2025skyworkr1v4agenticmultimodalintelligence, title={Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch}, author={Yifan Zhang and Liang Hu and Haofeng Sun and Peiyu Wang and Yichen Wei and Shukang Yin and Jiangbo Pei and Wei Shen and Peng Xia and Yi Peng and Tianyidan Xie and Eric Li and Yang Liu and Xuchen Song and Yahui Zhou}, year={2025}, eprint={2512.02395}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2512.02395}, } ``` ``` @misc{shen2025skyworkr1v3technicalreport, title={Skywork-R1V3 Technical Report}, author={Wei Shen and Jiangbo Pei and Yi Peng and Xuchen Song and Yang Liu and Jian Peng and Haofeng Sun and Yunzhuo Hao and Peiyu Wang and Jianhao Zhang and Yahui Zhou}, year={2025}, eprint={2507.06167}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2507.06167}, } ``` ``` @misc{wang2025skyworkr1v2multimodalhybrid, title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning}, author={Peiyu Wang and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou}, year={2025}, eprint={2504.16656}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2504.16656}, } ``` ``` @misc{peng2025skyworkr1vpioneeringmultimodal, title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought}, author={Yi Peng and Peiyu Wang and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou}, year={2025}, eprint={2504.05599}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2504.05599}, } ```