
# InternLM-XComposer2

[InternLM-XComposer2](https://huggingface.co/internlm/internlm-xcomposer2-7b) 🤗 | [InternLM-XComposer2-VL](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b) 🤗 | [Technical Report](https://arxiv.org/abs/2401.16420) 📄

[English](./README.md) | [简体中文](./README_CN.md)

Thanks to the community for providing the InternLM-XComposer2 Hugging Face online demo and OpenXLab online demo.

👋 Join our Discord and WeChat communities


## Multimodal Projects in This Repository

> [**InternLM-XComposer2**](https://github.com/InternLM/InternLM-XComposer): **Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models**

> [**InternLM-XComposer**](https://github.com/InternLM/InternLM-XComposer/tree/main/InternLM-XComposer-1.0): **A Vision-Language Large Model for Advanced Text-image Comprehension and Composition**

> [**ShareGPT4V**](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V): **Improving Large Multi-modal Models with Better Captions**

> [**DualFocus**](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/DualFocus): **Integrating Macro and Micro Perspectives in Multi-modal Large Language Models**
**InternLM-XComposer2** is a groundbreaking vision-language large model built on the [InternLM2](https://github.com/InternLM/InternLM/tree/main) large language model, with exceptional text-image composition and comprehension capabilities that excel across a range of applications:

- **Free-form interleaved text-image composition:** InternLM-XComposer2 understands **free-form instructions that mix outlines, detailed writing requirements, and reference images**, producing customized, well-illustrated articles. The generated text and images complement each other for an immersive reading experience.
- **Accurate vision-language question answering:** Backed by broad multimodal knowledge, InternLM-XComposer2 answers challenging visual questions accurately, with impressive recognition, perception, detailed captioning, and visual reasoning abilities.
- **Outstanding performance:** Built on InternLM2-7B, InternLM-XComposer2 significantly outperforms multimodal models of comparable size on 13 multimodal benchmarks, and surpasses GPT-4V and Gemini Pro on 6 of them.

We release InternLM-XComposer2 in two versions:

- **InternLM-XComposer2-VL-7B** 🤗: trained on top of InternLM2-7B, targeting multimodal benchmarks and visual question answering. It is currently the strongest vision-language model built on a 7B-scale LLM base, leading on as many as 13 multimodal benchmarks.
- **InternLM-XComposer2-7B** 🤗: further fine-tuned to support free-form interleaved text-image composition.

Please refer to the [Technical Report](https://arxiv.org/abs/2401.16420) for more details.
## Demo Video

[https://github.com/InternLM/InternLM-XComposer/assets/30363822/63756590-7366-4c5d-807f-66c4e69ea827](https://github.com/InternLM/InternLM-XComposer/assets/30363822/63756590-7366-4c5d-807f-66c4e69ea827)

## News

- `2024.02.22` 🎉🎉🎉 We release [DualFocus](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/DualFocus), a framework that integrates macro and micro perspectives into multimodal large language models to improve performance on vision-language tasks.
- `2024.02.06` 🎉🎉🎉 [InternLM-XComposer2-7B-4bit](https://huggingface.co/internlm/internlm-xcomposer2-7b-4bit) and [InternLM-XComposer2-VL-7B-4bit](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b-4bit) are publicly available on **Hugging Face** and **ModelScope**.
- `2024.02.02` 🎉🎉🎉 The [fine-tuning code](./finetune/) of **InternLM-XComposer2-VL-7B** is publicly available.
- `2024.01.26` 🎉🎉🎉 The [evaluation code](./evaluation/) of **InternLM-XComposer2-VL-7B** is publicly available.
- `2024.01.26` 🎉🎉🎉 [InternLM-XComposer2-7B](https://huggingface.co/internlm/internlm-xcomposer2-7b) and [InternLM-XComposer2-VL-7B](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b) are publicly available on **Hugging Face** and **ModelScope**.
- `2024.01.26` 🎉🎉🎉 We release more technical details of InternLM-XComposer2; please refer to the [Technical Report](https://arxiv.org/abs/2401.16420).
- `2023.11.22` 🎉🎉🎉 We release [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V), a large-scale, high-quality image-caption dataset, together with the strong multimodal model ShareGPT4V-7B.
- `2023.10.30` 🎉🎉🎉 InternLM-XComposer ranked first on both [Q-Bench](https://github.com/Q-Future/Q-Bench/tree/master/leaderboards#overall-leaderboards) and [Tiny LVLM](https://github.com/OpenGVLab/Multi-Modality-Arena/tree/main/tiny_lvlm_evaluation).
- `2023.10.19` 🎉🎉🎉 Multi-GPU inference and the multi-GPU demo are supported; the full demo can be deployed on two RTX 4090 GPUs.
- `2023.10.12` 🎉🎉🎉 The 4-bit quantized demo is supported; model files are available on [Hugging Face](https://huggingface.co/internlm/internlm-xcomposer-7b-4bit) and [ModelScope](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit).
- `2023.10.8` 🎉🎉🎉 [InternLM-XComposer-7B](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b) and [InternLM-XComposer-VL-7B](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-vl-7b) are publicly available on ModelScope.
- `2023.9.27` 🎉🎉🎉 The [evaluation code](./InternLM-XComposer-1.0/evaluation/) of **InternLM-XComposer-VL-7B** is publicly available.
- `2023.9.27` 🎉🎉🎉 [InternLM-XComposer-7B](https://huggingface.co/internlm/internlm-xcomposer-7b) and [InternLM-XComposer-VL-7B](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) are publicly available on Hugging Face.
- `2023.9.27` 🎉🎉🎉 Please refer to the [Technical Report](https://arxiv.org/pdf/2309.15112.pdf) for more details.
## Model Zoo

| Model | Usage | Transformers (HF) | ModelScope | Release Date |
| --- | --- | --- | --- | --- |
| **InternLM-XComposer2** | Text-image composition | [🤗internlm-xcomposer2-7b](https://huggingface.co/internlm/internlm-xcomposer2-7b) | [internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary) | 2024-01-26 |
| **InternLM-XComposer2-VL** | Benchmarks, VQA | [🤗internlm-xcomposer2-vl-7b](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b) | [internlm-xcomposer2-vl-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b/summary) | 2024-01-26 |
| **InternLM-XComposer2-4bit** | Text-image composition | [🤗internlm-xcomposer2-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer2-7b-4bit) | [internlm-xcomposer2-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b-4bit/summary) | 2024-02-06 |
| **InternLM-XComposer2-VL-4bit** | Benchmarks, VQA | [🤗internlm-xcomposer2-vl-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b-4bit) | [internlm-xcomposer2-vl-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b-4bit/summary) | 2024-02-06 |
| **InternLM-XComposer** | Text-image composition, VQA | [🤗internlm-xcomposer-7b](https://huggingface.co/internlm/internlm-xcomposer-7b) | [internlm-xcomposer-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b/summary) | 2023-09-26 |
| **InternLM-XComposer-4bit** | Text-image composition, VQA | [🤗internlm-xcomposer-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer-7b-4bit) | [internlm-xcomposer-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit/summary) | 2023-09-26 |
| **InternLM-XComposer-VL** | Benchmarks | [🤗internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) | [internlm-xcomposer-vl-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-vl-7b/summary) | 2023-09-26 |

## Evaluation

We evaluate InternLM-XComposer2-VL on 13 multimodal benchmarks: [MathVista](https://mathvista.github.io/), [MMMU](https://mmmu-benchmark.github.io/), [AI2D](https://prior.allenai.org/projects/diagram-understanding), [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation), [MMBench](https://opencompass.org.cn/leaderboard-multimodal), [MMBench-CN](https://opencompass.org.cn/leaderboard-multimodal), [SEED-Bench](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard), [QBench](https://github.com/Q-Future/Q-Bench/tree/master/leaderboards#overall-leaderboards), [HallusionBench](https://github.com/tianyi-lab/HallusionBench), [ChartQA](https://github.com/vis-nlp/ChartQA), [MM-Vet](https://github.com/yuweihao/MM-Vet), [LLaVA-in-the-wild](https://github.com/haotian-liu/LLaVA), [POPE](https://github.com/AoiDragon/POPE).
To reproduce the evaluation results, please refer to the [evaluation details](./evaluation/README.md).

### Comparison with closed-source APIs and previous open-source SOTA models

| | MathVista | AI2D | MMMU | MME | MMB | MMBCN | SEEDI | LLaVAW | QBenchT | MM-Vet | HallB | ChartQA |
| -------------------------- | --------- | ------ | ----- | ------ | ------ | ------ | ----- | ------ | ------- | ------ | ------ | -------- |
| Open-source Previous SOTA | SPH-MOE | Monkey | Yi-VL | WeMM | L-Int2 | L-Int2 | SPH-2 | CogVLM | Int-XC | CogVLM | Monkey | CogAgent |
| | 8x7B | 10B | 34B | 6B | 20B | 20B | 17B | 17B | 8B | 30B | 10B | 18B |
| | 42.3 | 72.6 | 45.9 | 2066.6 | 75.1 | 73.7 | 74.8 | 73.9 | 64.4 | 56.8 | 58.4 | 68.4 |
| GPT-4V | 49.9 | 78.2 | 56.8 | 1926.5 | 77.0 | 74.4 | 69.1 | 93.1 | 74.1 | 67.7 | 65.8 | 78.5 |
| Gemini-Pro | 45.2 | 73.9 | 47.9 | 1933.3 | 73.6 | 74.3 | 70.7 | 79.9 | 70.6 | 64.3 | 63.9 | 74.1 |
| QwenVL-Plus | 43.3 | 75.9 | 46.5 | 2183.3 | 67.0 | 70.7 | 72.7 | 73.7 | 68.9 | 55.7 | 56.4 | 78.1 |
| Ours | 57.6 | 78.7 | 42.0 | 2242.7 | 79.6 | 77.6 | 75.9 | 81.8 | 72.5 | 51.2 | 60.3 | 72.6 |

### Comparison with open-source methods

| Method | LLM | MathVista | MMMU | MMEP | MMEC | MMB | MMBCN | SEEDI | LLaVAW | QBenchT | MM-Vet | HallB | POPE |
| ------------ | ------------- | --------- | ---- | ------- | ----- | ---- | ----- | ----- | ------ | ------- | ------ | ----- | ---- |
| BLIP-2 | FLAN-T5 | - | 35.7 | 1,293.8 | 290.0 | - | - | 46.4 | 38.1 | - | 22.4 | - | - |
| InstructBLIP | Vicuna-7B | 25.3 | 30.6 | - | - | 36.0 | 23.7 | 53.4 | 60.9 | 55.9 | 26.2 | 53.6 | 78.9 |
| IDEFICS-80B | LLaMA-65B | 26.2 | 24.0 | - | - | 54.5 | 38.1 | 52.0 | 56.9 | - | 39.7 | 46.1 | - |
| Qwen-VL-Chat | Qwen-7B | 33.8 | 35.9 | 1,487.5 | 360.7 | 60.6 | 56.7 | 58.2 | 67.7 | 61.7 | 47.3 | 56.4 | - |
| LLaVA | Vicuna-7B | 23.7 | 32.3 | 807.0 | 247.9 | 34.1 | 14.1 | 25.5 | 63.0 | 54.7 | 26.7 | 44.1 | 80.2 |
| LLaVA-1.5 | Vicuna-13B | 26.1 | 36.4 | 1,531.3 | 295.4 | 67.7 | 63.6 | 68.2 | 70.7 | 61.4 | 35.4 | 46.7 | 85.9 |
| ShareGPT4V | Vicuna-7B | 25.8 | 36.6 | 1,567.4 | 376.4 | 68.8 | 62.2 | 69.7 | 72.6 | - | 37.6 | 49.8 | - |
| CogVLM-17B | Vicuna-7B | 34.7 | 37.3 | - | - | 65.8 | 55.9 | 68.8 | 73.9 | - | 54.5 | 55.1 | - |
| LLaVA-XTuner | InternLM2-20B | 24.6 | 39.4 | - | - | 75.1 | 73.7 | 70.2 | 63.7 | - | 37.2 | 47.7 | - |
| Monkey-10B | Qwen-7B | 34.8 | 40.7 | 1,522.4 | 401.4 | 72.4 | 67.5 | 68.9 | 33.5 | - | 33.0 | 58.4 | - |
| InternLM-XC | InternLM-7B | 29.5 | 35.6 | 1,528.4 | 391.1 | 74.4 | 72.4 | 66.1 | 53.8 | 64.4 | 35.2 | 57.0 | - |
| Ours | InternLM2-7B | 57.6 | 43.0 | 1,712.0 | 530.7 | 79.6 | 77.6 | 75.9 | 81.8 | 72.5 | 51.2 | 59.1 | 87.7 |

## Requirements

- Python 3.8 and above
- PyTorch 1.12 and above (2.0 and above recommended)
- CUDA 11.4 and above recommended (for GPU users)
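As a quick sanity check against the requirements above, here is a minimal sketch using only standard Python and PyTorch calls:

```python
# Minimal environment check for the requirements listed above.
import sys

import torch

assert sys.version_info >= (3, 8), "Python 3.8 or above is required"
print(f"PyTorch version: {torch.__version__}")       # 1.12+ required, 2.0+ recommended
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version:    {torch.version.cuda}")  # 11.4+ recommended
    print(f"GPU:             {torch.cuda.get_device_name(0)}")
```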
## Installation

Before running the code, please set up the environment. Make sure your machine meets the requirements above, then follow the [installation guide](docs/install_CN.md).

## Quickstart

We provide a simple and practical usage example of InternLM-XComposer with 🤗 Transformers below.
🤗 Transformers

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)

text = '仔细描述这张图'  # "Describe this image in detail."
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# This image shows a quote by Oscar Wilde set against a beautiful sunset.
# The quote reads "Live life with no excuses, travel with no regrets".
# Two silhouettes stand on a hill at sunset, seemingly enjoying the view;
# the whole scene conveys an uplifting feeling of bravely chasing one's dreams.
```
🤖 ModelScope

```python
import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model.tokenizer = tokenizer

text = '仔细描述这张图'  # "Describe this image in detail."
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# This image shows a quote by Oscar Wilde set against a beautiful sunset.
# The quote reads "Live life with no excuses, travel with no regrets".
# Two silhouettes stand on a hill at sunset, seemingly enjoying the view;
# the whole scene conveys an uplifting feeling of bravely chasing one's dreams.
```
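The second value returned by `model.chat` is the running conversation history. Below is a minimal multi-turn sketch, assuming the model and tokenizer are initialized as in either quickstart above and that the returned history can be fed back into the next call:

```python
# Multi-turn chat sketch. Assumption: the second return value of model.chat
# is the accumulated history, which can be passed back in for follow-up turns.
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    response, history = model.chat(tokenizer, query='Please describe this image in detail.',
                                   image=image, history=[], do_sample=False)
    # Follow-up question that relies on the context of the first answer.
    response, history = model.chat(tokenizer, query='What mood does the scene convey?',
                                   image=image, history=history, do_sample=False)
print(response)
```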
## Multi-GPU Inference

If you have multiple GPUs but none of them has enough memory to hold the full model, you can split the model across several GPUs. First install accelerate with `pip install accelerate`, then run the following script to chat:

```
# chat with 2 GPUs
python examples/example_chat.py --num_gpus 2
```
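If you prefer to shard the model from your own code rather than through the bundled script, the sketch below shows the same idea via the `device_map='auto'` option that `accelerate` enables in `from_pretrained`. This is an assumption on our part, not the repository's documented path; `examples/example_chat.py` remains the supported entry point.

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Sketch only: with accelerate installed, device_map='auto' splits the weights
# across all visible GPUs instead of loading the whole model onto one card.
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b',
                                  trust_remote_code=True,
                                  device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b',
                                          trust_remote_code=True)
# model.chat can then be used exactly as in the quickstart above.
```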
## 4-Bit Quantized Models

We provide 4-bit quantized models to reduce the memory footprint. To run a 4-bit model (GPU memory >= 12 GB), first install the required [dependencies](docs/install.md), then run the following script:

🤗 Transformers

```python
import torch, auto_gptq
from transformers import AutoTokenizer
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
torch.set_grad_enabled(False)


class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "model.layers"
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output',
    ]
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]


# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

text = 'Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets."
# The quote is displayed in white text against a dark background. In the foreground, there are
# two silhouettes of people standing on a hill at sunset.
# They appear to be hiking or climbing, as one of them is holding a walking stick.
# The sky behind them is painted with hues of orange and purple, creating a beautiful contrast
# with the dark figures.
```
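To confirm the quantized model stays within the stated 12 GB budget on your card, here is a quick check using PyTorch's standard memory counters, run after loading the model:

```python
# Rough GPU memory check after loading the 4-bit model (standard torch calls).
import torch

gib = 1024 ** 3
print(f"allocated: {torch.cuda.memory_allocated() / gib:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / gib:.1f} GiB")
print(f"total:     {torch.cuda.get_device_properties(0).total_memory / gib:.1f} GiB")
```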
## Fine-tuning

Please refer to the [fine-tuning guide](finetune/README_zh-CN.md).

## Web UI

We provide code for easily setting up a Web UI demo:

```
# Demo for free-form text-image composition
python examples/gradio_demo_composition.py

# Demo for multimodal chat
python examples/gradio_demo_chat.py
```

See the Web UI [user guide](demo_asset/demo.md) for more information. To change the folder where models are stored, use the `--folder=new_folder` option.
## Citation

If you find our models / code / papers helpful, please consider giving us a star :star: and a citation :pencil:, thanks :)

```BibTeX
@article{internlmxcomposer2,
  title={InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model},
  author={Xiaoyi Dong and Pan Zhang and Yuhang Zang and Yuhang Cao and Bin Wang and Linke Ouyang and Xilin Wei and Songyang Zhang and Haodong Duan and Maosong Cao and Wenwei Zhang and Yining Li and Hang Yan and Yang Gao and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2401.16420},
  year={2024}
}
```

```BibTeX
@article{internlmxcomposer,
  title={InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition},
  author={Pan Zhang and Xiaoyi Dong and Bin Wang and Yuhang Cao and Chao Xu and Linke Ouyang and Zhiyuan Zhao and Shuangrui Ding and Songyang Zhang and Haodong Duan and Wenwei Zhang and Hang Yan and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2309.15112},
  year={2023}
}
```
## License & Contact

The code in this repository is open-sourced under the Apache-2.0 license. The model weights are fully open for academic research, and free commercial licenses are available by application ([application form](https://wj.qq.com/s2/12725412/f7c1/)). For other questions or collaboration, please contact the team.