diff --git a/AI/vllm/0.8.5/24.03-lts/Dockerfile b/AI/vllm/0.8.5/24.03-lts/Dockerfile
new file mode 100644
index 0000000000000000000000000000000000000000..b1110168c6c129a251ed6a1832f3ed3e599f3b8c
--- /dev/null
+++ b/AI/vllm/0.8.5/24.03-lts/Dockerfile
@@ -0,0 +1,19 @@
+# This vLLM Dockerfile is used to construct an image that can build and run vLLM on CPU platforms.
+
+FROM openeuler/openeuler:24.03-lts
+
+RUN yum update -y && \
+    yum install -y make gcc gcc-c++ python python-pip python3-devel git vim wget net-tools numactl-devel && \
+    rm -rf /var/cache/yum
+
+WORKDIR /workspace
+
+RUN git clone https://github.com/vllm-project/vllm.git && cd vllm && git checkout v0.8.5
+
+WORKDIR /workspace/vllm
+
+RUN pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy
+
+RUN VLLM_TARGET_DEVICE="cpu" pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu
+
+ENTRYPOINT ["python", "-m", "vllm.entrypoints.openai.api_server"]
\ No newline at end of file
diff --git a/AI/vllm/README.md b/AI/vllm/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c6210c7b9e342b628e46598e54a0fad39c9a7437
--- /dev/null
+++ b/AI/vllm/README.md
@@ -0,0 +1,85 @@
+# Quick reference
+
+- The official vLLM docker images
+
+- Maintained by: [openEuler CloudNative SIG](https://gitee.com/openeuler/cloudnative)
+
+- Where to get help: [openEuler CloudNative SIG](https://gitee.com/openeuler/cloudnative), [openEuler](https://gitee.com/openeuler/community)
+
+# vLLM | openEuler
+
+Current vLLM docker images are built on [openEuler](https://repo.openeuler.org/). This repository is free to use and exempted from per-user rate limits.
+
+vLLM is a fast and easy-to-use library for LLM inference and serving.
+
+Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu/) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
+
+vLLM is fast with:
+
+- State-of-the-art serving throughput
+- Efficient management of attention key and value memory with [PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html)
+- Continuous batching of incoming requests
+- Fast model execution with CUDA/HIP graph
+- Quantizations: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8.
+- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
+- Speculative decoding
+- Chunked prefill
+
+Read more about vLLM in the [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023) and explore the vLLM technical documentation at [docs.vllm.ai](https://docs.vllm.ai/).
+
+# Supported tags and respective Dockerfile links
+
+The tag of each vLLM docker image consists of the vLLM version and the base image version.
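+For example, assuming the images are published under the same `openeuler/vllm-cpu` repository name used in the run command later in this README (check the registry for the exact repository name), a specific build can be pulled by its tag:
+
+```bash
+# Illustrative pull command: the repository name is an assumption, the tag follows <vLLM version>-oe<openEuler version>
+docker pull openeuler/vllm-cpu:0.8.5-oe2403lts
+```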
+The available tags are as follows:
+
+| Tags | Currently | Architectures |
+|--|--|--|
+|[0.6.3-oe2403lts](https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm/0.6.3/24.03-lts/Dockerfile)| vLLM 0.6.3 on openEuler 24.03-LTS | amd64 |
+|[0.8.5-oe2403lts](https://gitee.com/openeuler/openeuler-docker-images/blob/master/AI/vllm/0.8.5/24.03-lts/Dockerfile)| vLLM 0.8.5 on openEuler 24.03-LTS | amd64, arm64 |
+
+# Usage
+
+## Quick start 1: supported devices
+
+- Intel/AMD x86
+- ARM AArch64
+
+## Quick start 2: set up the environment using a container
+
+```bash
+# Start an interactive shell in the vLLM container
+docker run --rm --name vllm -p 8000:8000 -it --entrypoint bash openeuler/vllm-cpu:latest
+```
+
+## Quick start 3: offline inference
+
+You can use the ModelScope mirror to speed up model downloads:
+
+```bash
+export VLLM_USE_MODELSCOPE=true
+```
+
+With vLLM installed, you can start generating text for a list of input prompts (i.e. offline batch inference).
+
+Run the Python script below directly, or paste it into a `python3` shell, to generate text:
+
+```python
+from vllm import LLM, SamplingParams
+
+prompts = [
+    "Hello, my name is",
+    "The future of AI is",
+]
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+# The first run needs to download the model weights, which may take a while depending on bandwidth
+llm = LLM(model="Qwen/Qwen3-8B")
+
+outputs = llm.generate(prompts, sampling_params)
+
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
+# Question and answering
+
+If you have any questions or want to request new features, please submit an issue or a pull request on [openeuler-docker-images](https://gitee.com/openeuler/openeuler-docker-images).
\ No newline at end of file
diff --git a/AI/vllm/meta.yml b/AI/vllm/meta.yml
index da824cf60238a078f485a4df9d0b0bac363ceda1..1e34e832e2b191b497d5597f6e033912b3c52e3d 100644
--- a/AI/vllm/meta.yml
+++ b/AI/vllm/meta.yml
@@ -1,3 +1,7 @@
 0.6.3-oe2403lts:
   path: 0.6.3/24.03-lts/Dockerfile
-  arch: x86_64
\ No newline at end of file
+  arch: x86_64
+
+0.8.5-oe2403lts:
+  path: 0.8.5/24.03-lts/Dockerfile
+  arch: x86_64
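As a usage sketch for the Dockerfile path registered above, a local build of the 0.8.5 image could look like the following; the image name and tag are illustrative assumptions, not the officially published ones:

```bash
# Hypothetical local build of the 0.8.5 CPU image; run from the openeuler-docker-images repository root.
# The -t name:tag is illustrative -- adjust it to whatever your registry expects.
docker build -t openeuler/vllm-cpu:0.8.5-oe2403lts \
    -f AI/vllm/0.8.5/24.03-lts/Dockerfile AI/vllm/0.8.5/24.03-lts
```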