Commit efd1213

merge main
sangchengmeng committed
2 parents e723c40 + db1b64c commit efd1213

197 files changed

+10427
-1069
lines changed


README.md

Lines changed: 16 additions & 1 deletion
@@ -21,7 +21,9 @@ LightLLM is a Python-based LLM (Large Language Model) inference and serving fram
2121
[English Docs](https://lightllm-en.readthedocs.io/en/latest/) | [中文文档](https://lightllm-cn.readthedocs.io/en/latest/) | [Blogs](https://modeltc.github.io/lightllm-blog/)
2222

2323
## News
24-
- [2025/05] LightLLM paper on constrained decoding accepted by [ACL25](https://arxiv.org/pdf/2506.03887) (Pre $^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research with key insights and examples, check out our blog post: [LightLLM Blog](https://www.light-ai.top/lightllm-blog/2025/06/15/pre3.html)
24+
- [2025/09] 🔥 LightLLM [v1.1.0](https://www.light-ai.top/lightllm-blog/2025/09/03/lightllm.html) release!
25+
- [2025/08] Pre $^3$ receives the Outstanding Paper Award at [ACL2025](https://2025.aclweb.org/program/awards/).
26+
- [2025/05] LightLLM paper on constrained decoding accepted by [ACL2025](https://arxiv.org/pdf/2506.03887) (Pre $^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research with key insights and examples, check out our blog post: [LightLLM Blog](https://www.light-ai.top/lightllm-blog/2025/06/15/pre3.html)
2527
- [2025/04] LightLLM paper on request scheduler published in [ASPLOS’25](https://dl.acm.org/doi/10.1145/3676641.3716011) (Past-Future Scheduler for LLM Serving under SLA Guarantees)
2628
- [2025/02] 🔥 LightLLM v1.0.0 release, achieving the **fastest DeepSeek-R1** serving performance on single H200 machine.
2729

@@ -90,6 +92,19 @@ We learned a lot from the following projects when developing LightLLM.
9092

9193
We have published a number of papers around components or features of LightLLM, if you use LightLLM in your work, please consider citing the relevant paper.
9294

95+
**Constrained decoding**: accepted by [ACL2025](https://arxiv.org/pdf/2506.03887) and received the Outstanding Paper Award.
96+
```bibtex
97+
@inproceedings{
98+
anonymous2025pre,
99+
title={Pre\${\textasciicircum}3\$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
100+
author={Anonymous},
101+
booktitle={Submitted to ACL Rolling Review - February 2025},
102+
year={2025},
103+
url={https://openreview.net/forum?id=g1aBeiyZEi},
104+
note={under review}
105+
}
106+
```
107+
93108
**Request scheduler**: accepted by [ASPLOS’25](https://dl.acm.org/doi/10.1145/3676641.3716011):
94109
```bibtex
95110
@inproceedings{gong2025past,

docker/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,8 @@ RUN pip install -r /lightllm/requirements.txt --no-cache-dir
3939

4040
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
4141

42-
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
42+
# TODO: offline compile
43+
# RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
4344

4445
RUN apt-get update && apt-get install -y libnuma-dev # for sgl_kernel
4546

docker/Dockerfile.deepep

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,8 @@ RUN pip install -r /lightllm/requirements.txt --no-cache-dir
3939

4040
RUN pip install --no-cache-dir vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
4141

42-
RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
42+
# TODO: offline compile
43+
# RUN git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
4344

4445
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
4546
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev

docker/Dockerfile.nixl

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
ARG CUDA_VERSION=12.6.1
2+
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04
3+
ARG PYTHON_VERSION=3.10
4+
ARG MAMBA_VERSION=24.7.1-0
5+
ARG TARGETPLATFORM
6+
ENV PATH=/opt/conda/bin:$PATH \
7+
CONDA_PREFIX=/opt/conda
8+
9+
RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
10+
ca-certificates \
11+
libssl-dev \
12+
curl \
13+
g++ \
14+
make \
15+
git && \
16+
rm -rf /var/lib/apt/lists/*
17+
18+
RUN case ${TARGETPLATFORM} in \
19+
"linux/arm64") MAMBA_ARCH=aarch64 ;; \
20+
*) MAMBA_ARCH=x86_64 ;; \
21+
esac && \
22+
curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
23+
bash ~/mambaforge.sh -b -p /opt/conda && \
24+
rm ~/mambaforge.sh
25+
26+
RUN case ${TARGETPLATFORM} in \
27+
"linux/arm64") exit 1 ;; \
28+
*) /opt/conda/bin/conda update -y conda && \
29+
/opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" ;; \
30+
esac && \
31+
/opt/conda/bin/conda clean -ya
32+
33+
34+
WORKDIR /root
35+
36+
COPY ./requirements.txt /lightllm/requirements.txt
37+
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124
38+
39+
RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
40+
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
41+
42+
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
43+
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev
44+
45+
ENV CUDA_HOME=/usr/local/cuda \
46+
GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/
47+
48+
RUN mkdir -p /tmp/gdrcopy && cd /tmp \
49+
&& git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \
50+
&& cd gdrcopy/packages \
51+
&& CUDA=/usr/local/cuda ./build-deb-packages.sh \
52+
&& dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \
53+
&& cd / && rm -rf /tmp/gdrcopy
54+
55+
RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \
56+
DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \
57+
rm -rf /usr/lib/ucx && \
58+
rm -rf /opt/hpcx/ucx && \
59+
cd /usr/local/src && \
60+
git clone https://github.com/openucx/ucx.git && \
61+
cd ucx && \
62+
git checkout v1.19.x && \
63+
./autogen.sh && ./configure \
64+
--enable-shared \
65+
--disable-static \
66+
--disable-doxygen-doc \
67+
--enable-optimizations \
68+
--enable-cma \
69+
--enable-devel-headers \
70+
--with-cuda=/usr/local/cuda \
71+
--with-verbs=yes \
72+
--with-dm \
73+
--with-gdrcopy=/usr/local \
74+
--with-efa \
75+
--enable-mt && \
76+
make -j && \
77+
make -j install-strip && \
78+
ldconfig;
79+
80+
RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
81+
cd /usr/local/src; \
82+
pip install --upgrade meson pybind11 patchelf; \
83+
git clone https://github.com/ai-dynamo/nixl.git -b main && \
84+
cd nixl && \
85+
rm -rf build && \
86+
mkdir build && \
87+
meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \
88+
cd build && \
89+
ninja && \
90+
ninja install && \
91+
cd .. && pip install . --no-deps;
92+
93+
COPY . /lightllm
94+
RUN pip install -e /lightllm --no-cache-dir
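
The two `case ${TARGETPLATFORM}` steps in `docker/Dockerfile.nixl` are easy to misread, so here is a minimal Python sketch of what they appear to do. The function names are hypothetical and exist only for illustration; the architecture mapping, the default `MAMBA_VERSION`, and the installer URL pattern are taken from the Dockerfile above.

```python
# Illustrative model of the platform handling in docker/Dockerfile.nixl.
# Function names are hypothetical; the mapping and URL pattern come from the Dockerfile.

MINIFORGE_RELEASES = "https://github.com/conda-forge/miniforge/releases/download"


def mamba_arch(target_platform: str) -> str:
    """Mirror the first `case`: linux/arm64 maps to aarch64, anything else to x86_64."""
    return "aarch64" if target_platform == "linux/arm64" else "x86_64"


def installer_url(target_platform: str, mamba_version: str = "24.7.1-0") -> str:
    """Build the Mambaforge installer URL the same way the curl command does."""
    arch = mamba_arch(target_platform)
    return (
        f"{MINIFORGE_RELEASES}/{mamba_version}/"
        f"Mambaforge-{mamba_version}-Linux-{arch}.sh"
    )


def python_install_supported(target_platform: str) -> bool:
    """Mirror the second `case`: the build runs `exit 1` on linux/arm64."""
    return target_platform != "linux/arm64"
```

Read this way, the first step can fetch an aarch64 installer, but the second step still aborts the build on `linux/arm64`, so the image as written only completes on other platforms.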

docker/Dockerfile.nixl.deepep

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
1+
ARG CUDA_VERSION=12.6.1
2+
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04
3+
4+
ARG PYTHON_VERSION=3.10
5+
ARG MAMBA_VERSION=24.7.1-0
6+
ARG TARGETPLATFORM
7+
8+
ENV PATH=/opt/conda/bin:$PATH \
9+
CONDA_PREFIX=/opt/conda
10+
11+
RUN chmod 777 -R /tmp && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
12+
ca-certificates \
13+
libssl-dev \
14+
curl \
15+
g++ \
16+
make \
17+
git && \
18+
rm -rf /var/lib/apt/lists/*
19+
20+
RUN case ${TARGETPLATFORM} in \
21+
"linux/arm64") MAMBA_ARCH=aarch64 ;; \
22+
*) MAMBA_ARCH=x86_64 ;; \
23+
esac && \
24+
curl -fsSL -o ~/mambaforge.sh -v "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-${MAMBA_ARCH}.sh" && \
25+
bash ~/mambaforge.sh -b -p /opt/conda && \
26+
rm ~/mambaforge.sh
27+
28+
RUN case ${TARGETPLATFORM} in \
29+
"linux/arm64") exit 1 ;; \
30+
*) /opt/conda/bin/conda update -y conda && \
31+
/opt/conda/bin/conda install -y "python=${PYTHON_VERSION}" ;; \
32+
esac && \
33+
/opt/conda/bin/conda clean -ya
34+
35+
36+
WORKDIR /root
37+
38+
COPY ./requirements.txt /lightllm/requirements.txt
39+
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /lightllm/requirements.txt --ignore-installed --extra-index-url https://download.pytorch.org/whl/cu124
40+
41+
RUN --mount=type=cache,target=/root/.cache/pip pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
42+
RUN --mount=type=cache,target=/root/.cache/pip git clone https://github.com/ModelTC/LightKernel.git && cd LightKernel && pip install --no-deps -v .
43+
44+
RUN apt-get update && apt-get install -y libnuma-dev wget devscripts debhelper dh-make build-essential dkms
45+
RUN apt-get install -y ibverbs-providers infiniband-diags perftest rdma-core libibverbs-dev librdmacm-dev
46+
47+
ENV CUDA_HOME=/usr/local/cuda \
48+
GDRCOPY_HOME=/usr/src/gdrdrv-2.4.4/
49+
50+
RUN mkdir -p /tmp/gdrcopy && cd /tmp \
51+
&& git clone https://github.com/NVIDIA/gdrcopy.git -b v2.4.4 \
52+
&& cd gdrcopy/packages \
53+
&& CUDA=/usr/local/cuda ./build-deb-packages.sh \
54+
&& dpkg -i gdrdrv-dkms_*.deb libgdrapi_*.deb gdrcopy-tests_*.deb gdrcopy_*.deb \
55+
&& cd / && rm -rf /tmp/gdrcopy
56+
57+
# Fix DeepEP IBGDA symlink
58+
RUN ln -sf /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
59+
60+
RUN wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.3.9/source/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
61+
&& tar -xf nvshmem_src_cuda12-all-all-3.3.9.tar.gz && mv nvshmem_src nvshmem \
62+
&& cd nvshmem \
63+
&& rm -f /root/nvshmem_src_cuda12-all-all-3.3.9.tar.gz \
64+
&& NVSHMEM_SHMEM_SUPPORT=0 \
65+
NVSHMEM_UCX_SUPPORT=0 \
66+
NVSHMEM_USE_NCCL=0 \
67+
NVSHMEM_MPI_SUPPORT=0 \
68+
NVSHMEM_IBGDA_SUPPORT=1 \
69+
NVSHMEM_PMIX_SUPPORT=0 \
70+
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
71+
NVSHMEM_USE_GDRCOPY=1 \
72+
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/root/nvshmem/install -DCMAKE_CUDA_ARCHITECTURES=90 \
73+
&& cmake --build build --target install -j64
74+
75+
ARG DEEPEP_COMMIT=b6ce310bb0b75079682d09bc2ebc063a074fbd58
76+
RUN git clone https://github.com/deepseek-ai/DeepEP.git && cd DeepEP && git checkout ${DEEPEP_COMMIT} && cd ..
77+
78+
WORKDIR /root/DeepEP
79+
ENV NVSHMEM_DIR=/root/nvshmem/install
80+
RUN NVSHMEM_DIR=/root/nvshmem/install python setup.py install
81+
82+
RUN apt-get update && apt-get install -y cmake automake autotools-dev libtool libz-dev && \
83+
DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall libibverbs-dev rdma-core ibverbs-utils libibumad-dev; \
84+
rm -rf /usr/lib/ucx && \
85+
rm -rf /opt/hpcx/ucx && \
86+
cd /usr/local/src && \
87+
git clone https://github.com/openucx/ucx.git && \
88+
cd ucx && \
89+
git checkout v1.19.x && \
90+
./autogen.sh && ./configure \
91+
--enable-shared \
92+
--disable-static \
93+
--disable-doxygen-doc \
94+
--enable-optimizations \
95+
--enable-cma \
96+
--enable-devel-headers \
97+
--with-cuda=/usr/local/cuda \
98+
--with-verbs=yes \
99+
--with-dm \
100+
--with-gdrcopy=/usr/local \
101+
--with-efa \
102+
--enable-mt && \
103+
make -j && \
104+
make -j install-strip && \
105+
ldconfig;
106+
107+
RUN apt-get update && apt-get install -y pkg-config tmux net-tools ; \
108+
cd /usr/local/src; \
109+
pip install --upgrade meson pybind11 patchelf; \
110+
git clone https://github.com/ai-dynamo/nixl.git -b main && \
111+
cd nixl && \
112+
rm -rf build && \
113+
mkdir build && \
114+
meson setup build/ --prefix=/usr/local/nixl --buildtype=release && \
115+
cd build && \
116+
ninja && \
117+
ninja install && \
118+
cd .. && pip install . --no-deps;
119+
120+
COPY . /lightllm
121+
RUN pip install -e /lightllm --no-cache-dir

docs/CN/source/tutorial/api_server_args_zh.rst

Lines changed: 3 additions & 2 deletions
@@ -445,9 +445,10 @@ MTP Multi-Prediction Parameters
445445

446446
.. option:: --mtp_mode
447447

448-
Supported mtp modes, optional values:
448+
Supported mtp modes; deepseekv3_eagle is recommended for better performance. Optional values:
449449

450-
* ``deepseekv3``
450+
* ``deepseekv3_vanilla``
451+
* ``deepseekv3_eagle``
451452
* ``None``: Do not enable mtp (default)
452453

453454
.. option:: --mtp_draft_model_dir

docs/EN/source/tutorial/api_server_args_zh.rst

Lines changed: 3 additions & 2 deletions
@@ -442,9 +442,10 @@ MTP Multi-Prediction Parameters
442442

443443
.. option:: --mtp_mode
444444

445-
Supported mtp modes, optional values:
445+
Supported mtp modes; deepseekv3_eagle is recommended for better performance. Optional values:
446446

447-
* ``deepseekv3``
447+
* ``deepseekv3_vanilla``
448+
* ``deepseekv3_eagle``
448449
* ``None``: Do not enable mtp (default)
449450

450451
.. option:: --mtp_draft_model_dir
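
To make the documented `--mtp_mode` change concrete, the sketch below models the option with `argparse`. It is not LightLLM's actual argument parser: `build_parser` and `parse` are hypothetical helpers, and the string type of `--mtp_draft_model_dir` is an assumption. Only the option names and the two mode values come from the docs change above.

```python
import argparse

# Mode names come from the updated docs; the parser itself is illustrative only.
MTP_MODES = ("deepseekv3_vanilla", "deepseekv3_eagle")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(add_help=False)
    # Default None means mtp is not enabled, as the docs state.
    parser.add_argument("--mtp_mode", choices=MTP_MODES, default=None)
    # Assumed to be a plain path string; the docs hunk does not show its type.
    parser.add_argument("--mtp_draft_model_dir", type=str, default=None)
    return parser


def parse(argv):
    """Parse a list of command-line tokens with the sketch parser."""
    return build_parser().parse_args(argv)
```

In this sketch, `parse(["--mtp_mode", "deepseekv3_eagle"])` selects the recommended mode, while the old value `deepseekv3` is rejected because it is no longer among the listed choices. Whether the real server rejects or remaps the old value is not shown in this commit excerpt.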
