Adopters, Integrations and Presentations
Adopters
This list is based on public documentation; please open an issue if you would like to be added to or removed from the list. A minimal manifest sketch of the multi-node pattern these adopters share appears after the list.
AWS:
- Amazon EKS supports running SuperPods with LeaderWorkerSet to serve large LLMs; see the blog here.
- A Terraform-based EKS Blueprints pattern can be found here. It demonstrates an Amazon EKS cluster with an EFA-enabled node group that supports multi-node inference using vLLM and LeaderWorkerSet.
DaoCloud: LeaderWorkerSet is the default deployment method for running large models across multiple nodes on Kubernetes.
Google Cloud:
- GKE leverages LeaderWorkerSet to deploy and serve multi-host generative AI open models; see the blog here.
- A guide to serving DeepSeek-R1 671B or Llama 3.1 405B on GKE is available here.
NVIDIA: LeaderWorkerSet deployments are the recommended method for deploying multi-node models with NIM; see the documentation here.
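All of the platforms above deploy the same core primitive: a LeaderWorkerSet groups one leader pod with a fixed number of worker pods and scales those groups as a unit. Below is a minimal, hypothetical manifest illustrating that shape; the name and images are placeholders, not taken from any adopter's documentation.

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: example-lws            # placeholder name
spec:
  replicas: 2                  # number of leader+worker groups
  leaderWorkerTemplate:
    size: 4                    # pods per group: 1 leader + 3 workers
    leaderTemplate:            # pod template for the leader
      spec:
        containers:
        - name: leader
          image: registry.example.com/model-server:latest   # placeholder image
    workerTemplate:            # pod template for the workers
      spec:
        containers:
        - name: worker
          image: registry.example.com/model-server:latest   # placeholder image
```

Each group is created, scheduled, and restarted together, which is what makes LWS a fit for model shards that must come up and fail as one unit.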
Integrations
Feel free to submit a PR if you use LeaderWorkerSet in your project and want to be added here.
Axlearn: Axlearn is a library built on top of JAX and XLA to support the development of large-scale deep learning models. It uses LeaderWorkerSet to deploy multi-host inference workloads for use during training workflows.
llm-d: llm-d is a Kubernetes-native, high-performance distributed LLM inference framework. It integrates open technologies such as vLLM for model serving and the Gateway API Inference Extension (GIE) for request scheduling and load balancing, and uses LeaderWorkerSet for scalable multi-node deployments. Key features include prefill/decode (P/D) disaggregated serving and prefix caching.
llmaz: llmaz, an easy-to-use and advanced inference platform, uses LeaderWorkerSet as the underlying workload to support both single-host and multi-host inference scenarios.
NVIDIA Dynamo: NVIDIA Dynamo is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments, especially for disaggregated prefill and decode inference. It uses LeaderWorkerSet to support multi-node deployment on Kubernetes.
OME: OME is a Kubernetes operator for enterprise-grade management and serving of LLMs. It leverages LWS for multi-node inference; see the documentation here.
SGLang: SGLang is a fast serving framework for large language models and vision language models. It can be deployed with LWS on Kubernetes for distributed model serving; see the documentation here.
vLLM: vLLM is a fast and easy-to-use library for LLM inference. It can be deployed with LWS on Kubernetes for distributed model serving; see the documentation here and the sketch after this list.
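As a concrete illustration of the vLLM integration above, the pattern in the LWS documentation runs a Ray head plus `vllm serve` on the leader and joins each worker to that Ray cluster. The sketch below is a hedged approximation: the image tag, model name, port, GPU counts, and parallelism flags are illustrative assumptions, and the maintained example in the linked documentation should be preferred.

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm                                   # placeholder name
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2                                    # 1 leader node + 1 worker node
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest       # assumed image; pin a real tag in practice
          command: ["sh", "-c"]
          # Start a Ray head, then launch an OpenAI-compatible server that
          # shards the model across both nodes in the group.
          args:
          - ray start --head --port=6379 &&
            vllm serve meta-llama/Llama-3.1-70B --tensor-parallel-size 8 --pipeline-parallel-size 2
          resources:
            limits:
              nvidia.com/gpu: "8"
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest       # assumed image
          command: ["sh", "-c"]
          # LWS_LEADER_ADDRESS is an environment variable LWS injects into
          # group pods; the worker uses it to join the leader's Ray cluster.
          args:
          - ray start --address=$LWS_LEADER_ADDRESS:6379 --block
          resources:
            limits:
              nvidia.com/gpu: "8"
```

The design point is that the leader owns the serving endpoint while LWS guarantees the worker lands in the same gang-managed group, so Ray sees a stable multi-node cluster.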
Talks and Presentations
- KubeCon NA 2024: Distributed Multi-Node Model Inference Using the LeaderWorkerSet API by @ahg-g @liurupeng
- KubeCon EU 2025: Project Lightning Talk: Sailing Multi-Host Inference with LWS by @kerthcet
- KubeCon HK 2025: More Than Model Sharding: LWS & Distributed Inference (in Chinese) by @panpan0000 @nicole-lihui
- KubeCon HK 2025: New Pattern for Sailing Multi-host LLM Inference by @kerthcet
- KubeCon JP 2025: Sailing Multi-host Inference for LLM on Kubernetes by @yankay