Create Dedicated LLM Endpoint

You can create a dedicated endpoint for an LLM by selecting the Create Dedicated LLM Endpoint option. The model can be either a fine-tuned model you have uploaded, or one from the Hugging Face Hub or another supported model provider.

LLM Engine

Lepton LLM

Lepton LLM Engine is one of the fastest and most scalable LLM runtime engines, developed by Lepton AI.


vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving.


Select a model from the Hugging Face Hub - we'll automatically generate the vLLM command for you, but you can also customize it if needed.
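If you want to sanity-check a Hub model with vLLM before deploying it as a dedicated endpoint, a minimal sketch using vLLM's offline Python API looks like the following; the model ID and sampling settings are only examples, and the command generated on the dashboard serves the model behind an API rather than running it offline.

```python
# Minimal local sanity check with vLLM's offline Python API.
# The model ID below is just an example Hugging Face Hub model.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")        # example Hub model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Write a haiku about GPUs."], params)
for out in outputs:
    print(out.outputs[0].text)
```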

SGLang

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.


Select a model from the Hugging Face Hub - we'll automatically generate the SGLang command for you, but you can also customize it if needed.
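To illustrate the frontend language mentioned above, here is a small sketch using SGLang's Python primitives against a locally launched SGLang server; the localhost URL and the question are placeholders and are not part of the dedicated endpoint setup itself.

```python
# Sketch of SGLang's frontend language against a local SGLang server.
# The endpoint URL and prompt are placeholders.
import sglang as sgl

@sgl.function
def quick_question(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = quick_question.run(question="What is the capital of France?")
print(state["answer"])
```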

Hugging Face Token

Some models are gated and require a Hugging Face token to access. For how to add your Hugging Face token, check out the Secrets Documentation.
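As a quick local check that your token actually grants access to a gated model before you add it as a secret, you could run something like the sketch below; the HF_TOKEN environment variable name and the repo ID are only examples.

```python
# Verify locally that a Hugging Face token can access a gated model.
# HF_TOKEN and the repo ID are illustrative; substitute your own values.
import os
from huggingface_hub import login, model_info

login(token=os.environ["HF_TOKEN"])
info = model_info("meta-llama/Llama-3.1-8B-Instruct")  # raises an error if access is denied
print(f"Token can access: {info.id}")
```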

Model from File System

You can upload your own models to your workspace and create a dedicated endpoint from them. For a more detailed guide to file system management, check out the File System guide.
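For example, if your fine-tuned weights live on the Hugging Face Hub, one hypothetical way to stage them locally before uploading them to your workspace file system is shown below; the repo ID and local path are placeholders, and the actual upload step is covered in the File System guide.

```python
# Download model weights into a local folder as a staging step before
# uploading them to the workspace file system (see the File System guide).
# The repo ID and local directory are placeholders.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="your-org/your-finetuned-model",
    local_dir="./models/your-finetuned-model",
)
print(f"Weights downloaded to {local_path}")
```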

Configuration

For more details about the configuration options, check out the Configuration Options guide.
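Once the endpoint is deployed, you will typically call it over HTTP. Assuming the endpoint exposes an OpenAI-compatible API, a minimal sketch with the openai Python client looks like this; the base URL, token environment variable, and model name are placeholders.

```python
# Hypothetical call to a deployed endpoint, assuming it exposes an
# OpenAI-compatible API. The base URL, env var, and model name are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint-url>/v1",  # placeholder endpoint URL
    api_key=os.environ["LEPTON_API_TOKEN"],     # placeholder env var name
)
resp = client.chat.completions.create(
    model="<deployed-model-name>",              # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```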
