Create Dedicated LLM Endpoint
You can create a dedicated LLM endpoint by selecting the Create Dedicated LLM Endpoint option. The model can be either a fine-tuned model you have uploaded, or one from the Hugging Face Hub or another supported model provider.
LLM Engine
Lepton LLM
Lepton LLM Engine is one of the fastest and most scalable LLM runtime engines, developed by Lepton AI.

vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving.

Select a model from the Hugging Face Hub, and we'll automatically generate the vLLM command for you. You can also customize the command if needed.
You can find the full list of vLLM arguments here
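As an illustration, a customized vLLM launch command for a Hub model might look like the following. The model name and flag values here are example values, not the defaults the platform generates:

```shell
# Illustrative vLLM serve command (model name and flag values are examples).
# --max-model-len caps the context length the server will accept;
# --tensor-parallel-size shards the model across that many GPUs.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --max-model-len 8192 \
  --tensor-parallel-size 1
```

Any argument from the vLLM documentation can be appended to the generated command in the same way.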
SGLang
SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

Select a model from the Hugging Face Hub, and we'll automatically generate the SGLang command for you. You can also customize the command if needed.
You can find the full list of SGLang arguments here
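For comparison, a customized SGLang launch command might look like the sketch below. Again, the model path and flag values are illustrative, not platform defaults:

```shell
# Illustrative SGLang launch command (model path and flag values are examples).
# --tp sets the tensor-parallel degree; --host/--port control where the
# server listens.
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-7B-Instruct \
  --host 0.0.0.0 \
  --port 30000 \
  --tp 1
```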
Hugging Face Token
Some models are gated and require a Hugging Face token to access. To learn how to add your Hugging Face token, check out the Secrets Documentation.
Model from File System
You can upload your own models to your workspace and create a dedicated endpoint from them. For a more detailed guide to file system management, check out the File System guide.
Configuration
For more details about the configuration options, check out the Configuration Options guide.
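Once the endpoint is running, both vLLM and SGLang expose an OpenAI-compatible HTTP API, so you can query it with a standard chat-completions request. The sketch below assumes that API shape; the endpoint URL, token, and model name are placeholders you would replace with your own values:

```python
import json
import urllib.request

# Placeholders: substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_TOKEN = "your-endpoint-token"

def build_chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_payload("my-model", "Hello!")

# Uncomment to send the request against a live endpoint:
# req = urllib.request.Request(
#     ENDPOINT_URL,
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {API_TOKEN}",
#         "Content-Type": "application/json",
#     },
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the endpoint by overriding their base URL.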