Create Dedicated LLM Endpoint
You can create a dedicated LLM endpoint by selecting the Create Dedicated LLM Endpoint option. The model can be either a fine-tuned model you have uploaded, or one from the Hugging Face Hub or another supported model provider.
LLM Engine
Lepton LLM
Lepton LLM Engine is one of the fastest and most scalable LLM runtime engines, developed by Lepton AI.

vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving.

Select a model from the Hugging Face Hub, and we'll automatically generate the vLLM command for you. You can also customize the command if needed.
You can find the full list of vLLM arguments here
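As an illustration, a customized vLLM launch command for a Hub model might look like the following. The model name and flag values here are example values, not the defaults the platform generates:

```shell
# Illustrative vLLM serve command (model name and flag values are examples).
# --max-model-len caps the context length the server will accept;
# --tensor-parallel-size shards the model across that many GPUs.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --max-model-len 8192 \
  --tensor-parallel-size 1
```

Any argument from the vLLM documentation can be appended to the generated command in the same way.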
SGLang
SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.

Select a model from the Hugging Face Hub, and we'll automatically generate the SGLang command for you. You can also customize the command if needed.
You can find the full list of SGLang arguments here
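For comparison, a customized SGLang launch command might look like the sketch below. Again, the model path and flag values are illustrative, not platform defaults:

```shell
# Illustrative SGLang launch command (model path and flag values are examples).
# --tp sets the tensor-parallel degree; --host/--port control where the
# server listens.
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-7B-Instruct \
  --host 0.0.0.0 \
  --port 30000 \
  --tp 1
```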
Hugging Face Token
Some models are gated and require a Hugging Face token to access. To learn how to add your Hugging Face token, check out the Secrets Documentation.
Model from File System
You can upload your own models to your workspace and create a dedicated endpoint from them. For a more detailed guide to file system management, check out the File System guide.
Configuration
For more details about the configuration options, check out the Configuration Options guide.
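Once the endpoint is running, both vLLM and SGLang expose an OpenAI-compatible HTTP API, so you can query it with a standard chat-completions request. The sketch below assumes that API shape; the endpoint URL, token, and model name are placeholders you would replace with your own values:

```python
import json
import urllib.request

# Placeholders: substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_TOKEN = "your-endpoint-token"

def build_chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_payload("my-model", "Hello!")

# Uncomment to send the request against a live endpoint:
# req = urllib.request.Request(
#     ENDPOINT_URL,
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {API_TOKEN}",
#         "Content-Type": "application/json",
#     },
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the endpoint by overriding their base URL.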