Other LLM Endpoints

You can also create a dedicated endpoint from any model by using the Create Dedicated LLM Endpoint option. The model can be either one of your uploaded fine-tuned models or one from the Hugging Face Hub.

What is a Dedicated LLM Endpoint?

A dedicated LLM endpoint is a running instance of an LLM served by the Lepton LLM engine for high performance and low latency. Beyond performance, a dedicated LLM endpoint also provides convenient features such as autoscaling, file system mounts, monitoring, and metrics.

Create Dedicated LLM Endpoint

Go to the Create Dedicated LLM Endpoint page on the Lepton dashboard.
Load your model from Hugging Face or from your file system on Lepton.
Modify the endpoint configuration as needed.

Pricing for Dedicated LLM Endpoint

A dedicated LLM endpoint is billed by GPU instance type and the number of GPU instances, with usage calculated by the minute. For example, if the unit price of an NVIDIA-H100 GPU is $3/hour and you deploy an LLM endpoint with 1x NVIDIA-H100 GPU and run it for 2.5 hours, the cost is $3 * 2.5 = $7.50.
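
The pricing rule above can be sketched as a small calculation. This is only an illustrative estimate, not Lepton's billing implementation: the function name `endpoint_cost` and its parameters are hypothetical, it assumes usage is already expressed in whole minutes, and the $3/hour price is the example figure from the text rather than a quoted rate.

```python
def endpoint_cost(hourly_price_usd: float, gpu_count: int, minutes_running: int) -> float:
    """Estimate the cost in USD of a dedicated endpoint metered by the minute.

    Hypothetical helper for illustration; actual billing may round differently.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if minutes_running < 0:
        raise ValueError("minutes_running must not be negative")
    # Price per hour, times number of GPU instances, times hours used
    # (minutes / 60), rounded to cents.
    return round(hourly_price_usd * gpu_count * minutes_running / 60, 2)


# The example from the text: 1x GPU at $3/hour for 2.5 hours (150 minutes).
print(f"${endpoint_cost(3.0, 1, 150):.2f}")  # prints $7.50
```

Scaling to more GPU instances multiplies the result: under the same assumptions, two GPUs for one hour would come to `endpoint_cost(3.0, 2, 60)`, or $6.00.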

For more detailed information about dedicated LLM endpoints, check out the documentation for Dedicated Endpoint.