Custom Model Endpoints

In addition to the out-of-the-box models listed above, Lepton lets you create dedicated endpoints for any custom model.

What Is a Custom Model Endpoint?

A custom model endpoint is a running instance of an AI model that exposes itself as an HTTP server. Any service can run as a dedicated endpoint; the most common use case is deploying an AI model behind an HTTP API described with OpenAPI.
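To make the idea of "a model exposed as an HTTP server" concrete, here is a minimal sketch using only Python's standard library. It is not Lepton's tooling and does not reflect what Lepton requires of a custom model: the toy model (a word counter), the `/predict` and `/openapi.json` routes, the request shape, and the port are all hypothetical choices made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def toy_model(text: str) -> dict:
    # Hypothetical stand-in for a real AI model: counts the words in the input.
    return {"num_words": len(text.split())}


# Minimal OpenAPI 3.0 description of the single (hypothetical) /predict route.
OPENAPI_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Toy model endpoint", "version": "0.1.0"},
    "paths": {
        "/predict": {
            "post": {
                "summary": "Run the toy model on input text",
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"],
                    }}},
                },
                "responses": {"200": {"description": "Model output"}},
            }
        }
    },
}


class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Serve the API description so clients can discover how to call the model.
        if self.path == "/openapi.json":
            self._send_json(200, OPENAPI_SPEC)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            request = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            request = None
        # Reject anything that is not a JSON object with a string "text" field.
        if not isinstance(request, dict) or not isinstance(request.get("text"), str):
            self._send_json(400, {"error": "expected a JSON object with a 'text' field"})
            return
        self._send_json(200, toy_model(request["text"]))

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port: int = 8080) -> HTTPServer:
    # Listen on all interfaces so the server is reachable from outside its container.
    return HTTPServer(("0.0.0.0", port), ModelHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Once running, a client would `POST {"text": "hello world"}` to `/predict` and receive `{"num_words": 2}`, and could fetch `/openapi.json` to discover the request format. Production model servers usually generate the OpenAPI document from typed route definitions rather than writing it by hand.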

Create Custom Model Endpoint

Go to the dedicated endpoint creation page on the Lepton dashboard.
The page offers four ways to create a dedicated endpoint: LLM endpoint, custom model, container image, and NVIDIA NIM.
Select one of the options and follow the instructions to create a dedicated endpoint.

Pricing for Custom Model Endpoint

Dedicated endpoints are billed per minute, based on the GPU instance type and the number of GPU instances. For example, if the unit price of an NVIDIA-H100 GPU is $3/hour and you deploy an endpoint with 1x NVIDIA-H100 GPU that runs for 2.5 hours, the cost is $3/hour × 1 GPU × 2.5 hours = $7.50.
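The pricing rule can be sketched as a small calculator. This is an illustrative estimate only, assuming that per-minute billing charges 1/60 of the hourly unit price for each GPU instance. How partial minutes are rounded is not stated here, so the hypothetical `dedicated_endpoint_cost` function takes the number of billed minutes directly. It uses `Decimal` to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal


def dedicated_endpoint_cost(hourly_price: str, num_instances: int, billed_minutes: int) -> Decimal:
    """Estimate the cost of a dedicated endpoint (hypothetical helper).

    Assumes each GPU instance is billed per minute at 1/60 of its hourly
    unit price. Rounding of partial minutes is not modeled.
    """
    if num_instances < 1 or billed_minutes < 0:
        raise ValueError("need at least one instance and a non-negative runtime")
    # Multiply first and divide by 60 last, so whole-dollar prices stay exact.
    return Decimal(hourly_price) * num_instances * billed_minutes / 60


# The worked example: 1x NVIDIA-H100 at $3/hour for 2.5 hours (150 minutes).
print(dedicated_endpoint_cost("3", 1, 150))  # prints 7.5
# Doubling the instance count doubles the cost.
print(dedicated_endpoint_cost("3", 2, 150))  # prints 15
```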

For more detailed information about dedicated endpoints, check out the Dedicated Endpoint documentation.