Custom Model Endpoints
In addition to the out-of-the-box models listed above, Lepton lets you create dedicated endpoints for your own custom models.
What Is a Custom Model Endpoint?
A custom model endpoint is a running instance of an AI model that exposes itself as an HTTP server. Any service can run as a dedicated endpoint; the most common use case is deploying an AI model whose HTTP interface is described by an OpenAPI specification.
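To make the idea concrete, here is a minimal sketch of the kind of service a dedicated endpoint runs: an HTTP server that wraps a model and publishes an OpenAPI description of its interface. It uses only Python's standard library and is not Lepton's own serving framework. The toy `predict` function stands in for a real model, and the `/predict` and `/openapi.json` routes and port 8080 are hypothetical choices for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    """Hypothetical stand-in for a real model: reports the word count of the input."""
    return {"input": text, "word_count": len(text.split())}


# Minimal OpenAPI 3.0 document describing the single inference route.
OPENAPI_SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Toy model endpoint", "version": "0.1.0"},
    "paths": {
        "/predict": {
            "post": {
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"],
                    }}},
                },
                "responses": {"200": {"description": "Model output"}},
            }
        }
    },
}


class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        # Serve the OpenAPI description so clients can discover the interface.
        if self.path == "/openapi.json":
            self._send_json(200, OPENAPI_SPEC)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            request = json.loads(self.rfile.read(length) or b"{}")
            text = request["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing field, or wrong type: reject the request.
            self._send_json(400, {"error": "expected JSON body with a string 'text' field"})
            return
        self._send_json(200, predict(text))


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    # Bind to all interfaces, as is typical when running inside a container.
    HTTPServer((host, port), ModelHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

Run it locally with `python server.py`, then query it with `curl -X POST localhost:8080/predict -H 'Content-Type: application/json' -d '{"text": "hello"}'`. A production endpoint would typically use a proper web framework and load a real model, but the shape is the same: a long-running HTTP process with a documented API.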
Create a Custom Model Endpoint
- Go to the Create Dedicated Endpoint page on the Lepton dashboard.
- Choose from the five options for creating a dedicated endpoint: LLM endpoint, Lepton prebuilt, custom model, container image, and NVIDIA NIM.
- Select one of the options and follow the instructions to create the dedicated endpoint.
Pricing for Custom Model Endpoints
Dedicated endpoints are billed according to the GPU instance type and the number of GPU instances, metered by the minute. For example, if the unit price of an NVIDIA-H100 GPU is $3/hour and you deploy an endpoint with 1x NVIDIA-H100 GPU that runs for 2.5 hours, the cost is $3/hour × 1 GPU × 2.5 hours = $7.50.
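The same arithmetic can be written as a small estimator. This is a sketch under stated assumptions, not Lepton's billing code: it assumes the hourly price applies per GPU and that you pass in the already-billed number of whole minutes (how partial minutes are rounded is not specified here). The function name `endpoint_cost` is hypothetical, and prices are passed as strings so `Decimal` avoids floating-point rounding errors.

```python
from decimal import Decimal, ROUND_HALF_UP


def endpoint_cost(price_per_gpu_hour: str, gpu_count: int, minutes: int) -> Decimal:
    """Estimate the cost of a dedicated endpoint metered per GPU per minute.

    Assumptions (not from Lepton's documentation): the hourly price is per GPU,
    and `minutes` is the billed number of whole minutes.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Convert the hourly unit price to a per-minute rate.
    per_minute = Decimal(price_per_gpu_hour) / Decimal(60)
    cost = per_minute * gpu_count * minutes
    # Round to cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # The example above: 1x H100 at $3/hour for 2.5 hours (150 minutes).
    print(endpoint_cost("3", 1, 150))  # 7.50
```

Because billing scales linearly with both GPU count and minutes, doubling either one doubles the estimate; for instance, two GPUs for 90 minutes at the same rate come to $9.00.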
For more detailed information about dedicated endpoints, check out the Dedicated Endpoint documentation.