LLM Models

A large language model (LLM) is a model trained on a large corpus of text that can generate text similar to that corpus.

At Lepton, we provide a list of popular LLMs as serverless endpoints for AI developers to use. The models are hosted on our servers and can be accessed through our APIs.

The endpoints are compatible with the OpenAI API, so you can use them as a drop-in replacement by setting base_url to the URL of the model you want, as listed below. For api_key, use your Lepton API token.

You can find your API token in Dashboard - Setting.

Usage

import os
import openai

# Point the OpenAI client at the Lepton endpoint for the chosen model.
client = openai.OpenAI(
    base_url="https://llama2-7b.lepton.run/api/v1/",
    api_key=os.environ.get('LEPTON_API_TOKEN')
)

# Request a streamed chat completion.
completion = client.chat.completions.create(
    model="llama2-7b",
    messages=[
        {"role": "user", "content": "say hello"},
    ],
    max_tokens=128,
    stream=True,
)

# Print the response text as each chunk arrives.
for chunk in completion:
    # Skip chunks that carry no choices.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")

Model List

To switch to a different model, change base_url to the URL of that model from the table below.

Here are some sample models that you can use:

Model Name	Model URL
Llama2-13b	https://llama2-13b.lepton.run/api/v1/
Mixtral-8x7b	https://mixtral-8x7b.lepton.run/api/v1/
Wizardlm-2-8x22b	https://wizardlm-2-8x22b.lepton.run/api/v1/
DBRX	https://dbrx.lepton.run/api/v1/
Mistral-7b	https://mistral-7b.lepton.run/api/v1/
Toppy M 7B	https://toppy-m-7b.lepton.run/api/v1/
Jet MoE	https://jetmoe-8b-chat.lepton.run/api/v1/
Gemma-7b	https://gemma-7b.lepton.run/api/v1/

To view the full list of models, please visit Lepton AI Playground.
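
Since switching models only means changing base_url, the lookup can be wrapped in a small helper. The following is a minimal sketch, not an official Lepton utility: the names MODEL_URLS, client_kwargs, and make_client are hypothetical, the dictionary keys are short names chosen for this example, and the URLs are three of the entries from the table above.

```python
import os

# Base URLs taken from the model table; the keys are hypothetical
# short names chosen for this example.
MODEL_URLS = {
    "mixtral-8x7b": "https://mixtral-8x7b.lepton.run/api/v1/",
    "dbrx": "https://dbrx.lepton.run/api/v1/",
    "mistral-7b": "https://mistral-7b.lepton.run/api/v1/",
}


def client_kwargs(model_key):
    """Return keyword arguments for openai.OpenAI() for the given model."""
    if model_key not in MODEL_URLS:
        raise KeyError(f"unknown model: {model_key}")
    return {
        "base_url": MODEL_URLS[model_key],
        # Same environment variable as in the usage example.
        "api_key": os.environ.get("LEPTON_API_TOKEN"),
    }


def make_client(model_key):
    """Build an OpenAI-compatible client pointed at the chosen model."""
    import openai  # imported lazily so the URL lookup works without the package

    return openai.OpenAI(**client_kwargs(model_key))
```

Calling make_client("mistral-7b") should return a client that can be used exactly like the one in the usage example. The value to pass as the model argument is not covered by the table; the usage example suggests it matches the subdomain of the URL, but that is an assumption here.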