LLM Models
An LLM (Large Language Model) is a model trained on a large corpus of text. Given a prompt, it generates text similar in style and content to that corpus.
At Lepton, we provide a selection of popular LLMs as serverless endpoints for AI developers. The models are hosted on our servers and can be accessed through our APIs.
The endpoints are compatible with the OpenAI API, so you can use them as a drop-in replacement by pointing api_base (base_url in the openai>=1.0 Python client used below) to the model URL specified below. For the API key (api_key in the client), use your Lepton API token.
You can find your API token in the Dashboard under Settings.
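To illustrate what OpenAI compatibility means at the HTTP level, here is a minimal standard-library sketch. It assumes the endpoint follows the usual OpenAI convention of POST {base_url}/chat/completions with the token sent as a Bearer header; build_chat_request is a hypothetical helper, and the code only constructs the request without sending it.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request.

    Assumption: the server accepts POST {base_url}/chat/completions with a
    Bearer token, as OpenAI-compatible endpoints conventionally do.
    """
    # Normalize the trailing slash so both URL forms in the table work.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen and parse the JSON body; in an OpenAI-style response the reply text sits under choices[0].message.content. This requires a valid token and network access.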
Usage
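import os
import openai
# Point the OpenAI client at the Lepton endpoint of the model you want to use.
client = openai.OpenAI(
    base_url="https://llama2-7b.lepton.run/api/v1/",
    api_key=os.environ.get("LEPTON_API_TOKEN"),
)
# Request a chat completion and stream the reply back piece by piece.
completion = client.chat.completions.create(
    model="llama2-7b",
    messages=[
        {"role": "user", "content": "say hello"},
    ],
    max_tokens=128,
    stream=True,
)
for chunk in completion:
    # Some chunks may carry no choices; skip them.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    # The delta content can be None (for example on the final chunk).
    if content:
        print(content, end="")
print()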
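If you need the complete reply as a single string instead of printing it as it arrives, the same loop can accumulate the deltas. A minimal sketch; collect_stream is a hypothetical helper that only reads chunk.choices[0].delta.content from the chunk objects yielded by the streaming call above.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string.

    Mirrors the printing loop: chunks without choices, and deltas whose
    content is None or empty, are skipped.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)
```

For example, text = collect_stream(completion) in place of the printing loop. Note that a stream can be consumed only once, so use one or the other.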
Model List
To switch to a different model, change base_url to that model's URL as listed below.
Here are some sample models that you can use:
| Model Name | Model URL |
|---|---|
| Llama2-13b | https://llama2-13b.lepton.run/api/v1/ |
| Mixtral-8x7b | https://mixtral-8x7b.lepton.run/api/v1/ |
| Wizardlm-2-8x22b | https://wizardlm-2-8x22b.lepton.run/api/v1/ |
| DBRX | https://dbrx.lepton.run/api/v1/ |
| Mistral-7b | https://mistral-7b.lepton.run/api/v1/ |
| Toppy M 7B | https://toppy-m-7b.lepton.run/api/v1/ |
| Jet MoE | https://jetmoe-8b-chat.lepton.run/api/v1/ |
| Gemma-7b | https://gemma-7b.lepton.run/api/v1/ |
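If your code switches between several of these models, a small lookup table keeps the URLs in one place. A minimal sketch: model_url and LEPTON_MODEL_URLS are hypothetical names, the URLs are copied from the table above, and the dictionary keys are illustrative labels derived from the URLs, not official model identifiers.

```python
# Hypothetical lookup table; URLs copied from the model list above.
LEPTON_MODEL_URLS = {
    "llama2-13b": "https://llama2-13b.lepton.run/api/v1/",
    "mixtral-8x7b": "https://mixtral-8x7b.lepton.run/api/v1/",
    "wizardlm-2-8x22b": "https://wizardlm-2-8x22b.lepton.run/api/v1/",
    "dbrx": "https://dbrx.lepton.run/api/v1/",
    "mistral-7b": "https://mistral-7b.lepton.run/api/v1/",
    "toppy-m-7b": "https://toppy-m-7b.lepton.run/api/v1/",
    "jetmoe-8b-chat": "https://jetmoe-8b-chat.lepton.run/api/v1/",
    "gemma-7b": "https://gemma-7b.lepton.run/api/v1/",
}


def model_url(name: str) -> str:
    """Return the base_url for a model key (case-insensitive, whitespace-trimmed)."""
    key = name.strip().lower()
    try:
        return LEPTON_MODEL_URLS[key]
    except KeyError:
        known = ", ".join(sorted(LEPTON_MODEL_URLS))
        raise ValueError(f"unknown model {name!r}; known models: {known}") from None
```

With the client from the usage example, this becomes openai.OpenAI(base_url=model_url("mixtral-8x7b"), api_key=os.environ.get("LEPTON_API_TOKEN")).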
To view the full list of models, visit the Lepton AI Playground.