Then use it in the following code.
```python
import os
import openai

# Read the Lepton API token from the environment.
api_token = os.environ.get('LEPTON_API_TOKEN')

# Point the OpenAI client at the dolphin-mixtral-8x7b serverless endpoint.
client = openai.OpenAI(
    base_url="https://dolphin-mixtral-8x7b.lepton.run/api/v1/",
    api_key=api_token
)

response = client.completions.create(
    model="dolphin-mixtral-8x7b",
    prompt="<|im_start|>user\n# Python\ndef fibonacci(n):<|im_end|>\n<|im_start|>assistant"
)
print(response)
```
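The call above prints the full response object. If you only want the generated text, a minimal sketch, assuming the standard OpenAI completions response shape, is:

```python
# Extract just the generated text from the completion response
# (assumes the standard OpenAI completions response shape).
completion_text = response.choices[0].text
print(completion_text)
```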
The rate limit for Serverless Endpoints is 10 requests per minute across all models under the Basic Plan. If you need a higher rate limit with an SLA, please upgrade to the Standard Plan or use a dedicated deployment.
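If you stay on the Basic Plan, one option is to back off and retry when the limit is hit. The sketch below is a minimal example, assuming the endpoint signals rate limiting with HTTP 429, which the openai client raises as `openai.RateLimitError`; `complete_with_retry` and its parameters are hypothetical names, not part of the Lepton SDK.

```python
import time
import openai

def complete_with_retry(client, max_retries=5, **kwargs):
    """Retry a completion call when the serverless rate limit is hit.

    Hypothetical helper: assumes rate limiting surfaces as
    openai.RateLimitError (HTTP 429).
    """
    for attempt in range(max_retries):
        try:
            return client.completions.create(**kwargs)
        except openai.RateLimitError:
            # Back off exponentially before retrying (1s, 2s, 4s, ...).
            time.sleep(2 ** attempt)
    raise RuntimeError("Rate limit still exceeded after retries")
```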