The Missing Guide to the H100 GPU Market
Meet the New AI Cloud

Cutting-edge AI inference and training, unmatched cloud-native experience, and top-tier GPU infrastructure.

From the Creators of
Caffe, PyTorch, ONNX, and etcd
Why Lepton AI Cloud

Efficient, reliable and easy to use

20B+
tokens processed per day by a single deployment with 100% uptime
1M+
images generated per day by a single deployment with 100% uptime
600+
tokens/s max speed with Tuna, our fast LLM engine
6x+
faster high-resolution image generation via our distributed engine DistriFusion
10K+
models and LoRAs supported concurrently for image generation
1PB
accelerated serverless storage for fast distributed training
A Full Platform. Not Just GPUs

Combining high performance computing with cloud native efficiency

High Availability
Ensure 99.9% uptime with comprehensive health checks and automatic repairs.
Efficient Compute
5x performance boost with smart scheduling, accelerated compute, and optimized infra.
AI Tailored
Streamlined deployment, training, and serving. Build in a day, scale to millions.
Enterprise Ready
SOC2 and HIPAA compliant. RBAC, quota, audit log, and more.
Fast Training, Fast Inference

We built the fastest and most scalable AI runtimes

600+ t/s
Peak tokens-per-second throughput with distributed inference
23B+
Daily tokens processed by a single client with zero downtime
10ms
Time-to-first-token as low as 10ms for fast local deployment
Lepton’s LLM engine
The fastest LLM serving engine, with dynamic batching, quantization, and speculative decoding. It supports most open-source architectures.
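Dynamic batching is the technique of queuing incoming requests and serving many of them with one model call. The sketch below is a conceptual plain-Python illustration of that idea, not Lepton's actual engine code; `model_fn`, `max_batch`, and `timeout_s` are names chosen for this example.

```python
import queue
import threading

class DynamicBatcher:
    """Conceptual dynamic batcher: groups queued requests so one model
    call serves many callers at once. Hypothetical sketch only --
    not Lepton's actual engine code."""

    def __init__(self, model_fn, max_batch=8, timeout_s=0.01):
        self.model_fn = model_fn          # processes a list of prompts in one call
        self.max_batch = max_batch
        self.timeout_s = timeout_s
        self.requests = queue.Queue()     # (prompt, done-event, result-slot) tuples

    def submit(self, prompt):
        """Called by a client thread; blocks until the batched result is ready."""
        done = threading.Event()
        slot = []                         # the worker appends the result here
        self.requests.put((prompt, done, slot))
        done.wait()
        return slot[0]

    def _drain_batch(self):
        batch = [self.requests.get()]     # block for the first request
        while len(batch) < self.max_batch:
            try:                          # then grab whatever arrived meanwhile
                batch.append(self.requests.get(timeout=self.timeout_s))
            except queue.Empty:
                break
        return batch

    def serve_once(self):
        """One scheduler iteration: drain a batch, run it, wake the callers."""
        batch = self._drain_batch()
        outputs = self.model_fn([p for p, _, _ in batch])  # single batched call
        for (_, done, slot), out in zip(batch, outputs):
            slot.append(out)
            done.set()
```

A real engine additionally handles continuous (token-level) batching and per-request sequence lengths; this sketch only shows why batching amortizes the cost of a forward pass across concurrent requests.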
Mixtral 8x7B speed with 2x H100, in-product traffic.
# Install
pip install -U leptonai
# Serve a huggingface model
lep photon run -n llama3 -m hf:meta-llama/Meta-Llama-3-8B-Instruct
# Serve a vllm model
lep photon run -n mixtral -m vllm:mistralai/Mixtral-8x7B-v0.1
# Serve with Tuna, Lepton's optimized engine (coming soon!)
lep tuna run -n mixtral -m mistralai/Mistral-7B-Instruct-v0.3
Photon: Lepton’s BYOM solution
Photon is an easy-to-use, open source library to build Pythonic machine learning model services.
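The BYOM ("bring your own model") pattern can be illustrated in plain Python: a service class exposes decorated handler methods that the serving layer discovers and routes requests to. This is a hypothetical sketch of the pattern only; `handler`, `Service`, and `EchoModel` are names invented here, not the real leptonai Photon API.

```python
# Minimal sketch of a Photon-style "bring your own model" service.
# Hypothetical pattern illustration -- not the actual leptonai API.

def handler(fn):
    fn._is_handler = True                 # mark this method as an exposed endpoint
    return fn

class Service:
    def routes(self):
        """Discover all @handler-decorated methods, keyed by method name."""
        return {
            name: getattr(self, name)
            for name in dir(self)
            if getattr(getattr(self, name), "_is_handler", False)
        }

class EchoModel(Service):
    @handler
    def run(self, text: str) -> str:
        # A real model service would load weights at startup and run
        # inference here; we just reverse the input as a stand-in.
        return text[::-1]

svc = EchoModel()
print(svc.routes()["run"]("hello"))       # a call routed through the discovered endpoint
```

The appeal of the pattern is that the model author writes ordinary Python methods, and the platform handles HTTP routing, scaling, and deployment around them.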
10K+
Models/LoRAs supported by a single deployment of the image generation service.
1M+
Images generated by clients on Lepton.
6x
High-resolution image generation speedup via DistriFusion, our multi-GPU inference algorithm.
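The core idea behind patch-parallel inference can be sketched with stdlib Python: split the image into patches, process each patch on its own worker (standing in for a GPU), and stitch the outputs back together. This is a conceptual toy under our own assumptions, not the DistriFusion algorithm itself, which additionally hides inter-patch communication by reusing slightly stale activations from earlier denoising steps.

```python
from concurrent.futures import ThreadPoolExecutor

def process_patch(patch):
    # Stand-in for running a diffusion step on one patch / one GPU.
    return [[pixel * 2 for pixel in row] for row in patch]

def patch_parallel(image, num_patches):
    """Split an image (a list of rows) into horizontal strips, process the
    strips concurrently, then stitch the outputs back together in order."""
    rows_per = max(1, len(image) // num_patches)
    strips = [image[i:i + rows_per] for i in range(0, len(image), rows_per)]
    with ThreadPoolExecutor(max_workers=num_patches) as pool:
        processed = list(pool.map(process_patch, strips))  # map preserves order
    return [row for strip in processed for row in strip]

image = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(patch_parallel(image, num_patches=2))  # [[2, 4], [6, 8], [10, 12], [14, 16]]
```

The speedup comes from the strips being independent within a step, so N workers can each handle 1/N of the pixels at once.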
SDFarm: image gen@scale
Run the standard SD Web UI for development, and seamlessly productize with tens of thousands of models.
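Serving tens of thousands of models on a fixed pool of GPUs implies weights are loaded on demand and evicted when memory runs low. One plausible mechanism (our assumption for illustration, not Lepton's published design) is an LRU cache over loaded models:

```python
from collections import OrderedDict

class ModelCache:
    """Keep at most `capacity` models resident; evict the least recently
    used one when a new model must be loaded. Conceptual sketch only --
    an assumption about the mechanism, not Lepton's actual design."""

    def __init__(self, loader, capacity=4):
        self.loader = loader              # e.g. reads weights from storage
        self.capacity = capacity
        self.cache = OrderedDict()        # model name -> loaded weights

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        weights = self.loader(name)       # cache miss: load from storage
        self.cache[name] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return weights

loads = []
cache = ModelCache(loader=lambda n: (loads.append(n) or f"weights:{n}"), capacity=2)
cache.get("sdxl"); cache.get("lora-anime"); cache.get("sdxl")  # second sdxl is a hit
cache.get("lora-photo")                   # over capacity: evicts lora-anime
print(loads)  # ['sdxl', 'lora-anime', 'lora-photo']
```

With small LoRA adapters layered on top of a shared base model, the resident set can be far larger than it would be for full checkpoints, which is what makes a 10K+ catalog practical.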
Ready for Your Enterprise

High-performance computing hardware and cloud-native software, combined

Serverless Cloud
Lepton API Services
Enterprise Deployment
Lepton AI Cloud Architecture
Deployments
Inference
Jobs
Training
Pods
Development
Fast Runtimes
LLM, SD, etc
Global Overlay Network
Infra Health Management
Lepton Optimized Kubernetes
Bare Metal & VM
High Throughput Storage
Cloud Native Middleware
Multi Cloud & BYOC Hardware Resources