Changelog

Product updates and improvements for Lepton AI.

-

2024-11-21

User-Defined Shapes Support for Customer Managed Node Groups

  • Users can now create custom shapes with specified CPU / memory / disk / GPU for customer managed node groups, enabling more flexibility for diverse workloads.

Job Stop Support

  • Jobs can now be stopped in addition to being started / cloned / deleted, preserving logs and other metadata.

Resizable Input Command Box for Job Creation

  • The input command box in the job creation interface is now resizable, improving visibility and ease of editing for long commands.

Dedicated Endpoint Creation from NVIDIA NIM Images

  • Users can seamlessly create dedicated endpoints from NVIDIA NIM images through the interface.

Graceful Replica Deletion

  • Replica deletion now defaults to the graceful deletion mode: a new replica is created before the old one is deleted, so service capacity is reserved during migration.

Enhanced Workspace User Management

  • Users can add multiple workspace members via a list of email addresses separated by commas or new lines.

Node Group Scaling History and Billing Details

  • Operation History: Users can review a detailed scaling history for customer-managed node groups, including actions like node creation and release.
  • Billing Details: Comprehensive usage and billing insights are now available, with on-demand node usage categorized under "Machines" in the billing section.

Automatic Node Group Selection for Single Node Group Workspaces

  • Workspaces with a single node group now automatically select that node group when creating new workloads, simplifying the process for users.

Filter Workloads by Creator

  • Users can filter workloads by creator in the Job, Dev-Pod, and Dedicated Endpoints lists.

Customizable Timezone for Billing Reports

  • Billing reports can now use either the local timezone or UTC, for better alignment with clients' own metering and billing preferences.
-

2024-11-08

UX Improvements for Node Group and GPU Selection in Workload Creation

  • Users can now select a specific node group when creating workloads, with a list of available GPU types and quantities for that node group.

Multi-Region Latency Test Support for Monitoring

  • Users can now choose the client region for latency testing during monitor creation, enabling them to assess their application's latency from different regions.

Job Failure Diagnostics (Beta Feature)

  • Users can view the cause of job failures when job log collection is enabled. This feature is in public beta and will be enhanced over time.

Model Selection in LLM Chat Web Interface

  • Users can select a model from a list within the LLM chat interface, now available for both serverless and dedicated LLM endpoints with multiple LoRAs loaded.

Enhanced Fuzzy Search in Dedicated Endpoints List

  • Users can now perform fuzzy searches in the dedicated endpoints list, making it easier to find endpoints without needing the exact name.
-

2024-10-24

Dedicated Endpoint Creation UX Improvements

  • Users can now easily create dedicated LLM endpoints using the Lepton LLM Engine with just a few clicks. Additionally, users have the option to search for LLM models from Hugging Face or load models directly from their file systems.

Support Health Check Monitor Clone

  • Users can now create a new monitor by cloning an existing monitor. This feature automatically fills in the values from the chosen monitor rule, making it easier to set up monitors with similar configurations.

Workspace-Specific User Tokens Support

  • Users now receive unique user tokens for each workspace they are part of. This enhancement ensures that user authentication and access permissions are managed separately across different workspaces.

Credit Usage Details in Billing

  • Users can now view the amount of credits used for each invoice in the Billing section.
-

2024-10-11

Make Endpoints (Previously Deployments), Pods, and Jobs Top-Level Tabs in the Dashboard

  • Direct Access to Endpoints (Previously Deployments), Pods, and Jobs: Users can now directly access serverless endpoints, dedicated endpoints, pods, and jobs through the navigation tabs. This enhancement allows for quicker and more intuitive access to these key areas of the application.

Add Utilities Tab in the Dashboard

  • Utilities Tab Categorization: Storage, Domains, Observability, Machines, and Photons have been re-categorized under the Utilities tab. This reorganization aims to streamline navigation and grouping of utilities for better user experience and manageability.

API Documentation for Voice Mode

  • Users can now view API documentation for voice mode in the API tabs, making it easier to handle audio input and output.

Node Group Status Update in the Machines Tab

  • Users can now view the status and resource usage of node group nodes directly in the Machines tab.

Multi-file Drag-and-drop Upload Support

  • Users can now drag-and-drop multiple files directly into the file system via the web interface.

Serverless Endpoints Tab Update

  • Serverless endpoints with guaranteed capacity for enterprise users are now shown in the "Serverless Endpoints" tab.
-

2024-09-26

Voice Mode Improvement

  • Users can now use audio as input in voice mode in the Playground.
  • All LLM model APIs now have voice mode with voice input enabled.

Expanded Model Offerings: Launch of Llama3.2-3B

  • We have introduced the Llama3.2-3B model to the Playground. Users can now explore and experiment with it directly via the Playground interface. Additionally, this model is accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Playground to start using the Llama3.2-3B model.
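
A minimal sketch of calling the model through the OpenAI-compatible API with the official `openai` Python client; the base URL and model identifier below are assumptions, so substitute the values shown in the Playground's API tab:

```python
# Minimal sketch: chat completion against the Llama3.2-3B endpoint via the
# OpenAI-compatible API. Base URL and model name are assumed placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-2-3b.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

response = client.chat.completions.create(
    model="llama3.2-3b",  # assumed model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```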

View Support Tickets

  • Users can now view support tickets, their statuses, and communication records in the Settings-Support module. This enhancement provides a clear understanding of the ticket processing status and history.

Traffic Routing Capabilities for Standard Plan Users

  • Users under the standard plan can now route traffic with their desired pattern under the Domains feature. This enhancement allows users to shadow traffic, route traffic to multiple deployments with specified weights, and bind their own domains.
-

2024-09-12

Voice Mode Improvement

  • Users can now replay the generated audio when using voice mode in the Playground.
  • Users can now copy sample code for voice mode in Python, JavaScript, and cURL (see the sketch after this list).
  • All LLM model APIs now have voice mode enabled, including models such as Qwen72B, Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B. This feature allows for voice interaction across a range of advanced language models, enhancing user experience and accessibility.
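
A hedged Python sketch of requesting spoken output from an LLM endpoint; the `extra_body` field names are hypothetical placeholders, so copy the exact request shape from the sample code in the Playground:

```python
# Hedged sketch: voice-mode request through the OpenAI-compatible API.
# The extra_body keys below are hypothetical placeholders, not confirmed
# parameter names; use the Playground's sample code for the real shape.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-1-8b.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    extra_body={"tts": True, "tts_audio_format": "mp3"},  # hypothetical voice-mode flags
)
print(response.choices[0].message.content)
```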

Deployment Version History

  • Users can now view the history of deployment versions and see what changes were made, when, and by whom. This feature helps users better understand the status and changes of the deployment over time and aids in debugging if needed.

Deployment Authentication Token Generation

  • Users can now directly generate a random token for deployment authentication while creating a new deployment.

Direct API Testing Under Deployment

  • Users can now try out the APIs directly under their deployments with improved API documentation and user interface. This enhancement allows for easier and more efficient API testing and integration within the deployment environment.

Deployment Status Enhancement

  • Users can now view the number of pending replicas in the deployment overview page. This feature provides a clearer and quicker understanding of the current state of the deployment, helping users to easily identify any pending actions or issues.
  • Users can now view the stopped and scaling states in the deployment status indicator. This improvement provides a clearer understanding of the deployment status, helping users easily monitor and manage their deployments.

Enhanced Replica Timeline with Event Visibility

  • Users can now observe events such as restarts and crashes directly in the replica timeline to understand the status and availability of their services.

Job Submission and Running History

  • Users can now view historical jobs by applying the 'Archived' flag in the job list filter.
-

2024-08-29

Self-managed Machines Monitoring

  • Users can now improve their GPUs' efficiency and reliability by adding their own machines to Lepton under the Machines > Self Managed page. Lepton helps monitor the status of these machines using the GPUd tool, which automatically identifies, diagnoses, and repairs GPU-related issues, minimizing downtime and maintaining high efficiency.

Viewing Termination Reasons for Replicas

  • Users can now see the termination reasons for replicas within deployment and job replica lists to understand why a replica was terminated and take corrective actions as needed. By hovering over the termination text, a tooltip will display the reason for termination.

Resource Shape Display in Pod Summary

  • When creating a pod, users can now clearly understand the resource shape associated with the pod they are creating by viewing the resource shape under the pod summary.

Create Dedicated Inference Endpoints from Inference Page

  • Users can now easily create a deployment directly from the inference page with a single click. When viewing detailed model information, the new "Create Dedicated Deployment" button allows users to easily set up a dedicated deployment for the chosen models.

OpenAI-compatible Whisper API Now Available

  • We have introduced the Whisper model to the Built with Lepton page. Users can now explore and experiment with the transcription model directly via the Built with Lepton interface. Additionally, the model is accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Built with Lepton page to start using it.
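
A minimal sketch of transcribing an audio file through the OpenAI-compatible API; the endpoint URL and model name are assumptions, so use the values shown on the Built with Lepton page:

```python
# Minimal sketch: audio transcription via the OpenAI-compatible Whisper API.
# Base URL and model name are assumed placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://whisper.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # assumed model identifier
        file=audio_file,
    )
print(transcript.text)
```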

File System Usage Display

  • Users can now view their file system usage under the Storage tab. This feature provides a clear understanding of how much data has been saved in the file system.
-

2024-08-15

Workspace-level Log Collection Configuration

  • Users can now configure log collection settings at the workspace level under Settings > General > Settings. This feature allows setting a default log collection preference for the entire workspace, eliminating the need for individual configuration for each job or deployment. This enhancement promotes a more streamlined workflow and consistent log settings across all tasks within the workspace.

Deployment Access with Workspace User Token

  • Users can now use their user token to access the API of deployments they created, in addition to the workspace token. Note that by default, user B's user token will not have access to the API of a deployment user A created.
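
A minimal sketch of calling a deployment's API with the creator's user token in place of the workspace token; the endpoint URL and payload shape are illustrative:

```python
# Minimal sketch: authenticating to a deployment's API with a user token.
# The deployment URL and request body are illustrative placeholders.
import requests

DEPLOYMENT_URL = "https://my-deployment.lepton.run/run"  # assumed endpoint URL
USER_TOKEN = "YOUR_USER_TOKEN"  # from the workspace settings page

resp = requests.post(
    DEPLOYMENT_URL,
    headers={"Authorization": f"Bearer {USER_TOKEN}"},
    json={"inputs": "hello"},  # payload depends on the deployment's API
)
print(resp.status_code, resp.text)
```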

Replica Version Matching with Deployments

  • Replica versions on the web UI now match the versions of their deployments. Previously, deployments and replicas followed separate versioning. This change allows users to more easily identify old and new replicas while updating a deployment.

Added Inference Tab to Dashboard

  • Users can now access the Inference tab in the dashboard to choose serverless APIs of state-of-the-art (SOTA) models. This update allows users to select models, input data, and receive inference results. Additionally, users can view available APIs and track their inference usage history, enhancing convenience and user experience.

Deployment Update Confirmation

  • Users will now receive a confirmation if changes to the deployment configuration will result in a rolling restart of replicas. This feature allows users to review, confirm, or cancel the update beforehand, helping to prevent unintended service interruptions and ensuring greater control over deployment processes.

Moved Billing Under Settings

  • The Billing section is now moved under Settings along with other workspace settings.

Edit Initial Delay in Deployment

  • Users can now edit the initial delay of a deployment after it has been created. This feature allows for adjustments to the initial delay to better suit user needs and deployment requirements.
-

2024-08-01

Redesigned Dashboard after Login for Quick Access to Modules

  • Added an explore page for quick access to compute pages: deployments, jobs, and pods. Other modules including Photon, Storage, Network, Observability, Billing, and Settings are grouped under the Others section for easy navigation.

Enhanced Traffic Management: Introducing Traffic Splitting and Shadowing for Optimized Deployment

  • Network configurations now support distributing traffic across multiple deployments based on assigned weights. Users can assign any positive integer weight to each deployment, and traffic will be split in proportion to these weights. Within each deployment, users can set the load balancing policy to Least Request to route traffic to the replica with the fewest active requests. If left empty, the default Round Robin policy will be used. A configuration sketch illustrating weighting and shadowing follows this list.

  • Introduced traffic shadowing in network configurations, enabling users to duplicate incoming traffic to a deployment for testing or monitoring purposes without impacting the primary traffic flow.
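
A hypothetical configuration sketch, in Python, of how weighting and shadowing combine; every field name here is invented for illustration and is not Lepton's actual schema:

```python
# Hypothetical network config illustrating weighted splitting and shadowing.
# All field names are invented for illustration only.
ingress = {
    "routes": [
        {"deployment": "model-v1", "weight": 80},  # gets 80 / (80 + 20) = 80% of traffic
        {"deployment": "model-v2", "weight": 20},  # gets the remaining 20%
    ],
    "shadow": {"deployment": "model-canary"},  # duplicated traffic; responses are discarded
    "load_balancing": "least_request",  # per deployment; Round Robin if left empty
}
```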

Customizable Starting Command for AI Pods

  • Users can now specify custom starting commands for AI Pods at initialization. This enhancement provides greater flexibility and control over the execution environment, enabling users to tailor their AI Pods to better fit specific workflows and use cases. The specified command requires the AI Pod's Docker image to have bash installed.

Real-time Status Indicators for Rolling Upgrades on Deployment Replica List Page

  • Users can now track rolling upgrades on the Deployment Replica List page. Status indicators show the update progress of each running replica, providing real-time visibility into the process. This enhancement lets users efficiently monitor each replica's status and the overall deployment upgrade progress.

Display Termination Reasons in Timeline

  • The timeline now shows the reason for termination when a deployment, job, or pod is terminated. This update provides users with critical context about why a resource was terminated, aiding in troubleshooting and corrective actions.

Configurable Long-term Log Collection

  • Users can now configure long-term log collection to be enabled by default for pods, deployments, and jobs either at the workspace level or individually. If this field is left empty, the default settings from the workspace will be inherited and used. Note that restarting the workload will continue using the configuration set at its creation time. This feature simplifies log management by allowing broader or more granular control over log collection settings.

AI Pod Supports Multiple SSH Keys

  • Users can now add multiple public SSH keys, separated by new lines, during the pod creation process. This improvement enhances security and collaboration by allowing multiple users or devices to access AI Pods easily.

Expanded Model Offerings: Launch of Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B in the Playground

  • We have introduced the Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B models to the Playground. Users can now explore and experiment with these large-scale language models directly via the Playground interface. Additionally, these models are accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Playground to start using these new models.
-

2024-07-18

Redesigned Navigation Tabs to Enhance User Experience and Streamline Access to Various Features

  • Compute Tab: Now includes Photons, Deployments, Jobs, and Pods for better organization of compute-related resources.
  • Storage Tab: Groups File System, Object Storage, KV Store, and Queue under one tab to centralize storage management.
  • Network Tab: Networking Ingress is now categorized under the Network tab for improved access to networking configurations.
  • Observability Tab: Consolidates Logs, Monitoring, and Audit Logs to provide a unified observability interface.
  • Billing Tab: Billing functionalities are now accessible through a dedicated tab for easier financial management.
  • Settings Tab: Groups general information, Members, Tokens, Secrets, and Docker Registry settings for streamlined access to configuration options.

Support Audit Logs for Workspace-level Operations

  • Introduced support for audit logs for workspace-level operations. Users can now access and review detailed audit logs directly from the settings page. This feature enhances transparency and helps users track changes and activities within their workspace.

Support User Level Auth Token

  • A user-level auth token can be used to access the workspace and perform operations on behalf of the user, such as creating deployments, jobs, and pods. User-level auth tokens can be found on the settings page.

Support for Role-Based Access Control (RBAC)

  • Lepton now supports role-based access control, allowing each user to have a role with specific, adjustable permissions.

Support Toggle Line and Timestamps in Logs

  • Added the ability for users to toggle line and timestamps when searching and viewing logs in the Observability tab. This enhancement improves log readability and allows users to customize their log viewing experience for more efficient troubleshooting and analysis.

Support Context Lookup in Logs

  • Introduced context lookup: Users can now expand the context of selected lines in logs, viewing previous and subsequent lines for better clarity.

Support for Launching Jupyter Lab Option during Pod Creation

  • Users can now launch Jupyter Lab in Pods using preset images during the pod creation process.

Support for Specifying User's SSH Key During Pod Creation

  • Users can now specify their SSH key during Pod creation, enabling direct SSH access to the Pod.

Support for Traffic-Based Auto-Scaling Policy

  • Users can now configure auto-scaling policies using Queries Per Minute (QPM) as the metric. This allows for dynamic scaling based on the actual traffic rate, ensuring optimal resource allocation and performance during varying load conditions.
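
A hypothetical sketch of what a QPM-based policy expresses; the field names are invented for illustration, and the real policy is configured in the deployment's auto-scaling settings:

```python
# Hypothetical autoscaling policy illustrating QPM-based scaling.
# Field names are invented for illustration only.
autoscale_policy = {
    "metric": "qpm",      # queries per minute, measured per replica
    "target": 600,        # add replicas when sustained QPM per replica exceeds 600
    "min_replicas": 1,
    "max_replicas": 8,
}
```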

Support for Creating Deployments with Custom Docker Images and Commands

  • Users can now create deployments using a custom Docker image and specify their own commands. This allows for greater flexibility and customization in deployment configurations, catering to specific application requirements.

Support for Ingress Endpoints to Route Traffic to Multiple Deployments

  • Users can now create Ingress endpoints under the Networking tab to route traffic to multiple deployments, allowing specification of traffic distribution for each deployment separately.

Support for Customizing the Auto Top-up Amount in Billing

  • Users now have the ability to set a specific amount for automatic top-ups in their billing settings. This enhancement provides greater control and flexibility over billing preferences, ensuring that accounts are funded according to individual needs and reducing the risk of interruptions due to insufficient funds.
-

2024-06-26

Allow Job to Select Node Groups

  • Added support for node group selection in CLI Job submissions. Users can now specify the desired node group for job execution using the --node-group flag.
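
For example, a submission might look like `lep job create --name my-job --node-group my-node-group ...`; aside from the documented `--node-group` flag, the command and flag values shown here are illustrative.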

Login Support for Enterprise Email Address

  • Users with non-Gmail email addresses can now sign up for Lepton AI using their enterprise email addresses.

Private Docker Registry UX Improvements

  • Enhanced the user experience for creating Private Image Registry Auth. Pre-filled values are now available for Docker Hub, AWS ECR, Azure CR, GitHub CR, and GCR.
-

2024-06-05

Job Fault-tolerance Support

  • Added job fault-tolerance support. Users can now specify the maximum number of retries for each job, both at the worker and job levels, enhancing reliability and streamlining execution.
-

2024-05-22

Log Persistency Support

  • Added support for persisting job logs. Users can now access logs from the job details page even after job completion.
  • Logs will be available for 30 days post-completion for enterprise-tier users.