Changelog

Product updates and improvements for Lepton AI.

-

2024-11-21

User-Defined Shapes Support for Customer Managed Node Groups

  • Users can now create custom shapes with specified CPU / memory / disk / GPU for customer managed node groups, enabling more flexibility for diverse workloads.

Job Stop Support

  • Jobs can now be stopped in addition to being started / cloned / deleted, preserving logs and other metadata.

Resizable Input Command Box for Job Creation

  • The input command box in the job creation interface is now resizable, improving visibility and ease of editing for long commands.

Dedicated Endpoint Creation from NVIDIA NIM Images

  • Users can seamlessly create dedicated endpoints from NVIDIA NIM images through the interface.

Graceful Replica Deletion

  • Replica deletion now defaults to the graceful deletion mode: a new replica is created before the old one is deleted, so service capacity is reserved during migration.

Enhanced Workspace User Management

  • Users can add multiple workspace members via a list of email addresses separated by commas or new lines.

Node Group Scaling History and Billing Details

  • Operation History: Users can review a detailed scaling history for customer-managed node groups, including actions like node creation and release.
  • Billing Details: Comprehensive usage and billing insights are now available, with on-demand node usage categorized under "Machines" in the billing section.

Automatic Node Group Selection for Single Node Group Workspaces

  • Workspaces with a single node group now automatically select that node group when creating new workloads, simplifying the process for users.

Filter Workloads by Creator

  • Users can filter workloads by creator in the Job, Dev-Pod, and Dedicated Endpoints lists.

Customizable Timezone for Billing Reports

  • Billing reports can now use either the local timezone or UTC, for better alignment with clients' own metering and billing preferences.
-

2024-11-08

UX Improvements for Node Group and GPU Selection in Workload Creation

  • Users can now select a specific node group when creating workloads, with a list of available GPU types and quantities for that node group.

Multi-Region Latency Test Support for Monitoring

  • Users can now choose the client region for latency testing during monitor creation, enabling them to assess their application's latency from different regions.

Job Failure Diagnostics (Beta Feature)

  • Users can view the cause of job failures when job log collection is enabled. This feature is in public beta and will be enhanced over time.

Model Selection in LLM Chat Web Interface

  • Users can select a model from a list within the LLM chat interface, now available for both serverless and dedicated LLM endpoints with multiple LoRAs loaded.

Enhanced Fuzzy Search in Dedicated Endpoints List

  • Users can now perform fuzzy searches in the dedicated endpoints list, making it easier to find endpoints without needing the exact name.
-

2024-10-24

Dedicated Endpoint Creation UX Improvements

  • Users can now easily create dedicated LLM endpoints using the Lepton LLM Engine with just a few clicks. Additionally, users have the option to search for LLM models from Hugging Face or load models directly from their file systems.

Support Health Check Monitor Clone

  • Users can now create a new monitor by cloning an existing monitor. This feature automatically fills in the values from the chosen monitor rule, making it easier to set up monitors with similar configurations.

Workspace-Specific User Tokens Support

  • Users now receive unique user tokens for each workspace they are part of. This enhancement ensures that user authentication and access permissions are managed separately across different workspaces.

Credit Usage Details in Billing

  • Users can now view the amount of credits used for each invoice in the Billing section.
-

2024-10-11

Make Endpoints (Previously Deployments), Pods, and Jobs Top-Level Tabs in the Dashboard

  • Direct Access to Endpoints (Previously Deployments), Pods, and Jobs: Users can now directly access serverless endpoints, dedicated endpoints, pods, and jobs through the navigation tabs. This enhancement allows for quicker and more intuitive access to these key areas of the application.

Add Utilities Tab in the Dashboard

  • Utilities Tab Categorization: Storage, Domains, Observability, Machines, and Photons have been re-categorized under the Utilities tab. This reorganization aims to streamline navigation and grouping of utilities for better user experience and manageability.

API Documentation for Voice Mode

  • Users can now view API documentation for voice mode in the API tabs, making it easier to handle audio input and output.

Node Group Status Update in the Machines Tab

  • Users can now view the status and resource usage of node group nodes directly in the Machines tab.

Multi-file Drag-and-drop Upload Support

  • Users can now drag-and-drop multiple files directly into the file system via the web interface.

Serverless Endpoints Tab Update

  • Serverless endpoints with guaranteed capacity for enterprise users are now shown in the "Serverless Endpoints" tab.
-

2024-09-26

Voice Mode Improvement

  • Users can now use audio as input in voice mode in the Playground.
  • All LLM model APIs now have voice mode with voice input enabled.

Expanded Model Offerings: Launch of Llama3.2-3B

  • We have introduced the Llama3.2-3B model to the Playground. Users can now explore and experiment with it directly via the Playground interface. Additionally, this model is accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Playground to start using the Llama3.2-3B model.
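
A minimal sketch of calling the model through the OpenAI-compatible API with the official `openai` Python client; the base URL and model identifier below are assumptions, so substitute the values shown in the Playground's API tab:

```python
# Minimal sketch: chat completion against the Llama3.2-3B endpoint via the
# OpenAI-compatible API. Base URL and model name are assumed placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-2-3b.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

response = client.chat.completions.create(
    model="llama3.2-3b",  # assumed model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```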

View Support Tickets

  • Users can now view support tickets, their statuses, and communication records in the Settings-Support module. This enhancement provides a clear understanding of the ticket processing status and history.

Traffic Routing Capabilities for Standard Plan Users

  • Users under the standard plan can now route traffic with their desired pattern under the Domains feature. This enhancement allows users to shadow traffic, route traffic to multiple deployments with specified weights, and bind their own domains.
-

2024-09-12

Voice Mode Improvement

  • Users can now replay the generated audio when using voice mode in the Playground.
  • Users can now copy sample code for voice mode in Python, JavaScript, and cURL (see the sketch after this list).
  • All LLM model APIs now have voice mode enabled, including models such as Qwen72B, Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B. This feature allows for voice interaction across a range of advanced language models, enhancing user experience and accessibility.
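
A hedged Python sketch of requesting spoken output from an LLM endpoint; the `extra_body` field names are hypothetical placeholders, so copy the exact request shape from the sample code in the Playground:

```python
# Hedged sketch: voice-mode request through the OpenAI-compatible API.
# The extra_body keys below are hypothetical placeholders, not confirmed
# parameter names; use the Playground's sample code for the real shape.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-1-8b.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    extra_body={"tts": True, "tts_audio_format": "mp3"},  # hypothetical voice-mode flags
)
print(response.choices[0].message.content)
```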

Deployment Version History

  • Users can now view the history of deployment versions and see what changes were made, when, and by whom. This feature helps users better understand the status and changes of the deployment over time and aids in debugging if needed.

Deployment Authentication Token Generation

  • Users can now directly generate a random token for deployment authentication while creating a new deployment.

Direct API Testing Under Deployment

  • Users can now try out the APIs directly under their deployments with improved API documentation and user interface. This enhancement allows for easier and more efficient API testing and integration within the deployment environment.

Deployment Status Enhancement

  • Users can now view the number of pending replicas in the deployment overview page. This feature provides a clearer and quicker understanding of the current state of the deployment, helping users to easily identify any pending actions or issues.
  • Users can now view the stopped and scaling states in the deployment status indicator. This improvement provides a clearer understanding of the deployment status, helping users easily monitor and manage their deployments.

Enhanced Replica Timeline with Event Visibility

  • Users can now observe events such as restarts and crashes directly in the replica timeline to understand the status and availability of their services.

Job Submission and Running History

  • Users can now view historical jobs by applying the 'Archived' flag in the job list filter.
-

2024-08-29

Self-managed Machines Monitoring

  • Users can now improve their GPUs' efficiency and reliability by adding their own machines to Lepton under the Machines > Self Managed page. Lepton helps monitor the status of these machines using the GPUd tool, which automatically identifies, diagnoses, and repairs GPU-related issues, minimizing downtime and maintaining high efficiency.

Viewing Termination Reasons for Replicas

  • Users can now see the termination reasons for replicas within deployment and job replica lists to understand why a replica was terminated and take corrective actions as needed. By hovering over the termination text, a tooltip will display the reason for termination.

Resource Shape Display in Pod Summary

  • When creating a pod, users can now clearly understand the resource shape associated with the pod they are creating by viewing the resource shape under the pod summary.

Create Dedicated Inference Endpoints from Inference Page

  • Users can now easily create a deployment directly from the inference page with a single click. When viewing detailed model information, the new "Create Dedicated Deployment" button allows users to easily set up a dedicated deployment for the chosen models.

OpenAI-compatible Whisper API Now Available

  • We have introduced the Whisper model to the Built with Lepton page. Users can now explore and experiment with the transcription model directly via the Built with Lepton interface. Additionally, the model is accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Built with Lepton page to start using it.
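
A minimal sketch of transcribing an audio file through the OpenAI-compatible API; the endpoint URL and model name are assumptions, so use the values shown on the Built with Lepton page:

```python
# Minimal sketch: audio transcription via the OpenAI-compatible Whisper API.
# Base URL and model name are assumed placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://whisper.lepton.run/api/v1/",  # assumed endpoint URL
    api_key="YOUR_LEPTON_API_TOKEN",
)

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # assumed model identifier
        file=audio_file,
    )
print(transcript.text)
```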

File System Usage Display

  • Users can now view their file system usage under the Storage tab. This feature provides a clear understanding of how much data has been saved in the file system.
-

2024-08-15

Workspace-level Log Collection Configuration

  • Users can now configure log collection settings at the workspace level under Settings > General > Settings. This feature allows setting a default log collection preference for the entire workspace, eliminating the need for individual configuration for each job or deployment. This enhancement promotes a more streamlined workflow and consistent log settings across all tasks within the workspace.

Deployment Access with Workspace User Token

  • Users can now use their user token to access the API of deployments they created, in addition to the workspace token. Note that by default, user B's user token will not have access to the API of a deployment user A created.
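
A minimal sketch of calling a deployment's API with the creator's user token in place of the workspace token; the endpoint URL and payload shape are illustrative:

```python
# Minimal sketch: authenticating to a deployment's API with a user token.
# The deployment URL and request body are illustrative placeholders.
import requests

DEPLOYMENT_URL = "https://my-deployment.lepton.run/run"  # assumed endpoint URL
USER_TOKEN = "YOUR_USER_TOKEN"  # from the workspace settings page

resp = requests.post(
    DEPLOYMENT_URL,
    headers={"Authorization": f"Bearer {USER_TOKEN}"},
    json={"inputs": "hello"},  # payload depends on the deployment's API
)
print(resp.status_code, resp.text)
```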

Replica Version Matching with Deployments

  • Replica versions on the web UI now match the versions of their deployments. Previously, deployments and replicas followed separate versioning. This change allows users to more easily identify old and new replicas while updating a deployment.

Added Inference Tab to Dashboard

  • Users can now access the Inference tab in the dashboard to choose serverless APIs of state-of-the-art (SOTA) models. This update allows users to select models, input data, and receive inference results. Additionally, users can view available APIs and track their inference usage history, enhancing convenience and user experience.

Deployment Update Confirmation

  • Users will now receive a confirmation if changes to the deployment configuration will result in a rolling restart of replicas. This feature allows users to review, confirm, or cancel the update beforehand, helping to prevent unintended service interruptions and ensuring greater control over deployment processes.

Moved Billing Under Settings

  • The Billing section is now moved under Settings along with other workspace settings.

Edit Initial Delay in Deployment

  • Users can now edit the initial delay of a deployment after it has been created. This feature allows for adjustments to the initial delay to better suit user needs and deployment requirements.
-

2024-08-01

Redesigned Dashboard after Login for Quick Access to Modules

  • Added an explore page for quick access to compute pages: deployments, jobs, and pods. Other modules including Photon, Storage, Network, Observability, Billing, and Settings are grouped under the Others section for easy navigation.

Enhanced Traffic Management: Introducing Traffic Splitting and Shadowing for Optimized Deployment

  • Network configurations now support distributing traffic across multiple deployments based on assigned weights. Users can assign any positive integer weight to each deployment, and traffic will be split in proportion to these weights. Within each deployment, users can set the load balancing policy to Least Request to route traffic to the replica with the fewest active requests. If left empty, the default Round Robin policy will be used. A configuration sketch illustrating weighting and shadowing follows this list.

  • Introduced traffic shadowing in network configurations, enabling users to duplicate incoming traffic to a deployment for testing or monitoring purposes without impacting the primary traffic flow.
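
A hypothetical configuration sketch, in Python, of how weighting and shadowing combine; every field name here is invented for illustration and is not Lepton's actual schema:

```python
# Hypothetical network config illustrating weighted splitting and shadowing.
# All field names are invented for illustration only.
ingress = {
    "routes": [
        {"deployment": "model-v1", "weight": 80},  # gets 80 / (80 + 20) = 80% of traffic
        {"deployment": "model-v2", "weight": 20},  # gets the remaining 20%
    ],
    "shadow": {"deployment": "model-canary"},  # duplicated traffic; responses are discarded
    "load_balancing": "least_request",  # per deployment; Round Robin if left empty
}
```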

Customizable Starting Command for AI Pods

  • Users can now specify custom starting commands for AI Pods at initialization. This enhancement provides greater flexibility and control over the execution environment, enabling users to tailor their AI Pods to better fit specific workflows and use cases. The specified command requires the AI Pod's Docker image to have bash installed.

Real-time Status Indicators for Rolling Upgrades on Deployment Replica List Page

  • Users can now track rolling upgrades on the Deployment Replica List page. Status indicators show the update progress of each running replica, providing real-time visibility into the process. This enhancement lets users efficiently monitor each replica's status and the overall deployment upgrade progress.

Display Termination Reasons in Timeline

  • The timeline now shows the reason for termination when a deployment, job, or pod is terminated. This update provides users with critical context about why a resource was terminated, aiding in troubleshooting and corrective actions.

Configurable Long-term Log Collection

  • Users can now configure long-term log collection to be enabled by default for pods, deployments, and jobs either at the workspace level or individually. If this field is left empty, the default settings from the workspace will be inherited and used. Note that restarting the workload will continue using the configuration set at its creation time. This feature simplifies log management by allowing broader or more granular control over log collection settings.

AI Pod Supports Multiple SSH Keys

  • Users can now add multiple public SSH keys, separated by new lines, during the pod creation process. This improvement enhances security and collaboration by allowing multiple users or devices to access AI Pods easily.

Expanded Model Offerings: Launch of Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B in the Playground

  • We have introduced the Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B models to the Playground. Users can now explore and experiment with these large-scale language models directly via the Playground interface. Additionally, these models are accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Playground to start using these new models.
-

2024-07-18

Redesigned Navigation Tabs to Enhance User Experience and Streamline Access to Various Features

  • Compute Tab: Now includes Photons, Deployments, Jobs, and Pods for better organization of compute-related resources.
  • Storage Tab: Groups File System, Object Storage, KV Store, and Queue under one tab to centralize storage management.
  • Network Tab: Networking Ingress is now categorized under the Network tab for improved access to networking configurations.
  • Observability Tab: Consolidates Logs, Monitoring, and Audit Logs to provide a unified observability interface.
  • Billing Tab: Billing functionalities are now accessible through a dedicated tab for easier financial management.
  • Settings Tab: Groups general information, Members, Tokens, Secrets, and Docker Registry settings for streamlined access to configuration options.

Support Audit Logs for Workspace-level Operations

  • Introduced support for audit logs for workspace-level operations. Users can now access and review detailed audit logs directly from the settings page. This feature enhances transparency and helps users track changes and activities within their workspace.

Support User Level Auth Token

  • A user-level auth token can be used to access the workspace and perform operations on behalf of the user, such as creating deployments, jobs, and pods. User-level auth tokens can be found on the settings page.

Support for Role-Based Access Control (RBAC)

  • Lepton now supports role-based access control, allowing each user to have a role with specific, adjustable permissions.

Support Toggle Line and Timestamps in Logs

  • Added the ability for users to toggle line and timestamps when searching and viewing logs in the Observability tab. This enhancement improves log readability and allows users to customize their log viewing experience for more efficient troubleshooting and analysis.

Support Context Lookup in Logs

  • Introduced context lookup: Users can now expand the context of selected lines in logs, viewing previous and subsequent lines for better clarity.

Support for Launching Jupyter Lab Option during Pod Creation

  • Users can now launch Jupyter Lab in Pods using preset images during the pod creation process.

Support for Specifying User's SSH Key During Pod Creation

  • Users can now specify their SSH key during Pod creation, enabling direct SSH access to the Pod.

Support for Traffic-Based Auto-Scaling Policy

  • Users can now configure auto-scaling policies using Queries Per Minute (QPM) as the metric. This allows for dynamic scaling based on the actual traffic rate, ensuring optimal resource allocation and performance during varying load conditions.
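
A hypothetical sketch of what a QPM-based policy expresses; the field names are invented for illustration, and the real policy is configured in the deployment's auto-scaling settings:

```python
# Hypothetical autoscaling policy illustrating QPM-based scaling.
# Field names are invented for illustration only.
autoscale_policy = {
    "metric": "qpm",      # queries per minute, measured per replica
    "target": 600,        # add replicas when sustained QPM per replica exceeds 600
    "min_replicas": 1,
    "max_replicas": 8,
}
```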

Support for Creating Deployments with Custom Docker Images and Commands

  • Users can now create deployments using a custom Docker image and specify their own commands. This allows for greater flexibility and customization in deployment configurations, catering to specific application requirements.

Support for Ingress Endpoints to Route Traffic to Multiple Deployments

  • Users can now create Ingress endpoints under the Networking tab to route traffic to multiple deployments, allowing specification of traffic distribution for each deployment separately.

Support for Customizing the Auto Top-up Amount in Billing

  • Users now have the ability to set a specific amount for automatic top-ups in their billing settings. This enhancement provides greater control and flexibility over billing preferences, ensuring that accounts are funded according to individual needs and reducing the risk of interruptions due to insufficient funds.
-

2024-06-26

Allow Job to Select Node Groups

  • Added support for node group selection in CLI Job submissions. Users can now specify the desired node group for job execution using the --node-group flag.
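
For example, a submission might look like `lep job create --name my-job --node-group my-node-group ...`; aside from the documented `--node-group` flag, the command and flag values shown here are illustrative.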

Login Support for Enterprise Email Address

  • Users with non-Gmail email addresses can now sign up for Lepton AI using their enterprise email addresses.

Private Docker Registry UX Improvements

  • Enhanced the user experience for creating Private Image Registry Auth. Pre-filled values are now available for Docker Hub, AWS ECR, Azure CR, GitHub CR, and GCR.
-

2024-06-05

Job Fault-tolerance Support

  • Added job fault-tolerance support. Users can now specify the maximum number of retries for each job, both at the worker and job levels, enhancing reliability and streamlining execution.
-

2024-05-22

Log Persistency Support

  • Added support for persisting job logs. Users can now access logs from the job details page even after job completion.
  • Logs will be available for 30 days post-completion for enterprise-tier users.