Storage

Storage is a core component of Lepton, providing essential services for data management and processing. It includes file system, object storage, key-value store, and queue.

File System

File System is a persistent remote storage service that offers POSIX filesystem semantics. It can be mounted as a POSIX filesystem to Deployments, Pods, and Jobs, hereafter referred to as "workloads". You can also use the web interface or the CLI tool to manage files and folders in the file system.

Types of File System

Lepton provides different types of file systems to cater to different use cases. The following are the types of file systems available in Lepton:

Default File System

The default file system is a shared file system that is accessible across all types of workloads in a workspace. Think of it as similar to network file systems such as NFS, or cloud equivalents like Amazon EFS.

Node group specific file system

For enterprise users who have reserved computation capacity (called "node groups") on Lepton, we provide dedicated file systems that are much more performant than a general purpose NFS. Such node group specific file systems are equipped with transparent local caches, making them a good fit for terabyte or petabyte level distributed training of large models.

Local Storage

Local Storage refers to the storage on local SSDs / HDDs attached to physical servers. This is only available for enterprise users with reserved node groups. This type of storage provides high-performance, low-latency access to data, making it suitable for workloads that require scratch spaces for fast I/O operations. Note that, similar to all local storage patterns, it cannot be shared across nodes, and is not persistent across multiple runs of different workloads, unless one explicitly specifies the node (physical server) to run the workload on.
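
Because local storage is not persistent across runs, a common pattern is to use it as scratch space and copy anything worth keeping to a shared file system before the workload exits. The sketch below illustrates that pattern with the standard library only. The function name and the file it writes are hypothetical, and the two directories stand in for wherever you mounted local storage and a shared file system.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(scratch_root: Path, persistent_dir: Path) -> Path:
    """Write intermediate data to scratch, then copy the result to persistent storage."""
    persistent_dir.mkdir(parents=True, exist_ok=True)
    # A per-run scratch directory avoids collisions with leftovers from other runs.
    with tempfile.TemporaryDirectory(dir=scratch_root) as work:
        result = Path(work) / "result.txt"
        result.write_text("expensive intermediate output\n")
        # Copy what should survive before the scratch directory is cleaned up.
        return Path(shutil.copy2(result, persistent_dir / result.name))
```

For example, with local storage mounted at a hypothetical /mnt/local and the default file system at /mnt/data, you could call run_with_scratch(Path("/mnt/local"), Path("/mnt/data/outputs")).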

Usage and Management

Mounting the File System

When creating a workload, you can specify the file system to be mounted to it. A workload can have multiple file system mounts. You need to specify the following parameters when mounting a file system:

  • File System: The file system to be mounted. You can choose from the list of available file systems. To mount local storage, you need to make sure the node group option is selected when creating the workload. Otherwise, the list of local storage options will not be visible in the file system dropdown.

  • Mount From: The path within the file system to be mounted. For example, to mount the root of the file system, specify /. To mount a specific folder, specify the path to that folder within the file system. For local storage, you cannot specify the mount from path, because the whole local storage block is always mounted.

  • Mount As: The path within the workload where the file system will be mounted. For example, if you want to mount the file system to /mnt/data, you can specify /mnt/data as the mount as path.
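
To make the three parameters concrete, here is a minimal sketch that models a mount as a small data class and checks it before use. MountSpec, its field names, and the file system names in the test are hypothetical and not part of any Lepton API. The sketch also assumes that both paths are absolute and that a local storage mount is represented with a mount from path of /.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MountSpec:
    """Hypothetical container for the three mount parameters described above."""

    file_system: str   # name chosen from the list of available file systems
    mount_from: str    # path inside the file system, e.g. "/" or "/datasets"
    mount_as: str      # path inside the workload, e.g. "/mnt/data"
    is_local_storage: bool = False

    def validate(self) -> None:
        for label, path in (("mount_from", self.mount_from), ("mount_as", self.mount_as)):
            # Assumption: both paths are given as absolute POSIX paths.
            if not PurePosixPath(path).is_absolute():
                raise ValueError(f"{label} must be an absolute path, got {path!r}")
        # Local storage is always mounted as a whole block, so no sub-path applies.
        if self.is_local_storage and self.mount_from != "/":
            raise ValueError("local storage cannot be mounted from a sub-path")
```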

File and Folder Management

1. Using the Dashboard

To manage files and folders within the default or node group specific file storage, you can use the dashboard. It lets you upload, download, delete, and create files and folders directly from your local machine.

You can also upload files from cloud storage services such as AWS S3 and Cloudflare R2. On the File System page, click Upload File and then choose From Cloud to open a dialog where you select the cloud storage service and fill in the required information.

For AWS S3, you need to provide the following information:

  • Bucket Name : The name of the bucket you want to upload from.
  • Access Key ID : The access key of the AWS account.
  • Secret Access Key : The secret access key of the AWS account.
  • Destination Path : The path in the file system you want to upload to.

For Cloudflare R2, you need to provide the following information:

  • Endpoint URL : The S3 API URL of the Cloudflare R2 bucket. It can be found in the bucket's settings page under Bucket Details. Do not include the bucket name in the URL. It should look like https://xxxxxxxxx.r2.cloudflarestorage.com.
  • Bucket Name : The name of the bucket you want to upload from.
  • Access Key ID : The access key of the Cloudflare R2 API Token. You can manage and create R2 tokens by clicking the Manage R2 API Tokens button in the bucket's settings page.
  • Secret Access Key : The secret access key of the Cloudflare R2 API Token.
  • Destination Path : The path in the file system you want to upload to.
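
A frequent mistake is pasting the R2 endpoint together with the bucket name. The following sketch shows one way to check an endpoint string before entering it in the dialog. The function check_r2_endpoint is hypothetical and only encodes the rules stated above (https scheme, a host under r2.cloudflarestorage.com, no bucket name in the path). It is not part of the Lepton SDK.

```python
from urllib.parse import urlparse


def check_r2_endpoint(url: str) -> str:
    """Return the URL unchanged if it looks like a bucket-less R2 S3 API endpoint."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("endpoint URL should start with https://")
    if not (parsed.hostname or "").endswith(".r2.cloudflarestorage.com"):
        raise ValueError("endpoint host should end with .r2.cloudflarestorage.com")
    # A non-empty path usually means the bucket name was pasted along with the URL.
    if parsed.path.strip("/"):
        raise ValueError("do not include the bucket name in the endpoint URL")
    return url
```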

The local storage is not accessible via the dashboard. You can only manage files within the local storage with a workload. For example, you can create a pod with local storage mounted and then use the pod to manage files within the local storage.

2. Using CLI tools

You can also manage files and folders within the file storage using the CLI tools. You can upload, download, delete, and create files and folders within the file storage using the CLI tools. Here are a few examples of how you can manage files and folders using the CLI tools:

# Upload a file to the file storage
$ lep storage upload /local/path/to/a.txt /remote/path/to/a.txt
# Upload a folder to the file storage (rsync is only available on the standard and enterprise plans)
$ lep storage upload -r -p --rsync /local/path/to/folder /remote/path/to/folder
# Download a file from the file storage
$ lep storage download /remote/path/to/a.txt /local/path/to/a.txt
# Remove a file from the file storage
$ lep storage rm /remote/path/to/a.txt

For more information on how to manage files and folders using the CLI tools, refer to the CLI documentation.
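
If you drive the CLI from a script, building the argument list in one place helps keep the flags consistent. The sketch below is a hypothetical helper that only assembles the upload commands shown above. It uses no flags beyond those in the examples, and pairing -r with -p for folder uploads simply mirrors the folder example rather than any documented requirement.

```python
from typing import List


def upload_command(local: str, remote: str, *, folder: bool = False, rsync: bool = False) -> List[str]:
    """Build the argument list for `lep storage upload` using the flags shown above."""
    argv = ["lep", "storage", "upload"]
    if folder:
        # Mirrors the folder upload example, which passes -r and -p together.
        argv += ["-r", "-p"]
    if rsync:
        argv.append("--rsync")
    return argv + [local, remote]
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where the lep CLI is installed and logged in.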

Object Storage

Object storage provides a simple way to transmit small files between deployments, and between the deployment and the client.

Types of Object Storage

1. Public Object Storage

Public Object Storage is an object storage service that is accessible to all requests from the internet. It is suitable for storing public data such as images, videos, and other files that are meant to be accessible to the public.

2. Private Object Storage

Private Object Storage is an object storage service that is accessible only with an authorized request. It is suitable for storing private data that should not be accessible to the public. It can also provide a temporary public URL that expires after 900 seconds.

Manage files

Upload files

1. WebUI

  1. Go to the Storage tab, and click on Object Storage.
  2. Click on the type of object storage you want to upload to.
  3. Click on Upload File on the top right corner.

2. Python SDK

from leptonai.objectstore import ObjectStore
# Upload file to private object storage
bucket = ObjectStore('private')
bucket.put('file_name_in_bucket', 'path/to/file')

Get files

1. WebUI

  1. Go to the Storage tab, and click on Object Storage.
  2. Click on the type of object storage you want to get files from.
  3. Click the dot icon on the right side of the file name to see the download option.

2. Python SDK

from leptonai.objectstore import ObjectStore
import shutil
# Specify the type of object storage
bucket = ObjectStore('private')
# Get the object as a temporary file and copy it to a local path
x = bucket.get('path/to/file')
shutil.copy(x.name, 'path/to/save/file')
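
The get-then-copy steps above can be wrapped in a small helper. This is a sketch under one assumption taken from the example: bucket.get returns a file object whose .name is a path on disk. save_object and FakeBucket are hypothetical names. FakeBucket is an in-memory stand-in so the sketch runs without a workspace, and in real use you would pass an ObjectStore instance instead.

```python
import shutil
import tempfile
from pathlib import Path


def save_object(bucket, key: str, destination: str) -> Path:
    """Fetch `key` from `bucket` and copy the temporary file to `destination`."""
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = bucket.get(key)  # assumed to return a file object with a .name path
    shutil.copy(tmp.name, dest)
    return dest


class FakeBucket:
    """In-memory stand-in with put/get, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, path):
        self._objects[key] = Path(path).read_bytes()

    def get(self, key):
        tmp = tempfile.NamedTemporaryFile(delete=False)
        tmp.write(self._objects[key])
        tmp.flush()
        return tmp
```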

Delete files

1. WebUI

  1. Go to the Storage tab, and click on Object Storage.
  2. Click on the type of object storage you want to delete files from.
  3. Click the dot icon on the right side of the file name to see the delete option.

2. Python SDK

from leptonai.objectstore import ObjectStore
# Specify the type of object storage
bucket = ObjectStore('private')
# Delete file
bucket.delete('path/to/file')

Key-Value Store

This section explains how to use the key-value store module when building AI applications. The module can be used to perform tasks such as:

  • Caching Results: Speed up access to repeated computations or predictions by storing and retrieving from a fast cache.
  • Session Storage: Maintain user session data and preferences in conversational AI and recommendation systems for personalized interactions.
  • Configurations and Settings: Manage dynamic AI model configurations and settings for real-time adaptability and personalization.

Similar to a standard Redis database, Lepton's key-value store is organized in two levels. Inside each workspace, one may have:

  • multiple namespaces to store key-value pairs. There is a soft limit for the number of namespaces allowed per workspace, according to the tier of the workspace.
  • key-value pairs in each namespace. Note that key and value sizes are limited to 32 KB and 256 KB respectively.
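
Checking sizes on the client before writing can turn a rejected request into a clear local error. The sketch below assumes the limits are counted in bytes of the UTF-8 encoded strings and that 1 KB means 1024 bytes. Neither detail is confirmed here, and check_kv_sizes is a hypothetical helper rather than part of the SDK.

```python
MAX_KEY_BYTES = 32 * 1024      # assumption: the 32 KB key limit means 32 * 1024 bytes
MAX_VALUE_BYTES = 256 * 1024   # assumption: the 256 KB value limit means 256 * 1024 bytes


def check_kv_sizes(key: str, value: str) -> None:
    """Raise ValueError if the UTF-8 encoded key or value exceeds the limits above."""
    key_size = len(key.encode("utf-8"))
    value_size = len(value.encode("utf-8"))
    if key_size > MAX_KEY_BYTES:
        raise ValueError(f"key is {key_size} bytes, limit is {MAX_KEY_BYTES}")
    if value_size > MAX_VALUE_BYTES:
        raise ValueError(f"value is {value_size} bytes, limit is {MAX_VALUE_BYTES}")
```

Measuring the encoded length matters because non-ASCII characters take more than one byte each.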

The sections below show how to manage namespaces and the key-value pairs stored in them.

If you are using the Python SDK, make sure you are logged in via the lep login command with the workspace token. You can find the token on the Settings page.

List Namespaces

View all available namespaces within your workspace.

from leptonai.kv import KV
# List all namespaces
x = KV.list_kv() # This is not implemented yet

Create Namespace

Create a new namespace to store key-value pairs in.

from leptonai.kv import KV
# create a namespace
x = KV.create_kv("somenamespace")

Delete Namespace

Delete a namespace and its associated data.

from leptonai.kv import KV
# delete a namespace
x = KV.delete_kv("somenamespace")

Create Key-Value Pair under a Namespace

Add new key-value pairs within a specific namespace.

from leptonai.kv import KV
# retrieve a namespace
x = KV.get_kv("somenamespace")
# insert a key-value pair
x.put('somekey', 'somevalue')

Get Value of a Key under a Namespace

Retrieve the value associated with a specific key in a namespace.

from leptonai.kv import KV
# retrieve a namespace
x = KV.get_kv("somenamespace")
# retrieve the value for a given key
x.get('somekey')
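
Combining get and put gives the result caching pattern mentioned at the start of this section. The sketch below works with any object that exposes get and put the way the namespace object does above. The behaviour of get for a missing key is not specified here, so the helper treats both an exception and a None result as a miss. That is an assumption, as is get returning the same string that was stored. cached and FakeNamespace are hypothetical names, and FakeNamespace is a dict-backed stand-in so the sketch runs without a workspace.

```python
from typing import Callable


def cached(namespace, key: str, compute: Callable[[], str]) -> str:
    """Return the stored value for `key`, computing and storing it on a miss."""
    try:
        value = namespace.get(key)
    except Exception:  # assumption: a missing key may raise; narrow this in real code
        value = None
    if value is None:
        value = compute()
        namespace.put(key, value)
    return value


class FakeNamespace:
    """Dict-backed stand-in exposing get/put/delete, for illustration only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]  # raises KeyError on a miss

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        del self._data[key]
```

With a real namespace you would pass the object returned by KV.get_kv("somenamespace") as the first argument.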

Delete Key-Value Pair under a Namespace

Remove an individual key-value pair from a namespace.

from leptonai.kv import KV
# retrieve a namespace
x = KV.get_kv("somenamespace")
# remove the key-value pair
x.delete('somekey')

Queue

Queue provides a way to manage a stream of data within your application. Producers enqueue messages as strings, and consumers read them later at a pace that suits their workflow, which decouples production from consumption.

By default, the Queue object acts as a single FIFO queue which supports send, receive, and length operations.

If you are using the Python SDK to access a queue, make sure you are logged in via the lep login command with the workspace token. You can find the token on the Settings page.

Queue Management

Create Queue

You can create a new queue through the WebUI or the Python SDK.

1. WebUI

  1. Go to the Storage tab, and click on Queue.
  2. Click on New Queue.
  3. Enter the queue name and click on Create. The queue creation is an asynchronous operation, and it may take a few seconds to complete.

2. Python SDK

from leptonai.queue import Queue
# Create queue
my_queue = Queue("somequeue", create_if_not_exists=True)

List Queues

1. WebUI

  1. Go to the Storage tab, and click on Queue.
  2. You can see the list of queues available in the workspace.

2. Python SDK

from leptonai.queue import Queue
# List queues
my_queue = Queue.list_queue()
# this will return a queue name

Delete Queue

1. WebUI

  1. Go to the Storage tab, and click on Queue.
  2. Click the dot icon on the right side of the queue name to see the delete option.

2. Python SDK

from leptonai.queue import Queue
# Delete queue
my_queue = Queue.delete_queue("somequeue")

Queue Operations

Queue operations can only be performed using the Python SDK. If you need to perform queue operations through another interface, please contact us.

Send message to a queue

Add new messages to a queue for processing or storage.

from leptonai.queue import Queue
# Fetch the queue by name
my_queue = Queue("somequeue", create_if_not_exists=True)
# Send message to the queue
my_queue.send('this is a message')

Receive message from a queue

Retrieve messages waiting in the queue.

from leptonai.queue import Queue
# Fetch the queue by name
my_queue = Queue("somequeue", create_if_not_exists=True)
# Receive a message from the queue
my_queue.receive()
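
Since messages are strings, structured data has to be serialized before it is sent. The sketch below uses JSON for that and works with any object exposing send and receive as shown above. It assumes receive returns the message string that was sent, which is not confirmed here. send_json, receive_json, and FakeQueue are hypothetical names, and FakeQueue is an in-memory FIFO stand-in so the sketch runs without a workspace.

```python
import json
from collections import deque


def send_json(queue, payload: dict) -> None:
    """Serialize `payload` to a JSON string and enqueue it."""
    queue.send(json.dumps(payload))


def receive_json(queue) -> dict:
    """Dequeue one message and parse it back into a dict."""
    return json.loads(queue.receive())


class FakeQueue:
    """In-memory FIFO stand-in with send/receive, for illustration only."""

    def __init__(self):
        self._items = deque()

    def send(self, message: str) -> None:
        self._items.append(message)

    def receive(self) -> str:
        return self._items.popleft()
```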

Get Queue Length

Retrieve the number of items currently waiting in the queue.

from leptonai.queue import Queue
# Get length of a queue
Queue.length('somequeue')
Lepton AI

© 2024