lep job

Manages Lepton Jobs.

Lepton Jobs are one-off tasks that run on one or more machines. For example, you can launch a shell script that processes data as a job, or run a distributed ML training job across multiple connected machines. See the documentation for more details.

Usage

lep job [OPTIONS] COMMAND [ARGS]...

Options

  • --help : Show this message and exit.

Commands

  • create : Creates a job.
  • events : Gets the events of the job with the given name.
  • get : Gets the job with the given name.
  • list : Lists all jobs in the current workspace.
  • log : Gets the log of a job.
  • remove : Removes the job with the given name.
  • replicas : Gets the replicas of the job with the given name.

lep job create

Creates a job.

For advanced uses, check https://kubernetes.io/docs/concepts/workloads/controllers/job/.

Usage

lep job create [OPTIONS]

Options

  • -n, --name TEXT : Job name [required]
  • -f, --file TEXT : If specified, load the job spec from the file. Any argument passed explicitly on the command line updates the corresponding value in the spec loaded from the file.
  • --container-image TEXT : Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
  • --container-port TEXT : Ports to expose for the job, in the format portnumber[:protocol].
  • --port TEXT : Deprecated flag, use --container-port instead.
  • --command TEXT : Command string to run for the job.
  • --resource-shape TEXT : Resource shape for the pod. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
  • -ng, --node-group TEXT : Node group for the job. If not set, on-demand resources are used. The flag can be repeated to name multiple node groups, but running on multiple node groups is not yet supported (it is planned for enterprise users); for now, only the first node group given is used.
  • -w, --num-workers INTEGER : Number of workers to use for the job. For example, for a distributed training job with 4 replicas, use --num-workers 4.
  • --max-failure-retry INTEGER : Maximum number of failures to retry per worker.
  • --max-job-failure-retry INTEGER : Maximum number of failures to retry for the job as a whole.
  • -e, --env TEXT : Environment variables to pass to the job, in the format NAME=VALUE.
  • -s, --secret TEXT : Secrets to pass to the job, in the format NAME=SECRET_NAME. If the secret name is the same as the environment variable name, you can omit NAME and pass just SECRET_NAME.
  • --mount TEXT : Persistent storage to be mounted to the job, in the format STORAGE_PATH:MOUNT_PATH.
  • --image-pull-secrets TEXT : Secrets to use for pulling images.
  • --intra-job-communication BOOLEAN : Enable intra-job communication. If --num-workers is set, this is automatically enabled.
  • --ttl-seconds-after-finished INTEGER : (Advanced feature) Limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, it defaults to 72 hours. Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
  • -lg, --log-collection BOOLEAN : Enable or disable log collection (true/false). If not provided, the workspace setting will be used.
  • --help : Show this message and exit.
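
As a rough illustration of how the options above combine, the sketch below assembles a create command for a hypothetical 4-worker training job. The job name, container image, environment variable, secret name, storage paths, and training command are all made up for this example; only the flags themselves come from the option list. The function prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical values for illustration only; substitute your own.
JOB_NAME="train-demo"
IMAGE="my-registry/train:latest"

# Prints the create command instead of running it.
# Drop the leading `echo` to actually submit the job.
print_create_cmd() {
  echo lep job create \
    --name "$JOB_NAME" \
    --container-image "$IMAGE" \
    --resource-shape gpu.a10 \
    --num-workers 4 \
    --env "EPOCHS=3" \
    --secret "HF_TOKEN" \
    --mount "/datasets:/mnt/datasets" \
    --command "python train.py"
}

print_create_cmd
```

Here --secret is given in its short form (SECRET_NAME only), which according to the option description applies when the secret and the environment variable share a name. Because --num-workers is set, --intra-job-communication does not need to be passed.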

lep job list

Lists all jobs in the current workspace.

Usage

lep job list [OPTIONS]

Options

  • --help : Show this message and exit.

lep job get

Gets the job with the given name.

Usage

lep job get [OPTIONS]

Options

  • -n, --name TEXT : Job name [required]
  • --help : Show this message and exit.

lep job remove

Removes the job with the given name.

Usage

lep job remove [OPTIONS]

Options

  • -n, --name TEXT : Job name
  • --help : Show this message and exit.
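
The list, get, and remove commands are often used together when cleaning up a finished job. A minimal sketch of that sequence follows; the helper name job_cleanup_plan and the job name data-prep are hypothetical, and the helper only prints the commands so nothing is removed by accident.

```shell
#!/bin/sh
# Prints an inspect-then-remove sequence for one job, one command per line.
# The commands are printed, not run; execute them yourself once they look right.
job_cleanup_plan() {
  name="${1:-}"
  if [ -z "$name" ]; then
    echo "usage: job_cleanup_plan JOB_NAME" >&2
    return 1
  fi
  echo "lep job list"
  echo "lep job get --name $name"
  echo "lep job remove --name $name"
}

job_cleanup_plan "data-prep"   # "data-prep" is a hypothetical job name
```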

lep job log

Gets the log of a job. If no replica is specified, the log of the first replica is shown; otherwise, the log of the specified replica is shown. To get the list of replicas, use lep job replicas.

Usage

lep job log [OPTIONS]

Options

  • -n, --name TEXT : Name of the job whose log to get. [required]
  • -r, --replica TEXT : Name of the replica whose log to get.
  • --help : Show this message and exit.
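
The replica selection described above can be sketched as a small helper that adds --replica only when a replica name is supplied. The helper name job_log_cmd, the job name, and the replica name are assumptions for illustration; real replica names come from lep job replicas. As a precaution, the commands are printed rather than executed.

```shell
#!/bin/sh
# Prints the command for fetching a job's log.
#   $1: job name (required)
#   $2: replica name (optional); without it, the first replica is selected
job_log_cmd() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_log_cmd JOB_NAME [REPLICA_NAME]" >&2
    return 1
  fi
  if [ -n "${2:-}" ]; then
    echo "lep job log --name $1 --replica $2"
  else
    echo "lep job log --name $1"
  fi
}

# Hypothetical names; look up the real replica names first.
echo "lep job replicas --name train-demo"
job_log_cmd "train-demo"
job_log_cmd "train-demo" "train-demo-worker-1"
```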

lep job replicas

Usage

lep job replicas [OPTIONS]

Options

  • -n, --name TEXT : The job name to get replicas. [required]
  • --help : Show this message and exit.

lep job events

Usage

lep job events [OPTIONS]

Options

  • -n, --name TEXT : The job name to get events. [required]
  • --help : Show this message and exit.