lep job
Manages Lepton Jobs.
Lepton Jobs are for one-off tasks that run on one or more machines. For example, you can launch a shell script that performs data processing as a job, or run a distributed ML training job across multiple connected machines. See the documentation for more details.
Usage
lep job [OPTIONS] COMMAND [ARGS]...
Options
--help
: Show this message and exit.
Commands
create
: Creates a job.
events
: Gets the events of a job.
get
: Gets the job with the given name.
list
: Lists all jobs in the current workspace.
log
: Gets the log of a job.
remove
: Removes the job with the given name.
replicas
: Lists the replicas of a job.
lep job create
Creates a job.
For advanced usage, see https://kubernetes.io/docs/concepts/workloads/controllers/job/.
Usage
lep job create [OPTIONS]
Options
-n, --name TEXT
: Job name. [required]
-f, --file TEXT
: If specified, load the job spec from the file. Any explicitly passed argument updates the spec loaded from the file.
--container-image TEXT
: Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
--container-port TEXT
: Ports to expose for the job, in the format portnumber[:protocol].
--port TEXT
: Deprecated flag; use --container-port instead.
--command TEXT
: Command string to run for the job.
--resource-shape TEXT
: Resource shape for the pod. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
-ng, --node-group TEXT
: Node group for the job. If not set, on-demand resources are used. The flag can be repeated to specify multiple node groups; however, multiple node groups are not yet supported (support for enterprise users is planned), so only the first node group is currently used.
-w, --num-workers INTEGER
: Number of workers to use for the job. For example, for a distributed training job with 4 replicas, use --num-workers 4.
--max-failure-retry INTEGER
: Maximum number of failure retries per worker.
--max-job-failure-retry INTEGER
: Maximum number of failure retries for the whole job.
-e, --env TEXT
: Environment variables to pass to the job, in the format NAME=VALUE.
-s, --secret TEXT
: Secrets to pass to the job, in the format NAME=SECRET_NAME. If the secret name is the same as the environment variable name, you can omit NAME= and simply pass SECRET_NAME.
--mount TEXT
: Persistent storage to be mounted into the job, in the format STORAGE_PATH:MOUNT_PATH.
--image-pull-secrets TEXT
: Secrets to use for pulling images.
--intra-job-communication BOOLEAN
: Enable intra-job communication. If --num-workers is set, this is enabled automatically.
--ttl-seconds-after-finished INTEGER
: (Advanced) Limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, it defaults to 72 hours (259200 seconds). Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
-lg, --log-collection BOOLEAN
: Enable or disable log collection (true/false). If not provided, the workspace setting is used.
--help
: Show this message and exit.
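The value formats described above (NAME=VALUE, STORAGE_PATH:MOUNT_PATH, portnumber[:protocol]) can be checked before calling the CLI. The following is a minimal Python sketch, not part of the leptonai package: the helper name `build_create_args` and the job name `data-prep` are hypothetical, and it assumes that --container-port, --env, and --mount can each be repeated like --node-group (the help text above states this explicitly only for --node-group). It builds an argument list for `lep job create` without running anything.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# portnumber[:protocol], e.g. "8080" or "8080:tcp" (format from the help text).
_PORT_RE = re.compile(r"\d+(:[A-Za-z]+)?")


def build_create_args(
    name: str,
    image: Optional[str] = None,
    command: Optional[str] = None,
    resource_shape: Optional[str] = None,
    num_workers: Optional[int] = None,
    ports: Sequence[str] = (),
    envs: Optional[Dict[str, str]] = None,
    mounts: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Hypothetical helper: build the argv for `lep job create`.

    Assumes --container-port, --env, and --mount may each be repeated.
    """
    args = ["lep", "job", "create", "--name", name]
    if image is not None:
        args += ["--container-image", image]
    if command is not None:
        args += ["--command", command]
    if resource_shape is not None:
        args += ["--resource-shape", resource_shape]
    if num_workers is not None:
        if num_workers < 1:
            raise ValueError("num_workers must be a positive integer")
        args += ["--num-workers", str(num_workers)]
    for port in ports:
        if not _PORT_RE.fullmatch(port):
            raise ValueError(f"{port!r} is not in portnumber[:protocol] format")
        args += ["--container-port", port]
    for env_name, value in (envs or {}).items():
        # NAME=VALUE: the name must be non-empty and must not contain '='.
        if not env_name or "=" in env_name:
            raise ValueError(f"invalid environment variable name {env_name!r}")
        args += ["--env", f"{env_name}={value}"]
    for storage_path, mount_path in mounts:
        # STORAGE_PATH:MOUNT_PATH
        args += ["--mount", f"{storage_path}:{mount_path}"]
    return args


# Example invocation; "data-prep" and STAGE=dev are made-up values.
cmd = build_create_args(
    "data-prep",
    image="ubuntu:latest",
    command="env | grep LEPTON",
    resource_shape="cpu.small",
    envs={"STAGE": "dev"},
    mounts=[("/", "/test")],
)
print(cmd)
# To launch it for real: subprocess.run(cmd, check=True)
```

Passing the argument list to `subprocess.run` (rather than a single shell string) keeps values such as `env | grep LEPTON` intact as one argument to --command.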
Job Specs File
As an alternative to passing the job spec as command-line arguments, you can pass a job spec file in JSON format with the -f flag.
Here is an example of a job spec file:
{
  "resource_shape": "cpu.small",
  "container": {
    "image": "ubuntu:latest",
    "command": [
      "/bin/bash",
      "-c",
      "env | grep LEPTON"
    ]
  },
  "completions": 2,
  "parallelism": 2,
  "envs": [],
  "mounts": [
    {
      "path": "/",
      "mount_path": "/test"
    }
  ],
  "ttl_seconds_after_finished": 259200,
  "intra_job_communication": true
}
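A spec file like the one above can also be generated programmatically. The Python sketch below uses only the standard library, and the helper name `make_job_spec` is hypothetical. It reproduces the field layout of the example and makes two labeled assumptions: that N workers correspond to completions = parallelism = N (the example pairs 2 and 2 with intra-job communication enabled), and that a STORAGE_PATH:MOUNT_PATH pair maps to the `path` and `mount_path` fields. The documented default TTL of 72 hours equals the 259200 seconds shown in the example.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional, Sequence, Tuple

# Documented default TTL: 72 hours expressed in seconds (= 259200).
TTL_72_HOURS = 72 * 60 * 60


def make_job_spec(
    image: str,
    shell_command: str,
    resource_shape: str = "cpu.small",
    num_workers: Optional[int] = None,
    mounts: Sequence[Tuple[str, str]] = (),
    ttl_seconds: int = TTL_72_HOURS,
) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the example spec file's layout."""
    # Assumption: N workers map to N completions that all run in parallel.
    workers = 1 if num_workers is None else num_workers
    return {
        "resource_shape": resource_shape,
        "container": {
            "image": image,
            # Same form as the example: run the command string via bash -c.
            "command": ["/bin/bash", "-c", shell_command],
        },
        "completions": workers,
        "parallelism": workers,
        "envs": [],
        # Assumption: each (STORAGE_PATH, MOUNT_PATH) pair becomes path/mount_path.
        "mounts": [{"path": p, "mount_path": m} for p, m in mounts],
        "ttl_seconds_after_finished": ttl_seconds,
        # Like the CLI: setting the worker count turns on intra-job communication.
        "intra_job_communication": num_workers is not None,
    }


spec = make_job_spec(
    image="ubuntu:latest",
    shell_command="env | grep LEPTON",
    num_workers=2,
    mounts=[("/", "/test")],
)

# Write the spec to a temporary directory so it can be passed to -f.
path = os.path.join(tempfile.mkdtemp(), "job_spec.json")
with open(path, "w") as f:
    json.dump(spec, f, indent=2)
print(path)
```

You could then run, for example, lep job create -n my-job -f job_spec.json (the name my-job is hypothetical). Per the -f description, any flag passed alongside -f, such as --resource-shape, updates the spec loaded from the file.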
lep job list
Lists all jobs in the current workspace.
Usage
lep job list [OPTIONS]
Options
--help
: Show this message and exit.
lep job get
Gets the job with the given name.
Usage
lep job get [OPTIONS]
Options
-n, --name TEXT
: Job name. [required]
--help
: Show this message and exit.
lep job remove
Removes the job with the given name.
Usage
lep job remove [OPTIONS]
Options
-n, --name TEXT
: Job name.
--help
: Show this message and exit.
lep job log
Gets the log of a job. If --replica is not specified, the first replica is selected; otherwise, the log of the specified replica is shown. To get the list of replicas, use lep job replicas.
Usage
lep job log [OPTIONS]
Options
-n, --name TEXT
: The name of the job to get the log for. [required]
-r, --replica TEXT
: The name of the replica to get the log for.
--help
: Show this message and exit.
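The replica-selection rule above can be scripted. This Python sketch builds `lep job log` argument lists with a hypothetical helper, `log_commands`, which is not part of leptonai: with no replicas it returns the single default invocation (which shows the first replica), and otherwise one invocation per replica. Replica names are passed in by hand because the output format of `lep job replicas` is not described here; `my-job`, `replica-a`, and `replica-b` are made-up names.

```python
from typing import List, Optional, Sequence


def log_commands(job_name: str, replicas: Optional[Sequence[str]] = None) -> List[List[str]]:
    """Hypothetical helper: argv lists for `lep job log`."""
    base = ["lep", "job", "log", "--name", job_name]
    if not replicas:
        # Without --replica, the CLI shows the log of the first replica.
        return [base]
    # One invocation per explicitly named replica.
    return [base + ["--replica", r] for r in replicas]


cmds = log_commands("my-job", ["replica-a", "replica-b"])
for c in cmds:
    print(" ".join(c))
# To fetch the logs for real: subprocess.run(c, check=True) for each c.
```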
lep job replicas
Lists the replicas of a job.
Usage
lep job replicas [OPTIONS]
Options
-n, --name TEXT
: The name of the job to get replicas for. [required]
--help
: Show this message and exit.
lep job events
Gets the events of a job.
Usage
lep job events [OPTIONS]
Options
-n, --name TEXT
: The name of the job to get events for. [required]
--help
: Show this message and exit.