NCCL Performance Benchmark Job

This example shows, step by step, how to run an NCCL performance benchmark job on Lepton.

Step 1: Create a New Job

First, create a job on the platform. Head over to the Create Job page.

The page offers many configuration options, such as name, resource, and image. A more detailed guide is available in the documentation for creating a job.

In this example, we will use the following configurations:

Name: nccl-benchmark or any name you want
Resource: Choose 8xH100 so that each worker takes over a whole node, and set the number of workers to 2 so that the benchmark runs across two nodes. You will need to select a node group that matches this resource shape; using your own dedicated node group is recommended.
Image: Choose custom image and enter nvcr.io/nvidia/pytorch:24.11-py3 as the image name. This is the 24.11 release of NVIDIA's PyTorch container image.
Run command: The command clones the nccl-tests code from a remote GitHub repository, builds it, and runs the NCCL performance benchmark. Fill in the command as follows:

set -euox pipefail
trap -- 's=$?; echo >&2 "$0: Error on line "$LINENO": $BASH_COMMAND"; exit $s' ERR

export DEBIAN_FRONTEND=noninteractive
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
apt-get -y update
apt-get install -y libibverbs-dev infiniband-diags openmpi-bin openmpi-doc libopenmpi-dev net-tools openssh-server openssh-client
# custom env setup
git clone https://github.com/NVIDIA/nccl-tests.git /tmp/nccl-tests
cd /tmp/nccl-tests
NV_COMPUTE=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader,nounits|head -n 1 | tr -d ".")
make -j MPI=1 MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi/ NVCC_GENCODE="-gencode=arch=compute_${NV_COMPUTE},code=sm_${NV_COMPUTE}"

# SSH setup (replace with your own credentials)
mkdir -p /root/.ssh
echo 'YOUR_SSH_PUBLIC_KEY' >> /root/.ssh/authorized_keys
cat <<EOT > /root/.ssh/id_ed25519
YOUR_SSH_PRIVATE_KEY
EOT
chmod 700 /root/.ssh
chmod 600 /root/.ssh/*
if grep -q "^PermitRootLogin" /etc/ssh/sshd_config; then
    sed -i 's/^PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
else
    echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
fi
if grep -q "^PubkeyAuthentication" /etc/ssh/sshd_config; then
    sed -i 's/^PubkeyAuthentication .*/PubkeyAuthentication yes/' /etc/ssh/sshd_config
else
    echo "PubkeyAuthentication yes" >> /etc/ssh/sshd_config
fi
sed -i 's/^#Port .*/Port 2222/' /etc/ssh/sshd_config
service ssh restart


COMPLETE_FILE="/tmp/lepton-mpi-complete"
if [ "$LEPTON_JOB_WORKER_INDEX" -eq 0 ]; then
    HOSTFILE="/tmp/hostfile.txt"
    rm -f "$HOSTFILE"
    for i in $(seq 0 $((LEPTON_JOB_TOTAL_WORKERS - 1))); do
        IP_ADDRESS=""
        while [ -z "$IP_ADDRESS" ]; do
            IP_ADDRESS=$({ getent hosts -- "${LEPTON_JOB_WORKER_HOSTNAME_PREFIX}-$i-lan.${LEPTON_SUBDOMAIN}" || echo ""; } | cut -d' ' -f1)
            if [ -z "$IP_ADDRESS" ]; then
                sleep 5
            fi
        done

        WAIT_RETRY=60
        while ! ssh -o StrictHostKeyChecking=no -p 2222 "$IP_ADDRESS" -- echo ok 2>&1; do
            echo "waiting for server ping ..."
            WAIT_RETRY=$((WAIT_RETRY-1))
            if [ $WAIT_RETRY -eq 0 ]; then
                echo "timed out waiting for host $IP_ADDRESS to be ready"
                exit 1
            fi
            sleep 5
            echo "retry ssh to $IP_ADDRESS"
        done
        echo "$IP_ADDRESS" >> "$HOSTFILE"
    done

    mpirun -np "$LEPTON_JOB_TOTAL_WORKERS" \
        -x LOGLEVEL=INFO \
        -x NCCL_DEBUG=INFO \
        -x NCCL_IB_DISABLE=0 \
        -x NCCL_IB_HCA="mlx" \
        -pernode \
        --allow-run-as-root \
        --hostfile "$HOSTFILE" \
        -mca plm_rsh_args "-p 2222 -o StrictHostKeyChecking=no" \
        /tmp/nccl-tests/build/all_reduce_perf -b 8 -e 16G -f 2 -g "$LEPTON_RESOURCE_ACCELERATOR_NUM" -c 0

    {
        read -r # ignore head node itself
        while read -r PEER; do
            ssh -n -o StrictHostKeyChecking=no -p 2222 "$PEER" -- touch "$COMPLETE_FILE"
        done
    } <"$HOSTFILE"
else
    while true; do
        [ ! -f "$COMPLETE_FILE" ] || break
        sleep 5
    done
fi

echo "MPI job completed!"

Replace the YOUR_SSH_PUBLIC_KEY and YOUR_SSH_PRIVATE_KEY placeholders with your own credentials. The workers use this key pair to reach each other over SSH on port 2222.
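If you do not want to reuse an existing key, one option is to generate a dedicated key pair just for this job. The sketch below assumes ssh-keygen is available on your local machine. KEY_PATH and the nccl-benchmark comment are hypothetical names chosen for illustration, not something the platform requires.

```shell
# Generate a dedicated, passphrase-less ed25519 key pair for the benchmark job.
# KEY_PATH is a hypothetical location; keep it separate from your everyday keys.
KEY_PATH="${KEY_PATH:-./nccl-benchmark-key}"

# Only generate the pair if it does not exist yet, so an existing key is never overwritten.
[ -f "$KEY_PATH" ] || ssh-keygen -q -t ed25519 -N "" -C "nccl-benchmark" -f "$KEY_PATH"

# This line replaces YOUR_SSH_PUBLIC_KEY in the run command.
cat "$KEY_PATH.pub"

# The contents of this file replace YOUR_SSH_PRIVATE_KEY.
echo "private key file: $KEY_PATH"
```

The private key file is named id_ed25519 in the run command, so an ed25519 key matches what the script expects. Because the private key ends up in the job configuration, a throwaway key like this limits the impact if it leaks.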

Step 2: Run the Job

Click the Create button. You can then follow the job status on the job's detail page. The job proceeds once both replicas are in the "Ready" state, which takes a few minutes.

To check the test results, click the Logs button to view the job logs.
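The nccl-tests binaries end their output with a summary line that begins with "# Avg bus bandwidth" and reports a value in GB/s. If you copy the logs into a local file, a small helper like the one below can pull that number out. This is a minimal sketch: avg_busbw and job.log are hypothetical names, and it assumes the saved log keeps the summary line intact.

```shell
# avg_busbw LOGFILE
# Prints the last field of every "Avg bus bandwidth" summary line in LOGFILE,
# which is the average bus bandwidth in GB/s reported by nccl-tests.
avg_busbw() {
    awk '/Avg bus bandwidth/ { print $NF }' "$1"
}
```

For example, after saving the head worker's logs to job.log, run avg_busbw job.log to print the summary value.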

After the job finishes, the two replicas are terminated automatically and end in the Completed state.
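Completion is signaled through a marker file: the head worker (index 0) touches COMPLETE_FILE on every other worker after mpirun returns, and those workers poll for the file before exiting. As written, that poll has no time limit, so if the head worker exits early on an error the other workers keep waiting. A bounded wait is one way to handle this. The helper below is a sketch only; wait_for_file and its arguments are hypothetical and not part of the original command.

```shell
# wait_for_file PATH TIMEOUT_SECONDS [POLL_SECONDS]
# Returns 0 as soon as PATH exists, or 1 once TIMEOUT_SECONDS have elapsed.
wait_for_file() {
    local path="$1" timeout="$2" poll="${3:-5}" waited=0
    while [ ! -f "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${waited}s waiting for $path" >&2
            return 1
        fi
        sleep "$poll"
        waited=$((waited + poll))
    done
    return 0
}
```

In the non-head branch of the run command, the polling loop could then be replaced with something like wait_for_file "$COMPLETE_FILE" 3600 || exit 1, with the timeout chosen to comfortably exceed the expected benchmark duration.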