Use Dev Pod

You can access your dev pod through SSH from a terminal or through the Web terminal in your browser. To create a dev pod, see the Create a Dev Pod page.

Access the pod via terminal SSH

You can use the SSH command to access the Pod. The SSH command is shown on both the Pod detail page and the Pods list.
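
If you connect often, it can help to store the connection details in your SSH client configuration. The sketch below assumes the command from the dashboard connects as root to a public IP on a specific port; the function name add_ssh_host, the alias devpod, and the address and port in the usage comment are hypothetical placeholders, so substitute the values shown for your Pod.

```shell
# Append a Host entry to an SSH client config file.
# Arguments: alias, host, port, and optionally the config file path.
add_ssh_host() {
    alias_name=$1
    host=$2
    port=$3
    config=${4:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$config")"
    cat >> "$config" <<EOF

Host $alias_name
    HostName $host
    Port $port
    User root
    ServerAliveInterval 60
EOF
    # ssh refuses config files that other users can write to
    chmod 600 "$config"
}

# Example with placeholder values (use the IP and port from the Pod detail page):
#   add_ssh_host devpod 203.0.113.10 30022
#   ssh devpod
```

ServerAliveInterval makes the client send a keepalive message every 60 seconds, which may help idle sessions survive network timeouts.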

Access the pod via browser

You can also open a Web terminal from the Pod detail page. Note that the Web terminal is better suited to temporary access. If you want a long-lived session on the Pod, use the SSH command instead.

Access services hosted in a Pod

If you have a service running inside the Pod, have it listen on the internal port 18888. You can then access the service from outside through the public IP and a randomly assigned port, which is shown on the Pod detail page. Here is an example of a Gradio application running in a Pod.

import gradio as gr

def greet(name, intensity):
    return "Hello " * intensity + name + "!"

demo = gr.Interface(
    fn=greet,
    inputs=["text", "slider"],
    outputs=["text"],
)

demo.launch(server_name="0.0.0.0", server_port=18888)  # specify the server name and port

Save the code to a file named main.py and start the application with the command python main.py. You can then access the application through the public IP and the randomly assigned port.
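
To confirm that the application is reachable, you can poll it with curl. This is a minimal sketch: the function name wait_for_http is hypothetical, and `<public-ip>` and `<assigned-port>` stand for the values shown on the Pod detail page.

```shell
# Poll a URL roughly once per second until it answers or the timeout
# (in seconds, default 30) is used up.
# Returns 0 as soon as the service responds, 1 otherwise.
wait_for_http() {
    url=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -s: silent, -f: treat HTTP errors as failures, -o: discard the body
        if curl -sf -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Inside the Pod, check the internal port first:
#   wait_for_http "http://127.0.0.1:18888/" 30 && echo "application is up"
# From your own machine, use the public IP and the assigned port:
#   wait_for_http "http://<public-ip>:<assigned-port>/" 30
```

If the check succeeds inside the Pod but fails from outside, the application is probably not listening on 0.0.0.0 or on port 18888.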

Preserve long-running sessions

To keep an application running after you disconnect, use the nohup command to run it in the background. For more information on how to use nohup, refer to the nohup documentation. Here is an example that runs the Gradio application in the background.

nohup python main.py > main.log 2>&1 &

After you run this command, the terminal displays the process ID of the Gradio application. You can now disconnect from the Pod, and the application will continue to run in the background.
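
When you reconnect later, you need a way to find the process again. One option, sketched below, is to record the process ID in a file when you start the application. The function names start_background and is_running and the file name main.pid are hypothetical choices for illustration.

```shell
# Start a command with nohup, send its output to a log file, and
# record its process ID so it can be found after reconnecting.
start_background() {
    log_file=$1
    pid_file=$2
    shift 2
    nohup "$@" > "$log_file" 2>&1 &
    echo $! > "$pid_file"
}

# Succeed if the recorded process is still alive.
# kill -0 only checks that the process exists; it sends no signal.
is_running() {
    pid_file=$1
    [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null
}

# Example, using the main.py file from above:
#   start_background main.log main.pid python main.py
#   is_running main.pid && echo "still running"
#   tail -n 20 main.log        # inspect recent output
#   kill "$(cat main.pid)"     # stop the application
```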

Connect to Pod via SSH

If you are using a default image provided by Lepton AI, the SSH service is already configured. If you are using your own image, you can configure the SSH service by editing the script below and running it in the Web terminal.

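Edit lines 4, 6, and 8 of the script below to set your Jupyter Lab token, SSH public key, and SSH root password, respectively. If you leave a value empty, the script falls back to the corresponding LEPTON_POD_* environment variable; if that is also empty, the related installation is skipped.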

#!/bin/bash
############ User Space ##############
# Please fill in your Jupyter Lab token here, leave empty to use env value
USER_JUPKEY=""
# Please fill in your SSH public key here (something like ssh-rsa AAAxxxxx lepton@sampleDomain.com), leave empty to use env value
USER_SSHPUB=""
# Please fill in your SSH root password here, leave empty to use env value
USER_SSHKEY=""
######################################

JUPKEY=$LEPTON_POD_JUPKEY
SSHPUB=$LEPTON_POD_SSHPUB
SSHKEY=$LEPTON_POD_SSHKEY
if [[ -n "$USER_JUPKEY" ]]; then
  JUPKEY=$USER_JUPKEY
fi
if [[ -n "$USER_SSHPUB" ]]; then
  SSHPUB=$USER_SSHPUB
fi
if [[ -n "$USER_SSHKEY" ]]; then
  SSHKEY=$USER_SSHKEY
fi


export DEBIAN_FRONTEND=noninteractive
export TZ=Etc/UTC

function InstallSSH {
  if service ssh status > /dev/null 2>&1; then
    echo "OpenSSH server is already started."
    return
  fi
  # Check if OpenSSH server is already installed
  if ! command -v sshd &> /dev/null; then
      echo "OpenSSH server is not installed. Installing..."

      apt update
      apt install -y openssh-server

      echo "OpenSSH server installation complete."
  else
      echo "OpenSSH server is already installed."
  fi

  # Set root password if SSHKEY is provided
  if [[ -n "$SSHKEY" ]]; then
      # Enable password authentication in SSH configuration
      sed -i '/^#PasswordAuthentication/c\PasswordAuthentication yes' /etc/ssh/sshd_config
      sed -i '/^PasswordAuthentication/c\PasswordAuthentication yes' /etc/ssh/sshd_config
      # Enable root login in SSH configuration
      sed -i '/^#PermitRootLogin/c\PermitRootLogin yes' /etc/ssh/sshd_config
      sed -i '/^PermitRootLogin/c\PermitRootLogin yes' /etc/ssh/sshd_config
      echo "Root login is enabled."

      # Display a message indicating that user/password SSH access is enabled
      echo "User/password SSH access is enabled."
      echo "root:${SSHKEY}" | chpasswd
      echo "Root password has been set."
  fi

  # Check if SSHPUB is set and not empty
  if [[ -n "$SSHPUB" ]]; then
      # Create the .ssh directory and authorized_keys file if they don't exist
      if [ ! -d "$HOME/.ssh" ]; then
          mkdir -p "$HOME/.ssh"
          chmod 0700 "$HOME/.ssh"
          echo "Directory $HOME/.ssh created."
      fi
      if [ ! -f "$HOME/.ssh/authorized_keys" ]; then
          touch "$HOME/.ssh/authorized_keys"
          chmod 0600 "$HOME/.ssh/authorized_keys"
          echo "File $HOME/.ssh/authorized_keys created."
      fi
      # Check if the public key is not already present in authorized_keys
      if ! grep -qF "${SSHPUB}" "$HOME/.ssh/authorized_keys"; then
          # Append the public key to authorized_keys
          echo "$SSHPUB" >> "$HOME/.ssh/authorized_keys"
          echo "Public key from env variable added."
      fi
  fi

  # turn off PAM to fix sshd login issue
  sed -i 's/UsePAM yes/UsePAM no/' /etc/ssh/sshd_config

  # set default port to 2222
  sed -i 's/#Port 22/Port 2222/' /etc/ssh/sshd_config

  echo "Exposing ENV variables"
  env | sed 's/=/="/' | sed 's/$/"/' > /etc/environment
  echo "set -a; source /etc/environment; set +a;" >> /root/.bashrc

  mkdir -p /run/sshd
  chmod 0755 /run/sshd

  service ssh start
  echo "sshd service started"
}

function InstallJupyter {
    if pgrep jupyter-lab > /dev/null 2>&1; then
      echo "jupyter already started"
      return
    fi

    # Check if jupyter lab is already installed
    if ! command -v jupyter-lab &> /dev/null; then
        echo "jupyter lab is not installed. Installing..."

        apt update
        apt install python3 python3-pip -y
        pip install -U virtualenv --break-system-packages
        pip3 install jupyterlab --break-system-packages

        echo "jupyter lab installation complete."
    else
        echo "jupyter lab is already installed."
    fi

    jupyter lab --generate-config

    {
     echo "c.ServerApp.ip = '0.0.0.0'"
     echo "c.ServerApp.open_browser = False"
     echo "c.ServerApp.port = 18889"
    } >> ~/.jupyter/jupyter_lab_config.py

    # Set the Jupyter Lab token if JUPKEY is provided
    if [[ -n "$JUPKEY" ]]; then
        echo "c.ServerApp.token = '${JUPKEY}'" >> ~/.jupyter/jupyter_lab_config.py
        echo "Jupyter Lab token has been set."
    fi

    jupyter lab --allow-root > /var/log/jupyter.log 2>&1 &
}

if [[ -n $SSHKEY ||  -n $SSHPUB  ]]; then
  InstallSSH
fi

if [[ -n $JUPKEY ]]; then
  InstallJupyter
fi

Once you have edited the script, copy it, paste it into the Web terminal, and press Enter. The script configures the SSH service, after which you can access the Pod via SSH.
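
If the SSH connection fails afterwards, it can help to check what the script actually wrote to the SSH server configuration. The helper below is a sketch with a hypothetical name, sshd_setting; it prints the first uncommented value of a keyword, which is the value sshd uses when a keyword appears more than once.

```shell
# Print the effective value of a keyword in an sshd_config file.
# sshd uses the first value it finds for each keyword, and keywords
# are case-insensitive, so only the first uncommented match is printed.
sshd_setting() {
    config=$1
    key=$2
    awk -v key="$key" 'tolower($1) == tolower(key) { print $2; exit }' "$config"
}

# Example, run inside the Pod after the setup script:
#   sshd_setting /etc/ssh/sshd_config Port
#   sshd_setting /etc/ssh/sshd_config PermitRootLogin
#   sshd -t    # ask sshd to validate the configuration file
```

After the setup script has run on a stock configuration, Port is expected to be 2222 and, if a root password was set, PermitRootLogin is expected to be yes.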

Running NCCL tests on a set of pods

NCCL tests can be used to verify bandwidth among nodes that have GPUs with IB/RoCE connectivity. The steps below assume that you have N pods, each using a full node.

First, generate an SSH key on Pod 1.

mkdir -p /root/.ssh
chmod 700 /root/.ssh
ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519
cat /root/.ssh/id_ed25519.pub >> /root/.ssh/authorized_keys

Then copy the generated public key to /root/.ssh/authorized_keys on Pod 2 to Pod N.

mkdir -p /root/.ssh
chmod 700 /root/.ssh
touch /root/.ssh/authorized_keys
# then echo >> or any other command that writes the public key to /root/.ssh/authorized_keys
# on pod 2 ... pod N
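
The copy step can be wrapped in a small helper so that running it twice does not add the key twice. This is a sketch; the function name add_authorized_key is hypothetical, and the key in the usage comment is a placeholder for the contents of /root/.ssh/id_ed25519.pub on Pod 1.

```shell
# Add a public key to authorized_keys unless it is already there.
# Arguments: the public key line, and optionally the .ssh directory.
add_authorized_key() {
    pubkey=$1
    ssh_dir=${2:-/root/.ssh}
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    # -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF "$pubkey" "$ssh_dir/authorized_keys" \
        || echo "$pubkey" >> "$ssh_dir/authorized_keys"
}

# On Pod 1, print the public key:
#   cat /root/.ssh/id_ed25519.pub
# On Pod 2 ... Pod N, paste that line as the argument:
#   add_authorized_key "<public-key-line-from-pod-1>"
```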

Create a file named /tmp/hostfile.txt on Pod 1 in which each line contains the private IP address of a machine that one of these pods is running on. You can obtain the private IP of a node from the dashboard by clicking on the node name (node-ip-10-x-x-x).

touch /tmp/hostfile.txt
vi /tmp/hostfile.txt
# PRIVATE_IP_ADDRESS_FOR_NODE_1
# PRIVATE_IP_ADDRESS_FOR_NODE_2
# ...
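
If you prefer not to edit the file by hand, the host file can also be written and checked from the command line. This is a sketch that assumes one IPv4 address per line, as described above; the function names write_hostfile and check_hostfile and the addresses in the usage comment are hypothetical.

```shell
# Write one IP address per line to the host file, replacing its contents.
write_hostfile() {
    hostfile=$1
    shift
    printf '%s\n' "$@" > "$hostfile"
}

# Succeed if the host file contains exactly the expected number of
# lines that look like dotted IPv4 addresses.
check_hostfile() {
    hostfile=$1
    expected=$2
    count=$(grep -cE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' "$hostfile" || true)
    [ "$count" -eq "$expected" ]
}

# Example with placeholder addresses for N = 2:
#   write_hostfile /tmp/hostfile.txt 10.0.0.11 10.0.0.12
#   check_hostfile /tmp/hostfile.txt 2 && echo "host file looks complete"
```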

On the dashboard, write down the port that's used to ssh into the pod.

Download the nccl-tests script on every pod.

wget https://pub-2f78d6ca875c410392d83a29768dd4ce.r2.dev/nccl_test_pod.bash -O ./nccl_test_pod.bash
chmod +x nccl_test_pod.bash

On Pod 2 to Pod N, run the following command:

./nccl_test_pod.bash --ssh-port <ssh-port>

You should see "Done" printed at the end of the output.

On Pod 1, run the following command:

./nccl_test_pod.bash --launcher --host-file /tmp/hostfile.txt --num-workers <N> --ssh-port <ssh-port>

The output will be similar to the following:

#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    42.85    0.00    0.00    N/A    36.65    0.00    0.00    N/A
          16             4     float     sum      -1    36.15    0.00    0.00    N/A    36.02    0.00    0.00    N/A
          32             8     float     sum      -1    36.19    0.00    0.00    N/A    35.95    0.00    0.00    N/A
          64            16     float     sum      -1    36.22    0.00    0.00    N/A    36.30    0.00    0.00    N/A
         128            32     float     sum      -1    36.38    0.00    0.01    N/A    36.54    0.00    0.01    N/A
         256            64     float     sum      -1    62.06    0.00    0.01    N/A    36.49    0.01    0.01    N/A
         512           128     float     sum      -1    55.09    0.01    0.02    N/A    36.07    0.01    0.03    N/A
        1024           256     float     sum      -1    35.48    0.03    0.05    N/A    36.33    0.03    0.05    N/A
        2048           512     float     sum      -1    36.13    0.06    0.11    N/A    36.43    0.06    0.11    N/A
        4096          1024     float     sum      -1    36.23    0.11    0.21    N/A    36.23    0.11    0.21    N/A
        8192          2048     float     sum      -1    40.11    0.20    0.38    N/A    39.67    0.21    0.39    N/A
       16384          4096     float     sum      -1    47.88    0.34    0.64    N/A    47.94    0.34    0.64    N/A
       32768          8192     float     sum      -1    57.35    0.57    1.07    N/A    57.48    0.57    1.07    N/A
       65536         16384     float     sum      -1    63.68    1.03    1.93    N/A    63.04    1.04    1.95    N/A
      131072         32768     float     sum      -1    69.54    1.88    3.53    N/A    71.59    1.83    3.43    N/A
      262144         65536     float     sum      -1    70.81    3.70    6.94    N/A    67.62    3.88    7.27    N/A
      524288        131072     float     sum      -1    69.38    7.56   14.17    N/A    67.24    7.80   14.62    N/A
     1048576        262144     float     sum      -1    82.14   12.77   23.94    N/A    80.83   12.97   24.32    N/A
     2097152        524288     float     sum      -1    98.29   21.34   40.01    N/A    97.38   21.53   40.38    N/A
     4194304       1048576     float     sum      -1    111.6   37.57   70.44    N/A    110.5   37.94   71.14    N/A
     8388608       2097152     float     sum      -1    147.5   56.88  106.64    N/A    145.2   57.75  108.29    N/A
    16777216       4194304     float     sum      -1    192.9   86.98  163.09    N/A    193.0   86.93  162.99    N/A
    33554432       8388608     float     sum      -1    259.4  129.34  242.51    N/A    258.1  129.99  243.73    N/A
    67108864      16777216     float     sum      -1    465.2  144.27  270.51    N/A    461.3  145.47  272.76    N/A
   134217728      33554432     float     sum      -1    736.0  182.37  341.95    N/A    748.3  179.35  336.29    N/A
   268435456      67108864     float     sum      -1   1282.3  209.34  392.52    N/A   1284.0  209.06  391.98    N/A
   536870912     134217728     float     sum      -1   2338.5  229.57  430.45    N/A   2347.8  228.67  428.76    N/A
  1073741824     268435456     float     sum      -1   4483.0  239.51  449.09    N/A   4476.1  239.89  449.79    N/A
  2147483648     536870912     float     sum      -1   8768.3  244.91  459.21    N/A   8772.0  244.81  459.02    N/A
  4294967296    1073741824     float     sum      -1    17427  246.45  462.10    N/A    17365  247.33  463.75    N/A
  8589934592    2147483648     float     sum      -1    34683  247.67  464.38    N/A    34668  247.78  464.58    N/A
 17179869184    4294967296     float     sum      -1    69265  248.03  465.06    N/A    69282  247.97  464.95    N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 137.867
#
MPI job completed!

This result was obtained by running the NCCL test on the nvcr.io/nvidia/cuda-dl-base:24.10-cuda12.6-devel-ubuntu22.04 image.
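
To read the table, it helps to know how the two bandwidth columns relate. In nccl-tests, algbw is the message size divided by the elapsed time, and for an all-reduce busbw is algbw multiplied by 2(n-1)/n, where n is the number of ranks. The sketch below assumes the script runs the all-reduce benchmark, which the redop column suggests. The rank count of 16 is not stated above; it is inferred from the ratio of busbw to algbw in the sample output. The function names are hypothetical.

```shell
# Algorithm bandwidth in GB/s from a size in bytes and a time in microseconds.
algbw() {
    awk -v size="$1" -v us="$2" 'BEGIN { printf "%.2f\n", size / us / 1000 }'
}

# Bus bandwidth for an all-reduce: algbw * 2 * (n - 1) / n for n ranks.
busbw_allreduce() {
    awk -v a="$1" -v n="$2" 'BEGIN { printf "%.2f\n", a * 2 * (n - 1) / n }'
}

# Extract the average bus bandwidth from a saved copy of the test output.
avg_busbw() {
    awk '/Avg bus bandwidth/ { print $NF }' "$1"
}

# Last out-of-place row of the sample output:
#   algbw 17179869184 69265      # prints 248.03
#   busbw_allreduce 248.03 16    # prints 465.06
```

The average bus bandwidth on the summary line covers all message sizes, including the small ones that are dominated by latency, so it is much lower than the peak reached at the largest sizes.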