Use Dev Pod
You can access your dev pod over SSH from a terminal or through the Web terminal in your browser. To learn how to create a dev pod, refer to the Create a Dev Pod page.
Access the pod via terminal SSH
You can use the SSH command to access the Pod. The SSH command is shown on both the Pod detail page and the Pods list.
Access the pod via browser
You can also access the Pod through the Web terminal on the Pod detail page. Note that the Web terminal is better suited to temporary access; for long-running sessions, use SSH instead.
Access services hosted in a Pod
If you run a service inside the Pod, have it listen on the internal port 18888. You can then access the service via the Pod's public IP and a randomly assigned port, which is shown on the Pod detail page. Here is an example Gradio application running in a Pod:
import gradio as gr
def greet(name, intensity):
    # The slider may deliver a float, so convert it before repeating the string
    return "Hello " * int(intensity) + name + "!"
demo = gr.Interface(
    fn=greet,
    inputs=["text", "slider"],
    outputs=["text"],
)
# Listen on all interfaces (0.0.0.0) at the Pod's internal service port 18888
demo.launch(server_name="0.0.0.0", server_port=18888)
Save the code to a file named main.py and start the application with the command python main.py. You can then access the application via the public IP and the randomly assigned port shown on the Pod detail page.
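To check from your local machine that the service is reachable, you can use a short standard-library Python script such as the one below. It is a sketch: PUBLIC_IP and PUBLIC_PORT are placeholders (203.0.113.10 is a documentation-only address), and service_url and is_reachable are hypothetical helper names. Replace the placeholders with the values on the Pod detail page.

```python
import urllib.error
import urllib.request

# Placeholders: replace with the values shown on the Pod detail page.
PUBLIC_IP = "203.0.113.10"  # documentation-only address, not a real Pod
PUBLIC_PORT = 30123  # hypothetical randomly assigned port


def service_url(ip: str, port: int) -> str:
    """Build the base URL at which the Pod's service should answer."""
    return f"http://{ip}:{port}/"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status such as 404.
        return True
    except OSError:
        # URLError (refused connection, DNS failure) and timeouts are OSErrors.
        return False
```

For example, print(is_reachable(service_url(PUBLIC_IP, PUBLIC_PORT))) prints True once the Gradio app is up and the port mapping works.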
Preserve long-running sessions
To keep an application running after you disconnect, use the nohup command to run it in the background. For more information on how to use nohup, refer to the nohup documentation. The following command runs the Gradio application in the background and writes its output to main.log:
# main.py already sets the listen address (0.0.0.0) and port (18888), so no extra flags are needed
nohup python main.py > main.log 2>&1 &
After you run this command, the shell prints the process ID of the Gradio application. You can then disconnect from the Pod, and the application keeps running in the background.
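If you launch jobs from Python rather than from the shell, a rough equivalent is subprocess.Popen with start_new_session=True, which starts the child in a new session detached from your terminal. This is a sketch and not identical to nohup: nohup makes the process ignore SIGHUP, whereas this approach keeps the terminal's hangup from reaching the child in the first place. launch_detached is a hypothetical helper name.

```python
import subprocess


def launch_detached(cmd: list[str], log_path: str) -> subprocess.Popen:
    """Start cmd in its own session with stdout and stderr written to log_path.

    start_new_session=True makes the child call setsid(), so it is no longer
    attached to the terminal session you disconnect from.
    """
    log = open(log_path, "wb")  # truncate, like `> main.log`
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,  # no terminal input after you log out
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as `2>&1`
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

For example, proc = launch_detached(["python", "main.py"], "main.log") followed by print(proc.pid) mirrors the nohup command above.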
Connect to Pod via SSH
If you are using the default image provided by Lepton AI, the SSH service is already configured. If you are using your own image, you can configure the SSH service by editing the script below and running it in the Web terminal.
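Edit lines 4 to 8 of the script below to set your Jupyter Lab token, SSH public key, and SSH root password. A value left empty falls back to the corresponding LEPTON_POD_* environment variable; if that is also empty, the related component is skipped. SSH is set up only if a root password or a public key is available, and Jupyter Lab only if a token is available.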
#!/bin/bash
############ User Space ##############
# Please fill in your Jupyter Lab token here, leave empty to use env value
USER_JUPKEY=""
# Please fill in your SSH public key here (something like ssh-rsa AAAxxxxx lepton@sampleDomain.com), leave empty to use env value
USER_SSHPUB=""
# Please fill in your SSH root password here, leave empty to use env value
USER_SSHKEY=""
######################################
JUPKEY=$LEPTON_POD_JUPKEY
SSHPUB=$LEPTON_POD_SSHPUB
SSHKEY=$LEPTON_POD_SSHKEY
if [[ -n "$USER_JUPKEY" ]]; then
  JUPKEY=$USER_JUPKEY
fi
if [[ -n "$USER_SSHPUB" ]]; then
  SSHPUB=$USER_SSHPUB
fi
if [[ -n "$USER_SSHKEY" ]]; then
  SSHKEY=$USER_SSHKEY
fi
export DEBIAN_FRONTEND=noninteractive
export TZ=Etc/UTC
function InstallSSH {
  # On Debian/Ubuntu the OpenSSH service is named "ssh", matching "service ssh start" below
  if service ssh status > /dev/null 2>&1; then
    echo "OpenSSH server is already started."
    return
  fi
  # Check if OpenSSH server is already installed
  if ! command -v sshd &> /dev/null; then
    echo "OpenSSH server is not installed. Installing..."
    apt update
    apt install -y openssh-server
    echo "OpenSSH server installation complete."
  else
    echo "OpenSSH server is already installed."
  fi
  # Set root password if SSHKEY is provided
  if [[ -n "$SSHKEY" ]]; then
    # Enable password authentication in SSH configuration
    sed -i '/^#PasswordAuthentication/c\PasswordAuthentication yes' /etc/ssh/sshd_config
    sed -i '/^PasswordAuthentication/c\PasswordAuthentication yes' /etc/ssh/sshd_config
    # Enable root login in SSH configuration
    sed -i '/^#PermitRootLogin/c\PermitRootLogin yes' /etc/ssh/sshd_config
    sed -i '/^PermitRootLogin/c\PermitRootLogin yes' /etc/ssh/sshd_config
    echo "Root login is enabled."
    # Display a message indicating that user/password SSH access is enabled
    echo "User/password SSH access is enabled."
    echo "root:${SSHKEY}" | chpasswd
    echo "Root password has been set."
  fi
  # Check if SSHPUB (the SSH public key) is set and not empty
  if [[ -n "$SSHPUB" ]]; then
    # Create the .ssh directory and authorized_keys file if they don't exist
    if [ ! -d "$HOME/.ssh" ]; then
      mkdir -p "$HOME/.ssh"
      chmod 0700 "$HOME/.ssh"
      echo "Directory $HOME/.ssh created."
    fi
    if [ ! -f "$HOME/.ssh/authorized_keys" ]; then
      touch "$HOME/.ssh/authorized_keys"
      chmod 0600 "$HOME/.ssh/authorized_keys"
      echo "File $HOME/.ssh/authorized_keys created."
    fi
    # Check if the public key is not already present in authorized_keys
    # (-F matches the key literally instead of as a regular expression)
    if ! grep -qF -- "${SSHPUB}" "$HOME/.ssh/authorized_keys"; then
      # Append the public key to authorized_keys
      echo "$SSHPUB" >> "$HOME/.ssh/authorized_keys"
      echo "Public key added to authorized_keys."
    fi
  fi
  # turn off PAM to fix sshd login issue
  sed -i 's/UsePAM yes/UsePAM no/' /etc/ssh/sshd_config
  # set default port to 2222
  sed -i 's/#Port 22/Port 2222/' /etc/ssh/sshd_config
  echo "Exposing ENV variables"
  env | sed 's/=/="/' | sed 's/$/"/' > /etc/environment
  echo "set -a; source /etc/environment; set +a;" >> /root/.bashrc
  # -p: do not fail if the directory already exists
  mkdir -p /run/sshd
  chmod 0755 /run/sshd
  service ssh start
  echo "sshd service started"
}
function InstallJupyter {
  if pgrep jupyter-lab > /dev/null 2>&1; then
    echo "jupyter already started"
    return
  fi
  # Check if jupyter lab is already installed
  if ! command -v jupyter-lab &> /dev/null; then
    echo "jupyter lab is not installed. Installing..."
    apt update
    apt install python3 python3-pip -y
    pip install -U virtualenv --break-system-packages
    pip3 install jupyterlab --break-system-packages
    echo "jupyter lab installation complete."
  else
    echo "jupyter lab is already installed."
  fi
  jupyter lab --generate-config
  {
    echo "c.ServerApp.ip = '0.0.0.0'"
    echo "c.ServerApp.open_browser = False"
    echo "c.ServerApp.port = 18889"
  } >> ~/.jupyter/jupyter_lab_config.py
  # Set the Jupyter Lab token if JUPKEY is provided
  if [[ -n "$JUPKEY" ]]; then
    echo "c.ServerApp.token = '${JUPKEY}'" >> ~/.jupyter/jupyter_lab_config.py
    echo "Jupyter Lab token has been set."
  fi
  jupyter lab --allow-root > /var/log/jupyter.log 2>&1 &
}
if [[ -n $SSHKEY || -n $SSHPUB ]]; then
  InstallSSH
fi
if [[ -n $JUPKEY ]]; then
  InstallJupyter
fi
After editing the script, copy it, paste it into the Web terminal, and press Enter. The script configures the SSH service (and Jupyter Lab, if a token is available), after which you can access the Pod via SSH.
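The decisions the script makes can be summarized in a small Python model, shown only to make its behavior explicit. It is not part of the setup and does not touch the real /etc/ssh/sshd_config; plan_installation and rewrite_sshd_config are hypothetical names. The first function mirrors the USER_* / LEPTON_POD_* fallback and the final if blocks; the second reproduces the script's sed edits on a config string.

```python
import re

# Script variable -> (USER_* override, LEPTON_POD_* fallback)
SETTINGS = {
    "JUPKEY": ("USER_JUPKEY", "LEPTON_POD_JUPKEY"),
    "SSHPUB": ("USER_SSHPUB", "LEPTON_POD_SSHPUB"),
    "SSHKEY": ("USER_SSHKEY", "LEPTON_POD_SSHKEY"),
}


def plan_installation(user: dict[str, str], env: dict[str, str]) -> dict[str, bool]:
    """Decide which components the script installs.

    A non-empty USER_* value wins; otherwise the LEPTON_POD_* variable is used.
    """
    resolved = {
        name: user.get(user_var) or env.get(env_var, "")
        for name, (user_var, env_var) in SETTINGS.items()
    }
    return {
        # InstallSSH runs if a root password or a public key is available.
        "ssh": bool(resolved["SSHKEY"] or resolved["SSHPUB"]),
        # InstallJupyter runs only if a Jupyter Lab token is available.
        "jupyter": bool(resolved["JUPKEY"]),
    }


def rewrite_sshd_config(text: str, password_login: bool) -> str:
    """Apply the script's sed edits to the contents of sshd_config."""
    out = []
    for line in text.splitlines():
        # Only done when a root password (SSHKEY) is set.
        if password_login and re.match(r"#?PasswordAuthentication", line):
            line = "PasswordAuthentication yes"
        elif password_login and re.match(r"#?PermitRootLogin", line):
            line = "PermitRootLogin yes"
        # Always done: disable PAM and move the default port to 2222.
        line = line.replace("UsePAM yes", "UsePAM no", 1)
        line = line.replace("#Port 22", "Port 2222", 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Note that the port change only takes effect if the config still contains the stock commented-out #Port 22 line, because the sed expression matches that exact text.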
Running NCCL tests on a set of pods
NCCL tests can be used to verify the bandwidth between GPU nodes connected via InfiniBand (IB) or RoCE. The following steps assume N pods, each occupying a full node.
First, generate an SSH key pair on Pod 1 and add its public key to Pod 1's own authorized_keys:
mkdir -p /root/.ssh
chmod 700 /root/.ssh
ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519
cat /root/.ssh/id_ed25519.pub >> /root/.ssh/authorized_keys
Then append the generated public key (the contents of /root/.ssh/id_ed25519.pub on Pod 1) to /root/.ssh/authorized_keys on each of Pod 2 through Pod N:
mkdir -p /root/.ssh
chmod 700 /root/.ssh
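touch /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
# On each of Pod 2 ... Pod N, append the public key printed by `cat /root/.ssh/id_ed25519.pub` on Pod 1, e.g.:
# echo "<contents of id_ed25519.pub from Pod 1>" >> /root/.ssh/authorized_keys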
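If you prefer to script this step, here is a minimal Python sketch of the same operation, assuming the key text has already been copied from Pod 1. authorize_key is a hypothetical helper, not part of the NCCL test script; like the SSH setup script earlier on this page, it skips a key that is already present.

```python
import os
from pathlib import Path


def authorize_key(pubkey: str, ssh_dir: Path = Path("/root/.ssh")) -> bool:
    """Append pubkey to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    pubkey = pubkey.strip()
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by the umask
    keys_file = ssh_dir / "authorized_keys"
    keys_file.touch(mode=0o600, exist_ok=True)
    os.chmod(keys_file, 0o600)  # sshd rejects overly permissive key files
    text = keys_file.read_text()
    if pubkey in (line.strip() for line in text.splitlines()):
        return False
    with keys_file.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep each key on its own line
        f.write(pubkey + "\n")
    return True
```

Run it on each of Pod 2 through Pod N with the key string from Pod 1; calling it twice with the same key leaves a single entry.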
On Pod 1, create a file named /tmp/hostfile.txt in which each line contains the private IP address of one of the nodes the pods are running on. You can find a node's private IP on the dashboard by clicking the node name (node-ip-10-x-x-x).
touch /tmp/hostfile.txt
vi /tmp/hostfile.txt
# PRIVATE_IP_ADDRESS_FOR_NODE_1
# PRIVATE_IP_ADDRESS_FOR_NODE_2
# ...
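A small Python sketch for writing the hostfile and catching typos early. write_hostfile is a hypothetical helper, and the addresses in the usage example are placeholders. It assumes, as in this setup, one pod per node, so each private IP should appear exactly once.

```python
import ipaddress
from pathlib import Path


def write_hostfile(ips: list[str], path: str = "/tmp/hostfile.txt") -> None:
    """Write one private IPv4 address per line, rejecting anything else."""
    lines = []
    for raw in ips:
        ip = ipaddress.ip_address(raw.strip())  # raises ValueError if malformed
        if ip.version != 4 or not ip.is_private:
            raise ValueError(f"{raw!r} is not a private IPv4 address")
        lines.append(str(ip))
    if len(set(lines)) != len(lines):
        raise ValueError("duplicate node IPs; each node should appear once")
    # Write only after every entry has been validated.
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, write_hostfile(["10.0.0.11", "10.0.0.12"]) produces a two-line hostfile for N = 2.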
On the dashboard, note the port used to SSH into the pods; you will pass it as <ssh-port> in the commands below.
Download the nccl-tests script on every pod.
wget https://pub-2f78d6ca875c410392d83a29768dd4ce.r2.dev/nccl_test_pod.bash -O ./nccl_test_pod.bash
chmod +x nccl_test_pod.bash
On Pod 2 through Pod N, run the following command:
./nccl_test_pod.bash --ssh-port <ssh-port>
You should see "Done" printed at the end of the output.
On Pod 1, run the following command:
./nccl_test_pod.bash --launcher --host-file /tmp/hostfile.txt --num-workers <N> --ssh-port <ssh-port>
The output will be similar to the following:
#
# out-of-place in-place
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
8 2 float sum -1 42.85 0.00 0.00 N/A 36.65 0.00 0.00 N/A
16 4 float sum -1 36.15 0.00 0.00 N/A 36.02 0.00 0.00 N/A
32 8 float sum -1 36.19 0.00 0.00 N/A 35.95 0.00 0.00 N/A
64 16 float sum -1 36.22 0.00 0.00 N/A 36.30 0.00 0.00 N/A
128 32 float sum -1 36.38 0.00 0.01 N/A 36.54 0.00 0.01 N/A
256 64 float sum -1 62.06 0.00 0.01 N/A 36.49 0.01 0.01 N/A
512 128 float sum -1 55.09 0.01 0.02 N/A 36.07 0.01 0.03 N/A
1024 256 float sum -1 35.48 0.03 0.05 N/A 36.33 0.03 0.05 N/A
2048 512 float sum -1 36.13 0.06 0.11 N/A 36.43 0.06 0.11 N/A
4096 1024 float sum -1 36.23 0.11 0.21 N/A 36.23 0.11 0.21 N/A
8192 2048 float sum -1 40.11 0.20 0.38 N/A 39.67 0.21 0.39 N/A
16384 4096 float sum -1 47.88 0.34 0.64 N/A 47.94 0.34 0.64 N/A
32768 8192 float sum -1 57.35 0.57 1.07 N/A 57.48 0.57 1.07 N/A
65536 16384 float sum -1 63.68 1.03 1.93 N/A 63.04 1.04 1.95 N/A
131072 32768 float sum -1 69.54 1.88 3.53 N/A 71.59 1.83 3.43 N/A
262144 65536 float sum -1 70.81 3.70 6.94 N/A 67.62 3.88 7.27 N/A
524288 131072 float sum -1 69.38 7.56 14.17 N/A 67.24 7.80 14.62 N/A
1048576 262144 float sum -1 82.14 12.77 23.94 N/A 80.83 12.97 24.32 N/A
2097152 524288 float sum -1 98.29 21.34 40.01 N/A 97.38 21.53 40.38 N/A
4194304 1048576 float sum -1 111.6 37.57 70.44 N/A 110.5 37.94 71.14 N/A
8388608 2097152 float sum -1 147.5 56.88 106.64 N/A 145.2 57.75 108.29 N/A
16777216 4194304 float sum -1 192.9 86.98 163.09 N/A 193.0 86.93 162.99 N/A
33554432 8388608 float sum -1 259.4 129.34 242.51 N/A 258.1 129.99 243.73 N/A
67108864 16777216 float sum -1 465.2 144.27 270.51 N/A 461.3 145.47 272.76 N/A
134217728 33554432 float sum -1 736.0 182.37 341.95 N/A 748.3 179.35 336.29 N/A
268435456 67108864 float sum -1 1282.3 209.34 392.52 N/A 1284.0 209.06 391.98 N/A
536870912 134217728 float sum -1 2338.5 229.57 430.45 N/A 2347.8 228.67 428.76 N/A
1073741824 268435456 float sum -1 4483.0 239.51 449.09 N/A 4476.1 239.89 449.79 N/A
2147483648 536870912 float sum -1 8768.3 244.91 459.21 N/A 8772.0 244.81 459.02 N/A
4294967296 1073741824 float sum -1 17427 246.45 462.10 N/A 17365 247.33 463.75 N/A
8589934592 2147483648 float sum -1 34683 247.67 464.38 N/A 34668 247.78 464.58 N/A
17179869184 4294967296 float sum -1 69265 248.03 465.06 N/A 69282 247.97 464.95 N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth : 137.867
#
MPI job completed!
This result was obtained by running the NCCL test on the nvcr.io/nvidia/cuda-dl-base:24.10-cuda12.6-devel-ubuntu22.04 image.
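To turn output like the above into a quick pass/fail number, you can extract the peak out-of-place bus bandwidth. This is a sketch: peak_bus_bandwidth is a hypothetical helper, and it assumes the column layout in the header above, where busbw is the 8th field of each result row and comment lines start with #. For the output above it would return (17179869184, 465.06).

```python
def peak_bus_bandwidth(output: str) -> tuple[int, float]:
    """Return (size_in_bytes, busbw_GBps) for the result row with the
    highest out-of-place bus bandwidth in nccl-tests output."""
    best = None
    for line in output.splitlines():
        fields = line.split()
        # Result rows have 13 fields and start with the message size;
        # header/summary lines start with '#' and other log lines with text.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        size, busbw = int(fields[0]), float(fields[7])
        if best is None or busbw > best[1]:
            best = (size, busbw)
    if best is None:
        raise ValueError("no result rows found")
    return best
```

Comparing this peak against the expected link speed of your interconnect is usually more informative than the average bus bandwidth, which is pulled down by the small message sizes.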