SDXL Fine-Tuning Walkthrough

The Job module in Lepton is designed to run long-running tasks efficiently, which makes it a good fit for model fine-tuning. This guide walks through using the Job module to fine-tune SDXL: preparing an output directory, configuring the job, specifying the training command, and monitoring the run. Let's dive in!

Step 1: Preparing the Output Directory for Fine-Tuning

To begin, create a folder within your Lepton workspace to store the fine-tuning results. You will later mount this folder as the output directory when configuring the fine-tuning job.

Go to your workspace, select "Utilities" from the navigation bar, and click "Storage" from the dropdown menu.

Alternatively, you can access the filesystem directly by clicking here.
In the action bar, click "New Folder" and give the folder a name that reflects your task. This example uses sdxl-finetune.

Step 2: Configuring Your Job

Navigate to Batch Jobs in your workspace, then click Create Job.
Give the job a name (for example, sdxl-finetune-job) and choose a resource type suitable for your task. For SDXL fine-tuning, we’ll select gpu.h100-sxm, since we’ll be training on an H100 GPU.

Tip: You can specify a custom container image and authentication details under Container Image and Private Image Registry Auth if necessary. For this tutorial, the default image will suffice.

Head over to Advanced Settings. In the Filesystem Mount section, under Mount From, select the folder you created in Step 1; for Mount As, enter /workspace/sdxl-finetune. Anything the job writes to /workspace/sdxl-finetune is then stored in that workspace folder, which is where the fine-tuned model will end up.
If you need to use secrets or environment variables (e.g., a Hugging Face token), you can add them by clicking Add Variable/Secret.
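
Because the training command depends on the mount being present, it can help to fail fast when it is not. The following is a minimal sketch of a pre-flight check you could place at the top of the run command; the function name check_output_dir is hypothetical, and the default path simply mirrors the Mount As value used in this guide.

```shell
# Pre-flight check (hypothetical helper, not part of Lepton or diffusers).
# Succeeds only if the given directory exists and is writable.
check_output_dir() {
  # Default to the Mount As path used in this guide.
  local dir="${1:-/workspace/sdxl-finetune}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir (is the filesystem mount configured?)" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

Calling `check_output_dir || exit 1` before the training command would stop the job immediately if the mount is missing, rather than after the dependencies have been installed and the model downloaded.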

Step 3: Specifying the Fine-Tuning Command

In the Run Command section, enter the fine-tuning command. Set --output_dir to /workspace/sdxl-finetune so that it matches the Mount As path you configured in Step 2.

Here’s an example command for fine-tuning the SDXL model:

# Install diffusers from source
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .

# Install the SDXL example's dependencies; bitsandbytes is needed for --use_8bit_adam
cd examples/text_to_image/
pip install -r requirements_sdxl.txt
pip install bitsandbytes

# Base model, VAE, and training dataset
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/naruto-blip-captions"

# Launch training; --output_dir must match the Mount As path
accelerate launch train_text_to_image_sdxl.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --pretrained_vae_model_name_or_path="$VAE_NAME" \
  --dataset_name="$DATASET_NAME" \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=8 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=10000 \
  --use_8bit_adam \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --validation_prompt="a cute Sundar Pichai creature" --validation_epochs=5 \
  --checkpointing_steps=5000 \
  --output_dir="/workspace/sdxl-finetune"

After entering the command, click Create to launch the fine-tuning job.
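
As a quick sanity check on the flags in the example command, the arithmetic below works out the effective batch size, the number of training samples processed, and how many intermediate checkpoints to expect. This is only a sketch: the values are copied from the command in this guide, and num_gpus=1 is an assumption about the job running on a single GPU.

```shell
# Values copied from the example training command in this guide.
train_batch_size=8
gradient_accumulation_steps=4
max_train_steps=10000
checkpointing_steps=5000
num_gpus=1   # assumption: the job runs on a single GPU

# Samples contributing to each optimizer step.
effective_batch=$((train_batch_size * gradient_accumulation_steps * num_gpus))
# Total samples processed over the whole run.
samples_seen=$((effective_batch * max_train_steps))
# Number of checkpoints saved during training.
num_checkpoints=$((max_train_steps / checkpointing_steps))

echo "effective batch size: $effective_batch"
echo "samples processed:    $samples_seen"
echo "checkpoints written:  $num_checkpoints"
```

With these settings each optimizer step covers 32 samples, the run processes 320,000 samples in total, and two checkpoints are saved (at steps 5000 and 10000). If you change the batch size or step count, rerunning the arithmetic helps estimate how much space the output folder will need.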

Step 4: Monitoring Your Fine-Tuning Job

Locating the Job: Once created, navigate back to the Batch Jobs section to find your job.
Viewing Logs: When the job status shows Running, you can monitor its progress by clicking Log under the Run Command section.
Handling Errors: If the job encounters an issue, check the logs to pinpoint the problem. You can easily adjust and retry the task by clicking Clone to replicate the job with new settings.

Pro Tip: Use the terminal to interact with a specific job replica. Click the terminal icon next to the replica to inspect or troubleshoot the running job.
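
From the replica terminal, one useful check is whether checkpoints are actually landing in the mounted folder. The sketch below assumes the checkpoint-<step> directory naming that the diffusers example script uses when --checkpointing_steps is set; the function name latest_checkpoint is hypothetical.

```shell
# Print the checkpoint directory with the highest step number.
# Assumes directories named checkpoint-<step> inside the output directory.
latest_checkpoint() {
  local dir="${1:-/workspace/sdxl-finetune}"
  local best="" best_step=-1 path step
  for path in "$dir"/checkpoint-*; do
    [ -d "$path" ] || continue
    # Strip everything up to and including "checkpoint-" to get the step.
    step="${path##*checkpoint-}"
    # Skip entries whose suffix is not a plain number.
    case "$step" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$step" -gt "$best_step" ]; then
      best_step="$step"
      best="$path"
    fi
  done
  if [ -z "$best" ]; then
    echo "no checkpoints found in $dir" >&2
    return 1
  fi
  echo "$best"
}
```

Running `latest_checkpoint` with no arguments inspects /workspace/sdxl-finetune. Steps are compared numerically, so checkpoint-10000 is correctly ranked after checkpoint-5000. If a job fails partway through, the example script's --resume_from_checkpoint=latest option may let a cloned job continue from the last saved checkpoint instead of starting over.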

That's it! You've now set up and launched a fine-tuning job using Lepton's Job module. When the job completes, the fine-tuned model will be in the sdxl-finetune folder you created in Step 1. Enjoy the streamlined workflow for long-running tasks!