Fine-Tuning with PEFT on a Lepton Batch Job
Lepton's Batch Job component simplifies the execution of one-time tasks, making it a powerful tool for efficient model fine-tuning. This guide walks you through integrating PEFT with Lepton to enhance productivity and streamline your workflow.
Step 1: Preparing Your PEFT Fine-Tuning Script

Here we will build our own fine-tuning script for the LLaMA 3 8B model. Let's save this file as `peft-finetune.py`:
```python
import os

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import (
    prepare_model_for_kbit_training,
    LoraConfig,
    get_peft_model,
)
from trl import SFTTrainer


def prepare_dataset():
    """
    Load and process the OpenOrca dataset for supervised fine-tuning.

    The dataset must be formatted with a 'text' field containing the
    conversation in the correct chat template format for Llama 3.
    """
    # Load dataset from the Hugging Face Hub
    dataset = load_dataset("Open-Orca/OpenOrca")

    def format_chat(example):
        formatted_text = (
            "<|begin_of_text|>"
            "<|start_header_id|>system<|end_header_id|>\n\n"
            "You are a helpful AI assistant.\n"
            "<|eot_id|>"
            "<|start_header_id|>user<|end_header_id|>\n\n"
            f"{example['question']}\n"
            "<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
            f"{example['response']}<|eot_id|>"
        )
        return {"text": formatted_text}

    def filter_samples(example):
        # Drop samples that exceed the context budget; uses the
        # module-level tokenizer defined below.
        formatted_text = format_chat(example)["text"]
        tokens = tokenizer(formatted_text, return_length=True)["length"][0]
        return tokens <= 4096

    # Use a subset for demonstration/testing
    dataset = dataset["train"].select(range(10000))

    # Apply filtering and formatting
    filtered_dataset = dataset.filter(filter_samples)
    processed_dataset = filtered_dataset.map(
        format_chat,
        remove_columns=dataset.column_names,
    )
    return processed_dataset


# Initialize the tokenizer with proper settings for Llama 3
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    token=os.environ.get("HUGGING_FACE_HUB_TOKEN"),
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure 4-bit quantization for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
)

# Load the base model with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=os.environ.get("HUGGING_FACE_HUB_TOKEN"),
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_cache=False,  # KV cache is incompatible with gradient checkpointing
)

# Prepare the dataset (must run after the tokenizer is created)
train_dataset = prepare_dataset()

# Print a sample to verify the format
print("\nSample formatted text:")
print(train_dataset[0]["text"])

# Configure LoRA parameters
lora_config = LoraConfig(
    r=32,           # Rank dimension
    lora_alpha=64,  # Alpha parameter for LoRA scaling
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
)

# Prepare the model for k-bit (QLoRA) training
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Define the training configuration
training_args = TrainingArguments(
    output_dir="./llama3-8b-openorca-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=10,
    learning_rate=1e-4,
    weight_decay=0.05,
    fp16=False,
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    save_strategy="steps",
    save_steps=100,
)

# Note: depending on your trl version, SFTTrainer may also accept (or
# require) a tokenizer/processing_class argument and
# dataset_text_field="text"; recent versions pick up the "text" column
# by default.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    args=training_args,
)

# Start training
trainer.train()

# Save the final model (LoRA adapter weights only)
trainer.model.save_pretrained("./llama3-8b-openorca-lora-final")
```
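Note that `save_pretrained` on a PEFT model stores only the LoRA adapter weights, not a full model. A minimal sketch of loading the trained adapter back for inference (the paths assume the defaults used in the script above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Meta-Llama-3-8B"
adapter_path = "./llama3-8b-openorca-lora-final"  # where the script saved the adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Optionally fold the adapter into the base weights so inference
# no longer needs peft at runtime
model = model.merge_and_unload()
```

Merging is optional; keeping the adapter separate lets you swap adapters over the same base model, while merging gives a single set of weights for deployment.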
Important Note:
During this example fine-tuning process, the dataset is downloaded and processed dynamically. However, if you prefer, you can upload your data to workspace storage at the same directory level as this script and load the dataset from there, as sketched below.
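A minimal sketch of loading a local file instead of the hub dataset, assuming a hypothetical JSONL file with `question`/`response` fields uploaded to a `data/` subfolder next to the script:

```python
from datasets import load_dataset

# Hypothetical local file; replace with your actual filename and format
dataset = load_dataset("json", data_files="data/train.jsonl", split="train")
```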
As indicated at the end of the script, the model is saved in the script's execution path by default. Alternatively, you can specify a path for saving the model. Make sure the specified path is within the mount directory, as any storage location outside the mount path will be erased upon job completion.
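For example, given the mount point configured in Step 4 below, the final save line could target the mounted path so the adapter persists after the job ends:

```python
# Saving under the mount point keeps the result in workspace storage
trainer.model.save_pretrained("/workspace/peft-finetune/llama3-8b-openorca-lora-final")
```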
Step 2: Setting Up Your Hugging Face Secret

In this script, we retrieve our `HUGGING_FACE_HUB_TOKEN` from the environment variables. Lepton provides a secure method for storing and managing your team's secrets. To set up your token, open your workspace and navigate to Settings, then select Secrets to access the Secrets Management page. Click New Secret, choose Hugging Face from the options, and enter your token in the provided field. This ensures your credentials are stored securely and can be accessed safely by your applications.

Tip: If you prefer using the command line, there's a quick way to add your token: `lep secret create --name HUGGING_FACE_HUB_TOKEN --value <Your_Hugging_Face_Token>`
Step 3: Setting Up the Input and Output Directory for the Fine-Tuning Task

First, create a new folder in your Lepton workspace file system. This folder will serve as the input and output directory where all data, scripts, and fine-tuning results are stored.

- Access your workspace, then select "Utilities" from the navigation bar and click "Storage" in the dropdown. Alternatively, you can access your file system directly by clicking here.
- In the action bar, click "New Folder" and name it according to your task. For instance, we will name ours `peft-finetune`.

  Tip: You could also use `lep login` and `lep storage mkdir peft-finetune` to create this folder in your storage system.

- In this `peft-finetune` folder, click "Upload File" to upload your `peft-finetune.py`. Hint: You can also create a dedicated data folder within the fine-tune directory to upload your dataset files for easy organization and data import.

  Tip: You could also use `lep storage upload /path/to/your/local/peft-finetune.py /peft-finetune/peft-finetune.py` to upload this script to your storage system.
Step 4: Configuring Your Job

- Navigate to the Create Batch Job page.
- Assign a descriptive name to your job, such as `peft-finetune`, and select an appropriate resource type based on your task requirements. For LLaMA 3 8B fine-tuning, resource options include A100 or H100, depending on the model, dataset, and workload. For this example, we'll use H100.
- Navigate to Advanced Settings and locate the File System Mount section. Click the Add File System Mount button. For "Mount from", select the folder you created earlier (`peft-finetune` in this example). For "Mount as", enter `/workspace/peft-finetune`.
- Under "Advanced configuration", add your `HUGGING_FACE_HUB_TOKEN` by clicking "Environment variable", selecting "Secret", and choosing your previously created `HUGGING_FACE_HUB_TOKEN` from the dropdown menu.



This configuration ensures that your `peft-finetune` folder is accessible from the job's working directory during command execution, enabling seamless navigation and access to its contents.
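Before launching the full fine-tune, you can optionally sanity-check the mount and the secret with a trivial run command such as the following (illustrative only; the secret's value is never printed):

```bash
# List the mounted folder and confirm the secret is injected
ls /workspace/peft-finetune
python -c "import os; print('token set:', bool(os.environ.get('HUGGING_FACE_HUB_TOKEN')))"
```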
Step 5: Specifying the Job Command

In the Run Command section, specify the command to execute the job. Since this job relies on a script, start by ensuring that all necessary dependencies for the script are installed beforehand. After that, navigate to the script's directory with `cd peft-finetune` and execute the script using `python peft-finetune.py`. This ensures that the fine-tuning results are saved within the `peft-finetune` folder.

Example Run Command for This Job:

```bash
# accelerate is needed for device_map="auto" and quantized loading
pip install transformers datasets peft trl bitsandbytes accelerate --upgrade
cd peft-finetune
python peft-finetune.py
```
After entering the command, click on Create to launch the fine-tuning job.
Step 6: Monitoring Your Fine-Tuning Job
The fine-tuning job might take a couple of hours to complete, depending on the complexity of the task and the resources allocated. You can monitor the job status and logs to track its progress.
- Check Logs: When the job status shows Running, you can monitor its progress by clicking Logs, and view the logs of each replica to track the fine-tuning process.
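Because the script checkpoints every 100 steps into the mounted folder, you can also gauge progress from outside the job by listing the output directory in workspace storage. A hedged example, assuming your CLI version provides an `ls` subcommand alongside the `mkdir` and `upload` commands used earlier:

```bash
# Checkpoints appear as checkpoint-100, checkpoint-200, ... as training progresses
lep storage ls /peft-finetune/llama3-8b-openorca-lora
```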