Fine-Tuning with Unsloth on Lepton Job
The Job component in Lepton streamlines one-time tasks, making it ideal for efficient model fine-tuning workflows. This guide demonstrates its integration with unsloth to boost team productivity.
As an example, we’ll use this unsloth fine-tuning notebook to demonstrate how seamlessly the Job component fine-tunes the llama3-8b model.
Step 1: Write your unsloth finetune script
Here we will build our finetune script based on this unsloth fine-tuning notebook
lets save this file as unsloth-finetune.py
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from datasets import load_dataset
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/Meta-Llama-3.1-8B-bnb-4bit", # Llama-3.1 15 trillion tokens model 2x faster!
"unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
"unsloth/Meta-Llama-3.1-70B-bnb-4bit",
"unsloth/Meta-Llama-3.1-405B-bnb-4bit", # We also uploaded 4bit for 405b!
"unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
"unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
"unsloth/mistral-7b-v0.3-bnb-4bit", # Mistral v3 2x faster!
"unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"unsloth/Phi-3.5-mini-instruct", # Phi-3.5 2x faster!
"unsloth/Phi-3-medium-4k-instruct",
"unsloth/gemma-2-9b-bnb-4bit",
"unsloth/gemma-2-27b-bnb-4bit", # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/Meta-Llama-3.1-8B",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
# token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
model,
r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
instructions = examples["instruction"]
inputs = examples["input"]
outputs = examples["output"]
texts = []
for instruction, input, output in zip(instructions, inputs, outputs):
# Must add EOS_TOKEN, otherwise your generation will go on forever!
text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
texts.append(text)
return { "text" : texts, }
pass
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = False, # Can make training 5x faster for short sequences.
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 5,
# num_train_epochs = 1, # Set this for 1 full training run.
max_steps = 60,
learning_rate = 2e-4,
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
report_to = "none", # Use this for WandB etc
),
)
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
trainer_stats = trainer.train()
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
Important Note:
During this example fine-tuning process, the dataset is created dynamically. However, if you prefer, you can upload your data to the workspace storage and place it at the same directory level as this script. You can then load the dataset from there in your script.
As indicated in the last two lines of code, the model will be saved in the script’s execution path by default. Alternatively, you can specify an path for saving the model. Make sure the specified path is within the mount directory, as any storage locations outside the mount path will be erased upon job completion.
Step 2: Setting Up the Input and Output Directory for Fine-Tuning Task
First, create a new folder in your Lepton workspace file system. This folder will serve as the input and output directory where all data, scripts, fine-tuning results are stored.
-
Access your workspace, then select "Utilities" from the navigation bar and click "Storage" in the dropdown. Alternatively, you can access your filesystem directly by clicking here.
-
In the action bar, click "New Folder" and name it according to your task. For instance, I will name mine
finetune
.
Tip You could also use lep login
and lep storage mkdir finetune
to create this folder in your storage system.
- In this
finetune
folder you could clickupload file
to upload yourunsloth-finetune.py
. hint You can also create a dedicated data folder within the finetune directory to upload your dataset files for easy organization and data import.
Tip You could also use lep storage upolad /path/to/your/local/unsloth-finetune.py /finetune/unsloth-finetune.py
to upload this script to your storage system.
Step 3: Configuring Your Job
- Navigate to the Create Batch Job page.
- Assign a descriptive name to your job, such as
unsloth-finetune
, and select an appropriate resource type based on your task requirements. For LLaMA-3 8B fine-tuning, resource options may include A100, or H100, depending on the model, dataset and workload. For this example, we’ll use H100. - Navigate to Advanced Settings and locate the File System Mount section. Click the Add File System Mount button. For Mount from, select the folder name you created earlier (e.g.,
finetune
in this example). For Mount as, enter/workspace/finetune
.
This configuration ensures that your finetune
folder is accessible as the working directory during command execution, enabling seamless navigation and access to its contents.
Step 4: Specifying the Job Command
In the Run Command section, specify the command to execute the job. For this example, Since the job relies on a script, start by ensuring that all necessary dependencies for the script are installed beforehand. After that navigating to the script’s directory with cd finetune
. Then, execute the script using python finetune.py
, This ensures that the fine-tuning results are saved within the finetune
folder.
Example Run Command for This Job:
pip install unsloth
pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install trl
pip install transformers
pip install unsloth
pip install datasets
pip install torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
cd /workspace/finetune
python finetune.py
Please note that here we used the following command to install specific versions of torchvision
and torchaudio
:
pip install torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
This is necessary because the default image includes torch==2.2.0
, while unsloth
requires torch>=2.4.0
. Consequently, torch==2.5.1
was installed, requiring compatible versions of torchvision
and torchaudio
.
In the future, you may need to update this command to align with the latest torch
version.
After entering the command, click on Create to launch the fine-tuning job.
Step 5: Monitoring Your Fine-Tuning Job
The fine-tuning job might take a couple of hours to complete, depending on the complexity of the task and the resources allocated. You can monitor the job status and logs to track its progress.
-
Check Logs: When the job status shows Running, you can monitor its progress by clicking Logs, and view the logs of each replica to track the fine-tuning process.