LlamaFactory Blog

Issues Related to the Qwen3-VL Model

This blog post focuses on several practical issues encountered with the Qwen3-VL model, along with an analysis of their root causes and the corresponding solutions.

1. Slow Training and Inference Speed of Qwen3-VL

Problem: Several posts and GitHub issues report that under torch=2.9, the training and inference speed of Qwen3-VL models that use Conv3D degrades significantly compared to torch=2.8. See the related discussion at: https://github.com/pytorch/pytorch/issues/166122

1.1 Comparing CUDA Kernel Invocations

We first compared the CUDA kernels invoked by Conv3D under torch=2.8 and torch=2.9. The test code is shown below: ...
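The post's actual test code is truncated in this excerpt. As a rough sketch of how such a kernel comparison could be done (not the author's code), the snippet below profiles the CUDA kernels dispatched by a single Conv3D forward pass using torch.profiler; the layer and input shapes are illustrative, not the exact ones from the post.

```python
# Hedged sketch: profile the CUDA kernels dispatched by one Conv3D forward
# pass, to compare the output under torch==2.8 and torch==2.9.
import torch
from torch.profiler import profile, ProfilerActivity

conv = torch.nn.Conv3d(3, 64, kernel_size=3, padding=1).cuda().half()
x = torch.randn(1, 3, 16, 224, 224, device="cuda", dtype=torch.half)

# Warm up so cuDNN autotuning does not pollute the measurement.
for _ in range(3):
    conv(x)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    conv(x)
    torch.cuda.synchronize()

# Per-kernel CUDA times; run under both torch versions and diff the tables.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```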

January 5, 2026 · 3 min · 490 words · hiyouga

RL-DPO Training with KTransformers and LLaMA-Factory

This tutorial demonstrates how to fine-tune a language model with the LLaMA-Factory framework using Direct Preference Optimization (DPO). DPO is a preference-based training method that aligns model outputs more closely with human expectations, making them more user-centric.

1 Environment Setup

Software and hardware requirements: the CPU must support AMX, the system glibc version must be ≥ 2.32, and a GPU with at least 32 GB of VRAM is recommended. ...
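For context on what DPO optimizes, here is a minimal sketch of the standard DPO loss (Rafailov et al., 2023). It is not LLaMA-Factory's or KTransformers' implementation, and it assumes per-sequence log-probabilities have already been computed elsewhere.

```python
# Minimal sketch of the standard DPO objective, not LLaMA-Factory's code.
# Inputs are summed per-sequence log-probs of the chosen/rejected responses
# under the trainable policy and the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward: log-ratio of policy vs. reference for each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the chosen-vs-rejected margin via a logistic loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probs for a batch of 4 preference pairs.
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()))
```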

December 23, 2025 · 4 min · 811 words · hiyouga

Add New Special Tokens for Model Training

1 Introduction

This post uses the Ministral-3-3B-Instruct-2512 model, taking an image classification task fine-tuned via SFT as an example, to illustrate how to add new special tokens. The experimental command is as follows:

```
# install newest transformers
pip install git+https://github.com/huggingface/transformers
DISABLE_VERSION_CHECK=1 CUDA_VISIBLE_DEVICES=7 python src/train.py examples/train_lora/ministral3_lora_sft.yaml
```

The file ministral3_lora_sft.yaml must be configured in advance.

2 Dataset Loading and Preprocessing

In the file LLaMA-Factory/src/llamafactory/data/loader.py, the get_dataset function is responsible for loading the dataset and preprocessing the data with the tokenizer. ...
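For orientation, the snippet below shows the standard Hugging Face pattern for registering new special tokens and resizing the embedding matrix to match. The token strings and checkpoint name are placeholders for illustration, not the exact ones used in the post.

```python
# Hedged sketch of the usual transformers pattern for adding special tokens;
# the checkpoint and token strings below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Ministral-8B-Instruct-2410"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the new special tokens with the tokenizer.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<image_cls>", "<image_sep>"]}
)

# Grow the embedding matrix so the new token ids have rows to look up.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```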

December 17, 2025 · 9 min · 1732 words · hiyouga

Adapting a New Model in LLaMA-Factory

1 Overview of Model Adaptation

LLaMA-Factory offers a complete framework for model pre-training, fine-tuning, and inference. To adapt a new model, only a small amount of code needs to be modified to integrate it into LLaMA-Factory.

First, the file LLaMA-Factory/src/llamafactory/extras/constants.py defines the supported model groups and their corresponding templates. A template is a “format specifier” used when constructing the input prompt for the large model: it defines the dialogue format, field structure, role order, and the format for tool calls. For example: ...
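To make the “format specifier” idea concrete, here is a toy illustration of how a template turns a role-tagged conversation into a prompt string. This dataclass is hypothetical and is not LLaMA-Factory's actual Template class or registration API.

```python
# Toy illustration of a chat template as a "format specifier"; hypothetical,
# not LLaMA-Factory's real Template implementation.
from dataclasses import dataclass

@dataclass
class ToyTemplate:
    system_format: str = "<|system|>\n{content}\n"
    user_format: str = "<|user|>\n{content}\n"
    assistant_format: str = "<|assistant|>\n{content}\n"

    def render(self, messages):
        # Each message is a (role, content) pair; pick the per-role format.
        parts = []
        for role, content in messages:
            fmt = getattr(self, f"{role}_format")
            parts.append(fmt.format(content=content))
        return "".join(parts)

template = ToyTemplate()
print(template.render([
    ("system", "You are a helpful assistant."),
    ("user", "Hello!"),
]))
```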

December 12, 2025 · 8 min · 1664 words · hiyouga

Code Guide for the LLaMA-Factory Project

1 Introduction to the LLaMA-Factory Project

LLaMA-Factory is an efficient training and fine-tuning framework designed for large language models (LLMs). It aims to simplify the training workflow of the LLaMA family as well as a wide range of other open-source large models. Built around the core philosophy of being “out-of-the-box, flexible, and efficient,” it provides an end-to-end solution covering data preparation, parameter-efficient fine-tuning (PEFT), training configuration management, and model deployment. LLaMA-Factory supports multiple mainstream model architectures (such as LLaMA, Qwen, Gemma, and Mistral) and integrates lightweight training techniques including LoRA, QLoRA, AdaLoRA, and Prompt Tuning. These capabilities enable developers to fine-tune high-quality models at extremely low cost, in both single-GPU and multi-GPU environments. ...
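As a hedged illustration of the LoRA technique mentioned above (not LLaMA-Factory's own code path), the sketch below wraps a base model with a peft LoraConfig so that only the low-rank adapter weights are trained; the checkpoint name and hyperparameters are illustrative.

```python
# Hedged sketch of LoRA via the peft library; checkpoint and hyperparameters
# are illustrative, and this is not LLaMA-Factory's internal implementation.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```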

December 5, 2025 · 11 min · 2277 words · hiyouga