LlamaFactory x MindSpore HyperParallel Community Collaboration Roadmap

MindSpore Community · HyperParallel SuperNode Parallel Library · Version: v1.0 | Updated: 2026-03-30

Vision: HyperParallel is a new supernode parallel training architecture proposed by the MindSpore Community, dedicated to simplifying Ascend supernode programming and unlocking its computing potential. We aim to collaborate with the LlamaFactory ecosystem to provide an easy-to-use, high-performance distributed training solution, enabling every developer to train large models efficiently on Ascend NPU and NVIDIA GPU while lowering the barrier and cost of large model training. ...

March 30, 2026 · 5 min · 920 words · hiyouga

MindSpore HyperParallel FSDP2 Training on Ascend with LlamaFactory

LlamaFactory + MindSpore HyperParallel

We integrated HyperParallel from the MindSpore community into LlamaFactory as an FSDP2 backend, supporting both Ascend NPU and NVIDIA GPU. Just one extra config line on top of the FSDP2 workflow gets you started.

Quick Start

1. Environment Installation

```shell
# Install HyperParallel
git clone https://gitcode.com/mindspore/hyper-parallel
cd hyper-parallel
pip install -e .

# Install LlamaFactory
git clone https://github.com/hiyouga/LlamaFactory.git
cd LlamaFactory
pip install -e ".[torch,metrics]" --no-build-isolation

# Install PyTorch
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1

# Optional: install torch-npu for Ascend NPU
pip install torch-npu==2.7.1
```

2. Configuration

HyperParallel training requires two config files: an Accelerate FSDP2 config and a LlamaFactory training config. ...
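For orientation, an Accelerate FSDP2 config generally looks like the sketch below. This is an illustration only, not the post's actual file: the field names follow Accelerate's FSDP config schema (the `fsdp_version: 2` key selects FSDP2 in recent Accelerate releases), but the exact keys depend on your Accelerate version, and the extra HyperParallel-specific line is described in the full post rather than shown here.

```yaml
# Sketch of an Accelerate FSDP2 config (field names per Accelerate's
# FSDP schema; values here are illustrative assumptions).
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
num_processes: 8
mixed_precision: bf16
fsdp_config:
  fsdp_version: 2                               # selects FSDP2
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_reshard_after_forward: true
```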

March 24, 2026 · 3 min · 470 words · LlamaFactory Team

Fine-Tuning the Latest Qwen3.5 Model to Identify Humanoid Robot Models Using LlamaFactory

At the beginning of 2026, from the Consumer Electronics Show (CES) in Las Vegas to the China Central Television (CCTV) Spring Festival Gala, China's self-developed humanoid robots have repeatedly broken into the mainstream. Products and applications from multiple Chinese enterprises have not only sparked discussion in the overseas industry but have also swept global social media and international media coverage. Embodied intelligence, widely regarded as the next stage of artificial intelligence, centers on deeply coupling an intelligent "brain" with a physical "body," directly turning data, algorithms, and computing power into the ability to act on and transform the physical world. Humanoid robots, with their human-like form and function, are considered the highest form and optimal carrier of embodied intelligence, poised to become the next-generation super terminal after smartphones and new energy vehicles. ...

March 3, 2026 · 7 min · 1414 words · hiyouga

Issues Related to the Qwen3-VL Model

This blog post examines several practical issues with the Qwen3-VL model, along with an analysis of their root causes and corresponding solutions.

1. Slow Training and Inference Speed of Qwen3-VL

Problem: several posts and GitHub issues report that with torch 2.9, training and inference of Qwen3-VL using Conv3D degrade significantly compared to torch 2.8. See the related discussion at: https://github.com/pytorch/pytorch/issues/166122

1.1 Comparing CUDA Kernel Invocations

We first compared the CUDA kernel calls of Conv3D under torch 2.8 and torch 2.9. The test code is shown below: ...
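The post's actual test code is in the full article; as a rough sketch of how such a kernel comparison can be set up with `torch.profiler` (my illustration, not the author's code), one can profile a single Conv3D forward pass under each torch version and diff the kernel tables. On a CUDA build, add `ProfilerActivity.CUDA` to `activities` to capture the GPU kernels; this CPU-only variant runs anywhere torch is installed.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Illustrative sketch: profile the operators behind one Conv3D forward pass.
# Run the same script under torch 2.8 and 2.9 and compare the tables.
conv = torch.nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)
x = torch.randn(1, 3, 16, 32, 32)  # (N, C, D, H, W)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    y = conv(x)

print(y.shape)  # torch.Size([1, 8, 14, 30, 30])
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```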

January 5, 2026 · 3 min · 490 words · hiyouga

RL-DPO Training with KTransformers and LLaMA-Factory

This tutorial demonstrates how to fine-tune a language model with the LLaMA-Factory framework using Direct Preference Optimization (DPO). DPO trains directly on human preference data, aligning model outputs more closely with human expectations.

1 Environment Setup

Software and hardware requirements: the CPU must support AMX, the system glibc version must be ≥ 2.32, and a GPU with at least 32 GB of VRAM is recommended. ...
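The requirements above can be pre-flighted from the shell. This is a sketch assuming a Linux host: the `amx_tile` CPU flag in `/proc/cpuinfo` indicates AMX support, `ldd --version` reports the glibc version, and `nvidia-smi` (if present) reports GPU VRAM.

```shell
# 1. CPU must support AMX ('amx_tile' appears in the flags list on AMX CPUs)
if grep -qw amx_tile /proc/cpuinfo; then
  echo "AMX: supported"
else
  echo "AMX: not supported"
fi

# 2. glibc must be >= 2.32 (version is in the first line of ldd's output)
ldd --version | head -n 1

# 3. GPU VRAM: at least 32 GB recommended (skipped if no NVIDIA driver)
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=memory.total --format=csv \
  || echo "nvidia-smi not found"
```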

December 23, 2025 · 4 min · 811 words · hiyouga