Skip to content

hiyouga/LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

LLaMA-Factory is an open-source Python framework for fine-tuning over 100 large language models and vision-language models. It supports LoRA, QLoRA, RLHF, and instruction tuning from a single interface, reducing model-specific training setup to a config file and one command in 2026.

71,447 stars8,718 forksPythonUpdated May 2026
✅ Reviewed by My AI Guide, vetted for vibe builders

Our Review

When Yaowei Zheng published LLaMA-Factory at ACL 2024, he was solving a specific friction point that every ML team hit when fine-tuning open-source models: each base model came with its own training scripts, different LoRA handling, different tokenizer edge cases, and different hardware requirements. Maintaining a separate fine-tuning codebase per model family did not scale. LLaMA-Factory collapsed that into one framework -- a single Python library that fine-tunes Llama 3, Qwen3, DeepSeek, Gemma, Mistral, and 95+ other models with the same configuration format and the same training loop.

Key capabilities

  • 100+ model support: fine-tunes Llama 3, Qwen3, DeepSeek-R1, Gemma, Mistral, Phi, and more from a unified YAML config format
  • Multiple training methods: full fine-tuning, LoRA, QLoRA, DPO, PPO/RLHF, reward modeling, and instruction tuning from the same codebase
  • WebUI: a Gradio-based web interface covers dataset management, training launch, evaluation, and model export without writing any Python
  • Multi-GPU and distributed training: DeepSpeed and FSDP integration for distributed training across multiple GPUs or nodes
  • Dataset flexibility: supports Alpaca-format, ShareGPT-format, and custom datasets; compatible with the Hugging Face dataset ecosystem
  • Inference export: fine-tuned models export to formats compatible with vLLM, Ollama, and other inference engines without manual conversion

Getting started

pip install llamafactory. Load your dataset in Alpaca or ShareGPT format, configure your base model and LoRA rank in a YAML file, then run llamafactory-cli train. The built-in WebUI at llamafactory-cli webui handles dataset management and training launch for teams who prefer a no-code interface.

Limitation

Requires a CUDA-capable GPU for meaningful training -- QLoRA on a 7B model needs at least 8GB VRAM; full fine-tuning a 13B model needs 40GB+ even with DeepSpeed. The framework's breadth means some model-specific edge cases are handled in community PRs rather than core maintainer patches. Dataset preparation remains the main time investment -- config files are simple, but constructing good instruction-tuning datasets requires domain expertise and data quality work.

Our Verdict

LLaMA-Factory earned its ACL 2024 publication and 71,000 GitHub stars by solving a real tooling problem: model-specific fine-tuning scripts fragment a team's workflow and make switching base models unnecessarily expensive. The unified config format -- specify the model, the dataset, the training method (LoRA, DPO, RLHF), and hardware -- and get a training run that works across 100+ models is a genuine productivity improvement that ML teams building custom LLMs in 2026 notice immediately.

The WebUI is the feature that extends accessibility beyond ML engineers. A product team that wants to experiment with instruction tuning on a custom dataset but does not have a dedicated ML engineer can use the Gradio interface to manage datasets, launch training runs, and export models without writing Python. That accessibility is part of why adoption spans research labs, startups, and enterprise AI teams simultaneously.

The practical ceiling is hardware. Fine-tuning remains GPU-bound, and LLaMA-Factory does not change that physical constraint. QLoRA makes 7B models tractable on consumer hardware, but anything larger demands access to multi-GPU cloud instances. For teams without dedicated GPU infrastructure, the cloud GPU costs are the real budget line -- LLaMA-Factory handles the tooling side cleanly, but it cannot abstract away the compute requirement in 2026.

Frequently Asked Questions

What is LLaMA-Factory and which models does it support?

LLaMA-Factory is an open-source Python framework for fine-tuning large language models and vision-language models. It supports over 100 base models including Llama 3, Qwen3, DeepSeek-R1, Gemma, Mistral, Phi, Falcon, and more, all from the same configuration format. Published at ACL 2024 by Yaowei Zheng, it has become the standard unified fine-tuning toolkit for open-source LLMs in 2026.

What is the difference between LoRA, QLoRA, and full fine-tuning in LLaMA-Factory?

Full fine-tuning updates all model weights -- highest accuracy, highest GPU memory requirement (40GB+ for 13B models). LoRA (Low-Rank Adaptation) adds small trainable matrices to the model while keeping most weights frozen -- 8GB+ VRAM for 7B models with good results. QLoRA combines LoRA with 4-bit quantization -- fine-tune a 7B model in 6-8GB VRAM at a small accuracy tradeoff. LLaMA-Factory supports all three methods; QLoRA is the most popular starting point in 2026 for teams without large GPU clusters.

What hardware do I need to fine-tune a 7B or 13B model with LLaMA-Factory?

For a 7B model: QLoRA requires 8GB+ VRAM (a single consumer RTX 3090 or 4090), LoRA requires 16GB+ VRAM, and full fine-tuning requires 40GB+ VRAM. For a 13B model: QLoRA requires 12-16GB VRAM, full fine-tuning requires 80GB+. Cloud GPU options (Vast.ai, RunPod, Lambda Labs) are commonly used for 13B+ experiments. LLaMA-Factory's multi-GPU support via DeepSpeed allows distributing larger models across multiple consumer GPUs in 2026.

Can I fine-tune a model without writing Python code?

Yes. LLaMA-Factory includes a Gradio-based WebUI that covers the full workflow -- uploading datasets, configuring training parameters, launching training runs, monitoring progress, evaluating the model, and exporting the result. Run llamafactory-cli webui to start it. The WebUI is useful for experimentation and for teams without ML engineering resources who want to run instruction tuning on a custom dataset in 2026.

How does LLaMA-Factory compare to other fine-tuning frameworks?

LLaMA-Factory's main advantage is breadth -- 100+ models, every major training method (LoRA, QLoRA, DPO, RLHF), and the WebUI, all in one framework. Unsloth is faster for LoRA/QLoRA training on specific architectures (2-5x speed improvement) but supports fewer models and methods. Axolotl is more configurable for power users but has a steeper setup curve. For most teams starting with fine-tuning in 2026, LLaMA-Factory is the lowest-friction starting point that doesn't sacrifice capability.

What is LLaMA-Factory?

LLaMA-Factory is an open-source Python framework for fine-tuning over 100 large language models and vision-language models. It supports LoRA, QLoRA, RLHF, and instruction tuning from a single interface, reducing model-specific training setup to a config file and one command in 2026.

How do I install LLaMA-Factory?

Visit the GitHub repository at https://github.com/hiyouga/LLaMA-Factory for installation instructions.

What license does LLaMA-Factory use?

LLaMA-Factory uses the Apache-2.0 license.

What are alternatives to LLaMA-Factory?

Explore related tools and alternatives on My AI Guide.

🔒

Open source & community-verified

Apache-2.0 licensed: free to use in any project, no strings attached. 71,447 developers have starred this, meaning the community has reviewed and trusted it.

Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.

Topics

fine-tuningllamallmpefttransformersrlhfqloraquantizationloralarge-language-modelsinstruction-tuningdeepseekgemma

Related Tools

View all