infiniflow/ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
RAGFlow is an open-source RAG engine that pairs deep-document understanding with agentic retrieval. It parses PDFs, spreadsheets, and images with layout awareness, then routes queries through configurable agent pipelines to deliver grounded, citation-backed answers from your documents.
Our Review
The most common failure mode in production RAG systems is poor document chunking. Splitting a PDF at 512-character limits discards table structure, caption-figure relationships, and document hierarchy -- exactly the contextual metadata that makes retrieved passages interpretable by an LLM. RAGFlow was built to fix this at the parsing layer, before any retrieval happens, treating each document type as a structured object: tables become relational data, figures are paired with their captions, and headings preserve the document's information hierarchy.
Key capabilities
- •Layout-aware document parsing: tables, headers, captions, and figures are extracted as structured elements, not raw text blobs
- •Agentic retrieval: queries route through configurable agent pipelines that can rewrite, rerank, or expand searches before answering
- •Multiple retrieval strategies: sparse, dense, and hybrid retrieval modes let you tune precision vs. recall per knowledge base
- •Multi-document knowledge bases: connect PDFs, web pages, Notion exports, S3 buckets, or databases as named knowledge bases
- •Citation grounding: every answer includes source citations with page references so users can verify claims
- •REST API and web UI: a built-in admin interface manages knowledge bases; the REST API integrates with any LLM application
Getting started
Deploy RAGFlow via Docker Compose: clone infiniflow/ragflow, copy the .env template, and run docker compose up. The web UI launches on port 80. Add knowledge base documents via drag-and-drop, wait for the ingestion pipeline, then query via the chat interface or REST API.
Limitation
RAGFlow's deep parsing pipeline is computationally heavy -- processing large PDFs (500+ pages) or image-heavy documents can take several minutes per file. Self-hosting requires at minimum 16GB RAM for the full stack. Teams without Docker expertise may find initial setup non-trivial, and cloud-hosted options are limited compared to managed RAG services.
Our Verdict
RAGFlow addresses the most common failure mode in RAG systems -- bad chunking. Most RAG pipelines split documents on arbitrary character limits, losing tables, captions, and document hierarchy in the process. RAGFlow's layout-aware parsing preserves structure, which means retrieved context is more complete and answers are more accurate. The 80,000+ GitHub stars in 2026 validate that developers are finding real quality improvements over simpler pipelines.
For teams building knowledge bases from mixed-format documents -- PDFs, spreadsheets, scanned images -- RAGFlow is one of the strongest open-source options available. The agentic retrieval layer (query rewriting, reranking, multi-hop search) pushes accuracy further than single-pass dense retrieval.
The trade-offs are real: this is not a lightweight tool. Infiniflow targets teams building production AI applications that need accurate retrieval, not hobbyists experimenting with a single PDF. If your use case is simple document Q&A on a handful of files, a lighter tool like LangChain's document loaders or Chroma may serve better. RAGFlow earns its complexity when document fidelity at scale matters.
Frequently Asked Questions
What makes RAGFlow different from other RAG frameworks?
RAGFlow uses layout-aware document parsing instead of character-limit chunking. When it processes a PDF, it extracts tables as structured data, preserves caption-figure relationships, and maintains document hierarchy. This produces higher-quality retrieved context compared to chunking approaches that lose document structure, resulting in more accurate LLM answers in 2026.
What file types does RAGFlow support?
RAGFlow supports PDFs, Word documents, PowerPoint files, Excel spreadsheets, plain text, Markdown, HTML, images (JPG, PNG), and can connect to web URLs and S3 buckets. Each file type uses a specialized parser that preserves format-specific structure. Tables from Excel and PDFs are extracted as structured data, not flattened text.
How much hardware does RAGFlow require to self-host?
The full RAGFlow stack requires at minimum 16GB RAM and 50GB disk space. A CPU-only setup works but ingestion of image-heavy PDFs is slow. A GPU with 8GB VRAM (NVIDIA T4 or better) speeds up document embedding and OCR significantly. The Docker Compose deployment bundles all services including Elasticsearch, MinIO, and the RAGFlow server.
Can RAGFlow work with any LLM?
Yes. RAGFlow connects to OpenAI, Anthropic Claude, Gemini, Azure OpenAI, and any OpenAI-compatible endpoint including self-hosted models via Ollama or LM Studio. The LLM is used only for answer generation -- document parsing, embedding, and retrieval run on RAGFlow's own infrastructure without sending document content to external APIs.
Is RAGFlow production-ready in 2026?
RAGFlow is used in production by teams building enterprise knowledge bases, internal search tools, and customer support systems. The 80,000 GitHub stars and active release cadence indicate strong adoption. For production use, infiniflow recommends dedicated hardware with SSD storage and at least 32GB RAM. A managed cloud version is on their roadmap but not yet publicly available.
What is ragflow?
RAGFlow is an open-source RAG engine that pairs deep-document understanding with agentic retrieval. It parses PDFs, spreadsheets, and images with layout awareness, then routes queries through configurable agent pipelines to deliver grounded, citation-backed answers from your documents.
How do I install ragflow?
Visit the GitHub repository at https://github.com/infiniflow/ragflow for installation instructions.
What license does ragflow use?
ragflow uses the Apache-2.0 license.
What are alternatives to ragflow?
Explore related tools and alternatives on My AI Guide.
Open source & community-verified
Apache-2.0 licensed: free to use in any project, no strings attached. 80,908 developers have starred this, meaning the community has reviewed and trusted it.
Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.
Topics