A poetic summary:
The old RAG fades, its vector maps grow worn,
Deep Search ascends, a smarter method born.
AI crafts the code, MCP sets comms free,
Old frameworks creak; what will the new ways be?
Yet open source, a spark in shifting ground,
Promises paths where better tools are found.
In 2023, RAG was all the rage. A crop of LLM AI application frameworks and libraries emerged—Dify, LangChain, RAGFlow, FastGPT, and others—all touting the integration of RAG (Retrieval-Augmented Generation) and AI workflows. They seemed to promise a shortcut for developers to quickly build intelligent applications capable of digesting private knowledge and automating tasks, finding favor especially among B2B users. However, technology in the AI sector evolves at breakneck speed. By 2025, the surge of Deep (Re)Search and the growing traction of MCP (Model Context Protocol) are posing a significant threat, potentially undermining the very foundations of this general-purpose AI application development paradigm. The two pillars they rely on—so-called RAG and graphical workflows—seem to have encountered serious challenges.
Deep (Re)Search Enters, Casting Doubt on Vector RAG
Traditional RAG, especially the implementation heavily reliant on vector databases within frameworks like LangChain or Dify, typically operates as follows: a user inputs documents, the system chunks them, uses an embedding model to convert them into vectors, and during retrieval, fetches relevant snippets based on semantic similarity to feed the large model for answer generation. This process was once lauded as the go-to method for handling private knowledge and “augmenting” models.
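To make the contrast concrete, here is a minimal sketch of that pipeline, assuming generic `embed` and `llm` callables rather than any particular framework's API:

```python
# Minimal sketch of the traditional vector-RAG pipeline described above.
# `embed` and `llm` are placeholders for an embedding model and a chat model;
# nothing here is any specific framework's API.
import numpy as np

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real frameworks add overlap and smarter splitting.
    return [text[i:i + size] for i in range(0, len(text), size)]

class VectorStore:
    def __init__(self, embed, chunks: list[str]):
        self.embed = embed
        self.chunks = chunks
        self.vectors = np.array([embed(c) for c in chunks])  # index time: pre-compute chunk vectors

    def retrieve(self, query: str, k: int = 4) -> list[str]:
        q = np.array(self.embed(query))
        # Rank chunks by cosine similarity to the query vector.
        sims = self.vectors @ q / (
            np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9
        )
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]

def rag_answer(llm, embed, document: str, question: str) -> str:
    store = VectorStore(embed, chunk(document))
    context = "\n\n".join(store.retrieve(question))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```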
But the emergence of Deep (Re)Search has been a game-changer, directly challenging this RAG paradigm centered on vector similarity. Whether it’s Google Gemini’s Deep Research, xAI Grok’s DeepSearch, OpenAI’s deep research, or even Perplexity AI—let’s collectively call them Deep (Re)Search—they achieve remarkable results, often without relying on pre-vectorized document libraries at all, working instead primarily from keyword-based web searches. The implication is stark: vector database search, often hyped as the heart of RAG, may be not only less necessary than assumed but potentially dispensable altogether.
The technical details of Deep (Re)Search are still closely guarded secrets by the major AI players; much of our understanding comes from observing user experiences and inferring the underlying principles. It feels more like conducting research: first understanding the query, then devising precise keywords, browsing results, refining keywords if unsatisfied, and finally synthesizing the findings into a structured report. Throughout this process, vector databases find little purchase. Instead, the LLM’s own “research skills” take center stage.
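Reduced to a speculative sketch in Python, that loop looks roughly like the following; `llm` and `web_search` are placeholders, and the control flow is inferred from observed behavior, not any vendor's actual implementation:

```python
# Speculative sketch of the Deep (Re)Search loop as inferred from user-facing behavior.
# llm() and web_search() are placeholders, not any vendor's real API.

def deep_research(llm, web_search, question: str, max_rounds: int = 5) -> str:
    notes: list[str] = []
    keywords = llm(f"Devise precise search keywords for: {question}")
    for _ in range(max_rounds):
        results = web_search(keywords)       # keyword search; no vector store involved
        notes.append(llm(f"Extract what is relevant to '{question}':\n{results}"))
        verdict = llm(
            "Is the gathered material sufficient to answer the question? "
            f"If not, propose refined keywords.\nQuestion: {question}\nNotes: {notes}"
        )
        if verdict.strip().lower().startswith("sufficient"):
            break
        keywords = verdict                   # refine the keywords and search again
    return llm(f"Write a structured report answering '{question}' from:\n{notes}")
```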
This raises a critical question: To improve RAG effectiveness, should efforts focus on optimizing vector retrieval techniques (like adding reranking), or on honing the model’s intrinsic ability to conduct “research”?
The success of Deep (Re)Search strongly suggests the latter is key. Speculatively, its core technologies likely involve two aspects:
- Solidifying “reasoning paths” akin to Chain-of-Thought (CoT): As hinted by papers like Deepseek-R1’s R1-Zero training process, a model explores countless potential reasoning paths to generate a good response. For deep research, two paths are crucial: understanding user intent to formulate appropriate search terms, and digesting search results to refine those terms. This necessitates reinforcement learning (RL) during post-training, rewarding behaviors along these paths to make the model adept at the search-feedback-refine loop.
- Enforcing structured output formats: Observing Deep (Re)Search interfaces suggests the model’s output follows a relatively fixed format. If the model were allowed to output freely, inconsistent formatting would require costly post-processing or even retries, wasting compute and context window capacity. Instead of lengthy prompt instructions with JSON schemas and examples crowding the context window and VRAM, it’s more efficient to “bake” these structural requirements into the model during late-stage fine-tuning. Just as `-thinking` or `-reasoner` models might have `<think></think>` tags hard-coded, fine-tuning directly on the desired output format makes the model inherently compliant (a hypothetical example of such a baked-in format follows this list). Furthermore, even if the format itself evolves, it likely changes slower than the model, diminishing the value of prompt-based format specification flexibility in this specific scenario.
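For illustration, a single fine-tuning record with the format baked in might look like this; the tag names and report skeleton are hypothetical, and the point is only that the structure lives in the training data rather than in an inference-time prompt:

```python
# Hypothetical fine-tuning record: the output skeleton is part of the assistant
# turn itself, so no JSON schema or format instructions are needed at inference.
training_example = {
    "messages": [
        {"role": "user", "content": "Research the current state of solid-state batteries."},
        {
            "role": "assistant",
            "content": (
                "<think>Plan: define scope, search for recent manufacturers, "
                "refine toward pilot-production announcements.</think>\n"
                "## Findings\n- ...\n"
                "## Sources\n- ...\n"
                "## Conclusion\n..."
            ),
        },
    ]
}
```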
Both paths point towards fine-tuning the LLM as a necessary step. Consequently, approaches fixated on the “embedding-vector-database-similarity-retrieval-reranking” pipeline appear to be on a narrowing path. When Deep (Re)Search leverages its fine-tuned capabilities to synthesize information like a researcher, optimizing retrieval accuracy in the traditional way yields marginal gains and looks significantly less effective, almost like comparing Llama 2 to GPT-4.
Not long ago, the choice between fine-tuning an LLM with curated domain-specific data versus building a RAG vector knowledge base was debatable, especially around the time of GPT-4’s release. By the era of Llama3-70B, fine-tuning had become less popular; experience showed it was often effort-intensive with modest returns, while RAG, though less laborious, didn’t always deliver a stellar user experience. Times change. With Deep (Re)Search now ascendant, the situation seems reversed. Vendors solely focused on vector search technology, lacking the capability to fine-tune LLMs for Deep (Re)Search, might find themselves sidelined, waiting for an open-source Deep (Re)Search-optimized LLM. In retrospect, Cohere’s early commitment to developing its own LLMs alongside RAG looks like a strategically sound move.
Some AI enthusiasts might mistakenly think Deep (Re)Search is only for the public internet, while traditional RAG uses local knowledge bases, implying different use cases. This is a misconception. While public-facing Deep (Re)Search products connect to the internet, the underlying technology is not inherently limited to it. Whether the backend connects to a local document store or Google Search is decoupled from the LLM’s core capability. If a large enterprise asked OpenAI to adapt Deep (Re)Search for internal use, it would primarily be a matter of changing the data source.
Therefore, the literal meaning of RAG—Retrieval-Augmented Generation—isn’t obsolete. What is becoming outdated is the specific traditional implementation heavily tied to embedding models and vector databases by various experts and vendors. Deep (Re)Search demonstrates that optimizing the model’s own reasoning chain to iteratively retrieve information using refined keywords is far more effective than tinkering with vector databases. Simply put, Deep (Re)Search represents what RAG should look like in 2025. The future is already here—it’s just not evenly distributed.
A Better Playbook for Workflow Automation: MCP + AI Scripts
The other pillar of these AI application frameworks is workflow automation. Some tools (like Dify, RAGFlow, LangFlow) favor drag-and-drop visual interfaces to chain tasks together; others (like LangChain) lean towards defining chains or agent structures in code. Their common goal is to help users orchestrate AI capabilities for complex automation tasks. Building AI workflows boils down to two things: sufficient logical flexibility and robust connectivity to the ecosystem, including local hardware/software and remote services.
Is drag-and-drop flexible enough? No matter how sophisticated, visual workflows are ultimately constrained by the platform’s node library. If a needed node is missing, you have to roll up your sleeves and write custom code. If that’s the case… why not just write code from the start? Today, generating code with natural language is a core strength of LLMs. With such powerful tools available, scripting becomes incredibly potent, potentially making drag-and-drop feel cumbersome by comparison. Automation needs across industries are diverse and highly specific; only code scripting likely offers the “do-anything” flexibility required for these bespoke tasks. While PocketFlow’s gimmick of fitting into 100 lines of code might be amusing, its underlying idea for AI workflows is worth noting: a core library small enough to be copied into any AI chatbot’s context window, easily learnable even by older models like the original GPT-4 with its limited 4k-ish context.
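In that spirit, a workflow that a visual builder would render as three connected nodes is just a few lines of ordinary Python once an LLM writes them for you; the `llm` callable, URLs, and file names below are purely illustrative:

```python
# Illustrative "just write the script" workflow: fetch -> summarize -> save.
# The llm() callable and any URLs/paths are placeholders an AI assistant would fill in.
import json
import urllib.request

def fetch(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="ignore")

def run(llm, urls: list[str], out_path: str = "digest.json") -> None:
    # One "node" per step: fetch each page, summarize it, write the digest to disk.
    digest = {url: llm(f"Summarize in three bullet points:\n{fetch(url)[:8000]}") for url in urls}
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(digest, f, ensure_ascii=False, indent=2)
```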
More critically, how do AI applications connect to the external world? Platforms like Dify try to build their own ecosystems with plugin marketplaces; LangChain has invested heavily in creating a vast library of Integrations. Both aim to simplify how AI applications interact with external systems. However, the MCP (Model Context Protocol), an open standard championed by Anthropic, is spreading rapidly, showing signs of becoming a unifying force. MCP’s core value lies in decoupling the act of connecting to external services. Using this standard interface, AI application clients can easily interact with various local or remote resources (files, databases, APIs, hardware, etc.).
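As a concrete illustration of that decoupling, a client needs only the protocol, not any framework's plugin system. The sketch below assumes the official `mcp` Python SDK's stdio client, roughly as its quickstart documents it; the server command and tool name are hypothetical, and exact API details may vary by SDK version:

```python
# Sketch of connecting to a (hypothetical) MCP server over stdio and calling a tool.
# Assumes the official `mcp` Python SDK; exact API details may differ by version.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Any MCP-compliant server works here; files_server.py is a made-up example.
    server = StdioServerParameters(command="python", args=["files_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Call a hypothetical tool exposed by the server.
            result = await session.call_tool("read_file", arguments={"path": "notes.md"})
            print(result)

asyncio.run(main())
```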
What does this mean? It means that once a rich ecosystem of open, open-source, decentralized MCP repositories flourishes, developers or users writing automation scripts can focus on core business logic, offloading the messy details of external interactions to MCP-compliant components or services. The “connectivity” previously reliant on platform-specific plugin markets (like Dify’s) or a framework’s monolithic integration library (like LangChain’s) can now be achieved through an open standard plus diverse community implementations. This directly challenges the ecosystem approach adopted by platforms like Dify, which rely on the perceived value and stickiness of their curated plugin stores, and the ‘centralized’ integration management approach of frameworks like LangChain. How can a fledgling centralized platform or a framework requiring continuous effort to maintain a huge integration library compete effectively against a widely adopted, community-driven open standard?
So, what’s the best approach for workflows today? If you strongly prefer visual building, tools like n8n are reportedly decent options. Personally, however, I increasingly believe that having AI generate disposable scripts is a better path for many scenarios. “Disposable script” here doesn’t mean rigorous software engineering. Concerns like compatibility, robustness, maintainability, readability, and high performance can often take a backseat—getting the right result is the primary goal. If a script errors or produces incorrect output, simply have the AI generate another version. This approach has minimal dependencies and is resource-light. Worried about managing numerous scripts? That mental overhead can largely be offloaded to AI assistants. Of course, requirements vary; for extremely complex, mission-critical workflows requiring long-term stability and maintenance, traditional visual solutions might still have their place.
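A minimal generate-run-retry loop captures the whole “disposable script” idea; `generate_code` stands in for whatever chat model you call, so nothing here is tied to a specific vendor SDK:

```python
# "Disposable script" loop: ask the AI for a script, run it, and on failure
# just ask for a new version with the error attached. generate_code() is a
# placeholder for whatever chat model you call.
import subprocess
import tempfile

def run_disposable(generate_code, task: str, attempts: int = 3) -> str:
    error = ""
    for _ in range(attempts):
        code = generate_code(task, error)        # regenerate, feeding back the last error
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        proc = subprocess.run(["python", path], capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout                   # good enough: we got the result
        error = proc.stderr                      # no debugging, just try again
    raise RuntimeError(f"Gave up after {attempts} attempts:\n{error}")
```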
Even if Dify or LangChain were to natively support MCP, it wouldn’t automatically make them compelling choices. They lack the authority of official first-party tools (like a potential Claude Desktop app) and the lightweight nature of specialized tools focused solely on MCP integration. Simply adopting an open standard doesn’t grant a unique selling proposition or a competitive moat unless you build something unique on top of it that others don’t offer.
Conclusion: Where Do AI Application Frameworks Go From Here?
On one side, the powerful rise of Deep (Re)Search challenges the core value proposition of traditional RAG in information processing. On the other, the proliferation of MCP combined with the increasing proficiency of AI in script generation erodes the appeal of existing workflow solutions. Traditional LLM AI application frameworks, attempting to stand on these two legs, find both are faltering. The prospects for “all-in-one” AI application frameworks aiming to bundle RAG and workflows look increasingly uncertain, perhaps even dim.
Yesterday’s stars, tomorrow’s fading memories. Their heyday seems to be passing. The question is, can these veterans still compete?
This article was initially drafted and shared with a limited audience on 2025-03-21. The current version was polished by Gemini 2.5 Pro, manually reviewed, and publicly posted on the blog on 2025-04-15. However, the content still reflects the situation as of 2025-03-21.
Update (2025-04-14):
THUDM/GLM-Z1-Rumination-32B-0414, released on 2025-04-14, might be the first open-weight model specifically optimized for Deep (Re)Search capabilities. With more such models expected, RAG frameworks that previously struggled with fine-tuning complexity and retrieval inaccuracy might get a new lease on life.