Using Context Offloading to Fix Context Poisoning in AI Agents and Improve Reasoning Accuracy

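Context engineering means giving an AI everything it needs at the right moment: instructions, examples, data, tools, and history, all packed into the model's input context.

A useful way to think about it: the language model is like a CPU, and the context window is its working memory. Our job is to load that memory with the right mix of code, data, and instructions so the model can get the job done correctly.

Context comes from many sources: the user's query, system instructions, search results, tool outputs, and summaries of earlier steps. The core of context engineering is assembling these fragments into one coherent input in real time. It is not a static prompt but something built dynamically for each task.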
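As a rough illustration of this kind of dynamic assembly, the sketch below builds a model input from separate pieces at call time. The `ContextPiece` type, the `assemble_context` helper, the priority convention, and the character budget are all hypothetical names invented for this example; no particular framework is assumed.

```python
from dataclasses import dataclass


@dataclass
class ContextPiece:
    source: str    # e.g. "system", "user", "search", "tool", "summary"
    text: str
    priority: int  # lower number = more important (assumed convention)


def assemble_context(pieces: list[ContextPiece], max_chars: int) -> str:
    """Pick the most important pieces that fit the budget, then render them in arrival order."""
    chosen: list[int] = []
    used = 0
    # Visit pieces from most to least important; sorted() is stable, so ties keep arrival order.
    for idx in sorted(range(len(pieces)), key=lambda i: pieces[i].priority):
        cost = len(pieces[idx].text)
        if used + cost <= max_chars:
            chosen.append(idx)
            used += cost
    return "\n\n".join(
        f"[{pieces[i].source}]\n{pieces[i].text}" for i in sorted(chosen)
    )


pieces = [
    ContextPiece("system", "You are a research assistant.", 0),
    ContextPiece("search", "x" * 500, 3),  # oversized search result
    ContextPiece("user", "Compare company A and company B.", 1),
    ContextPiece("summary", "Step 1 found funding data for A.", 2),
]
prompt = assemble_context(pieces, max_chars=200)
print(prompt)
```

With a 200-character budget, the oversized search result is left out while the system, user, and summary pieces are kept in their original order. A real system would more likely budget in tokens than in characters.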

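Long context windows have generated a lot of excitement lately. New frontier models can handle 1 million tokens, and many people see this as the ultimate solution for intelligent autonomous agents. The idea is simple: if the window is big enough, put everything in it (tools, documents, logs, instructions, history) and let the model sort it out.

A million-token context does feel like a breakthrough. We can build agents that load everything at once: every tool, every document, the full memory, the complete instructions. But this improvement brings a new class of problems known as "context failures".

These problems show up most clearly in agents, because agents accumulate long, complex contexts over time. They gather input from many places, call tools in sequence, and reason across multiple steps. That is exactly the setting in which failures compound.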

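This article covers how to mitigate context poisoning. It first explains what context poisoning means for AI agents, then introduces a mitigation that is widely used in agents, and finally walks through an end-to-end implementation in LangGraph that shows the mitigation in practice.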

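Put simply, context poisoning happens when a hallucination or error gets into the context and is then treated as fact. Once it is there, the model keeps referring back to it, reinforcing the error each time.

For agents this is especially damaging. If a false fact makes its way into a goal, a summary, or memory, the agent may chase an impossible objective or repeat pointless actions. The problem compounds, and once a context is poisoned it is hard to repair. Fortunately, there are ways to deal with it.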

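We use "context offloading" to mitigate context poisoning, which helps the agent stay on track.

Context offloading means storing information outside the language model's active context window. Data is kept separately in an external tool or memory system, and the model accesses it only when needed.

Why does this help? As context windows grow, it is tempting to put everything in them. But research indicates this causes problems: when important information is buried too deep, the model uses it less accurately. This is called "context rot".

By offloading key information and retrieving it only when needed, we avoid overloading the model's working memory. This helps the model stay accurate and reduces confusion.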
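The idea can be sketched in a few lines of plain Python. The `OffloadStore` class and its `save`/`load` methods are hypothetical and stand in for whatever external tool or memory system an agent actually uses. The point is only that a short reference, rather than the full payload, goes back into the active context.

```python
class OffloadStore:
    """Hypothetical external store that lives outside the model's context window."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def save(self, key: str, content: str) -> str:
        self._items[key] = content
        # Only this short reference is placed in the active context.
        return f"<offloaded key={key!r} chars={len(content)}>"

    def load(self, key: str) -> str:
        return self._items[key]


offload_store = OffloadStore()
context: list[str] = ["Task: summarize the quarterly report."]

# A large tool output (synthetic data, for illustration only).
big_tool_output = "row,value\n" + "\n".join(f"{i},{i * i}" for i in range(1000))
context.append(offload_store.save("report_table", big_tool_output))

active_chars = sum(len(c) for c in context)
print(active_chars, "chars in context vs", len(big_tool_output), "chars offloaded")

# Retrieve the full content only at the step that needs it.
table = offload_store.load("report_table")
```

The active context stays small regardless of how large the tool output is, and the full table remains available on demand.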

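In practice, context offloading resembles how people take notes when working through a complex task, and agents are starting to do something similar.

Anthropic's research describes a "lead agent" that first thinks through the task and then writes its plan to memory. That way the plan is not lost even when the context window grows very large.

Manus offers another example: the agent offloads tool outputs and the task plan to the file system. As the agent progresses, these files are rewritten and updated repeatedly. This helps the agent remember its goal without keeping everything in the active context.

There are many ways to implement context offloading. A scratchpad, for example, can be part of the runtime state, or it can be a tool call that writes to an external file.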
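For the file-based variant, a minimal sketch might look like the following. The file name and the `write_scratchpad` and `read_scratchpad` functions are assumptions made up for this illustration. In a real agent they would be exposed to the model as tools.

```python
import tempfile
from pathlib import Path

# Hypothetical location for the scratchpad file.
SCRATCHPAD_PATH = Path(tempfile.gettempdir()) / "agent_scratchpad_example.md"


def write_scratchpad(notes: str, path: Path = SCRATCHPAD_PATH) -> str:
    """Append notes to the file and return a short confirmation for the model."""
    with path.open("a", encoding="utf-8") as f:
        f.write(notes.rstrip() + "\n")
    return f"Saved {len(notes)} characters to scratchpad."


def read_scratchpad(path: Path = SCRATCHPAD_PATH) -> str:
    """Return everything saved so far, or a placeholder when nothing exists."""
    if not path.exists():
        return "No notes found."
    return path.read_text(encoding="utf-8")


SCRATCHPAD_PATH.unlink(missing_ok=True)  # start clean for the demo
write_scratchpad("Plan: 1) find funding rounds 2) compare totals")
write_scratchpad("Finding: placeholder note for illustration")
notes = read_scratchpad()
print(notes)
```

Returning only a short confirmation from the write keeps the note text itself out of the conversation until a read is requested.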

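Within a single task, a scratchpad helps the agent manage its train of thought. Over long-running interactions, approaches such as Reflexion and memory come into play, letting the agent recall useful information from earlier sessions. Products like ChatGPT and Cursor use similar memory systems to improve performance across multi-turn interactions.

Either way, the core idea is simple: the agent stores useful information during a session and uses it later when needed.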

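In LangGraph, context offloading relies on the state object that passes data between nodes. This state object acts like shared memory, or a scratchpad. During execution, the agent can write important notes, plans, or outputs into the state, and other parts of the agent can then read and use that data later in the workflow.

This structure lets us control what stays in the model's context and what gets offloaded, which helps the agent stay focused and correct.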
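To make the state-passing idea concrete without any dependencies, here is a plain-Python stand-in. It is not LangGraph's implementation: the `State` type, the two nodes, and the `run` loop are hypothetical, and the merge is a simple dictionary update, whereas LangGraph's `MessagesState` appends to `messages` through a reducer.

```python
from typing import Callable, TypedDict


class State(TypedDict, total=False):
    messages: list[str]  # what the model would see
    scratchpad: str      # offloaded notes, kept out of messages


def plan_node(state: State) -> State:
    # Write the plan to the scratchpad instead of appending it to messages.
    return {"scratchpad": "Plan: search funding, then compare."}


def answer_node(state: State) -> State:
    # Pull the notes back in only at the step that needs them.
    notes = state.get("scratchpad", "")
    return {"messages": state["messages"] + [f"Answer based on notes: {notes}"]}


def run(nodes: list[Callable[[State], State]], state: State) -> State:
    for node in nodes:
        # Merge each node's partial update into the shared state.
        state = {**state, **node(state)}
    return state


final = run([plan_node, answer_node], {"messages": ["Compare A and B."]})
print(final["messages"][-1])
```

Each node returns only the keys it changed, and the plan reaches the message list only when `answer_node` chooses to read it.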

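Next we build a LangGraph agent with the following features: an agent with a scratchpad (it can read and write the scratchpad to avoid context poisoning); a context-offloading workflow (plans and findings are stored outside the model's context and brought in only when needed); a tool-based research loop (using web search and scratchpad storage); a LangGraph state graph (managing reasoning steps and offloaded context); LangGraph persistent memory (a key-value store enables scratchpad memory across threads); and thread checkpoints (intermediate state is saved like a chat thread so the run can be resumed later).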

Requirements file

"bs4>=0.0.2",
"dotenv>=0.9.9",
"ipykernel>=6.30.0",
"langchain>=0.3.27",
"langchain-community>=0.3.27",
"langchain-google-genai>=2.1.8",
"langchain-tavily",
"langgraph>=0.6.3",
"langgraph-bigtool>=0.0.3",
"langgraph-supervisor>=0.0.29",
"pandas>=2.3.1",
"rich>=14.1.0",
"tiktoken>=0.9.0"

Importing libraries

# === Standard Library ===
import getpass
import os
from typing_extensions import Literal

# === Display Utilities ===
from IPython.display import Image, display
# === Data Modeling ===
from pydantic import BaseModel, Field
# === Formatting tool (utils is presumably a local helper module, not shown here) ===
from utils import format_messages
# === LangChain / Tools ===
from langchain_core.messages import SystemMessage, ToolMessage, HumanMessage
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_tavily import TavilySearch
# === LangGraph ===
from langgraph.graph import END, START, StateGraph, MessagesState
# === Load env variables ===
from dotenv import load_dotenv
load_dotenv()  # must be called; a bare `load_dotenv` does nothing

Managing agent state with a scratchpad

class ScratchpadState(MessagesState):
    """Agent state with an additional scratchpad field for saving intermediate notes."""
    scratchpad: str = Field(description="The scratchpad for storing notes")

Scratchpad tools

@tool
class WriteToScratchpad(BaseModel):
    """Tool to write notes into the scratchpad memory."""
    notes: str = Field(description="Notes to save to the scratchpad")

@tool
class ReadFromScratchpad(BaseModel):
    """Tool to read previously saved notes from the scratchpad."""
    reasoning: str = Field(description="Why the agent wants to retrieve past notes")

Configuring the search tool and the Gemini LLM

# Tavily for real-time web search
search_tool = TavilySearch(max_results=5, topic="general")

llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=1)

Binding the tools to the large language model (LLM)

tools = [ReadFromScratchpad, WriteToScratchpad, search_tool]
tools_by_name = {tool.name: tool for tool in tools}
llm_with_tools = llm.bind_tools(tools)

System prompt

scratchpad_prompt = """You are a sophisticated research assistant with access to web search and a persistent scratchpad for note-taking.

Your Research Workflow:
1. Check Scratchpad: Look for existing notes relevant to your task.
2. Create Research Plan: Think through the task and outline a plan.
3. Write to Scratchpad: Save the plan and findings.
4. Use Search: Look up information using search tools.
5. Update Scratchpad: Add new results to your notes.
6. Iterate: Repeat as needed.
7. Complete Task: Present final output using all gathered information.
Available Tools:
- WriteToScratchpad
- ReadFromScratchpad
- TavilySearch
"""

Creating the LangGraph workflow

def llm_call(state: ScratchpadState) -> dict:
    """Call the model with the system prompt plus the conversation so far."""
    return {
        "messages": [
            llm_with_tools.invoke(
                [SystemMessage(content=scratchpad_prompt)] + state["messages"]
            )
        ]
    }

def tool_node(state: ScratchpadState) -> dict:
    """Run every tool call from the last message and collect the results."""
    result = []
    # Always return the tool messages, whichever tools were called
    update = {"messages": result}
    for tool_call in state["messages"][-1].tool_calls:
        tool = tools_by_name[tool_call["name"]]
        observation = tool.invoke(tool_call["args"])
        if tool_call["name"] == "WriteToScratchpad":
            notes = observation.notes
            result.append(ToolMessage(content=f"Wrote to scratchpad: {notes}", tool_call_id=tool_call["id"]))
            # Offload the notes into graph state
            update["scratchpad"] = notes
        elif tool_call["name"] == "ReadFromScratchpad":
            notes = state.get("scratchpad", "")
            result.append(ToolMessage(content=f"Notes from scratchpad: {notes}", tool_call_id=tool_call["id"]))
        elif tool_call["name"] == "tavily_search":
            result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))
    return update

def should_continue(state: ScratchpadState) -> Literal["tool_node", "__end__"]:
    """Route to the tool node if the model requested tools, otherwise finish."""
    last_message = state["messages"][-1]
    return "tool_node" if last_message.tool_calls else END

agent_builder = StateGraph(ScratchpadState)
agent_builder.add_node("llm_call", llm_call)
agent_builder.add_node("tool_node", tool_node)
agent_builder.add_edge(START, "llm_call")
agent_builder.add_conditional_edges("llm_call", should_continue, {"tool_node": "tool_node", END: END})
agent_builder.add_edge("tool_node", "llm_call")
agent = agent_builder.compile()
display(Image(agent.get_graph(xray=True).draw_mermaid_png()))

Query

query = "Compare the funding rounds and recent developments of Xaira vs Cohere."
state = agent.invoke({"messages": [HumanMessage(content=query)]})
format_messages(state['messages'])

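The output shows the full research process. The agent first checks the scratchpad for relevant notes, then draws up a research plan, searches for information on Xaira and Cohere, writes its findings to the scratchpad, and finally synthesizes everything into a detailed comparison.

We could refine the system prompt further so the agent follows stricter rules about when to write to the scratchpad and when to read directly from it.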

Printing the recorded notes

from rich.console import Console
from rich.markdown import Markdown

console = Console()
console.print("\n[bold green]Scratchpad:[/bold green]")
# Render the saved notes as Markdown
console.print(Markdown(state['scratchpad']))

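The output shows the structured information stored in the scratchpad, including Cohere's funding status and latest developments.

This corresponds to the "think" step described by Anthropic, and it also demonstrates the "recitation" effect described by Manus, in which the agent restates key information to stay on track.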
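One possible reading of the recitation idea, sketched under assumptions: before each model call, keep only recent history and restate the offloaded plan as the final item, so the goal sits at the end of the input. The `build_step_input` helper and its `keep_last` parameter are hypothetical and are not taken from Manus or Anthropic.

```python
def build_step_input(history: list[str], plan: str, keep_last: int = 3) -> list[str]:
    """Return the recent history with the offloaded plan restated as the final item."""
    recent = history[-keep_last:] if keep_last > 0 else []
    return recent + [f"Current plan (from scratchpad):\n{plan}"]


history = [f"tool output {i}" for i in range(10)]
plan = "1) find funding rounds 2) compare totals"
step_input = build_step_input(history, plan)
print(step_input)
```

Older tool outputs drop out of the input, while the plan is restated on every step.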

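Now we create an InMemoryStore that can be used across multiple sessions.

Setting up in-memory storage and a namespace

LangGraph provides tools for persisting and reusing context across runs: checkpointing (saves the full graph state to a thread, much like chat history); long-term memory via BaseStore (saves selected data across threads, such as notes, plans, or user profiles); and InMemoryStore (a built-in key-value store for testing long-term memory locally).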
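The difference between thread-scoped checkpoints and a cross-thread store can be pictured with two dictionaries. This is a conceptual stand-in only, not how LangGraph implements either feature. The `save_checkpoint`, `put`, and `get` helpers are hypothetical names chosen to echo the store calls used in this article.

```python
from collections import defaultdict
from typing import Optional

# Thread-scoped: each thread_id has its own list of state snapshots.
checkpoints: dict[str, list[dict]] = defaultdict(list)
# Cross-thread: values keyed by (namespace, key), shared by every thread.
long_term: dict[tuple[tuple[str, ...], str], dict] = {}


def save_checkpoint(thread_id: str, state: dict) -> None:
    checkpoints[thread_id].append(dict(state))


def put(namespace: tuple[str, ...], key: str, value: dict) -> None:
    long_term[(namespace, key)] = value


def get(namespace: tuple[str, ...], key: str) -> Optional[dict]:
    return long_term.get((namespace, key))


ns = ("rlm", "scratchpad")

# Thread "1" does some work and leaves a note in the long-term store.
save_checkpoint("1", {"messages": ["research fusion funding"]})
put(ns, "scratchpad", {"scratchpad": "note written during thread 1"})

# Thread "2" has no checkpoints of its own, yet it can read the note.
thread_2_history = checkpoints["2"]
shared = get(ns, "scratchpad")
print(len(thread_2_history), shared)
```

Checkpoints answer "what happened in this conversation", while the store answers "what should every conversation be able to see".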

from langgraph.store.memory import InMemoryStore

# Initialize in-memory store for long-term memory
store = InMemoryStore()
# Define a namespace to organize context
namespace = ("rlm", "scratchpad")
# Add persistent context to the store
store.put(
    namespace,
    "scratchpad",
    {
        "scratchpad": "Research project on renewable energy adoption in developing countries. Key areas to track: policy frameworks, technology barriers, financing mechanisms, and success stories from pilot programs."
    },
)

Viewing the stored memory contents

from rich.console import Console
from pprint import pprint

# Retrieve the stored scratchpad data
scratchpad = store.get(namespace, "scratchpad")
# Display the stored data
console = Console()
console.print("\n[bold green]Retrieved Context from Memory:[/bold green]")
pprint(scratchpad)

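A persistent tool node backed by the store

Now we integrate all of this into the LangGraph workflow. We compile the workflow with two arguments: checkpointer (saves the graph state to a thread at every step) and store (persists context across threads).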

from langgraph.store.base import BaseStore
from langgraph.checkpoint.memory import InMemorySaver

def tool_node_persistent(state: ScratchpadState, store: BaseStore) -> dict:
    """Same as tool_node, but the scratchpad lives in the cross-thread store."""
    result = []
    for tool_call in state["messages"][-1].tool_calls:
        tool = tools_by_name[tool_call["name"]]
        observation = tool.invoke(tool_call["args"])
        if tool_call["name"] == "WriteToScratchpad":
            notes = observation.notes
            result.append(ToolMessage(content=f"Wrote to scratchpad: {notes}", tool_call_id=tool_call["id"]))
            # Persist the notes outside the thread so other threads can read them
            store.put(namespace, "scratchpad", {"scratchpad": notes})
        elif tool_call["name"] == "ReadFromScratchpad":
            stored_data = store.get(namespace, "scratchpad")
            notes = stored_data.value["scratchpad"] if stored_data else "No notes found"
            result.append(ToolMessage(content=f"Notes from scratchpad: {notes}", tool_call_id=tool_call["id"]))
        elif tool_call["name"] == "tavily_search":
            result.append(ToolMessage(content=observation, tool_call_id=tool_call["id"]))
    # Return the tool messages whichever branch ran
    return {"messages": result}

Building and compiling the graph with persistent memory and a checkpointer

agent_builder_persistent = StateGraph(ScratchpadState)
agent_builder_persistent.add_node("llm_call", llm_call)
agent_builder_persistent.add_node("tool_node", tool_node_persistent)
agent_builder_persistent.add_edge(START, "llm_call")
agent_builder_persistent.add_conditional_edges("llm_call", should_continue, {"tool_node": "tool_node", END: END})
agent_builder_persistent.add_edge("tool_node", "llm_call")

# Checkpoint for thread history
checkpointer = InMemorySaver()
# Memory store for long-term context
memory_store = InMemoryStore()
# Compile persistent agent
agent = agent_builder_persistent.compile(
    checkpointer=checkpointer,
    store=memory_store,
)

Invoking the agent and displaying the output

config = {"configurable": {"thread_id": "1"}}
state = agent.invoke({
    "messages": [HumanMessage(content="Can you search for funding rounds and recent developments of Commonwealth Fusion Systems?")]
}, config)

console.print("\n[bold cyan]Workflow Result (Thread 1) - Kick Off Research:[/bold cyan]")
format_messages(state['messages'])

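Accessing the scratchpad from a new thread after an earlier conversation

Now we start a new conversation to see whether the agent can access the scratchpad from the previous session.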

# Cross-thread memory persistence demonstration
config_2 = {"configurable": {"thread_id": "2"}}
messages_2 = agent.invoke({
    "messages": [HumanMessage(content="How does the funding raised for Helion Energy compare to Commonwealth Fusion Systems?")]
}, config_2)

console.print("\n[bold cyan]Workflow Result (Thread 2) - Cross-Thread Memory Access:[/bold cyan]")
format_messages(messages_2['messages'])