Streaming - Docs by LangChain

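For new applications, we recommend event streaming, the typed-projection API introduced in LangChain v1.3. Event streaming provides a separate iterator for each projection (messages, values, tool calls, subgraphs), so you can consume each one independently instead of branching on stream_mode chunks.

LangChain implements a streaming system for surfacing real-time updates. Streaming is crucial for making LLM-based applications feel responsive. By displaying output progressively, before the complete response is ready, streaming significantly improves user experience (UX), particularly when dealing with LLM latency.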

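Overview

LangChain's streaming system lets you surface live feedback from agent runs in your application. With LangChain streaming you can:

Stream agent progress: get state updates after each agent step.
Stream LLM tokens: stream language model tokens as they are generated.
Stream thinking / reasoning tokens: surface model reasoning as it is generated.
Stream custom updates: emit user-defined signals (for example, "Fetched 10/100 records").
Stream multiple modes: choose from updates (agent progress), messages (LLM tokens + metadata), or custom (arbitrary user data).

See the Common patterns section below for more end-to-end examples.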

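Supported stream modes

Pass one of the following stream modes, or an array of several, as the streamMode option of the stream method:

Mode	Description
`updates`	Streams state updates after each agent step. If multiple updates are made in the same step (for example, several nodes run), each update is streamed separately.
`messages`	Streams `[token, metadata]` tuples from any graph node that invokes an LLM.
`custom`	Streams custom data from inside graph nodes using the stream writer.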

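Agent progress

To stream agent progress, call the stream method with streamMode: "updates". This emits an event after every agent step. For example, an agent that calls a tool once should produce the following updates:

LLM node: an AIMessage containing the tool call request
Tool node: a ToolMessage containing the execution result
LLM node: the final AI response

Pass thread_id through configurable so that the conversation is checkpointed and later turns can resume the same history. thread_id is independent of streamMode; you can also pass context alongside it so that tools can read per-run data from runtime.context.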

import { createAgent, tool } from "langchain";
import { MemorySaver } from "@langchain/langgraph";
import z from "zod";

const getWeather = tool(
  async ({ city }) => {
    return `The weather in ${city} is always sunny!`;
  },
  {
    name: "get_weather",
    description: "Get weather for a given city.",
    schema: z.object({
      city: z.string(),
    }),
  },
);

const agent = createAgent({
  model: "google-genai:gemini-3.5-flash",
  tools: [getWeather],
  checkpointer: new MemorySaver(),
});

const config = { configurable: { thread_id: crypto.randomUUID() } };

for await (const chunk of await agent.stream(
  { messages: [{ role: "user", content: "what is the weather in sf" }] },
  { ...config, streamMode: "updates", version: "v2" },
)) {
  const [step, content] = Object.entries(chunk)[0];
  console.log(`step: ${step}`);
  console.log(`content: ${JSON.stringify(content, null, 2)}`);
}
/**
 * step: model_request
 * content: {
 *   "messages": [
 *     {
 *       "kwargs": {
 *         // ...
 *         "tool_calls": [
 *           {
 *             "name": "get_weather",
 *             "args": {
 *               "city": "San Francisco"
 *             },
 *             "type": "tool_call",
 *             "id": "call_0qLS2Jp3MCmaKJ5MAYtr4jJd"
 *           }
 *         ],
 *         // ...
 *       }
 *     }
 *   ]
 * }
 * step: tools
 * content: {
 *   "messages": [
 *     {
 *       "kwargs": {
 *         "content": "The weather in San Francisco is always sunny!",
 *         "name": "get_weather",
 *         // ...
 *       }
 *     }
 *   ]
 * }
 * step: model_request
 * content: {
 *   "messages": [
 *     {
 *       "kwargs": {
 *         "content": "The latest update says: The weather in San Francisco is always sunny!\n\nIf you'd like real-time details (current temperature, humidity, wind, and today's forecast), I can pull the latest data for you. Want me to fetch that?",
 *         // ...
 *       }
 *     }
 *   ]
 * }
 */

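Persisting conversation history with thread_id requires the agent to be configured with a checkpointer. LangSmith deployments configure a checkpointer automatically. Locally, pass one explicitly, for example createAgent({ ..., checkpointer: new MemorySaver() }). The remaining snippets on this page omit thread_id for brevity, but you should pass it in production.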
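
The three-step sequence described in this section can be turned into one line of progress text per chunk. The following is a minimal sketch that runs on mock data rather than a live agent. The `MessageLike` and `UpdateChunk` shapes and the `summarizeUpdate` helper are hypothetical simplifications; real chunks carry full message objects, and the sketch assumes each chunk has a single step key, as in the sample output above.

```typescript
// Hypothetical minimal shapes; real "updates" chunks carry full message objects.
type MessageLike = { content: string; tool_calls?: { name: string }[] };
type UpdateChunk = Record<string, { messages: MessageLike[] }>;

// Describe one "updates" chunk as a single line of progress text.
// Assumes the chunk has exactly one key (the step name).
function summarizeUpdate(chunk: UpdateChunk): string {
  const [step, update] = Object.entries(chunk)[0];
  const last = update.messages[update.messages.length - 1];
  const calls = last.tool_calls ?? [];
  if (calls.length > 0) {
    return `${step}: requested ${calls.map((c) => c.name).join(", ")}`;
  }
  return `${step}: ${last.content}`;
}

// Mock chunks mirroring the three steps: tool call request, tool result, final answer.
const sampleUpdates: UpdateChunk[] = [
  { model_request: { messages: [{ content: "", tool_calls: [{ name: "get_weather" }] }] } },
  { tools: { messages: [{ content: "The weather in San Francisco is always sunny!" }] } },
  { model_request: { messages: [{ content: "It is sunny in San Francisco." }] } },
];

for (const chunk of sampleUpdates) {
  console.log(summarizeUpdate(chunk));
}
```

In a real application the same helper could be called inside the `for await` loop in place of the two `console.log` calls, provided the message objects expose `content` and `tool_calls` in this form.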

LLM tokens

To stream tokens as the LLM produces them, use streamMode: "messages":

import z from "zod";
import { createAgent, tool } from "langchain";

const getWeather = tool(
    async ({ city }) => {
        return `The weather in ${city} is always sunny!`;
    },
    {
        name: "get_weather",
        description: "Get weather for a given city.",
        schema: z.object({
            city: z.string(),
        }),
    }
);

const agent = createAgent({
    model: "gpt-5.4-mini",
    tools: [getWeather],
});

for await (const [token, metadata] of await agent.stream(
    { messages: [{ role: "user", content: "what is the weather in sf" }] },
    { streamMode: "messages" }
)) {
    console.log(`node: ${metadata.langgraph_node}`);
    console.log(`content: ${JSON.stringify(token.contentBlocks, null, 2)}`);
}
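
Because each item is a `[token, metadata]` pair, a common next step is to accumulate token text per emitting node instead of logging each token. The sketch below shows that bookkeeping on mock data. `TokenLike`, `MetadataLike`, and `collectByNode` are hypothetical names, and the sketch assumes each token exposes its text as a plain `text` string rather than as content blocks.

```typescript
// Hypothetical simplified shapes for the [token, metadata] pairs.
type TokenLike = { text: string };
type MetadataLike = { langgraph_node: string };

// Concatenate token text into one buffer per emitting node.
function collectByNode(pairs: [TokenLike, MetadataLike][]): Record<string, string> {
  const buffers: Record<string, string> = {};
  for (const [token, metadata] of pairs) {
    const node = metadata.langgraph_node;
    buffers[node] = (buffers[node] ?? "") + token.text;
  }
  return buffers;
}

// Mock pairs standing in for what a "messages" stream might yield.
const samplePairs: [TokenLike, MetadataLike][] = [
  [{ text: "The" }, { langgraph_node: "model_request" }],
  [{ text: " weather" }, { langgraph_node: "model_request" }],
  [{ text: "always sunny" }, { langgraph_node: "tools" }],
];

console.log(collectByNode(samplePairs));
```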

Custom updates

To stream updates from a tool while it executes, use the writer parameter from the configuration.

import z from "zod";
import { tool, createAgent } from "langchain";
import { LangGraphRunnableConfig } from "@langchain/langgraph";

const getWeather = tool(
    async (input, config: LangGraphRunnableConfig) => {
        // Stream any arbitrary data
        config.writer?.(`Looking up data for city: ${input.city}`);
        // ... fetch city data
        config.writer?.(`Acquired data for city: ${input.city}`);
        return `It's always sunny in ${input.city}!`;
    },
    {
        name: "get_weather",
        description: "Get weather for a given city.",
        schema: z.object({
            city: z.string().describe("The city to get weather for."),
        }),
    }
);

const agent = createAgent({
    model: "gpt-5.4-mini",
    tools: [getWeather],
});

for await (const chunk of await agent.stream(
    { messages: [{ role: "user", content: "what is the weather in sf" }] },
    { streamMode: "custom" }
)) {
    console.log(chunk);
}

Output

Looking up data for city: San Francisco
Acquired data for city: San Francisco

If you add a writer parameter to your tool, you cannot invoke the tool outside of a LangGraph execution context without providing a writer function.
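
One way to keep tool logic testable outside a graph is to put the core work in a plain function that accepts an optional writer and calls it with optional chaining, as the tool above does with `config.writer?.(...)`. This is only a plain-TypeScript sketch of that pattern: `Writer` and `fetchCityWeather` are hypothetical names, and no LangGraph runtime is involved.

```typescript
// Hypothetical writer signature: receives any value to stream.
type Writer = (chunk: unknown) => void;

// Core tool logic with an optional writer. Optional chaining skips the
// progress calls when no writer is supplied.
function fetchCityWeather(city: string, writer?: Writer): string {
  writer?.(`Looking up data for city: ${city}`);
  // ... fetch city data
  writer?.(`Acquired data for city: ${city}`);
  return `It's always sunny in ${city}!`;
}

// With a writer: progress messages are captured.
const captured: unknown[] = [];
const withWriter = fetchCityWeather("San Francisco", (chunk) => {
  captured.push(chunk);
});

// Without a writer: same result, no progress messages.
const withoutWriter = fetchCityWeather("San Francisco");

console.log(captured, withWriter, withoutWriter);
```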

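Stream multiple modes

You can specify multiple streaming modes by passing streamMode as an array: streamMode: ["updates", "messages", "custom"]. The streamed outputs are [mode, chunk] tuples, where mode is the name of the stream mode and chunk is the data streamed by that mode.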

import z from "zod";
import { tool, createAgent } from "langchain";
import { LangGraphRunnableConfig } from "@langchain/langgraph";

const getWeather = tool(
    async (input, config: LangGraphRunnableConfig) => {
        // Stream any arbitrary data
        config.writer?.(`Looking up data for city: ${input.city}`);
        // ... fetch city data
        config.writer?.(`Acquired data for city: ${input.city}`);
        return `It's always sunny in ${input.city}!`;
    },
    {
        name: "get_weather",
        description: "Get weather for a given city.",
        schema: z.object({
            city: z.string().describe("The city to get weather for."),
        }),
    }
);

const agent = createAgent({
    model: "gpt-5.4-mini",
    tools: [getWeather],
});

for await (const [streamMode, chunk] of await agent.stream(
    { messages: [{ role: "user", content: "what is the weather in sf" }] },
    { streamMode: ["updates", "messages", "custom"] }
)) {
    console.log(`${streamMode}: ${JSON.stringify(chunk, null, 2)}`);
}
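
When several modes are combined, each `[mode, chunk]` tuple usually needs mode-specific handling. The sketch below routes tuples through a table of handlers, using mock data instead of a live stream. The `handlers` table and `routePart` helper are hypothetical, and the chunk shapes are simplified assumptions (for example, a `text` string on message tokens).

```typescript
type Handler = (chunk: unknown) => string;

// One handler per stream mode. Chunk shapes are simplified assumptions.
const handlers: Record<string, Handler> = {
  updates: (chunk) => `step finished: ${Object.keys(chunk as object).join(", ")}`,
  messages: (chunk) => {
    const [token, metadata] = chunk as [{ text: string }, { langgraph_node: string }];
    return `token from ${metadata.langgraph_node}: ${token.text}`;
  },
  custom: (chunk) => `custom: ${String(chunk)}`,
};

// Route one [mode, chunk] tuple to its handler, with a fallback for unknown modes.
function routePart([mode, chunk]: [string, unknown]): string {
  const handler = handlers[mode];
  return handler ? handler(chunk) : `unhandled mode: ${mode}`;
}

// Mock tuples standing in for a multi-mode stream.
const sampleParts: [string, unknown][] = [
  ["custom", "Looking up data for city: San Francisco"],
  ["updates", { tools: { messages: [] } }],
  ["messages", [{ text: "Sunny" }, { langgraph_node: "model_request" }]],
];

for (const part of sampleParts) {
  console.log(routePart(part));
}
```

A lookup table keeps the loop body to a single call and makes it straightforward to add or remove modes.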

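Common patterns

The examples below show common use cases for streaming.

Streaming thinking / reasoning tokens

Some models perform internal reasoning before producing a final answer. You can stream these thinking / reasoning tokens as they are generated by filtering standard content blocks for type "reasoning".

Reasoning output must be enabled on the model. See the reasoning section and your provider's integration page for configuration details. To quickly check a model's reasoning support, see models.dev.

To stream thinking tokens from an agent, use streamMode: "messages" and filter for reasoning content blocks. Use a model instance with extended thinking enabled (for example, ChatAnthropic) when the model supports it: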

import z from "zod";
import { createAgent, tool } from "langchain";
import { ChatAnthropic } from "@langchain/anthropic";

const getWeather = tool(
  async ({ city }) => {
    return `It's always sunny in ${city}!`;
  },
  {
    name: "get_weather",
    description: "Get weather for a given city.",
    schema: z.object({ city: z.string() }),
  },
);

const agent = createAgent({
  model: new ChatAnthropic({
    model: "claude-sonnet-4-6",
    thinking: { type: "enabled", budget_tokens: 5000 },
  }),
  tools: [getWeather],
});

for await (const [token, metadata] of await agent.stream(
  { messages: [{ role: "user", content: "What is the weather in SF?" }] },
  { streamMode: "messages" },
)) {
  if (!token.contentBlocks) continue;
  const reasoning = token.contentBlocks.filter((b) => b.type === "reasoning");
  const text = token.contentBlocks.filter((b) => b.type === "text");
  if (reasoning.length) {
    process.stdout.write(`[thinking] ${reasoning[0].reasoning}`);
  }
  if (text.length) {
    process.stdout.write(text[0].text);
  }
}

Output

[thinking] The user is asking about the weather in San Francisco. I have a tool
[thinking]  available to get this information. Let me call the get_weather tool
[thinking]  with "San Francisco" as the city parameter.
The weather in San Francisco is: It's always sunny in San Francisco!

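This works the same way regardless of the model provider: LangChain normalizes provider-specific formats (Anthropic thinking blocks, OpenAI reasoning summaries, and so on) into the standard "reasoning" content block type, exposed through the contentBlocks property. To stream reasoning tokens directly from a chat model (without an agent), see streaming with chat models.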
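
The filtering step can also be factored into a small pure function that joins every reasoning block and every text block in a chunk, rather than reading only the first of each. This is a sketch under assumptions: `BlockLike` is a hypothetical simplification of the standard content block shape, keeping only the `type`, `reasoning`, and `text` fields used in the example above, and `splitBlocks` is a hypothetical helper.

```typescript
// Hypothetical simplified content block: only the fields used here.
type BlockLike = { type: string; reasoning?: string; text?: string };

// Join all reasoning blocks and all text blocks from one chunk.
// Blocks of any other type are ignored.
function splitBlocks(blocks: BlockLike[]): { reasoning: string; text: string } {
  let reasoning = "";
  let text = "";
  for (const block of blocks) {
    if (block.type === "reasoning") {
      reasoning += block.reasoning ?? "";
    } else if (block.type === "text") {
      text += block.text ?? "";
    }
  }
  return { reasoning, text };
}

// Mock blocks standing in for token.contentBlocks.
const sampleBlocks: BlockLike[] = [
  { type: "reasoning", reasoning: "The user wants the weather. " },
  { type: "reasoning", reasoning: "Call get_weather." },
  { type: "tool_call" },
  { type: "text", text: "It's always sunny!" },
];

console.log(splitBlocks(sampleBlocks));
```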

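Disable streaming

In some applications you may need to disable streaming of individual tokens for a given model. This is useful when you are:

Working with multi-agent systems and need to control which agents stream their output
Mixing models that support streaming with models that do not
Deploying to LangSmith and want to prevent certain model outputs from being streamed to the client

Set streaming: false when initializing the model.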

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-5.4",
  streaming: false,
});

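When deploying to LangSmith, set streaming: false on any model whose output you do not want streamed to the client. This is configured in your graph code before deployment.

Not all chat model integrations support the streaming parameter. If your model does not, use disableStreaming: true instead. This parameter is available on all chat models through the base class.

See the LangGraph streaming guide for more details.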
