Models - Docs by LangChain

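LLMs are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize information, and answer questions without specialized training for each task. Beyond text generation, many models also support:

Tool calling: calling external tools (such as database queries or API calls) and using the results in their responses.
Structured output: constraining the model's response to follow a defined format.
Multimodality: processing and returning data other than text, such as images, audio, and video.
Reasoning: performing multi-step reasoning to reach a conclusion.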

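Models are the reasoning engine of agents. They drive the agent's decision-making process, determining which tools to call, how to interpret results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect your agent's baseline reliability and performance. Different models excel at different tasks: some are better at following complex instructions, others at structured reasoning, and some support larger context windows for handling more information.

LangChain's standard model interface gives you access to many provider integrations, which makes it easy to experiment with and switch between models to find the best fit for your use case. For provider-specific integration information and capabilities, see the provider's chat model page.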

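LangSmith traces every model call so that you can compare providers, inspect tool routing, and debug failures. Follow the tracing quickstart to get set up. We also recommend setting up LangSmith Engine, which monitors traces, detects issues, and suggests fixes.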

Basic usage

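Models can be used in two ways:

With agents: models can be specified dynamically when creating an agent.
Standalone: models can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction, without an agent framework.

The same model interface works in both contexts, so you can start simple and scale up to more complex agent-based workflows as needed.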

Initialize a model

The easiest way to get started with a standalone model in LangChain is to use initChatModel to initialize one from a chat model provider of your choice (examples below):

OpenAI
Anthropic
Azure
Google Gemini
Bedrock Converse

👉 Read the OpenAI chat model integration docs

npm install @langchain/openai

import { initChatModel } from "langchain";

process.env.OPENAI_API_KEY = "your-api-key";

const model = await initChatModel("gpt-5.4");

👉 Read the Anthropic chat model integration docs

npm install @langchain/anthropic

import { initChatModel } from "langchain";

process.env.ANTHROPIC_API_KEY = "your-api-key";

const model = await initChatModel("claude-sonnet-4-6");

👉 Read the Azure chat model integration docs

npm install @langchain/azure

import { initChatModel } from "langchain";

process.env.AZURE_OPENAI_API_KEY = "your-api-key";
process.env.AZURE_OPENAI_ENDPOINT = "your-endpoint";
process.env.OPENAI_API_VERSION = "your-api-version";

const model = await initChatModel("azure_openai:gpt-5.4");

👉 Read the Google GenAI chat model integration docs

npm install @langchain/google-genai

import { initChatModel } from "langchain";

process.env.GOOGLE_API_KEY = "your-api-key";

const model = await initChatModel("google-genai:gemini-2.5-flash-lite");

👉 Read the AWS Bedrock chat model integration docs

npm install @langchain/aws

import { initChatModel } from "langchain";

// Follow the steps here to configure your credentials:
// https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

const model = await initChatModel("bedrock:gpt-5.4");

const response = await model.invoke("Why do parrots talk?");

See initChatModel for more detail, including how to pass model parameters.

Supported providers and models

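LangChain supports all major model providers through dedicated integration packages. Each provider package implements the same standard interface, so you can switch providers without rewriting application logic. New model names work immediately, without a LangChain update, because provider packages pass model names directly to the provider's API. Browse the full list of supported providers, or see Providers and models to learn how providers, packages, and model names work together in LangChain.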

Key methods

Invoke

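The model takes messages as input and outputs messages after generating a complete response.

Stream

Invoke the model, but stream the output as it is generated in real time.

Batch

Send multiple requests to a model in a batch for more efficient processing.

In addition to chat models, LangChain supports other adjacent technologies, such as embedding models and vector stores. See the integrations page for details.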

Parameters

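A chat model takes parameters that can be used to configure its behavior. The full set of supported parameters varies by model and provider, but standard ones include:

model

string

required

The name or identifier of the specific model you want to use with a provider. You can also specify both the model and its provider in a single argument using the {model_provider}:{model} format, for example openai:o1.

apiKey

string

The key required to authenticate with the model provider. It is usually issued when you sign up for access to the model, and is often accessed by setting an environment variable.

temperature

number

Controls the randomness of the model's output. A higher value makes responses more creative; a lower value makes them more deterministic.

maxTokens

number

Limits the total number of tokens in the response, effectively controlling how long the output can be.

timeout

number

The maximum time (in seconds) to wait for a response from the model before canceling the request.

maxRetries

number

default:"6"

The maximum number of attempts the system will make to resend a request if it fails due to issues such as network timeouts or rate limits. Retries use exponential backoff with jitter. Network errors, rate limits (429), and server errors (5xx) are retried automatically. Client errors such as 401 (unauthorized) or 404 are not retried. For long-running agent tasks on unreliable networks, consider increasing this to 10 to 15.

When using initChatModel, pass these parameters inline: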

Initialize using model parameters

const model = await initChatModel(
    "claude-sonnet-4-6",
    { temperature: 0.7, timeout: 30, maxTokens: 1000, maxRetries: 6 }
)
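To make the {model_provider}:{model} convention described above concrete, here is a minimal sketch of how such a string could be split. The parseModelSpec helper and the ModelSpec type are hypothetical names for illustration and are not part of LangChain. The sketch splits on the first colon only, so model identifiers that themselves contain colons would need extra handling.

```typescript
// Hypothetical helper for illustration; not part of LangChain.
interface ModelSpec {
  provider?: string; // undefined when the string has no provider prefix
  model: string;
}

// Split "provider:model" on the first colon. Strings without a usable
// prefix are treated as a bare model name.
function parseModelSpec(spec: string): ModelSpec {
  const idx = spec.indexOf(":");
  if (idx <= 0) {
    return { model: spec };
  }
  return { provider: spec.slice(0, idx), model: spec.slice(idx + 1) };
}

console.log(parseModelSpec("openai:o1")); // provider "openai", model "o1"
console.log(parseModelSpec("claude-sonnet-4-6")); // no provider, model only
```

When no prefix is given, a real implementation has to infer the provider some other way; this sketch simply leaves it undefined.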

Connection resilience

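LangChain chat models automatically retry failed API requests with exponential backoff. By default, models retry up to 6 times for network errors, rate limits (429), and server errors (5xx). Client errors such as 401 (unauthorized) or 404 are not retried. You can adjust maxRetries and timeout when creating a model, then pass that instance to createAgent or createDeepAgent, or call it standalone: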

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-sonnet-4-6",
  maxRetries: 10, // Increase for unreliable networks (default: 6)
  timeout: 120_000, // Milliseconds; increase for slow connections
});

For long-running agent graphs on unreliable networks, consider setting a higher maxRetries (for example, 10 to 15) and using a checkpointer so that progress is preserved after a failure.
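The retry policy described in this section can be pictured with the sketch below. It is an illustration of the general technique (exponential backoff with "full jitter"), not LangChain's actual implementation: the helper names, the 1-second base delay, and the 60-second cap are all assumptions.

```typescript
// Illustrative only: names, base delay, and cap are assumptions.

// Retry network errors (no HTTP status), rate limits (429), and server errors (5xx).
// Other client errors such as 401 or 404 are not retried.
function isRetryable(status: number | undefined): boolean {
  if (status === undefined) return true; // network error, no HTTP status
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff with full jitter: pick a random delay in
// [0, min(capMs, baseMs * 2^attempt)). attempt is 0 for the first retry.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

for (let attempt = 0; attempt < 6; attempt++) {
  console.log(`retry ${attempt + 1}: waiting ${backoffDelayMs(attempt)} ms`);
}
```

The random function is injectable so that the schedule can be made deterministic in tests.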

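Each chat model integration may have additional parameters that control provider-specific functionality. For example, ChatOpenAI has useResponsesApi, which dictates whether to use the OpenAI Responses API or the Completions API. To find all the parameters supported by a given chat model, head to the chat model integrations page.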

Invocation

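A chat model must be invoked to generate output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to pass a single message or a list of messages to invoke().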

Single message

const response = await model.invoke("Why do parrots have colorful feathers?");
console.log(response);

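A list of messages can be provided to a chat model to represent conversation history. Each message has a role that the model uses to identify who sent the message in the conversation. See the messages guide for more detail on roles, types, and content.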

Object format

const conversation = [
  { role: "system", content: "You are a helpful assistant that translates English to French." },
  { role: "user", content: "Translate: I love programming." },
  { role: "assistant", content: "J'adore la programmation." },
  { role: "user", content: "Translate: I love building applications." },
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

Message objects

import { HumanMessage, AIMessage, SystemMessage } from "langchain";

const conversation = [
  new SystemMessage("You are a helpful assistant that translates English to French."),
  new HumanMessage("Translate: I love programming."),
  new AIMessage("J'adore la programmation."),
  new HumanMessage("Translate: I love building applications."),
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

If the return type of your invocation is a string, make sure that you are using a chat model rather than an LLM. Legacy text-completion LLMs return strings directly. LangChain chat models are prefixed with "Chat", for example ChatOpenAI.

Stream

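Most models can stream their output content while it is being generated. By displaying output progressively, streaming significantly improves user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time: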

const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
  console.log(chunk.text)
}

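Unlike invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be combined with the others into a full message by concatenation: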

Construct AIMessage

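import { AIMessageChunk } from "@langchain/core/messages";

const stream = await model.stream("What color is the sky?");
let full: AIMessageChunk | null = null;
for await (const chunk of stream) {
  // Merge each new chunk into the running message
  full = full ? full.concat(chunk) : chunk;
  console.log(full.text);
}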

// The
// The sky
// The sky is
// The sky is typically
// The sky is typically blue
// ...

console.log(full?.contentBlocks);
// [{"type": "text", "text": "The sky is typically blue..."}]

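The resulting message can be treated the same as a message generated with invoke(). For example, it can be aggregated into a message history and passed back to the model as conversational context.

Streaming only works if all steps in the program know how to process a stream of chunks. For instance, an application that isn't streaming-capable would need to store the entire output in memory before it can be processed.

Advanced streaming topics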

Streaming events

You can also call streamEvents() on LangChain chat models. This simplifies filtering based on event types and other metadata, and aggregates the full message in the background. See below for an example.

const stream = await model.streamEvents("Hello");
for await (const event of stream) {
    if (event.event === "on_chat_model_start") {
        console.log(`Input: ${event.data.input}`);
    }
    if (event.event === "on_chat_model_stream") {
        console.log(`Token: ${event.data.chunk.text}`);
    }
    if (event.event === "on_chat_model_end") {
        console.log(`Full message: ${event.data.output.text}`);
    }
}

Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?

See the streamEvents() reference for event types and other details.

"Auto-streaming" chat models

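LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you are not explicitly calling the streaming methods. This is particularly useful when you use the non-streaming invoke method but still want to stream the entire application, including intermediate results from the chat model. In LangGraph agents, for example, you can call model.invoke() within nodes, and LangChain will automatically delegate to streaming if the graph is running in a streaming mode.

How it works

When you invoke() a chat model, LangChain automatically switches to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation is the same as far as the code using invoke is concerned; however, while the chat model is being streamed, LangChain takes care of invoking on_llm_new_token events in LangChain's callback system. Callback events allow LangGraph stream() and streamEvents() to surface the chat model's output in real time.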
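As a conceptual model of this behavior, the sketch below shows an invoke-style function that always returns the full text but also emits tokens to a listener when one is registered. Every name here (TokenListener, fakeTokens, invokeWithAutoStream) is hypothetical; this is not how LangChain is implemented internally, only an illustration of the idea that the caller's return value is unchanged while tokens are surfaced on the side.

```typescript
// Conceptual sketch only; all names are hypothetical.
type TokenListener = (token: string) => void;

// Stand-in for the tokens a provider would stream back.
function fakeTokens(prompt: string): string[] {
  return `Echo: ${prompt}`.split(" ").map((word) => word + " ");
}

// Always returns the complete text. If a listener is registered (that is,
// the surrounding application is streaming), each token is also emitted
// as soon as it is available.
function invokeWithAutoStream(prompt: string, onNewToken?: TokenListener): string {
  let full = "";
  for (const token of fakeTokens(prompt)) {
    if (onNewToken) onNewToken(token);
    full += token;
  }
  return full.trim();
}

const seenTokens: string[] = [];
const finalText = invokeWithAutoStream("hello world", (t) => {
  seenTokens.push(t);
});
console.log(finalText); // "Echo: hello world"
console.log(seenTokens.length); // 3
```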

Batch

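Batching a collection of independent requests to a model can significantly improve performance and reduce costs, as the processing can be done in parallel: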

Batch

const responses = await model.batch([
  "Why do parrots have colorful feathers?",
  "How do airplanes fly?",
  "What is quantum computing?",
]);
for (const response of responses) {
  console.log(response);
}

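When processing a large number of inputs with batch(), you may want to control the maximum number of parallel calls. You can do so by setting the maxConcurrency attribute in the RunnableConfig object.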

Batch with max concurrency

model.batch(
  listOfInputs,
  {
    maxConcurrency: 5,  // Limit to 5 parallel calls
  }
)

See the RunnableConfig reference for a full list of supported attributes.

For more details on batching, see the reference.
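To show what a concurrency cap does in practice, here is a minimal sketch of a worker-pool helper. The mapWithConcurrency function is a hypothetical name and is not LangChain's batch() implementation; it only illustrates the general technique of keeping at most maxConcurrency calls in flight while preserving input order.

```typescript
// Hypothetical helper for illustration; not LangChain's batch() internals.
async function mapWithConcurrency<I, O>(
  inputs: I[],
  maxConcurrency: number,
  worker: (input: I) => Promise<O>,
): Promise<O[]> {
  const results: O[] = new Array(inputs.length);
  let next = 0; // index of the next unclaimed input

  // Each runner claims inputs one at a time until none remain.
  async function runner(): Promise<void> {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await worker(inputs[i]);
    }
  }

  const runnerCount = Math.max(1, Math.min(maxConcurrency, inputs.length));
  const runners: Promise<void>[] = [];
  for (let r = 0; r < runnerCount; r++) runners.push(runner());
  await Promise.all(runners);
  return results; // same order as inputs
}

// Usage with a stand-in for model.invoke():
mapWithConcurrency(["a", "b", "c"], 2, async (q) => q.toUpperCase())
  .then((answers) => console.log(answers));
```

Because each runner writes its result at the index it claimed, slow calls do not reorder the output.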

Tool calling

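Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. Tools are pairings of:

A schema, including the name of the tool, a description, and/or argument definitions (often a JSON schema)
A function or coroutine to execute.

You may hear the term "function calling". This document uses it interchangeably with "tool calling".

In the basic tool calling flow between a user and a model, the model requests a tool call, your code executes the tool, and the result is passed back to the model.

To make the tools you have defined available to a model, you must bind them using bindTools. In subsequent invocations, the model can choose to call any of the bound tools as needed.

Some model providers offer built-in tools that can be enabled through model or invocation parameters (for example, ChatOpenAI and ChatAnthropic). Check the respective provider reference for details.

See the tools guide for details and other options for creating tools.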

Binding user tools

import { tool } from "langchain";
import * as z from "zod";
import { ChatOpenAI } from "@langchain/openai";

const getWeather = tool(
  (input) => `It's sunny in ${input.location}.`,
  {
    name: "get_weather",
    description: "Get the weather at a location.",
    schema: z.object({
      location: z.string().describe("The location to get the weather for"),
    }),
  },
);

const model = new ChatOpenAI({ model: "gpt-5.4" });
const modelWithTools = model.bindTools([getWeather]);

const response = await modelWithTools.invoke("What's the weather like in Boston?");
const toolCalls = response.tool_calls ?? [];
for (const toolCall of toolCalls) {
  // View tool calls made by the model
  console.log(`Tool: ${toolCall.name}`);
  // args is an object, so serialize it for display
  console.log(`Args: ${JSON.stringify(toolCall.args)}`);
}

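When binding user-defined tools, the model's response includes a request to execute a tool. When using a model separately from an agent, it is up to you to execute the requested tool and return the result to the model for use in subsequent reasoning. When using an agent, the agent loop handles the tool execution loop for you. Below are some common ways to use tool calling.

Tool execution loop

When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use the tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here is a simple example of doing it by hand: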

Tool execution loop

import { HumanMessage } from "langchain";
import type { BaseMessage } from "@langchain/core/messages";

// Bind (potentially multiple) tools to the model
const modelWithTools = model.bindTools([getWeather]);

// Step 1: Model generates tool calls
const messages: BaseMessage[] = [new HumanMessage("What's the weather in Boston?")];
const aiMsg = await modelWithTools.invoke(messages);
messages.push(aiMsg);

// Step 2: Execute tools and collect results
for (const toolCall of aiMsg.tool_calls ?? []) {
  // Invoking the tool with the tool call returns a ToolMessage
  const toolResult = await getWeather.invoke(toolCall);
  messages.push(toolResult);
}

// Step 3: Pass results back to model for final response
const finalResponse = await modelWithTools.invoke(messages);
console.log(finalResponse.text);
// "The current weather in Boston is 72°F and sunny."

Each ToolMessage returned by the tool includes a tool_call_id that matches the original tool call, helping the model correlate results with requests.
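The role of tool_call_id can be seen in the self-contained sketch below, which dispatches tool calls by name and echoes each call's id on its result. The ToolCall and ToolMessage interfaces are simplified stand-ins rather than LangChain's classes, and toolRegistry and executeToolCalls are hypothetical names introduced only for this illustration.

```typescript
// Simplified stand-ins for the message shapes; not LangChain's classes.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, string>;
}
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical registry mapping tool names to implementations.
const toolRegistry: Record<string, (args: Record<string, string>) => string> = {
  get_weather: (args) => `It's sunny in ${args.location}.`,
};

function executeToolCalls(toolCalls: ToolCall[]): ToolMessage[] {
  return toolCalls.map((call): ToolMessage => {
    const impl = toolRegistry[call.name];
    const content = impl ? impl(call.args) : `Error: unknown tool "${call.name}"`;
    // Echo the call's id so each result can be matched to its request.
    return { role: "tool", tool_call_id: call.id, content };
  });
}

const toolResults = executeToolCalls([
  { id: "call_1", name: "get_weather", args: { location: "Boston" } },
]);
console.log(toolResults);
```

Returning an error message for an unknown tool, instead of throwing, is one possible design choice: it lets the model see the failure and recover in the next turn.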

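Forcing tool calls

By default, the model is free to choose which bound tool to use based on the user's input. However, you may want to force a tool choice, ensuring that the model uses either a particular tool or any tool from a given list: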

const modelWithTools = model.bindTools([tool_1], { toolChoice: "any" })

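Parallel tool calls

Many models support calling multiple tools in parallel when appropriate. This allows the model to gather information from different sources simultaneously.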

Parallel tool calls

const modelWithTools = model.bindTools([getWeather]);

const response = await modelWithTools.invoke(
  "What's the weather in Boston and Tokyo?"
);

// The model may generate multiple tool calls
console.log(response.tool_calls);
// [
//   { name: 'get_weather', args: { location: 'Boston' }, id: 'call_1' },
//   { name: 'get_weather', args: { location: 'Tokyo' }, id: 'call_2' }
// ]

// Execute all tools (can be done in parallel with async)
const results = [];
for (const toolCall of response.tool_calls ?? []) {
  if (toolCall.name === "get_weather") {
    const result = await getWeather.invoke(toolCall);
    results.push(result);
  }
}

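The model intelligently determines when parallel execution is appropriate based on the independence of the requested operations.

Most models that support tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) allow you to disable this feature. To do so, set parallel_tool_calls to false when binding tools (the exact option name can vary by provider):

model.bindTools([getWeather], { parallel_tool_calls: false });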

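Streaming tool calls

When streaming responses, tool calls are built progressively through ToolCallChunk. This lets you see tool calls as they are being generated rather than waiting for the complete response.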

Streaming tool calls

const stream = await modelWithTools.stream(
    "What's the weather in Boston and Tokyo?"
)
for await (const chunk of stream) {
  // Tool call chunks arrive progressively
  for (const toolChunk of chunk.tool_call_chunks ?? []) {
    console.log(`Tool: ${toolChunk.name ?? ""}`);
    console.log(`Args: ${toolChunk.args ?? ""}`);
  }
}

// Output:
// Tool: get_weather
// Args:
// Tool:
// Args: {"loc
// Tool:
// Args: ation": "BOS"}
// Tool: get_time
// Args:
// Tool:
// Args: {"timezone": "Tokyo"}

You can accumulate chunks to build complete tool calls:

Accumulate tool calls

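import { AIMessageChunk } from "@langchain/core/messages";

let full: AIMessageChunk | null = null;
const stream = await modelWithTools.stream("What's the weather in Boston?");
for await (const chunk of stream) {
  // concat() merges partial tool call chunks into the running message
  full = full ? full.concat(chunk) : chunk;
  console.log(full.contentBlocks);
}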
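For intuition about what accumulation involves, the sketch below merges partial tool call fragments by hand. The ToolCallChunkLike shape (an index plus optional id, name, and a fragment of JSON text in args) is a simplified assumption for illustration, and assembleToolCalls is a hypothetical helper; in practice, concatenating message chunks as shown above does this for you.

```typescript
// Simplified, assumed chunk shape: args arrives as fragments of JSON text.
interface ToolCallChunkLike {
  index: number;
  id?: string;
  name?: string;
  args?: string;
}
interface PartialCall {
  id: string;
  name: string;
  argsText: string;
}
interface AssembledToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Group fragments by index, concatenate the args text, then parse it once.
function assembleToolCalls(chunks: ToolCallChunkLike[]): AssembledToolCall[] {
  const partial: PartialCall[] = [];
  for (const chunk of chunks) {
    const entry =
      partial[chunk.index] || (partial[chunk.index] = { id: "", name: "", argsText: "" });
    if (chunk.id) entry.id = chunk.id;
    if (chunk.name) entry.name = chunk.name;
    entry.argsText += chunk.args || "";
  }
  return partial
    .filter((p) => p !== undefined)
    .map((p) => ({ id: p.id, name: p.name, args: JSON.parse(p.argsText || "{}") }));
}

const assembled = assembleToolCalls([
  { index: 0, id: "call_1", name: "get_weather", args: "" },
  { index: 0, args: '{"loc' },
  { index: 0, args: 'ation": "BOS"}' },
]);
console.log(JSON.stringify(assembled));
// [{"id":"call_1","name":"get_weather","args":{"location":"BOS"}}]
```

Parsing is deferred until all fragments have arrived because an individual fragment such as {"loc is not valid JSON on its own.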

Structured output

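Models can be asked to provide their response in a format matching a given schema. This helps ensure that the output is easy to parse and can be used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.

To learn more about structured output, see Structured output.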

Zod
JSON Schema
Standard Schema

A zod schema is the preferred way to define an output schema. Note that when a zod schema is provided, the model output is also validated against the schema using zod's parse method.

import * as z from "zod";

const Movie = z.object({
  title: z.string().describe("The title of the movie"),
  year: z.number().describe("The year the movie was released"),
  director: z.string().describe("The director of the movie"),
  rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie);

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   title: "Inception",
//   year: 2010,
//   director: "Christopher Nolan",
//   rating: 8.8,
// }

For maximum control or interoperability, you can provide a raw JSON Schema.

const jsonSchema = {
  "title": "Movie",
  "description": "A movie with details",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "The title of the movie",
    },
    "year": {
      "type": "integer",
      "description": "The year the movie was released",
    },
    "director": {
      "type": "string",
      "description": "The director of the movie",
    },
    "rating": {
      "type": "number",
      "description": "The movie's rating out of 10",
    },
  },
  "required": ["title", "year", "director", "rating"],
}

const modelWithStructure = model.withStructuredOutput(
  jsonSchema,
  { method: "jsonSchema" },
)

const response = await modelWithStructure.invoke("Provide details about the movie Inception")
console.log(response)  // {'title': 'Inception', 'year': 2010, ...}

Any schema from a library that implements the Standard Schema specification is also supported. Standard Schema objects are validated at runtime using the schema's ~standard.validate() method.

import * as v from "valibot";
import { toStandardJsonSchema } from "@valibot/to-json-schema";

const Movie = toStandardJsonSchema(
  v.object({
    title: v.pipe(v.string(), v.description("The title of the movie")),
    year: v.pipe(v.number(), v.description("The year the movie was released")),
    director: v.pipe(v.string(), v.description("The director of the movie")),
    rating: v.pipe(v.number(), v.description("The movie's rating out of 10")),
  })
);

const modelWithStructure = model.withStructuredOutput(Movie);

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   title: "Inception",
//   year: 2010,
//   director: "Christopher Nolan",
//   rating: 8.8,
// }

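Key considerations for structured output:

Method parameter: some providers support different methods ('jsonSchema', 'functionCalling', 'jsonMode')
Include raw: use includeRaw: true to get both the parsed output and the raw AIMessage
Validation: Zod and Standard Schema objects provide automatic validation, while JSON Schema requires manual validation
Standard Schema: any schema library that implements the Standard Schema specification is supported and validated at runtime

See the relevant provider's integration page for supported methods and configuration options.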
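Since raw JSON Schema output is not validated automatically, one option is a hand-written type guard. The sketch below mirrors the required fields of the Movie JSON Schema shown earlier; the Movie interface and the isMovie function are hypothetical names for illustration, and a full JSON Schema validator library would be the more general choice.

```typescript
// Hypothetical type and guard mirroring the Movie JSON Schema's required fields.
interface Movie {
  title: string;
  year: number;
  director: string;
  rating: number;
}

function isMovie(value: unknown): value is Movie {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.year === "number" &&
    Number.isInteger(v.year) && // the schema declares year as an integer
    typeof v.director === "string" &&
    typeof v.rating === "number"
  );
}

// Stand-in for a model response produced with a JSON Schema.
const candidate: unknown = JSON.parse(
  '{"title":"Inception","year":2010,"director":"Christopher Nolan","rating":8.8}',
);
if (isMovie(candidate)) {
  console.log(`${candidate.title} (${candidate.year})`); // Inception (2010)
} else {
  console.log("Response did not match the Movie schema");
}
```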

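Example: message output alongside parsed structure

It can be useful to return the raw AIMessage object alongside the parsed representation, since this gives access to response metadata such as token counts. To do this, set includeRaw: true when calling withStructuredOutput: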

import * as z from "zod";

const Movie = z.object({
  title: z.string().describe("The title of the movie"),
  year: z.number().describe("The year the movie was released"),
  director: z.string().describe("The director of the movie"),
  rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie, { includeRaw: true });

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   raw: AIMessage { ... },
//   parsed: { title: "Inception", ... }
// }

Example: nested structures

Schemas can be nested:

import * as z from "zod";

const Actor = z.object({
  name: z.string(),
  role: z.string(),
});

const MovieDetails = z.object({
  title: z.string(),
  year: z.number(),
  cast: z.array(Actor),
  genres: z.array(z.string()),
  budget: z.number().nullable().describe("Budget in millions USD"),
});

const modelWithStructure = model.withStructuredOutput(MovieDetails);

Advanced topics

Model profiles

Model profiles require langchain>=1.1.

LangChain chat models can expose a dictionary of supported features and capabilities through a profile attribute:

model.profile;
// {
//   maxInputTokens: 400000,
//   imageInputs: true,
//   reasoningOutput: true,
//   toolCalling: true,
//   ...
// }

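See the full set of fields in the API reference. Much of the model profile data is powered by the models.dev project, an open source initiative that provides model capability data. This data is augmented with additional fields for use in LangChain, and the augmentations are kept in sync as the upstream project evolves.

Model profile data lets applications work around model capabilities dynamically. For example: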

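Summarization middleware can trigger summarization based on a model's context window size.
Structured output strategies in createAgent can be inferred automatically, for example by checking for support for native structured output features.
Model inputs can be gated based on supported modalities and maximum input tokens.
Deep Agents Code filters its interactive model switcher to show only models whose profile reports support for tool_calling and text I/O, and displays context window size and capability flags in the selector's detail view.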
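As an illustration of gating inputs on profile data, the sketch below checks a pending request against two of the profile fields shown above (maxInputTokens and imageInputs). The ModelProfileLike and PendingRequest types and the checkRequest helper are hypothetical, and the sketch assumes the token count has already been estimated elsewhere.

```typescript
// Subset of the profile fields shown above; missing fields mean "unknown".
interface ModelProfileLike {
  maxInputTokens?: number;
  imageInputs?: boolean;
}

// Hypothetical description of a request about to be sent.
interface PendingRequest {
  estimatedInputTokens: number;
  hasImages: boolean;
}

// Returns a list of problems; an empty list means the request can proceed.
function checkRequest(profile: ModelProfileLike, req: PendingRequest): string[] {
  const problems: string[] = [];
  if (profile.maxInputTokens !== undefined && req.estimatedInputTokens > profile.maxInputTokens) {
    problems.push(
      `input is about ${req.estimatedInputTokens} tokens; limit is ${profile.maxInputTokens}`,
    );
  }
  if (req.hasImages && profile.imageInputs === false) {
    problems.push("model does not accept image inputs");
  }
  return problems;
}

const exampleProfile: ModelProfileLike = { maxInputTokens: 400000, imageInputs: true };
console.log(checkRequest(exampleProfile, { estimatedInputTokens: 500000, hasImages: true }));
```

Treating a missing field as unknown, and therefore allowing the request, is a deliberate choice here because profile data may be incomplete.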

Modifying profile data

Model profile data can be changed if it is missing, stale, or incorrect.

Option 1 (quick fix): you can instantiate a chat model with any valid profile:

const customProfile = {
  maxInputTokens: 100_000,
  toolCalling: true,
  structuredOutput: true,
  // ...
};
const model = await initChatModel("...", { profile: customProfile });

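Option 2 (fix data upstream): the primary source for the data is the models.dev project. This data is merged with additional fields and overrides in LangChain integration packages and is shipped with those packages. Model profile data can be updated through the following process:

(If needed) update the source data at models.dev through a pull request to its GitHub repository.
(If needed) update additional fields and overrides in langchain-<package>/profiles.toml through a pull request to the LangChain integration package.

Model profiles are a beta feature. The format of a profile is subject to change.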

Multimodal

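Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.

All LangChain chat models with underlying multimodal capabilities support:

Data in the cross-provider standard format (see our messages guide)
OpenAI chat completions format
Any format that is native to that specific provider (for example, Anthropic models accept Anthropic native format)

See the multimodal section of the messages guide for details. Some models can also return multimodal data as part of their response. If invoked to do so, the resulting AIMessage will have content blocks with multimodal types.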

Multimodal output

const response = await model.invoke("Create a picture of a cat");
console.log(response.contentBlocks);
// [
//   { type: "text", text: "Here's a picture of a cat" },
//   { type: "image", data: "...", mimeType: "image/jpeg" },
// ]

See the integrations page for details on specific providers.

Reasoning

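Many models are capable of performing multi-step reasoning to arrive at a conclusion. This involves breaking down complex problems into smaller, more manageable steps. If supported by the underlying model, you can surface this reasoning process to better understand how the model arrived at its final answer.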

const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
    const reasoningSteps = chunk.contentBlocks.filter(b => b.type === "reasoning");
    console.log(reasoningSteps.length > 0 ? reasoningSteps : chunk.text);
}

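Depending on the model, you can sometimes specify the level of effort it should put into reasoning. Similarly, you can request that the model turn off reasoning entirely. This may take the form of categorical reasoning "tiers" (for example, 'low' or 'high') or integer token budgets. For details, see the integrations page or reference for your chat model.

Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you want to invoke a custom model, or when you want to avoid the costs of a cloud-based model. Ollama is one of the easiest ways to run chat and embedding models locally.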

Prompt caching

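Many providers offer prompt caching features to reduce latency and cost on repeat processing of the same tokens. These features can be implicit or explicit:

Implicit prompt caching: providers automatically pass on cost savings if a request hits a cache. Examples: OpenAI and Gemini.
Explicit caching: providers let you manually indicate cache points for greater control or to guarantee cost savings. Examples:
- ChatOpenAI (via prompt_cache_key)
- Anthropic's AnthropicPromptCachingMiddleware
- Gemini

Prompt caching is often only engaged above a minimum input token threshold. See provider pages for details.

Cache usage is reflected in the usage metadata of the model response.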

Server-side tool use

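Some providers support server-side tool-calling loops: models can interact with web search, code interpreters, and other tools and analyze the results in a single conversational turn. If a model invokes a tool server-side, the content of the response message will include content representing the invocation and result of the tool. Accessing the content blocks of the response returns the server-side tool calls and results in a provider-agnostic format: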

import { initChatModel } from "langchain";

const model = await initChatModel("gpt-5.4-mini");
const modelWithTools = model.bindTools([{ type: "web_search" }])

const message = await modelWithTools.invoke("What was a positive news story from today?");
console.log(message.contentBlocks);

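This represents a single conversational turn; unlike client-side tool calling, there are no associated ToolMessage objects that need to be passed in. See the integration page for your given provider for available tools and usage details.

Base URL and proxy settings

You can configure a custom base URL for providers that implement the OpenAI Chat Completions API.

modelProvider: "openai" (or direct use of ChatOpenAI) targets the official OpenAI API specification. Provider-specific fields from routers and proxies may not be extracted or preserved. For OpenRouter and LiteLLM, prefer the dedicated integrations: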

OpenRouter via ChatOpenRouter (langchain-openrouter)
LiteLLM via ChatLiteLLM / ChatLiteLLMRouter (langchain-litellm)

Custom base URL

Many model providers offer OpenAI-compatible APIs, for example Together AI and vLLM. You can use initChatModel with these providers by specifying the appropriate baseUrl parameter:

const model = await initChatModel(
    "MODEL_NAME",
    {
        modelProvider: "openai",
        baseUrl: "BASE_URL",
        apiKey: "YOUR_API_KEY",
    }
)

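When instantiating a chat model class directly, the parameter name may vary by provider. Check the respective reference for details.

Log probabilities

Certain models can be configured to return token-level log probabilities, which represent the likelihood of a given token. Set the logprobs parameter when initializing the model: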

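import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-5.4",
  logprobs: true,
});

const responseMessage = await model.invoke("Why do parrots talk?");

// Log probabilities for the first five generated tokens
console.log(responseMessage.response_metadata.logprobs.content.slice(0, 5));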
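A log probability is easier to read once it is converted back to a probability with the exponential function. The sketch below assumes each entry carries a token and its natural-log probability; the TokenLogprob shape, the helper names, and the sample values are made up for illustration and do not come from a real response.

```typescript
// Assumed entry shape: one object per generated token with its natural-log probability.
interface TokenLogprob {
  token: string;
  logprob: number;
}

// exp(logprob) recovers the probability the model assigned to each token.
function toProbabilities(entries: TokenLogprob[]): { token: string; probability: number }[] {
  return entries.map((e) => ({ token: e.token, probability: Math.exp(e.logprob) }));
}

// Summing log probabilities gives the log probability of the whole sequence.
function totalLogprob(entries: TokenLogprob[]): number {
  return entries.reduce((sum, e) => sum + e.logprob, 0);
}

// Made-up sample values for illustration only.
const sampleEntries: TokenLogprob[] = [
  { token: "Par", logprob: -0.1 },
  { token: "rots", logprob: -0.02 },
];
console.log(toProbabilities(sampleEntries));
console.log(totalLogprob(sampleEntries));
```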

Token usage

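Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. See the messages guide for more details.

Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig object. This provides runtime control over execution behavior, callbacks, and metadata tracking. Common configuration options include: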

Invocation with config

const response = await model.invoke(
    "Tell me a joke",
    {
        runName: "joke_generation",      // Custom name for this run
        tags: ["humor", "demo"],          // Tags for categorization
        metadata: {"user_id": "123"},     // Custom metadata
        callbacks: [my_callback_handler], // Callback handlers
    }
)

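These configuration values are particularly useful when:

Debugging with LangSmith tracing
Implementing custom logging or monitoring
Controlling resource usage in production
Tracking invocations across complex pipelines

Key configuration attributes

runName

string

Identifies this specific invocation in logs and traces. Not inherited by sub-calls.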

Dynamic model selection

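Dynamic models are selected at runtime based on the current state and context. This enables sophisticated routing logic and cost optimization. To use a dynamic model, create middleware with wrapModelCall that modifies the model in the request: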

import { ChatOpenAI } from "@langchain/openai";
import { createAgent, createMiddleware } from "langchain";

const basicModel = new ChatOpenAI({ model: "gpt-5.4-mini" });
const advancedModel = new ChatOpenAI({ model: "gpt-5.4" });

const dynamicModelSelection = createMiddleware({
  name: "DynamicModelSelection",
  wrapModelCall: (request, handler) => {
    // Choose model based on conversation complexity
    const messageCount = request.messages.length;

    return handler({
        ...request,
        model: messageCount > 10 ? advancedModel : basicModel,
    });
  },
});

const agent = createAgent({
  model: "gpt-5.4-mini", // Base model (used when messageCount ≤ 10)
  tools,
  middleware: [dynamicModelSelection],
});

For more details on middleware and advanced patterns, see the middleware documentation.

For model configuration details, see Models. For dynamic model selection patterns, see Dynamic model in middleware.
