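Guardrails validate and filter content at key points in an agent's execution, helping you build safe, compliant AI applications. They can detect sensitive information, enforce content policies, validate outputs, and stop unsafe behavior before it causes problems.
Common use cases include:
  • Preventing PII leakage
  • Detecting and blocking prompt injection attacks
  • Blocking inappropriate or harmful content
  • Enforcing business rules and compliance requirements
  • Validating output quality and accuracy
You can implement guardrails with middleware, which intercepts execution at key points: before the agent starts, after it finishes, or around model and tool calls.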
[Figure: Middleware flow diagram]
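Guardrails can be implemented using two complementary approaches:

Deterministic guardrails

Use rule-based logic such as regular expression patterns, keyword matching, or explicit checks. They are fast, predictable, and cheap, but can miss subtle violations.

Model-based guardrails

Use an LLM or classifier to evaluate content through semantic understanding. They can catch nuanced issues that rules miss, but are slower and more expensive.

LangChain provides both built-in guardrails (for example, PII detection and human-in-the-loop) and a flexible middleware system for building custom guardrails with either approach.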
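To make the contrast between the two approaches concrete, here is a minimal TypeScript sketch that puts both behind one shared interface. Everything in it is hypothetical and not part of LangChain: the `Guardrail` type, `regexGuardrail`, `modelGuardrail`, `checkAll`, and the injected `classify` function that stands in for a real LLM or classifier call.

```typescript
// Hypothetical shared shape for a guardrail check (not a LangChain type).
type Verdict = { allowed: boolean; reason?: string };
type Guardrail = (text: string) => Promise<Verdict>;

// Deterministic guardrail: blocks text matching any of the given patterns.
// Pass patterns without the "g" flag, because test() on a global regex is stateful.
function regexGuardrail(patterns: RegExp[]): Guardrail {
  return async (text) => {
    const hit = patterns.find((p) => p.test(text));
    return hit
      ? { allowed: false, reason: `matched ${hit.source}` }
      : { allowed: true };
  };
}

// Model-based guardrail: delegates the decision to an injected classifier.
// `classify` is an assumption standing in for an LLM or classifier call.
function modelGuardrail(
  classify: (text: string) => Promise<"SAFE" | "UNSAFE">
): Guardrail {
  return async (text) => {
    const label = await classify(text);
    return label === "SAFE"
      ? { allowed: true }
      : { allowed: false, reason: "classifier flagged the text" };
  };
}

// Run guardrails in order and stop at the first one that rejects the text.
async function checkAll(text: string, guardrails: Guardrail[]): Promise<Verdict> {
  for (const guardrail of guardrails) {
    const verdict = await guardrail(text);
    if (!verdict.allowed) return verdict;
  }
  return { allowed: true };
}
```

Listing the deterministic check first means the slower, more expensive model call only runs for text that the cheap rules did not already reject.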

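Built-in guardrails

PII detection

LangChain provides built-in middleware for detecting and handling Personally Identifiable Information (PII) in conversations. The middleware can detect common PII types such as email addresses, credit card numbers, and IP addresses.
PII detection middleware is a good fit for healthcare and financial applications with compliance requirements, customer service agents that need sanitized logs, and more generally any application that handles sensitive user data.
The PII middleware supports several strategies for handling detected PII: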
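| Strategy | Description | Example |
| -------- | ----------- | ------- |
| redact | Replace with [REDACTED_{PII_TYPE}] | [REDACTED_EMAIL] |
| mask | Partially obscure (for example, show only the last 4 digits) | ****-****-****-1234 |
| hash | Replace with a deterministic hash | a8f5f167... |
| block | Throw an exception when detected | Throws an error |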
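To illustrate what each strategy in the table does to a detected value, here is a small dependency-free sketch. It is not the middleware's implementation: `applyStrategy` and `fnv1a` are hypothetical helpers, and the FNV-1a function is a non-cryptographic stand-in used only to show that hashing is deterministic. A real system would likely use a cryptographic hash.

```typescript
type Strategy = "redact" | "mask" | "hash" | "block";

// Non-cryptographic FNV-1a hash, used here only to keep the sketch dependency-free.
function fnv1a(value: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Apply one handling strategy to a single detected PII value.
function applyStrategy(match: string, piiType: string, strategy: Strategy): string {
  switch (strategy) {
    case "redact":
      return `[REDACTED_${piiType.toUpperCase()}]`;
    case "mask": {
      // Replace every letter or digit except the last four with "*", keeping separators.
      const total = (match.match(/[A-Za-z0-9]/g) ?? []).length;
      let seen = 0;
      return match.replace(/[A-Za-z0-9]/g, (ch) => (++seen > total - 4 ? ch : "*"));
    }
    case "hash":
      return fnv1a(match);
    case "block":
      throw new Error(`Detected ${piiType}; refusing to continue`);
  }
}
```

With these assumptions, `applyStrategy("5105-1051-0510-5100", "credit_card", "mask")` returns `****-****-****-5100`, and the `block` case throws instead of returning a value.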
import { createAgent, piiRedactionMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-5.4",
  tools: [customerServiceTool, emailTool],
  middleware: [
    // Redact emails in user input before sending to model
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToInput: true,
    }),
    // Mask credit cards in user input
    piiRedactionMiddleware({
      piiType: "credit_card",
      strategy: "mask",
      applyToInput: true,
    }),
    // Block API keys - raise error if detected
    piiRedactionMiddleware({
      piiType: "api_key",
      detector: /sk-[a-zA-Z0-9]{32}/,
      strategy: "block",
      applyToInput: true,
    }),
  ],
});

// When user provides PII, it will be handled according to the strategy
const result = await agent.invoke({
  messages: [{
    role: "user",
    content: "My email is john.doe@example.com and card is 5105-1051-0510-5100"
  }]
});
Built-in PII types:
  • email: email addresses
  • credit_card: credit card numbers (validated with the Luhn algorithm)
  • ip: IP addresses
  • mac_address: MAC addresses
  • url: URLs
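The credit_card type above mentions Luhn validation, which filters out digit sequences that merely look like card numbers. The following is a sketch of the standard Luhn checksum, not the middleware's own code. The `passesLuhn` name and the 13 to 19 digit length bound are assumptions made for illustration.

```typescript
// Return true when the candidate passes the Luhn checksum.
function passesLuhn(candidate: string): boolean {
  // Drop spaces and dashes, then require a plausible card length (assumed 13-19 digits).
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit and double every second digit.
    let d = Number(digits.charAt(digits.length - 1 - i));
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

For example, the test number `5105-1051-0510-5100` used earlier passes the check, while changing its last digit to `1` makes it fail.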
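Configuration options:
| Parameter | Description | Default |
| --------- | ----------- | ------- |
| piiType | PII type to detect (built-in or custom) | Required |
| strategy | How to handle detected PII ("block", "redact", "mask", "hash") | "redact" |
| detector | Custom detector regular expression pattern | undefined (uses the built-in detector) |
| applyToInput | Check user messages before the model call | true |
| applyToOutput | Check AI messages after the model call | false |
| applyToToolResults | Check tool result messages after execution | false |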
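Before passing a custom `detector` pattern to the middleware, it can help to check the regular expression in isolation. The sketch below does that with a hypothetical `findMatches` helper and a made-up key. How the middleware applies the pattern internally is not described here, so treat this only as a way to test the regex itself.

```typescript
// Return every substring of `text` that the detector matches.
function findMatches(text: string, detector: RegExp): string[] {
  // Copy the pattern with the global flag so all matches are found, not only the first.
  const flags = detector.flags.includes("g") ? detector.flags : detector.flags + "g";
  const globalDetector = new RegExp(detector.source, flags);

  const matches: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = globalDetector.exec(text)) !== null) {
    matches.push(m[0]);
    // Guard against infinite loops on zero-length matches.
    if (m[0] === "") globalDetector.lastIndex++;
  }
  return matches;
}

// Same pattern as the api_key example above.
const apiKeyDetector = /sk-[a-zA-Z0-9]{32}/;

// 32 made-up characters after the prefix; this is not a real key.
const fakeKey = "sk-" + "a1B2".repeat(8);

const found = findMatches(`My key is ${fakeKey}, please store it`, apiKeyDetector);
```

Here `found` contains exactly one entry, the fake key, while a shorter string such as `sk-short` produces no matches because the pattern requires 32 characters after the prefix.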
For full details on PII detection capabilities, see the middleware documentation.

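Human-in-the-loop

LangChain provides built-in middleware that requires human approval before sensitive operations run. It is one of the most effective guardrails for high-stakes decisions.
Human-in-the-loop middleware is a good fit for financial transactions and transfers, deleting or modifying production data, sending communications to external parties, and any operation with significant business impact.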
import { createAgent, humanInTheLoopMiddleware } from "langchain";
import { MemorySaver, Command } from "@langchain/langgraph";

const agent = createAgent({
  model: "gpt-5.4",
  tools: [searchTool, sendEmailTool, deleteDatabaseTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        // Require approval for sensitive operations
        send_email: { allowAccept: true, allowEdit: true, allowRespond: true },
        delete_database: { allowAccept: true, allowEdit: true, allowRespond: true },
        // Auto-approve safe operations
        search: false,
      }
    }),
  ],
  checkpointer: new MemorySaver(),
});

// Human-in-the-loop requires a thread ID for persistence
const config = { configurable: { thread_id: "some_id" } };

// Agent will pause and wait for approval before executing sensitive tools
let result = await agent.invoke(
  { messages: [{ role: "user", content: "Send an email to the team" }] },
  config
);

result = await agent.invoke(
  new Command({ resume: { decisions: [{ type: "approve" }] } }),
  config  // Same thread ID to resume the paused conversation
);
For full details on implementing approval workflows, see the human-in-the-loop documentation.
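The pause-then-resume flow can be easier to reason about as plain control flow. The sketch below models it without LangChain. The `ToolCall` and `Decision` types, the `needsApproval` set, and the `gate` function are all hypothetical, and the `reject` decision shape is an assumption added for illustration.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type Decision = { type: "approve" } | { type: "reject"; message: string };

// Hypothetical policy: tool names that must pause for a human decision.
const needsApproval = new Set<string>(["send_email", "delete_database"]);

// Execute a tool call only when it is exempt from review or a human approved it.
function gate(
  call: ToolCall,
  decision: Decision | undefined,
  run: (call: ToolCall) => string
): string {
  if (!needsApproval.has(call.name)) return run(call); // safe tool: run immediately
  if (decision === undefined) return `PAUSED: waiting for a decision on ${call.name}`;
  if (decision.type === "reject") return `REJECTED: ${decision.message}`;
  return run(call); // approved
}

// Stand-in tool executor that only reports what it would do.
const runTool = (call: ToolCall): string => `ran ${call.name}`;
```

Calling `gate` for `send_email` without a decision returns the paused message, which corresponds to the first `invoke` above. Calling it again with `{ type: "approve" }` corresponds to the resume step.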

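Custom guardrails

For more sophisticated guardrails, you can create custom middleware that runs before or after the agent executes. This gives you full control over validation logic, content filtering, and safety checks.

Before-agent guardrails

Use "before agent" hooks to validate a request once at the start of each invocation. This suits session-level checks such as authentication, rate limiting, or blocking inappropriate requests before any processing begins.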
import { createMiddleware, AIMessage } from "langchain";

const contentFilterMiddleware = (bannedKeywords: string[]) => {
  const keywords = bannedKeywords.map(kw => kw.toLowerCase());

  return createMiddleware({
    name: "ContentFilterMiddleware",
    beforeAgent: {
      hook: (state) => {
        // Get the first user message
        if (!state.messages || state.messages.length === 0) {
          return;
        }

        const firstMessage = state.messages[0];
        if (firstMessage._getType() !== "human") {
          return;
        }

        const content = firstMessage.content.toString().toLowerCase();

        // Check for banned keywords
        for (const keyword of keywords) {
          if (content.includes(keyword)) {
            // Block execution before any processing
            return {
              messages: [
                new AIMessage(
                  "I cannot process requests containing inappropriate content. Please rephrase your request."
                )
              ],
              jumpTo: "end",
            };
          }
        }

        return;
      },
      canJumpTo: ['end']
    }
  });
};

// Use the custom guardrail
import { createAgent } from "langchain";

const agent = createAgent({
  model: "gpt-5.4",
  tools: [searchTool, calculatorTool],
  middleware: [
    contentFilterMiddleware(["hack", "exploit", "malware"]),
  ],
});

// This request will be blocked before any processing
const result = await agent.invoke({
  messages: [{ role: "user", content: "How do I hack into a database?" }]
});
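Rate limiting, mentioned above as another session-level check, follows the same pattern: decide once per invocation whether to continue. Below is a minimal fixed-window limiter in plain TypeScript. The `FixedWindowLimiter` class is hypothetical and keeps its state in memory, so it is a sketch of the decision logic rather than something to deploy across multiple processes.

```typescript
// Fixed-window rate limiter keyed by a caller-supplied ID (hypothetical helper).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed, false when the caller is over the limit.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    // Start a fresh window for new callers or when the previous window has expired.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A before-agent hook could call `allow(userId)` and return a refusal message with `jumpTo: "end"` when it returns false, in the same way the keyword filter does. Where the user ID comes from is left open here.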

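After-agent guardrails

Use "after agent" hooks to validate the final output once before it is returned to the user. This suits model-based safety checks, quality validation, or a final compliance scan of the complete agent response.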
import { createMiddleware, AIMessage, initChatModel } from "langchain";

const safetyGuardrailMiddleware = () => {
  // initChatModel returns a promise in LangChain JS, so start loading once here
  // and await the model inside the hook before calling invoke.
  const safetyModelPromise = initChatModel("gpt-5.4-mini");

  return createMiddleware({
    name: "SafetyGuardrailMiddleware",
    afterAgent: {
      hook: async (state) => {
        // Get the final AI response
        if (!state.messages || state.messages.length === 0) {
          return;
        }

        const lastMessage = state.messages[state.messages.length - 1];
        if (lastMessage._getType() !== "ai") {
          return;
        }

        // Use a model to evaluate safety
        const safetyPrompt = `Evaluate if this response is safe and appropriate.
        Respond with only 'SAFE' or 'UNSAFE'.

        Response: ${lastMessage.content.toString()}`;

        const safetyModel = await safetyModelPromise;
        const result = await safetyModel.invoke([
          { role: "user", content: safetyPrompt }
        ]);

        if (result.content.toString().includes("UNSAFE")) {
          return {
            messages: [
              new AIMessage(
                "I cannot provide that response. Please rephrase your request."
              )
            ],
            jumpTo: "end",
          };
        }

        return;
      },
      canJumpTo: ['end']
    }
  });
};

// Use the safety guardrail
import { createAgent } from "langchain";

const agent = createAgent({
  model: "gpt-5.4",
  tools: [searchTool, calculatorTool],
  middleware: [safetyGuardrailMiddleware()],
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "How do I make explosives?" }]
});
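The hook above treats any reply that does not contain the exact string "UNSAFE" as safe, so a lowercase "unsafe" or an off-format reply would pass. One possible hardening is to fail closed: accept only an exact "SAFE" verdict and treat everything else as unsafe. The `parseSafetyVerdict` helper below is a hypothetical sketch of that idea, not part of LangChain.

```typescript
// Map raw model output to a verdict; anything other than an exact "SAFE" is treated as unsafe.
function parseSafetyVerdict(raw: string): "SAFE" | "UNSAFE" {
  // Trim whitespace, strip surrounding quotes and a trailing period, and normalize case.
  const normalized = raw
    .trim()
    .replace(/^['"`]+|['"`.]+$/g, "")
    .toUpperCase();
  return normalized === "SAFE" ? "SAFE" : "UNSAFE";
}
```

In the hook, the condition would then become `parseSafetyVerdict(result.content.toString()) === "UNSAFE"`. The trade-off is more false positives when the safety model replies with extra text, which may be acceptable for a final compliance check.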

Combining multiple guardrails

You can stack multiple guardrails by adding them to the middleware array. They execute in order, letting you build layered protection:
import { createAgent, piiRedactionMiddleware, humanInTheLoopMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-5.4",
  tools: [searchTool, sendEmailTool],
  middleware: [
    // Layer 1: Deterministic input filter (before agent)
    contentFilterMiddleware(["hack", "exploit"]),

    // Layer 2: PII protection (before and after model)
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToInput: true,
    }),
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToOutput: true,
    }),

    // Layer 3: Human approval for sensitive tools
    humanInTheLoopMiddleware({
      interruptOn: {
        send_email: { allowAccept: true, allowEdit: true, allowRespond: true },
      }
    }),

    // Layer 4: Model-based safety check (after agent)
    safetyGuardrailMiddleware(),
  ],
});
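To see why the order of layers matters, here is a plain TypeScript sketch of a pipeline in which each stage either transforms the text or blocks it. It does not model LangChain's hook scheduling. `Stage`, `runStages`, `keywordFilter`, and `emailRedactor` are hypothetical names, and the email pattern is deliberately simplified.

```typescript
type StageResult =
  | { action: "continue"; text: string }
  | { action: "block"; reason: string };
type Stage = { name: string; run: (text: string) => StageResult };

// Run stages in array order; a blocking stage stops the pipeline immediately.
function runStages(input: string, stages: Stage[]): { output: string; trace: string[] } {
  const trace: string[] = [];
  let text = input;
  for (const stage of stages) {
    trace.push(stage.name);
    const result = stage.run(text);
    if (result.action === "block") {
      return { output: result.reason, trace };
    }
    text = result.text;
  }
  return { output: text, trace };
}

// Deterministic keyword filter, analogous to layer 1 above.
const keywordFilter: Stage = {
  name: "keyword-filter",
  run: (text) =>
    /hack|exploit/i.test(text)
      ? { action: "block", reason: "Request blocked by content filter." }
      : { action: "continue", text },
};

// Simplified email redaction, analogous to layer 2 above.
const emailRedactor: Stage = {
  name: "email-redactor",
  run: (text) => ({
    action: "continue",
    text: text.replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[REDACTED_EMAIL]"),
  }),
};
```

With `[keywordFilter, emailRedactor]`, a request containing "hack" is blocked and the trace shows that the redactor never ran. A benign request passes through both stages and comes out with emails redacted.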

Additional resources